What's the difference between this and Design Distributed Log?

A log preserves events for multiple independent consumer groups to replay from any offset; a queue removes a message after one consumer acks it. Log designs focus on retention and replay; queue designs focus on visibility timeouts, DLQ routing, and delivery semantics.

Is exactly-once delivery actually possible?

Exactly-once at the application level requires either idempotent processing or a queue feature that combines producer-side deduplication with FIFO consumer delivery. The deduplication window is typically minutes, not unbounded. Pure exactly-once across infinite time is not a tractable distributed-systems problem.

How long is the Design Distributed Message Queue interview?

45-60 minutes. Expect a deep drill on the visibility-timeout contract and at least one of (DLQ routing, exactly-once, FIFO ordering). 'At-least-once with idempotent consumers' is the default answer to every delivery-guarantee question.

What is the most important concept for Design Distributed Message Queue?

The visibility-timeout contract. Every senior interview asks the candidate to walk through what happens when a consumer crashes, takes too long, or successfully processes but loses the ack network call. The answer is the visibility-timeout state machine.

System Design Questions

Design Distributed Message Queue — System Design Interview Guide

Design Distributed Message Queue is a system-design interview that asks you to build a managed queue service: producers publish messages, consumers pull or receive pushed messages, and the system handles visibility timeouts, retries, dead-letter routing, and configurable delivery guarantees (at-least-once or exactly-once). The hard part is the visibility-timeout contract — once a consumer takes a message, no other consumer sees it until the first one acks or times out.

By Sam K., Founder, InterviewChamp.AI · Last verified 2026-05-25

Reported in interviews at

Amazon
Google
Microsoft
Meta
Apple

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

Send a message to a named queue; the message is persisted and made available for delivery
Receive a batch of messages from a queue; each message is marked invisible for a configurable visibility timeout
Acknowledge (delete) a message after successful processing; un-acked messages reappear after the timeout
Dead-letter routing: messages that exceed N receive attempts are moved to a separate DLQ for inspection
Configurable per-queue retention (1 minute to 14 days) and per-message delay (delayed delivery)
Per-queue FIFO option: strict in-order delivery within a message group; default is best-effort order

Non-functional requirements

Throughput: ~100K messages/sec per queue for standard queues, ~3K msg/sec for FIFO queues
Latency: <50ms p99 send-to-deliver; <100ms p99 for receive batch of 10 messages
Durability: every accepted message persists to disk on multiple nodes before ack; no data loss on single-node failure
Availability: 99.99% for send and receive operations; the queue must never be unavailable for both sides at once
At-least-once delivery as default; exactly-once is an opt-in FIFO-queue feature
Scale: millions of queues per region, billions of messages per day

Capacity estimation

Anchor scale for a managed queue platform: ~10M active queues across a region, ~1M messages/sec aggregate send rate at peak, ~10x that on receive (because receives include long-polls that return empty).

Message size cap typically 256 KB. Average message size much smaller — most messages are small JSON payloads of 1-5 KB. Storage write rate: 1M msg/sec × 2 KB average × 3 replicas = 6 GB/sec across the cluster. Per-broker write rate at 100 brokers: 60 MB/sec — comfortable.

Storage per day: 1M × 2 KB × 86400 = ~170 TB/day raw, ~500 TB/day with 3x replication. Default retention 4 days = ~2 PB. Most queues retain for hours not days, so steady-state storage is much smaller (~500 TB realistic).

Message metadata per message: ~200 bytes (message_id, queue_id, receive_count, visibility_deadline, group_id for FIFO, body_pointer). At 1M msg/sec × 200 bytes = 200 MB/sec metadata write, indexed by queue_id and visibility_deadline.

Receive QPS: ~10M/sec because of long-polling clients. Most polls return empty (queue idle). Long-poll mechanic: client sends Receive with wait=20s; broker either returns immediately if messages are available or blocks up to 20s. Idle polls consume a TCP connection but nearly zero CPU.

Visibility timeout state: each in-flight message has a visibility deadline. At any moment ~10M messages may be in-flight (the count is bounded by visibility_timeout × send_rate). Visibility state is hot — every receive sets a deadline, every ack clears it, expired deadlines requeue the message. Indexed by deadline timestamp for efficient sweep.

High-level design

Architecture: a frontend API layer, a sharded message store, and a visibility-timeout manager. Plus a control plane for queue creation, configuration, and authorization.

Frontend API: stateless HTTP/gRPC servers that authenticate requests, route to the right shard for the queue, and return results. Each queue is sharded across N storage nodes by queue_id hash. Within a queue, messages are distributed across the shard's replicas; receive operations fetch from any healthy replica.

Message store: each shard is a triplet of replicas using leader-follower replication. Sends go to the leader; the leader writes to local disk, replicates to followers with ack=all (for the durability contract), then acks the producer. Messages are stored in a write-optimized log per queue — append on send, range scan on receive.

Receive flow: the consumer sends Receive(queue, max=10, visibility=30s, wait=20s). The frontend picks a leader for the queue's shard. The leader scans the queue's log for messages not currently invisible, returns up to max messages, and for each one writes a visibility-deadline entry (deadline = now + 30s, receive_count++). If no messages are available, the leader waits up to 20s for new sends before returning empty.

Visibility-timeout manager: an internal sweeper periodically scans the deadline index for expired entries. An expired entry means the consumer didn't ack in time — the message is requeued (visibility cleared), available to the next Receive. Receive_count is incremented; if it crosses the queue's max-receive threshold, the message is moved to the dead-letter queue instead of being requeued.

DLQ routing is a queue-to-queue move: the message is written to the configured DLQ name and removed from the source queue. DLQs are normal queues that an operator inspects manually.

Delayed delivery: when a send specifies delay=N, the message is written with a not-visible-until timestamp; the sweeper makes it visible when the time arrives. The same mechanism that handles visibility timeout handles delayed delivery.

FIFO queues: a separate mode with strict in-order delivery within a message-group-id. Sends with the same group_id are serialized at the leader; one in-flight message per group at a time. Different groups can be processed in parallel by different consumers. FIFO throughput is much lower than standard because of the serialization.

Deep dive — the hard problem

Three deep dives: the visibility-timeout contract, exactly-once delivery, and hot-queue throughput.

Visibility-timeout contract — this is the heart of the design and the most-asked deep-dive topic.

When a consumer receives a message, the broker marks it invisible for the configured timeout (e.g. 30 seconds). If the consumer acks within the timeout, the message is deleted. If the consumer crashes or takes too long, the timeout expires and the message becomes visible again to the next Receive — usually delivered to a different consumer.

This gives at-least-once delivery: every message is delivered at least once, but a slow consumer can cause duplicate delivery (the original consumer finishes processing after the timeout expired, but the broker has already requeued and another consumer also processed the message). Consumers must be idempotent.

Two edge cases. (1) The consumer processed the message but the ack network call was lost. The broker requeues, another consumer processes again. Idempotency keys at the application layer handle this. (2) The consumer is alive but slow. The visibility timeout expires; the message is delivered to a second consumer; both now process in parallel. Solution: consumers can extend visibility (heartbeat) for long-running jobs.

The visibility timeout is a knob with real tradeoffs. Too short: duplicate delivery is common. Too long: a crashed consumer ties up the message for the whole timeout before another consumer gets a chance. Production default is 30s with consumer-side extension for known-long jobs.

Exactly-once delivery — only available for FIFO queues, only within a configured deduplication window (typically 5 minutes).

Producers attach a deduplication_id to each send. The broker indexes recent sends by deduplication_id; if the same id is sent again within the window, the second send is silently dropped (the producer still gets a success ack, but no second message is delivered). This handles the common case of a producer retrying after a network error.

Within a message-group-id, FIFO delivery means one message in-flight at a time. The consumer must ack the current message before the next is delivered. Combined with deduplication on send, this gives exactly-once: the producer can retry without creating duplicates, and the consumer sees each unique message exactly once in order.

The cost is throughput. A single message-group is bounded to ~300 messages/sec because of the in-flight=1 constraint and the round-trip latency. Applications use many message-groups in parallel to scale.

Hot queue throughput — what if a single queue receives 1M msg/sec?

The shard becomes a bottleneck. Production systems detect a hot queue and resplit it: the queue's underlying partitions are increased from N to 2N, and new sends hash to the larger ring. Reads continue to drain the old partitions until they're empty, then those partitions retire. The frontend API hides the resplit from clients — the queue name doesn't change.

FIFO queues can't resplit as easily because the group-id ordering must be preserved. One message-group is bounded to one partition; resplitting means re-assigning groups, which can break order for in-flight messages. The standard advice is to choose group-ids with enough cardinality up front so no single group ever becomes hot.

Fourth tradeoff: long-polling vs short-polling. Short-polling (Receive returns immediately, empty or not) is wasteful at idle queues — millions of empty responses per minute. Long-polling (Receive blocks up to 20s waiting for messages) reduces empty responses by 100x and lowers receive latency for idle queues. Default is long-polling; some legacy clients still short-poll.

Common mistakes

Treating receive as a destructive read — receive must mark invisible, not delete; delete only happens on ack
Forgetting the visibility-timeout knob — without it, a crashed consumer permanently loses messages
Designing exactly-once without a dedup window — exactly-once across all time would require permanent storage of every dedup key
Ignoring DLQ routing — production queues need a place for poison messages so they don't loop forever
Using FIFO mode without acknowledging the throughput cost — FIFO is 100x slower than standard per group_id

Likely follow-up questions

How would you support priority queues where high-priority messages are delivered before low-priority ones?
What changes if a single message can be up to 1 GB instead of 256 KB?
How would you implement a 'message visibility extension' API that lets a long-running consumer keep extending the timeout?
How would you support fan-out (one message published to one queue is delivered to N independent subscriber queues)?
How would your design handle a producer that sends 100K duplicate messages/sec by accident — what protects the consumer?
How would you implement a 'replay last 1 hour of messages' feature on top of a queue that normally deletes on ack?

Related system design scenarios

Frequently asked questions

What's the difference between this and Design Distributed Log?: A log preserves events for multiple independent consumer groups to replay from any offset; a queue removes a message after one consumer acks it. Log designs focus on retention and replay; queue designs focus on visibility timeouts, DLQ routing, and delivery semantics.
Is exactly-once delivery actually possible?: Exactly-once at the application level requires either idempotent processing or a queue feature that combines producer-side deduplication with FIFO consumer delivery. The deduplication window is typically minutes, not unbounded. Pure exactly-once across infinite time is not a tractable distributed-systems problem.
How long is the Design Distributed Message Queue interview?: 45-60 minutes. Expect a deep drill on the visibility-timeout contract and at least one of (DLQ routing, exactly-once, FIFO ordering). 'At-least-once with idempotent consumers' is the default answer to every delivery-guarantee question.
What is the most important concept for Design Distributed Message Queue?: The visibility-timeout contract. Every senior interview asks the candidate to walk through what happens when a consumer crashes, takes too long, or successfully processes but loses the ack network call. The answer is the visibility-timeout state machine.