Design WhatsApp — System Design Interview Guide

Design WhatsApp is a system-design interview that asks you to build end-to-end encrypted real-time messaging for billions of users: 1-to-1 chats, group chats up to ~1024 members, delivery within hundreds of milliseconds, offline message queueing, and last-seen presence. The hard part is connection management and message ordering.

By Sam K., Founder, InterviewChamp.AI · Last verified 2026-05-19

Reported in interviews at

Meta
Google
Snapchat
Uber

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

Send and receive 1-to-1 text messages with delivery and read receipts
Group chat with up to ~1024 members; messages fan out to all online members
Offline queueing: if recipient is offline, message is delivered when they reconnect
Presence: show 'last seen' or 'online now' for contacts
Media: attach photo, video, voice note (uploaded separately, reference passed in message)

Non-functional requirements

End-to-end latency: <500ms p99 from sender send to recipient receive on a warm connection
Availability: 99.99%; messages must never be lost (delivery acknowledged or persistently retried)
Scale: 2B users, ~100B messages/day, peak ~3.5M messages/second
End-to-end encryption: server stores only encrypted blobs; cannot decrypt content

Capacity estimation

Public 2024 scale: ~2B monthly active users sending ~100B messages/day. The average text message is small (~100 bytes including envelope and ciphertext). 100B messages × 100 bytes = 10 TB/day of message bytes; with 30-day retention (10 TB × 30) plus media metadata, that is roughly 300 TB of hot storage. The encrypted media bytes themselves live in object storage (~5 PB/year).

Peak send rate: 100B / 86,400 sec × 3 (peak multiplier) ≈ 3.5M messages/sec. Each send fans out (1 send → N recipients in a group chat), so the delivery rate is higher: at an average of 5 recipients per send, that's ~17M deliveries/sec at peak. Concurrent open connections during peak: ~500M long-lived TCP/QUIC sockets; this drives the connection-server fleet size more than message bytes do. A typical chat-server box handles ~1M concurrent sockets, so ~500 servers handle peak connections; add 3× for redundancy and geographic distribution, for roughly 1,500 servers.
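The arithmetic above can be reproduced in a few lines, which makes it easy to re-run with different assumptions during the interview. This is only a sketch: the constants are the figures quoted in this section, and the variable names are illustrative.

```python
# Back-of-envelope capacity numbers, using the figures quoted in this section.
MESSAGES_PER_DAY = 100e9     # ~100B messages/day
AVG_MESSAGE_BYTES = 100      # envelope + ciphertext for a text message
RETENTION_DAYS = 30
PEAK_MULTIPLIER = 3
AVG_RECIPIENTS = 5           # average deliveries per send
PEAK_CONNECTIONS = 500e6     # concurrent open sockets at peak
SOCKETS_PER_SERVER = 1e6
REDUNDANCY_FACTOR = 3

SECONDS_PER_DAY = 86_400
TB = 1e12

daily_tb = MESSAGES_PER_DAY * AVG_MESSAGE_BYTES / TB
hot_storage_tb = daily_tb * RETENTION_DAYS
peak_sends_per_sec = MESSAGES_PER_DAY / SECONDS_PER_DAY * PEAK_MULTIPLIER
peak_deliveries_per_sec = peak_sends_per_sec * AVG_RECIPIENTS
base_servers = PEAK_CONNECTIONS / SOCKETS_PER_SERVER
provisioned_servers = base_servers * REDUNDANCY_FACTOR

print(f"message bytes/day:    {daily_tb:.0f} TB")
print(f"hot storage (30d):    {hot_storage_tb:.0f} TB")
print(f"peak sends/sec:       {peak_sends_per_sec / 1e6:.2f}M")
print(f"peak deliveries/sec:  {peak_deliveries_per_sec / 1e6:.2f}M")
print(f"connection servers:   {base_servers:.0f} base, {provisioned_servers:.0f} provisioned")
```

The connection-server count depends only on concurrent sockets, not on message volume, which is the point the estimate is meant to surface.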

High-level design

Clients hold a persistent long-lived connection (typically WebSocket or QUIC) to the nearest chat gateway, referred to below as a connection server. The connection servers are the heart of the system: stateful from a routing perspective, they hold a map of user_id → the server hosting that user's open socket. This map is replicated through an in-memory routing tier so any other connection server can look up where to send a message bound for any online user.

The write path: the sender encrypts the message on-device with a session key established from the recipient's public keys (e2e, so the server can't read content) and publishes it to its connection server, which writes the ciphertext envelope to a durable store and looks up the recipient. If the recipient is online on another connection server, the message is forwarded directly via an internal RPC and delivered; the durable store serves as the audit/retry log. If the recipient is offline, the message stays in the recipient's per-user inbox queue until their next connect, when the connection server drains the inbox to them in order. Read receipts and 'delivered' acks travel back through the same path as small control messages.
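The write path described above can be sketched as a toy in-memory model. Everything here is hypothetical and for illustration only: `Envelope`, `ChatBackend`, and the `pushed` list (standing in for socket writes) are invented names, and a real system would back the inbox with a durable store rather than a dict. The sketch assumes one reasonable reading of the design: persist first, forward if online, and drop the inbox entry only on a delivery ack.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    message_id: str    # sender-generated ID
    sender_id: str
    recipient_id: str
    ciphertext: bytes  # opaque to the server


class ChatBackend:
    """Toy stand-in for the connection tier plus the durable store."""

    def __init__(self):
        self.routing = {}               # user_id -> server_id, online users only
        self.inbox = defaultdict(dict)  # user_id -> {message_id: Envelope}, insertion-ordered
        self.pushed = []                # (server_id, message_id), stands in for socket writes

    def send(self, env):
        # 1. Persist first so a crash after this point cannot lose the message.
        self.inbox[env.recipient_id][env.message_id] = env
        # 2. Forward immediately if the recipient has an open socket.
        server_id = self.routing.get(env.recipient_id)
        if server_id is None:
            return "queued"
        self.pushed.append((server_id, env.message_id))
        return "forwarded"

    def connect(self, user_id, server_id):
        """Register the socket and push everything still unacked, oldest first."""
        self.routing[user_id] = server_id
        pending = list(self.inbox[user_id].values())
        for env in pending:
            self.pushed.append((server_id, env.message_id))
        return pending

    def disconnect(self, user_id):
        self.routing.pop(user_id, None)

    def ack(self, user_id, message_id):
        """Delivery ack from the recipient: only now is the inbox entry dropped."""
        self.inbox[user_id].pop(message_id, None)


backend = ChatBackend()
print(backend.send(Envelope("m1", "alice", "bob", b"...")))    # queued
print(backend.send(Envelope("m2", "alice", "bob", b"...")))    # queued
print([e.message_id for e in backend.connect("bob", "gw-7")])  # ['m1', 'm2']
backend.ack("bob", "m1")
print(backend.send(Envelope("m3", "alice", "bob", b"...")))    # forwarded
print(list(backend.inbox["bob"]))                              # ['m2', 'm3']
```

Keeping unacked messages in the inbox even after a push is what lets the server retry after a dropped socket without a separate retry queue.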

Group chat is fanout in the connection tier: the group_id maps to a member list, and the sending server forwards the ciphertext envelope to each member's hosting connection server. Because each member has a distinct key and the server can neither decrypt nor re-encrypt content, any per-recipient encryption is done by the sending client before upload. Presence is stored in a TTL-based in-memory store (last-seen timestamp refreshed on every heartbeat). Media is uploaded to object storage by the client directly before the message send; the message envelope carries only the media reference and decryption key. End-to-end encryption uses the Double Ratchet protocol; discussing key exchange and forward secrecy is bonus signal.
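As a rough illustration of connection-tier fanout, the sketch below splits a group's member list into one batch per hosting connection server plus a list of offline members bound for their inboxes. Batching per server is an assumption made here to keep the RPC count down; `plan_fanout` and the `routing` dict are hypothetical names, not part of any real system.

```python
from collections import defaultdict


def plan_fanout(sender_id, member_ids, routing):
    """Split a group's members into per-server batches and an offline list.

    routing maps user_id -> server_id for users with an open socket.
    """
    by_server = defaultdict(list)
    offline = []
    for member in member_ids:
        if member == sender_id:
            continue                     # the sender already has the message
        server = routing.get(member)
        if server is None:
            offline.append(member)       # goes to the per-user inbox instead
        else:
            by_server[server].append(member)
    return dict(by_server), offline


routing = {"alice": "gw-1", "bob": "gw-2", "carol": "gw-2"}
batches, offline = plan_fanout("alice", ["alice", "bob", "carol", "dave"], routing)
print(batches)  # {'gw-2': ['bob', 'carol']}
print(offline)  # ['dave']
```

With a 1024-member group spread over a few hundred servers, this turns up to 1023 forwards into at most one RPC per server that hosts a member.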

Deep dive — the hard problem

The deep dive is connection management at hundreds-of-millions-of-sockets scale and message ordering under network reordering. Connection problem: each connection server holds up to millions of long-lived TCP/QUIC sockets. When a user roams or reconnects, they may land on a different server (load balancer + DNS); the routing tier must be updated atomically so the next message bound for that user reaches the new server, not the dead one. The standard pattern: connection servers register their (user_id → server_id) mapping in a fast in-memory routing store with a TTL refreshed by heartbeats. On disconnect, the mapping expires; messages arriving in the gap are written to the per-user persistent inbox and delivered on reconnect. Choosing the TTL is a tradeoff: too short and network flaps trigger re-registration storms; too long and zombie mappings forward messages to dead sockets.
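The TTL-plus-heartbeat pattern can be sketched in a few lines. This is a single-process illustration under stated assumptions: `RoutingRegistry` is a hypothetical name, a production routing tier would be a replicated in-memory store with per-key expiry, and the clock is injectable only so the example is deterministic.

```python
import time


class RoutingRegistry:
    """user_id -> server_id mappings that expire unless refreshed by heartbeats."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # user_id -> (server_id, expires_at)

    def heartbeat(self, user_id, server_id):
        # A single overwrite both refreshes the TTL and handles roaming:
        # the newest server to heartbeat wins.
        self._entries[user_id] = (server_id, self.clock() + self.ttl)

    def lookup(self, user_id):
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        server_id, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[user_id]  # lazily evict the zombie mapping
            return None
        return server_id


now = [0.0]                                # fake clock for the demo
registry = RoutingRegistry(ttl_seconds=30, clock=lambda: now[0])

registry.heartbeat("bob", "gw-1")
print(registry.lookup("bob"))              # gw-1

now[0] = 20.0
registry.heartbeat("bob", "gw-2")          # bob roamed to another server
print(registry.lookup("bob"))              # gw-2

now[0] = 50.0                              # no heartbeat for 30 s
print(registry.lookup("bob"))              # None
```

A `None` lookup is the signal to leave the message in the per-user inbox rather than forward it.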

Ordering problem: messages sent to a group chat by different senders can arrive at the server in a different order than they were sent (network reordering, sender clock skew). Solutions cluster around two choices. Server-assigned monotonic sequence per chat: the connection server gives each new message in a chat a strictly increasing seq number drawn from a per-chat counter; clients render by seq. Tradeoff: this requires a single ordering authority per chat (a shard or hash-routed leader). Hybrid logical clock (HLC): each message carries a (wall-clock, logical-counter) pair; clients merge-sort. Tradeoff: slightly less intuitive ordering when wall clocks skew, but no per-chat leader. Most real systems pick server-assigned seq for small/medium groups and accept the leader-per-chat constraint.
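Both ordering options are small enough to sketch. The code below is illustrative only: `ChatSequencer` assumes all messages for a chat reach the same process (the leader-per-chat constraint), and `HybridLogicalClock` follows the usual HLC update rules with timestamps as `(wall_ms, counter)` tuples. Class and method names are hypothetical.

```python
from collections import defaultdict


class ChatSequencer:
    """Option 1: per-chat counter owned by a single ordering authority."""

    def __init__(self):
        self._last = defaultdict(int)  # chat_id -> last assigned seq

    def assign(self, chat_id):
        self._last[chat_id] += 1
        return self._last[chat_id]


class HybridLogicalClock:
    """Option 2: (wall_ms, counter) timestamps that need no per-chat leader."""

    def __init__(self, physical_clock):
        self.physical_clock = physical_clock  # callable returning integer ms
        self.wall = 0
        self.counter = 0

    def now(self):
        """Timestamp a locally sent message."""
        pt = self.physical_clock()
        if pt > self.wall:
            self.wall, self.counter = pt, 0
        else:
            self.counter += 1          # physical clock did not advance
        return (self.wall, self.counter)

    def observe(self, remote):
        """Merge the timestamp carried by a received message."""
        r_wall, r_counter = remote
        pt = self.physical_clock()
        new_wall = max(self.wall, r_wall, pt)
        if new_wall == self.wall and new_wall == r_wall:
            self.counter = max(self.counter, r_counter) + 1
        elif new_wall == self.wall:
            self.counter += 1
        elif new_wall == r_wall:
            self.counter = r_counter + 1
        else:
            self.counter = 0
        self.wall = new_wall
        return (self.wall, self.counter)


seq = ChatSequencer()
print(seq.assign("chat-1"), seq.assign("chat-1"), seq.assign("chat-2"))  # 1 2 1

# Bob's wall clock runs 100 ms behind Alice's, yet his reply still sorts later.
alice = HybridLogicalClock(lambda: 1000)
bob = HybridLogicalClock(lambda: 900)
t_msg = alice.now()
bob.observe(t_msg)
t_reply = bob.now()
print(t_msg, t_reply, t_msg < t_reply)  # (1000, 0) (1000, 2) True
```

The HLC preserves cause-before-effect ordering for a reply even under clock skew, but two unrelated messages from skewed clocks can still sort in a surprising order, which is the tradeoff noted above.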

Third hard problem: idempotent delivery. Network retries can cause a message to be delivered twice; clients dedupe by a sender-generated UUID carried in the envelope. Acks are also idempotent — re-sending a delivered-ack is a no-op.
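A minimal sketch of both halves of idempotent delivery, assuming the envelope carries a sender-generated UUID as described above. `ClientInbox` and `DeliveryLog` are hypothetical names, and a real client would bound the seen-ID set (for example by age) rather than let it grow forever.

```python
import uuid


class ClientInbox:
    """Recipient side: drop any message whose ID has already been stored."""

    def __init__(self):
        self._seen = set()
        self.messages = []

    def receive(self, message_id, body):
        """Return True if newly stored, False for a duplicate.

        Either way the caller should send a delivered-ack, because the
        earlier ack may be the thing that got lost.
        """
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        self.messages.append(body)
        return True


class DeliveryLog:
    """Server side: recording the same ack twice changes nothing."""

    def __init__(self):
        self.acked = set()

    def ack(self, message_id):
        self.acked.add(message_id)


message_id = str(uuid.uuid4())  # generated once by the sender, reused on retry
inbox, log = ClientInbox(), DeliveryLog()

print(inbox.receive(message_id, "hello"))  # True
log.ack(message_id)
print(inbox.receive(message_id, "hello"))  # False, a network retry
log.ack(message_id)                        # repeated ack is a no-op
print(len(inbox.messages), len(log.acked)) # 1 1
```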

Common mistakes

Designing a request/response HTTP API for messaging — kills latency and presence; persistent connection is mandatory
Ignoring offline-recipient queueing; the question explicitly assumes recipients can be offline
Treating group fanout as a database broadcast (write 1024 rows) instead of as connection-tier forwarding
Missing end-to-end encryption — interviewer expects you to say 'server only stores ciphertext' and gesture at Double Ratchet or X3DH
Sizing the system from message bytes alone and missing that concurrent open connections drive infrastructure cost

Likely follow-up questions

How would your design handle a user with 10 devices logged in simultaneously?
What changes if you have to support a 1M-member broadcast channel (one sender, 1M passive listeners)?
How would you guarantee message order across two senders in the same group?
How would you handle a user whose phone has been offline for 30 days?
How would you build last-seen presence without leaking 'reading this chat right now' state?

Frequently asked questions

Is Design WhatsApp easier or harder than Design Twitter?

Different shape. Twitter is read-heavy fanout; WhatsApp is connection-heavy bidirectional. Most interviewers consider WhatsApp slightly harder because real-time delivery + e2e encryption + offline queueing all need to be discussed within 45 minutes.

Do I need to know the Double Ratchet protocol in detail?

No. Naming it and describing the property (forward secrecy, key rotation per message) is enough. Drawing the actual key derivation is beyond scope for a 45-minute round.

Should I use WebSocket or QUIC for the persistent connection?

Either is correct. WebSocket is the safe default and is well understood by interviewers; QUIC is bonus signal (faster reconnect on mobile network changes, multiplexed streams). State your choice and one reason.

How do interviewers grade the Design WhatsApp round?

Top signals: persistent connection (not HTTP), explicit offline-queue design, message-ordering tradeoff, end-to-end encryption mention. Missing any single one drops a level; missing two is a no-hire signal at Meta E5+.