Design WhatsApp's real-time chat system for 2B users: WebSocket routing, offline message queuing, and group fan-out at 100B messages per day.
What is WhatsApp?
WhatsApp is an end-to-end encrypted messaging app where users exchange text messages in real time, in 1-to-1 conversations and groups of up to 1,024 people. At 2 billion registered users sending 100 billion messages per day, the interesting engineering challenge is not storing the messages. It is routing each one across a fleet of thousands of stateful servers to find the right WebSocket connection, guaranteeing delivery to devices that may be offline for days, and fanning out a single group message to 1,024 members without the sender waiting for all of them.
This makes it a rich interview question because it tests real-time connection management, distributed routing without a lookup table, delivery guarantees under failure, and large-scale fan-out. I'd open any WhatsApp interview by framing it as a routing problem, not a storage problem, because that is where the interesting constraints live.
Functional Requirements
Core Requirements
- Users can send and receive 1-to-1 text messages in real time.
- Users can see delivery receipts: sent (single tick), delivered to device (double tick), and read (blue ticks).
- Users who are offline receive messages when they reconnect.
- Users can participate in group chats with up to 1,024 members.
Below the Line (out of scope)
- Media file sharing (images, video, voice notes, documents)
- End-to-end encryption key management (Signal Protocol)
- Voice and video calls
- Status and Stories (ephemeral broadcast)
The hardest part in scope: Routing messages across a fleet of thousands of stateful Chat Servers. Each server holds WebSocket connections for a fraction of active users. When a sender is on Server A and their recipient is on Server B, the message must cross servers in under 200ms without any server scanning the full fleet or maintaining a global connection table.
Media file sharing is below the line because it uses the same routing path but introduces a separate binary storage and CDN delivery tier. To add it, I would use a pre-signed S3 URL pattern: the client uploads the raw file directly to S3, then sends a message containing the S3 object key. The receiving client fetches the media from a CDN edge node, not from the Chat Server.
End-to-end encryption is below the line because it is a property of the message payload, not the routing layer. WhatsApp uses the Signal Protocol: keys live entirely on the devices and the server stores and forwards ciphertext it cannot read. Adding E2EE does not change the routing, storage, or delivery architecture described in this article.
Voice and video calls are below the line because they require a separate WebRTC signaling layer and media relay infrastructure. The signaling messages flow through the existing WebSocket path, but media streams bypass routing entirely via TURN/STUN servers.
Status and Stories are below the line because they introduce a separate ephemeral broadcast model with a 24-hour TTL. To add them, I would store story metadata in Redis with a 24-hour TTL and fan out to followers using the same Kafka-based fan-out infrastructure described in the group messaging deep dive.
Non-Functional Requirements
Core Requirements
- Availability: 99.99% uptime. Availability over consistency: a message delivered slightly out of order is acceptable; a dropped connection or lost message is not.
- Latency: End-to-end message delivery under 200ms p99 when both users are online. Offline queue delivery within 5 seconds of reconnection.
- Delivery guarantee: At-least-once delivery to device. The client deduplicates using a client-assigned `client_msg_id` to prevent displaying the same message twice.
- Scale: 2B registered users, 500M DAU. 100B messages per day at baseline (~1.16M messages per second, peaking at ~3.5M per second).
- Group fan-out: Groups up to 1,024 members. At average group participation of 30 members per message, fan-out generates roughly 35M per-member delivery events per second across the fleet.
Below the Line
- Sub-50ms delivery for same-region users via edge routing
- Multi-device message sync and session ordering
Read/write ratio: Roughly 1:1 at the message level (every send creates one receive). Group fan-out is the amplifier: a single message to a 1,024-member group triggers 1,024 delivery writes. That fan-out multiplier, not raw message throughput, determines the size of the async pipeline.
I target 200ms p99 for online delivery. A WebSocket round-trip to a nearby data center takes 20-50ms. That leaves roughly 150ms for storage writes, routing, and delivery across the server fleet.
Synchronous fan-out for large groups does not fit this budget. Any delivery path that fans out beyond a single routing hop must go through an async message queue.
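The scale numbers above are worth sanity-checking with quick arithmetic. The sketch below reproduces the headline figures, assuming a ~3x diurnal peak and the 30-member average fan-out from the requirements:

```python
SECONDS_PER_DAY = 86_400

messages_per_day = 100e9
baseline_mps = messages_per_day / SECONDS_PER_DAY   # ~1.16M messages/sec
peak_mps = baseline_mps * 3                         # assumed ~3x diurnal peak

avg_members_per_message = 30                        # from the fan-out requirement
fanout_events_per_sec = baseline_mps * avg_members_per_message

print(f"baseline: {baseline_mps / 1e6:.2f}M msg/s")
print(f"peak:     {peak_mps / 1e6:.2f}M msg/s")
print(f"fan-out:  {fanout_events_per_sec / 1e6:.1f}M delivery events/s")
```

The fan-out figure is the one to watch: ~35M per-member delivery events per second is roughly 30x the raw message throughput, which is why the async pipeline is sized by the multiplier, not the send rate.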
Core Entities
- Message: The core unit. Carries a server-assigned `message_id`, client-assigned `client_msg_id` (for deduplication), `chat_id`, `sender_id`, `content`, `created_at`, and a mutable `status` field (`sent`, `delivered`, `read`).
- User: Account with a `user_id`, a device push token for offline notification delivery, and a presence timestamp for last-seen.
- Chat: A 1-to-1 or group conversation identified by a `chat_id`. For 1-to-1 chats, `chat_id` is derived deterministically from the two participant IDs. For groups, `chat_id` maps to Group metadata.
- Group: Group metadata including `group_id`, `name`, and a member list. The member list is the input to every fan-out operation in the system.
Schema details, indexes, and Cassandra partition key design are deferred to deep dive 4. The four entities above are sufficient to drive the API design and High-Level Design.
API Design
WhatsApp uses a persistent WebSocket connection for real-time delivery. REST endpoints handle connection setup, offline message fetching, and group management.
Establish a WebSocket connection:
GET /ws
Upgrade: websocket
Headers: Authorization: Bearer {token}
Send a message (WebSocket frame):
// Client → Server
{ type: "send_message", chat_id, content, client_msg_id }
// Server → Sender (acknowledgment)
{ type: "ack_sent", message_id, client_msg_id }
// Server → Recipient Client
{ type: "new_message", message_id, chat_id, sender_id, content, created_at }
Acknowledge delivery and read (WebSocket frames):
// Recipient Client → Server (auto-sent on message receipt)
{ type: "ack_delivered", message_id }
// Client → Server (sent when user views the conversation)
{ type: "ack_read", chat_id, last_read_message_id }
// Server → Original Sender (status propagation)
{ type: "status_update", message_id, status: "delivered" | "read" }
Fetch offline messages (REST, on reconnect):
GET /chats/{chat_id}/messages?after={cursor}&limit=50
Response: { messages: [...], next_cursor }
Create a group:
POST /groups
Body: { name, member_ids }
Response: { group_id, chat_id }
WebSocket over HTTP polling: Long polling holds one request thread per client while it waits for a message. WebSocket maintains a persistent TCP connection at a fraction of the resource cost. At 500M DAU with many users actively chatting, HTTP polling is not viable. WebSocket is the only practical choice for bidirectional real-time delivery at this scale.
`client_msg_id`: Each client generates a unique `client_msg_id` before sending. The server stores it alongside the server-assigned `message_id`. If the client sends a message, loses the `ack_sent`, and retries the send, the server recognises the duplicate `client_msg_id` and returns the existing `message_id` rather than inserting a second row.
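As a concrete sketch of this dedup rule, with an in-memory dict standing in for the Cassandra lookup on `(sender_id, client_msg_id)` (the table shape here is an assumption):

```python
import itertools

# In-memory stand-in for the Cassandra message table, keyed on the
# (sender_id, client_msg_id) pair the server deduplicates on.
_next_id = itertools.count(1)
_by_client_id: dict[tuple[str, str], int] = {}

def insert_message(sender_id: str, client_msg_id: str, content: str) -> int:
    """Idempotent insert: a retried send with the same client_msg_id
    returns the original message_id instead of creating a second row."""
    key = (sender_id, client_msg_id)
    if key in _by_client_id:
        return _by_client_id[key]       # duplicate send: reuse existing id
    message_id = next(_next_id)         # server-assigned id
    _by_client_id[key] = message_id
    return message_id

first = insert_message("alice", "c-42", "hi")
retry = insert_message("alice", "c-42", "hi")   # client lost ack_sent, retried
assert first == retry                           # same row, same message_id
```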
High-Level Design
1. Users can send and receive 1-to-1 messages in real time
The write and deliver path: the client sends a message, the server stores it in Cassandra, and routes it via Redis Pub/Sub to whichever Chat Server holds the recipient's WebSocket.
The core architecture is a fleet of stateful Chat Servers. Each server maintains WebSocket connections for a slice of online users. Routing between servers happens through Redis Pub/Sub: each Chat Server subscribes to channels keyed by the user_ids of its connected clients.
Components:
- Client: Mobile or web app holding a persistent WebSocket connection to a Chat Server.
- Chat Server: Stateful service managing WebSocket connections. Writes messages to the Message DB, then routes delivery via Redis Pub/Sub.
- Redis Pub/Sub: Each Chat Server subscribes to `user:{user_id}` for every connected client. Publishing to `user:{recipient_id}` delivers to whichever server holds that connection, without any lookup table.
- Message DB (Cassandra): Durable message store partitioned by `chat_id`. Source of truth for all message content and status.
Request walkthrough:
- Sender sends `{ type: "send_message", chat_id, content, client_msg_id }` over WebSocket.
- Chat Server A generates a Snowflake `message_id` and inserts `{ message_id, chat_id, sender_id, content, status: "sent" }` into Cassandra.
- Chat Server A publishes `{ message_id, chat_id, content }` to Redis channel `user:{recipient_id}`.
- Chat Server B (subscribed to `user:{recipient_id}`) receives the event and pushes `new_message` to the recipient via WebSocket.
- Chat Server A returns `ack_sent` to the sender.
Chat Server A returns ack_sent to the sender once Cassandra commits and Redis acknowledges the publish. The Redis Pub/Sub publish returns the subscriber count; zero means the recipient is offline. The offline delivery path fires on that signal, covered in the next requirement.
I always write to Cassandra before publishing to Redis Pub/Sub. If those two operations were reversed, a message could be delivered to the recipient but never persisted, leaving nothing to retry from on reconnect.
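The ordering and the offline signal can be shown together in a small sketch. The `PubSub` class and the dicts below are in-process stand-ins for Redis and Cassandra, not real client calls; the key property being modelled is that `publish` returns the subscriber count:

```python
class PubSub:
    """Minimal in-process stand-in for Redis Pub/Sub: publish returns the
    subscriber count, which doubles as the online/offline signal."""
    def __init__(self):
        self.subs: dict[str, list] = {}
    def subscribe(self, channel, handler):
        self.subs.setdefault(channel, []).append(handler)
    def publish(self, channel, event) -> int:
        handlers = self.subs.get(channel, [])
        for h in handlers:
            h(event)
        return len(handlers)

store = {}       # stand-in for Cassandra
offline = {}     # stand-in for the Redis offline queue
bus = PubSub()

def send_message(message_id, recipient_id, content):
    # 1. Durable write FIRST: if the publish is lost, the message
    #    can still be recovered from the store on reconnect.
    store[message_id] = {"content": content, "status": "sent"}
    # 2. Then route. Zero subscribers means the recipient is offline.
    if bus.publish(f"user:{recipient_id}", message_id) == 0:
        offline.setdefault(recipient_id, []).append(message_id)
    return {"type": "ack_sent", "message_id": message_id}

inbox = []
bus.subscribe("user:bob", inbox.append)
send_message(1, "bob", "hello")        # bob online: delivered via pub/sub
send_message(2, "carol", "hi carol")   # carol offline: queued for later
assert inbox == [1] and offline["carol"] == [2]
```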
2. Users receive messages when offline
The offline path: when Redis Pub/Sub returns zero subscribers, the message_id is queued in Redis, a push notification wakes the device, and the full message is delivered on reconnect.
When the recipient is not connected to any Chat Server, the Redis channel has no subscriber. The message must persist durably until the recipient reconnects, then be delivered reliably without polling.
Components:
- Chat Server (updated): After a Pub/Sub publish returns 0 subscribers, writes the `message_id` to an Offline Queue and triggers a push notification.
- Offline Queue (Redis List): A per-user FIFO list keyed `offline:{user_id}`. Stores `message_id`s of undelivered messages. Bounded at 1,000 entries with a 30-day TTL.
- Push Notification Service: Sends an APNs or FCM push to wake the device and trigger a reconnect. Fires only after the offline queue write, never before.
Request walkthrough (offline path):
- Chat Server A publishes to `user:{recipient_id}`. Redis returns 0 subscribers.
- Chat Server A appends `message_id` to `offline:{recipient_id}` with `LPUSH` and a 30-day TTL.
- Chat Server A calls Push Notification Service with the sender name and a message preview.
- Push Notification Service sends an APNs or FCM notification to the recipient's device.
- Recipient reconnects. Chat Server B calls `LRANGE offline:{recipient_id} 0 -1`.
- Chat Server B batch-fetches full message content for the returned IDs from Cassandra.
- Chat Server B delivers the messages in order and sends `ack_delivered` back upstream per message.
The diagram above shows the offline storage path only. On reconnect, the recipient's Chat Server runs `LRANGE offline:{user_id}` to retrieve pending message IDs, batch-fetches full content from Cassandra, delivers the messages in order, and clears the queue. The reconnect delivery path reuses the same components (Chat Server B, Offline Queue, Cassandra), left to right.
The offline queue stores message_ids only, not full content. All content lives in Cassandra. This keeps the Redis queue compact and avoids duplicating message payloads in an in-memory store.
The 30-day offline queue TTL is a deliberate tradeoff: a shorter window loses messages during device replacement, while a longer one grows Redis memory unboundedly for dormant accounts. Thirty days is the practical cutoff.
There is a race condition: the recipient could connect to a new Chat Server between the Pub/Sub publish and the offline queue write. To handle this, Chat Server B always checks the offline queue on every reconnect regardless of whether a push was sent, and the client deduplicates by message_id to avoid showing a message that arrived via both paths.
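A minimal sketch of that rule, with plain dicts standing in for the Redis queue and Cassandra, and a client-side `seen` set doing the `message_id` dedup:

```python
def reconnect(user_id, seen: set, offline: dict, store: dict) -> list:
    """On every reconnect, drain the offline queue unconditionally and let
    the client-side `seen` set drop any message that already arrived via
    pub/sub during the race window."""
    pending = offline.pop(user_id, [])
    delivered = []
    for message_id in pending:
        if message_id in seen:
            continue                     # already shown: arrived via pub/sub
        seen.add(message_id)
        delivered.append(store[message_id]["content"])
    return delivered

store = {7: {"content": "hey"}, 8: {"content": "you there?"}}
offline = {"bob": [7, 8]}
seen = {7}   # message 7 raced in over pub/sub before the queue write
assert reconnect("bob", seen, offline, store) == ["you there?"]
```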
3. Delivery receipts and read status
The receipt path: acknowledgment frames from the recipient flow through Redis Pub/Sub in reverse, updating the message status in Cassandra and notifying the original sender's Chat Server.
Delivery receipts travel the reverse path: from the recipient back to the original sender. The recipient's Chat Server updates the message status in Cassandra and notifies the sender's Chat Server via Redis Pub/Sub.
Components:
- Chat Server B (recipient's server): Receives `ack_delivered` and `ack_read` frames. Updates the `status` field in Cassandra. Publishes a status event to `user:{sender_id}`.
- Cassandra (updated): The `status` column on each message row is updated in place on delivery and read acknowledgment.
- Redis Pub/Sub (updated): Status update events use the same `user:{sender_id}` channel, so the sender's Chat Server receives the tick update with the same sub-millisecond routing mechanism.
Request walkthrough (delivery receipt):
- Recipient client receives a `new_message` frame and auto-sends `{ type: "ack_delivered", message_id }`.
- Chat Server B executes `UPDATE messages SET status = 'delivered' WHERE message_id = ?` in Cassandra.
- Chat Server B publishes `{ type: "status_update", message_id, status: "delivered" }` to `user:{sender_id}`.
- Chat Server A receives the event and pushes it to the sender's WebSocket. The sender sees a double tick.
Request walkthrough (read receipt):
- User views the conversation. Client sends `{ type: "ack_read", chat_id, last_read_message_id }`.
- Chat Server A updates all unread messages in the chat up to `last_read_message_id` to `status = 'read'` in a single batch update.
- Chat Server A publishes one `status_update` event per unique sender whose messages are now read.
I deliberately reuse the same Redis Pub/Sub mechanism for status updates rather than adding a dedicated status channel. A separate channel would double the Redis subscription count and add routing complexity for no architectural benefit.
Read receipts use a cursor, not per-message acks. A single ack_read with last_read_message_id marks every earlier unread message as read in one batch update. This reduces the number of status write operations from N (one per unread message) to 1 per conversation-open event.
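A sketch of the cursor semantics, with an in-memory list standing in for the Cassandra rows and relying on time-ordered Snowflake ids for the `<=` comparison:

```python
def apply_read_cursor(messages, chat_id, last_read_message_id):
    """One ack_read marks every earlier unread message in the chat as read.
    Snowflake ids are time-ordered, so `id <= cursor` means `sent before`.
    Returns the set of senders who need a status_update frame."""
    notified = set()
    for m in messages:
        if (m["chat_id"] == chat_id
                and m["message_id"] <= last_read_message_id
                and m["status"] != "read"):
            m["status"] = "read"
            notified.add(m["sender_id"])
    return notified

msgs = [
    {"message_id": 1, "chat_id": "c1", "sender_id": "alice", "status": "delivered"},
    {"message_id": 2, "chat_id": "c1", "sender_id": "dave",  "status": "delivered"},
    {"message_id": 3, "chat_id": "c1", "sender_id": "alice", "status": "sent"},
]
assert apply_read_cursor(msgs, "c1", 2) == {"alice", "dave"}
assert msgs[2]["status"] == "sent"   # message 3 is after the cursor
```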
4. Users can participate in group chats
The group path: the sender gets an immediate ack_sent after writing to Cassandra and publishing to Kafka; fan-out to all members happens asynchronously via Fan-out Workers.
Group messaging multiplies the routing problem by up to 1,024. A synchronous loop over all members on the Chat Server would make the sender's ACK latency proportional to group size. Fan-out must be asynchronous, decoupled from the send path entirely.
Components:
- Chat Server (updated): Writes the message to Cassandra and publishes a `GroupMessageEvent` to Kafka. Returns `ack_sent` to the sender immediately. Does not fan out directly.
- Kafka: Durable `group-messages` topic. Each event carries `group_id`, `message_id`, and `sender_id`. Partitioned by `group_id` to keep delivery order consistent per group.
- Fan-out Workers: Consumer group reading from Kafka. Each worker fetches the group member list from Redis, then publishes to each member's `user:{member_id}` channel or enqueues to their offline queue.
- Group Member Cache (Redis): Set keyed `group:{group_id}:members` with a count at `group:{group_id}:member_count`. Updated on every join and leave event.
Request walkthrough:
- Sender sends `{ type: "send_message", chat_id: group_chat_id, content, client_msg_id }`.
- Chat Server inserts the message into Cassandra and publishes `GroupMessageEvent { group_id, message_id, sender_id }` to Kafka.
- Chat Server returns `ack_sent` to the sender. The sender's experience is already complete.
- Fan-out Worker consumes the event from Kafka.
- Fan-out Worker calls `SMEMBERS group:{group_id}:members` to retrieve all member IDs.
- For each member (excluding sender): publish to `user:{member_id}` if online, or `LPUSH offline:{member_id}` if not.
The sender gets `ack_sent` the moment Cassandra commits and Kafka acknowledges the publish, before the event is even consumed. Delivery to the other members is async. The naive synchronous alternative and its failure modes are the subject of deep dive 3.
I pick the threshold between inline synchronous fan-out and the Kafka path empirically. Starting at 100 members is conservative; most teams raise it to 200 or 300 once they profile Chat Server thread latency under real load.
The Fan-out Worker illustrated here processes each member serially. For a 1,024-member group, that is 1,024 sequential Redis publishes. Deep dive 3 covers the hybrid strategy that parallelises large-group fan-out to stay under the 200ms delivery budget for most members.
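The batched alternative can be sketched like this; the batch size of 50 is the number used elsewhere in this article, and the thread pool plus `publish` callback are stand-ins for the worker's real Redis pipeline:

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 50   # per-batch publish count; tune under real load

def fan_out(member_ids, sender_id, publish):
    """Deliver one group message to every member except the sender, in
    parallel batches instead of one long serial loop."""
    targets = [m for m in member_ids if m != sender_id]
    batches = [targets[i:i + BATCH_SIZE]
               for i in range(0, len(targets), BATCH_SIZE)]
    def deliver(batch):
        for member in batch:
            publish(f"user:{member}")
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(deliver, batches))   # batches run concurrently
    return len(batches)

delivered = []
n_batches = fan_out([f"u{i}" for i in range(1024)], "u0", delivered.append)
assert n_batches == 21 and len(delivered) == 1023   # sender excluded
```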
Potential Deep Dives
1. How do we route messages across thousands of Chat Servers?
Three constraints define the routing problem:
- Any sender can be on any Chat Server in the fleet.
- Any recipient can be on any Chat Server, or offline.
- The routing decision must complete in under 50ms to fit within the 200ms end-to-end delivery budget.
2. How do we guarantee message ordering and exactly-once display?
Two constraints make this non-trivial:
- Client clocks can be skewed by several seconds. Clock-based ordering breaks across devices.
- At-least-once delivery (required for reliability) means duplicates are possible. They must be deduplicated before display.
3. How do we scale group message fan-out?
At 1,024 members per group and 1.16M messages per second, the fan-out layer must process roughly 35M per-member delivery events per second. The naive loop fails in two ways: it blocks the send path and it creates sender ACK latency that scales linearly with group size.
4. How do we store 100 billion messages per day?
Three access patterns define the storage requirements:
- Recent history: Fetch the last 50 messages in a chat on conversation open, newest first.
- Message lookup: Retrieve a specific message by `message_id` for status updates and retries.
- Offline delivery: Bulk-fetch all messages delivered while the user was offline, by `chat_id` after a cursor.
All three must complete under 50ms at 1.16M writes and several million reads per second.
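To keep a single busy chat from growing an unbounded Cassandra partition over years, the partition key can include a time bucket. A sketch of monthly bucketing (the `YYYY-MM` bucket format is an assumption):

```python
from datetime import datetime, timezone

def partition_key(chat_id: str, created_at_ms: int) -> tuple[str, str]:
    """Assumed monthly bucketing: the partition key is (chat_id, 'YYYY-MM'),
    so one long-lived chat is split into bounded monthly partitions."""
    dt = datetime.fromtimestamp(created_at_ms / 1000, tz=timezone.utc)
    return (chat_id, f"{dt.year:04d}-{dt.month:02d}")

# A recent-history read scans the current bucket first and steps back one
# month only if fewer than 50 rows come back.
assert partition_key("c1", 1_700_000_000_000) == ("c1", "2023-11")
```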
Final Architecture
Redis Pub/Sub is the architectural keystone: it provides implicit message routing across the Chat Server fleet, signals offline status with a single return value, and adds under 1ms to end-to-end delivery. Cassandra distributes 1.16M writes per second across hundreds of nodes without a write hotspot by co-locating each conversation's messages in one partition. The hybrid fan-out strategy keeps small-group messages on the fast 1ms path while offloading large groups to the async Kafka pipeline, so the sender's ACK latency stays bounded regardless of group size.
Interview Cheat Sheet
- Open by separating the 1-to-1 routing problem from the group fan-out problem. They have different bottlenecks and different failure modes.
- Redis Pub/Sub is the routing layer. No routing table lookup needed. The server holding the recipient's connection self-identifies by holding the subscription to `user:{recipient_id}`.
- A `redis.publish` returning 0 subscribers is the offline signal. The Chat Server writes to the offline queue on that exact return value.
- Redis Pub/Sub is at-most-once. Cassandra is the durable source of truth. Pub/Sub is the fast path; the offline queue is the safety net. Never rely on Redis alone for durability.
- Snowflake IDs embed a millisecond timestamp, so `ORDER BY message_id` equals `ORDER BY created_at`. No separate timestamp column needed for ordering.
- Clients generate a `client_msg_id` before sending. The server stores it and deduplicates on it. This gives exactly-once display with at-least-once delivery, without a distributed lock or transaction.
- Cassandra partitions by `chat_id`. All messages in one conversation are co-located. A recent-history read is a single-partition range scan returning 50 rows in one I/O pass.
- Use a composite partition key `(chat_id, bucket)` where `bucket` is the creation month to prevent a single chat from growing an unbounded partition over years.
- Small groups (under 100 members): inline synchronous fan-out from the Chat Server. 100 publishes at 0.1ms each totals 10ms, well within the 200ms budget.
- Large groups: publish a `GroupMessageEvent` to Kafka and return `ack_sent` immediately. The sender's ACK latency does not scale with group size.
- Fan-out Workers deliver in parallel batches of 50 Redis publishes. A 1,024-member group requires ~21 batches at 0.1ms each, totalling roughly 2ms of parallel fan-out time.
- The offline queue stores message IDs only, not content. Full content lives in Cassandra. On reconnect, the server dequeues IDs and batch-fetches content in one round-trip.
- Delivery receipts travel the same Redis Pub/Sub path in reverse: the recipient's Chat Server publishes to `user:{sender_id}`. The sender's server delivers the tick update without polling or a separate notification channel.
- Name the tradeoff on group consistency explicitly: a user who joins a group and immediately receives a message may see it arrive before the fan-out worker has updated the group member cache. Accept eventual consistency on group membership; the worst case is a missed delivery on the first message after join.