Design WhatsApp's real-time chat system for 2B users: WebSocket routing, offline message queuing, and group fan-out at 100B messages per day.
What is WhatsApp?
WhatsApp is an end-to-end encrypted messaging app where users exchange text messages in real time, in 1-to-1 conversations and groups of up to 1,024 people. At 2 billion registered users sending 100 billion messages per day, the interesting engineering challenge is not storing the messages. It is routing each one across a fleet of thousands of stateful servers to find the right WebSocket connection, guaranteeing delivery to devices that may be offline for days, and fanning out a single group message to 1,024 members without the sender waiting for all of them.
This makes it a rich interview question because it tests real-time connection management, distributed routing without a lookup table, delivery guarantees under failure, and large-scale fan-out. I'd open any WhatsApp interview by framing it as a routing problem, not a storage problem, because that is where the interesting constraints live.
Functional Requirements
Core Requirements
- Users can send and receive 1-to-1 text messages in real time.
- Users can see delivery receipts: sent (single tick), delivered to device (double tick), and read (blue ticks).
- Users who are offline receive messages when they reconnect.
- Users can participate in group chats with up to 1,024 members.
Below the Line (out of scope)
- Media file sharing (images, video, voice notes, documents)
- End-to-end encryption key management (Signal Protocol)
- Voice and video calls
- Status and Stories (ephemeral broadcast)
The hardest part in scope: Routing messages across a fleet of thousands of stateful Chat Servers. Each server holds WebSocket connections for a fraction of active users. When a sender is on Server A and their recipient is on Server B, the message must cross servers in under 200ms without any server scanning the full fleet or maintaining a global connection table.
Media file sharing is below the line because it uses the same routing path but introduces a separate binary storage and CDN delivery tier. To add it, I would use a pre-signed S3 URL pattern: the client uploads the raw file directly to S3, then sends a message containing the S3 object key. The receiving client fetches the media from a CDN edge node, not from the Chat Server.
End-to-end encryption is below the line because it is a property of the message payload, not the routing layer. WhatsApp uses the Signal Protocol: keys live entirely on the devices and the server stores and forwards ciphertext it cannot read. Adding E2EE does not change the routing, storage, or delivery architecture described in this article.
Voice and video calls are below the line because they require a separate WebRTC signaling layer and media relay infrastructure. The signaling messages flow through the existing WebSocket path, but media streams bypass routing entirely via TURN/STUN servers.
Status and Stories are below the line because they introduce a separate ephemeral broadcast model with a 24-hour TTL. To add them, I would store story metadata in Redis with a 24-hour TTL and fan out to followers using the same Kafka-based fan-out infrastructure described in the group messaging deep dive.
Non-Functional Requirements
Core Requirements
- Availability: 99.99% uptime. Availability over consistency: a message delivered slightly out of order is acceptable; a dropped connection or lost message is not.
- Latency: End-to-end message delivery under 200ms p99 when both users are online. Offline queue delivery within 5 seconds of reconnection.
- Delivery guarantee: At-least-once delivery to device. The client deduplicates using a client-assigned
client_msg_idto prevent displaying the same message twice. - Scale: 2B registered users, 500M DAU. 100B messages per day at baseline (~1.16M messages per second, peaking at ~3.5M per second).
- Group fan-out: Groups up to 1,024 members. At average group participation of 30 members per message, fan-out generates roughly 35M per-member delivery events per second across the fleet.
Below the Line
- Sub-50ms delivery for same-region users via edge routing
- Multi-device message sync and session ordering
Read/write ratio: Roughly 1:1 at the message level (every send creates one receive). Group fan-out is the amplifier: a single message to a 1,024-member group triggers 1,024 delivery writes. That fan-out multiplier, not raw message throughput, determines the size of the async pipeline.
I target 200ms p99 for online delivery. A WebSocket round-trip to a nearby data center takes 20-50ms. That leaves roughly 150ms for storage writes, routing, and delivery across the server fleet.
Synchronous fan-out for large groups does not fit this budget. Any delivery path that fans out beyond a single routing hop must go through an async message queue.
Core Entities
- Message: The core unit. Carries a server-assigned
message_id, client-assignedclient_msg_id(for deduplication),chat_id,sender_id,content,created_at, and a mutablestatusfield (sent,delivered,read). - User: Account with a
user_id, a device push token for offline notification delivery, and a presence timestamp for last-seen. - Chat: A 1-to-1 or group conversation identified by a
chat_id. For 1-to-1 chats,chat_idis derived deterministically from the two participant IDs. For groups,chat_idmaps to Group metadata. - Group: Group metadata including
group_id,name, and a member list. The member list is the input to every fan-out operation in the system.
Schema details, indexes, and Cassandra partition key design are deferred to deep dive 4. The four entities above are sufficient to drive the API design and High-Level Design.
API Design
WhatsApp uses a persistent WebSocket connection for real-time delivery. REST endpoints handle connection setup, offline message fetching, and group management.
Establish a WebSocket connection:
GET /ws
Upgrade: websocket
Headers: Authorization: Bearer {token}
Send a message (WebSocket frame):
// Client β Server
{ type: "send_message", chat_id, content, client_msg_id }
// Server β Sender (acknowledgment)
{ type: "ack_sent", message_id, client_msg_id }
// Server β Recipient Client
{ type: "new_message", message_id, chat_id, sender_id, content, created_at }
Acknowledge delivery and read (WebSocket frames):
// Recipient Client β Server (auto-sent on message receipt)
{ type: "ack_delivered", message_id }
// Client β Server (sent when user views the conversation)
{ type: "ack_read", chat_id, last_read_message_id }
// Server β Original Sender (status propagation)
{ type: "status_update", message_id, status: "delivered" | "read" }
Fetch offline messages (REST, on reconnect):
GET /chats/{chat_id}/messages?after={cursor}&limit=50
Response: { messages: [...], next_cursor }
Create a group:
POST /groups
Body: { name, member_ids }
Response: { group_id, chat_id }
WebSocket over HTTP polling: Long polling holds one request thread per client while it waits for a message. WebSocket maintains a persistent TCP connection at a fraction of the resource cost. At 500M DAU with many users actively chatting, HTTP polling is not viable. WebSocket is the only practical choice for bidirectional real-time delivery at this scale.
client_msg_id: Each client generates a unique
client_msg_idbefore sending. The server stores it alongside the server-assignedmessage_id. If the client sends a message, loses theack_sent, and retries the send, the server recognises the duplicateclient_msg_idand returns the existingmessage_idrather than inserting a second row.
High-Level Design
1. Users can send and receive 1-to-1 messages in real time
The write and deliver path: the client sends a message, the server stores it in Cassandra, and routes it via Redis Pub/Sub to whichever Chat Server holds the recipient's WebSocket.
The core architecture is a fleet of stateful Chat Servers. Each server maintains WebSocket connections for a slice of online users. Routing between servers happens through Redis Pub/Sub: each Chat Server subscribes to channels keyed by the user_ids of its connected clients.
Components:
- Client: Mobile or web app holding a persistent WebSocket connection to a Chat Server.
- Chat Server: Stateful service managing WebSocket connections. Writes messages to the Message DB, then routes delivery via Redis Pub/Sub.
- Redis Pub/Sub: Each Chat Server subscribes to
user:{user_id}for every connected client. Publishing touser:{recipient_id}delivers to whichever server holds that connection, without any lookup table. - Message DB (Cassandra): Durable message store partitioned by
chat_id. Source of truth for all message content and status.
Request walkthrough:
- Sender sends
{ type: "send_message", chat_id, content, client_msg_id }over WebSocket. - Chat Server A generates a Snowflake
message_idand inserts{ message_id, chat_id, sender_id, content, status: "sent" }into Cassandra. - Chat Server A publishes
{ message_id, chat_id, content }to Redis channeluser:{recipient_id}. - Chat Server B (subscribed to
user:{recipient_id}) receives the event and pushesnew_messageto the recipient via WebSocket. - Chat Server A returns
ack_sentto the sender.
Chat Server A returns ack_sent to the sender once Cassandra commits and Redis acknowledges the publish. The Redis Pub/Sub publish returns the subscriber count; zero means the recipient is offline. The offline delivery path fires on that signal, covered in the next requirement.
I always write to Cassandra before publishing to Redis Pub/Sub. I learned this ordering the hard way: reversing it means a delivered message might never be persisted, and there is nothing to retry from. If those two operations were reversed, a message could be delivered to the recipient but never persisted, leaving nothing to retry from on reconnect.
2. Users receive messages when offline
The offline path: when Redis Pub/Sub returns zero subscribers, the message_id is queued in Redis, a push notification wakes the device, and the full message is delivered on reconnect.
When the recipient is not connected to any Chat Server, the Redis channel has no subscriber. The message must persist durably until the recipient reconnects, then be delivered reliably without polling.
Components:
- Chat Server (updated): After a Pub/Sub publish returns 0 subscribers, writes the
message_idto an Offline Queue and triggers a push notification. - Offline Queue (Redis List): A per-user FIFO list keyed
offline:{user_id}. Storesmessage_ids of undelivered messages. Bounded at 1,000 entries with a 30-day TTL. - Push Notification Service: Sends an APNs or FCM push to wake the device and trigger a reconnect. Fires only after the offline queue write, never before.
Request walkthrough (offline path):
- Chat Server A publishes to
user:{recipient_id}. Redis returns 0 subscribers. - Chat Server A appends
message_idtooffline:{recipient_id}withLPUSHand a 30-day TTL. - Chat Server A calls Push Notification Service with the sender name and a message preview.
- Push Notification Service sends an APNs or FCM notification to the recipient's device.
- Recipient reconnects. Chat Server B calls
LRANGE offline:{recipient_id} 0 -1. - Chat Server B batch-fetches full message content for the returned IDs from Cassandra.
- Chat Server B delivers the messages in order and sends
ack_deliveredback upstream per message.
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.