Live Comments
Design a live commenting system for broadcasts like Facebook Live or YouTube Live that delivers thousands of new comments per second to millions of concurrent viewers in near real time.
What is a live comments feature?
A live comments feature lets viewers post and see comments on a broadcast in real time. The interesting engineering problem is not the storage. It is the fan-out: one viewer posts a comment, and every other viewer's screen must update within a second.
A naive pub-sub collapses under the load of millions of concurrent connections; a naive broadcast kills the database. I'd lead with the fan-out ratio in the interview because it is the single number that makes this problem harder than almost any other system design question. This question tests connection management at scale, fan-out architecture under write amplification, WebSocket state at the fleet level, and pub-sub decoupling.
Functional Requirements
Core Requirements
- Viewers can post comments on a live stream.
- All viewers see new comments within 1 second of posting.
- Popular streams may have millions of concurrent viewers and thousands of comments per second.
- Viewers can load the most recent comments when they first join a stream.
Below the Line (out of scope)
- Comment moderation and toxicity filtering
- Persistent comment threads after the stream ends
- Reactions and emoji bursts
The hardest part in scope: Fan-out. One comment post triggers delivery to potentially millions of open WebSocket connections simultaneously. Doing this with a naive single server is a guaranteed single point of failure at exactly the moment traffic is highest.
Comment moderation is below the line because it does not change the delivery path. To add it, attach an async Kafka consumer that runs each new comment through a content classifier. Shadow-delete flagged comments before they reach the fan-out layer. An admin UI exposes the shadow-deleted queue for human review.
Persistent threads are below the line because they live on a different access pattern: paginated reads against a stable dataset rather than streaming pushes. To add them, flush the stream's comment log to a relational comments table on stream-end and serve it through a standard paginated API.
Reactions are below the line because they require a separate aggregation strategy: individual reaction events must be collapsed to per-stream emoji counters before delivery. To add them, route reaction events through a sliding-window counter service and deliver aggregate counts to viewers on a lower-frequency channel (every 500ms rather than per event).
Non-Functional Requirements
Core Requirements
- Latency: A posted comment appears on all viewer screens within 1 second of posting (p99). This rules out polling as the delivery mechanism, since polling at 1-second intervals burns N requests per second where N is the number of viewers.
- Availability: 99.99% uptime. New comments must be postable even during partial backend failures. Availability over consistency: if a viewer misses one comment during a brief partition, that is acceptable.
- Scale: 10M concurrent viewers per stream at peak; up to 10K comments per second during high-energy moments. Treat these numbers as hard design inputs, not theoretical ceilings: every capacity decision below is sized against them.
- Ordering: Comments within a stream appear in approximate posting order. Eventual consistency is acceptable for comments posted within the same second.
- Durability: Recently posted comments survive a server restart. A viewer joining the stream sees the last 100 comments.
Below the Line
- Exactly-once comment delivery (at-least-once is fine; duplicate suppression adds coordination cost without user-visible benefit)
- Cross-region synchronization for global broadcasts (viable through multi-region Kafka replication but deferred)
Write/fan-out ratio: With 10K comments per second and 10M concurrent viewers, each write triggers 10M deliveries. That is a 1:10,000,000 write-to-fan-out amplification. Almost no other system design problem has this property. Every architecture decision in this article is a response to this number. If you quote nothing else in the interview, quote this ratio.
The fan-out ratio means the total delivery obligation is 10,000 x 10,000,000 = 100 billion comment deliveries per second at absolute peak. You never actually deliver 100B independent messages per second; you use broadcast trees and shared subscriptions to reduce the work, but the logical fan-out is real. Keeping that number in mind is what separates a good answer from a great one.
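The ratio arithmetic is worth sanity-checking; a few lines confirm the numbers quoted above:

```python
# Back-of-envelope check of the peak fan-out numbers.
comments_per_sec = 10_000          # peak comment write rate
concurrent_viewers = 10_000_000    # peak viewers on one hot stream

# Every write must logically reach every viewer.
fanout_ratio = concurrent_viewers  # deliveries per comment
logical_deliveries_per_sec = comments_per_sec * concurrent_viewers

print(f"write-to-fan-out ratio: 1:{fanout_ratio:,}")              # 1:10,000,000
print(f"logical deliveries/sec: {logical_deliveries_per_sec:,}")  # 100,000,000,000
```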
Core Entities
- Stream: The live broadcast. Identified by a stream_id; carries the host user, creation timestamp, and status (live or ended). All other entities link to a stream.
- Comment: A single comment event. Carries a comment_id, stream_id, author_id, text content, and posted_at timestamp. The primary unit of storage and delivery.
- ViewerSession: A connected viewer. Carries a viewer_id, stream_id, connection metadata (server node, socket ID), and join timestamp. Lives as long as the WebSocket connection is open.
Full schema details, including indexes and the Redis sorted set shape, are deferred to the deep dives. These three entities are sufficient to drive the API and High-Level Design.
API Design
The API has two distinct shapes: a REST endpoint for posting comments, and a streaming channel for receiving them. Both are needed because the client is a participant (posting) and a subscriber (receiving).
FR 1: Post a comment:
POST /streams/{stream_id}/comments
Authorization: Bearer <token>
Body: { text }
Response: { comment_id, posted_at }
POST creates a new resource. The response returns only the server-assigned comment_id and timestamp; the client already knows the other fields. Full authentication is out of scope for this design, but the standard Authorization: Bearer header is shown because any real system requires it on the write path.
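To make the write path concrete, here is a minimal, framework-agnostic sketch of the handler. `CommentStore` and the function names are illustrative assumptions, not part of the design above; a real deployment would sit behind an HTTP framework and a durable database:

```python
import time
import uuid

class CommentStore:
    """Illustrative in-memory stand-in for the database write path."""
    def __init__(self):
        self.by_stream = {}  # stream_id -> list of comment dicts

    def insert(self, comment):
        self.by_stream.setdefault(comment["stream_id"], []).append(comment)

def post_comment(store, stream_id, author_id, text):
    # Validate before touching storage: empty comments are rejected.
    if not text or not text.strip():
        raise ValueError("comment text must not be empty")
    comment = {
        "comment_id": str(uuid.uuid4()),  # server-assigned ID
        "stream_id": stream_id,
        "author_id": author_id,
        "text": text,
        "posted_at": time.time(),
    }
    store.insert(comment)
    # Response echoes only the server-assigned fields, as in the API above.
    return {"comment_id": comment["comment_id"], "posted_at": comment["posted_at"]}
```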
FR 2: Open a real-time comment stream:
WebSocket: wss://host/streams/{stream_id}/feed
Server pushes: { comment_id, author_id, text, posted_at }
I'd choose WebSocket over Server-Sent Events (SSE) here because viewers both post and receive. SSE is unidirectional: great for read-only consumption, but it requires a separate POST endpoint for writes and two separate connections per client. WebSocket gives you a single bidirectional connection per viewer, which halves the connection count and simplifies multiplexing comments and system messages (like "Stream is ending") over the same channel.
Long polling is not a serious option: it works for small audiences but generates N reconnect storms per second and cannot hit the 1-second latency target at scale.
FR 4: Load recent comments on join:
GET /streams/{stream_id}/comments/recent?limit=100
Response: { comments: [{ comment_id, author_id, text, posted_at }], oldest_cursor }
The client calls this once when joining, before or coincident with the WebSocket handshake. The oldest_cursor is a timestamp-based cursor for paging further back if the viewer wants scrollback. I'd design it cursor-based rather than offset-based because the comment stream is append-only; cursor pagination is stable under concurrent inserts.
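A minimal sketch of stable cursor pagination over an append-only list. This is illustrative: a real system runs the same logic as an indexed SQL query with a `WHERE posted_at < :cursor` clause rather than sorting in memory:

```python
def page_comments(comments, cursor=None, limit=100):
    """Cursor-based pagination over an append-only, time-ordered comment list.

    `cursor` is the posted_at of the oldest comment already seen. Pages stay
    stable under concurrent inserts because new rows only append at the tail,
    which is the advantage over offset-based pagination noted above.
    """
    # Newest-first view of the stream.
    ordered = sorted(comments, key=lambda c: c["posted_at"], reverse=True)
    if cursor is not None:
        ordered = [c for c in ordered if c["posted_at"] < cursor]
    page = ordered[:limit]
    oldest_cursor = page[-1]["posted_at"] if page else None
    return {"comments": page, "oldest_cursor": oldest_cursor}
```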
Keep history and streaming separate
Some designs use the WebSocket connection itself to replay recent comments on open. This works but couples the replay mechanism to the real-time delivery path. Keep them separate: REST for history, WebSocket for live.
High-Level Design
1. Posting and loading comments (FR 1 and FR 4)
The write and read path first, real-time delivery later.
Start with the simplest possible system: the client posts a comment, an App Server stores it, and the same server serves recent comments when a new viewer joins.
Components:
- Client: Web or mobile app sending POST requests and WebSocket upgrades.
- App Server: Receives comment POSTs, validates them, writes to the database, and serves recent-comment GETs.
- Database: Stores all Comment rows. Indexed by (stream_id, posted_at DESC) for efficient recent-comment queries.
Request walkthrough (post a comment):
- Client sends POST /streams/{stream_id}/comments with { text }.
- App Server validates the request (stream exists, text not empty).
- App Server generates a comment_id and writes the comment to the database.
- App Server returns { comment_id, posted_at } to the client.
Request walkthrough (load recent comments on join):
- Client sends GET /streams/{stream_id}/comments/recent?limit=100.
- App Server queries the database for the 100 most recent comments on this stream.
- App Server returns the sorted list to the client.
This handles persistence and history. It delivers nothing to viewers who are already watching. That is the next problem. I'd draw this diagram first on the whiteboard to anchor the conversation, then immediately say "this handles zero real-time delivery" to set up the fan-out discussion.
2. Delivering comments to all viewers in real time (FR 2)
The fan-out problem has two layers: connection management and message distribution.
Naive approach: all connections on one server
The simplest real-time design has every viewer open a WebSocket to the same App Server. When a comment is posted, the server holds all connections in memory and broadcasts directly.
Components (added):
- App Server (extended): Now also accepts WebSocket connections and broadcasts posted comments to all in-memory connections.
Walkthrough:
- Each viewer opens a WebSocket to the App Server on /streams/{stream_id}/feed.
- The App Server stores each connection in a local hash map: stream_id -> [conn1, conn2, ...].
- A comment POST arrives. After writing to DB, the server iterates the connection list for this stream and sends the comment to each open socket.
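The naive broadcast fits in a few lines, which is exactly why it is tempting. This sketch is illustrative; `send` stands in for writing a frame to an open WebSocket:

```python
class NaiveCommentServer:
    """Single-server broadcast: every connection lives in one process-local map."""
    def __init__(self):
        self.connections = {}  # stream_id -> list of socket-like objects

    def register(self, stream_id, conn):
        self.connections.setdefault(stream_id, []).append(conn)

    def broadcast(self, stream_id, comment):
        # O(N) sends per comment -- the loop that melts at millions of viewers.
        for conn in self.connections.get(stream_id, []):
            conn.send(comment)
```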
This works for 100 viewers on one server. It does not survive the NFRs. I've seen teams demo this architecture in a staging environment with 50 test connections and declare it production-ready, only to discover that the first real broadcast with 100K viewers melts the server.
Break it
With 10M viewers per stream, one server cannot hold 10M open TCP connections. A single Linux process tops out around 1M file descriptors, and that requires kernel tuning. More practically, 10M connections at even 1 KB of kernel buffer each is 10 GB of RAM just for socket state, before any application memory.
With multiple servers, the problem inverts: a comment posted to Server A is visible only to the viewers whose WebSocket happens to land on Server A. The 90% of viewers connected to Server B through K never see the comment. The NFR says 1-second delivery to all viewers; partial delivery is a failure.
Evolved: pub-sub bridge across the server fleet
The fix is a pub-sub layer that every App Server subscribes to. When a comment is posted, it is published to a channel for that stream. Every App Server listening to that stream receives the comment and pushes it to its locally connected viewers.
Components (added):
- Message Broker: Carries comment events from the write path to all App Servers subscribed to a stream.
- Load Balancer: Distributes viewer WebSocket connections across the App Server fleet. Uses consistent hashing on stream_id for sticky routing.
Walkthrough:
- Viewer opens WebSocket. Load Balancer routes to an App Server using consistent hashing on stream_id.
- App Server registers itself as a subscriber to that stream's Kafka partition.
- Comment POST arrives at any App Server. Server writes to DB, then publishes to Kafka.
- All App Servers subscribed to the stream's partition receive the Kafka message.
- Each App Server pushes the comment to its locally connected viewers over open WebSocket connections.
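A toy in-process model shows why a viewer connected to Server B sees a comment posted to Server A. `Broker` is a stand-in for Kafka (no partitions, offsets, or retention), and the class names are illustrative assumptions:

```python
class Broker:
    """In-process stand-in for Kafka: one channel per stream, fan-out to all
    subscribed servers."""
    def __init__(self):
        self.subscribers = {}  # stream_id -> list of AppServer

    def subscribe(self, stream_id, server):
        self.subscribers.setdefault(stream_id, []).append(server)

    def publish(self, stream_id, comment):
        for server in self.subscribers.get(stream_id, []):
            server.on_comment(stream_id, comment)

class AppServer:
    def __init__(self, broker):
        self.broker = broker
        self.local_conns = {}  # stream_id -> list of socket-like objects

    def attach_viewer(self, stream_id, conn):
        # First local viewer of a stream triggers the subscription.
        if stream_id not in self.local_conns:
            self.broker.subscribe(stream_id, self)
        self.local_conns.setdefault(stream_id, []).append(conn)

    def handle_post(self, stream_id, comment):
        # DB write elided; publishing is what makes the comment visible
        # beyond this server's own connections.
        self.broker.publish(stream_id, comment)

    def on_comment(self, stream_id, comment):
        for conn in self.local_conns.get(stream_id, []):
            conn.send(comment)
```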
This is the baseline evolved design. The Kafka topic per stream is what lets you scale the App Server fleet horizontally with no inter-server RPC. The deeper questions belong in the deep dives below.
Name the pub-sub layer early in the interview
Candidates who say "App Servers broadcast to viewers" without adding the pub-sub broker lose points when the interviewer asks "what happens if a viewer is connected to a different server than the one that received the POST?" Name the broker early. It is the load-bearing piece.
3. Millions of concurrent viewers (FR 3 preview)
At 10M viewers per stream, even the evolved pub-sub has a bottleneck: 10M connections spread across 200 App Servers means each server holds 50K WebSocket connections. That is feasible with async I/O, but the Kafka fan-out from a single topic to 200 subscriber threads becomes a throughput concern when comment rates spike.
Components:
- Load Balancer: Routes connections using stream_id hash to a specific gateway node, ensuring all viewers of one stream land on the same gateway cluster.
- WebSocket Gateway Fleet: Dedicated gateway nodes optimized for long-lived connections. Each node owns a shard of stream connections.
- Fan-out Workers: Stateless consumers that read from Kafka and push events to the correct gateway node(s).
Request walkthrough:
- Viewer opens a WebSocket connection. The load balancer hashes stream_id to a gateway shard.
- The WebSocket Gateway node accepts the long-lived connection and subscribes to events for that stream.
- A new comment is posted, published to Kafka.
- A Fan-out Worker consumes the Kafka event and pushes it to the gateway node(s) serving that stream.
- The gateway broadcasts to all local viewer connections.
The load balancer's stream_id hashing is the key insight: every viewer of a given stream lands on the same gateway shard, turning the broad fan-out into a localized broadcast per shard. I'd highlight this in the interview because it is the difference between O(N) internal messages and O(shard_count) internal messages per comment.
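A sketch of the shard routing, using a simple stable hash as a stand-in for full consistent hashing (a production ring would additionally handle node addition without remapping every stream):

```python
import hashlib

def shard_for(stream_id: str, num_shards: int) -> int:
    """Map a stream_id to a gateway shard with a stable hash.

    With this routing, a fan-out worker pushes each comment to the one
    shard that owns the stream, not to the whole gateway fleet.
    """
    digest = hashlib.sha256(stream_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# A fan-out worker consuming from Kafka would route roughly like:
#   push(gateway_nodes[shard_for(comment["stream_id"], num_shards)], comment)
```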
The full treatment is in Deep Dive 3 below. The correct preview answer is: shard the connection layer by stream_id to isolate hot streams, and use a dedicated WebSocket Gateway tier rather than the general App Server fleet.
Potential Deep Dives
1. Which real-time protocol: WebSocket vs SSE vs long polling?
The choice of delivery protocol determines connection cost, server architecture, and how complex the client reconnect logic becomes.
2. Fan-out architecture: delivering one comment to millions of viewers
This is the central engineering problem. The question is how to move one comment event from the server that received it to every other server holding viewer connections for that stream.
3. Connection scaling: handling 10M concurrent WebSockets
10M open WebSocket connections is a real infrastructure challenge. The question is how to shard them without creating hot spots when one viral stream dominates traffic.
4. Loading recent comments on join: the "last 100 comments" problem
A viewer joining mid-stream needs to see the last 100 comments instantly, before the WebSocket stream begins. Getting this right under 10K viewers joining per second requires more than a database query.
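A plain-Python stand-in for the Redis sorted set pattern mentioned in the cheat sheet shows the shape of the access path: add on write, read newest-first on join, trim to a fixed size. In production each method body would be a redis-py call (ZADD, ZREVRANGE, ZREMRANGEBYRANK); this sketch only demonstrates the cache-first read:

```python
class RecentCommentsCache:
    """Keeps roughly the last `cap` comments per stream, scored by posted_at,
    so join requests never touch the database on the hot path."""
    def __init__(self, cap=200):
        self.cap = cap
        self.streams = {}  # stream_id -> list of (posted_at, comment), ascending

    def add(self, stream_id, posted_at, comment):
        # ~ ZADD, then ~ ZREMRANGEBYRANK to trim the oldest entries.
        entries = self.streams.setdefault(stream_id, [])
        entries.append((posted_at, comment))
        entries.sort(key=lambda e: e[0])
        if len(entries) > self.cap:
            del entries[: len(entries) - self.cap]

    def recent(self, stream_id, limit=100):
        # ~ ZREVRANGE 0 limit-1: newest comments first.
        entries = self.streams.get(stream_id, [])
        return [c for _, c in entries[-limit:]][::-1]
```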
Final Architecture
The complete architecture connects every component introduced across the functional requirements and deep dives.
The central insight is that the fan-out problem requires two separate decoupling layers: Kafka decouples the comment write rate from the delivery rate, and the stream-sharded Gateway tier decouples the connection state from the stateless App Tier. Without both, either the database collapses under write pressure or the Gateway tier collapses under connection pressure. With both, each layer scales independently.
Interview Cheat Sheet
- WebSocket over SSE for this design. Viewers both post and receive. SSE is fine for read-only consumers but requires two connections per active commenter. WebSocket gives you one bidirectional channel, simplifying lifecycle management and halving connection count.
- Fan-out ratio is the threat. 10K comments/sec x 10M viewers = 100B logical deliveries per second. No database can absorb this directly. A pub-sub layer is not optional; it is the only way to decouple the write rate from the delivery rate.
- Kafka over Redis Pub/Sub for persistence. Redis Pub/Sub is fire-and-forget. A gateway restart or a Redis blip drops messages with no recovery. Kafka retains events for 24 hours, so reconnecting viewers replay from their last consumed offset.
- Stream-shard the Gateway tier, not round-robin. Consistent hashing on stream_id routes all connections for a given stream to the same subset of Gateway nodes. Fan-out Workers push new comments to a known small set of nodes per stream rather than broadcasting to all nodes. Internal fan-out traffic drops by 20x on a large fleet.
- CDN edge relay only above ~1M viewers per stream. Below that threshold, origin Gateway nodes with global anycast routing are sufficient. Above it, regional WebSocket termination reduces per-viewer latency from 150-200ms to under 30ms.
- Redis sorted set for join-time history. ZADD on every new comment, ZREVRANGE for the last 100 on join, ZREMRANGEBYRANK to trim to 200 members. Sub-1ms response for all join requests. DB is the cold fallback on cache miss, never the primary path.
- Fan-out workers are stateless and horizontally scalable. One Kafka consumer per partition. No coordination between worker instances. Scale by adding pods to the consumer group deployment.
- Approximate ordering is acceptable. Comments posted within the same second may arrive slightly out of order on different viewer screens. Requiring strict global ordering demands a single-partition Kafka topic, which creates a throughput ceiling you cannot break.
- Heartbeat frames keep WebSocket connections alive. Proxies and load balancers have idle TCP timeouts (often 60-90 seconds). Send a lightweight ping frame every 30 seconds to prevent silent drops on idle connections.
- Quote these numbers in the interview: 10M concurrent viewers, 10K comments/second, 1:10,000,000 write-to-fan-out ratio, 100B logical deliveries per second at peak. These numbers are what makes an answer sound like it came from someone who has operated real infrastructure.