Design a live commenting system for broadcasts like Facebook Live or YouTube Live that delivers thousands of new comments per second to millions of concurrent viewers in near real time.
33 min read2026-03-29hardlive-commentsreal-timesystem-designwebsocketspub-sub
A live comments feature lets viewers post and see comments on a broadcast in real time. The interesting engineering problem is not the storage. It is the fan-out: one viewer posts a comment, and every other viewer's screen must update within a second.
A naive pub-sub collapses under the load of millions of concurrent connections; a naive broadcast kills the database. I'd lead with the fan-out ratio in the interview because it is the number that makes this problem hard and every other system design problem easy by comparison. This question tests connection management at scale, fan-out architecture under write amplification, WebSocket state at the fleet level, and pub-sub decoupling.
The hardest part in scope: Fan-out. One comment post triggers delivery to potentially millions of open WebSocket connections simultaneously. Doing this with a naive shared server is a guaranteed single point of failure at exactly the moment traffic is highest.
Comment moderation is below the line because it does not change the delivery path. To add it, attach an async Kafka consumer that runs each new comment through a content classifier. Shadow-delete flagged comments before they reach the fan-out layer. An admin UI exposes the shadow-deleted queue for human review.
Persistent threads are below the line because they live on a different access pattern: paginated reads against a stable dataset rather than streaming pushes. To add them, flush the stream's comment log to a relational comments table on stream-end and serve it through a standard paginated API.
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.
Reactions are below the line because they require a separate aggregation strategy: individual reaction events must be collapsed to per-stream emoji counters before delivery. To add them, route reaction events through a sliding-window counter service and deliver aggregate counts to viewers on a lower-frequency channel (every 500ms rather than per event).
Latency: A posted comment appears on all viewer screens within 1 second of posting (p99). This rules out polling as the delivery mechanism, since polling at 1-second intervals burns N requests per second where N is the number of viewers.
Availability: 99.99% uptime. New comments must be postable even during partial backend failures. Availability over consistency: if a viewer misses one comment during a brief partition, that is acceptable.
Scale: 10M concurrent viewers per stream at peak; up to 10K comments per second during high-energy moments. These numbers must not be treated as theoretical.
Ordering: Comments within a stream appear in approximate posting order. Eventual consistency is acceptable for comments posted within the same second.
Durability: Recently posted comments survive a server restart. A viewer joining the stream sees the last 100 comments.
Exactly-once comment delivery (at-least-once is fine; duplicate suppression adds coordination cost without user-visible benefit)
Cross-region synchronization for global broadcasts (viable through multi-region Kafka replication but deferred)
Write/fan-out ratio: With 10K comments per second and 10M concurrent viewers, each write triggers 10M deliveries. That is a 1:10,000,000 write-to-fan-out amplification. Almost no other system design problem has this property. Every architecture decision in this article is a response to this number. If you quote nothing else in the interview, quote this ratio.
The fan-out ratio means the total delivery obligation is 10,000 x 10,000,000 = 100 billion comment deliveries per second at absolute peak. You never actually deliver 100B independent messages per second; you use broadcast trees and shared subscriptions to reduce the work, but the logical fan-out is real. Keeping that number in mind is what separates a good answer from a great one.
Stream: The live broadcast. Identified by a stream_id, carries the host user, creation timestamp, and status (live or ended). All other entities link to a stream.
Comment: A single comment event. Carries a comment_id, stream_id, author_id, text content, and posted_at timestamp. The primary unit of storage and delivery.
ViewerSession: A connected viewer. Carries a viewer_id, stream_id, connection metadata (server node, socket ID), and join timestamp. Lives as long as the WebSocket connection is open.
Full schema details, including indexes and the Redis sorted set shape, are deferred to the deep dives. These three entities are sufficient to drive the API and High-Level Design.
The API has two distinct shapes: a REST endpoint for posting comments, and a streaming channel for receiving them. Both are needed because the client is a participant (posting) and a subscriber (receiving).
FR 1 and FR 4: Post a comment and join a stream:
POST /streams/{stream_id}/commentsAuthorization: Bearer <token>Body: { text }Response: { comment_id, posted_at }
POST creates a new resource. The response returns only the server-assigned comment_id and timestamp; the other fields are known to the client already. Authentication is out of scope for this design but I'd add a standard Authorization: Bearer header here in a real system.
FR 2: Open a real-time comment stream:
WebSocket: wss://host/streams/{stream_id}/feed
Server pushes: { comment_id, author_id, text, posted_at }
WebSocket over Server-Sent Events (SSE) here because viewers both post and receive. SSE is unidirectional: great for read-only consumption, but it requires a separate POST endpoint for writes and two separate connections per client. WebSocket gives you a single bidirectional connection per viewer, which halves the connection count and simplifies multiplexing comments and system messages (like "Stream is ending") over the same channel.
Long polling is not a serious option: it works for small audiences but generates N reconnect storms per second and cannot hit the 1-second latency target at scale.
The client calls this once when joining, before or coincident with the WebSocket handshake. The oldest_cursor is a timestamp-based cursor for paging further back if the viewer wants scrollback. I'd design it cursor-based rather than offset-based because the comment stream is append-only; cursor pagination is stable under concurrent inserts.
Keep history and streaming separate
Some designs use the WebSocket connection itself to replay recent comments on open. This works but couples the replay mechanism to the real-time delivery path. Keep them separate: REST for history, WebSocket for live.
The write and read path first, real-time delivery later.
Start with the simplest possible system: the client posts a comment, an App Server stores it, and the same server serves recent comments when a new viewer joins.
Components:
Client: Web or mobile app sending POST requests and WebSocket upgrades.
App Server: Receives comment POSTs, validates them, writes to the database, and serves recent-comment GETs.
Database: Stores all Comment rows. Indexed by (stream_id, posted_at DESC) for efficient recent-comment queries.
Request walkthrough (post a comment):
Client sends POST /streams/{stream_id}/comments with { text }.
App Server validates the request (stream exists, text not empty).
App Server generates a comment_id and writes the comment to the database.
App Server returns { comment_id, posted_at } to the client.
Request walkthrough (load recent comments on join):
Client sends GET /streams/{stream_id}/comments/recent?limit=100.
App Server queries the database for the 100 most recent comments on this stream.
App Server returns the sorted list to the client.
This handles persistence and history. It delivers nothing to viewers who are already watching. That is the next problem. I'd draw this diagram first on the whiteboard to anchor the conversation, then immediately say "this handles zero real-time delivery" to set up the fan-out discussion.
The simplest real-time design has every viewer open a WebSocket to the same App Server. When a comment is posted, the server holds all connections in memory and broadcasts directly.
Components (added):
App Server (extended): Now also accepts WebSocket connections and broadcasts posted comments to all in-memory connections.
Walkthrough:
Each viewer opens a WebSocket to the App Server on /streams/{stream_id}/feed.
The App Server stores each connection in a local hash map: stream_id -> [conn1, conn2, ...].
A comment POST arrives. After writing to DB, the server iterates the connection list for this stream and sends the comment to each open socket.
This works for 100 viewers on one server. It does not survive the NFRs. I've seen teams demo this architecture in a staging environment with 50 test connections and declare it production-ready, only to discover that the first real broadcast with 100K viewers melts the server.
With 10M viewers per stream, one server cannot hold 10M open TCP connections. A single Linux process tops out around 1M file descriptors, and that requires kernel tuning. More practically, 10M connections at even 1 KB of kernel buffer each is 10 GB of RAM just for socket state, before any application memory.
With multiple servers, the problem inverts: a comment posted to Server A is visible only to the viewers whose WebSocket happens to land on Server A. The 90% of viewers connected to Server B through K never see the comment. The NFR says 1-second delivery to all viewers; partial delivery is a failure.
The fix is a pub-sub layer that every App Server subscribes to. When a comment is posted, it is published to a channel for that stream. Every App Server listening to that stream receives the comment and pushes it to its locally connected viewers.