Likes Counter
Design a scalable likes counter for celebrity posts receiving millions of writes per second: handling hot-key write amplification, aggregating counts with acceptable staleness, and preventing double-likes.
What is a likes counter for high-profile posts?
A likes counter tracks how many users have liked a post and enforces that no single user can like the same post twice. The deceptively simple task of incrementing a number becomes a serious distributed systems problem the moment a celebrity posts and 100,000 users like it within 60 seconds.
I like this interview question because it forces you past the "just use Redis" reflex. A single Redis key saturates at exactly the wrong moment. This problem tests hot-key mitigation, write aggregation, probabilistic duplicate rejection, and write-behind persistence in one focused question.
Functional Requirements
Core Requirements
- Users can like or unlike a post.
- Any user viewing a post can see the current like count.
- A user can only like a post once (no double-likes).
Below the Line (out of scope)
- Like notifications delivered to the post author
- Displaying who liked a post (partial like list for display)
- Reactions beyond a binary like (emoji reactions)
- Real-time push of count updates to all active viewers
The hardest part in scope: Writing 100,000 likes per second to the same logical counter for a celebrity post without saturating a single database row or cache key. The uniqueness constraint adds a second layer of difficulty: we must prevent double-likes at a fraction of the per-like write cost.
Like notifications are below the line because they introduce a fan-out problem that is orthogonal to the counter design. To support them, I would publish a like_event to Kafka and have a notification service consume it asynchronously, separate from the write path.
Displaying who liked a post is below the line because it requires paginated queries across potentially millions of records. To add it, I would store (post_id, user_id, liked_at) in a dedicated table with an index on (post_id, liked_at DESC) for cursor-based pagination.
Real-time push of count updates is below the line because it adds WebSocket fan-out complexity without changing the counter storage design. To add it, I would use Server-Sent Events, publishing count deltas from the read service to connected clients on a configurable threshold (for example, every 1,000 new likes).
Non-Functional Requirements
Core Requirements
- Write throughput: Celebrity posts receive up to 100,000 like writes per second for the first 2-5 minutes after posting. Average posts receive far less (roughly 1 like per second at peak).
- Read throughput: With 500M DAU and an average of 20 post views per day, the system processes approximately 115,000 count reads per second.
- Write latency: A like write acknowledges in under 200ms p99.
- Read latency: A like count read returns in under 50ms p99.
- Availability: 99.99% uptime. Availability over consistency: a stale count that is 1-5 seconds behind is fully acceptable.
- Uniqueness: Each user can like a post at most once. Duplicate writes must be rejected.
Below the Line
- Sub-5ms read latency via CDN edge caching (requires additional infrastructure)
- Global multi-region write consistency (out of scope for this design)
Read/write ratio: During the peak of a celebrity post, the ratio briefly inverts to roughly 1:1 (equal writes and reads). For average posts and steady-state traffic, reads outpace writes by roughly 10:1. Both extremes need addressing: the peak write case determines the counter design, and the read-heavy steady state determines the caching strategy.
The 100,000 writes-per-second target rules out any design that routes all writes to a single storage node. A typical Redis node handles 100,000-200,000 ops/second total.
With no write spreading, a single hot key saturates the CPU of one Redis primary almost immediately. The 1-5 second stale-count tolerance is the most consequential NFR: it unlocks batching, in-memory aggregation, and async database flush, all of which are required to absorb the write burst.
Core Entities
- Like: The canonical record that a specific user liked a specific post. Contains
post_id,user_id, andliked_at. Enforces uniqueness at the database layer via aUNIQUE(post_id, user_id)constraint. - Post: The content item being liked. Treated as an external entity. The counter design does not own the post record.
- LikeCount: The aggregated count for a post. In the naive design this is a derived value from
COUNT(*). In the evolved design it is a materialized value held in Redis shards and periodically flushed to the database.
I'd keep the entity list this short in an interview. The full schema, indexes, and flush semantics are deferred to the deep dives. These three entities are sufficient to drive the API and high-level design.
API Design
Two functional requirements drive the API shape.
FR 1 and FR 3 - Like or unlike a post:
POST /posts/{post_id}/likes
Authorization: Bearer <token>
Response: 200 OK | 409 Conflict (already liked)
DELETE /posts/{post_id}/likes
Authorization: Bearer <token>
Response: 200 OK | 404 Not Found (not liked)
POST creates a like record. DELETE removes it. Both derive user_id from the auth token rather than the request body, preventing any user from spoofing another user's like action.
The 409 on POST and 404 on DELETE are the signals the client uses to update button state without a separate round trip.
FR 2 - Read the current like count:
GET /posts/{post_id}/likes/count
Response: { "count": 14823901, "is_approximate": true }
The is_approximate field is the honest signal to clients that the count may be 1-5 seconds stale. Surfacing this explicitly prevents product teams from building features that depend on exact real-time counts, which this system deliberately does not guarantee at peak load.
High-Level Design
1. Naive approach: direct database writes
The simplest design routes every like directly to a relational database. The database enforces uniqueness via a UNIQUE(post_id, user_id) constraint and computes the count via COUNT(*) on each read.
Components:
- Client: Issues
POST /posts/{id}/likesto like a post andGET /posts/{id}/likes/countto read the count. - API Service: Validates the auth token, extracts the user ID, and issues SQL writes and reads.
- Database: Stores the canonical like records.
SELECT COUNT(*)returns the current count.
Request walkthrough (write):
- Client sends
POST /posts/42/likeswith an auth token. - API service validates the token and extracts
user_id. - Service issues
INSERT INTO likes (post_id, user_id, liked_at) VALUES (42, 891, now()). - The
UNIQUE(post_id, user_id)constraint rejects duplicates with a constraint violation, which the service maps to a 409. - Client receives
200 OKor409 Conflict.
This is the complete naive system. It is correct for low-traffic posts. I'd sketch this on the whiteboard first because it shows the interviewer you can identify the simplest correct solution before optimizing. It breaks under celebrity traffic.
The problem at celebrity scale
This design breaks immediately when a celebrity posts. At 100,000 writes per second, every write serializes against the same rows in the same database table. PostgreSQL accumulates contention on the (post_id, user_id) index partition.
SELECT COUNT(*) becomes an expensive index scan across potentially millions of rows. A single database node saturates well below 100,000 writes/second.
The core failure is a hot-key problem: all 100,000 writes per second are directed at the same logical key (post_id = <celebrity_post>). Any single-node storage system, whether a database row or a single Redis key, will saturate under this load pattern.
A single Redis key (INCR post:42:count) does not solve the hot-key problem. Redis is single-threaded per key. At 100,000 INCR ops/second, one Redis key consumes the full CPU of one node. The saturation point is just higher than Postgres before it happens, not eliminated.
2. Evolved approach: sharded Redis counters with async flush
The fix is to spread writes across K independent Redis keys (shards) for the same post. A write picks a random shard and INCRs it. A read fetches all K shards with a single MGET and sums them.
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.