Trending Topics
Design a system that tracks the K most-shared articles within sliding time windows: ingesting share events at scale, maintaining real-time leaderboards per window, and serving ranked results with low latency.
What is a trending articles system?
A trending articles system tracks the K most-shared articles over a configurable sliding time window, such as the last hour, 24 hours, or 7 days. The hidden engineering challenge is time-bounded counting: share events that fall outside the window must stop contributing to an article's score, which means counts age out continuously rather than only accumulating.
This is harder than a generic Top-K because the answer changes every second as old events drop off. I find this one of the best interview questions because candidates initially assume it is just "count stuff and sort," then realize the sliding window fundamentally changes the data structure choices. The design gets pushed toward bucket-level granularity, approximate counting, and precomputation.
Functional Requirements
Core Requirements
- Users can share an article, generating a share event that increments the article's trending score.
- The system exposes the top-K trending articles for configurable sliding windows (last 1 hour, 24 hours, 7 days). K is configurable, defaulting to 10, with a maximum of 100.
- Trending results can be filtered by category (tech, sports, politics).
Below the Line (out of scope)
- Article content storage and serving
- User notifications when a shared article enters the trending list
- Trending people, hashtags, or topics (articles only)
- Real-time push of trending updates to all connected clients
The hardest part in scope: Maintaining a correct leaderboard across a sliding time window at 100,000 share events per second. Counts must age out continuously as the window moves, which rules out naive counters and requires a bucket-based aggregation strategy combined with precomputed results.
Article content storage is below the line because it solves a completely different problem. To add it, I would store article metadata in a relational database with a full-text search index, separate from the counter pipeline we're designing here.
User notifications are below the line because they introduce a fan-out write pattern orthogonal to the counter design. To add them, I would publish a Kafka event when an article first enters the top-K, consumed by a dedicated notification service that fans out to interested users asynchronously.
Real-time push of trending updates is below the line because it adds persistent-connection fan-out complexity without changing the storage design. To add it, I would use Server-Sent Events: when the precomputed top-K changes, the Read Service publishes a diff event that SSE connections consume, keeping the push path entirely separate from the read path.
Non-Functional Requirements
Core Requirements
- Write throughput: 100,000 share events per second at peak, during viral events when a single article dominates a window. This rules out any design that routes all writes through a single storage key.
- Read throughput: 500M DAU with 30 page loads per day each, producing roughly 170,000 trending reads per second. This demands precomputed results, not live aggregation.
- Read latency: Trending list returns in under 50ms p99. A real-time ZUNIONSTORE across 60 Redis sorted sets on every request is too slow at this read rate.
- Write latency: Share event acknowledged in under 100ms. The ingestion path must stay thin and delegate aggregation work to an async pipeline.
- Freshness: Top-K results are at most 30 seconds stale. This is the key tolerance that enables precomputation and CDN caching.
- Availability: 99.99% uptime. Availability over consistency: a slightly stale trending list is always preferable to an error response.
Below the Line
- Sub-5ms global read latency via CDN edge (achievable but not a core NFR in this design)
- Exactly-once share counting (at-least-once with deduplication is sufficient)
Read/write ratio: At steady state, reads outpace writes roughly 50:1. During viral spikes, write rate briefly spikes 10x. The system must handle both extremes independently: the write path must absorb bursts without affecting read latency, and the read path at 170,000 requests/second requires precomputed answers rather than live computation.
The 30-second freshness tolerance is the most consequential NFR in this design. It gives us permission to compute the top-K in a background job and cache the result, rather than aggregating on every read. Without this tolerance, the architecture would require a much more complex real-time aggregation pipeline.
The 100,000 writes per second for a single viral article is the worst-case write multiplier. Any design that routes all writes for one article to a single Redis key will saturate a Redis node, since a key lives on exactly one node and Redis executes commands on that node on a single thread.
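The throughput figures in this section can be checked with quick arithmetic. This is a rough sketch that assumes every page load triggers exactly one trending read and that load is spread evenly across the day, so the read figure is an average rather than a peak.

```python
# Back-of-envelope check of the read and write figures quoted in the NFRs.
SECONDS_PER_DAY = 24 * 60 * 60

dau = 500_000_000            # daily active users (from the NFR)
page_loads_per_user = 30     # page loads per user per day (from the NFR)
peak_writes_per_sec = 100_000

# Assumption: one trending read per page load, uniformly spread over the day.
avg_reads_per_sec = dau * page_loads_per_user / SECONDS_PER_DAY

# Volume that the write path has to absorb at peak.
events_per_minute = peak_writes_per_sec * 60
events_per_hour = peak_writes_per_sec * 3600

print(f"average trending reads/sec: {avg_reads_per_sec:,.0f}")
print(f"share events per minute at peak: {events_per_minute:,}")
print(f"share events per hour at peak: {events_per_hour:,}")
```

The average works out to roughly 173,600 reads per second, which the text rounds to 170,000.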
Core Entities
- ShareEvent: A record that a specific user shared a specific article at a timestamp. Contains article_id, user_id, and shared_at. This is the raw event that drives all downstream counting.
- Article: An external content item identified by article_id, with a category field used for filtered trending queries. This service does not own article content or metadata beyond the category.
- TrendingBucket: An aggregated score for an article within a specific one-minute time bucket. Contains bucket_ts, article_id, category, and share_count. This is the fundamental unit of the sliding window implementation.
- TrendingResult: A precomputed cached list of the top-K articles for a given window and optional category. Refreshed every 30 seconds by the background Trending Compute Job.
Full schema, bucket key design, and Redis data structures are covered in the deep dives. These four entities are sufficient to drive the API and high-level design.
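To make the four entities concrete, here is a minimal sketch as Python dataclasses. The field names for the first three come from the descriptions above; the field types, and the exact fields on TrendingResult, are my assumptions rather than the full schema from the deep dives.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ShareEvent:
    """Raw event: one user shared one article at one moment."""
    article_id: str
    user_id: str
    shared_at: datetime


@dataclass(frozen=True)
class Article:
    """Only the fields this service cares about; content lives elsewhere."""
    article_id: str
    category: str


@dataclass
class TrendingBucket:
    """Share count for one article inside one one-minute bucket."""
    bucket_ts: int          # assumed: epoch seconds at the start of the minute
    article_id: str
    category: str
    share_count: int = 0


@dataclass
class TrendingResult:
    """Precomputed top-K list for a window and an optional category."""
    window: str                         # assumed encoding, e.g. "1h"
    category: Optional[str]             # None means all categories
    computed_at: datetime
    # Assumed shape: (article_id, share_count) pairs, best first.
    articles: List[Tuple[str, int]] = field(default_factory=list)
```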
API Design
Three functional requirements drive the API shape.
FR 1 - Record a share event:
POST /articles/{article_id}/shares
Authorization: Bearer <token>
Response: 202 Accepted
202 Accepted over 200 OK because the write is asynchronous: the ingestion service publishes the event to Kafka and returns immediately. A 200 would incorrectly imply that the count has already been updated, which it has not.
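A minimal sketch of that handler, assuming a hypothetical publish callable standing in for a real Kafka producer. The function name, the payload encoding, and the injectable clock are illustrative choices, not part of the original design.

```python
import json
import time
from typing import Callable


def handle_share(
    article_id: str,
    user_id: str,
    publish: Callable[[str, bytes], None],   # hypothetical: (topic, payload)
    now: Callable[[], float] = time.time,
) -> int:
    """Queue the share event and acknowledge it; no counter is touched here."""
    event = {"article_id": article_id, "user_id": user_id, "ts": int(now())}
    # The handler only returns after the publish call has returned, so a 202
    # means "queued for processing", not "count updated".
    publish("share_events", json.dumps(event).encode("utf-8"))
    return 202
```

Because the handler holds no state of its own, any number of instances can run behind a load balancer.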
FR 2 - Get the top-K trending articles:
GET /trending?window=1h&k=10&category=tech&cursor=<opaque_cursor>
Response: {
"articles": [
{ "article_id": "a1b2c3", "title": "...", "share_count": 45231, "rank": 1 }
],
"window": "1h",
"computed_at": "2026-03-29T00:00:00Z",
"next_cursor": "..."
}
The computed_at field is the explicit freshness signal to clients. It tells product teams that results are up to 30 seconds stale by design, preventing them from building features that depend on exact real-time counts.
Cursor-based pagination handles large K values. Even at K=100, the result set is small, but the cursor enables incremental loading in mobile UIs. The category filter is optional; when omitted, the response covers all categories. When specified, the Read Service fetches a category-specific precomputed result at no extra aggregation cost.
FR 3 - Trending articles filtered by category:
This uses the same GET /trending endpoint shown above with the category query parameter. No separate endpoint is needed because category filtering is resolved at precompute time, not at query time.
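Resolving the filter at precompute time means the read path reduces to parameter validation plus one lookup. The sketch below assumes a plain dict in place of the real cache, a hypothetical key format, and window strings "24h" and "7d" alongside the "1h" shown in the API example. It also assumes the precompute job stores the maximum K of 100 so that smaller K values are a simple slice.

```python
from typing import Optional

ALLOWED_WINDOWS = {"1h", "24h", "7d"}   # "24h" and "7d" are assumed spellings
DEFAULT_K, MAX_K = 10, 100              # from the functional requirements


def result_key(window: str, category: Optional[str] = None) -> str:
    """Build the cache key for one precomputed result (hypothetical format)."""
    if window not in ALLOWED_WINDOWS:
        raise ValueError(f"unsupported window: {window}")
    return f"trending:result:{window}:{category or 'all'}"


def read_trending(cache: dict, window: str, k: int = DEFAULT_K,
                  category: Optional[str] = None) -> dict:
    """Serve a precomputed list; no aggregation happens on the read path."""
    if not 1 <= k <= MAX_K:
        raise ValueError(f"k must be between 1 and {MAX_K}")
    result = cache[result_key(window, category)]   # KeyError if never computed
    return {**result, "articles": result["articles"][:k]}
```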
High-Level Design
1. Naive approach: share events directly to the database
The simplest design writes every share to a share_events table and queries it for the trending list at read time.
Components:
- Client: Sends POST /articles/{id}/shares for writes and GET /trending for reads.
- Share Service: Accepts requests, inserts share rows, and runs aggregate queries for the trending list.
- Database: Stores all share events. Top-K query uses GROUP BY article_id ORDER BY count DESC LIMIT K with a time range filter.
Request walkthrough:
- Client sends POST /articles/42/shares.
- Share Service extracts user_id from the auth token.
- Service inserts (article_id=42, user_id=891, shared_at=NOW()) into the share_events table.
- Client receives 202 Accepted.
- Client calls GET /trending?window=1h.
- Share Service runs SELECT article_id, COUNT(*) FROM share_events WHERE shared_at > NOW() - INTERVAL '1 hour' GROUP BY article_id ORDER BY count DESC LIMIT 10 against the database.
- Client receives the top-10 list.
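The walkthrough above can be reproduced end to end with an in-memory SQLite database. This is a sketch under a few assumptions: timestamps are integer epoch seconds passed in explicitly (SQLite has no NOW() or INTERVAL syntax), the count column is aliased as share_count, and ties are broken by article_id so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE share_events (article_id INTEGER, user_id INTEGER, shared_at INTEGER)"
)
conn.execute("CREATE INDEX idx_shared_at ON share_events (shared_at)")


def record_share(article_id: int, user_id: int, shared_at: int) -> None:
    """Write path: one row per share event."""
    conn.execute(
        "INSERT INTO share_events VALUES (?, ?, ?)",
        (article_id, user_id, shared_at),
    )


def top_k(now: int, window_seconds: int = 3600, k: int = 10):
    """Read path: aggregate every row in the window on every request."""
    rows = conn.execute(
        """
        SELECT article_id, COUNT(*) AS share_count
        FROM share_events
        WHERE shared_at > ?
        GROUP BY article_id
        ORDER BY share_count DESC, article_id ASC
        LIMIT ?
        """,
        (now - window_seconds, k),
    )
    return rows.fetchall()


now = 10_000
record_share(42, 891, now - 10)
record_share(42, 892, now - 20)
record_share(7, 893, now - 30)
record_share(7, 894, now - 4000)   # older than one hour, so it is ignored
print(top_k(now))                  # [(42, 2), (7, 1)]
```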
This is the complete naive system. It is correct at low traffic. I always start here in an interview because it takes 30 seconds to draw and immediately sets up the scaling conversation. It fails at scale in two specific ways.
Why the naive approach breaks
The GROUP BY + ORDER BY + LIMIT query scans every row in the last hour's window. At 100,000 share events per second, the share_events table accumulates 360 million rows per hour. Even with an index on shared_at, the aggregation scan takes seconds per query, which immediately violates the 50ms p99 read latency NFR.
The sliding window makes this worse. Because the answer changes every second as old events drop off, there is no safe way to cache the query result. An in-process cache would return stale counts and could not predict when a given article's count changed.
Adding a materialized view or a scheduled aggregation job helps with read latency but does not solve the sliding window problem. A materialized view refreshed every minute gives tumbling windows, not a true slide. An article that surged 59 minutes ago holds its score for the entire next minute before the view catches up. The fix requires bucket-level granularity.
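To show what bucket-level granularity buys, here is a minimal in-memory sketch: shares are counted per one-minute bucket, and a window query sums only the buckets that currently fall inside the window. The dict-of-Counters store and the function names are illustrative assumptions; the real design keeps these buckets in Redis. The window edge is accurate to within one bucket width, not to the second.

```python
from collections import Counter, defaultdict

BUCKET_SECONDS = 60

# bucket start (epoch seconds) -> Counter of article_id -> share count
buckets = defaultdict(Counter)


def bucket_of(ts: int) -> int:
    """Round a timestamp down to the start of its one-minute bucket."""
    return ts // BUCKET_SECONDS * BUCKET_SECONDS


def record(article_id: str, ts: int) -> None:
    buckets[bucket_of(ts)][article_id] += 1


def top_k(now: int, window_seconds: int, k: int):
    """Sum the buckets inside the window; older buckets simply stop counting."""
    newest = bucket_of(now)
    oldest = bucket_of(now - window_seconds) + BUCKET_SECONDS
    total = Counter()
    for b in range(oldest, newest + BUCKET_SECONDS, BUCKET_SECONDS):
        if b in buckets:
            total.update(buckets[b])
    return total.most_common(k)


record("a", 0)
record("a", 30)
record("b", 3000)
print(top_k(now=3000, window_seconds=3600, k=2))  # [('a', 2), ('b', 1)]
print(top_k(now=3700, window_seconds=3600, k=2))  # [('b', 1)]
```

Article "a" drops out of the second result without any decrement being issued: its bucket is simply no longer among the 60 that the one-hour query reads.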
2. Evolved write path: Kafka ingestion with Redis sorted set buckets
The key insight is to separate the write path from the aggregation path. Writes publish to Kafka instantly and return. A Stream Processor consumes from Kafka, maintaining per-minute sorted sets in Redis, while reads serve precomputed results rather than live aggregation.
New components:
- Share Ingestion Service: Accepts the share event and publishes to Kafka. Returns 202 immediately after Kafka acknowledgment. Stateless and horizontally scalable.
- Kafka: Durable ordered log of share events, partitioned by article_id to ensure events for the same article are consumed in order. Decouples write rate from processing rate.
- Stream Processor: Kafka consumer group. Reads share events and issues ZINCRBY trending:{bucket_ts} 1 article:{id} on Redis. Bucket key design and hot-key mitigation are deferred to the deep dives.
- Redis Cluster: Holds per-minute sorted sets keyed by bucket timestamp. Each bucket expires via TTL after it falls outside the longest configured window.
Request walkthrough (write path):
- Client sends POST /articles/42/shares.
- Share Ingestion Service publishes { article_id: 42, user_id: 891, ts: <now> } to Kafka topic share_events.
- Client receives 202 Accepted in under 100ms.
- Stream Processor reads the event from Kafka.
- Processor computes bucket_ts = floor(event_ts / 60) * 60 (rounds down to the current minute).
- Processor issues ZINCRBY trending:{bucket_ts} 1 article:42 on Redis.
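The processor side of this walkthrough can be sketched as follows. InMemorySortedSets is a hypothetical stand-in that mimics only the zincrby(name, amount, value) and expire(name, seconds) calls, so the example runs without a Redis server. The TTL of the longest window plus one bucket is my assumption about how the expiry would be sized.

```python
from collections import defaultdict

LONGEST_WINDOW_SECONDS = 7 * 24 * 60 * 60   # the 7-day window
BUCKET_SECONDS = 60


class InMemorySortedSets:
    """Hypothetical stand-in for a Redis client; only two calls are modeled."""

    def __init__(self):
        self.sets = defaultdict(lambda: defaultdict(float))
        self.ttls = {}

    def zincrby(self, name, amount, value):
        self.sets[name][value] += amount
        return self.sets[name][value]

    def expire(self, name, seconds):
        self.ttls[name] = seconds
        return True


def bucket_ts(event_ts: int) -> int:
    """floor(event_ts / 60) * 60: the start of the minute containing the event."""
    return event_ts // BUCKET_SECONDS * BUCKET_SECONDS


def process(event: dict, redis) -> str:
    """Apply one share event to its per-minute sorted set and return the key."""
    key = f"trending:{bucket_ts(event['ts'])}"
    redis.zincrby(key, 1, f"article:{event['article_id']}")
    # Assumed TTL: keep the bucket until no configured window can include it.
    redis.expire(key, LONGEST_WINDOW_SECONDS + BUCKET_SECONDS)
    return key


store = InMemorySortedSets()
process({"article_id": 42, "user_id": 891, "ts": 125}, store)
process({"article_id": 42, "user_id": 892, "ts": 170}, store)
print(dict(store.sets["trending:120"]))   # {'article:42': 2.0}
```

Bucketing on the event timestamp rather than processing time means a delayed or replayed event still lands in the minute it belongs to.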