Design Twitter / X
Walk through a complete Twitter design, from a bare-bones tweet service to a hybrid fan-out architecture serving home timelines to 200M DAU in under 300ms.
What is Twitter / X?
Twitter is a social network where users post short messages and see a personalized feed from the accounts they follow. The interesting engineering challenge is not storing tweets; it is the fan-out problem. When a celebrity with 50 million followers posts a tweet, the system must update 50 million timelines nearly instantly while serving hundreds of millions of users refreshing their feeds. No single fan-out strategy works across the full follower distribution, making Twitter a rich test of trade-off thinking between write amplification and read amplification.
Functional Requirements
Core Requirements
- Users can post a tweet (up to 280 characters of text).
- Users can follow and unfollow other users.
- Users can view their home timeline: reverse-chronological tweets from users they follow.
- Users can view a profile timeline: all tweets posted by a specific user.
Below the Line (out of scope)
- Like, retweet, and quote tweet interactions
- Full-text search for tweets and users
- Media attachments (images and videos)
- Notifications and push alerts
The hardest part in scope: Generating the home timeline for 200M daily active users, where the fan-out ratio ranges from 1 (a new account followed by nobody) to 100M+ (a celebrity). No single strategy satisfies both ends of this distribution.
Likes and retweets are below the line because they do not change the core write or timeline delivery paths. To add them, I would store a tweet_likes table keyed by (tweet_id, user_id) and cache a like counter per tweet in Redis. Retweets would create a new tweet row with a retweet_of reference and follow the same fan-out path as an original tweet.
Search is below the line because it requires a separate indexing pipeline that does not interact with the timeline design. To add it, I would emit every new tweet to a Kafka topic and consume it into an Elasticsearch index. Full-text tweet search does not fit the key-value access patterns of the timeline service.
Media is below the line because it converts the write path into a two-phase upload without changing the fan-out logic. To add it, the client uploads directly to S3 via a pre-signed URL and includes the returned object key in the POST body. The tweet row stores the key; a CDN serves the bytes.
Notifications are below the line because they form a separate outbound delivery system that reads from tweet events but does not affect the read path. To add them, I would consume tweet creation events from Kafka and dispatch push notifications via APNs and FCM per follower.
Non-Functional Requirements
Core Requirements
- Availability: 99.99% uptime. Availability over consistency for home timelines: a tweet visible to some followers before others is acceptable; a failed timeline load is not.
- Latency: Home timeline loads under 300ms p99. Profile timeline loads under 200ms p99. Tweet creation completes under 500ms.
- Scale: 500M registered users, 200M DAU. Each active user posts ~5 tweets per day on average.
- Write throughput: ~11,600 tweet writes per second on average (200M × 5 / 86,400), peaking at ~35K per second during events.
- Read throughput: ~46,000 home timeline reads per second on average (200M DAU × 20 refreshes/day / 86,400), peaking at ~140K per second.
Below the Line
- Sub-50ms timeline latency via CDN edge caching
- Real-time guarantee on notification delivery
Fan-out ratio: For every tweet posted by a user with 1,000 followers, 1,000 timeline cache entries need to be updated. With an average of ~200 follows per active user, the effective write amplification on the timeline cache averages approximately 11,600 × 200 ≈ 2.3M cache writes per second. This number, not the raw tweet write rate, drives the infrastructure decisions in this article.
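These estimates are easy to sanity-check in a few lines. A quick back-of-envelope script, where every input is one of the article's stated assumptions rather than a measurement:

```python
# All inputs are the article's stated assumptions, not measurements.
DAU = 200_000_000
TWEETS_PER_USER_PER_DAY = 5
REFRESHES_PER_USER_PER_DAY = 20
AVG_FOLLOWERS = 200  # average fan-out per tweet
SECONDS_PER_DAY = 86_400

tweet_writes_per_sec = DAU * TWEETS_PER_USER_PER_DAY / SECONDS_PER_DAY
timeline_reads_per_sec = DAU * REFRESHES_PER_USER_PER_DAY / SECONDS_PER_DAY
cache_writes_per_sec = tweet_writes_per_sec * AVG_FOLLOWERS

print(f"tweet writes/s:   {tweet_writes_per_sec:>12,.0f}")    # ~11,574
print(f"timeline reads/s: {timeline_reads_per_sec:>12,.0f}")  # ~46,296
print(f"cache writes/s:   {cache_writes_per_sec:>12,.0f}")    # ~2,314,815
```

Note that the 2.3M figure is cache writes, not database writes, which is why Redis (not the primary store) absorbs the fan-out.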
The 300ms latency target for home timelines rules out assembling the feed on the read path by querying the database across all followed accounts in real time. Pre-computation is required. The 99.99% availability target means a single Redis node is not acceptable for the timeline cache, and the primary tweet database cannot be in the read path for every timeline load.
Core Entities
- Tweet: A 280-character message. Carries a `tweet_id`, `user_id`, `text`, and `created_at`. The schema also supports a nullable `retweet_of` reference for the retweet feature we've deferred.
- User: An account with a profile and follower and following counts. The `follower_count` field drives the celebrity threshold check in the fan-out deep dive.
- Follow: A directed relationship from a follower to a followee. The follow graph is the input to every home timeline generation and fan-out operation in the system.
- Timeline (derived): A pre-computed ordered list of tweet IDs cached per user, not a stored entity. It is the most performance-critical data structure in the design.
The full schema, indexes, and partition keys are deferred to the data model deep dive. The four entities above are sufficient to drive the API design and the High-Level Design.
API Design
Post a tweet:
POST /tweets
Body: { text }
Response: { tweet_id, created_at }
Get home timeline:
GET /timelines/home
Query: { cursor?, limit? }
Response: { tweets: [...], next_cursor }
Get profile timeline:
GET /users/{user_id}/tweets
Query: { cursor?, limit? }
Response: { tweets: [...], next_cursor }
Follow a user:
POST /users/{user_id}/follows
Response: 201 Created
Unfollow a user:
DELETE /users/{followee_id}/follows
Response: 204 No Content
Cursor pagination: All timeline endpoints use cursor-based pagination, not offset-based. Offset pagination breaks when new tweets arrive between page loads: inserting one tweet at position 0 shifts every offset by 1, causing items to be skipped or duplicated across pages. A cursor encodes the last-seen tweet_id, and the next page begins strictly after that ID.
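As an illustration, a minimal cursor implementation might look like the sketch below. The base64/JSON wrapper and the in-memory `page` helper are illustrative assumptions, not the production API; the point is that the cursor encodes the last-seen ID and the next page starts strictly after it.

```python
import base64
import json

def encode_cursor(last_tweet_id: int) -> str:
    """Cursors are opaque to clients: encode the last-seen tweet_id."""
    payload = json.dumps({"last_id": last_tweet_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor))["last_id"]

def page(tweets: list, cursor, limit: int = 20):
    """tweets: tweet IDs, newest first. Returns one page plus the next cursor."""
    start = 0
    if cursor is not None:
        last_id = decode_cursor(cursor)
        # The next page begins strictly after the last-seen ID, so a tweet
        # inserted at position 0 between requests cannot shift this page.
        start = next((i for i, t in enumerate(tweets) if t < last_id), len(tweets))
    chunk = tweets[start:start + limit]
    next_cursor = encode_cursor(chunk[-1]) if chunk else None
    return chunk, next_cursor
```

Because tweet IDs are time-sortable (deep dive 2), "strictly after this ID" and "strictly older than this tweet" are the same condition.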
Authentication is not shown in the endpoint definitions, but it is assumed to be present. In practice, an API gateway validates a session token and injects the viewer_id into every downstream request. The follow and post endpoints require authentication; the profile timeline endpoint is public.
High-Level Design
1. Users can post a tweet
The write path: client submits a tweet, the Tweet Service validates it, generates a tweet_id, and writes it to the database.
Components:
- Client: Web or mobile interface sending `POST /tweets`.
- Tweet Service: Validates that text is 280 characters or fewer, generates a `tweet_id` (black box for now, covered in the deep dives), and inserts the row.
- Tweet DB: Stores the canonical tweet record. Indexed on `user_id` for profile timeline queries.
Request walkthrough:
- Client sends `POST /tweets` with the text body.
- Tweet Service validates the length constraint.
- Tweet Service generates a `tweet_id`.
- Tweet Service inserts `{ tweet_id, user_id, text, created_at }` into the Tweet DB.
- Tweet Service returns `{ tweet_id, created_at }` to the client.
```mermaid
flowchart LR
C(["👤 Client\nWeb / mobile app"])
TS["⚙️ Tweet Service\nValidate text ≤ 280 chars\nGenerate tweet_id · INSERT"]
TDB[("🗄️ Tweet DB\ntweet_id, user_id, text, created_at\nIndex on (user_id, tweet_id)")]
C -->|"POST /tweets · text"| TS
TS -->|"INSERT tweet row"| TDB
TS -->|"Returns { tweet_id, created_at }"| C
```
This covers the write path only. Fan-out to follower timelines is deferred to requirement 4, once the follow graph from requirement 3 exists.
2. Users can view a profile timeline
The profile timeline is ordered tweets from a single user. A database index on (user_id, tweet_id) is all that is required. I treat this as the simple read case before addressing the harder home timeline in requirement 4.
Components:
- Timeline Service: Handles all read requests. Queries the Tweet DB for profile timelines.
- Tweet DB (updated): The index on `(user_id, tweet_id)` makes profile timeline queries fast. Because `tweet_id` encodes the timestamp (covered in deep dive 2), this index also sorts by time.
Request walkthrough:
- Client sends `GET /users/{user_id}/tweets?limit=20`.
- Timeline Service queries the Tweet DB: `SELECT * FROM tweets WHERE user_id = ? ORDER BY tweet_id DESC LIMIT 20`.
- Timeline Service returns the tweet list with a cursor pointing to the last `tweet_id`.
```mermaid
flowchart LR
C(["👤 Client\nWeb / mobile app"])
TLS["⚙️ Timeline Service\nQuery tweets by user_id\nCursor-based pagination"]
TDB[("🗄️ Tweet DB\nIndex on (user_id, tweet_id)\nPage by tweet_id for cursor pagination")]
C -->|"GET /users/{user_id}/tweets"| TLS
TLS -->|"SELECT WHERE user_id = ? ORDER BY tweet_id DESC"| TDB
TDB -->|"20 tweets + next cursor"| TLS
TLS -->|"{ tweets, next_cursor }"| C
```
Profile timeline is a single-account read. Home timeline is more complex because it requires aggregating tweets across many accounts.
3. Users can follow and unfollow other users
The follow graph drives every home timeline. It answers two questions: "who do I follow?" (for reading my home timeline) and "who follows me?" (for fan-out when I post). Both access patterns need to be fast.
Components:
- Follow Service: Handles `POST` and `DELETE` on follow relationships. Updates both the forward and reverse indices.
- Follow Store: Stores the follow graph as two adjacency lists: `follower_id → [followee_ids]` and `followee_id → [follower_ids]`. Both directions are required.
Request walkthrough:
- Client sends `POST /users/{followee_id}/follows`.
- Follow Service writes `(follower_id, followee_id)` to the Follow Store in both the forward and reverse direction.
- Follow Service returns 201 Created.
```mermaid
flowchart LR
C(["👤 Client\nWeb / mobile app"])
TS["⚙️ Tweet Service\nValidate · generate tweet_id · INSERT"]
TDB[("🗄️ Tweet DB\ntweet_id, user_id, text, created_at\nIndex on (user_id, tweet_id)")]
FS["⚙️ Follow Service\nPOST creates follow row\nDELETE removes follow row\nMaintains forward + reverse index"]
FDB[("🗄️ Follow Store\nfollower_id → [followee_ids]\nfollowee_id → [follower_ids]\nBoth directions required")]
C -->|"POST /tweets"| TS
TS -->|"INSERT tweet"| TDB
C -->|"POST /follows · DELETE /follows"| FS
FS -->|"Write both adjacency directions"| FDB
```
Maintaining both adjacency directions in the Follow Store doubles write cost on follow and unfollow but makes every read O(1) per user. The alternative, computing one direction from the other on the fly, is a full table scan. At 100B follow edges in the graph, that is not viable.
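A minimal in-memory sketch of the dual-index idea follows; the `FollowStore` class and its method names are hypothetical stand-ins for the real store, not its API.

```python
from collections import defaultdict

class FollowStore:
    """In-memory stand-in: both adjacency directions are written on every
    follow/unfollow so that each lookup is one read per user."""

    def __init__(self):
        self.following = defaultdict(set)  # forward: follower_id -> {followee_ids}
        self.followers = defaultdict(set)  # reverse: followee_id -> {follower_ids}

    def follow(self, follower_id: int, followee_id: int) -> None:
        # Double write: one edge lands in both indices.
        self.following[follower_id].add(followee_id)
        self.followers[followee_id].add(follower_id)

    def unfollow(self, follower_id: int, followee_id: int) -> None:
        self.following[follower_id].discard(followee_id)
        self.followers[followee_id].discard(follower_id)

    def get_following(self, user_id: int) -> set:
        return self.following[user_id]  # drives home timeline reads

    def get_followers(self, user_id: int) -> set:
        return self.followers[user_id]  # drives fan-out on write
```

The double write is the whole trade: two index updates per follow buys a single-lookup answer to both "who do I follow?" and "who follows me?".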
4. Users can view a home timeline
This is the hard requirement. A user's home timeline is the merged, reverse-chronological feed of tweets from every account they follow. Assembling this at read time for a user following 500 people against a live database would mean 500 queries per request. At 140K timeline requests per second, that is 70M database queries per second. We need a pre-computed feed.
Components:
- Fan-out Worker: An async worker that consumes new tweet events and pushes tweet_ids into each follower's timeline cache.
- Kafka: A durable message queue decoupling tweet writes from fan-out. The Tweet Service publishes a `NewTweetEvent` on every write. The Fan-out Worker consumes it.
- Redis Timeline Cache (new): Stores per-user sorted sets. Key: `home_timeline:{user_id}`. Score: tweet creation timestamp. Value: `tweet_id`. Capped at 800 entries per user.
- Timeline Service (updated): On a home timeline read, fetches tweet_ids from Redis and hydrates them into full tweet objects via the Tweet DB.
Request walkthrough (write path):
- Client sends `POST /tweets`.
- Tweet Service inserts into Tweet DB and publishes `NewTweetEvent { tweet_id, author_id }` to Kafka.
- Fan-out Worker reads the event, fetches the author's follower list from Follow Store.
- Fan-out Worker calls `ZADD home_timeline:{follower_id} {timestamp} {tweet_id}` for each follower.
- Fan-out Worker trims each list to 800 entries.
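The write-path steps above can be sketched as a small worker loop. This is a simplified stand-in: `timeline_cache` is a plain dict simulating the Redis sorted sets, with the real Redis commands noted in comments.

```python
TIMELINE_CAP = 800  # keep only the newest 800 tweet_ids per user

def fan_out(event, get_followers, timeline_cache):
    """Push one NewTweetEvent into every follower's cached timeline.

    timeline_cache maps user_id -> {tweet_id: score}, a dict stand-in
    for Redis sorted sets."""
    for follower_id in get_followers(event["author_id"]):
        timeline = timeline_cache.setdefault(follower_id, {})
        # ZADD home_timeline:{follower_id} {timestamp} {tweet_id}
        timeline[event["tweet_id"]] = event["created_at"]
        if len(timeline) > TIMELINE_CAP:
            # ZREMRANGEBYRANK home_timeline:{follower_id} 0 0  (drop the oldest)
            oldest = min(timeline, key=timeline.get)
            del timeline[oldest]
```

Each event costs one cache write per follower, which is exactly the write amplification estimated in the non-functional requirements.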
Request walkthrough (read path):
- Client sends `GET /timelines/home`.
- Timeline Service calls `ZREVRANGE home_timeline:{user_id} 0 19` on Redis.
- Timeline Service batch-fetches full tweet objects for the returned tweet_ids from Tweet DB (or a tweet cache).
- Timeline Service returns the assembled tweet list.
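The read path can be sketched the same way; `timeline_cache` again stands in for the Redis sorted sets, and `fetch_tweets` is a hypothetical batch-hydration callable.

```python
def read_home_timeline(user_id, timeline_cache, fetch_tweets, limit=20):
    """Assemble one home timeline page from the pre-computed cache."""
    timeline = timeline_cache.get(user_id, {})
    # ZREVRANGE home_timeline:{user_id} 0 limit-1  (newest first)
    tweet_ids = sorted(timeline, key=timeline.get, reverse=True)[:limit]
    # One batch call to hydrate IDs into full objects, never N per-author queries.
    tweets = fetch_tweets(tweet_ids)
    next_cursor = tweet_ids[-1] if tweet_ids else None
    return {"tweets": tweets, "next_cursor": next_cursor}
```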
```mermaid
flowchart LR
C(["👤 Client\nWeb / mobile app"])
TS["⚙️ Tweet Service\nINSERT tweet · Publish NewTweetEvent"]
TDB[("🗄️ Tweet DB\nSource of truth for tweet content\nIndex on (user_id, tweet_id)")]
MQ["📨 Kafka\nNewTweetEvent queue\nDurable · decouples write from fan-out\nAt-least-once delivery"]
FW["⚙️ Fan-out Worker\nFetch follower list\nZADD to each timeline\nTrim to 800 entries per user"]
FDB[("🗄️ Follow Store\nget_followers(author_id)\nfollowee_id → [follower_ids]")]
RC["⚡ Redis Timeline Cache\nhome_timeline:{user_id}\n(sorted sets)\nScore = timestamp\nValue = tweet_id\n800 entries per user\npre-computed feed"]
TLS["⚙️ Timeline Service\nZREVRANGE from Redis\nHydrate tweet_ids → full tweet objects"]
C -->|"POST /tweets"| TS
TS -->|"INSERT tweet"| TDB
TS -->|"Publish NewTweetEvent"| MQ
MQ -->|"Consume event"| FW
FW -->|"get_followers(author_id)"| FDB
FDB -->|"List of follower_ids"| FW
FW -->|"ZADD home_timeline:{follower_id}"| RC
C -->|"GET /timelines/home"| TLS
TLS -->|"ZREVRANGE home_timeline:{user_id}"| RC
TLS -->|"Batch fetch tweet content"| TDB
TLS -->|"{ tweets, next_cursor }"| C
```
This is the High-Level Design: tweets write through Kafka to pre-computed Redis timelines; home timeline reads serve entirely from Redis. The fan-out worker is the component that collapses under celebrity-scale writes, which we address in deep dive 1.
I am treating the fan-out worker as a simple loop over all followers here. A user with 50 million followers makes this loop catastrophically slow. The deep dive on fan-out strategy addresses exactly this: the naive fan-out on write breaks for celebrities and requires a hybrid approach.
Potential Deep Dives
1. How do we generate home timelines at scale?
Three constraints define this problem:
- Home timeline reads must complete in under 300ms p99.
- A celebrity tweet must not stall the fan-out pipeline for all other users.
- The fan-out write rate must stay manageable: our average is 2.3M timeline cache writes per second across all accounts.
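One way these constraints resolve is the hybrid strategy summarized in the cheat sheet: fan out on write for normal users, fan out on read for celebrities. The sketch below is illustrative; the ~150K threshold (a reported Twitter value) and all helper callables are assumptions.

```python
CELEBRITY_THRESHOLD = 150_000  # reported Twitter value; an assumption here

def on_tweet(author, event, get_followers, timeline_cache):
    """Write path: fan out on write only for non-celebrity authors."""
    if author["follower_count"] >= CELEBRITY_THRESHOLD:
        return  # celebrity tweets are merged in at read time instead
    for follower_id in get_followers(author["user_id"]):
        timeline_cache.setdefault(follower_id, {})[event["tweet_id"]] = event["created_at"]

def home_timeline(user_id, followed_celebrities, timeline_cache, recent_tweets_by, limit=20):
    """Read path: merge the pre-computed timeline with live celebrity tweets."""
    merged = dict(timeline_cache.get(user_id, {}))  # {tweet_id: timestamp}
    for celeb_id in followed_celebrities(user_id):
        merged.update(recent_tweets_by(celeb_id))   # pulled live, never fanned out
    return sorted(merged, key=merged.get, reverse=True)[:limit]
```

The split satisfies all three constraints: normal reads stay one Redis range query, celebrity tweets never enter the fan-out pipeline, and the read-time merge touches only the handful of celebrities a user follows.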
2. How do we generate unique, time-sortable tweet IDs?
Three constraints drive the design:
- Tweet IDs must be globally unique across all servers and regions with no central coordination.
- Tweet IDs should sort chronologically so that `ORDER BY tweet_id DESC` gives the timeline order.
- Generation must be fast enough not to add latency to the tweet write path.
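A sketch of a Snowflake-style generator that satisfies these constraints. The 41/10/12 bit split matches the layout in the cheat sheet; the epoch constant and the class shape are illustrative assumptions, and real implementations also refuse to issue IDs if the clock moves backwards.

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's custom epoch (Nov 2010); an assumption here

class Snowflake:
    """64-bit ID: 41 bits timestamp | 10 bits machine ID | 12 bits sequence."""

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024, "machine ID must fit in 10 bits"
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()  # coordination is per-machine, never global

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                # Same millisecond: bump the 12-bit sequence counter.
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # 4096 IDs already issued this ms: spin to the next ms.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, integer comparison of IDs is chronological comparison, which is what lets `ORDER BY tweet_id DESC` stand in for a timestamp sort.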
3. How do we hydrate tweet content at read time?
Context: When the Timeline Service retrieves a home timeline from Redis, it gets a list of up to 20 tweet_ids. It must fetch the full tweet content (text, author display name, like count) for each. At 140K timeline reads per second with 20 tweet_ids each, the service needs to handle approximately 2.8 million tweet-content fetches per second. A direct primary database read for each is not viable.
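A batched cache-aside sketch of this hydration step; `cache` is a dict standing in for the Redis tweet cache, and `replica_fetch` is a hypothetical batched read-replica query.

```python
def hydrate(tweet_ids, cache, replica_fetch):
    """Batched cache-aside hydration: one MGET-style lookup against the
    tweet cache, one batched replica query for the misses, then backfill."""
    cached = {tid: cache.get(f"tweet:{tid}") for tid in tweet_ids}  # MGET tweet:{id} ...
    misses = [tid for tid, tweet in cached.items() if tweet is None]
    if misses:
        # SELECT ... WHERE tweet_id IN (...) against a read replica, never the primary
        for tweet in replica_fetch(misses):
            cache[f"tweet:{tweet['tweet_id']}"] = tweet  # the real cache sets a 24h TTL
            cached[tweet["tweet_id"]] = tweet
    return [cached[tid] for tid in tweet_ids]
```

With a warm cache, most of the 2.8M fetches per second collapse into batch cache hits; only the miss tail reaches the replicas.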
4. How do we store and query the follow graph at scale?
Context: The follow graph is enormous. At 500M users with an average of 200 follows per active user, the graph has ~100 billion edges. The fan-out worker reads the reverse direction (followers of author X) on every tweet write. The Timeline Service reads the forward direction (celebrities user Y follows) on every home timeline load. Both reads must complete in milliseconds.
5. How do we model and scale the tweet table?
Context: Core Entities identified four fields for a tweet (tweet_id, user_id, text, created_at). The two dominant access patterns are very different: profile timeline reads filter by user_id and sort by tweet_id; tweet content hydration looks up by tweet_id directly. At 1B tweets per day, the table grows by roughly 300GB per day of raw text data alone. After one year, that is over 100TB. How you physically store and shard this table determines whether both access patterns stay fast at that scale.
Final Architecture
```mermaid
flowchart LR
subgraph Clients["👤 Clients"]
C(["👤 User\nWeb / Mobile"])
end
subgraph Gateway["🔐 Gateway"]
AG["🔐 API Gateway\nAuth · Rate limit · Routing"]
end
subgraph AppTier["⚙️ App Services"]
TS["⚙️ Tweet Service\nINSERT · Publish to Kafka"]
TLS["⚙️ Timeline Service\nTimeline reads · Hydration"]
FSvc["⚙️ Follow Service\nDual-write · Cache evict"]
end
subgraph AsyncTier["📨 Async Pipeline"]
MQ["📨 Kafka\nNewTweetEvent · At-least-once"]
FW["⚙️ Fan-out Worker\nConsumes Kafka · ZADD per follower"]
end
subgraph CacheTier["⚡ Cache Tier"]
RTC["⚡ Timeline Cache\nhome_timeline:{uid} · sorted set"]
RCC["⚡ Tweet Cache\ntweet:{id} · 24h TTL"]
RFC["⚡ Follow Cache\nfollowers:{id} · 1h TTL"]
end
subgraph DBTier["🗄️ Storage Tier"]
TDB[("🟢 Tweet DB Primary\nWrites · Snowflake IDs")]
TRR[("🔵 Tweet DB Replica\nRead fallback · ~10-50ms")]
CASS[("🗄️ Cassandra\nFollow graph · wide rows")]
end
C -->|"POST /tweets · POST /follows"| AG
C -->|"GET /timelines"| AG
AG -->|"Writes"| TS
AG -->|"Reads"| TLS
AG -->|"Follow ops"| FSvc
TS -->|"INSERT tweet"| TDB
TS -->|"Publish event"| MQ
MQ -->|"Consume"| FW
FW -->|"Follower lookup"| RFC
RFC -.->|"Cache miss"| CASS
FW -->|"ZADD home_timeline:{id}"| RTC
FSvc -->|"Dual-write"| CASS
FSvc -->|"DEL followers:{id}"| RFC
TLS -->|"ZREVRANGE"| RTC
TLS -->|"MGET tweet:{id}"| RCC
TLS -->|"Celebrity fetch"| TRR
RCC -.->|"Cache miss"| TRR
TDB -.->|"Async replication"| TRR
```
The read/write split into Tweet Service and Timeline Service lets each scale independently. Redis absorbs the vast majority of both timeline and tweet-content reads. Kafka decouples the tweet write path from the fan-out pipeline so a celebrity post cannot block other users' tweet delivery.
Interview Cheat Sheet
- State the fan-out problem in your first breath: when someone with 50 million followers posts a tweet, the system must update 50 million timelines. Everything downstream is an answer to this one constraint.
- Fan-out on read is too slow at scale: a user following 500 accounts triggers 500 DB queries per timeline load, and latency grows linearly with follow count.
- Pure fan-out on write breaks for celebrities: a single tweet creates 50 million Redis writes and stalls the fan-out pipeline for all other users queued behind it.
- The hybrid strategy splits at a follower threshold (Twitter reportedly used ~150K): write fan-out for normal users, live read fan-out for celebrity tweets at timeline load time.
- Store pre-computed home timelines as Redis sorted sets: key is `home_timeline:{user_id}`, score is creation timestamp, value is `tweet_id`. Cap each list at 800 entries.
- At 200M DAU storing 800 tweet_ids per timeline at 8 bytes each, the full timeline cache totals roughly 1.3TB. Plan for Redis Cluster from the start.
- Use Snowflake IDs for tweets: 64-bit integers encoding 41 bits of timestamp, 10 bits of machine ID, 12 bits of sequence counter per millisecond.
- Snowflake IDs are time-sortable, so `ORDER BY tweet_id DESC` replaces `ORDER BY created_at DESC`. No secondary timestamp index is needed for chronological timeline queries.
- Use cursor-based pagination for all timeline endpoints. Offset pagination breaks when new tweets arrive between page loads.
- Cache tweet content (full tweet objects) in Redis keyed by tweet_id with a 24-hour TTL. The Timeline Service hits the Redis tweet cache with a batch MGET before falling back to a read replica, never the primary.
- The primary tweet database handles writes only. Read replicas absorb all tweet content hydration on cache miss. Keep the primary out of the read path entirely.
- Cassandra is a natural fit for the follow graph: partition by followee_id maps directly to an adjacency list lookup, and wide-row reads return an entire follower list in one operation.
- Maintain both directions of the follow graph (follows_by_follower and follows_by_followee) in Cassandra. The fan-out worker uses the reverse index; the timeline service uses the forward index for celebrity lookups.
- Skip fan-out for users inactive for 30+ days. Check a Redis key set on login with a 30-day TTL. Reconstruct their timeline from the follow graph and tweet DB on their next login.
- Fan-out workers must be idempotent: a duplicate `ZADD` re-adds the same member with the same timestamp score, leaving the sorted set unchanged, so Kafka's at-least-once delivery is safe.