Design Twitter / X
Walk through a complete Twitter design, from a bare-bones tweet service to a hybrid fan-out architecture serving home timelines to 200M DAU in under 300ms.
What is Twitter / X?
Twitter is a social network where users post short messages and see a personalized feed from the accounts they follow. The interesting engineering challenge is not storing tweets; it is the fan-out problem. When a celebrity with 50 million followers posts a tweet, the system must update 50 million timelines nearly instantly while serving hundreds of millions of users refreshing their feeds. No single fan-out strategy works across the full follower distribution, making Twitter a rich test of trade-off thinking between write amplification and read amplification.
Functional Requirements
Core Requirements
- Users can post a tweet (up to 280 characters of text).
- Users can follow and unfollow other users.
- Users can view their home timeline: reverse-chronological tweets from users they follow.
- Users can view a profile timeline: all tweets posted by a specific user.
Below the Line (out of scope)
- Like, retweet, and quote tweet interactions
- Full-text search for tweets and users
- Media attachments (images and videos)
- Notifications and push alerts
The hardest part in scope: Generating the home timeline for 200M daily active users, where the fan-out ratio ranges from 1 (a new account followed by nobody) to 100M+ (a celebrity). No single strategy satisfies both ends of this distribution.
Likes and retweets are below the line because they do not change the core write or timeline delivery paths. To add them, I would store a tweet_likes table keyed by (tweet_id, user_id) and cache a like counter per tweet in Redis. Retweets would create a new tweet row with a retweet_of reference and follow the same fan-out path as an original tweet.
Search is below the line because it requires a separate indexing pipeline that does not interact with the timeline design. To add it, I would emit every new tweet to a Kafka topic and consume it into an Elasticsearch index. Full-text tweet search does not fit the key-value access patterns of the timeline service.
Media is below the line because it converts the write path into a two-phase upload without changing the fan-out logic. To add it, the client uploads directly to S3 via a pre-signed URL and includes the returned object key in the POST body. The tweet row stores the key; a CDN serves the bytes.
Notifications are below the line because they form a separate outbound delivery system that reads from tweet events but does not affect the read path. To add them, I would consume tweet creation events from Kafka and dispatch push notifications via APNs and FCM per follower.
Non-Functional Requirements
Core Requirements
- Availability: 99.99% uptime. Availability over consistency for home timelines: a tweet visible to some followers before others is acceptable; a failed timeline load is not.
- Latency: Home timeline loads under 300ms p99. Profile timeline loads under 200ms p99. Tweet creation completes under 500ms.
- Scale: 500M registered users, 200M DAU. Each active user posts ~5 tweets per day on average.
- Write throughput: ~11,600 tweet writes per second on average (200M × 5 / 86,400), peaking at ~35K per second during events.
- Read throughput: ~46,000 home timeline reads per second on average (200M DAU × 20 refreshes/day / 86,400), peaking at ~140K per second.
Below the Line
- Sub-50ms timeline latency via CDN edge caching
- Real-time guarantee on notification delivery
Fan-out ratio: For every tweet posted by a user with 1,000 followers, 1,000 timeline cache entries need to be updated. With an average of ~200 follows per active user, the effective write amplification on the timeline cache averages approximately 11,600 × 200 ≈ 2.3M cache writes per second. This number, not the raw tweet write rate, drives the infrastructure decisions in this article.
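These estimates are easy to sanity-check in a few lines. A quick back-of-envelope script, where every input is one of the article's stated assumptions rather than a measurement:

```python
# All inputs are the article's stated assumptions, not measurements.
DAU = 200_000_000
TWEETS_PER_USER_PER_DAY = 5
REFRESHES_PER_USER_PER_DAY = 20
AVG_FOLLOWERS = 200  # average fan-out per tweet
SECONDS_PER_DAY = 86_400

tweet_writes_per_sec = DAU * TWEETS_PER_USER_PER_DAY / SECONDS_PER_DAY
timeline_reads_per_sec = DAU * REFRESHES_PER_USER_PER_DAY / SECONDS_PER_DAY
cache_writes_per_sec = tweet_writes_per_sec * AVG_FOLLOWERS

print(f"tweet writes/s:   {tweet_writes_per_sec:>12,.0f}")    # ~11,574
print(f"timeline reads/s: {timeline_reads_per_sec:>12,.0f}")  # ~46,296
print(f"cache writes/s:   {cache_writes_per_sec:>12,.0f}")    # ~2,314,815
```

Note that the 2.3M figure is cache writes, not database writes, which is why Redis (not the primary store) absorbs the fan-out.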
The 300ms latency target for home timelines rules out assembling the feed on the read path by querying the database across all followed accounts in real time. Pre-computation is required. The 99.99% availability target means a single Redis node is not acceptable for the timeline cache, and the primary tweet database cannot be in the read path for every timeline load.
Core Entities
- Tweet: A 280-character message. Carries a `tweet_id`, `user_id`, `text`, and `created_at`. The schema also supports a nullable `retweet_of` reference for the retweet feature we've deferred.
- User: An account with a profile and follower and following counts. The `follower_count` field drives the celebrity threshold check in the fan-out deep dive.
- Follow: A directed relationship from a follower to a followee. The follow graph is the input to every home timeline generation and fan-out operation in the system.
- Timeline (derived): A pre-computed ordered list of tweet IDs cached per user, not a stored entity. It is the most performance-critical data structure in the design.
The full schema, indexes, and partition keys are deferred to the data model deep dive. The four entities above are sufficient to drive the API design and the High-Level Design.
API Design
Post a tweet:
POST /tweets
Body: { text }
Response: { tweet_id, created_at }
Get home timeline:
GET /timelines/home
Query: { cursor?, limit? }
Response: { tweets: [...], next_cursor }
Get profile timeline:
GET /users/{user_id}/tweets
Query: { cursor?, limit? }
Response: { tweets: [...], next_cursor }
Follow a user:
POST /users/{user_id}/follows
Response: 201 Created
Unfollow a user:
DELETE /users/{followee_id}/follows
Response: 204 No Content
Cursor pagination: All timeline endpoints use cursor-based pagination, not offset-based. Offset pagination breaks when new tweets arrive between page loads: inserting one tweet at position 0 shifts every offset by 1, causing items to be skipped or duplicated across pages. A cursor encodes the last-seen tweet_id, and the next page begins strictly after that ID.
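As an illustration, a minimal cursor implementation might look like the sketch below. The base64/JSON wrapper and the in-memory `page` helper are illustrative assumptions, not the production API; the point is that the cursor encodes the last-seen ID and the next page starts strictly after it.

```python
import base64
import json

def encode_cursor(last_tweet_id: int) -> str:
    """Cursors are opaque to clients: encode the last-seen tweet_id."""
    payload = json.dumps({"last_id": last_tweet_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor))["last_id"]

def page(tweets: list, cursor, limit: int = 20):
    """tweets: tweet IDs, newest first. Returns one page plus the next cursor."""
    start = 0
    if cursor is not None:
        last_id = decode_cursor(cursor)
        # The next page begins strictly after the last-seen ID, so a tweet
        # inserted at position 0 between requests cannot shift this page.
        start = next((i for i, t in enumerate(tweets) if t < last_id), len(tweets))
    chunk = tweets[start:start + limit]
    next_cursor = encode_cursor(chunk[-1]) if chunk else None
    return chunk, next_cursor
```

Because tweet IDs are time-sortable (deep dive 2), "strictly after this ID" and "strictly older than this tweet" are the same condition.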
Authentication is not shown in the endpoint definitions, but it is assumed to be present. In practice, an API gateway validates a session token and injects the viewer_id into every downstream request. The follow and post endpoints require authentication; the profile timeline endpoint is public.
High-Level Design
1. Users can post a tweet
The write path: client submits a tweet, the Tweet Service validates it, generates a tweet_id, and writes it to the database.
Components:
- Client: Web or mobile interface sending `POST /tweets`.
- Tweet Service: Validates that text is 280 characters or fewer, generates a `tweet_id` (black box for now, covered in the deep dives), and inserts the row.
- Tweet DB: Stores the canonical tweet record. Indexed on `user_id` for profile timeline queries.
Request walkthrough:
- Client sends `POST /tweets` with the text body.
- Tweet Service validates the length constraint.
- Tweet Service generates a `tweet_id`.
- Tweet Service inserts `{ tweet_id, user_id, text, created_at }` into the Tweet DB.
- Tweet Service returns `{ tweet_id, created_at }` to the client.
```mermaid
flowchart LR
C(["👤 Client\nWeb / mobile app"])
TS["⚙️ Tweet Service\nValidate text ≤ 280 chars\nGenerate tweet_id · INSERT"]
TDB[("🗄️ Tweet DB\ntweet_id, user_id, text, created_at\nIndex on (user_id, tweet_id)")]
C -->|"POST /tweets · text"| TS
TS -->|"INSERT tweet row"| TDB
TS -->|"Returns { tweet_id, created_at }"| C
```
This covers the write path only. Fan-out to follower timelines is deferred to requirement 4, once the follow graph from requirement 3 exists.
2. Users can view a profile timeline
The profile timeline is ordered tweets from a single user. A database index on (user_id, tweet_id) is all that is required. I treat this as the simple read case before addressing the harder home timeline in requirement 4.
Components:
- Timeline Service: Handles all read requests. Queries the Tweet DB for profile timelines.
- Tweet DB (updated): The index on `(user_id, tweet_id)` makes profile timeline queries fast. Because `tweet_id` encodes the timestamp (covered in deep dive 2), this index also sorts by time.
Request walkthrough:
- Client sends `GET /users/{user_id}/tweets?limit=20`.
- Timeline Service queries the Tweet DB: `SELECT * FROM tweets WHERE user_id = ? ORDER BY tweet_id DESC LIMIT 20`.
- Timeline Service returns the tweet list with a cursor pointing to the last `tweet_id`.
```mermaid
flowchart LR
C(["👤 Client\nWeb / mobile app"])
TLS["⚙️ Timeline Service\nQuery tweets by user_id\nCursor-based pagination"]
TDB[("🗄️ Tweet DB\nIndex on (user_id, tweet_id)\nPage by tweet_id for cursor pagination")]
C -->|"GET /users/{user_id}/tweets"| TLS
TLS -->|"SELECT WHERE user_id = ? ORDER BY tweet_id DESC"| TDB
TDB -->|"20 tweets + next cursor"| TLS
TLS -->|"{ tweets, next_cursor }"| C
```
Profile timeline is a single-account read. Home timeline is more complex because it requires aggregating tweets across many accounts.
3. Users can follow and unfollow other users
The follow graph drives every home timeline. It answers two questions: "who do I follow?" (for reading my home timeline) and "who follows me?" (for fan-out when I post). Both access patterns need to be fast.
Components:
- Follow Service: Handles `POST` and `DELETE` on follow relationships. Updates both the forward and reverse indices.
- Follow Store: Stores the follow graph as two adjacency lists: `follower_id → [followee_ids]` and `followee_id → [follower_ids]`. Both directions are required.
Request walkthrough:
- Client sends `POST /users/{followee_id}/follows`.
- Follow Service writes `(follower_id, followee_id)` to the Follow Store in both the forward and reverse direction.
- Follow Service returns 201 Created.
```mermaid
flowchart LR
C(["👤 Client\nWeb / mobile app"])
TS["⚙️ Tweet Service\nValidate · generate tweet_id · INSERT"]
TDB[("🗄️ Tweet DB\ntweet_id, user_id, text, created_at\nIndex on (user_id, tweet_id)")]
FS["⚙️ Follow Service\nPOST creates follow row\nDELETE removes follow row\nMaintains forward + reverse index"]
FDB[("🗄️ Follow Store\nfollower_id → [followee_ids]\nfollowee_id → [follower_ids]\nBoth directions required")]
C -->|"POST /tweets"| TS
TS -->|"INSERT tweet"| TDB
C -->|"POST /follows · DELETE /follows"| FS
FS -->|"Write both adjacency directions"| FDB
```
Maintaining both adjacency directions in the Follow Store doubles write cost on follow and unfollow but makes every read O(1) per user. The alternative, computing one direction from the other on the fly, is a full table scan. At 100B follow edges in the graph, that is not viable.
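A minimal in-memory sketch of the dual-index idea follows; the `FollowStore` class and its method names are hypothetical stand-ins for the real store, not its API.

```python
from collections import defaultdict

class FollowStore:
    """In-memory stand-in: both adjacency directions are written on every
    follow/unfollow so that each lookup is one read per user."""

    def __init__(self):
        self.following = defaultdict(set)  # forward: follower_id -> {followee_ids}
        self.followers = defaultdict(set)  # reverse: followee_id -> {follower_ids}

    def follow(self, follower_id: int, followee_id: int) -> None:
        # Double write: one edge lands in both indices.
        self.following[follower_id].add(followee_id)
        self.followers[followee_id].add(follower_id)

    def unfollow(self, follower_id: int, followee_id: int) -> None:
        self.following[follower_id].discard(followee_id)
        self.followers[followee_id].discard(follower_id)

    def get_following(self, user_id: int) -> set:
        return self.following[user_id]  # drives home timeline reads

    def get_followers(self, user_id: int) -> set:
        return self.followers[user_id]  # drives fan-out on write
```

The double write is the whole trade: two index updates per follow buys a single-lookup answer to both "who do I follow?" and "who follows me?".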
4. Users can view a home timeline
This is the hard requirement. A user's home timeline is the merged, reverse-chronological feed of tweets from every account they follow. Assembling this at read time for a user following 500 people against a live database would mean 500 queries per request. At 140K timeline requests per second, that is 70M database queries per second. We need a pre-computed feed.
Components:
- Fan-out Worker: An async worker that consumes new tweet events and pushes tweet_ids into each follower's timeline cache.
- Kafka: A durable message queue decoupling tweet writes from fan-out. The Tweet Service publishes a `NewTweetEvent` on every write. The Fan-out Worker consumes it.
- Redis Timeline Cache (new): Stores per-user sorted sets. Key: `home_timeline:{user_id}`. Score: tweet creation timestamp. Value: `tweet_id`. Capped at 800 entries per user.
- Timeline Service (updated): On a home timeline read, fetches tweet_ids from Redis and hydrates them into full tweet objects via the Tweet DB.
Request walkthrough (write path):
- Client sends `POST /tweets`.
- Tweet Service inserts into Tweet DB and publishes `NewTweetEvent { tweet_id, author_id }` to Kafka.
- Fan-out Worker reads the event, fetches the author's follower list from Follow Store.
- Fan-out Worker calls `ZADD home_timeline:{follower_id} {timestamp} {tweet_id}` for each follower.
- Fan-out Worker trims each list to 800 entries.
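The write-path steps above can be sketched as a small worker loop. This is a simplified stand-in: `timeline_cache` is a plain dict simulating the Redis sorted sets, with the real Redis commands noted in comments.

```python
TIMELINE_CAP = 800  # keep only the newest 800 tweet_ids per user

def fan_out(event, get_followers, timeline_cache):
    """Push one NewTweetEvent into every follower's cached timeline.

    timeline_cache maps user_id -> {tweet_id: score}, a dict stand-in
    for Redis sorted sets."""
    for follower_id in get_followers(event["author_id"]):
        timeline = timeline_cache.setdefault(follower_id, {})
        # ZADD home_timeline:{follower_id} {timestamp} {tweet_id}
        timeline[event["tweet_id"]] = event["created_at"]
        if len(timeline) > TIMELINE_CAP:
            # ZREMRANGEBYRANK home_timeline:{follower_id} 0 0  (drop the oldest)
            oldest = min(timeline, key=timeline.get)
            del timeline[oldest]
```

Each event costs one cache write per follower, which is exactly the write amplification estimated in the non-functional requirements.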
Request walkthrough (read path):
- Client sends `GET /timelines/home`.
- Timeline Service calls `ZREVRANGE home_timeline:{user_id} 0 19` on Redis.
- Timeline Service batch-fetches full tweet objects for the returned tweet_ids from Tweet DB (or a tweet cache).
- Timeline Service returns the assembled tweet list.
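The read path can be sketched the same way; `timeline_cache` again stands in for the Redis sorted sets, and `fetch_tweets` is a hypothetical batch-hydration callable.

```python
def read_home_timeline(user_id, timeline_cache, fetch_tweets, limit=20):
    """Assemble one home timeline page from the pre-computed cache."""
    timeline = timeline_cache.get(user_id, {})
    # ZREVRANGE home_timeline:{user_id} 0 limit-1  (newest first)
    tweet_ids = sorted(timeline, key=timeline.get, reverse=True)[:limit]
    # One batch call to hydrate IDs into full objects, never N per-author queries.
    tweets = fetch_tweets(tweet_ids)
    next_cursor = tweet_ids[-1] if tweet_ids else None
    return {"tweets": tweets, "next_cursor": next_cursor}
```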
```mermaid
flowchart LR
C(["👤 Client\nWeb / mobile app"])
TS["⚙️ Tweet Service\nINSERT tweet · Publish NewTweetEvent"]
TDB[("🗄️ Tweet DB\nSource of truth for tweet content\nIndex on (user_id, tweet_id)")]
MQ["📨 Kafka\nNewTweetEvent queue\nDurable · decouples write from fan-out\nAt-least-once delivery"]
FW["⚙️ Fan-out Worker\nFetch follower list\nZADD to each timeline\nTrim to 800 entries per user"]
FDB[("🗄️ Follow Store\nget_followers(author_id)\nfollowee_id → [follower_ids]")]
RC["⚡ Redis Timeline Cache\nhome_timeline:{user_id}\n(sorted sets)\nScore = timestamp\nValue = tweet_id\n800 entries per user\npre-computed feed"]
TLS["⚙️ Timeline Service\nZREVRANGE from Redis\nHydrate tweet_ids → full tweet objects"]
C -->|"POST /tweets"| TS
TS -->|"INSERT tweet"| TDB
TS -->|"Publish NewTweetEvent"| MQ
MQ -->|"Consume event"| FW
FW -->|"get_followers(author_id)"| FDB
FDB -->|"List of follower_ids"| FW
FW -->|"ZADD home_timeline:{follower_id}"| RC
C -->|"GET /timelines/home"| TLS
TLS -->|"ZREVRANGE home_timeline:{user_id}"| RC
TLS -->|"Batch fetch tweet content"| TDB
TLS -->|"{ tweets, next_cursor }"| C
```
This is the High-Level Design: tweets write through Kafka to pre-computed Redis timelines; home timeline reads serve entirely from Redis. The fan-out worker is the component that collapses under celebrity-scale writes, which we address in deep dive 1.
I am treating the fan-out worker as a simple loop over all followers here. A user with 50 million followers makes this loop catastrophically slow. The deep dive on fan-out strategy addresses exactly this: the naive fan-out on write breaks for celebrities and requires a hybrid approach.
Potential Deep Dives
1. How do we generate home timelines at scale?
Three constraints define this problem:
- Home timeline reads must complete in under 300ms p99.
- A celebrity tweet must not stall the fan-out pipeline for all other users.
- The fan-out write rate must stay manageable: our average is 2.3M timeline cache writes per second across all accounts.
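One way these constraints resolve is the hybrid strategy summarized in the cheat sheet: fan out on write for normal users, fan out on read for celebrities. The sketch below is illustrative; the ~150K threshold (a reported Twitter value) and all helper callables are assumptions.

```python
CELEBRITY_THRESHOLD = 150_000  # reported Twitter value; an assumption here

def on_tweet(author, event, get_followers, timeline_cache):
    """Write path: fan out on write only for non-celebrity authors."""
    if author["follower_count"] >= CELEBRITY_THRESHOLD:
        return  # celebrity tweets are merged in at read time instead
    for follower_id in get_followers(author["user_id"]):
        timeline_cache.setdefault(follower_id, {})[event["tweet_id"]] = event["created_at"]

def home_timeline(user_id, followed_celebrities, timeline_cache, recent_tweets_by, limit=20):
    """Read path: merge the pre-computed timeline with live celebrity tweets."""
    merged = dict(timeline_cache.get(user_id, {}))  # {tweet_id: timestamp}
    for celeb_id in followed_celebrities(user_id):
        merged.update(recent_tweets_by(celeb_id))   # pulled live, never fanned out
    return sorted(merged, key=merged.get, reverse=True)[:limit]
```

The split satisfies all three constraints: normal reads stay one Redis range query, celebrity tweets never enter the fan-out pipeline, and the read-time merge touches only the handful of celebrities a user follows.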
2. How do we generate unique, time-sortable tweet IDs?
Three constraints drive the design:
- Tweet IDs must be globally unique across all servers and regions with no central coordination.
- Tweet IDs should sort chronologically so that `ORDER BY tweet_id DESC` gives the timeline order.
- Generation must be fast enough not to add latency to the tweet write path.
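A sketch of a Snowflake-style generator that satisfies these constraints. The 41/10/12 bit split matches the layout in the cheat sheet; the epoch constant and the class shape are illustrative assumptions, and real implementations also refuse to issue IDs if the clock moves backwards.

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's custom epoch (Nov 2010); an assumption here

class Snowflake:
    """64-bit ID: 41 bits timestamp | 10 bits machine ID | 12 bits sequence."""

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024, "machine ID must fit in 10 bits"
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()  # coordination is per-machine, never global

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                # Same millisecond: bump the 12-bit sequence counter.
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # 4096 IDs already issued this ms: spin to the next ms.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, integer comparison of IDs is chronological comparison, which is what lets `ORDER BY tweet_id DESC` stand in for a timestamp sort.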
3. How do we hydrate tweet content at read time?
Context: When the Timeline Service retrieves a home timeline from Redis, it gets a list of up to 20 tweet_ids. It must fetch the full tweet content (text, author display name, like count) for each. At 140K timeline reads per second with 20 tweet_ids each, the service needs to handle approximately 2.8 million tweet-content fetches per second. A direct primary database read for each is not viable.
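A batched cache-aside sketch of this hydration step; `cache` is a dict standing in for the Redis tweet cache, and `replica_fetch` is a hypothetical batched read-replica query.

```python
def hydrate(tweet_ids, cache, replica_fetch):
    """Batched cache-aside hydration: one MGET-style lookup against the
    tweet cache, one batched replica query for the misses, then backfill."""
    cached = {tid: cache.get(f"tweet:{tid}") for tid in tweet_ids}  # MGET tweet:{id} ...
    misses = [tid for tid, tweet in cached.items() if tweet is None]
    if misses:
        # SELECT ... WHERE tweet_id IN (...) against a read replica, never the primary
        for tweet in replica_fetch(misses):
            cache[f"tweet:{tweet['tweet_id']}"] = tweet  # the real cache sets a 24h TTL
            cached[tweet["tweet_id"]] = tweet
    return [cached[tid] for tid in tweet_ids]
```

With a warm cache, most of the 2.8M fetches per second collapse into batch cache hits; only the miss tail reaches the replicas.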
4. How do we store and query the follow graph at scale?
Context: The follow graph is enormous. At 500M users with an average of 200 follows per active user, the graph has ~100 billion edges. The fan-out worker reads the reverse direction (followers of author X) on every tweet write. The Timeline Service reads the forward direction (celebrities user Y follows) on every home timeline load. Both reads must complete in milliseconds.
5. How do we model and scale the tweet table?
Context: Core Entities identified four fields for a tweet (tweet_id, user_id, text, created_at). The two dominant access patterns are very different: profile timeline reads filter by user_id and sort by tweet_id; tweet content hydration looks up by tweet_id directly. At 1B tweets per day, the table grows by roughly 300GB per day of raw text data alone. After one year, that is over 100TB. How you physically store and shard this table determines whether both access patterns stay fast at that scale.
Final Architecture
```mermaid
flowchart LR
subgraph Clients["👤 Clients"]
C(["👤 User\nWeb / Mobile"])
end
subgraph Gateway["🔐 Gateway"]
AG["🔐 API Gateway\nAuth · Rate limit · Routing"]
end
subgraph AppTier["⚙️ App Services"]
TS["⚙️ Tweet Service\nINSERT · Publish to Kafka"]
TLS["⚙️ Timeline Service\nTimeline reads · Hydration"]
FSvc["⚙️ Follow Service\nDual-write · Cache evict"]
end
subgraph AsyncTier["📨 Async Pipeline"]
MQ["📨 Kafka\nNewTweetEvent · At-least-once"]
FW["⚙️ Fan-out Worker\nConsumes Kafka · ZADD per follower"]
end
subgraph CacheTier["⚡ Cache Tier"]
RTC["⚡ Timeline Cache\nhome_timeline:{uid} · sorted set"]
RCC["⚡ Tweet Cache\ntweet:{id} · 24h TTL"]
RFC["⚡ Follow Cache\nfollowers:{id} · 1h TTL"]
end
subgraph DBTier["🗄️ Storage Tier"]
TDB[("🟢 Tweet DB Primary\nWrites · Snowflake IDs")]
TRR[("🔵 Tweet DB Replica\nRead fallback · ~10-50ms")]
CASS[("🗄️ Cassandra\nFollow graph · wide rows")]
end
C -->|"POST /tweets · POST /follows"| AG
C -->|"GET /timelines"| AG
AG -->|"Writes"| TS
AG -->|"Reads"| TLS
AG -->|"Follow ops"| FSvc
TS -->|"INSERT tweet"| TDB
TS -->|"Publish event"| MQ
MQ -->|"Consume"| FW
FW -->|"Follower lookup"| RFC
RFC -.->|"Cache miss"| CASS
FW -->|"ZADD home_timeline:{id}"| RTC
FSvc -->|"Dual-write"| CASS
FSvc -->|"DEL followers:{id}"| RFC
TLS -->|"ZREVRANGE"| RTC
TLS -->|"MGET tweet:{id}"| RCC
TLS -->|"Celebrity fetch"| TRR
RCC -.->|"Cache miss"| TRR
TDB -.->|"Async replication"| TRR
```
The read/write split into Tweet Service and Timeline Service lets each scale independently. Redis absorbs the vast majority of both timeline and tweet-content reads. Kafka decouples the tweet write path from the fan-out pipeline so a celebrity post cannot block other users' tweet delivery.
Interview Cheat Sheet
- State the fan-out problem in your first breath: when someone with 50 million followers posts a tweet, the system must update 50 million timelines. Everything downstream is an answer to this one constraint.
- Fan-out on read is too slow at scale: a user following 500 accounts triggers 500 DB queries per timeline load, and latency grows linearly with follow count.
- Pure fan-out on write breaks for celebrities: a single tweet creates 50 million Redis writes and stalls the fan-out pipeline for all other users queued behind it.
- The hybrid strategy splits at a follower threshold (Twitter reportedly used ~150K): write fan-out for normal users, live read fan-out for celebrity tweets at timeline load time.
- Store pre-computed home timelines as Redis sorted sets: key is `home_timeline:{user_id}`, score is creation timestamp, value is `tweet_id`. Cap each list at 800 entries.
- At 200M DAU storing 800 tweet_ids per timeline at 8 bytes each, the full timeline cache totals roughly 1.3TB. Plan for Redis Cluster from the start.
- Use Snowflake IDs for tweets: 64-bit integers encoding 41 bits of timestamp, 10 bits of machine ID, 12 bits of sequence counter per millisecond.
- Snowflake IDs are time-sortable, so `ORDER BY tweet_id DESC` replaces `ORDER BY created_at DESC`. No secondary timestamp index is needed for chronological timeline queries.
- Use cursor-based pagination for all timeline endpoints. Offset pagination breaks when new tweets arrive between page loads.
- Cache tweet content (full tweet objects) in Redis keyed by tweet_id with a 24-hour TTL. The Timeline Service hits the Redis tweet cache with a batch MGET before falling back to a read replica, never the primary.
- The primary tweet database handles writes only. Read replicas absorb all tweet content hydration on cache miss. Keep the primary out of the read path entirely.
- Cassandra is a natural fit for the follow graph: partition by followee_id maps directly to an adjacency list lookup, and wide-row reads return an entire follower list in one operation.
- Maintain both directions of the follow graph (follows_by_follower and follows_by_followee) in Cassandra. The fan-out worker uses the reverse index; the timeline service uses the forward index for celebrity lookups.
- Skip fan-out for users inactive for 30+ days. Check a Redis key set on login with a 30-day TTL. Reconstruct their timeline from the follow graph and tweet DB on their next login.
- Fan-out workers must be idempotent: a duplicate `ZADD` re-adds the same member with the same timestamp score, leaving the sorted set unchanged, so Kafka's at-least-once delivery is safe.