Fan-out amplification anti-pattern
Learn why fan-out-on-write fails for celebrity users, why fan-out-on-read fails under heavy read load, and how hybrid strategies handle both extremes in production news feed systems.
TL;DR
- Fan-out amplification: a single write operation triggers an explosion of downstream write operations proportional to a user's follower count.
- A celebrity with 50 million followers posts a tweet. Fan-out-on-write attempts to copy that tweet to 50 million feed caches, meaning a single user action becomes 50 million database writes.
- Fan-out-on-read removes the write amplification but moves the cost to reads: every time a user loads their feed, you query all the accounts they follow, then aggregate and sort the results. That is expensive at read time.
- Production systems (Twitter, Instagram) use a hybrid approach: fan-out-on-write for normal users, fan-out-on-read for celebrities (typically above a follower threshold like 1M).
The Problem
You're designing a news feed for a social platform. A user posts something. Their followers should see the post in their feed. The naive approach: when the post is created, write it to every follower's feed table immediately. This is fan-out-on-write.
This works beautifully for a user with 500 followers. 500 writes per post, totally manageable. Your feed reads are instant because every user has a pre-computed list of posts waiting for them.
Then your platform gets popular. A celebrity joins with 150 million followers. They post "hello." Your write workers receive a job to do 150,000,000 feed insertions. At 100,000 writes/second, this takes 25 minutes to complete.
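The 25-minute figure follows directly from dividing follower count by write throughput. Here is a minimal sketch of that arithmetic; the function name `fanout_drain_seconds` is made up for illustration, and the throughput is the 100,000 writes/second assumed above.

```python
def fanout_drain_seconds(followers: int, writes_per_second: int) -> float:
    """Seconds needed to copy one post into every follower's feed."""
    return followers / writes_per_second


# Numbers from the scenario above: 150M followers, 100k writes/second.
celebrity = fanout_drain_seconds(150_000_000, 100_000)
regular = fanout_drain_seconds(500, 100_000)

print(f"celebrity post: {celebrity / 60:.0f} minutes")  # 25 minutes
print(f"regular post:   {regular * 1000:.0f} ms")       # 5 ms
```

The same pipeline that finishes a regular post in milliseconds needs tens of minutes for a single celebrity post, and that gap is what blocks the queue.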
During those 25 minutes, the damage cascades. Some followers see the post immediately while others wait up to 25 minutes, so the experience is inconsistent. Your write queue backs up behind this one celebrity event. Regular users posting to their 200 followers are stuck in the same queue, delayed by the celebrity writes. Your write workers are pinned at 100% CPU doing nothing but celebrity fan-out. Database write IOPS are saturated.
The celebrity posts again 30 minutes later. And again. Your queue never fully drains. Feed freshness degrades for every user on the platform. I've seen this exact scenario take a social feed from sub-second delivery to 45-minute delays in a single afternoon.
This is fan-out amplification: your fan-out work is O(write count × follower count), and follower counts are not bounded.
Why It Happens
The fan-out-on-write approach sounds perfect when you first design it. Every follower has a pre-computed feed. Reads are O(1), just fetch the list. Latency is low, caching is simple, the architecture is clean.
The problem is that the "per follower" multiplier is unbounded. Here are the forces that push teams into this trap:
"Read latency matters more than write latency." Pre-computing feeds makes reads instant. This is true for 99% of users. But the 1% with millions of followers create write storms that delay everyone else's feed updates.
"All users are roughly the same." Early in a platform's life, nobody has 50 million followers. The write amplification is invisible because follower counts are small. By the time celebrities join, the architecture is deeply embedded.
"One model should handle all cases." Engineering simplicity is valuable. Running a single fan-out pipeline for all users avoids the complexity of routing logic. But that simplicity collapses under power-law follower distributions.
I've seen teams build fan-out-on-write pipelines that work beautifully for a year, then melt the moment a single high-profile account joins the platform.
The math that breaks everything
The core issue is that follower counts follow a power-law distribution. Most users have 100-1,000 followers. A small number have millions. The average fan-out per post might be 500, but the far tail reaches into the millions, several orders of magnitude above the average. System designs that work for the average case collapse at the tail.
Consider a platform with 10 million daily active posters, averaging 2 posts/day:
- At 500 followers average: 10M × 2 × 500 = 10 billion feed writes/day. Manageable.
- Add 100 celebrity accounts with 10M followers each, posting 5 times/day: 100 × 5 × 10M = 5 billion additional writes/day. The 100 celebrities (0.001% of users) produce 33% of the total write load.
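The two bullet points above can be checked with a few lines of arithmetic. This is a sketch using only the numbers stated in the example; the helper name `daily_feed_writes` is hypothetical.

```python
def daily_feed_writes(posters: int, posts_per_day: int, followers: int) -> int:
    """Feed insertions per day under pure fan-out-on-write."""
    return posters * posts_per_day * followers


regular = daily_feed_writes(10_000_000, 2, 500)       # 10 billion
celebrity = daily_feed_writes(100, 5, 10_000_000)     # 5 billion

share_of_load = celebrity / (regular + celebrity)
share_of_users = 100 / 10_000_000

print(f"regular writes/day:   {regular:,}")
print(f"celebrity writes/day: {celebrity:,}")
print(f"celebrities are {share_of_users:.3%} of posters "
      f"but {share_of_load:.0%} of the write load")
```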
The two extremes
Fan-out on write (push model): When a user posts, immediately write to every follower's pre-computed feed. Reads are O(1). Writes are O(followers). Catastrophic for celebrities.
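A minimal in-memory sketch of the push model, to make the cost asymmetry concrete. The dictionaries stand in for a follower graph and a feed store, all names are illustrative, and it assumes posts are published in increasing timestamp order; a real system would use a durable store and a job queue.

```python
from collections import defaultdict

followers = defaultdict(set)   # author -> set of follower ids
feeds = defaultdict(list)      # user -> [(ts, author, text)], newest first


def follow(user, author):
    followers[author].add(user)


def publish(author, ts, text):
    """Copy the post into every follower's feed. Cost: O(followers)."""
    writes = 0
    for user in followers[author]:
        feeds[user].insert(0, (ts, author, text))
        writes += 1
    return writes


def read_feed(user, limit=20):
    """The feed is already materialized, so a read is a single lookup."""
    return feeds[user][:limit]
```

Calling `publish` for an author with three followers performs three feed writes; for an author with 150 million followers, the same loop performs 150 million.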
Fan-out on read (pull model): When a user loads their feed, query all accounts they follow and aggregate in real time. Zero write amplification. But a user following 2,000 accounts requires 2,000 queries or a complex JOIN, expensive at read time.
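A matching sketch of the pull model, again in memory with illustrative names and assuming posts arrive in increasing timestamp order. Publishing is a single write; the work moves into `read_feed`, which has to touch one timeline per followed account and merge them.

```python
import heapq
from collections import defaultdict
from itertools import islice

following = defaultdict(set)        # user -> set of authors they follow
posts_by_author = defaultdict(list) # author -> [(ts, author, text)], newest first


def follow(user, author):
    following[user].add(author)


def publish(author, ts, text):
    """One write per post, regardless of follower count."""
    posts_by_author[author].insert(0, (ts, author, text))


def read_feed(user, limit=20):
    """Merge every followed author's timeline. Cost: one lookup per followed account."""
    timelines = [posts_by_author[author] for author in following[user]]
    merged = heapq.merge(*timelines, key=lambda post: post[0], reverse=True)
    return list(islice(merged, limit))
```

For a user following 2,000 accounts, `timelines` holds 2,000 lists on every feed load, which is the read-time cost described above.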
Neither extreme works in isolation. The production answer is a hybrid.
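One way to picture the hybrid is a router that picks the strategy per author based on follower count. The sketch below is an assumption-laden illustration, not how any particular production system is built: the class name `HybridFeed`, the in-memory stores, and the default threshold of 1M (borrowed from the TL;DR) are all hypothetical, and posts are assumed to arrive in increasing timestamp order.

```python
import heapq
from collections import defaultdict
from itertools import islice


class HybridFeed:
    def __init__(self, celebrity_threshold=1_000_000):
        self.threshold = celebrity_threshold  # assumed cutoff, tune per platform
        self.followers = defaultdict(set)     # author -> follower ids
        self.following = defaultdict(set)     # user -> authors followed
        self.pushed = defaultdict(list)       # user -> posts pushed at write time
        self.pulled = defaultdict(list)       # celebrity -> posts fetched at read time

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def publish(self, author, ts, text):
        """Return the number of writes performed for this post."""
        post = (ts, author, text)
        if len(self.followers[author]) >= self.threshold:
            # Celebrity: store once, let readers pull it.
            self.pulled[author].insert(0, post)
            return 1
        # Regular user: push into each follower's feed.
        for user in self.followers[author]:
            self.pushed[user].insert(0, post)
        return len(self.followers[author])

    def read_feed(self, user, limit=20):
        """Merge the precomputed feed with timelines of followed celebrities."""
        celebrity_timelines = [
            self.pulled[a] for a in self.following[user] if a in self.pulled
        ]
        merged = heapq.merge(
            self.pushed[user], *celebrity_timelines,
            key=lambda post: post[0], reverse=True,
        )
        return list(islice(merged, limit))
```

With this split, a celebrity post costs one write, and each reader pays only for the handful of celebrities they follow rather than for every account in their follow list.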