Thundering herd anti-pattern
Learn why a single cache expiry can trigger millions of simultaneous DB reads, and how probabilistic early expiry, mutex-on-miss, and request coalescing stop it.
TL;DR
- The thundering herd happens when a popular cache key expires and thousands of concurrent requests simultaneously find a cache miss, all racing to recompute the same expensive value.
- Every one of those requests hits your database directly, creating a traffic spike that can be 100x the normal load in milliseconds.
- Three mitigations exist: probabilistic early expiry (evict slightly before TTL expires), mutex-on-miss (only one request recomputes, the rest wait), and request coalescing (deduplicate in-flight fetches at the cache layer).
- This is one of the most common causes of "cache-induced database overload." Your cache works perfectly until it doesn't, and then your DB falls over at exactly the worst moment.
The Problem
You cache your most-read database query with a 5-minute TTL. At 9 a.m. on a Monday, when traffic spikes, that key expires. In the same 50-millisecond window, 4,000 concurrent requests check the cache, get a miss, and each independently fires a SELECT * FROM products WHERE category_id = ? against your primary database.
Your database goes from 200 queries per second to 4,000 queries per second instantaneously. If the query takes 500ms under load, your connection pool exhausts in under a second. The DB starts queueing connections. The cache-fill queries time out. The cache never gets repopulated. Every request for the next 30 seconds is a cache miss. Your site falls over.
The painful thing is that this looks like a database problem. Your dashboards show DB CPU at 100%, connection timeouts, query latency at 30 seconds. But the root cause is a design choice in your cache layer: you gave every concurrent reader the same miss behaviour.
I once spent two hours scaling up a Postgres replica before a teammate pointed out that our cache TTL was exactly 300 seconds and the spikes were exactly 5 minutes apart. The database was never the problem.
Why It Happens
The thundering herd emerges from three individually reasonable decisions that combine into a structural failure.
Fixed TTL with no jitter. You set EX 300 on every key. Every instance of the same key expires at the exact same millisecond. Under low traffic, this is fine. Under high traffic, it creates a synchronized cliff.
No coordination between readers. Each application instance checks the cache independently. There's no "someone is already fetching this" signal. When the key is gone, every reader acts as if it's the only one.
Recompute time exceeds arrival interval. If your DB query takes 500ms but new requests arrive every 0.25ms, you'll accumulate 2,000 duplicate queries before the first one finishes. The gap between "miss detected" and "cache repopulated" is the danger window.
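That danger window is just arithmetic, and it's worth having as a function when you're sizing a fix. A minimal sketch using the figures from the paragraph above:

```typescript
// Number of duplicate DB queries that pile up before the first recompute
// finishes: recompute time multiplied by the request arrival rate.
function herdSize(recomputeMs: number, requestsPerSec: number): number {
  return Math.floor((recomputeMs / 1000) * requestsPerSec);
}

// 500ms recompute, one request every 0.25ms (4,000/sec):
const duplicates = herdSize(500, 4000); // 2,000 duplicate queries in flight
```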
Here's what the timeline looks like:
Time T:   Key expires
T+0ms:    1 request checks cache → MISS → goes to DB
T+1ms:    200 more requests check cache → all MISS → all go to DB
T+2ms:    2,000 more requests check cache → all MISS → all go to DB
T+500ms:  First DB response comes back → key populated
T+500ms:  2,000 duplicate DB queries still in flight
Every request independently checked the cache before any of them had a chance to repopulate it. This is not a race condition you can fix with faster hardware. It's a structural problem with how cache misses are handled under concurrent load.
How to Detect It
The good news: thundering herd has extremely distinctive signatures. Once you know the pattern, you can spot it in 30 seconds on a dashboard.
| Symptom | What It Means | How to Check |
|---|---|---|
| DB CPU spikes at exact TTL intervals | Cache keys expiring in sync | Graph DB CPU over 1 hour, look for periodic spikes matching your TTL |
| Cache hit rate drops to 0% then recovers | All readers missing simultaneously | redis-cli INFO stats and monitor keyspace_hits vs keyspace_misses |
| Connection pool exhaustion during spike | Too many concurrent DB queries | Monitor active_connections vs max_connections in your DB |
| P99 latency spikes correlate with cache misses | Requests queueing behind DB overload | Correlate app latency metrics with cache hit rate |
| Identical slow queries in DB logs at same timestamp | Duplicate recomputes | pg_stat_activity showing many identical queries at the same second |
The smoking gun: if your DB CPU graph shows sawtooth spikes at regular intervals that match your cache TTL, you have a thundering herd. No other failure mode produces this exact signature.
Here's a quick Redis diagnostic to check if your hot keys are expiring in sync:
# Watch live commands for expiry patterns (caution: MONITOR adds measurable
# load to Redis -- run it briefly, never leave it on in production)
redis-cli --no-auth-warning MONITOR | grep -E "EXPIRE|DEL|GET" | head -1000
# Check if a specific key pattern has simultaneous misses
redis-cli INFO stats | grep keyspace
# keyspace_hits:4523987
# keyspace_misses:12 <-- normally low
# If misses spike to thousands in a burst, that's the herd
Detection shortcut
Set up an alert on keyspace_misses rate. If it exceeds 10x your normal miss rate for more than 2 seconds, investigate immediately. This catches both thundering herd and cache stampede.
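A minimal sketch of that alert logic, assuming you sample INFO stats on an interval. The `keyspace_misses` field is a real Redis INFO field; the 10x threshold and the sample values are illustrative:

```typescript
// Pull keyspace_misses out of `redis-cli INFO stats` output.
function parseMisses(infoStats: string): number {
  const match = infoStats.match(/keyspace_misses:(\d+)/);
  if (!match) throw new Error("keyspace_misses not found in INFO output");
  return parseInt(match[1], 10);
}

// Fire when the miss rate between two samples exceeds 10x the baseline.
function missRateAlert(
  prevMisses: number,
  currMisses: number,
  intervalSec: number,
  baselineMissesPerSec: number,
): boolean {
  const missesPerSec = (currMisses - prevMisses) / intervalSec;
  return missesPerSec > 10 * baselineMissesPerSec;
}
```

Feed it two consecutive samples: `missRateAlert(1000, 9000, 2, 50)` fires, because 4,000 misses/sec is well above 10 × 50.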
The Fix
Before the three main mitigations, there's a dead-simple first step that every caching system should implement.
Fix 0: TTL jitter (the 5-minute fix)
Randomize your cache TTL by +/- 20%. Instead of every key expiring at exactly 300 seconds, they expire between 240 and 360 seconds. This alone won't prevent a herd on a single hot key, but it prevents "all keys expiring at once" scenarios that amplify the problem.
function ttlWithJitter(baseTtl: number, jitterPercent = 0.2): number {
  const jitter = baseTtl * jitterPercent;
  return Math.floor(baseTtl + (Math.random() * 2 - 1) * jitter);
}

// Usage: instead of redis.set(key, value, "EX", 300)
await redis.set(key, value, "EX", ttlWithJitter(300));
This is not a complete solution. It reduces the herd size but doesn't eliminate it for individual hot keys. Think of it as a baseline hygiene practice, like input validation: always do it, but don't rely on it alone.
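A quick sanity check on the bounds (the function is repeated here so the snippet runs standalone): with a base TTL of 300 and 20% jitter, every value should land between 240 and 360, spreading expirations across a two-minute window.

```typescript
function ttlWithJitter(baseTtl: number, jitterPercent = 0.2): number {
  const jitter = baseTtl * jitterPercent;
  return Math.floor(baseTtl + (Math.random() * 2 - 1) * jitter);
}

// Sample the distribution and confirm expiries spread across [240, 360].
const samples = Array.from({ length: 10000 }, () => ttlWithJitter(300));
const min = Math.min(...samples);
const max = Math.max(...samples);
```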
Fix 1: Mutex-on-miss (lock-based refresh)
When a request gets a cache miss, it acquires a distributed lock (e.g., Redis SET NX PX 5000). Only the lock holder recomputes the value and writes it back. Every other request waits briefly, then reads from cache.
async function getWithMutex(key: string): Promise<Value> {
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);
  const lockKey = `lock:${key}`;
  const acquired = await redis.set(lockKey, "1", "NX", "PX", 5000);
  if (acquired) {
    try {
      // We own the lock: recompute and populate
      const value = await db.query(key);
      await redis.set(key, JSON.stringify(value), "EX", 300);
      return value;
    } finally {
      await redis.del(lockKey); // release the lock even if the query throws
    }
  }
  // Someone else is recomputing: wait briefly, then retry
  await sleep(50);
  return getWithMutex(key);
}
Trade-off: The lock introduces serialization. Under heavy load, hundreds of requests may pile up waiting. Keep the lock TTL short and the recomputation fast.
Fix 2: Probabilistic early expiry (XFetch)
Instead of expiring at a fixed TTL, each reader independently decides to proactively refresh the value slightly before it expires. The closer the key is to expiring, the higher the probability of triggering a refresh.
function shouldRefreshEarly(expiryEpochSec: number, deltaSec: number, beta = 1.0): boolean {
  // XFetch: P(refresh) rises as expiry approaches. deltaSec is how long the
  // recompute takes; beta > 1 shifts refreshes earlier.
  const nowSec = Date.now() / 1000;
  return nowSec - deltaSec * beta * Math.log(Math.random()) >= expiryEpochSec;
}
The key insight: instead of one expiry event causing a thunderstorm, expiry is spread across many small, gradual refreshes. No lock needed.
Trade-off: Probabilistic early expiry adds some unnecessary recomputes (a reader might refresh a key that still has 30 seconds left). You're trading a small amount of extra DB load during normal operation for eliminating the catastrophic spike at expiry. In practice, the extra load is negligible compared to the herd spike it prevents.
When it shines: XFetch is ideal for systems with many hot keys and unpredictable traffic patterns. It requires no locks, no coordination infrastructure, and works across distributed application instances without any shared state.
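Putting the probability check into a full read path, here's a self-contained sketch using an in-memory store. A real deployment would keep the value, delta, and expiry together in Redis (e.g. as a hash); the names here are illustrative, and the probability check is repeated so the snippet runs standalone.

```typescript
type Entry = { value: string; deltaSec: number; expiryEpochSec: number };

// In-memory stand-in for the cache.
const store = new Map<string, Entry>();

// XFetch: P(refresh) rises as expiry approaches.
function shouldRefreshEarly(expiryEpochSec: number, deltaSec: number, beta = 1.0): boolean {
  const nowSec = Date.now() / 1000;
  return nowSec - deltaSec * beta * Math.log(Math.random()) >= expiryEpochSec;
}

async function getWithEarlyExpiry(
  key: string,
  recompute: () => Promise<string>,
  ttlSec = 300,
): Promise<string> {
  const entry = store.get(key);
  if (entry && !shouldRefreshEarly(entry.expiryEpochSec, entry.deltaSec)) {
    return entry.value;
  }
  // True miss or probabilistic early refresh: recompute and time it,
  // so delta reflects how expensive this key actually is.
  const start = Date.now();
  const value = await recompute();
  const deltaSec = (Date.now() - start) / 1000;
  store.set(key, { value, deltaSec, expiryEpochSec: Date.now() / 1000 + ttlSec });
  return value;
}
```

Because each reader rolls the dice independently, refreshes smear out over time instead of synchronizing on one expiry instant.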
Fix 3: Request coalescing (deduplication layer)
A dedicated layer (your cache client, a sidecar, or a middleware) deduplicates in-flight fetches. If 200 requests miss the same key simultaneously, the coalescing layer issues exactly one DB call and fans the result out to all 200 waiters.
const inFlight = new Map<string, Promise<Value>>();

async function getWithCoalescing(key: string): Promise<Value> {
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);
  // Check if someone else is already fetching this key
  const existing = inFlight.get(key);
  if (existing) return existing; // wait on the same promise
  // We're the first: fetch and let everyone else piggyback
  const fetchPromise = db
    .query(key)
    .then(async (value) => {
      await redis.set(key, JSON.stringify(value), "EX", 300);
      return value;
    })
    .finally(() => inFlight.delete(key)); // clean up even if the query fails
  inFlight.set(key, fetchPromise);
  return fetchPromise;
}
Libraries like dataloader (Node.js) do this natively for GraphQL. The approach works well when all your cache clients run in the same process. For distributed coalescing across multiple application instances, you need a shared coalescing layer (like Nginx's proxy_cache_lock or a custom Redis-based dedup).
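Here's the coalescing guarantee demonstrated end to end with an in-memory stand-in for the DB. The 200-caller count mirrors the example above; `fetchUser` is a hypothetical slow query that counts its own invocations:

```typescript
const inFlight = new Map<string, Promise<string>>();
let dbCalls = 0;

// Hypothetical slow DB query: 50ms latency, counts invocations.
function fetchUser(key: string): Promise<string> {
  dbCalls++;
  return new Promise((resolve) => setTimeout(() => resolve(`value-for-${key}`), 50));
}

async function getCoalesced(key: string): Promise<string> {
  const existing = inFlight.get(key);
  if (existing) return existing; // piggyback on the in-flight fetch
  const p = fetchUser(key).finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}

// 200 simultaneous misses for the same key -> exactly one DB call.
const herd = Promise.all(Array.from({ length: 200 }, () => getCoalesced("user:42")));
```

Because each `getCoalesced` call checks the map synchronously before anything awaits, only the first caller triggers the fetch; the other 199 resolve from the same promise.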
When to Use Each Fix
| Situation | Best Fix |
|---|---|
| Recompute is fast (< 100ms), moderate concurrency | Mutex-on-miss |
| Recompute is slow or unpredictable, high concurrency | Probabilistic early expiry |
| Request fan-out happens at application layer | Request coalescing (DataLoader) |
| You control the CDN or reverse proxy | Stale-while-revalidate at HTTP layer |
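The last row in the table needs no locking at all: stale-while-revalidate is a standard Cache-Control directive (RFC 5861) that lets the CDN serve stale content while it revalidates in the background, absorbing the herd at the edge. A sketch of building the header (the helper name is illustrative):

```typescript
// Cache-Control header telling caches to serve content for maxAgeSec, then
// serve stale copies for staleWindowSec while revalidating asynchronously.
function cacheControlSWR(maxAgeSec: number, staleWindowSec: number): string {
  return `max-age=${maxAgeSec}, stale-while-revalidate=${staleWindowSec}`;
}

// e.g. response.setHeader("Cache-Control", cacheControlSWR(300, 60));
```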
Here's how mutex-on-miss changes the flow: instead of 4,000 DB queries, you get exactly one. The first miss acquires the lock and recomputes; the other 3,999 requests wait briefly, then read the freshly populated value from cache.
Choosing the Right Fix
Not sure which fix to use? Start from the table above: if the recompute is cheap and concurrency is moderate, reach for mutex-on-miss first; if recompute time is slow or unpredictable under high concurrency, prefer probabilistic early expiry; and if the fan-out happens inside a single process, request coalescing is the smallest change. Whatever you pick, add TTL jitter regardless.
Severity and Blast Radius
Thundering herd is a high-severity anti-pattern with cascading impact.
- Blast radius: Everything behind the cache. If your cache fronts a shared database, every service reading from that DB is affected, not just the one with the expiring key.
- Cascade risk: High. DB overload causes connection timeouts, which cause retries, which add more load. I've seen a single hot key take down an entire product catalog for 12 minutes.
- Recovery time: Usually 1-5 minutes if the DB can absorb the spike. If the DB crashes or the connection pool deadlocks, expect 10-30 minutes.
- Detection to fix: Hours (adding mutex-on-miss to an existing cache layer is a small code change). Prevention is better: build it in from day one.
When It's Actually OK
Not every cache expiry under load is a problem worth solving.
- Low-traffic keys (under 10 QPS): The "herd" is 1-2 requests. Your DB won't notice.
- Fast recomputes (under 5ms): Even 100 duplicate queries finish before the pile-up matters.
- Read replicas with headroom: If your read replica can absorb 10x normal load for a few seconds, the herd is a blip, not a crisis.
- Development and staging environments: Don't add mutex complexity to systems that never see real concurrency.
- Cache-aside with short TTL as a performance optimization, not a reliability layer: If the system works fine without cache and you're just shaving latency, thundering herd is an annoyance, not a failure mode.
How This Shows Up in Interviews
Interviewers mention thundering herd when your design has a TTL-based cache feeding a relational database. They'll ask: "What happens when your cache key expires?" The correct answer names the anti-pattern, explains why it cascades, and describes at least one mitigation by name.
At senior level, you're expected to know the trade-off between mutex (serialization cost) and probabilistic early expiry (complexity). At staff level, identify that mutex-on-miss can itself cause a thundering herd when the lock TTL is too short.
The cache works until it doesn't
Thundering herd failures are sudden. A system can run for months without incident, then fail catastrophically on the first cache expiry during a traffic spike. Build mitigation in before you need it.
Quick Recap
- Thundering herd: all concurrent requests simultaneously miss the same cache key and race to the DB.
- The trigger is a fixed TTL expiry under high concurrency, not a bug, but a structural design gap.
- Mutex-on-miss serializes the recompute; probabilistic early expiry spreads the refresh; coalescing deduplicates in-flight fetches.
- Layer your defences: probabilistic early expiry at the cache layer + connection pool limits at the DB layer.
- Monitoring signal: a sharp spike in DB query rate that correlates with cache hit rate dipping to zero for a specific key prefix.
- Don't confuse thundering herd (TTL-driven, periodic) with cache stampede (write-driven, irregular). The trigger is different, and so are the best fixes.
- The simplest first step: add TTL jitter. Randomize your cache expiry by +/- 20% to spread expirations across a wider time window.