๐Ÿ“HowToHLD

Caching

Learn how caching eliminates redundant database reads, which strategy to choose for your write pattern, and how to design a cache layer that survives invalidation at scale.

49 min read · 2026-03-23 · medium · caching · redis · performance · scalability · hld

TL;DR

  • A cache is a fast, in-memory store that intercepts reads before they reach the database. The cache hit rate is the single number that determines how hard your database has to work.
  • At a 95% hit rate, your DB sees 20× less read traffic. At 99%, 100× less. That compounding effect is why caching is the first line of defense against every read-scaling problem.
  • The core trade-off is consistency vs. latency: your cache is always a snapshot of the DB at a point in time. TTL and invalidation strategies control how stale that snapshot can get — and how much complexity you accept in exchange.
  • Cache-aside is the default read pattern. Use write-through for data that must be fresh, write-behind for high-throughput telemetry, and write-around for infrequently read, one-off writes.
  • Cache invalidation — knowing when to evict a stale entry — is the hardest operational problem in caching. Get it wrong and you serve wrong data silently at scale.

The Problem It Solves

It's Black Friday. Your e-commerce platform has 500,000 concurrent users. Eighty percent of them are looking at the same 100 product pages — the viral deals.

Your PostgreSQL database has 12 million products, but those 100 rows are being read roughly 4,000 times per second each. Each SELECT takes 5ms. Simple math: 400,000 reads/second × 5ms per read means 2,000 queries in flight at any moment — you need 2,000 database connections just to avoid queuing. Your connection pool maxes out at 200.

The query time climbs from 5ms to 50ms as connection contention sets in, then to 500ms as the connection pool queue fills. Your monitoring dashboard shows CPU at 95%, I/O wait climbing. At 10:03 a.m. the DB falls over.

Not because the data is complex to retrieve — but because you're fetching the same 100 rows over and over, from disk, for every request. I'll often see engineers reach for read replicas here first, but that misses the point: the problem isn't read capacity, it's redundant computation.

The hidden assumption in every 'scale horizontally' recommendation

Adding more app servers doesn't help if every one of them fires a database query for the same popular data. You go from one app server issuing 400K DB queries/second to ten app servers collectively issuing the same 400K — because the bottleneck was never the app tier. More app servers without a cache layer just means more machines hammering the same database.

flowchart TD
  subgraph Internet["Black Friday — 500K Concurrent Users"]
    Users(["Users\n500K concurrent\n80% reading same 100 products"])
  end

  subgraph AppTier["App Tier — 2,000 Servers Still Not Enough"]
    AS1["App Server 1\nEvery read → DB"]
    AS2["App Server 2–N\nSame pattern — no deduplication"]
  end

  subgraph DBTier["Database Under Siege"]
    DB[("PostgreSQL\n400K queries/s\nQuery time: 5ms → 500ms\nConnections maxed · 503s firing")]
  end

  Users -->|"500K HTTP requests/s"| AS1 & AS2
  AS1 & AS2 -->|"Every request → same 100 product rows\nNo deduplication · No reuse"| DB

The fix isn't more database replicas or more app servers. The fix is answering the same question from memory instead of computing it fresh every single time.


What Is It?

A cache is a faster storage layer that sits between your application and your primary data store, holding copies of recently or frequently accessed data so future requests for that data can be served without touching the original source.

Analogy: Think of a coffee shop that serves 300 customers a day, and 90% of them order the same three drinks. The barista could grind beans fresh for every single cup. Or, they could brew a large batch of the popular drip coffees at the start of the hour and pour from it.

Most customers get their coffee in 10 seconds instead of 3 minutes. The three popular coffees are the cache. The bean grinder is the database.

The batch brew is the cache population. Stale coffee that's sat for 4 hours is the invalidation problem — at some point, you throw out the old batch and brew a fresh one.

flowchart TD
  subgraph Internet["500K Concurrent Users"]
    Users(["Users\n500K concurrent requests"])
  end

  subgraph AppTier["Stateless App Tier"]
    AS1["App Server 1"]
    AS2["App Server 2–N"]
  end

  subgraph CacheTier["Cache Tier"]
    Redis["Redis Cluster\n< 1ms · ~95%+ hit rate\nProduct data · Sessions · Hot reads"]
  end

  subgraph DBTier["Database Tier"]
    DB[("PostgreSQL\nWrites + 5% of reads\n~20K queries/s · manageable")]
  end

  Users -->|"HTTPS requests"| AS1 & AS2
  AS1 & AS2 -->|"Cache read\n(95% hit)"| Redis
  Redis -.->|"Cache miss (5%)\nfetch + populate"| DB
  AS1 & AS2 -->|"Writes + cache-miss reads"| DB

With a 95% cache hit rate, your database now handles 20,000 queries/second instead of 400,000. The same database that was collapsing under Black Friday traffic runs comfortably. The cache tier absorbs the fan-out; the database sees only the residual misses and all writes.

My recommendation here is to anchor every caching discussion in the hit rate number — "95% hit rate = 20× less DB traffic" is the sentence that shows you understand the mechanism, not just the concept. The database literally doesn't know 95% of your users exist.
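To make that arithmetic concrete, here's a back-of-envelope sketch (the helper name residualDbQps is mine, not a standard API): every miss falls through, so the database's read load is total traffic times the miss rate.

```typescript
// Hypothetical helper: residual DB read load at a given cache hit rate.
// Every cache miss falls through, so DB QPS = total reads × (1 − hit rate).
function residualDbQps(totalReadQps: number, hitRate: number): number {
  if (hitRate < 0 || hitRate > 1) throw new RangeError("hitRate must be in [0, 1]");
  return Math.round(totalReadQps * (1 - hitRate));
}

// Black Friday numbers from above: 400K reads/s at a 95% hit rate leaves the
// database with 20K reads/s; at 99% it would see only 4K reads/s.
```

The same function shows why pushing 95% to 99% matters so much: the miss rate drops 5×, and so does the DB load.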


How It Works

Here's what happens on every request when cache-aside (the default pattern) is in use:

  1. Client sends a request — e.g., GET /products/7429. The app server needs product data.
  2. Check the cache first — The app constructs the cache key (product:7429) and fires a Redis GET. This takes < 1ms.
  3. Cache hit → return immediately — If the key exists, deserialize the value and return it. The database is never involved. Cost: ~0.5ms.
  4. Cache miss → fetch from source — Redis returns nil. The app queries the database (SELECT * FROM products WHERE id = 7429). Cost: ~5ms.
  5. Populate the cache — The app takes the DB result, serializes it, and writes it to Redis with SET product:7429 <data> EX 300 (5-minute TTL). Future requests for this product skip the DB for the next 300 seconds.
  6. Return the result — First caller gets database latency. Every subsequent caller until TTL expires gets cache latency.
async function getProduct(productId: string): Promise<Product> {
  const cacheKey = `product:${productId}`;

  // Step 1: Check cache (~0.5ms on hit)
  const cached = await cache.get(cacheKey);
  if (cached !== null) {
    return JSON.parse(cached) as Product; // Cache hit
  }

  // Step 2: Cache miss — fetch from database (~5ms)
  const product = await db.queryOne<Product>(
    'SELECT id, name, price, description FROM products WHERE id = $1',
    [productId]
  );

  if (!product) throw new NotFoundError(`Product ${productId} not found`);

  // Step 3: Populate cache with TTL (5 minutes)
  // Fire-and-forget: don't block the response on the cache write
  cache.set(cacheKey, JSON.stringify(product), { EX: 300 }).catch(console.error);

  return product; // DB miss latency — only the first caller pays this cost
}

Interview tip: state your hit rate and its consequence

When you add a cache in an interview design, immediately follow it with: "With a 95% cache hit rate, the database sees roughly 20× less read traffic — dropping from X to Y queries/second." That number shows you understand why the cache exists, not just that it exists.

sequenceDiagram
    participant C as Client
    participant A as App Server
    participant R as Redis Cache
    participant D as Database

    Note over C,D: Cache HIT path — 95%+ of requests
    C->>A: GET /products/7429
    A->>R: GET product:7429
    R-->>A: {"id":7429,"name":"...","price":99} · < 1ms
    A-->>C: HTTP 200 — product data

    Note over C,D: Cache MISS path<br/>first request or after TTL expires
    C->>A: GET /products/1337
    A->>R: GET product:1337
    R-->>A: (nil) — not in cache
    activate D
    A->>D: SELECT * FROM products WHERE id=1337
    D-->>A: row data · ~5ms
    deactivate D
    A->>R: SET product:1337 {data} EX 300
    Note over R: Populated for next 300s
    A-->>C: HTTP 200 — product data

The first caller to request product 1337 after a miss (or after TTL expiry) pays database latency. Every caller for the next 5 minutes gets cache latency. At any read-heavy traffic profile, this asymmetry is enormously valuable.

I'll often walk through both paths step by step in an interview — it forces you to articulate the hit path and the miss path, which is where all the interesting trade-offs live. The hit path is free. The miss path is where your design choices matter.


Key Components

  • Cache key — The string identifier for a cached value. Namespace convention: {resource}:{id} (e.g., product:7429, user:session:abc123). Poorly designed keys collide or cannot be selectively invalidated.
  • TTL (Time-To-Live) — How long a cache entry lives before automatic eviction. Balances freshness (low TTL) against hit rate (high TTL). Choosing the wrong TTL is the source of most caching bugs.
  • Eviction policy — When the cache is full, the rule for choosing what to evict. LRU (allkeys-lru) is correct for most workloads — note that Redis's out-of-the-box default is actually noeviction.
  • Hit rate — cache_hits / (cache_hits + cache_misses). The primary health metric for any cache. A declining hit rate is the early warning for a failing cache strategy.
  • Cache cluster — Multiple Redis nodes providing replication (read replicas for throughput) and sharding (partitioning for capacity). Single-node Redis is a SPOF and a throughput ceiling.
  • Serialization format — How values are encoded for storage. JSON is human-readable but slow. MessagePack or protobuf are faster. The choice compounds at high hit rates.
  • Connection pool — A pool of persistent connections from your app to Redis. Creating a new connection per request adds 1–5ms overhead and causes connection exhaustion at scale. Always use a pool.
  • Read replica — A Redis replica that accepts reads, offloading throughput from the primary. Essential for read-heavy workloads that saturate a single Redis node (~100K–200K ops/sec).

Cache Layers

Every production system has multiple cache layers operating simultaneously. Each layer is physically closer to the user and faster than the one below it — at the cost of smaller capacity and lower consistency guarantees.

[Figure: Five-tier cache hierarchy from browser cache at the top to disk storage at the bottom, with latency labels showing the 10–100× slowdown at each layer.]
Each cache miss falls through to the next layer, which is 10–100× slower. Optimising the topmost layers has the highest leverage — a browser cache hit costs 0ms of network time.
  • Browser cache — HTTP cache headers (Cache-Control, ETag) · 0ms (local) · managed by the browser, controlled by your response headers
  • CDN edge cache — Cloudflare, Fastly, Akamai PoP · 5–30ms (nearest PoP) · managed by the CDN provider plus your purge API
  • Application cache — Redis, Memcached · < 1ms (same network) · managed by you — fully under your control
  • DB buffer pool — Postgres shared_buffers, MySQL InnoDB pool · 2–5ms (in-process) · managed automatically by the DB engine
  • Disk / storage — SSD, HDD, object store · 5–50ms+ · managed by you / your cloud provider

The browser and CDN layers are covered in depth in the CDN article. The remainder of this article focuses on the application cache layer — the one you design, own, and debug.

In interviews, I typically sketch just three layers: browser/CDN, app cache (Redis), and database — with the latency numbers at each. That's enough context to show you understand the hierarchy without burning five minutes explaining the full stack. The application layer is the only one where your architecture decisions actually live.


Read Strategies

How the cache gets populated is a design decision. These two patterns cover 95% of use cases:

Cache-Aside (Lazy Loading)

The application manages the cache directly. On a miss, the app fetches from the database and populates the cache itself. This is the pattern shown in the "How It Works" code above.

Characteristics: Only data that's actually been requested is cached — no wasted memory on cold data. The first request to any key always pays database latency; subsequent requests get cache latency. Stale data is bounded by TTL unless you explicitly invalidate on write, and it works with any data source since the cache has no knowledge of where the data lives.

The mistake I see most often is teams skipping cache-aside in favor of a fancier pattern — but cache-aside is correct precisely because it's explicit and simple.

Cache-aside is the correct default for most read-heavy systems.

Read-Through

The cache itself is responsible for loading data from the database on a miss. The application talks only to the cache and never directly queries the DB on a read path.

Characteristics: Simpler application code — one data-access layer handles everything. The cache must be configured with your DB schema and connection, often via libraries like Spring Cache or AWS DAX for DynamoDB. The first-request latency and thundering-herd exposure are identical to cache-aside.

In practice, reach for read-through only if your framework naturally supports it — cache-aside is easier to reason about and debug.
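A minimal read-through wrapper makes the distinction concrete. This is a sketch with a synchronous Map standing in for the cache store and a loader callback standing in for the DB query (all names are mine); a real implementation would be async and add TTLs:

```typescript
// Read-through sketch: the cache owns the loader, so callers never query the
// database directly. A Map stands in for the actual cache store.
class ReadThroughCache<V> {
  private store = new Map<string, V>();
  constructor(private loader: (key: string) => V) {}

  get(key: string): V {
    const hit = this.store.get(key);
    if (hit !== undefined) return hit; // cache hit: the loader never runs
    const value = this.loader(key);    // cache miss: the cache loads it itself
    this.store.set(key, value);
    return value;
  }
}

// The application only ever sees cache.get(key); where the data comes from on
// a miss is the cache's problem, which is exactly the read-through contract.
const productCache = new ReadThroughCache<string>((key) => `row for ${key}`);
```

Compare with the cache-aside code in "How It Works", where the same miss-and-populate logic lives in the application function instead.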

Cache-aside vs. read-through in interviews

The distinction is where the DB query logic lives. Cache-aside: the application populates the cache on a miss. Read-through: the cache populates itself. Both produce identical outcomes. Interviewers mostly care that you can name both and distinguish them — the practical difference is an implementation detail.


Write Strategies

When data changes, what gets updated and in what order? This is where most caching bugs originate. The mistake I see most often is engineers treating caching as a read concern and never thinking about the write path.

[Figure: Three-column diagram comparing write-through (sync writes to both cache and DB), write-behind (sync to cache, async queue to DB), and write-around (bypass cache, write directly to DB) strategies.]
The write strategy determines your consistency guarantee and your write latency. Write-through maximises consistency; write-behind maximises throughput; write-around prevents cache pollution for infrequent writes.

Write-Through

Data is written to the cache and the database synchronously in the same operation. Both writes must succeed before the caller receives an acknowledgement.

async function updateProductPrice(productId: string, newPrice: number): Promise<void> {
  // Both writes happen serially before we return — latency = DB write + cache write
  await db.query('UPDATE products SET price = $1 WHERE id = $2', [newPrice, productId]);
  await cache.set(`product:${productId}`, JSON.stringify({ price: newPrice }), { EX: 300 });
  // Cache is now immediately consistent with DB
}

Use when: Read-heavy data that must always be immediately fresh (user account details, inventory counts). Writes are infrequent enough that the doubled write latency is acceptable.

Avoid when: Write-heavy workloads (analytics events, counters) — the cache write adds latency to every single write.

Write-Behind (Write-Back)

Data is written to the cache first and acknowledged to the caller immediately. The cache flushes changes to the database asynchronously via a background process or queue.

async function recordPageView(articleId: string): Promise<void> {
  // Increment in-memory counter — responds in < 1ms
  await cache.incr(`views:${articleId}`);

  // Enqueue DB flush — batched, happens out-of-band
  await queue.enqueue({ type: 'flush_views', articleId });
  // If the cache node crashes before flush, view counts since last flush are lost
}

Use when: Very high write throughput where small data-loss windows are acceptable — view counters, click events, IoT telemetry. Losing 500ms of analytics data in a crash is fine. Losing 500ms of payment records is not.

Avoid when: Data represents money, inventory, or any state that must survive a cache failure with zero loss.
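The flush side of write-behind is where the batching happens. Here's a sketch (all names are mine) with an in-process map standing in for the Redis counters; a real worker would drain Redis keys on a timer or from the queue:

```typescript
// Write-behind sketch: increments land in memory instantly; a periodic flush
// drains everything accumulated so far into one batched DB write.
const pendingViews = new Map<string, number>();

function recordView(articleId: string): void {
  // Fast path: no DB involved, just an in-memory increment
  pendingViews.set(articleId, (pendingViews.get(articleId) ?? 0) + 1);
}

function flushViews(writeBatch: (rows: [string, number][]) => void): void {
  if (pendingViews.size === 0) return;  // nothing accumulated since last flush
  const batch = [...pendingViews.entries()];
  pendingViews.clear();                 // increments after this point wait for the next flush
  writeBatch(batch);                    // one DB round-trip instead of one per view
}
```

The data-loss window the section describes is visible here: anything sitting in pendingViews at the moment of a crash never reaches writeBatch.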

Write-Around

Data is written directly to the database, bypassing the cache entirely. The cache entry is not updated or touched — it becomes stale until TTL expires and a read repopulates it.

async function archiveOldOrder(orderId: string): Promise<void> {
  // Direct DB write — this data is read rarely; caching it wastes memory
  await db.query('UPDATE orders SET archived = true WHERE id = $1', [orderId]);
  // No cache interaction — cache either holds stale data until TTL expires,
  // or this key was never cached in the first place
}

Use when: Data that is written once and read rarely (archived records, audit logs, cold historical data). Caching it would consume memory with zero hit-rate benefit on the hot path.

Cache Invalidation on Write (Most Common Pattern)

In practice, the most common pattern across all write types is: write to the database, then delete the cache key. The next read miss repopulates with fresh data.

async function updateUserProfile(userId: string, profile: UserProfile): Promise<void> {
  // 1. Write to source of truth first
  await db.query(
    'UPDATE users SET name = $1, email = $2 WHERE id = $3',
    [profile.name, profile.email, userId]
  );

  // 2. Invalidate — DELETE the key, never SET a new value here
  // Setting a value here introduces a race: two concurrent writes can leave
  // the older write's value as the current cache entry
  await cache.del(`user:${userId}`);
}

Always DEL the cache key on write — never SET

A common bug: write to DB, then immediately SET the new value in the cache. Under concurrent load, two writes can race: Write A updates DB → Write B updates DB → Write B updates cache → Write A updates cache. Now the cache holds the older Write A value, not the latest Write B. Always DELETE the key after a write. The next read fetches the authoritative DB state.

Write to the database first. Delete the cache key after. That rule eliminates the entire class of write-race bugs.


Cache Invalidation

Phil Karlton's famous quip — "There are only two hard things in Computer Science: cache invalidation and naming things" — is funny because it's operationally true at every scale.

Invalidation is the question: given that the database has changed, when does the cache know to discard its stale copy?

TTL-Based Invalidation

The simplest approach. Every entry has a fixed expiry. After TTL seconds, Redis automatically evicts the key; the next read repopulates with current DB data.

  • Very short (< 30s) — near-real-time freshness, high miss rate. Right for data that changes per-second (live scores, stock prices).
  • Medium (1–15 min) — good hit rate, acceptable staleness. Right for product catalogs, user profiles, session tokens.
  • Long (1–24h) — excellent hit rate, risk of stale data. Right for reference data (country lists, config flags).
  • No TTL (persistent) — no self-healing. Right only for keys that are explicitly invalidated on every write.

If your data changes frequently enough for users to notice staleness, TTL alone won't save you — you'll need event-driven invalidation.

Event-Driven Invalidation

On every write, proactively delete relevant cache keys. The cache is stale for at most the propagation delay between the DB write and the invalidation call — typically < 5ms in the same region. The challenge: your write path must know every cache key that depends on the changed data.

A users row change might need to invalidate user:{id}, feed:{id}, profile:{id}, and recommendations:{id}. Miss any one of those and stale data persists silently. I've seen teams discover this dependency graph for the first time only after a production incident.
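One way to keep that dependency graph from living in people's heads is to centralize it on the write path. A sketch (the table names and helpers are hypothetical):

```typescript
// Central registry mapping a changed table to every cache key derived from it.
// Writes consult this instead of each call site remembering its own list.
const DEPENDENT_KEYS: Record<string, (id: string) => string[]> = {
  users: (id) => [`user:${id}`, `feed:${id}`, `profile:${id}`, `recommendations:${id}`],
};

function keysToInvalidate(table: string, id: string): string[] {
  const deps = DEPENDENT_KEYS[table];
  // Fail loudly: a missing rule should break the write path in testing,
  // not silently leave stale keys in production.
  if (!deps) throw new Error(`no invalidation rule registered for table "${table}"`);
  return deps(id);
}
```

After a successful DB write you would DEL every key this returns; adding a new derived key means updating one registry entry rather than auditing every write call site.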

Version-Based Invalidation

Embed a version number in the cache key. When data changes, increment the version. Old keys become unreachable and eventually expire.

// Version-stamped key: product:7429:v3
const cacheKey = `product:${productId}:v${product.version}`;

// After an update, the new version key is used; the old key simply expires via TTL
// No explicit invalidation needed โ€” the old key is unreachable from new reads

Elegant for immutable-snapshot use cases. The downside: stale keys linger until TTL — memory overhead grows in proportion to how often data is updated.
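The snippet above can be rounded out into a tiny version registry (a sketch with names of my own; in production the version counter itself would live in Redis or alongside the row):

```typescript
// Version registry sketch: reads build the key from the current version;
// writes bump the version, which orphans every previously cached key.
const versions = new Map<string, number>();

function currentKey(resource: string, id: string): string {
  const v = versions.get(`${resource}:${id}`) ?? 1;
  return `${resource}:${id}:v${v}`;
}

function bumpVersion(resource: string, id: string): void {
  const k = `${resource}:${id}`;
  versions.set(k, (versions.get(k) ?? 1) + 1); // old v-stamped keys are now unreachable
}
```

No DEL is ever issued: invalidation is just a counter increment, and TTL garbage-collects the orphaned keys.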

Invalidation is the part of caching that will bite you in production — whether you choose TTL, events, or versioning, you need a strategy before you ship.


Eviction Policies

When Redis runs out of memory, the eviction policy determines which keys are dropped to accommodate new writes.

  • noeviction (default) — refuses new writes when full and returns an error. Best when you must never silently lose data. Pitfall: write errors cascade to DB overload if not caught.
  • allkeys-lru — evicts the least recently used key across all keys. Best for general-purpose caches — automatic cold-data cleanup. Pitfall: can evict infrequently accessed but expensive-to-recompute keys.
  • volatile-lru — LRU only among keys with a TTL set. Best for shared Redis instances mixing persistent and cached data. Pitfall: keys without TTL are never evicted and grow unbounded.
  • allkeys-lfu — evicts the least frequently used key. Best for stable hot-key patterns (Zipfian distributions). Pitfall: needs a warm-up period; early access skews the frequency count.
  • volatile-ttl — evicts the key with the shortest remaining TTL. Best for preserving long-lived data over short-lived data. Pitfall: may evict keys with seconds left, causing spurious misses.
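To see what LRU actually does, here's a toy implementation of my own using a Map's insertion order, for illustration only (real Redis approximates LRU by sampling keys rather than tracking exact order):

```typescript
// Toy exact-LRU cache: a Map iterates in insertion order, so re-inserting a key
// on every access keeps the least recently used key at the front for eviction.
class LruCache<V> {
  private map = new Map<string, V>();
  constructor(private capacity: number) {}

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    this.map.delete(key);      // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      const lruKey = this.map.keys().next().value as string;
      this.map.delete(lruKey); // evict the least recently used entry
    }
  }
}
```

The pitfall column above falls straight out of the code: an expensive-to-recompute key that simply hasn't been read recently sits at the front of the map and gets evicted first.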

Interview tip: default to allkeys-lru for most caches

For a general-purpose application cache in an interview, say: "I'd configure maxmemory-policy allkeys-lru so Redis automatically evicts cold data when memory fills, eliminating write errors and keeping the hot working set in memory without manual intervention." That one sentence signals operational awareness most answers lack.


Trade-offs

Pros:

  • Dramatic read latency reduction — 5–50ms DB reads become < 1ms cache reads
  • Shields the database from read fan-out — prevents DB overload at scale
  • Cost efficient: RAM is cheap; DB instances that scale to millions of QPS are not
  • Enables horizontal read scaling — a Redis cluster handles millions of ops/sec cheaply
  • Sessions, rate-limit counters, leaderboards, and pub/sub land naturally on an in-memory store
  • Absorbs hot-key read traffic that no single DB shard could sustain

Cons:

  • Cache miss paths add code complexity — miss-and-populate logic must be correct
  • Consistency is eventual, not immediate — stale reads are an inherent property
  • Cache coherency bugs are silent — no exception fires when you serve a stale value
  • Thundering herd on popular-key expiry can spike DB load above its capacity ceiling
  • Redis is a new SPOF — failure must be handled gracefully (fall through to DB plus a circuit breaker)
  • Debugging cache coherency issues requires correlating cache state and DB state across time

The fundamental tension here is consistency vs. performance. A cache is an explicitly stale replica of your database. Every performance gain from caching exists because you are, by design, serving data that might be slightly behind the source of truth.

The engineering challenge is deciding how stale is acceptable for each type of data — and building the invalidation mechanics that enforce that bound. Get the staleness tolerance wrong and your users will notice.


When to Use It / When to Avoid It

So when does this actually matter? The short answer: any time the same data is read far more often than it's written, a cache will dramatically reduce your database load. Here's the full breakdown.

Use caching when:

  • Your read traffic is at least 5× your write traffic for a given dataset.
  • The same data is read repeatedly within a window shorter than your acceptable staleness window.
  • Database query response time is a meaningful fraction of total request latency.
  • You need to scale reads past what a single primary or read replica can handle.
  • You store derived or computed values that are expensive to recompute (aggregations, rendered templates, ML inference results).

Avoid caching (or be very careful) when:

  • Data is write-heavy and read-once — audit logs, financial ledger entries. Cache pollution with zero hit benefit.
  • Data must be instantly visible to all users immediately after a write — for example, a payment acknowledgement shown to the paying user AND billed to their account simultaneously. TTL-based caching can't guarantee this; you need explicit invalidation on every write.
  • You're prototyping. Cache adds complexity that masks performance problems. Measure DB performance first, then cache proven bottlenecks.
  • Cache failure would cause silently incorrect behavior. If your app serves wrong inventory or wrong prices when the cache breaks, verify your invalidation logic is complete before deploying.

If your application is serving the same read-heavy data to many concurrent users, a cache isn't optional — it's the difference between a database that survives the load and one that doesn't.

Cache vs. read replica — different tools for different problems

A database read replica offloads read pressure from your primary and provides a standby for failover. Queries still take milliseconds; connection-pool limits still apply. A cache is an in-memory key-value store — lookups return in well under a millisecond with no connection-pool concern. Use read replicas for complex queries, analytics, and taking read load off the primary. Use caching for hot-path reads that are the same query repeated thousands of times per second.


The Thundering Herd Problem

Caching's most dangerous failure mode has nothing to do with the cache being full or down. It happens when a TTL expires on a popular key. I'll often see candidates skip this failure mode entirely in interviews — it's the one that bites production caches hardest.

The scenario: Your most popular product (product:viral-item) is cached with a 60-second TTL. At 09:00:00.000, the key expires. Within the next 50ms, 1,000 requests arrive for that product — every single one gets a cache miss.

Every single one fires a database query simultaneously. Your database spikes from 50 queries/second to 1,050 queries/second in 50ms. If your DB connection pool has 100 connections, 900 queries queue immediately.

Latency climbs. Cache population slows. The spike persists until repopulation finishes — which takes longer because the DB is already overloaded.

For any high-traffic key, TTL expiry is a ticking clock to a DB spike — the only question is how many concurrent misses you'll absorb when it fires.

sequenceDiagram
    participant R1 as Request 1
    participant R2 as Request 2
    participant Rn as Request 1,000
    participant C as Redis Cache
    participant D as Database

    Note over R1,D: TTL expires: "product:viral"<br/>at 09:00:00.000

    R1->>C: GET product:viral
    R2->>C: GET product:viral
    Rn->>C: GET product:viral

    C-->>R1: (nil) — cache miss
    C-->>R2: (nil) — cache miss
    C-->>Rn: (nil) — cache miss

    Note over D: 1,000 simultaneous DB queries

    R1->>D: SELECT * FROM products
    R2->>D: SELECT * FROM products
    Rn->>D: SELECT * FROM products

    D-->>R1: row data · now 200ms (contention)
    D-->>Rn: row data · now 200ms

    R1->>C: SET product:viral {data} EX 60
    Rn->>C: SET product:viral {data} EX 60
    Note over C: 1,000 requests<br/>all re-populate the same key

Fixing the Thundering Herd

Option 1: Mutex lock — Only the first requester that observes a miss acquires a lock and fetches from the DB. All others wait, then return the populated value.

function getWithMutex(key, fetchFn):
  cached = cache.get(key)
  if cached exists → return cached

  lockKey = "lock:" + key
  acquired = cache.set(lockKey, "1", NX=true, EX=10)
  // NX = "set only if key does not exist" — atomically claims the lock

  if acquired:
    try:
      value = fetchFn()             // only this one request hits the database
      cache.set(key, value, EX=300)
      return value
    finally:
      cache.del(lockKey)            // always release the lock, even on error

  // Another process is already fetching — wait and retry
  // (in production, bound the retries and fall through to the DB after a few attempts)
  sleep(50ms)
  return getWithMutex(key, fetchFn) // lock is gone; value should be in cache now
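Within a single process you can get most of the mutex's benefit with no Redis lock at all, by coalescing concurrent fetches for the same key onto one promise. A sketch (the single-flight helper and its name are mine); it complements, rather than replaces, the cross-process lock above:

```typescript
// In-process request coalescing ("single-flight"): concurrent misses for the
// same key share one in-flight promise, so each app server sends at most one
// DB query per key per expiry, regardless of how many callers miss at once.
const inFlight = new Map<string, Promise<unknown>>();

function singleFlight<T>(key: string, fetchFn: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>; // piggyback on the running fetch

  const p = fetchFn().finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}
```

With 1,000 callers spread across 10 app servers, the herd shrinks from 1,000 DB queries to at most 10 — one per server.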

Option 2: Probabilistic Early Expiration (PER) — Before the TTL fully expires, randomly refresh the cache with a probability that rises as expiry approaches. No lock contention; slightly stale data is briefly served while the refresh runs in the background.

function getWithEarlyExpiration(key, ttl, fetchFn):
  value     = cache.get(key)
  remaining = cache.ttl(key)

  if value is null → return null   // complete miss — caller handles fallback

  // Randomly trigger a background refresh before the key fully expires.
  // Probability rises as remaining TTL shrinks toward zero.
  threshold = ttl × 0.20           // begin considering refresh at 20% TTL left
  if remaining < threshold AND random() < 1.5 × (1 - remaining / threshold):
    background: cache.set(key, fetchFn(), EX=ttl)  // async — does not block caller

  return value   // always return current cached value immediately

Option 3: TTL jitter — Instead of a fixed TTL across a batch of keys, add random variance at population time. Expiry events spread out across a window instead of firing simultaneously.

const BASE_TTL = 300;
const JITTER = 60; // ± 1 minute

const ttl = BASE_TTL + Math.floor(Math.random() * JITTER * 2) - JITTER;
await cache.set(cacheKey, value, { EX: ttl }); // TTL between 240–360 seconds

TTL jitter is free to implement and eliminates the synchronized-expiry problem without any lock contention — add it by default to any batch of keys populated together.


Real-World Examples

Twitter (X) — 99% of reads served from cache

Twitter's home timeline API serves hundreds of thousands of requests per second. Their underlying storage (Manhattan, their distributed KV store) handles only writes and cache-miss fallback reads — roughly 1–2% of total read traffic. Twitter uses a two-level cache: an in-process LRU cache (L1) within each app server for the absolute hottest content, backed by a distributed Memcached cluster (L2). The L1 cache eliminates the network round-trip entirely for the most popular tweets. When a celebrity with 50M followers posts, a fanout write service pushes the tweet into the pre-cached timelines of followers — so by the time users refresh their feed, the data is already cache-warm. The lesson: at Twitter's scale, the cache is the primary read path. The database is the backup.

I bring up Twitter's fan-out-on-write design whenever an interviewer asks about social feeds — it's a concrete example of caching as an architectural decision, not just a performance optimization.

Facebook (Meta) — "Scaling Memcache at Facebook" (2013)

Facebook published this paper after running one of the largest Memcached deployments in the world — tens of thousands of nodes. Their most surprising finding: cache invalidation, not cache misses, was the primary operational challenge. At their scale, a single DB write needed to invalidate cache entries across thousands of servers simultaneously. The naive approach — write to DB, then delete from cache — had a race condition: a stale read could repopulate the cache between the write and the delete. To prevent this, they built a "lease" mechanism: on a miss, the server issues a lease token to the first requester.

If a concurrent write invalidates the key while that requester is fetching from DB, the lease is revoked and the requester must re-read from cache. Without this, Facebook's caches would silently serve stale data after every write burst.

Stack Overflow โ€” 10M pageviews/day from a handful of servers

Stack Overflow serves enormous traffic from remarkably few servers, and aggressive two-level caching is the core reason. They run an in-process L1 cache within each ASP.NET worker (MemoryCache) and a shared L2 Redis cluster. The in-process cache eliminates the Redis network round-trip for the hottest queries — top questions, user reputation, tag pages — cutting latency from milliseconds to microseconds. Stack Overflow uses an active pre-warming strategy: background jobs continuously refresh popular cached values before their TTLs expire, so for the hottest pages DB queries run in background workers rather than in user-facing request handlers. A question page that takes 40ms to render from a cold DB is served from the in-process cache in under 1ms for 99.9% of loads.

All three examples point to the same conclusion: at scale, caching is a consistency and invalidation problem far more than a performance problem.


How This Shows Up in Interviews

Here's the honest answer on what separates a good caching answer from a forgettable one: it's not knowing Redis exists โ€” it's narrating the impact in numbers. My recommendation is to always follow "I'd add a cache here" with the hit rate and its consequence on DB load. That one habit alone moves you from junior to mid-level in the interviewer's mental model.

When to bring it up proactively

Draw a cache in the first component you sketch for any read-heavy design. Within the first 5 minutes, say: "I'll add a Redis cache in front of the database here โ€” at 95% hit rate, the DB sees 20ร— less read traffic." That specific ratio, calculated from your hit rate, signals you understand what caching actually does. Don't just draw the box โ€” narrate the impact.

Don't just draw the cache โ€” defend your choices

Saying "we'd add Redis here" without explaining what you're caching, your expected hit rate, your TTL rationale, or how you handle invalidation signals a memorised pattern without real understanding. One follow-up question exposes it. State the key space, the TTL, and the invalidation strategy in the same sentence as the cache.

Depth expected at senior/staff level:

  • Name your cache hit rate and calculate the DB impact: "95% hit rate โ†’ DB sees 5% of reads โ†’ 20ร— less traffic โ€” that's the difference between needing 5 read replicas and needing none."
  • Explain the write strategy choice. Write-through for critical user data; write-behind for high-throughput counters; always-delete-never-set on explicit invalidation.
  • Know the thundering herd problem and two mitigations: mutex lock (serialize misses) or probabilistic early expiration (background refresh before TTL expires).
  • Address Redis as a SPOF: replication (Redis Sentinel) and a graceful degradation path (fall through to DB with a circuit breaker) when cache fails.
  • Distinguish eviction policy (Redis choosing what to drop when full) from invalidation strategy (you proactively evicting stale data after a write). These are separate mechanisms that work together.
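The delete-on-write discipline from these bullets looks like this in cache-aside form. A minimal sketch: a dict with expiry timestamps stands in for Redis GET/SETEX/DEL, and the names are illustrative.

```python
import time

class CacheAside:
    """Sketch of cache-aside reads with delete-on-write invalidation."""

    def __init__(self, db, ttl_seconds=60):
        self.db = db            # stand-in for the authoritative store
        self.cache = {}         # key -> (value, expires_at)
        self.ttl = ttl_seconds

    def read(self, key):
        entry = self.cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                     # cache hit
        value = self.db[key]                    # miss: read authoritative state
        self.cache[key] = (value, time.monotonic() + self.ttl)
        return value

    def write(self, key, value):
        self.db[key] = value                    # 1. write the database first
        self.cache.pop(key, None)               # 2. DEL, never SET: the next
                                                #    read re-fetches fresh state
```

Note what `write` does not do: it never sets the new value into the cache, which is exactly the always-delete-never-set rule — updating in place is what opens the concurrent-write race.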

Common follow-up questions and strong answers:

  • Interviewer: "What's your cache hit rate and how does it affect DB load?"
    Strong answer: "At 95% hit rate, the DB receives 5% of reads — roughly 20× less than without a cache. I'd alert if hit rate drops below 85%; that signals the working set changed, TTLs are misconfigured, or eviction pressure is churn-cycling hot keys."

  • Interviewer: "How do you handle cache invalidation?"
    Strong answer: "On writes: delete the key after the DB write, never update the value. Deleting forces the next read to re-fetch from the authoritative DB state and avoids the race condition where two concurrent writes leave a stale value in cache."

  • Interviewer: "What happens when Redis goes down?"
    Strong answer: "Circuit breaker on the Redis client. Fall through to DB directly on cache unavailability — accept higher latency but keep the system serving. Watch DB CPU and connection count immediately; if the DB can't absorb full traffic, activate rate limiting or queuing during Redis recovery."

  • Interviewer: "How do you prevent thundering herd?"
    Strong answer: "Mutex lock for correctness: only the first requester to observe a miss fetches from DB, others wait for the cache to populate. Or probabilistic early expiration for performance: randomly background-refresh 20–30 seconds before TTL expires so the key is never actually cold. PER has no lock contention — my default for high-traffic keys."

  • Interviewer: "How would you cache a social media feed?"
    Strong answer: "Fan-out-on-write: when a user posts, synchronously write into the cached feeds of their followers at write time. Reads are pure cache hits. The trade-off: expensive writes for users with large follower counts. For celebrity users with 50M followers, skip fan-out and let followers pull on-demand with a short TTL — the hot-path latency delta is acceptable."
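Probabilistic early expiration is commonly implemented with the XFetch formula from Vattani, Chierichetti, and Lowenstein's "Optimal Probabilistic Cache Stampede Prevention": refresh early when now − delta·beta·ln(rand()) exceeds the expiry time, where delta is the cost of the last recompute. A sketch, with the function name and cache layout invented for the example:

```python
import math
import random
import time

def per_fetch(cache, key, ttl, recompute, beta=1.0):
    """Probabilistic early expiration (XFetch-style sketch).
    cache maps key -> (value, delta, expires_at); delta is the last
    recompute cost. The closer we are to expiry (and the costlier the
    recompute), the more likely a single request refreshes early,
    so the key never goes fully cold."""
    entry = cache.get(key)
    now = time.monotonic()
    if entry is not None:
        value, delta, expires_at = entry
        rand = random.random() or 1e-16      # guard against log(0)
        # -ln(rand) is positive, so this nudges "now" forward; the nudge
        # grows near expiry, making an early refresh increasingly likely.
        if now - delta * beta * math.log(rand) < expires_at:
            return value                     # still fresh enough: serve it
    start = time.monotonic()
    value = recompute(key)                   # cache miss or early refresh
    delta = time.monotonic() - start
    cache[key] = (value, delta, now + ttl)
    return value
```

Because only the request whose random draw crosses the threshold recomputes, there is no lock and at most a handful of concurrent refreshes — versus thousands of simultaneous misses at a hard TTL cliff.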

Know these cold โ€” cache invalidation, thundering herd, and Redis SPOF handling come up in nearly every system design interview at the senior level.


Test Your Understanding


Quick Recap

  1. A cache is a fast in-memory store that intercepts reads before they hit the database โ€” cache hit rate, not server count, is what determines how hard your database actually works.
  2. In cache-aside (the default read pattern), your application checks the cache first, falls back to the database on a miss, and populates the cache on return. Write-through keeps cache and DB synchronous; write-behind allows fast writes at the cost of durability; always delete (never update) the cache key on a write to avoid race conditions.
  3. Every cache entry must have a TTL. Short TTLs maximise freshness at the cost of hit rate; long TTLs improve performance but risk serving stale data. Jitter TTL values across hot batches to prevent a Cache Avalanche when they expire simultaneously.
  4. Cache invalidation on write is always DEL key, never SET key newValue. Updating the cache value after a write introduces a race condition where two concurrent writes leave the cache holding the older value.
  5. The thundering herd fires when a popular key's TTL expires and hundreds or thousands of requests simultaneously miss โ€” use mutex locks or probabilistic early expiration to limit the DB burst to a single re-population fetch.
  6. Redis is a potential SPOF. Deploy with replication (Redis Sentinel for HA, Redis Cluster for horizontal scale) and build a graceful degradation path โ€” fall through to the database with a rate limiter when the cache is down.
  7. In every interview, state your hit rate numerically and calculate its DB impact: "95% hit rate โ†’ DB handles 5% of reads โ†’ 20ร— less traffic than without a cache" is the sentence that signals you understand what caching actually does, not just that it exists.
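The TTL jitter from point 3 is essentially one line; the 10% jitter fraction below is an illustrative default, not a standard.

```python
import random

def jittered_ttl(base_seconds, jitter_fraction=0.1):
    """Spread expiries across a batch of keys so they don't all expire at
    the same instant (cache avalanche). Cache a hot batch with, e.g.,
    300-330s TTLs instead of exactly 300s for every key."""
    return base_seconds + random.uniform(0, base_seconds * jitter_fraction)
```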

Related Concepts

  • Load balancing โ€” Load balancers route traffic across app servers; caches reduce the database traffic those app servers generate. Both are required for a horizontally scaled system โ€” neither alone is sufficient.
  • CDN โ€” A CDN is a geographically distributed cache for static and edge-cacheable content. Understanding browser cache headers and CDN invalidation is the natural extension of application-layer caching concepts.
  • Databases โ€” Caching exists to protect databases from read fan-out. Understanding database connection pooling, query cost, and replica lag helps calibrate the right TTL and hit-rate targets for any caching strategy.
  • Replication โ€” Database read replicas and Redis replication are two different answers to the same read-throughput problem. Read replicas suit complex queries; caches suit repeated simple reads. Knowing when to use each prevents over-engineering.
  • Rate limiting โ€” Redis is the canonical store for distributed rate limit counters. The atomic INCR and Lua scripting patterns from this article apply directly to building correct, race-condition-free rate limiters.
