# Read replicas vs. caching
When to add a read replica vs. a cache: the access pattern that drives the choice, write-heavy invalidation problems, replica lag trade-offs, and why adding a cache to a write-heavy system doesn't help.
## TL;DR
| Dimension | Choose Caching | Choose Read Replicas |
|---|---|---|
| Access pattern | Hot data, high read-to-write ratio, repeated identical reads | Complex queries, unique filter combinations, analytical workloads |
| Staleness tolerance | Seconds of staleness are acceptable (profiles, catalogs, sessions) | Need data freshness within replication lag (milliseconds to seconds) |
| Write rate | Low write rate relative to reads (invalidation rate is manageable) | Write-heavy workload where cache invalidation would destroy hit rate |
| Query complexity | Simple key-value lookups or pre-computed results | Full SQL capability needed (JOINs, GROUP BY, arbitrary filters) |
| Data coverage | Hot working set fits in memory (typically 5-20% of total data) | Need to query the full dataset, not just hot keys |
Default answer: use both, at different layers. Cache for hot, repetitive reads (sessions, profiles, product pages). Read replicas for complex queries, analytics, and everything the cache can't cover. The access pattern tells you which to deploy first.
## The Framing
Your primary database is at 80% CPU. Reads are taking 200ms when they used to take 20ms. The dashboard is red. You have two cards to play: add a read replica (more database capacity) or add a cache (less database traffic).
I've watched teams pick the wrong one and waste weeks. A team added Redis in front of a write-heavy user activity table with 2,000 writes per second. Cache invalidation rate matched the write rate. Hit rate hovered at 3%. They'd added infrastructure, complexity, and a new failure mode, and the database load didn't budge.
The access pattern tells you which lever to pull. High read-to-write ratio with a clear hot set? Cache. Complex queries where every request is unique? Read replica. Write-heavy with read contention? Definitely not a cache.
Here's my rule of thumb: if you can describe your top 10 most expensive queries and they all have the same parameters across thousands of users, cache them. If each user's query is unique, a cache won't help. Add a replica.
## How Each Works
### Caching (Redis / Memcached)
A cache sits between your application and database. On read, check the cache first. If the key is present (cache hit), return it immediately at sub-millisecond latency. If not (cache miss), read from the database, write the result to the cache with a TTL, and return it.
The most common pattern is cache-aside (lazy loading):
```python
# Cache-aside pattern (pseudocode)
def get_user_profile(user_id):
    # Step 1: Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return deserialize(cached)  # Sub-ms response

    # Step 2: Cache miss, read from DB
    profile = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # Step 3: Populate cache with a 1-hour TTL
    redis.setex(f"user:{user_id}", 3600, serialize(profile))
    return profile
```
The cache only stores data that's been requested at least once. Over time, the hot working set naturally populates the cache. A well-tuned cache-aside setup achieves 90-99% hit rates on read-heavy workloads, meaning 90-99% of reads never touch the database.
The catch: cache invalidation. When the underlying data changes, the cached copy is stale. You need an invalidation strategy (delete on write, TTL expiry, or event-driven invalidation). Each has trade-offs between freshness, consistency, and complexity.
### Read Replicas
A read replica is a copy of your primary database that receives changes via asynchronous replication. Your application routes write queries to the primary and read queries to one or more replicas. Each replica is a full database instance with complete query capability.
```python
# Read replica routing (pseudocode)
def get_user_profile(user_id):
    # Reads go to a replica (full SQL capability)
    return replica_db.query("SELECT * FROM users WHERE id = %s", user_id)

def search_users(filters):
    # Complex query that would be impractical to cache
    return replica_db.query("""
        SELECT u.*, COUNT(o.id) AS order_count
        FROM users u
        LEFT JOIN orders o ON o.user_id = u.id
        WHERE u.country = %s AND u.created_at > %s
        GROUP BY u.id
        ORDER BY order_count DESC
        LIMIT 50
    """, filters.country, filters.since)

def update_user(user_id, data):
    # Writes always go to the primary
    return primary_db.query("UPDATE users SET name = %s WHERE id = %s",
                            data.name, user_id)
```
The key advantage: replicas support arbitrary SQL queries. No pre-defined keys, no invalidation logic, no serialization. Every query your primary can run, the replica can run too.
The cost: replication lag. Async replication means the replica is always slightly behind the primary (typically milliseconds within a region, seconds under heavy load or cross-region). A user might write data and immediately read stale results from a replica that hasn't received the write yet.
## Cache Invalidation Patterns
The hardest part of caching isn't adding a cache. It's deciding how and when to update it. There are four patterns, each with a different consistency/complexity trade-off.
Cache-aside (lazy loading): The application manages the cache explicitly. On read, check cache first. On miss, read from DB and populate cache. On write, update DB and delete the cache key. This is the most common pattern because it's simple and the cache only stores data that's actually been requested.
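The write side of cache-aside pairs with the read path shown earlier: update the database, then delete the cache key. Deleting rather than updating the cached value avoids racing with concurrent readers. A minimal sketch with in-memory stand-ins (`FakeCache` and `FakeDB` are illustrative, not a real Redis or SQL client):

```python
# Cache-aside: delete-on-write invalidation with in-memory stand-ins.
class FakeCache:
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value):
        self.store[key] = value
    def delete(self, key):
        self.store.pop(key, None)

class FakeDB:
    def __init__(self):
        self.rows = {}

def get_user(cache, db, user_id):
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return cached                       # cache hit
    row = db.rows.get(user_id)              # cache miss: read from DB
    cache.set(f"user:{user_id}", row)       # repopulate for the next reader
    return row

def update_user(cache, db, user_id, data):
    db.rows[user_id] = data                 # 1. write to the DB first
    cache.delete(f"user:{user_id}")         # 2. then invalidate the key

cache, db = FakeCache(), FakeDB()
db.rows[1] = {"name": "Ada"}
assert get_user(cache, db, 1) == {"name": "Ada"}    # miss, then populated
update_user(cache, db, 1, {"name": "Grace"})
assert cache.get("user:1") is None                  # key invalidated on write
assert get_user(cache, db, 1) == {"name": "Grace"}  # fresh read repopulates
```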
Write-through: On every write, update both the DB and the cache synchronously. The cache is always fresh, but every write pays the latency of two operations (DB write + cache write). Good for data that's read immediately after writing (user profiles, settings).
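Write-through can be sketched the same way: both stores are updated in the same synchronous call, so a read immediately after a write always sees fresh data (the dicts below stand in for the DB and cache):

```python
# Write-through: every write updates the DB and the cache together,
# paying two operations' latency in exchange for an always-fresh cache.
db, cache = {}, {}

def write_through(user_id, profile):
    db[user_id] = profile                  # synchronous DB write
    cache[f"user:{user_id}"] = profile     # synchronous cache write

def read(user_id):
    return cache.get(f"user:{user_id}")    # always fresh after a write

write_through(1, {"name": "Ada"})
assert read(1) == {"name": "Ada"}          # read-after-write is consistent
```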
Write-behind (write-back): On write, update the cache immediately and asynchronously flush to the DB. Writes are fast (cache-speed), but you risk data loss if the cache fails before the async flush completes. I've seen this used for analytics counters where losing a few seconds of data is acceptable.
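A write-behind sketch makes the risk concrete: the write lands in the cache immediately and sits in a queue until an async flush; if the process dies before `flush()` runs, the queued writes are lost (all names here are illustrative):

```python
# Write-behind (write-back): cache-speed writes, deferred DB persistence.
from collections import deque

db, cache, pending = {}, {}, deque()

def write_behind(key, value):
    cache[key] = value                # fast, cache-speed write
    pending.append((key, value))      # queued for a later async flush

def flush():
    while pending:                    # in production this runs in the background
        key, value = pending.popleft()
        db[key] = value               # batched DB write

write_behind("pageviews:home", 42)
assert cache["pageviews:home"] == 42 and "pageviews:home" not in db
flush()                               # until this runs, the write is at risk
assert db["pageviews:home"] == 42
```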
Event-driven invalidation: The database publishes change events (via CDC, triggers, or binlog streaming), and a consumer invalidates or updates cache entries. This decouples the write path from cache management entirely. More complex to set up, but eliminates the "forgot to invalidate" class of bugs.
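The consumer side of event-driven invalidation can be sketched as a handler for row-level change events; the event shape below is an illustrative assumption, not the format of any particular CDC tool:

```python
# Event-driven invalidation: a CDC consumer deletes affected cache keys,
# so the application's write path never touches the cache at all.
cache = {"user:1": {"name": "Ada"}, "user:2": {"name": "Grace"}}

def on_change_event(event):
    # event mirrors a row-level change: {"table": ..., "pk": ...}
    if event["table"] == "users":
        cache.pop(f"user:{event['pk']}", None)   # idempotent delete

on_change_event({"table": "users", "pk": 1})
assert "user:1" not in cache           # changed row evicted
assert "user:2" in cache               # unrelated rows untouched
```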
For your interview: cache-aside is the safe default. Mention write-through when the interviewer cares about read-after-write consistency. Mention event-driven when the system already has a CDC pipeline (Debezium, DynamoDB Streams). Write-behind is niche and risky; only bring it up for write-heavy counters or metrics.
## Head-to-Head Comparison
| Dimension | Caching (Redis) | Read Replicas | Verdict |
|---|---|---|---|
| Read latency | Sub-millisecond (0.1-0.5ms) | Same as primary (5-50ms for typical queries) | Cache, dramatically faster |
| Query flexibility | Key-value lookups only (pre-defined keys) | Full SQL (JOINs, GROUP BY, arbitrary filters) | Replica, full query power |
| Data freshness | Stale until TTL expires or explicit invalidation | Stale by replication lag (ms to seconds) | Replica (fresher by default) |
| Write-heavy tolerance | Poor (invalidation rate = write rate, hit rate collapses) | Good (replicas handle writes via replication stream) | Replica for write-heavy |
| Operational complexity | Cache invalidation logic, thundering herd, cold start | Replication lag monitoring, read-your-own-writes routing | Comparable, different failure modes |
| Cost at scale | Memory cost (RAM-only, expensive per GB) | Full DB instance cost (disk + compute) | Cache cheaper per read, replica cheaper per GB |
| Data coverage | Hot working set only (typically 5-20% of data) | Full dataset | Replica for full coverage |
| Failure mode | Cache miss hits DB (graceful degradation) | Replica lag or stale reads (data correctness risk) | Cache fails safer |
The fundamental tension is speed vs. flexibility. A cache gives you sub-millisecond reads but only for pre-defined keys with acceptable staleness. A read replica gives you full query capability with near-real-time data but no performance leap over the primary.
## When Caching Wins
My recommendation: reach for a cache when you have a clear hot set with a high read-to-write ratio.
Product catalogs and configuration data. A product page is read thousands of times per write. Cache the rendered product object with a 5-minute TTL. Even a 90% hit rate eliminates 90% of your product-read DB load. I've seen this single optimization drop database CPU from 80% to 15%.
Session data. Session state is read on every authenticated request and written infrequently (login, cart update). Redis is the canonical session store: in-memory, sub-millisecond, optional persistence for surviving restarts.
Expensive computed results. Consider an analytics dashboard that runs a 500ms aggregation query and is viewed by 200 users per minute. Without a cache, that's 200 identical 500ms queries per minute. With a 60-second TTL cache, one query per minute serves all 200 users, a 200x reduction in query load.
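The dashboard math above can be checked with a toy simulation; the query and values are made up, and a fake clock stands in for wall time:

```python
# One 60-second TTL entry absorbs 200 identical dashboard requests/minute.
db_queries = 0
cached_at, cached_value, TTL = None, None, 60

def run_aggregation():
    global db_queries
    db_queries += 1
    return {"total_orders": 12345}     # stands in for the 500 ms query

def get_dashboard(now):
    global cached_at, cached_value
    if cached_at is None or now - cached_at >= TTL:
        cached_value, cached_at = run_aggregation(), now   # refresh on expiry
    return cached_value

# 200 users hit the dashboard spread evenly over one minute
for i in range(200):
    get_dashboard(now=i * 60 / 200)
assert db_queries == 1                 # one DB query served all 200 requests
```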
API response caching. External API calls with rate limits or high latency. Cache the response with a TTL matching the data's freshness requirements. This is especially valuable for third-party APIs where you're paying per call.
### Interview tip: state the hit rate
When proposing a cache in a system design interview, always state the expected hit rate and explain why. "Product pages have a 95%+ hit rate because 80% of traffic hits the top 5% of products" is a strong answer. "I'll add a cache" without hit rate analysis is weak.
## When Read Replicas Win
Choose replicas when queries can't be cached effectively or when writes dominate the workload.
Complex filtered queries with unique parameters. A marketplace search with filters for price range, category, location, seller rating, and sort order. Every user's combination is different. Cache hit rate would approach 0% because nearly every query is unique. A read replica handles the full SQL query without any caching infrastructure.
Analytical queries on fresh data. Business dashboards, real-time reporting, aggregations that must reflect data from the last few seconds. A cache with a short TTL (say, 10 seconds) would still miss on the first request after expiry, and that miss sends the aggregation to your primary during peak load. A dedicated analytics replica isolates these expensive queries from your OLTP primary.
Write-heavy workloads with read contention. I'll be blunt: if your primary is dying from writes, a read cache doesn't help. The cache invalidation rate matches the write rate, and the hit rate drops toward zero. A read replica offloads the read queries to a separate compute instance while the primary handles writes.
Geographic distribution. Read replicas in multiple regions reduce latency for global users. A user in Tokyo reads from a Tokyo replica instead of crossing the Pacific to a US primary. Caching could also help here, but replicas provide full query capability at each region.
### Gotcha: caching a write-heavy table
If your cache invalidation rate matches your write rate, the cache is overhead with no benefit. Every write invalidates the cache, every subsequent read misses, hits the database, and re-populates a value that will be invalidated again within milliseconds. You've added latency (cache lookup + miss + DB query + cache write) while gaining nothing. Fix the write path or add read replicas.
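The hit-rate collapse is easy to demonstrate with a toy simulation (the key space, op counts, and write fractions are arbitrary; only the shape of the result matters):

```python
# Simulate delete-on-write caching at different write fractions.
import random

def simulate(write_fraction, ops=20_000, keys=100, seed=0):
    rng = random.Random(seed)
    cache, hits, reads = set(), 0, 0
    for _ in range(ops):
        key = rng.randrange(keys)
        if rng.random() < write_fraction:
            cache.discard(key)        # every write invalidates the key
        else:
            reads += 1
            if key in cache:
                hits += 1
            else:
                cache.add(key)        # miss: fall through to DB, repopulate
    return hits / reads

assert simulate(0.01) > 0.9   # read-heavy: the cache pays off
assert simulate(0.90) < 0.2   # write-heavy: hit rate collapses
```

With 90% writes, nearly every read finds its key freshly invalidated, which is exactly the 3%-hit-rate failure mode from the user activity table story above.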
## The Nuance: Use Both
Here's the honest answer: most production systems at scale use caching and read replicas together, serving different layers of the read path.
The cache handles the narrow, hot, repetitive reads: session data, user profiles, product pages, configuration. These have high hit rates (90-99%) and sub-millisecond latency. The replicas handle everything else: complex searches, analytics, reports, and any query that the cache doesn't cover or can't efficiently cache.
Typical layering:
| Read type | Solution | Why |
|---|---|---|
| Session data | Redis cache, 1-hour TTL | Read on every request, rarely changes, must be fast |
| User profiles | Redis cache, 5-minute TTL | High hit rate, acceptable staleness |
| Product search | Read replica, no cache | Every query has unique filters, cache can't help |
| Analytics dashboards | Read replica, optional 60-second cache | Expensive queries, dedicated replica isolates OLTP |
| Full-text search | Elasticsearch (specialized replica) | Neither Redis nor SQL replicas handle this well |
The cache miss path should fall through to a read replica, not the primary. This protects the primary from both cache misses and complex queries. My recommendation: always route cache misses to replicas, not to the primary, once you have both in place.
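A minimal sketch of that tiered read path, with dicts and a counter standing in for Redis, a replica, and the primary:

```python
# Tiered read path: a cache miss falls through to a replica, never the primary.
cache = {}
replica = {"user:1": "Ada"}
reads = {"cache": 0, "replica": 0, "primary": 0}

def read(key):
    if key in cache:
        reads["cache"] += 1
        return cache[key]             # hot path: sub-ms
    reads["replica"] += 1             # miss path: replica absorbs the load
    value = replica.get(key)
    cache[key] = value                # repopulate for the next reader
    return value

assert read("user:1") == "Ada"        # miss -> served by replica
assert read("user:1") == "Ada"        # hit -> served by cache
assert reads == {"cache": 1, "replica": 1, "primary": 0}
```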
There's also the replication lag mitigation problem. When a user writes data and immediately reads it back, they might hit a replica that hasn't received the write yet. The standard fix is read-your-own-writes routing: for a configurable window after a write (say, 5 seconds), route that user's reads to the primary. After the window, fall back to replicas.
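Read-your-own-writes routing reduces to a per-user timestamp check; a sketch, assuming a 5-second window chosen to exceed observed replica lag (the function and variable names are illustrative):

```python
# Read-your-own-writes routing: pin a user's reads to the primary
# for WINDOW seconds after their last write, then fall back to replicas.
import time

WINDOW = 5.0                          # should exceed max observed replica lag
last_write_at = {}                    # user_id -> timestamp of last write

def record_write(user_id, now=None):
    last_write_at[user_id] = time.time() if now is None else now

def choose_backend(user_id, now=None):
    now = time.time() if now is None else now
    if now - last_write_at.get(user_id, float("-inf")) < WINDOW:
        return "primary"              # inside the window: guaranteed fresh
    return "replica"                  # outside it: replica lag is acceptable

record_write("u1", now=100.0)
assert choose_backend("u1", now=102.0) == "primary"   # within the 5 s window
assert choose_backend("u1", now=106.0) == "replica"   # window elapsed
assert choose_backend("u2", now=100.0) == "replica"   # never wrote
```

In production the timestamp map lives somewhere shared (e.g. in the cache itself, or a session cookie carrying the write time) so every app server makes the same routing decision.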
## Real-World Examples
Netflix: Uses EVCache (a Memcached-based distributed caching layer built in-house) for high-frequency personalization data, serving billions of requests per day with a 99.9%+ hit rate. EVCache caches user profiles, viewing history summaries, and personalization scores that power the recommendation engine. Each cache hit saves a cross-service call that would otherwise take 5-20ms. For complex recommendation queries that join user history with content metadata across multiple data sources, Netflix routes to dedicated read replicas (Cassandra and MySQL) rather than trying to cache the infinite parameter combinations. The key insight from Netflix's architecture: the cache handles the narrow, hot, predictable reads; replicas handle the wide, unpredictable, analytical queries.
GitHub: Uses Memcached for rendered Markdown (the most expensive read operation on the platform), repository metadata, and user session data. A repository README that takes 200ms to render from Markdown is cached and served in under 1ms on subsequent views. The cache hit rate on rendered Markdown exceeds 98% because most repository pages are viewed far more often than they're edited. For search queries, activity feeds, and contributor graphs that involve complex joins across repositories, users, and events, GitHub routes to MySQL read replicas. Their architecture explicitly separates the hot path (cached page views) from the long tail (search and analytics on replicas).
Shopify: Uses Redis for session storage, storefront caching (product pages, cart data), and rate limiting. During flash sales (which can see 100x normal traffic in seconds), Redis absorbs 99% of product page reads, keeping MySQL primaries focused on order writes. MySQL read replicas handle order history, admin dashboard analytics, and merchant reporting queries. Shopify's architecture routes cache misses to replicas rather than the primary, a pattern that saved them during multiple Black Friday events. Their published numbers show that without the cache layer, flash sale traffic would require 50x more database read capacity.
## How This Shows Up in Interviews
This trade-off appears when a system design interview involves scaling reads. The interviewer will often ask "how would you handle increased read load?" and expects you to reason about which tool fits the access pattern.
What they're testing: Whether you pick the right tool for the access pattern or just default to "add a cache." They want to see you think about hit rates, invalidation, and when a cache doesn't help.
Depth expected at senior level:
- Know cache-aside, write-through, and write-behind patterns and when each applies
- Understand why write-heavy workloads kill cache hit rates
- Explain replication lag and the read-your-own-writes mitigation
- Quantify expected hit rates based on access patterns
- Know when to use both together and how they layer
| Interviewer asks | Strong answer |
|---|---|
| "Your DB is overloaded from reads. What do you do?" | "First I'd profile the queries. If they're repetitive key-value lookups (product pages, sessions), I'd add Redis with cache-aside. If they're complex filtered searches, I'd add read replicas. Usually you need both at different layers." |
| "Why not just cache everything?" | "Cache hit rate depends on the read-to-write ratio and query uniqueness. For a search endpoint where every query has different parameters, hit rate approaches 0%. The cache adds overhead without reducing DB load. Read replicas handle unique queries without hit-rate concerns." |
| "How do you handle stale reads after a write?" | "Read-your-own-writes routing. After a user writes, route their reads to the primary for 5 seconds (configurable based on max replication lag). After the window, fall back to replicas. This gives consistency where it matters most without overloading the primary." |
| "What if your cache goes down?" | "Cache-aside degrades gracefully. All reads fall through to the database (or read replicas). Latency increases from sub-ms to 20ms+, and the database takes more load, but the system keeps working. I'd design for cache failure by ensuring the database can handle the miss rate during recovery." |
## Quick Recap
- Use caching for high read-to-write ratio workloads where a small hot working set generates most reads. A 95%+ cache hit rate on sessions and profile data can eliminate most database read load entirely.
- Use read replicas for queries that can't be cached effectively: complex filtered searches, analytical queries, and any workload where each query is unique. Replicas add query capacity without invalidation problems.
- Caching doesn't help write-heavy workloads. If your cache invalidation rate matches your write rate, the hit rate approaches zero and the cache adds overhead without benefit. Fix the write path or add replica capacity.
- Replication lag is unavoidable with async replicas. Read-your-own-writes routing (send a user's reads to the primary for N seconds after they write) is the standard mitigation.
- Production systems use both: Redis cache for hot, repetitive data with high TTLs; read replicas for complex or freshness-sensitive queries. Route cache misses to replicas, not the primary.
- In interviews, always state the expected hit rate when proposing a cache, and explain why a cache won't help when the access pattern doesn't support it.
## Related Trade-offs
- Caching for cache-aside, write-through, and write-behind patterns in depth
- Replication for leader-follower replication mechanics and consistency guarantees
- Read vs. write optimization for the broader strategy of optimizing the read path vs. the write path
- Redis vs. Memcached for choosing between cache implementations once you've decided to cache
- Strong vs. eventual consistency for the consistency implications of both replicas and caches