Read replicas vs. caching
When to add a read replica vs. a cache: the access pattern that drives the choice, write-heavy invalidation problems, replica lag trade-offs, and why adding a cache to a write-heavy system doesn't help.
TL;DR
| Dimension | Choose Caching | Choose Read Replicas |
|---|---|---|
| Access pattern | Hot data, high read-to-write ratio, repeated identical reads | Complex queries, unique filter combinations, analytical workloads |
| Staleness tolerance | Seconds of staleness are acceptable (profiles, catalogs, sessions) | Need data freshness within replication lag (milliseconds to seconds) |
| Write rate | Low write rate relative to reads (invalidation rate is manageable) | Write-heavy workload where cache invalidation would destroy hit rate |
| Query complexity | Simple key-value lookups or pre-computed results | Full SQL capability needed (JOINs, GROUP BY, arbitrary filters) |
| Data coverage | Hot working set fits in memory (typically 5-20% of total data) | Need to query the full dataset, not just hot keys |
Default answer: use both, at different layers. Cache for hot, repetitive reads (sessions, profiles, product pages). Read replicas for complex queries, analytics, and everything the cache can't cover. The access pattern tells you which to deploy first.
The Framing
Your primary database is at 80% CPU. Reads are taking 200ms when they used to take 20ms. The dashboard is red. You have two cards to play: add a read replica (more database capacity) or add a cache (less database traffic).
I've watched teams pick the wrong one and waste weeks. A team added Redis in front of a write-heavy user activity table with 2,000 writes per second. Cache invalidation rate matched the write rate. Hit rate hovered at 3%. They'd added infrastructure, complexity, and a new failure mode, and the database load didn't budge.
The access pattern tells you which lever to pull. High read-to-write ratio with a clear hot set? Cache. Complex queries where every request is unique? Read replica. Write-heavy with read contention? Definitely not a cache.
Here's my rule of thumb: if you can describe your top 10 most expensive queries and they all have the same parameters across thousands of users, cache them. If each user's query is unique, a cache won't help. Add a replica.
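One way to make that rule of thumb concrete is a back-of-the-envelope model. The sketch below assumes delete-on-write invalidation, no TTL expiry or eviction, and reads and writes to a key arriving independently at steady rates. Under those assumptions a read hits only when the previous operation on that key was also a read, which happens with probability reads / (reads + writes). The function name is illustrative, not from any library.

```python
def estimated_hit_rate(reads_per_sec: float, writes_per_sec: float) -> float:
    """Rough per-key hit rate under delete-on-write invalidation.

    Assumes independent, steady (Poisson) read and write arrivals and no TTL
    expiry or eviction. A read is a hit only if the most recent operation on
    the key was a read, so hit rate = reads / (reads + writes).
    """
    total = reads_per_sec + writes_per_sec
    if total == 0:
        return 0.0
    return reads_per_sec / total


# A profile read 100 times for every update caches well (about 99%).
print(f"{estimated_hit_rate(100, 1):.0%}")
# An activity row written 10 times for every read does not (about 9%).
print(f"{estimated_hit_rate(1, 10):.0%}")
```

Real hit rates will differ because of TTLs, eviction, and bursty traffic, but the shape holds: once writes to a key rival its reads, invalidation eats the cache.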
How Each Works
Caching (Redis / Memcached)
A cache sits between your application and database. On read, check the cache first. If the key is present (cache hit), return it immediately at sub-millisecond latency. If not (cache miss), read from the database, write the result to the cache with a TTL, and return it.
The most common pattern is cache-aside (lazy loading):
    # Cache-aside pattern (pseudocode)
    def get_user_profile(user_id):
        # Step 1: Check cache
        cached = redis.get(f"user:{user_id}")
        if cached is not None:
            return deserialize(cached)  # Sub-ms response

        # Step 2: Cache miss, read from DB
        profile = db.query("SELECT * FROM users WHERE id = %s", user_id)

        # Step 3: Populate cache with TTL (3600 s = 1 hour)
        redis.setex(f"user:{user_id}", 3600, serialize(profile))
        return profile
The cache only stores data that's been requested at least once. Over time, the hot working set naturally populates the cache. A well-tuned cache-aside setup can reach 90-99% hit rates on read-heavy workloads, meaning 90-99% of reads never touch the database.
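The arithmetic behind those hit rates is worth spelling out, because the database sees the miss rate, not the hit rate. A minimal sketch (the 10,000 reads/sec figure is an illustrative assumption, not a measurement):

```python
def db_reads_per_sec(total_reads_per_sec: float, hit_rate: float) -> float:
    """Reads that still reach the database: only cache misses get through."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return total_reads_per_sec * (1.0 - hit_rate)


for hit_rate in (0.90, 0.99):
    remaining = db_reads_per_sec(10_000, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ~{remaining:.0f} reads/sec reach the database")
```

Moving from a 90% to a 99% hit rate cuts database reads by roughly 10x, which is why a few points of hit rate matter more than they look.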
The catch: cache invalidation. When the underlying data changes, the cached copy is stale. You need an invalidation strategy (delete on write, TTL expiry, or event-driven invalidation). Each has trade-offs between freshness, consistency, and complexity.
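To make the delete-on-write and TTL strategies concrete, here is a minimal in-process sketch. `TTLCache` and the `database` dict are hypothetical stand-ins for Redis and the primary database so the example runs on its own; a real deployment would issue the same get/set/delete calls against a cache client.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._entries.pop(key, None)


database = {42: {"name": "Ada"}}  # stand-in for the users table
cache = TTLCache()


def get_user_profile(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    profile = database.get(user_id)
    if profile is not None:
        # Store a copy so later DB changes do not silently alter the cached value.
        cache.set(key, dict(profile), ttl_seconds=3600)
    return profile


def update_user_name(user_id, name):
    database[user_id]["name"] = name   # 1. write to the source of truth
    cache.delete(f"user:{user_id}")    # 2. invalidate; the next read repopulates


get_user_profile(42)            # miss: loads from the database, fills the cache
update_user_name(42, "Grace")   # write deletes the cached copy
print(get_user_profile(42))     # reads the fresh row, not a stale cache entry
```

The TTL acts as a safety net: if a write path ever forgets the delete, the stale entry still ages out after an hour.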
Read Replicas
A read replica is a copy of your primary database that receives changes via asynchronous replication. Your application routes write queries to the primary and read queries to one or more replicas. Each replica is a full database instance with complete query capability.
    # Read replica routing (pseudocode)
    def get_user_profile(user_id):
        # Read from replica (full SQL capability)
        return replica_db.query("SELECT * FROM users WHERE id = %s", user_id)

    def search_users(filters):
        # Complex query that is impractical to cache:
        # every filter combination would be a different cache key
        return replica_db.query("""
            SELECT u.*, COUNT(o.id) AS order_count
            FROM users u
            LEFT JOIN orders o ON o.user_id = u.id
            WHERE u.country = %s AND u.created_at > %s
            GROUP BY u.id
            ORDER BY order_count DESC
            LIMIT 50
        """, filters.country, filters.since)

    def update_user(user_id, data):
        # Writes always go to primary
        return primary_db.query("UPDATE users SET name=%s WHERE id=%s", data.name, user_id)
The key advantage: replicas support arbitrary SQL queries. No pre-defined keys, no invalidation logic, no serialization. Every query your primary can run, the replica can run too.
The cost: replication lag. Async replication means the replica is always slightly behind the primary (typically milliseconds within the same region, seconds under heavy load or cross-region). A user might write data and immediately read stale results from a replica that hasn't received the write yet.
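One common mitigation for that read-your-own-writes problem is to pin a user's reads to the primary for a short window after they write. The sketch below is one way to express it; `ReadRouter`, the two-second window, and the string return values are assumptions for illustration, and the window should exceed the replication lag you actually observe.

```python
import time


class ReadRouter:
    """Route a user's reads to the primary briefly after that user writes."""

    def __init__(self, pin_seconds=2.0, clock=time.monotonic):
        self._pin_seconds = pin_seconds
        self._clock = clock
        self._last_write_at = {}  # user_id -> time of most recent write

    def record_write(self, user_id):
        self._last_write_at[user_id] = self._clock()

    def target_for_read(self, user_id):
        last_write = self._last_write_at.get(user_id)
        if last_write is not None and self._clock() - last_write < self._pin_seconds:
            return "primary"  # the replica may not have this user's write yet
        return "replica"


# Usage with a fake clock so the behaviour is deterministic.
now = [100.0]
router = ReadRouter(pin_seconds=2.0, clock=lambda: now[0])

print(router.target_for_read("u1"))  # no recent write: replica
router.record_write("u1")
print(router.target_for_read("u1"))  # just wrote: primary
now[0] += 5.0
print(router.target_for_read("u1"))  # window elapsed: replica
```

Only the writing user pays the cost of hitting the primary; everyone else keeps reading from replicas. In a multi-process application the last-write timestamp would need to live somewhere shared, such as the user's session.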
Cache Invalidation Patterns
The hardest part of caching isn't adding a cache. It's deciding how and when to update it. There are four patterns, each with a different consistency/complexity trade-off.
Cache-aside (lazy loading): The application manages the cache explicitly. On read, check cache first. On miss, read from DB and populate cache. On write, update DB and delete the cache key. This is the most common pattern because it's simple and the cache only stores data that's actually been requested.
Write-through: On every write, update both the DB and the cache synchronously. The cache is always fresh, but every write pays the latency of two operations (DB write + cache write). Good for data that's read immediately after writing (user profiles, settings).
Write-behind (write-back): On write, update the cache immediately and asynchronously flush to the DB. Writes are fast (cache-speed), but you risk data loss if the cache fails before the async flush completes. I've seen this used for analytics counters where losing a few seconds of data is acceptable.
Event-driven invalidation: The database publishes change events (via CDC, triggers, or binlog streaming), and a consumer invalidates or updates cache entries. This decouples the write path from cache management entirely. More complex to set up, but eliminates the "forgot to invalidate" class of bugs.
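The write paths of write-through, write-behind, and event-driven invalidation can be contrasted in a few lines. This is a sketch that uses plain dicts as hypothetical stand-ins for the cache and database, and a `deque` standing in for an async flush queue or change stream; it shows ordering and failure windows, not production code.

```python
from collections import deque

db = {}
cache = {}
pending_flush = deque()   # write-behind: writes not yet persisted
change_events = deque()   # event-driven: stand-in for a CDC/binlog stream


def write_through(key, value):
    """Update DB and cache in the same request, so reads see the new value."""
    db[key] = value
    cache[key] = value


def write_behind(key, value):
    """Update the cache now and queue the DB write for later."""
    cache[key] = value
    pending_flush.append((key, value))  # lost if the cache dies before the flush


def flush_pending():
    """Background job: persist queued writes in order."""
    while pending_flush:
        key, value = pending_flush.popleft()
        db[key] = value


def write_with_events(key, value):
    """The write path touches only the DB; the change stream handles the cache."""
    db[key] = value
    change_events.append(key)  # in practice emitted by the database, not app code


def consume_change_events():
    """Consumer: invalidate cache entries named in the change stream."""
    while change_events:
        cache.pop(change_events.popleft(), None)


write_through("settings:1", "dark")
write_behind("counter:1", 10)
print("counter:1" in db)     # False until the flush runs
flush_pending()
print(db["counter:1"])       # 10

cache["user:1"] = "old"
write_with_events("user:1", "new")
consume_change_events()
print("user:1" in cache)     # False: the next read repopulates from the DB
```

The gap between `write_behind` and `flush_pending` is the data-loss window, and the gap between `write_with_events` and `consume_change_events` is the staleness window.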