Strong vs. eventual consistency
The tradeoff between data freshness guarantees: when you need every read to see the latest write, when stale reads are acceptable, and the techniques that let you choose per operation.
TL;DR
| Dimension | Choose Strong Consistency | Choose Eventual Consistency |
|---|---|---|
| Data correctness | Read informs a write that would be wrong if stale (balance checks, inventory, locks) | Stale reads are harmless or invisible to users (profiles, feeds, catalogs) |
| Latency tolerance | Can absorb 5-50ms quorum overhead per operation | Need lowest possible read latency, route to nearest replica |
| Availability under partition | Prefer correctness over availability (CP in CAP) | Prefer availability over freshness (AP in CAP) |
| Cost | Willing to pay 2x read cost (DynamoDB) or single-writer bottleneck | Need cheap reads at scale, replicas everywhere |
| Complexity | Understand quorum math, consensus protocols, 2PC | Understand convergence windows, conflict resolution, anti-entropy |
Default answer: eventual consistency for reads, strong consistency for writes that guard correctness. Most production systems choose per-operation, not globally. The question is never "which one" but "which operations need which guarantee."
The Framing
A team I worked with had a flash-sale system that sold 50 units of a limited item. They ran three read replicas with async replication, and the inventory check read from the nearest replica: "47 remaining, allow purchase." The problem: two replicas were 200ms behind the primary. In that 200ms window, all 50 units sold, but the lagging replicas kept reporting stock available. They oversold by 23 units.
The fix wasn't "make everything strongly consistent." That would have tripled their read latency across the entire catalog. The fix was routing inventory-check reads to the primary (strong consistency for that specific operation) while keeping product descriptions, images, and reviews on eventual-consistency replicas.
This is the real tradeoff. Strong consistency guarantees every read sees the latest write, but it costs latency, throughput, and availability. Eventual consistency gives you speed and resilience, but stale reads can cause real damage when they inform a mutation. The skill is picking the right model per operation.
How Each Works
Strong Consistency: Quorum Writes and Reads
Strong consistency (specifically linearizability) means every read returns the value of the most recent completed write. No stale data, no anomalies. The system behaves as if there's a single copy of the data, even though it's replicated across nodes.
The mechanism is quorum overlap. In a system with N replicas, you write to W nodes and read from R nodes. If R + W > N, at least one node in every read set has the latest write. Classic configuration: N=3, W=2, R=2.
```python
import time

# Quorum write: wait for majority acknowledgment
def quorum_write(key, value, replicas, W=2):
    acks = 0
    for replica in replicas:              # N = 3 nodes
        success = replica.write(key, value, timestamp=time.time())
        if success:
            acks += 1
        if acks >= W:                     # W = 2: majority confirmed
            return "COMMITTED"
    return "FAILED"                       # Didn't reach quorum

# Quorum read: read from majority, return freshest
def quorum_read(key, replicas, R=2):
    responses = []
    for replica in replicas:
        val = replica.read(key)
        if val:
            responses.append(val)
        if len(responses) >= R:           # R = 2: got majority
            break
    # Return the value with the highest timestamp
    return max(responses, key=lambda r: r.timestamp)
```
The cost is clear: every write waits for W acknowledgments (adds ~5-15ms in a single datacenter, ~50-200ms cross-region). Every read contacts R nodes instead of one. Throughput is bounded by the slowest node in the quorum. And if fewer than W nodes are reachable, writes fail entirely.
Consensus protocols like Raft and Paxos formalize this. A leader accepts writes, replicates to a majority, then commits. Reads go through the leader (or use a lease mechanism) to guarantee freshness. Google Spanner takes it further with TrueTime, using GPS and atomic clocks to assign globally ordered timestamps.
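The leader-based commit path can be sketched in a few lines. This is a deliberately simplified illustration, not real Raft (no terms, no log matching, no retries); the `followers` are hypothetical objects with an `append` method:

```python
def leader_commit(entry, log, followers):
    """Leader appends locally, replicates, commits once a majority acks."""
    log.append(entry)
    acks = 1                          # the leader's own copy counts
    for follower in followers:
        if follower.append(entry):
            acks += 1
    majority = (1 + len(followers)) // 2 + 1
    if acks >= majority:
        return "COMMITTED"            # safe: a majority holds the entry
    return "UNCOMMITTED"              # may be lost; retry or step down
```

The key property is the same quorum overlap as before: any future leader must be elected by a majority, and any majority intersects the majority that acknowledged the committed entry.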
Eventual Consistency: Converge Later
Eventual consistency makes a weaker promise: if no new writes occur, all replicas will eventually converge to the same value. The convergence window (the time between a write and all replicas reflecting it) is typically milliseconds to seconds, but under load or partition it can stretch to minutes.
The mechanism is background anti-entropy. Writes go to one or a few nodes and propagate asynchronously to the rest. Reads from any node are fast (single-node latency) but might return stale data.
```python
import time

# Eventual write: acknowledge immediately from one node
def eventual_write(key, value, coordinator):
    coordinator.write(key, value, timestamp=time.time())
    # Return success immediately; background gossip
    # propagates the write to the other nodes
    return "ACKNOWLEDGED"

# Repair on read: detect and fix stale replicas
def read_repair(key, replicas):
    # Read from one node (fast, possibly stale)
    primary_val = replicas[0].read(key)
    # Check the others and repair any that are stale
    # (shown synchronously; real systems do this in the background)
    for replica in replicas[1:]:
        val = replica.read(key)
        if val.timestamp < primary_val.timestamp:
            # Read repair: overwrite the stale copy
            replica.write(key, primary_val.value,
                          timestamp=primary_val.timestamp)
    return primary_val
```
Convergence relies on several mechanisms: gossip protocols spread updates between nodes, read repair fixes stale replicas when they're accessed, and anti-entropy processes (like Cassandra's Merkle-tree-based repair) periodically scan for and fix inconsistencies.
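A toy anti-entropy pass over two replicas shows the convergence step. Each replica here is just a dict of key → (value, timestamp) and conflicts resolve by last-writer-wins; real systems like Cassandra first hash key ranges into Merkle trees so they only compare the ranges that actually differ:

```python
def anti_entropy(replica_a, replica_b):
    """Converge two replicas by copying the newer (value, timestamp) pair."""
    for key in set(replica_a) | set(replica_b):
        a = replica_a.get(key)
        b = replica_b.get(key)
        if a == b:
            continue                      # already in sync
        # Last-writer-wins: a missing entry loses to any present entry
        if a is None or (b is not None and b[1] > a[1]):
            replica_a[key] = b            # b is fresher
        else:
            replica_b[key] = a            # a is fresher
```

After one pass the replicas are identical; running it periodically bounds how long divergence can persist.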
The tradeoff: reads are fast (one node, no quorum wait), writes are fast (acknowledge locally), availability is high (any reachable node can serve requests). But the system can return stale data during the convergence window.
Head-to-Head Comparison
| Dimension | Strong Consistency | Eventual Consistency | Verdict |
|---|---|---|---|
| Read latency | Quorum read from R nodes, +5-15ms single-DC | Single node, sub-ms from local replica | Eventual, significantly |
| Write latency | Wait for W acks, +5-50ms depending on replication | Ack from one node, propagate async | Eventual, significantly |
| Availability under partition | Refuses reads/writes if quorum unreachable (CP) | Serves from any reachable node (AP) | Eventual |
| Data freshness | Always current, no stale reads | Convergence window: ms to seconds (minutes under stress) | Strong |
| Correctness for mutations | Read-then-write patterns are safe | Read-then-write can oversell, double-book, or corrupt | Strong |
| Cost | DynamoDB: 2x RCU. PostgreSQL: all reads hit primary | Cheap reads at any replica, scale-out friendly | Eventual |
| Throughput ceiling | Bounded by quorum coordination | Scales linearly with replica count | Eventual |
| Complexity | Quorum math, consensus, leader election, fencing | Conflict resolution, read repair, anti-entropy, gossip | Similar |
| CAP theorem position | CP: consistent + partition-tolerant, sacrifices availability | AP: available + partition-tolerant, sacrifices consistency | Depends on requirements |
The fundamental tension: strong consistency caps throughput and increases latency because every operation coordinates across nodes. Eventual consistency scales better but can serve stale data. I've seen teams pick strong consistency globally because "correctness matters" and then wonder why reads take 50ms instead of 2ms. The right answer is almost always per-operation.
When Strong Consistency Wins
Strong consistency is the right choice when stale data causes incorrect behavior, not just stale display.
Balance checks before debits. If you read $100 from a stale replica but the real balance is $0, the debit proceeds and creates a negative balance. Financial operations need linearizable reads. DynamoDB offers strongly consistent reads at 2x the RCU cost. PostgreSQL routes these reads to the primary.
Inventory reservations. The flash-sale overselling scenario. If the inventory check reads a stale replica, you sell items you don't have. Route inventory reads to the primary or use a quorum read. My rule: any read that guards a "can I do this?" check before a write needs strong consistency.
Distributed locks and leader election. A lock manager that reports "lock not held" from a stale replica gives two processes the lock simultaneously. Linearizability (the strongest form of strong consistency) is required here. Etcd and ZooKeeper provide this.
Unique constraint enforcement. Username registration: if two nodes both think "alice" is available because replication hasn't converged, you get two accounts with the same username. Route uniqueness checks through a single authority.
Configuration and feature flag reads. When a feature flag change means "turn off billing" or "enable maintenance mode," all nodes must see the change immediately. A stale read could keep processing payments after they should have stopped.
When Eventual Consistency Wins
Eventual consistency is the right choice when speed and availability matter more than perfect freshness.
Product catalogs and search results. A product added 3 seconds ago not appearing in search results is invisible to users. Routing catalog reads to the nearest replica gives sub-millisecond latency instead of 50ms cross-region quorum reads. At Amazon's scale, that latency difference is worth millions.
User profiles and social feeds. If a profile picture update takes 2 seconds to propagate, nobody notices. If the timeline shows a post 500ms late, nobody cares. Eventual consistency lets you serve these reads from the closest datacenter to the user, cutting latency by 10-100x for geographically distributed users.
Analytics and metrics. Dashboard numbers being 5 seconds stale is totally fine. A counter showing 1,247,385 instead of 1,247,392 doesn't change any decision. Eventual reads let analytics queries run on replicas without loading the primary.
Session data (with caveats). Most session reads can tolerate brief staleness. But "read your own writes" is critical: if a user updates their email and immediately sees the old email, that's confusing. Causal consistency (see below) handles this.
DNS and CDN cache propagation. DNS TTLs are eventual consistency by design. CDN caches serve stale content until TTL expires or invalidation propagates. The entire internet runs on eventual consistency for read-heavy content.
The Nuance
The Consistency Spectrum
Strong and eventual aren't the only options. There's a spectrum, and most production systems live somewhere in the middle.
Linearizability is the strongest. Every operation appears to take effect at a single instant between its invocation and completion. Google Spanner achieves this globally using GPS/atomic clocks (TrueTime). The cost: every write waits for clock uncertainty (~7ms) plus replication.
Sequential consistency guarantees all nodes see operations in the same order, but that order doesn't have to match real-time. ZooKeeper provides this.
Causal consistency is the practical middle ground. It guarantees: you see your own writes, you see writes in the order they were made, and if you read a value, you see all writes that causally preceded it. It doesn't guarantee you see other users' concurrent writes immediately.
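One common way to get read-your-own-writes without paying for quorum reads everywhere is a version token: the write returns a version, the client presents it on its next read, and the router skips any replica that hasn't caught up. A minimal sketch, with dict-based replicas and made-up names for illustration:

```python
def versioned_write(primary, key, value):
    """Write to the primary; return the new version as a client token."""
    version = primary.get(key, (None, 0))[1] + 1
    primary[key] = (value, version)
    return version

def read_your_writes(key, token, replicas, primary):
    """Serve from any replica that has caught up to the client's token."""
    for replica in replicas:
        value, version = replica.get(key, (None, 0))
        if version >= token:
            return value              # replica is fresh enough
    return primary[key][0]            # no replica caught up: use primary
```

In practice the token travels in a cookie or response header, so only this one user's reads ever fall back to the primary.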
Per-Operation Consistency Is the Real Pattern
The most important insight for interviews: production systems don't choose one consistency model globally. They choose per operation.
PostgreSQL handles this naturally: writes and critical reads go to the primary, read-heavy queries go to async replicas, and connection-level routing (through PgBouncer or application logic) picks the target. DynamoDB does it with the ConsistentRead parameter per request.
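In application code, per-operation routing usually reduces to a small function in the data-access layer. A sketch, with hypothetical operation names and a `latency_ms` attribute standing in for real replica health and lag checks:

```python
# Reads that guard a "can I do this?" check before a write
STRONG_READS = {"balance_check", "inventory_reserve", "username_available"}

def route_read(operation, primary, replicas):
    """Send correctness-critical reads to the primary,
    everything else to the lowest-latency replica."""
    if operation in STRONG_READS:
        return primary
    return min(replicas, key=lambda r: r.latency_ms)
```

The set of strong operations stays small and explicit, which also makes it easy to audit in code review.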
The False Dichotomy
"Should we use strong or eventual consistency?" is usually the wrong question. The right question is: "Which specific operations in our system require strong consistency, and can we isolate them?"
In my experience, fewer than 10% of reads in a typical application need strong consistency. The other 90% are happily served from the nearest replica. The architecture that works: strong consistency by default for all writes (they go to a single primary or quorum), eventual consistency for most reads, and per-operation routing for the few reads that need freshness guarantees.
Real-World Examples
Google Spanner: The strongest possible consistency, globally. Spanner uses TrueTime (GPS + atomic clocks in every datacenter) to assign globally ordered timestamps. Every transaction is externally consistent (linearizable). The cost: write latency includes clock uncertainty (~7ms) plus cross-region replication. Google accepts this because financial data (AdWords billing, Google Play purchases) cannot tolerate stale reads. Spanner processes 100M+ requests/sec across 5+ regions.
Amazon DynamoDB: Eventual by default, strong opt-in. Most DynamoDB reads are eventually consistent (0.5 RCU per 4 KB). Strongly consistent reads cost 1.0 RCU (2x). Global Tables replicate across regions with eventual consistency (last-writer-wins, ~1 second convergence). DynamoDB serves billions of requests per day at single-digit millisecond latency because most reads skip quorum coordination.
Apache Cassandra at Apple: Apple runs one of the largest Cassandra deployments (160,000+ nodes, 10+ PB). Most reads use CL=LOCAL_ONE for speed. Writes use CL=LOCAL_QUORUM for durability within a datacenter. Cross-datacenter replication is eventual. For iCloud features requiring strong consistency (Keychain sync, device trust), Apple uses a separate strongly consistent store, not Cassandra.
How This Shows Up in Interviews
In system design interviews, consistency comes up the moment you add replication or multiple datacenters. The interviewer wants to see that you understand the tradeoff, not that you memorize protocols.
What they're testing: Can you identify which operations need strong consistency and which don't? Do you default to "make everything strongly consistent" (shows lack of operational awareness) or "everything eventual" (shows lack of correctness awareness)?
Depth expected at senior level:
- Know the quorum formula: R + W > N for strong consistency
- Explain why reads go to replicas by default and when to route to primary
- Name the consistency spectrum (linearizable > sequential > causal > eventual)
- Give a concrete example of per-operation consistency routing
- Know DynamoDB's ConsistentRead flag and Cassandra's CL parameter
| Interviewer asks | Strong answer |
|---|---|
| "What consistency model would you use?" | "Eventual consistency for reads by default, routing to the nearest replica. For balance checks and inventory reservations, I'd route to the primary for strong consistency. That's maybe 5% of total reads." |
| "Why not make everything strongly consistent?" | "Every read would hit the primary or require quorum coordination, adding 5-50ms latency and cutting throughput. For a product catalog serving 100K reads/sec, that's unnecessary. Only the 500 writes/sec to inventory need strong guarantees." |
| "What about read-your-own-writes?" | "That's causal consistency. After a user updates their profile, I route their next read to the primary or a replica that's caught up. Session affinity or a version token in the response handles this without full strong consistency." |
| "How does DynamoDB handle this?" | "Eventually consistent reads by default at 0.5 RCU. Setting ConsistentRead=True routes to the leader at 1.0 RCU. Global Tables use eventual consistency across regions with last-writer-wins conflict resolution." |
| "What's the cost of strong consistency in Spanner?" | "Spanner uses TrueTime for global ordering. Every write waits for clock uncertainty (~7ms) plus cross-region replication. You get linearizable transactions globally, but write latency is 10-50ms minimum. Read-only transactions can read from local replicas at a timestamp." |
Interview tip: name the per-operation pattern
Don't say "we'll use strong consistency." Say "catalog reads go to the nearest replica for sub-ms latency, but inventory checks route to the primary. That's maybe 5% of reads needing strong consistency." This shows operational judgment, not theoretical knowledge.
Quick Recap
- Strong consistency guarantees every read sees the latest write by coordinating across nodes (quorum reads/writes, consensus leaders, 2PC). The cost is latency, throughput, and reduced availability during partitions.
- Eventual consistency allows stale reads in exchange for lower latency, higher throughput, and better availability. Replicas converge through anti-entropy, gossip, and read repair, typically within milliseconds to seconds.
- The consistency spectrum runs from linearizable (strongest, Spanner) through sequential, causal ("read your own writes"), down to eventual (weakest, DynamoDB default). Most systems need causal at minimum.
- Production systems choose consistency per operation, not globally. Catalog reads are eventual. Balance checks are strong. Session reads use causal consistency. Fewer than 10% of reads in a typical application need strong guarantees.
- Tunable consistency (DynamoDB's ConsistentRead, Cassandra's CL parameter) lets you set the guarantee per query without changing infrastructure. This is the practical implementation of per-operation routing.
- In interviews, never say "we'll use strong consistency" globally. Say which operations need it and why, then explain that everything else uses eventual consistency for performance. That's the signal interviewers look for.
Related Trade-offs
- Consistency models for a deeper dive into linearizability, serializability, and session guarantees
- CAP theorem for the theoretical foundation of why you can't have strong consistency and full availability under partition
- Replication for the mechanics of async, sync, and semi-sync replication that implement these models
- SQL vs. NoSQL for how database choice constrains your consistency options
- Sync vs. async for the broader communication pattern that parallels strong vs. eventual
Latency overhead of strong reads (typical):
- Cassandra QUORUM read vs. LOCAL_ONE: +10-30ms depending on replication factor

Eventual consistency window (typical):
- MySQL/PostgreSQL async replication: 10-200ms
- Cross-region replication: 100-500ms
- DynamoDB global tables: up to seconds
## Interview Decision Framework
When you're asked about consistency in a design review:

1. What happens if a read returns data that's 500ms stale?
   - "User sees an old name" → eventual OK
   - "Double charge" → strong required
2. What is the write latency budget?
   - >100ms tolerable → strong consistency often feasible
   - <10ms required → eventual often necessary
3. Is this in a single datacenter or multi-region?
   - Single DC: strong consistency usually fine
   - Multi-region: strong consistency means cross-continent latency → expensive
4. What's the failure mode under partition?
   - CAP theorem: you must choose between consistency and availability during a partition
   - Financial systems: take unavailability over inconsistency
   - Social apps: take inconsistency over unavailability