Consistency models
Learn what consistency models guarantee, which model fits your data, and how to avoid the silent data corruption that happens when you choose wrong.
TL;DR
- A consistency model is a contract between a distributed system and its clients, defining what value a read is legally allowed to return after a write completes.
- There are five main levels — linearizability, sequential, causal, session, and eventual — forming a spectrum from strongest (all reads see the latest write instantly) to weakest (reads see any version, replicas converge over time).
- The core trade-off: stronger consistency requires sync replication, which adds 10–50ms of network latency per write. Weaker models are faster but can return stale data.
- Use the weakest model your correctness requirements allow. Social like counts tolerate eventual. Bank balances do not.
- The most dangerous consistency bug is silent: no exception fires, no error log appears — your system just silently serves wrong data to users.
The Problem It Solves
It's 9:14 a.m. on a Tuesday. A customer calls your payments support line, furious. She transferred $500 from checking to savings at 9:12 a.m., her bank app confirmed the transfer, and then she immediately tried to pay a bill — and got hit with an insufficient funds (NSF) fee. The $500 appeared in her savings account but the checking balance still showed the old pre-transfer amount.
Your database has three replicas: one in us-east-1, one in us-west-2, one in eu-west-1. The write went to the primary in us-east-1. The savings account replica in us-east-1 processed the write immediately.
But the checking balance lives on a different shard — and that shard's replica in us-west-2, where the bill payment read came from, hadn't received the update yet. The system served stale data. The transfer had committed according to one node; it hadn't propagated to the node that handled the next read.
This is a consistency failure. The system wrote correct data at the source then returned incorrect data on a subsequent read.
No code bug. No hardware failure. Just two nodes operating on different snapshots of reality.
flowchart TD
subgraph Internet["9:12 a.m. — Customer Transfers $500"]
User(["Customer\nMobile banking app"])
end
subgraph AppTier["App Tier"]
API["API Server\nRoutes write to primary,\nread to nearest replica"]
end
subgraph DBTier["Database Tier — 3 Replicas, async replication"]
Primary[("Primary — us-east-1\nWrite: checking $1,000 → $500\nsavings $200 → $700\nCommitted at 09:12:00.000")]
ReplicaUS[("Replica — us-west-2\nStill shows: checking=$1,000\nReplication lag: 80ms\nBill payment read hits HERE")]
ReplicaEU[("Replica — eu-west-1\nStill shows: checking=$1,000\nReplication lag: 150ms")]
end
User -->|"1. Transfer $500 — write"| API
API -->|"2. Write to primary"| Primary
Primary -.->|"3. Async replication\n~80ms lag"| ReplicaUS
Primary -.->|"4. Async replication\n~150ms lag"| ReplicaEU
User -->|"5. Pay bill — READ\n(routed to us-west-2)"| API
API -->|"6. Read from replica\n— gets $1,000 (stale!)"| ReplicaUS
ReplicaUS -->|"7. Returns old balance\nNSF fee triggered"| API
The immediate instinct is: "this is a replication bug — fix replication lag." But replication lag is not the root issue. The mistake I see most often is treating the symptom — lag — as the disease.
Your consistency model was undefined — and in a distributed system, undefined means anything goes. Define the consistency contract before you write the first line of replication code.
The silent danger of consistency failures
Consistency bugs produce no exceptions, no error codes, and no stack traces. Your metrics look healthy. Your error rate is zero. Users are just getting wrong data. This is why choosing a consistency model is a first-principles design decision, not an operational afterthought.
What Is It?
A consistency model is a formal contract that specifies which values are valid return results for a read operation given the history of write operations in the system. If a write commits value V for key K, the consistency model says: "which nodes, at which times, are allowed to return a value other than V for a subsequent read of K?" My recommendation here is to frame it as a promise: the consistency model is what your storage layer guarantees to the reader — nothing more, nothing less.
Analogy: Think of a shared Google Doc that three people have open simultaneously. When you type a new sentence:
- Linearizability is like Google Docs' actual behavior: your sentence appears in everyone's view within milliseconds. Anyone who refreshes sees your change. The document is always the same for everyone, in real time.
- Eventual consistency is like emailing a Word document around. You make a change, send it, and for a while people have old versions. Eventually everyone gets the update, but during that window, readers disagree on the current state.
- Causal consistency is like a thread on Slack: you can reply to a message, and everyone sees your reply after the message you're replying to — causal order is preserved — but two simultaneous messages from different people may appear in different orders to different people.
The right model is not always the strongest one. It's the weakest one your users will never notice.
How It Works
Every read in a distributed system returns a value from some version of the data. The consistency model determines which versions are legally returnable. I'll often tell candidates: the quorum formula W+R>N is the only number you truly need to internalize for the entire topic.
Consider a distributed key-value store with one primary and three replicas. When a write SET balance=500 arrives:
- The primary receives the write and appends it to its write-ahead log. The write is committed at this point — durable in the primary.
- Replication begins — the change propagates to replicas via WAL shipping or change streams. This takes time: 1–5ms same-rack, 10–30ms cross-DC, 100–500ms cross-region.
- Replicas apply the write when they receive it — each replica's view of the data advances independently.
- A read arrives — it hits a specific replica. The value it sees depends on which consistency model is enforced.
If W+R>N, at least one node in the read set must have seen the latest write — that overlap is the entire guarantee.
// Quorum-based consistency — the core mechanism behind tunable models
// N = total replicas, W = write quorum, R = read quorum
// Strong consistency: W + R > N ensures at least one overlapping node
// `replicas` is assumed to be the ambient list of replica clients
declare const replicas: Array<{ get(key: string): Promise<{ value: string; version: number }> }>;
async function quorumRead(key: string, config: { R: number }): Promise<string> {
// Fan the read out to all replicas simultaneously
const reads = await Promise.allSettled(
replicas.map(r => r.get(key))
);
const successful = reads
.filter((r): r is PromiseFulfilledResult<{ value: string; version: number }> =>
r.status === 'fulfilled'
)
.map(r => r.value);
if (successful.length < config.R) {
throw new Error(`Read quorum not met: got ${successful.length}, needed ${config.R}`);
}
// For strong consistency: pick the value with the highest version/timestamp
// If W=2, R=2, N=3: at least one overlapping node MUST have the latest write
return successful.reduce((latest, current) =>
current.version > latest.version ? current : latest
).value;
}
// Cassandra-style consistency levels mapped to quorum formula:
// QUORUM: R = floor(N/2) + 1 — for N=3: floor(3/2)+1 = 2
// ONE: R = 1 — eventual consistency, fastest, can return stale
// ALL: R = N — strongest possible, latency = slowest replica
const CONSISTENCY_LEVELS = {
ONE: { R: 1, W: 1 }, // N=3: 1+1=2 < 3 → stale reads possible
QUORUM: { R: 2, W: 2 }, // N=3: 2+2=4 > 3 → strong consistency
ALL: { R: 3, W: 3 }, // N=3: 3+3=6 > 3 → strongest, but blocks on any node down
};
Interview tip: W + R > N is the formula that matters
When asked how to achieve strong consistency in a distributed system, say: "With N replicas, configure write quorum W and read quorum R such that W + R > N. This guarantees at least one node in every read set overlaps with every write set, so the latest write is always visible. With N=3, QUORUM reads and writes (W=R=2) achieves this while tolerating one node failure."
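The write side of the quorum formula mirrors the read path shown earlier. A minimal sketch, using hypothetical in-memory replicas (the `Replica` interface and `makeReplica` helper are illustrative, not a real client API): success is declared once W acknowledgements arrive, not N.

```typescript
// Sketch of a quorum write with versioned values, assuming N in-memory replicas.
interface Replica {
  set(key: string, value: string, version: number): Promise<void>;
  get(key: string): Promise<{ value: string; version: number }>;
}

// In-memory stand-in for one replica node (illustrative only)
function makeReplica(): Replica {
  const store = new Map<string, { value: string; version: number }>();
  return {
    async set(key, value, version) {
      const cur = store.get(key);
      // Only apply if newer — replication may deliver writes out of order
      if (!cur || version > cur.version) store.set(key, { value, version });
    },
    async get(key) {
      return store.get(key) ?? { value: "", version: 0 };
    },
  };
}

async function quorumWrite(
  replicas: Replica[],
  key: string,
  value: string,
  version: number,
  W: number
): Promise<void> {
  // Fan out to every replica; here we wait for all attempts and then count
  // acknowledgements (a production system would resolve as soon as W arrive)
  const acks = await Promise.allSettled(
    replicas.map((r) => r.set(key, value, version))
  );
  const ok = acks.filter((a) => a.status === "fulfilled").length;
  if (ok < W) throw new Error(`Write quorum not met: ${ok} < ${W}`);
}
```

With W=2 and N=3, the write succeeds even if one replica is down — the same availability/consistency dial the Cassandra-style levels above expose.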
sequenceDiagram
participant C as Client
participant P as Primary
participant R1 as Replica 1
participant R2 as Replica 2
Note over C,R2: Strong Consistency — W=2, R=2, N=3
C->>P: SET balance=500
activate P
P->>R1: Replicate (sync)
R1-->>P: ACK
P-->>C: OK — write complete
deactivate P
Note over R2: R2 gets update async<br/>~50ms later
C->>R1: GET balance
C->>R2: GET balance (concurrent read)
R1-->>C: 500 (latest version)
R2-->>C: 1000 (stale version)
Note over C: Client compares versions,<br/>returns 500 (correct)
Note over C,R2: Eventual Consistency — W=1, R=1
C->>P: SET balance=500
P-->>C: OK (async — R2 not updated yet)
C->>R2: GET balance
R2-->>C: 1000 (stale — replication lag!)
Key Components
| Component | Role |
|---|---|
| Consistency level | The contract chosen per read or per write: ONE, QUORUM, ALL, LINEARIZABLE, etc. Most distributed stores let you choose per-operation. |
| Replication factor (N) | Total number of replica nodes that hold a copy. Higher N increases durability and read throughput, but increases write coordination cost. |
| Write quorum (W) | Number of replicas that must confirm a write before success is returned. W=1 is fastest; W=N blocks until all nodes confirm. |
| Read quorum (R) | Number of replicas consulted on a read. W+R>N achieves strong consistency; W+R ≤ N allows stale reads. |
| Replication lag | Time delta between a commit on the primary and visibility on a replica. The direct driver of consistency failures under eventual models. |
| Vector clock | Per-node version counter attached to each write. Tracks causal relationships. Two writes are concurrent if their vector clocks are incomparable. |
| Session token / read-after-write cursor | A logical timestamp carried by the client that tells the replica "only serve this read if you've applied at least this version." |
| Conflict resolver | The function called when two concurrent writes create divergent replica state. LWW, MVCC, or CRDT-based. |
The Five Consistency Models
Linearizability (Strongest)
Linearizability is the gold standard. It guarantees that every operation appears to execute atomically at a single point in time between its invocation and completion — meaning: if write W completes before read R begins, R must see W's value.
In plain terms: the system behaves as if there is one copy of the data, and every operation is instantaneous.
This is the "C" in the CAP theorem (distinct from the "C" in ACID, which is about preserving invariants). It's what makes a bank's ledger correct. You cannot have partial visibility into a committed write — once committed, all readers everywhere see it.
How it's implemented: Synchronous replication before ACK. Raft consensus. Google Spanner's TrueTime API.
Every write must wait for a quorum confirmation before returning success to the client.
Cost: Every write adds a network round trip to cross-cluster confirmation. Same-DC: 10–20ms. Cross-region: 100–500ms.
This is non-negotiable physics — you're paying the speed-of-light tax on every write.
When you need it: Distributed locks, leader election, financial account balances, inventory decrement at checkout, any system where two concurrent actors acting on stale data causes an audit failure or a real-world harm. When I'm reviewing a design and the interviewer hasn't specified consistency for money or inventory, I always ask — those two categories always need linearizability.
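The primitive behind "two concurrent actors must not both win" is compare-and-swap. A minimal single-process sketch (the `LinearizableRegister` class is illustrative — in a real system the check-and-set executes atomically via consensus, e.g. Raft):

```typescript
// Sketch of the compare-and-swap semantics a linearizable store exposes:
// the update succeeds only if the value still matches what the caller read.
class LinearizableRegister<T> {
  constructor(private value: T) {}
  read(): T {
    return this.value;
  }
  compareAndSwap(expected: T, next: T): boolean {
    // In a real system this check-and-set happens atomically at a single
    // point in time between invocation and response; a single-threaded
    // sketch is enough to show the semantics.
    if (this.value !== expected) return false;
    this.value = next;
    return true;
  }
}

// Two concurrent buyers both read AVAILABLE — only one CAS can succeed
const seat = new LinearizableRegister("AVAILABLE");
const buyerA = seat.compareAndSwap("AVAILABLE", "RESERVED_BY_A");
const buyerB = seat.compareAndSwap("AVAILABLE", "RESERVED_BY_B");
// buyerA wins; buyerB's CAS fails because the seat is no longer AVAILABLE
```

This is exactly the seat-reservation and distributed-lock pattern: the loser gets an explicit failure instead of silently double-booking.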
sequenceDiagram
participant CA as Client A
participant DB as Distributed DB
participant CB as Client B
Note over CA,CB: Linearizable — write must be visible<br/>before any subsequent read returns
CA->>DB: Write: balance = $500
activate DB
Note over DB: Sync propagation to all replicas<br/>before committing
DB-->>CA: OK (all replicas confirmed)
deactivate DB
CB->>DB: Read: balance?
DB-->>CB: $500 — guaranteed fresh
Note over CA,CB: No matter which replica CB hits,<br/>it MUST return $500 after CA's write completes
Linearizability is the right choice when being wrong costs more than being slow.
Sequential Consistency
Sequential consistency relaxes the real-time requirement but preserves global ordering. All operations appear to execute in the same order to all observers, but that order doesn't need to match wall-clock time.
The key distinction from linearizability: Linearizability says if W finished before R started, R sees W. Sequential says all nodes see operations in the same order, but that order may not match real-world time. Operations might appear out of "real-time" order across different nodes, as long as the local programmatic order per process is respected.
In practice, Client A might see operations in order [W1, W2, R1] and Client B also sees [W1, W2, R1] — but W2 might have physically happened before W1 in wall-clock time. The ordering is consistent across clients, just not pinned to the clock.
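One way to realize "same order for everyone, not pinned to the clock" is a single global sequencer. A minimal sketch under that assumption (the `Sequencer` class and its method names are illustrative):

```typescript
// Sketch: a global sequencer assigns every write a position in one total
// order; every reader replays that identical order, regardless of when the
// writes physically happened — the essence of sequential consistency.
class Sequencer {
  private seq = 0;
  private log: { seq: number; op: string }[] = [];

  // Each write gets the next slot in the single global order
  append(op: string): number {
    this.log.push({ seq: ++this.seq, op });
    return this.seq;
  }

  // Every observer sees the same sequence, whatever node they read from
  replay(): string[] {
    return this.log.map((e) => e.op);
  }
}

const order = new Sequencer();
order.append("W1");
order.append("W2");
const observerA = order.replay(); // ["W1", "W2"]
const observerB = order.replay(); // same order for every observer
```

Note the cost hiding in this sketch: every write funnels through one sequencer, which is precisely the global coordination that makes sequential consistency expensive at scale.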
Where it appears: CPU memory models (the Java Memory Model), consensus algorithms, and multi-primary databases that use a global sequence number. Counterintuitively, it is not much cheaper to implement at global scale than linearizability: you drop the real-time precision requirement, but you still need global coordination to agree on a single total order.
In most interview discussions, sequential consistency is a stepping-stone concept — name it to show range, then move to the model you'd actually choose.
Causal Consistency
Causal consistency is the sweet spot for most distributed applications. It preserves the "cause-before-effect" relationship: if operation A causally precedes operation B, all nodes will see A before B. Concurrent, unrelated operations may appear in different orders on different nodes.
Two writes are causally related if:
- One write is a response to reading a value (you read the count, then increment it)
- A client performs them in the same session in order
- A happens-before relationship exists in the system's vector clocks
Two writes are concurrent if neither causally precedes the other — two users independently updating different fields of a profile at the same moment.
How it's tracked: Vector clocks. Each node maintains a per-node counter vector. When a write is sent, the sender's vector is attached.
Receivers advance their own vector and reject writes that violate causal order.
// Simplified vector clock — each node tracks its own + all other known operations
type VectorClock = Record<string, number>; // { "node1": 3, "node2": 1, "node3": 5 }
function happensBefore(a: VectorClock, b: VectorClock): boolean {
// A happens-before B if all of A's counters are <= B's counters
// and at least one counter is strictly less
const nodeIds = new Set([...Object.keys(a), ...Object.keys(b)]);
let strictlyLess = false;
for (const nodeId of nodeIds) {
const aVal = a[nodeId] ?? 0;
const bVal = b[nodeId] ?? 0;
if (aVal > bVal) return false; // A can't happen-before B if any counter is greater
if (aVal < bVal) strictlyLess = true;
}
return strictlyLess; // A → B only if at least one counter is strictly less
}
function areConcurrent(a: VectorClock, b: VectorClock): boolean {
// Concurrent = neither happens-before the other
return !happensBefore(a, b) && !happensBefore(b, a);
}
Where it appears: Facebook's social graph uses causal consistency — if you comment on a post, observers always see the post before the comment. MongoDB exposes it through causally consistent sessions, and MVCC-based stores such as CockroachDB preserve causal ordering within a session.
I'll often see candidates jump straight from eventual to linearizable — causal consistency is the model in between that handles most real-world ordering bugs at a fraction of the cost.
Session Consistency
Session consistency provides guarantees scoped to a single client session, rather than globally across all clients. Within a session, four sub-guarantees apply:
| Guarantee | Definition | Practical meaning |
|---|---|---|
| Read-Your-Writes | A client always sees its own writes | After you update your profile, you immediately see the update |
| Monotonic Reads | A client never sees a value older than one it already observed | If you see count=100, you'll never see count=95 on a later read |
| Monotonic Writes | A client's writes appear in the order they were issued | W1 is never visible without W2 if client wrote W1 then W2 |
| Writes Follow Reads | A write that follows a read will be at least as fresh as what was read | If you read V=5 then write V=10, no node can see V=10 before V=5 |
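The Monotonic Reads row can be enforced entirely client-side. A minimal sketch (the `MonotonicReader` helper and `Versioned` shape are illustrative, not a real SDK API): the client remembers the highest version it has seen and refuses to move backwards.

```typescript
// Sketch of a client-side monotonic-reads guard: reject any replica response
// older than a version this client has already observed.
interface Versioned<T> {
  value: T;
  version: number; // logical version or timestamp attached by the store
}

class MonotonicReader<T> {
  private highestSeen = 0;

  // Returns the value if it is fresh enough, or null if the replica is
  // behind this client's history — the caller should retry elsewhere.
  accept(read: Versioned<T>): T | null {
    if (read.version < this.highestSeen) {
      return null; // stale relative to what this session already saw
    }
    this.highestSeen = read.version;
    return read.value;
  }
}

const reader = new MonotonicReader<number>();
reader.accept({ value: 100, version: 5 }); // accepted — count=100 observed
const stale = reader.accept({ value: 95, version: 4 }); // rejected: older
const fresh = reader.accept({ value: 110, version: 6 }); // accepted again
```

This is the "you saw count=100, you'll never see count=95" guarantee from the table, implemented with one integer of client state.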
How it's implemented: Sticky routing (all reads for a session go to the same replica) or session tokens that encode the client's last-seen version (any replica with that version or newer can serve the read).
// Read-Your-Writes via session tokens — most production-safe implementation
interface SessionToken {
lastWriteTimestamp: number; // logical clock or hybrid logical clock
sessionId: string;
}
async function readWithSessionConsistency(
key: string,
sessionToken: SessionToken
): Promise<{ value: string; newToken: SessionToken }> {
// Find a replica that has applied at least up to lastWriteTimestamp
// If no replica is caught up, wait up to 100ms then fall back to primary
const replica = await findReplicaAtOrAfter(sessionToken.lastWriteTimestamp, {
timeoutMs: 100,
fallback: 'primary',
});
const result = await replica.get(key);
return {
value: result.value,
// Advance the token: future reads need to be at least this fresh
newToken: {
sessionId: sessionToken.sessionId,
lastWriteTimestamp: Math.max(sessionToken.lastWriteTimestamp, result.timestamp),
},
};
}
Where it appears: Azure Cosmos DB offers session consistency as its default level, implemented with session tokens. Every major cloud database SDK uses session tokens or sticky connections to implement read-your-writes. Even most "eventually consistent" systems implement session consistency by default because the UX of seeing your own writes is non-negotiable for any user-facing product.
That said, session consistency only protects your own writes — two different clients writing concurrently can still clobber each other with no warning.
Eventual Consistency (Weakest)
Eventual consistency makes one promise: if writes stop, all replicas will eventually converge to the same value. It makes no promise about when, and it explicitly allows stale reads during the convergence window.
BASE properties (contrast with ACID):
- Basically Available: The system always responds, even if the response might be stale
- Soft state: State can change over time due to propagation, even without new writes
- Eventually consistent: Given enough time without new writes, all replicas converge
The convergence window is typically 10–100ms within a data center, 100–500ms cross-region, but there is no formal upper bound. Under write contention, the window can be indefinite.
The hard part isn't the model — it's conflict resolution. When two clients write to the same key concurrently across two different replicas, the system has two "latest" values. That decision — how to merge them — determines whether you silently lose data or surface messy conflict logic to your API consumers. Three approaches:
| Strategy | How it works | When to use | Risk |
|---|---|---|---|
| Last-Write-Wins (LWW) | Each write carries a timestamp; highest timestamp wins | Simple, low-overhead | Clock skew causes silently lost writes |
| Multi-Version (MVCC) | Keep all concurrent versions; surface conflict to application | Shopping carts, collaborative edits | Application must implement merge logic |
| CRDTs | Use algebraic data types that merge automatically | Counters, sets, distributed flags | Limited to CRDT-safe data structures |
Last-Write-Wins silently loses data
LWW is the default in Cassandra, Riak, and many other stores. It uses the writing node's wall clock. If two nodes' clocks differ by even 1ms, writes can be silently overwritten by older values. This is not a theoretical concern: it is a well-documented production data loss scenario at large-scale operators. If you use LWW, you must pair it with NTP monitoring and cross-datacenter clock synchronization.
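The failure mode above fits in a few lines. A minimal sketch of an LWW merge (the `lwwMerge` function and value shapes are illustrative): the write that happened later in real time loses because its node's clock runs behind.

```typescript
// Demonstration of clock-skew data loss under Last-Write-Wins: the merge
// compares the *writing node's* wall-clock timestamps, not real time.
interface LwwWrite {
  value: string;
  timestamp: number; // writing node's local wall clock, in ms
}

function lwwMerge(a: LwwWrite, b: LwwWrite): LwwWrite {
  // Higher timestamp wins; the other write is silently discarded
  return a.timestamp >= b.timestamp ? a : b;
}

// Node 1 writes qty=3 and stamps it 1000ms on its local clock.
// Node 2 writes qty=7 *later in real time*, but its clock is skewed behind,
// so it stamps the write 998ms — and LWW drops the newer write.
const older = { value: "qty=3", timestamp: 1000 };
const newerButSkewed = { value: "qty=7", timestamp: 998 };
const merged = lwwMerge(older, newerButSkewed);
// merged.value is "qty=3": the customer's later update vanished silently
```

No error fires anywhere in this path — which is exactly why LWW without clock-skew monitoring is dangerous.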
Where it appears: Amazon DynamoDB (default), Apache Cassandra, Amazon S3, DNS, and any system that prioritizes availability over consistency (the A in CAP).
sequenceDiagram
participant CA as Client A (us-east)
participant N1 as Node 1 (us-east)
participant N2 as Node 2 (us-west)
participant CB as Client B (us-west)
Note over CA,CB: Eventual Consistency —<br/>concurrent writes to same key
CA->>N1: Write: item_qty=3 (t=1000ms, node1_clock)
N1-->>CA: OK (N2 not updated yet)
CB->>N2: Write: item_qty=7 (t=1001ms, node2_clock)
N2-->>CB: OK (N1 not updated yet)
Note over N1,N2: Replication fires —<br/>both writes propagate
N1->>N2: Replicate qty=3 (timestamp=1000)
N2->>N1: Replicate qty=7 (timestamp=1001)
Note over N1,N2: LWW: 1001 > 1000 ? qty=7 wins<br/>Client A's write is silently lost
Note over N1,N2: But if node2's clock is 2ms behind<br/>node1's clock: qty=3 wins instead
Eventual consistency is not a shortcut — it's a deliberate trade that you need to own end-to-end, from the write path through conflict resolution.
Conflict Resolution Strategies
Eventual consistency systems must handle the case where two writes with no causal relationship produce divergent replicas. This is unavoidable when writes are accepted at multiple nodes simultaneously.
Last-Write-Wins (LWW)
Every write carries a timestamp (usually the writing node's wall clock or a Hybrid Logical Clock). When two conflicting writes arrive at a replica, the one with the higher timestamp wins, and the other is silently dropped. The mistake I see most often in production incident reviews is teams discovering this only after data is already gone.
Safeguard: Use Hybrid Logical Clocks (HLC) instead of pure wall clocks. HLC combines wall clock with a monotonic counter: HLC = max(wall_clock, last_seen_timestamp) + counter. This ensures monotonic advancement even when NTP causes clock drift, dramatically reducing silent data loss.
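The HLC rule above can be sketched directly. A simplified version (real HLC implementations also pack physical time and counter into one 64-bit value and fold in timestamps received from peers; the names here are illustrative):

```typescript
// Sketch of a Hybrid Logical Clock tick: the timestamp never moves backwards
// even when the wall clock stalls or is stepped back by NTP.
interface Hlc {
  physical: number; // last wall-clock reading this clock has used (ms)
  counter: number;  // tie-breaker when the wall clock doesn't advance
}

function hlcTick(last: Hlc, wallClockNow: number): Hlc {
  if (wallClockNow > last.physical) {
    // Wall clock moved forward: adopt it and reset the counter
    return { physical: wallClockNow, counter: 0 };
  }
  // Wall clock stalled or rewound: keep the old physical component and bump
  // the counter, so successive timestamps stay strictly monotonic
  return { physical: last.physical, counter: last.counter + 1 };
}

function hlcCompare(a: Hlc, b: Hlc): number {
  return a.physical !== b.physical
    ? a.physical - b.physical
    : a.counter - b.counter;
}

// Even if NTP steps the wall clock from 1000ms back to 998ms, each tick
// still produces a strictly larger HLC timestamp:
const t1 = hlcTick({ physical: 1000, counter: 0 }, 999); // { 1000, 1 }
const t2 = hlcTick(t1, 998);                             // { 1000, 2 }
```

Because `t2 > t1` always holds, an LWW merge keyed on HLC timestamps cannot drop a later write from the same node due to clock drift — the exact silent-loss scenario shown earlier.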
Multi-Version Concurrency Control (MVCC)
The database retains all concurrent versions of a value and surfaces the conflict to the application, which must implement a merge function. Amazon Dynamo uses this with "siblings" — concurrent writes produce multiple versions that the client merges on next read.
Classic example: A shopping cart. User A adds "Socks" on their phone. User B adds "Shirt" on their laptop.
Both writes go to different replicas. With MVCC, both items survive in the cart as separate versions that are merged to ["Socks", "Shirt"]. With LWW, one write is silently lost.
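The cart merge can be sketched as a sibling union. A minimal version (the `mergeSiblings` function is illustrative — Dynamo surfaces siblings to the client, which supplies merge logic like this on its next read):

```typescript
// Sketch of Dynamo-style sibling merge for a shopping cart: every concurrent
// version survives, and the client merges them — here with an add-wins union.
type CartVersion = string[]; // one concurrent version ("sibling") of the cart

function mergeSiblings(siblings: CartVersion[]): CartVersion {
  // Union of all siblings, deduplicated and sorted for a stable result.
  // Add-wins: nothing any user added is ever dropped.
  return [...new Set(siblings.flat())].sort();
}

// User A added "Socks" on one replica; User B added "Shirt" on another.
// Both siblings come back on the next read and merge losslessly:
const merged = mergeSiblings([["Socks"], ["Shirt"]]);
```

One caveat the simple union hides: a pure add-wins merge can resurrect items a user deliberately removed, which is exactly the "extra item in the cart" anomaly the Dynamo paper describes.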
CRDTs (Conflict-free Replicated Data Types)
CRDTs are data structures with a mathematically proven merge operation that is commutative, associative, and idempotent — meaning any order of merging any subset of writes always produces the same result.
| CRDT type | Example | Real use |
|---|---|---|
| G-Counter | Increment-only counter | View counts, event totals |
| PN-Counter | Increment + decrement | Likes, votes |
| G-Set | Add-only set | Tracking unique visitors |
| OR-Set | Add and remove set | Tags, feature flags |
| LWW-Register | Last-write-wins record | Single-field mutations (Note: lossy, drops concurrent edits) |
Riak and Redis Enterprise's active-active replication use CRDTs, and Figma's multiplayer canvas uses CRDT-inspired merge logic. They eliminate most conflict resolution code — the data structure itself absorbs the concurrency.
Trade-offs
| Pros | Cons |
|---|---|
| Weaker models (eventual, session) have near-zero write latency overhead — reads and writes return in single-node response time | Stronger models add 10–500ms per write for cross-replica synchronization — paid on every single write indefinitely |
| Eventual consistency enables write availability during network partitions — the system keeps accepting writes even when replicas are separated | Eventual consistency produces stale reads — users can observe data that doesn't reflect the latest state, which is confusing or harmful in some contexts |
| Session consistency provides the most common UX guarantee (read-your-writes) at near-zero cost — implemented with a session token, not full synchronization | Conflict resolution code is complex, error-prone, and usually untested until production — LWW silently loses writes, MVCC requires application-level merge logic |
| Causal consistency eliminates most real-world consistency issues with vector clocks and low cross-node overhead | Vector clock overhead grows with the number of nodes — at thousands of nodes, clock sizes become impractical without pruning strategies |
| Linearizability makes correctness proofs trivial — the system acts like a single node | Linearizability is incompatible with availability during network partitions (CAP theorem: under a partition, you must choose between consistency and availability) |
The fundamental tension here is correctness vs. latency. Consistency is not free — every guarantee you add requires coordination between replicas, and coordination takes time bounded below by the speed of light across your network topology. Pick the wrong model for the wrong reason and you'll pay for it in either milliseconds or customer complaints.
When to Use / When to Avoid
So when does this actually matter? Every system that accepts writes eventually hits a scenario where two readers get different answers. The model you chose at design time determines whether that's an acceptable trade or a silent data corruption bug.
Use linearizability when:
- The data represents a real-world resource with a hard limit: inventory units, financial balances, available seats.
- Concurrent actors acting on stale data produce an irrecoverable error: two processes both see lock-not-held and both acquire a distributed lock.
- You need distributed leader election or consensus: Raft and Paxos are built on linearizable operations.
- The business cost of wrong data exceeds the latency cost of synchronization.
Use causal consistency when:
- You have comment/reply chains, threaded discussions, or dependent writes where order matters causally but not globally.
- Same-DC latency is acceptable but global synchronization is not.
- You want stronger guarantees than eventual without paying the full linearizability tax.
Use session consistency when:
- Users write data and immediately read it back (profile updates, cart modifications, form submissions).
- You use read replicas for scale-out but can't tolerate "your own write is invisible to you."
- This is the pragmatic default for 90% of user-facing web applications.
Use eventual consistency when:
- The data is aggregated or approximated by nature: view counts, recommendation rankings, search indexes.
- Writes happen at very high throughput and stale reads are acceptable (DNS TTL, CDN cache, social media feeds).
- You need maximum write availability and can tolerate brief inconsistency: you're refreshing a social feed and seeing a post from 2 seconds ago is acceptable.
- Business logic explicitly defers to conflict resolution: "last writer wins" for non-critical settings.
Avoid stronger-than-necessary consistency when:
- Writes are globally distributed: cross-region linearizability costs 100–500ms per write, destroying user experience for write-heavy workflows.
- The system must remain available during network partitions: linearizability requires coordination, which is impossible across a split network.
- Every millisecond of write latency is a user experience metric: e-commerce checkout, real-time gaming, high-frequency trading.
Match the model to the data's tolerance for wrongness — not to your engineering team's comfort level.
Session consistency is not the same as ACID transactions
ACID transactions guarantee consistency across multiple operations on multiple keys in a single atomic unit. Session consistency is scoped to a single client's reads and writes on potentially different nodes. Many applications mistake session consistency for a safety net against all consistency bugs — but a Read-Your-Writes guarantee says nothing about what two different clients see concurrently.
Real-World Examples
Google Spanner — Linearizability at Global Scale via TrueTime
Google Spanner achieves externally consistent (linearizable) reads and writes across data centers worldwide using a custom API called TrueTime. Google's datacenters deploy dedicated time master servers equipped with GPS receivers and atomic clocks. Individual servers periodically query these time masters; TrueTime's TT.now() call returns not a single timestamp, but an interval [earliest, latest] bounded by uncertainty (typically 1–7ms).
Before a write commits, Spanner waits out the clock uncertainty — until the interval's latest bound is definitively in the past — guaranteeing external consistency across the globe without extra cross-region round trips. The result: full linearizability at the cost of a commit wait of up to ~7ms. Spanner powers Google Ads, which processes $200+ billion per year — the correctness requirement justifies the latency.
Amazon Dynamo and DynamoDB — Eventual Consistency as a Design Principle
The original Dynamo (Amazon's internal key-value store, described in the 2007 paper) was designed explicitly for eventual consistency with tunable read/write quorums. The guiding insight: for Amazon's shopping cart, availability — the ability to add items even during regional outages — matters more than perfect consistency.
A customer seeing a cart with an extra item they didn't add (due to a merge conflict) is a minor inconvenience. A customer unable to add items to their cart costs revenue. I tell candidates to memorize that framing — it's the clearest articulation of availability-over-consistency in any company's public engineering documentation.
Today, DynamoDB offers per-request strongly consistent reads (charged at 2x read capacity units) — but the default is still eventually consistent because at Amazon's scale, the latency difference matters economically.
Cassandra at Apple — Tunable Consistency at 100K Nodes
Apple runs the world's largest Cassandra deployment: over 100,000 nodes serving iCloud. Cassandra exposes full control over consistency level per operation: reads and writes can independently use ONE (fastest, stale), QUORUM (W+R>N, strong), LOCAL_QUORUM (strong within a DC only), or ALL (strongest, requires every node up). Apple uses LOCAL_QUORUM for user-facing iCloud operations to guarantee freshness within a region while keeping cross-region reads eventually consistent.
At 100K nodes, choosing ALL would make every operation wait for the slowest node in the cluster — operationally impossible. Tunable consistency lets Apple dial exactly the guarantees different data types need.
The pattern across all three: they chose the weakest model their data could tolerate, not the strongest model they could build.
How This Shows Up in Interviews
Here's the honest answer about what interviewers want to hear: name the specific model, justify the choice in one sentence, and flag the failure mode. Candidates who say "strong consistency" without naming linearizability or sequential, or who say "eventual" without mentioning a conflict resolution strategy, signal they've memorized a spectrum but haven't thought through the implications.
When to proactively bring up consistency models:
- During requirements gathering: "Before I design the data layer, I want to understand the consistency requirement. When a user writes data, does the next read from a different user need to see it immediately? Could we tolerate 100ms stale data?"
- When proposing replication: "I'll add read replicas for scale-out. Worth noting: reads will become eventually consistent with a 50ms replication lag. For this use case — the user's own dashboard — I'd implement Read-Your-Writes routing so the author always sees their own changes."
- When a system has money or inventory: "For payment flows I'd use a strongly consistent store — something like CockroachDB or Spanner — because inventory decrements require linearizable compare-and-swap. Showing incorrect stock counts to concurrent purchasers creates overselling."
Depth expected at senior/staff level:
- Name the specific models (not just "strong" and "eventual") and know what distinguishes them
- Know the W+R>N quorum formula and when it breaks
- Explain the difference between consistency in the CAP sense (linearizability) and in the ACID sense (invariant preservation)
- Know at least two conflict resolution strategies and their failure modes
- Have specific data points: replication lag ranges, consistency overhead numbers, LWW clock skew risks
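The HLC in that last bullet is compact enough to sketch. This is a simplified version of the standard hybrid logical clock update rules, with the wall clock injected as a function so the backward-jump case is demonstrable (real implementations bound the counter and pack both fields into 64 bits):

```python
class HLC:
    """Hybrid Logical Clock: a (wall_time, counter) pair that stays
    monotonic even when the wall clock stalls or jumps backward, so
    last-writer-wins comparisons never silently drop a newer write."""

    def __init__(self, now_fn):
        self.now = now_fn              # injected for testability
        self.l, self.c = 0, 0

    def tick(self):
        """Timestamp a local event or an outgoing write."""
        pt = self.now()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:                          # clock stalled or went backward
            self.c += 1
        return (self.l, self.c)

    def update(self, remote):
        """Merge a timestamp received from another node."""
        rl, rc = remote
        pt = self.now()
        l2 = max(self.l, rl, pt)
        if l2 == self.l == rl:
            self.c = max(self.c, rc) + 1
        elif l2 == self.l:
            self.c += 1
        elif l2 == rl:
            self.c = rc + 1
        else:
            self.c = 0
        self.l = l2
        return (self.l, self.c)

times = iter([100, 100, 99])           # wall clock stalls, then jumps back
clk = HLC(lambda: next(times))
t1, t2, t3 = clk.tick(), clk.tick(), clk.tick()
assert t1 < t2 < t3                    # ordering survives the backward jump
```

With raw wall-clock LWW, the third write above would be timestamped 99 and lose to both earlier writes — exactly the drift-induced data loss the bullet warns about.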
| Interviewer asks | Strong answer |
|---|---|
| "How do you handle write conflicts in your eventual consistency design?" | "Depends on the data type. For counters: use a CRDT PN-counter — mathematically conflict-free. For records where last-writer-wins is acceptable: use a Hybrid Logical Clock, not wall clock, to avoid drift-induced data loss. For anything where both writes must survive: use MVCC — store both versions and merge on next read, the way Dynamo handles cart items." |
| "What consistency level would you use for a Ticketmaster-style seat reservation?" | "Linearizability — specifically compare-and-swap on the seat's state. Two concurrent buyers must not both transition a seat from AVAILABLE to RESERVED. This requires either a serializable transaction or a distributed lock with linearizable semantics. The latency cost is 10–20ms per booking, which users in the checkout flow tolerate." |
| "Your Cassandra cluster is showing stale reads causing double-debits. What do you do?" | "First, switch the debit operation from ONE to QUORUM or LOCAL_QUORUM consistency. With N=3, W+R=1+1=2 < 3 means stale reads are possible; QUORUM gives W+R=4 > 3. If that's not enough, move the debit logic to a strongly consistent store (like a relational DB) and use Cassandra only for audit log reads. Never use eventual consistency for financial operations." |
| "Google Spanner vs. CockroachDB — when would you choose each?" | "Both offer linearizable distributed transactions. Spanner uses TrueTime (GPS/atomic clocks) for external consistency — available only on GCP. CockroachDB uses Hybrid Logical Clocks and achieves serializable isolation (1 level below external consistency) on any infrastructure. Choose Spanner if you're fully on GCP and need the 7ms commit overhead optimized for global scale. Choose CockroachDB for cloud-agnostic multi-region deployments." |
| "We have eventual consistency and we're fine — why would we ever change?" | "You're fine until one data type crosses the threshold where stale reads produce a business event: an oversold ticket, a double payment, a failed regulatory audit. The risk isn't the technology — it's that eventual consistency bugs are silent and show up months after deployment. The right time to verify your requirements is during design, not after a customer reports missing money." |
Test Your Understanding
Q1. Your e-commerce site uses eventual consistency for inventory. You have 50 units of a viral item in stock. In a 200ms consistency window, you receive 500 requests all reading qty=50. All 500 proceed to checkout. How many orders succeed? How many fail? What specific mechanism determines the cutoff? And why does simply adding more replica nodes make this problem worse, not better?
Q2. Your team routes all reads to the primary database to implement "read-your-writes." At 10K requests/second, this works. At 100K requests/second, the primary is at 95% CPU. A junior engineer proposes: "Let's add 10 read replicas but keep read-your-writes by routing writes AND the reads immediately following writes to the primary, and all other reads to replicas." What specifically breaks under this design, and what is the production-safe implementation?
Q3. Two concurrent checkout flows read the same seat row (status='available') for seat A1 at a concert. Both decide "available — proceed." Both fire UPDATE seats SET status='sold' WHERE seat_id='A1'. Both receive success. Now seat A1 has been sold twice. What consistency model would prevent this? Why doesn't a regular BEGIN; SELECT ... FOR UPDATE solve this in a distributed database with sharded tables?
Q4. You're designing a food delivery platform. A driver's GPS position is updated every 3 seconds, and 100,000 customers might be viewing that driver's position simultaneously. Your architect says: "We need strong consistency — customers must see the latest position." Your CTO counters: "Eventual consistency with a 3-second staleness bound is fine — the driver position is only relevant every GPS update anyway." Who is right? What one number makes this an easy decision?
Q5. A startup's single-node MySQL provides effective linearizability. They add 5 read replicas for scale. An engineer confirms: "All reads still go to a specific replica per session — we have session consistency." Three months later, after a maintenance restart, one replica was accidentally configured with READ UNCOMMITTED isolation. Users of that replica start seeing dirty reads — values written by transactions that haven't yet committed. What consistency model does that replica now provide, and why is it weaker than eventual consistency?
Q6. You have a distributed like counter: 1 billion posts, each receiving up to 10,000 like events/second during viral moments. You start with a linearizable counter on a single Redis node. At 500K likes/second globally, the Redis primary saturates. A colleague suggests: "Shard the counter into 100 Redis nodes — each shard holds 1/100th of the load." What specific consistency problem does sharding create for a single counter, and show two designs that preserve accuracy under this load at different consistency trade-offs.
Q7. Your platform uses eventual consistency for user profile data. A data scientist notices that 0.02% of user records have a null email field — but your signup form requires a non-null email. You've confirmed the bug isn't in application code. Upon investigation, you find that a schema migration 3 months ago added an email column without a database-level NOT NULL constraint. What consistency failure mechanism explains how records with null email exist, and how would you prevent this class of bug going forward?
Q8. Two competing architecture proposals: (A) Use CockroachDB (linearizable) for your entire data layer. (B) Use Cassandra (eventual) with application-level checks. Your system processes financial transactions AND social activity feeds. Under what specific conditions does (A) become worse than (B), and what hybrid architecture resolves the tension without double the operational complexity?
Quick Recap
- A consistency model is a contract that defines which values are legally returnable for a read, given the history of writes — it's not a configuration flag, it's a fundamental property of your system design.
- The spectrum runs from linearizability (every read returns the most recent write globally; operations appear atomic) down to eventual consistency (replicas converge when writes stop; stale reads are normal and expected).
- Session consistency (Read-Your-Writes + monotonic reads) covers 90% of user-facing web application requirements at near-zero overhead — implement it with session tokens carrying write timestamps, not with routing all reads to the primary.
- The W+R>N quorum formula is how tunable consistency systems achieve strong consistency: with N=3 replicas, W=2 writes and R=2 reads guarantee at least one overlapping node has the latest value.
- Last-Write-Wins is dangerous because wall clock skew can silently order writes incorrectly — always use Hybrid Logical Clocks (HLC) or version vectors instead of raw timestamps for conflict resolution.
- Linearizability kills cross-region write performance — a linearizable commit from Asia to a 3-region cluster adds 100–500ms of physics-limited latency; use it only for data where stale reads cause real business or audit harm.
- In an interview, the signal is knowing which model to name, why it fits, and what the alternative costs — "we need strong consistency for inventory because concurrent stale reads produce overselling, implemented via compare-and-swap on the write path with W+R>N quorum reads" is a complete answer.
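The compare-and-swap write path in that final bullet is a single conditional query. A sketch using sqlite3 purely to make the pattern concrete — any store with an atomic conditional update works, and in a sharded database that update must itself be linearizable:

```python
import sqlite3

def reserve_seat(conn: sqlite3.Connection, seat_id: str) -> bool:
    """Atomic compare-and-swap: the UPDATE matches only if the seat is
    still AVAILABLE, so two concurrent buyers cannot both succeed.
    Checking rowcount replaces the racy read-then-write pattern."""
    cur = conn.execute(
        "UPDATE seats SET status = 'RESERVED' "
        "WHERE seat_id = ? AND status = 'AVAILABLE'",
        (seat_id,),
    )
    conn.commit()
    return cur.rowcount == 1    # exactly one row changed: we won the seat

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (seat_id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO seats VALUES ('A1', 'AVAILABLE')")

first = reserve_seat(conn, "A1")    # wins the seat
second = reserve_seat(conn, "A1")   # loses: seat is no longer AVAILABLE
```

Contrast this with the broken flow in Q3 above: a plain SELECT followed by an unconditional UPDATE lets both buyers observe 'available' and both "succeed", because the check and the write are not one atomic step.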
Related Concepts
- Replication — Creates the replicas that make consistency a problem to solve. Async WAL shipping is the mechanism that produces replication lag, which is the physical cause of stale reads under eventual consistency.
- CAP Theorem — The theoretical framework for why you must choose between linearizability and availability during a network partition. Consistency models dictate how far you slide down from the "C" in CAP when optimizing for availability and partition tolerance.
- Databases — Your database engine determines which consistency levels are even possible. MVCC in Postgres, LWT in Cassandra, TrueTime in Spanner — the implementation sets your ceiling.
- Caching — Cache invalidation is a consistency problem in disguise. A cache entry is an eventually consistent replica of the DB — TTL and event-driven invalidation are consistency mechanisms, even if they're never called that.
- Sharding — Cross-shard transactions require distributed protocols (2PC, Saga) that compound consistency challenges. Sharding changes what consistency is physically achievable per-operation.