๐Ÿ“HowToHLD
Vote for New Content
Vote for New Content
Home/High Level Design/Concepts

CAP Theorem

Learn why every distributed system must choose between consistency and availability when a partition strikes — and how to make that choice intentionally.

45 min read · 2026-03-24 · medium · Tags: cap-theorem, distributed-systems, consistency, availability, hld

TL;DR

  • CAP Theorem says a distributed system can guarantee at most two of three properties simultaneously: Consistency, Availability, and Partition Tolerance — never all three.
  • In practice, partition tolerance is non-negotiable: network partitions happen in every production distributed system. The real choice collapses to C (return an error rather than serve stale data) or A (return whatever data you have, even if stale).
  • CP systems (HBase, ZooKeeper, etcd, Spanner) sacrifice availability during a partition — they return errors rather than serve potentially inconsistent data.
  • AP systems (Cassandra, DynamoDB, CouchDB, Riak) sacrifice consistency during a partition — they return possibly stale data rather than fail requests.
  • Consistency in CAP means linearizability — every read returns the most recent write or an error — completely unrelated to ACID consistency and far stronger than eventual consistency. The CP vs AP decision is the real engineering choice in every distributed system.

The Problem It Solves

It is 11:42 PM on Cyber Monday. Your e-commerce platform runs in two data centers: US-East and EU-West. A BGP routing issue between AWS and a European telecom takes down the inter-datacenter link.

For 90 seconds, neither data center can reach the other. Both regions keep running independently.

A user in Frankfurt clicks "Place Order." EU-West has their cart data. But their loyalty reward balance — updated by a purchase 30 seconds earlier in the US — hasn't replicated yet.

The EU-West node holds an outdated balance.

Your system must now make a choice it may never have consciously designed for:

Option A: Refuse to process the checkout until connectivity restores. The user sees an error. You lose the sale. Your data is correct.

Option B: Process the checkout using the stale balance. The loyalty discount is wrong. You make the sale. Your data is wrong.

There is no third option. Every distributed system picks one of these behaviors — whether the engineering team planned for it or not. The mistake I see most often is teams that discover their default behavior at 3 AM during an incident, not at the design whiteboard.

Two database nodes in US-East and EU-West separated by a broken network link. US-East shows balance=$2,000 (latest write) while EU-West shows balance=$1,000 (stale). A user request hits EU-West and faces an impossible choice: return stale data or return an error.
A network partition forces every distributed system into a binary choice: serve stale-but-available data, or be correct-but-unavailable. No architecture escapes this decision.

For every data store you sketch, the first question to answer is: CP or AP? This is the core of CAP Theorem — not an academic abstraction, but a specific operational decision that every distributed database makes, silently, the first time a network fails.


What Is It?

CAP Theorem, proposed by Eric Brewer in 2000 and formally proved by Gilbert and Lynch in 2002, states that any distributed data store can provide at most two of three guarantees simultaneously:

  • Consistency (C): Every read returns the most recent write, or returns an error. No node ever serves stale data. This means linearizability — the strongest possible consistency model. This is entirely different from the "C" in ACID (which refers to constraint enforcement) and has nothing to do with eventual consistency.

  • Availability (A): Every request to a non-failed node gets a non-error response. Not necessarily the most recent data — just a response, always. Note: CAP Availability is an absolute guarantee, not an SLA percentage. A system is either available in the CAP sense (responds to all requests) or it is not.

  • Partition Tolerance (P): The system continues operating even when network messages between nodes are lost, delayed, or reordered. The system does not require a perfectly reliable network to function.

Analogy: Think of two bank branches in different cities that share a central ledger via overnight courier. If the courier van breaks down (partition), each branch faces the same choice:

  • Consistency: Neither branch processes any transactions until the courier van is repaired and the ledger is re-synced. Customers are turned away. The ledger is never wrong.
  • Availability: Each branch keeps processing transactions from their last-known ledger copy. When the courier van is fixed, the branches reconcile. During the outage, some customers get incorrect balances.

The bank's choice depends on which outcome is worse: turning customers away, or occasionally showing wrong balances. Neither is zero-cost. My recommendation: before selecting any data store, explicitly state which failure mode your system can tolerate — and document that decision.

The CAP triangle with Consistency at top, Availability at bottom-left, and Partition Tolerance at bottom-right. The left edge (CA) shows traditional single-node RDBMS. The right edge (CP) shows ZooKeeper, HBase, etcd, and Spanner. The bottom edge (AP) shows Cassandra, DynamoDB, CouchDB, and Riak.
In real distributed systems, network partitions are unavoidable. CA without partition tolerance only works for single-node deployments. Every distributed system chooses either CP or AP.

How It Works

The Proof: Why You Can't Have All Three

The proof is short and worth understanding once — after it, CAP becomes intuitive rather than received wisdom. I find most engineers understand CAP conceptually but can't explain why you can't have all three — this proof closes that gap in about 30 seconds.

Imagine the simplest possible distributed system: two nodes, N1 and N2, both storing a variable x. Initially, x = 0 on both nodes.

Step 1: A client writes x = 1 to N1.
        N1 persists x = 1 locally.
        N1 tries to send the update to N2.
        But the network is partitioned — N2 never receives the write.

Step 2: Another client reads x from N2.

Now you must pick:
  If you want AVAILABILITY:
    N2 must respond. It returns x = 0.
    That is stale. Inconsistent. Wrong.

  If you want CONSISTENCY:
    N2 must return an error (or block until partition heals).
    It cannot return x = 1 — it hasn't received that write.
    So it cannot serve a response without violating consistency.

You cannot return x = 1 (correct) AND serve a response (available)
without N2 somehow knowing about the write — which requires
a working network. The network is broken. That's the partition.

Partition Tolerance means the system works despite this.
Given P is required, C and A cannot co-exist.

That is the complete proof. The partition is the forcing function. P is not optional in any real system running across multiple nodes, so the choice reduces to: sacrifice C or sacrifice A during a partition.

// Simplified: the binary branch every distributed system implements
async function read(key: string): Promise<Value | PartitionError> {
  const localValue = await localDb.get(key);
  const partitionDetected = !(await canReachQuorum());

  if (partitionDetected) {
    // CP BRANCH: sacrifice availability to guarantee correctness
    if (systemMode === "CP") {
      throw new PartitionError(
        "Quorum unreachable โ€” cannot confirm data freshness"
      );
    }

    // AP BRANCH: sacrifice consistency to guarantee a response
    if (systemMode === "AP") {
      return {
        value: localValue,
        stale: true,
        lagMs: estimatedReplicationLag(),
      };
    }
  }

  // Normal path: no partition — quorum read guarantees strong consistency
  return await quorumRead(key);
}

The TypeScript above is a mental model — but the binary C-or-A branch it exposes is the actual decision your distributed system makes on every request during a partition. Name that choice explicitly in your design.


Key Components

Key CAP vocabulary, with the definition, the common misunderstanding, and a concrete example for each:

  • Consistency (C)
    Definition: every read returns the most recent write or an error. Formally: linearizability. The entire operation appears atomic — visible to all nodes simultaneously.
    Misunderstanding: confused with ACID consistency (schema constraints, referential integrity). These are completely different concepts.
    Example: after writing balance = $200, any read from any node must return $200. Returning the previous $150 is a consistency violation.

  • Availability (A)
    Definition: every request to a non-failed node returns a non-error response. Binary — you respond or you don't.
    Misunderstanding: confused with high-availability SLAs (99.9% uptime). CAP Availability means every request, not most requests.
    Example: during a partition, every live node must respond to reads and writes, even if it cannot confirm data freshness.

  • Partition Tolerance (P)
    Definition: the system continues to operate despite network partitions — messages lost, delayed, or nodes unreachable.
    Misunderstanding: "with reliable enough infrastructure, partitions won't happen." Wrong. BGP flaps, switch failures, and NIC brownouts hit every cloud deployment.
    Example: US-East and EU-West lose inter-datacenter connectivity for 90 seconds. Both regions stay healthy internally and keep serving traffic.

  • Network Partition
    Definition: a subset of nodes cannot communicate with another subset. Nodes are alive and running — the link between them is broken.
    Misunderstanding: confused with a total system outage (all nodes down). In a partition, nodes are healthy; only communication between them fails.
    Example: a top-of-rack switch failure splits a 6-node cluster into two groups of 3 that can't reach each other.

  • Quorum
    Definition: a majority of nodes (⌊N/2⌋ + 1) that must agree before a read or write is confirmed. CP systems use quorum to maintain consistency.
    Misunderstanding: "use quorum everywhere to be safe." Quorum writes are slow and reduce write availability — apply them only to data where correctness outweighs latency.
    Example: 5-node cluster, quorum = 3 (⌊5/2⌋ + 1 = 2 + 1 = 3). A write fails if fewer than 3 nodes can be reached — ensuring no two disjoint partitions can both commit conflicting writes.

  • Eventual Consistency
    Definition: the consistency model of AP systems. Nodes may diverge temporarily during a partition, but converge to the same state once it heals.
    Misunderstanding: equivalent to "no consistency at all." Not true — eventual consistency has deterministic convergence rules (LWW, CRDTs); it just weakens ordering guarantees.
    Example: two nodes write different values to the same key during a partition. When the partition heals, a deterministic conflict rule resolves which value survives.

  • Linearizability
    Definition: the "C" in CAP. Operations appear to execute atomically at a single point in time — visible to all nodes in wall-clock order. The strongest single-object consistency model.
    Misunderstanding: "linearizability is just another name for transactional consistency." No — ACID serializability orders multi-statement transactions within one database; linearizability is about real-time ordering of operations on a single object across distributed nodes.
    Example: once a write of balance = $500 is acknowledged, every subsequent read from every node returns $500 (or a later value) — no node may ever return the older balance.

Types / Variations

The real-world expression of CAP is two families of distributed systems, each optimized for their side of the partition decision.

Two-panel diagram. Left panel: CP system behavior during partition. Node B cannot form quorum and returns QuorumError to the user — consistent but unavailable. Right panel: AP system behavior during partition. Node B returns stale data with a warning — available but potentially inconsistent.
CP systems protect data correctness at the cost of service errors. AP systems protect service availability at the cost of potentially stale or conflicting reads. Neither is universally better — the right choice depends on what wrong data costs your business.

CP Systems: Refuse Rather Than Lie

A CP system's contract: "I would rather turn you away than give you wrong information."

ZooKeeper is the canonical CP system. ZooKeeper is a coordination service used for distributed locking, leader election, and configuration distribution. If ZooKeeper's quorum is disrupted and a minority of nodes cannot confirm the latest state, those nodes stop serving reads entirely — they return ConnectionLossException.

The reasoning: a stale leadership result is worse than no result at all. If two nodes both believe they are the leader because ZooKeeper served a stale election result, both execute critical sections simultaneously — a race condition that causes data corruption. ZooKeeper powers distributed coordination in Kafka (before KRaft), HBase, and Hadoop; Kubernetes relies on etcd, a different CP store, for exactly the same reason.

When candidates list ZooKeeper in a design, I always follow up: "What happens to your service when ZooKeeper loses quorum?" If they can answer that, they understand CP.

When CP is the right call:

  • Distributed locks — a stale lock state means two processes enter a critical section simultaneously
  • Payment confirmation — a stale "payment succeeded" that hasn't propagated is a silent double-charge
  • Inventory decrements at point-of-sale — a stale "1 item in stock" at true zero inventory oversells
  • Auth token revocation — a stale "token valid" for 200ms after revocation is a security vulnerability
  • Leader election — split-brain leadership causes duplicate task execution and data corruption

AP Systems: Return and Reconcile

An AP system's contract: "I would rather give you something than make you wait. We will sort out any conflicts later."

Cassandra is the canonical AP system. In its common AP configuration (consistency level ONE), Cassandra acknowledges a write as soon as a single replica accepts it, without waiting for quorum confirmation. Reads likewise return immediately from any available replica.

During a partition, each side accepts writes independently. When the partition heals, Cassandra's anti-entropy repair process reconciles divergent data using Last Write Wins (LWW) timestamp-based conflict resolution.
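A minimal sketch of timestamp-based LWW reconciliation, assuming each replica tags values with a write timestamp. The types and tie-break rule here are illustrative, not Cassandra's actual implementation (which resolves conflicts at cell granularity):

```typescript
interface Versioned<T> {
  value: T;
  writeTimestampMs: number; // wall-clock timestamp attached at write time
}

// Last Write Wins: the later timestamp survives. Ties are broken
// deterministically (here: lexicographically larger value) so that
// every replica converges to the same answer independently.
function lwwMerge<T>(a: Versioned<T>, b: Versioned<T>): Versioned<T> {
  if (a.writeTimestampMs !== b.writeTimestampMs) {
    return a.writeTimestampMs > b.writeTimestampMs ? a : b;
  }
  return String(a.value) >= String(b.value) ? a : b;
}
```

Note the well-known weakness this sketch inherits from real LWW: if node clocks are skewed, a "later" write by wall clock can silently overwrite a logically newer one.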

When AP is the right call:

  • User activity feeds and timelines — stale posts are invisible to users; the next refresh catches up
  • Recommendation engines — a ~500ms-stale recommendation is indistinguishable from fresh
  • View, like, and share counts — approximate counts are acceptable; counters merge cleanly after partition
  • Shopping cart contents — slightly stale cart data is annoying but not irreversible
  • DNS — responses can be cached (stale) for minutes without meaningful consequences to end users

Pick CP when stale data costs more than downtime. Pick AP when a timeout costs more than stale data. Most production systems use both — the choice is per data type, not per system.

CA is not a meaningful distributed option

CA — Consistency + Availability without Partition Tolerance — is only achievable on a single machine. The moment you have two nodes communicating over a network, partitions can and will occur. Anyone describing their distributed system as "CA" either means they're running a single-node database (valid) or doesn't understand CAP. In interviews, if you see CA listed as a design option for a multi-node distributed system, flag it.

PACELC: The Tradeoff Beyond Partitions

Here's the thing most people miss: CAP only defines system behavior during a partition — a relatively rare event. But real systems face an equally important tradeoff on the normal, no-partition path: latency vs. consistency for every single request.

PACELC (proposed by Daniel Abadi, 2012) extends CAP:

Partition → choose A (Availability) or C (Consistency)

Else (normal operation) → choose L (Latency) or C (Consistency)

flowchart LR
  subgraph PartitionPath["⚡ During a Partition (P)"]
    P{"Network\nPartition?"}
    AP_side["📬 Availability (A)\nReturn stale data\nSystem stays up"]
    CP_side["🔒 Consistency (C)\nReturn error\nSystem partially down"]
    P -->|"Sacrifice consistency"| AP_side
    P -->|"Sacrifice availability"| CP_side
  end

  subgraph NormalPath["🟢 Normal Operation (EC)"]
    EC{"No Partition\n— Choose trade-off"}
    EL_side["⚡ Latency (L)\nAsync replication\nFast writes · stale reads"]
    EC_side["✅ Consistency (C)\nSync quorum write\nSlow writes · fresh reads"]
    EC -->|"Optimize for speed"| EL_side
    EC -->|"Optimize for correctness"| EC_side
  end
How well-known systems classify under PACELC (partition behavior, then normal-operation behavior):

  • DynamoDB: P→A (stays available during partition); E→L (eventual consistency by default, fast writes)
  • Cassandra: P→A (any node responds during partition); E→L (LOCAL_ONE reads skip quorum for speed)
  • HBase: P→C (requires HDFS quorum, fails if lost); E→C (all writes go through quorum)
  • ZooKeeper: P→C (refuses reads if quorum lost); E→C (all reads confirm with quorum via the Zab protocol)
  • Google Spanner: P→C (refuses if quorum unreachable); E→C (TrueTime uncertainty window, ~7ms per write)
  • MongoDB: configurable (w:majority gives P→C; w:1 gives P→A); E→L or E→C depending on readConcern

Understanding PACELC helps you answer the follow-up most candidates can't: "What is your system's latency vs. consistency tradeoff on the happy path — when there is no partition?" I've seen this exact question stump senior engineers who had a confident CP vs. AP answer just thirty seconds earlier.


Trade-offs

Pros:

  • Forces an explicit design decision — "are stale reads acceptable for this data?" is a question every distributed system should answer intentionally
  • CP systems give strong correctness guarantees — safety-critical data stays accurate even during failures
  • AP systems maintain service availability — users keep receiving responses even when inter-node connectivity is lost
  • The framework makes database selection principled: pick Cassandra for feeds (AP) and ZooKeeper for locks (CP), not by marketing
  • Helps identify when a single-node database should stay single-node rather than be distributed unnecessarily

Cons:

  • The CAP framing is binary; real systems are more nuanced — PACELC captures the normal-path latency tradeoff that CAP ignores
  • CP systems reduce availability — users receive errors during partitions, which is a measurable SLA degradation
  • AP systems require conflict resolution (LWW, vector clocks, CRDTs) — all operationally complex to implement correctly
  • CAP doesn't address latency — a system can be CP and still have 200ms writes; PACELC is needed for the full picture
  • Many "consistency violations" in AP systems can be avoided with session consistency — making the CP/AP binary less relevant at the application layer
The fundamental tension here is data correctness vs. service availability — every replica you add to improve availability introduces a new window where that replica's data might diverge from the primary. The right answer is almost always different per data type: AP for feeds and analytics where stale data is invisible, CP for money and locks where stale data has real-world consequences. My recommendation: build a two-column table for your design — data type on the left, CP or AP on the right — and be ready to defend every row.


When to Use It / When to Avoid It

So when does this actually matter? Here's the decision in plain terms.

Choose CP (sacrifice availability during partitions) when:

  • Data correctness is more expensive than downtime: financial balances, inventory counts, session tokens, distributed locks
  • The downstream action is irreversible: shipping an order, charging a card, sending a legal document, dispatching emergency services
  • Regulatory compliance requires accurate data at all times: banking, healthcare (patient records), payment processing
  • Your system coordinates distributed behavior: leader election, task scheduling, configuration management

Choose AP (sacrifice consistency during partitions) when:

  • Stale data is invisible or inconsequential to end users: social feeds, recommendations, personalization, activity logs
  • You need multi-region active-active write throughput: global gaming leaderboards, content delivery, analytics ingestion
  • Your data naturally handles eventual consistency: counters that merge, sets that union, strings where last-write-wins is correct
  • Your consistency requirement can tolerate a bounded staleness window (e.g., "never more than 5 seconds behind")

If a data type doesn't map cleanly onto either list, default to CP — a wrong AP choice is often irreversible, while a wrong CP choice is merely slower.
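A bounded-staleness requirement like "never more than 5 seconds behind" can be enforced at read time. This is a sketch only; `replicationLagMs` and the error type are illustrative assumptions, not a real driver API:

```typescript
interface ReplicaRead<T> {
  value: T;
  replicationLagMs: number; // estimated lag behind the primary
}

class StalenessExceededError extends Error {}

// AP-with-a-bound: serve the local replica only while its lag is
// inside the agreed window; beyond that, degrade to a CP-style error.
function readWithStalenessBound<T>(
  read: ReplicaRead<T>,
  maxStalenessMs: number
): T {
  if (read.replicationLagMs > maxStalenessMs) {
    throw new StalenessExceededError(
      `replica is ${read.replicationLagMs}ms behind; bound is ${maxStalenessMs}ms`
    );
  }
  return read.value;
}
```

The design point: a staleness bound turns the CP/AP binary into a dial — the system behaves AP while lag is small and fails CP-style only when the data is too old to be useful.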

Interview tip: it's a per-table decision, not a per-system decision

Don't say "we'll use an AP database." Say: "The payments table uses quorum reads and writes — CP behavior to prevent double-charges. The user timeline table uses eventual consistency — AP behavior, because a 200ms-stale timeline is invisible to users." Most production systems at scale use CP semantics for critical data and AP semantics for high-volume, low-stakes data — sometimes within the same database cluster by varying the read/write consistency level per query.
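The per-table framing can be made concrete as a policy map consulted on every query. Table names and levels here are illustrative, loosely mirroring Cassandra's per-query consistency levels:

```typescript
type ConsistencyLevel = "QUORUM" | "ONE";

// Hypothetical policy: CP semantics for money, AP semantics for telemetry.
const tablePolicy: Record<string, ConsistencyLevel> = {
  payments: "QUORUM",     // CP-leaning: prevent double-charges
  user_timeline: "ONE",   // AP-leaning: 200ms-stale reads are invisible
  activity_events: "ONE", // AP-leaning: high volume, low stakes
};

function consistencyFor(table: string): ConsistencyLevel {
  // Default to the stronger level when a table isn't classified,
  // matching the "default to CP" rule above.
  return tablePolicy[table] ?? "QUORUM";
}
```

Writing the policy down as data (rather than scattering consistency levels across call sites) also gives you the defensible two-column table interviewers ask for.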


Real-World Examples

Here's how the best engineering teams actually apply this. In every case below, the CP vs. AP decision was made deliberately, per data type — not as a blanket system choice.

Amazon DynamoDB — AP with tunable consistency by design

Amazon built Dynamo after a 2004 incident where a shopping cart service became unavailable due to a cascading consistency failure. The original Dynamo paper (DeCandia et al., 2007) explicitly chose AP: "Dynamo targets applications that need to be highly available and whose designers are willing to accept stale data." DynamoDB returns stale data during partitions and offers eventually consistent reads (fastest, cheapest) and strongly consistent reads (slower, 2× the cost).

At peak, DynamoDB handles over 89.2 million requests per second globally. Amazon uses DynamoDB for shopping carts, user sessions, and recommendation data — but uses Aurora or custom transactional stores (CP semantics) for financial transaction finality. The lesson: Amazon consciously operates a mixed CAP architecture — AP for volume, CP for money.

Google Spanner — CP at planetary scale

Google Spanner, launched in 2012, achieves external consistency — effectively global linearizability across distributed transactions — across data centers worldwide. It uses TrueTime — GPS-disciplined atomic clocks in every Google data center — to bound clock uncertainty to ±4ms. Every write waits for TrueTime's uncertainty window to close before committing, guaranteeing that any subsequent read from any datacenter on Earth sees the write.

The cost: each write adds a mandatory ~7ms of latency — the TrueTime uncertainty window. Spanner sacrifices those 7ms to guarantee that a read anywhere in the world, at any time after a write confirmation, will see the write. Google uses Spanner for AdWords billing and Google Fi subscriptions — billions of dollars in transactions per day held at global CP guarantees with 99.999% availability.

What I find remarkable about Spanner is that it doesn't escape CAP — it's still CP. It just makes the availability sacrifice so small that the cost of correctness becomes nearly imperceptible.
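Spanner's commit wait can be caricatured in a few lines: hold the commit until TrueTime's uncertainty interval has fully passed, so no clock anywhere can still read a time before the chosen commit timestamp. `epsilonMs` and the polling loop are assumptions for illustration, not the real TrueTime API:

```typescript
// Caricature of TrueTime: now() returns an interval [earliest, latest]
// whose width is the clock uncertainty (2 * epsilon).
const epsilonMs = 4; // illustrative value, matching the ~±4ms bound above
function trueTimeNow(): { earliest: number; latest: number } {
  const t = Date.now();
  return { earliest: t - epsilonMs, latest: t + epsilonMs };
}

// Commit wait: pick the commit timestamp at the top of the uncertainty
// window, then wait until even the slowest clock has passed it.
async function commitWait(): Promise<number> {
  const commitTs = trueTimeNow().latest;
  while (trueTimeNow().earliest < commitTs) {
    await new Promise((resolve) => setTimeout(resolve, 1)); // wait out ~2 * epsilon
  }
  return commitTs; // safe: every node's clock now exceeds commitTs
}
```

The waiting loop is the ~7ms write latency discussed above: consistency is bought with a short, bounded delay rather than with unavailability.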

Netflix on Cassandra — AP with intentional staleness bounds

Netflix migrated from Oracle to Cassandra for user activity data in 2011. At peak, Netflix writes over 1 million events per second to Cassandra. They explicitly chose AP: "We accept stale data in the recommendation engine to never see errors. A user seeing a recommendation 500ms behind reality is invisible. A user seeing a 503 error in the recommendation engine is not."

Netflix tunes Cassandra's consistency levels per query type: QUORUM for user preferences (few writes, correctness matters) and ONE for activity events (high volume, staleness acceptable). The Cassandra cluster spans 3+ AWS regions active-active — a topology that would be impractical for a CP system, where quorum across 3 continents would add ~200ms of write latency per event. The pattern across Amazon, Google, and Netflix is the same: CP for money and coordination, AP for everything else at scale.


How This Shows Up in Interviews

When to bring up CAP proactively

In any design involving multiple nodes storing the same data — replicated databases, distributed caches, multi-region deployments — state your CAP choice within the first components you sketch. "We'll use Cassandra here, which is an AP system. During a partition, reads might return data up to 200ms stale — that's acceptable for activity feeds. For payment confirmation, we'll use a separate CP service backed by a quorum-configured store." That one sentence separates candidates who understand distributed systems from those who just know database names.

The most common interview error on this topic is confusing CAP Consistency (linearizability) with ACID Consistency (constraint enforcement) — they are completely unrelated. Get that distinction locked down before anything else.

Depth expected at senior/staff level:

  • Know the difference between CAP Consistency (linearizability) and ACID Consistency (constraint enforcement) — they are unrelated; confusing them is the most common interview error on this topic
  • Identify real partitions that happen in production: BGP flaps, AZ isolation, hot-spot congestion dropping packets, kernel networking bugs
  • Choose CP vs AP per data type, not per system — explain which of your design's data is CP and which is AP
  • Know PACELC as the extension of CAP — the consistency-latency tradeoff matters on the normal path too, not only during partitions
  • Explain conflict resolution in AP systems (LWW, vector clocks, CRDTs) and when each breaks down

When an interviewer asks "can we avoid CAP with a better network?" — that's your cue. Address it directly: improving network reliability reduces partition frequency, not partition possibility.

Common follow-up questions and strong answers:

Q: "Why can't we just have all three?"
A: "Partition tolerance isn't optional — every production distributed system experiences network partitions. Given P is required, the choice reduces to C or A during a partition. Two nodes cannot be consistent (agree on the same value) and available (respond immediately) if they can't communicate with each other."

Q: "If we use a single-region database, does CAP still apply?"
A: "Within a single region, partition probability drops dramatically, but partitions still occur — switch failures, NIC brownouts, kernel networking bugs. More importantly, a single-node database isn't distributed, so CAP doesn't technically apply — but the moment you add a replica for high availability, CAP applies immediately. The replica is now a second node and the link between them can partition."

Q: "Is Cassandra always AP? Can it be CP?"
A: "Yes — Cassandra's consistency is tunable per query. With CONSISTENCY ALL, Cassandra requires every node to acknowledge a write; if any node is down, the write fails — CP behavior. With CONSISTENCY ONE, only one node acknowledges — AP behavior. Most production deployments mix levels: QUORUM for critical user data, ONE for high-volume telemetry."

Q: "What's the difference between CAP and PACELC?"
A: "CAP only describes behavior during partition events. PACELC extends it: on the normal path (no partition), every replication strategy still trades consistency for latency. Synchronous replication adds 20–100ms per write waiting for replicas to confirm. Async replication adds near-zero latency but serves potentially stale reads. PACELC makes this normal-path tradeoff explicit — CAP misses it entirely."

Q: "How does Spanner achieve global strong consistency without violating CAP?"
A: "Spanner doesn't violate CAP — it's CP, choosing consistency over availability. What makes it extraordinary is how small its availability sacrifice is: TrueTime bounds clock uncertainty to ~4ms, so the 'wait for consistency' is only ~7ms per write rather than hundreds of milliseconds. If Spanner's global quorum is disrupted, writes fail — it's available 99.999% of the time, but the 0.001% is a real CP unavailability event."


Quick Recap

  1. CAP Theorem states a distributed system can guarantee at most two of: Consistency (linearizability), Availability (always respond), and Partition Tolerance (survive network failures) — never all three simultaneously.
  2. Partition Tolerance is non-negotiable in any real distributed system — network failures happen at every scale, in every cloud, on any hardware — so the true choice is always CP vs AP.
  3. CP systems (ZooKeeper, HBase, etcd, Spanner) sacrifice availability during partitions — they return errors rather than serve data that might be stale, because in their domains (locks, coordination, financial records), wrong data is worse than no data.
  4. AP systems (Cassandra, DynamoDB, CouchDB, Riak) sacrifice consistency during partitions — they return their local view of data and converge after the partition heals via LWW or vector clock conflict resolution.
  5. CAP Consistency (linearizability) is completely unrelated to ACID Consistency (constraint enforcement) — confusing the two is the most common and most costly interview error on distributed systems topics.
  6. PACELC extends CAP: even without a partition, every distributed replication strategy trades latency for consistency on the normal path — synchronous quorum writes are consistent but slow; async replication is fast but stale.
  7. The right answer is almost always per-data-type: CP for money, locks, auth tokens, and safety-critical data; AP for feeds, counts, recommendations, and telemetry where bounded staleness is acceptable.

Related Concepts

  • Consistency Models — CAP Consistency is one point on a full spectrum. Eventual, monotonic, session, causal, and linearizable consistency form the complete spectrum — reading that article next shows how to select the right model per data type rather than defaulting to the strongest or weakest.
  • Replication — The mechanical reality behind CP vs AP choices: how primary-replica setups implement replication lag, semi-sync replication, quorum writes, and WAL-based consistency — the actual system behavior that embodies the CAP tradeoff.
  • Databases — SQL vs NoSQL framing maps directly onto CAP choices: most NoSQL databases explicitly chose AP for horizontal scale; SQL databases historically chose either CA (single-node) or CP (distributed, strict quorum). Understanding CAP makes database selection principled.
  • Sharding — Sharding distributes writes horizontally, but each shard is itself a distributed system making its own CAP choice. Understanding CAP per-shard is necessary for designing correct sharded architectures where cross-shard transactions need explicit consistency models.
  • Microservices — In a microservices architecture, each service's data store makes its own CAP choice independently. Cross-service eventual consistency (saga pattern) emerges from composing AP-style services — CAP understanding is a prerequisite to understanding saga correctness and failure compensation.
