The real tradeoff at the heart of the CAP theorem, what it means to choose availability over consistency during a partition, which real systems sit where on this spectrum, and how PACELC extends the picture.
23 min read · 2026-04-04 · hard · availability · consistency · cap-theorem · pacelc · distributed-systems
| Data type | Use strong consistency (CP)? | Use eventual consistency (AP)? |
|---|---|---|
| Metrics/analytics dashboards | Never for writes | Read replicas for dashboards are fine |
| Social media feeds | Overkill. A slightly stale like count is acceptable | Default. An unavailable feed is worse than a stale one |
| Configuration/metadata | Strongly prefer. Stale config can route to dead nodes | Only if config changes are rare and convergence is fast |
| User sessions | If session = auth token with permissions | If session = UI preferences or cart state |
| Inventory/booking | Yes, for the final decrement (prevent overselling) | For catalog browsing and availability estimates |
Most real systems don't live at the extremes. Tunable consistency (per-query in Cassandra, per-read in DynamoDB) lets you pick the right tradeoff for each operation.
It's 3 AM. Your payment service's east-coast datacenter loses connectivity to the west-coast datacenter. A customer in New York submits a $500 payment. The east-coast node has $500 in the account balance. The west-coast node also has $500 (from before the partition). Both datacenters are receiving payment requests.
You have two choices.
Refuse payments on at least one side until connectivity is restored (CP). Customers get errors. Revenue stops. Your phone rings. But no one gets overcharged.
Accept payments on both sides and reconcile later (AP). Customers are happy, revenue flows. Until two payments totaling $1,000 clear against a $500 balance. Now you're eating the loss, or explaining to regulators why your system allowed negative balances.
This is the real decision behind the CAP theorem. It's not academic. It's the question you answer (explicitly or by accident) every time you design a distributed write path.
A CP system prioritizes correctness over availability. During a network partition, any node that cannot confirm it has the latest data refuses to serve requests.
```python
# CP write path (simplified)
def write(key, value):
    ack_count = replicate_to_nodes(key, value)
    if ack_count < quorum:  # quorum = (N/2) + 1
        raise UnavailableError  # reject write, stay consistent
    return success

# CP read path
def read(key):
    responses = query_quorum_nodes(key)
    if not enough_responses(responses):
        raise UnavailableError  # refuse stale read
    return latest_value(responses)
```
The cost is clear: when nodes can't communicate, the system rejects operations. Users see errors or timeouts. For a banking system, this is correct behavior: an error is better than a wrong balance. For a social media feed, this means users stare at a spinner instead of seeing content.
Real-world CP example: Google Spanner uses TrueTime (atomic clocks + GPS) to assign globally consistent timestamps. Every write requires a cross-region quorum. If a region is partitioned, writes to affected data stall until the partition heals. Spanner trades latency for linearizability.
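Spanner's commit-wait idea fits in a few lines. The sketch below is a toy model, not Spanner's implementation: `tt_now`, `commit_wait`, and the 7 ms uncertainty bound are all assumptions for illustration.

```python
import time

EPSILON = 0.007  # assumed clock uncertainty (~7 ms); a stand-in, not Spanner's real bound

def tt_now():
    # Toy TrueTime: return an uncertainty interval [earliest, latest]
    # instead of a single instant.
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit_wait(commit_ts):
    # Spanner-style commit wait: do not acknowledge a write until its
    # timestamp is guaranteed to be in the past on every node.
    while tt_now()[0] <= commit_ts:
        time.sleep(0.001)

# Assign the write the latest edge of the current interval, then wait
# out the uncertainty (~2 * EPSILON) before acknowledging.
commit_ts = tt_now()[1]
commit_wait(commit_ts)
```

This is why Spanner writes have a latency floor: every acknowledgment waits out the clock uncertainty window.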
An AP system prioritizes responsiveness. During a partition, every node continues to accept reads and writes using whatever data it has locally. Conflicts are resolved after the partition heals.
```python
# AP write path (simplified)
def write(key, value, vector_clock):
    store_locally(key, value, vector_clock)
    try:
        replicate_async(key, value, vector_clock)
    except PartitionError:
        queue_for_later(key, value)  # will sync when partition heals
    return success  # always respond

# AP read path
def read(key):
    return local_value(key)  # might be stale, but always available

# After partition heals
def reconcile(key, versions):
    if versions_conflict(versions):
        resolved = resolve(versions)  # LWW, vector clocks, or app logic
        propagate(key, resolved)
```
The cost: stale reads and write conflicts. If two users update the same record on different nodes during a partition, you get two conflicting versions. Someone (the system or the application) must resolve them.
Real-world AP example: Cassandra with CONSISTENCY ONE. Any single replica responds to reads and accepts writes. During normal operation, background anti-entropy (read repair, Merkle tree sync) keeps replicas converged. During a partition, replicas diverge and converge after healing. The default conflict resolution is last-write-wins (LWW) based on client-supplied timestamps.
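Last-write-wins is simple enough to sketch, and the sketch makes its hazard visible: the losing writes are silently discarded. The function name below is illustrative, not a Cassandra API.

```python
def lww_resolve(versions):
    # versions: (timestamp, value) pairs collected from diverged replicas.
    # Keep the value with the highest timestamp; losing writes are
    # silently dropped -- the hidden cost of LWW.
    return max(versions, key=lambda v: v[0])[1]

# Replicas diverged during a partition; the later timestamp wins.
lww_resolve([(100, "alice@old.com"), (250, "alice@new.com")])
# -> "alice@new.com"
```

With client-supplied timestamps, a node with a skewed clock can "win" over a genuinely newer write, which is why LWW is only safe for data where dropping an update is tolerable.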
The fundamental tension is correctness vs. responsiveness. CP systems guarantee you never see wrong data, at the cost of sometimes seeing no data. AP systems guarantee you always see data, at the cost of sometimes seeing stale or conflicting data.
Choose consistency (CP) whenever incorrect data causes real harm:
- Financial transactions. Double-spending, negative balances, duplicate charges. A banking system that shows the wrong balance is broken. An unavailable banking system is inconvenient but not dangerous.
- Inventory management at the final decrement. Selling the last concert ticket to two people simultaneously means one gets a broken promise. The checkout must be strongly consistent at the point of commitment, even if browsing is eventually consistent.
- Distributed coordination. Leader election (etcd, ZooKeeper), lock services, configuration management. A stale configuration that routes traffic to a dead node causes cascading failure.
- Regulatory compliance. Financial reporting, healthcare records, audit logs. If regulators find two conflicting versions of a transaction, you have a problem.
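The "final decrement" pattern can be sketched as a compare-and-set. In this toy version a local lock stands in for the quorum write a real CP store would perform; the class and names are illustrative.

```python
import threading

class Inventory:
    def __init__(self, stock):
        self.stock = stock
        self._lock = threading.Lock()  # stands in for a quorum/consensus write

    def reserve(self):
        # Check and decrement atomically: two concurrent buyers can
        # never both take the last unit.
        with self._lock:
            if self.stock <= 0:
                return False  # sold out: return an error, never oversell
            self.stock -= 1
            return True

inv = Inventory(stock=1)
inv.reserve()  # True: got the last ticket
inv.reserve()  # False: rejected rather than oversold
```

The same shape appears in real systems as conditional writes (DynamoDB `ConditionExpression`) or Cassandra lightweight transactions (`IF` clauses backed by Paxos).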
For your interview: when the system manages money, identity, or coordination state, say "I'd choose consistency here" and explain that errors are preferable to incorrect data.
Choose availability (AP) when stale data is acceptable and downtime is not:
- Social media feeds. A like count that's 3 seconds stale is invisible to users. A feed that shows a loading spinner for 30 seconds during a partition drives users away.
- Content delivery. Blog posts, product descriptions, marketing pages. Slightly stale content is undetectable. Unavailable content means lost revenue.
- Metrics and analytics dashboards. Real-time-ish is sufficient. A dashboard showing data that's 10 seconds old is perfectly functional. A dashboard that returns errors is useless.
- DNS and service discovery. Returning a slightly stale IP address is better than returning no IP address. The client can retry against the stale endpoint; it can't retry against nothing.
- Shopping cart state. Users rarely notice a cart that's a few seconds behind. They definitely notice "Service Unavailable" when trying to add items.
The CAP theorem only describes what happens during a partition. Most of the time, there is no partition. PACELC extends the model: during a Partition, choose Availability or Consistency. Else (normal operation), choose Latency or Consistency.
| System | During Partition | Else (Normal) | Classification |
|---|---|---|---|
| Spanner | Consistency (reject writes) | Consistency (sync quorum) | PC/EC |
| Cassandra | Availability (accept writes) | Latency (local reads) | PA/EL |
| DynamoDB | Availability (eventual mode) | Configurable (strong or eventual) | PA/EL or PA/EC |
| MongoDB | Consistency (default majority) | Latency (local reads from primary) | PC/EL |
| PostgreSQL (sync repl) | Consistency (block writes) | Consistency (sync commit) | PC/EC |
This matters because PACELC reveals the hidden tradeoff in normal operation. Cassandra is fast during normal operation (EL) because local reads don't wait for quorum. Spanner is slow during normal operation (EC) because every read waits for consensus. Most of the time, the Else choice (latency vs. consistency in normal operation) matters more than the Partition choice, because partitions are rare.
The most practical answer in interviews: most modern distributed databases let you choose per-query.
```python
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# Cassandra: tune consistency per query
strong_read = SimpleStatement(
    "SELECT * FROM users WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,  # strong
)
session.execute(strong_read, [user_id])

fast_read = SimpleStatement(
    "SELECT * FROM feed WHERE user_id = %s",
    consistency_level=ConsistencyLevel.ONE,  # fast, potentially stale
)
session.execute(fast_read, [user_id])

# DynamoDB: tune per read
dynamodb.get_item(
    TableName='orders',
    Key={'order_id': {'S': '12345'}},
    ConsistentRead=True  # strong consistency, ~2x latency
)
```
Cassandra's consistency levels form a spectrum: ONE (any single replica, fastest), QUORUM (majority of replicas, strongly consistent when R + W > N), ALL (every replica, slowest but guaranteed fresh). The formula is: if read_consistency + write_consistency > replication_factor, you get strong consistency for that operation pair.
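The R + W > N rule holds because the read and write quorums must then overlap in at least one replica. A quick check (the function name is illustrative):

```python
def strongly_consistent(r, w, n):
    # If r + w > n, any read quorum intersects any write quorum in at
    # least one replica, so a read always sees the latest acked write.
    return r + w > n

strongly_consistent(2, 2, 3)  # QUORUM reads + QUORUM writes, RF=3 -> True
strongly_consistent(1, 1, 3)  # ONE + ONE, RF=3 -> False (stale reads possible)
```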
Amazon DynamoDB: Originally built as PA/EL (the 2007 Dynamo paper). Eventually added strongly consistent reads as an option. Most Amazon services use eventual consistency by default for shopping, recommendations, and browsing. Order processing and inventory decrements use strongly consistent reads. Werner Vogels famously said: "Consistency is not a one-size-fits-all property."
Google Spanner: The most extreme CP system in production. Uses TrueTime (atomic clocks + GPS receivers in every datacenter) to assign globally consistent timestamps. Every write requires a cross-region quorum taking 10-15ms even between nearby regions. The tradeoff: Spanner can guarantee linearizability across the entire planet, but write latency has a hard floor imposed by the speed of light. Google accepted this for their advertising and financial systems where consistency is non-negotiable.
Cassandra at Apple: Apple runs one of the largest Cassandra deployments globally (over 150,000 nodes). Most operations use LOCAL_QUORUM (consistent within a datacenter, eventually consistent across datacenters). This gives sub-5ms reads in-region while tolerating cross-region partitions. For iCloud Keychain (passwords, security credentials), they layer application-level consistency on top of Cassandra's eventual consistency, using device-specific encryption and conflict-free merge semantics.
Interview tip: name the consistency requirement per data path
Don't say "I'd make the system consistent." Say: "The payment write path needs strong consistency to prevent double-spending. The product catalog reads can be eventually consistent with a 5-second staleness window. The user session can tolerate a few seconds of staleness across regions."
Common mistake: claiming you can have both
CAP is about partition behavior. During normal operation, you can have both consistency and availability. But when a candidate says "I'll just use a highly available AND strongly consistent database," the interviewer is testing whether you understand that during a partition (which WILL happen in any multi-node system), you must choose. Show you understand the tradeoff exists.
When to bring this up proactively:
- Any multi-region architecture (the partition is the WAN link)
- Database selection discussions (Cassandra vs. PostgreSQL vs. DynamoDB)
- Whenever the system handles money, inventory, or coordination state
Depth expected at senior/staff level:
- Explain PACELC (the latency-consistency tradeoff during normal operation)
- Distinguish per-operation consistency from system-wide consistency
| Interviewer asks | Strong answer |
|---|---|
| "Is your system CP or AP?" | "Neither globally. Payment writes are CP (quorum writes, consistent reads). Feed reads are AP (single replica, eventually consistent). We tune per operation." |
| "What happens to writes during a partition?" | "Depends on the data. Financial writes are rejected (CP). Cart updates are accepted locally and reconciled (AP with LWW). We lose revenue on rejected payments but never double-charge." |
| "How do you handle conflicts in an AP system?" | "Three strategies by data type: LWW for last-modified timestamps, app-level merge for shopping carts (union of items), and reject-and-retry for inventory decrements." |
| "Why not just use Spanner for everything?" | "Spanner is PC/EC: consistent everywhere, but writes take 10-15ms minimum (speed of light across regions). For a social feed that needs sub-5ms reads, that's too slow. We'd use Cassandra LOCAL_QUORUM for the feed and Spanner for financial data." |
| "How does DynamoDB handle this?" | "DynamoDB defaults to eventually consistent reads (fast, cheap). You can opt into strongly consistent reads per request at ~2x latency and cost. For writes, DynamoDB uses conditional writes (optimistic locking) to prevent conflicts." |
The CAP theorem says that during a network partition, you must choose between consistency (reject requests to avoid stale data) and availability (serve requests with potentially stale data).
- CP systems (Spanner, etcd, ZooKeeper) guarantee correctness at the cost of availability during partitions; choose them for money, coordination, and compliance.
- AP systems (Cassandra, DynamoDB eventual mode, DNS) guarantee responsiveness at the cost of potential conflicts; choose them for feeds, caches, and content.
- PACELC extends CAP: during normal operation, the tradeoff is latency vs. consistency, which matters more often than the partition scenario.
- Tunable consistency (Cassandra CL levels, DynamoDB ConsistentRead) lets you choose per-query, making the tradeoff per-operation rather than system-wide.
- If you choose AP, you must name your conflict resolution strategy: LWW, vector clocks, CRDTs, or application-level merge.
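Of those strategies, vector clocks are the least obvious: the core operation is deciding whether two versions are causally ordered or truly concurrent. A minimal sketch, assuming the common representation of a clock as a dict of node id to event counter:

```python
def vc_compare(a, b):
    # Compare vector clocks a and b (dicts: node id -> event counter).
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"       # b has seen everything a has: no conflict
    if b_le_a:
        return "after"
    return "concurrent"       # neither saw the other: a real conflict

vc_compare({"n1": 1}, {"n1": 2})                    # "before": safe to overwrite
vc_compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 2})  # "concurrent": must merge
```

"Concurrent" is the case LWW papers over: the system knows neither version dominates, and something (app logic, a CRDT merge, or the user) has to decide.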