๐Ÿ“HowToHLD

Replication

Master how database replication scales reads, survives failures, and trades consistency for availability. Learn what replica lag is, when to serve stale data on purpose, and why your most critical business logic must run on the primary.

44 min read · 2026-03-24 · medium · replication · databases · scalability · availability · hld

TL;DR

  • Replication is copying database data from a primary (writer) to one or more replicas (readers). A replica lags behind by milliseconds to seconds: it is always a slightly stale snapshot.
  • Replicas absorb read traffic, scaling your read capacity horizontally. The primary still handles all writes and enforces consistency.
  • Replication lag exposes the consistency trade-off: if a write has reached the primary but not yet a replica, reads from that replica serve stale data. This is normal, expected operation, not a failure.
  • Synchronous replication waits for the replica to acknowledge before confirming the write to the client: slow (adds 10–100ms per write), but zero data loss. Asynchronous replication confirms writes immediately and replicates in the background: fast, but in-flight writes can be lost if the primary crashes.
  • The real trade-off is between consistency, availability, and latency. You can't have all three at once; replication forces you to choose which two matter most for each data type.
  • Replica lag is your blast radius for stale reads. At 100ms lag, your system might serve 100ms-old data; at 1s lag, 1-second-old data. That number must match the consistency requirements of each query.

The Problem It Solves

Your SaaS platform serves users in three regions: US, Europe, and Asia. You have a primary database in us-east handling 100K queries/second, and 80% of those are reads: fetching user settings, product catalogs, activity feeds. Your database has 50 read connections in its pool.

At peak load, you're hitting roughly 80K reads/second against a connection pool designed for 50 concurrent queries. Connection contention spikes query time from 5ms to 500ms. Users on the other side of the planet are also latency-limited: a 150ms round-trip to the US primary before you even get a result.

I see this exact setup in most system design interviews. The candidate spots "database is slow" and reaches for vertical scaling, completely missing the structural problem.

flowchart TD
  subgraph US["🌎 North America · 40K req/s"]
    Users1(["👤 US Users\n40K reads/s\nRound-trip: ~5ms"])
  end

  subgraph Europe["🌍 Europe · 35K req/s"]
    Users2(["👤 EU Users\n35K reads/s\n150ms cross-continental RTT"])
  end

  subgraph Asia["🌏 Asia · 25K req/s"]
    Users3(["👤 Asia Users\n25K reads/s\n250ms cross-continental RTT"])
  end

  subgraph USEast["🗄️ us-east · Primary Region"]
    DB1[("🟢 Primary DB\n100K queries/s\nConn pool = 50 connections\n5ms → 500ms under load")]
  end

  Users1 -->|"80K reads/s\nPool contention"| DB1
  Users2 -->|"35K reads/s + 150ms RTT\nWaiting for primary"| DB1
  Users3 -->|"25K reads/s + 250ms RTT\nWaiting for primary"| DB1

The problem is two-fold:

  1. Connection pool exhaustion: 100K queries/second needs either massive connection pooling (expensive, requires thread pools on the app tier) or a distributed query cache upstream.
  2. Geographic latency: users on the opposite side of the planet pay cross-continental network latency on every single query.

Adding more primary database instances doesn't help: each write still needs to hit the single primary, and all reads still see the same connection-pool ceiling. You need a way to distribute reads geographically and scale read capacity independently of write capacity. Copies of the same bottleneck solve nothing.
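The fix replication enables starts at the application tier: a connection router that sends every write to the primary and spreads reads across replicas. A minimal sketch (the endpoint names and class are illustrative, not a real driver API):

```typescript
// Sketch: application-side read/write splitting (illustrative names).
type Endpoint = { host: string; role: "primary" | "replica" };

class ReplicatedPool {
  private next = 0;
  constructor(
    private primary: Endpoint,
    private replicas: Endpoint[],
  ) {}

  // All writes must hit the primary.
  routeWrite(): Endpoint {
    return this.primary;
  }

  // Reads round-robin across replicas; fall back to primary if none exist.
  routeRead(): Endpoint {
    if (this.replicas.length === 0) return this.primary;
    const target = this.replicas[this.next % this.replicas.length];
    this.next++;
    return target;
  }
}

const pool = new ReplicatedPool(
  { host: "us-east", role: "primary" },
  [
    { host: "eu-west", role: "replica" },
    { host: "ap-southeast", role: "replica" },
  ],
);
```

The point of the sketch: the split is a routing decision, not a database feature, which is why the sections below spend so much time on when a read is allowed to go to a replica.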


What Is It?

Replication is the process of copying data from a single primary database (the authoritative writer) to one or more replica databases (read-only followers). Every write goes to the primary; the primary asynchronously (or synchronously) ships changes to its replicas; reads can be distributed across the primary and its replicas.

Analogy: Think of a publishing company with a single editor in New York (the primary) who approves all articles. Once approved, the editor ships copies to printing facilities in London, Tokyo, and Sydney (the replicas). Local readers in each city get the paper from their nearest facility, a few hours behind the editor's current pile.

Urgent corrections go back to the New York editor; the facilities re-issue an updated edition. Most readers won't see those corrections for hours. That lag is acceptable: newspapers aren't real-time.

But if someone asks "has this article been approved?" while the London facility is still printing an old edition, the London reader gives the wrong answer. The staleness isn't a bug; it's the trade-off you explicitly accepted when you distributed the copies.

flowchart TD
  subgraph US["🌎 North America"]
    Users1(["👤 US Users\n40K read requests/s"])
  end

  subgraph Europe["🌍 Europe"]
    Users2(["👤 EU Users\n35K read requests/s"])
  end

  subgraph Asia["🌏 Asia"]
    Users3(["👤 Asia Users\n25K read requests/s"])
  end

  subgraph USEast["🗄️ us-east"]
    Primary[("🟢 Primary DB\nAll writes\n20K writes/s\nSame-region reads")]
  end

  subgraph EUWest["🗄️ eu-west"]
    Replica1[("🔵 Read Replica\nAsync replication\n~50ms lag\n35K reads/s")]
  end

  subgraph APSoutheast["🗄️ ap-southeast"]
    Replica2[("🔵 Read Replica\nAsync replication\n~100ms lag\n25K reads/s")]
  end

  Users1 -->|"Reads"| Primary
  Users2 -->|"Reads · 150ms RTT → 50ms via replica"| Replica1
  Users3 -->|"Reads · 250ms RTT → 100ms via replica"| Replica2
  Primary -.->|"Replication stream · 50ms lag"| Replica1
  Primary -.->|"Replication stream · 100ms lag"| Replica2

With replicas in each region, European users now fetch data from the EU replica: the 150ms cross-continental round-trip disappears, replaced by a local read that may be up to 50ms stale. Asian users use the Asia replica. The primary handles all writes; replication ships those changes to the replicas asynchronously in the background.

Read capacity now scales roughly linearly as you add replicas. For your interview: default to at least two replicas beyond the primary, typically one per distant region plus one for failover.

flowchart TD
  subgraph USEast["🗄️ us-east (Primary Region)"]
    Primary[("🟢 Primary\nAll Writes + Reads\nus-east\n\n20K writes/s\n50K reads/s")]
  end

  subgraph EUWest["🗄️ eu-west (Replica Region)"]
    ReplicaEU[("🔵 Read Replica\nAsync Lag: 50ms\neu-west\n\n30K reads/s")]
  end

  subgraph APSoutheast["🗄️ ap-southeast (Replica Region)"]
    ReplicaAsia[("🔵 Read Replica\nAsync Lag: 120ms\nap-southeast\n\n20K reads/s")]
  end

  subgraph UsersLayer["👥 Users by Region"]
    UsersUS["👤 US Users\n50K req/s\nRound-trip: ~5ms"]
    UsersEU["👤 EU Users\n30K req/s\nRound-trip: 150ms RTT\n→ 50ms via replica"]
    UsersAsia["👤 Asia Users\n20K req/s\nRound-trip: 250ms RTT\n→ 120ms via replica"]
  end

  UsersUS -->|"Writes + Reads"| Primary
  UsersEU -->|"Reads only"| ReplicaEU
  UsersAsia -->|"Reads only"| ReplicaAsia
  Primary -.->|"Replication stream\n50ms latency"| ReplicaEU
  Primary -.->|"Replication stream\n120ms latency"| ReplicaAsia

How It Works

Write-Primary, Read-Anywhere Flow

sequenceDiagram
    participant C1 as 👤 Client (US)
    participant C2 as 👤 Client (Asia)
    participant P as 🟢 Primary (us-east)
    participant R1 as 🔵 Replica (eu-west)
    participant R2 as 🔵 Replica (ap-southeast)

    Note over C1,R2: WRITE PHASE

    C1->>P: UPDATE user SET name = Alice
    activate P
    P->>P: Persist to WAL · ~0.1ms
    P-->>C1: 200 OK · write confirmed
    deactivate P

    P-.->R1: Replicate · begin TX
    P-.->R2: Replicate · begin TX

    Note over C1,R2: ASYNC REPLICATION (runs in background)

    par EU Replica
      activate R1
      R1->>R1: Write to log
      R1->>R1: Apply TX · ~10ms
      deactivate R1
    and Asia Replica
      activate R2
      R2->>R2: Write to log
      R2->>R2: Apply TX · ~15ms
      deactivate R2
    end

    Note over C1,R2: READ PHASE (concurrent with replication)

    C2->>R2: SELECT name WHERE id = 1
    Note over R2: Lag ~15ms · TX not yet applied
    R2-->>C2: 200 OK · name = OldName (stale)

    Note over C2,R2: Asia sees OldName for ~15ms<br/>Eventually consistent

  1. Client writes to primary: "UPDATE user SET name='Alice'". The primary confirms the write to the client after persisting it locally (~0.1ms).
  2. Replication stream starts in parallel: the primary begins shipping the write to all replicas asynchronously.
  3. Replicas apply writes sequentially: each replica receives a log of transactions and applies them in order. This takes 10–100ms depending on network and replica I/O.
  4. Readers might see stale data during the lag window: a client reading from the Asia replica while it is still applying the write gets the old value. This is replication lag, and it is normal operation.
  5. Eventual consistency: after all replicas apply the transaction, reads see the new value everywhere.
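The five steps above can be condensed into a toy model: a shared log, a primary that confirms before replication, and a replica that replays the log in order. This is illustrative only (an in-memory stand-in, not a real replication protocol):

```typescript
// Toy model of the read-during-lag window described above.
// A replica serves whatever value its last applied log offset covers.
interface Node { data: Map<string, string>; appliedOffset: number }

const log: { offset: number; key: string; value: string }[] = [];

function write(primary: Node, key: string, value: string): void {
  const offset = log.length + 1;
  log.push({ offset, key, value });
  primary.data.set(key, value);
  primary.appliedOffset = offset; // write is confirmed to the client here (async replication)
}

function replicate(replica: Node, arrivedUpTo: number): void {
  // The replica replays log entries in order, up to what has arrived.
  for (const e of log) {
    if (e.offset > replica.appliedOffset && e.offset <= arrivedUpTo) {
      replica.data.set(e.key, e.value);
      replica.appliedOffset = e.offset;
    }
  }
}

const primary: Node = { data: new Map([["name", "OldName"]]), appliedOffset: 0 };
const replica: Node = { data: new Map([["name", "OldName"]]), appliedOffset: 0 };

write(primary, "name", "Alice");
const stale = replica.data.get("name");  // still "OldName": the lag window
replicate(replica, 1);                   // replication stream catches up
const fresh = replica.data.get("name");  // now "Alice": eventually consistent
```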

Replication Internals: Log-Based Replication

Most modern databases (MySQL with binlog, PostgreSQL with WAL, MongoDB oplog) use log-based replication:

(Figure: primary in us-east with read replicas in eu-west and ap-southeast; asynchronous replication streams at 50ms and 120ms, with regional users reading from their nearest replica. Replicas absorb regional read traffic, cutting round-trip latency from 150–250ms down to 50–120ms of replication lag; all writes still route through the single primary.)
// Simplified: how a primary replicates writes

// On the primary:
let nextLogOffset = 0; // monotonically increasing log position

async function applyWrite(query: string): Promise<void> {
  // 1. Apply locally first
  await db.execute(query);

  // 2. Append to the replication log
  const logEntry = {
    offset: nextLogOffset++,
    timestamp: Date.now(),
    query,
    hash: sha256(query), // for detecting divergence
  };
  await replicationLog.append(logEntry);

  // 3. Ship to replicas asynchronously (don't wait)
  replicaServers.forEach((replica) => {
    replica.queue.enqueue(logEntry); // non-blocking
  });
}

// On each replica:
let offset = 0; // last applied log offset

async function consumeReplicationStream(): Promise<void> {
  while (true) {
    // Fetch the next entry after our last applied offset
    const entry = await primary.replicationLog.fetch(offset);
    if (!entry) {
      // Caught up; wait for new entries
      await sleep(10);
      continue;
    }

    // Apply the transaction locally (same order as the primary, for consistency)
    await db.execute(entry.query);
    offset = entry.offset;
  }
}

The key insight: replicas don't compute independently; they replay the exact sequence of writes from the primary's log, in order. This ensures logical consistency even if the replica's hardware or load differs from the primary's.

The mistake I see most often is candidates assuming replicas can resolve conflicts or "catch up intelligently." They can't. A replica is a deterministic replay engine: it does exactly what the primary did, in the same sequence, every time.


Key Components

| Component | Definition | Operational Impact |
| --- | --- | --- |
| Primary (Leader) | The single authoritative database. Receives all writes. Replicates to followers asynchronously or synchronously. | All write consistency enforcement happens here. Primary failure = no writes. Must be resilient. |
| Replica (Follower, Secondary) | Read-only copy of the primary. Applies logs from the primary in sequence. Accepts read traffic only. | Replicas can be scaled horizontally for read throughput. Replica failure = reads lose capacity; data is not lost. |
| Replication lag | Time window between when a write is confirmed on the primary and when that write is visible on a replica. Measured in milliseconds to seconds. | Clients reading from replicas might see stale data for the lag duration. Lag is a direct function of network distance plus replica I/O speed. |
| Replication stream / Write-Ahead Log (WAL) | The ordered sequence of write operations (SQL statements or row changes) that the primary sends to replicas. Replicas apply it in exact order. | Size grows unbounded without retention policies. Old log entries are purged after replicas confirm receipt. |
| Logical vs. physical replication | Logical: replicate SQL statements or row changes; each replica re-executes the logical operation. Physical: replicate disk blocks byte-for-byte (WAL or filesystem snapshots). | Logical is portable across DB versions and hardware (e.g. a MySQL 5.7 primary with 8.0 replicas). Physical is faster (byte-copy, no interpretation) but requires identical DB versions. MySQL's binlog replication is logical; PostgreSQL's built-in streaming replication is physical WAL shipping, with logical replication available separately. In interviews, prefer logical when you need upgrades without downtime. |
| Semi-synchronous replication | Hybrid: the primary waits for at least one replica to apply the write before confirming to the client. Safer than pure async; faster than fully sync. | Prevents data loss if the primary crashes immediately after a write (at least one replica has it). Other replicas may still serve stale reads. |

Replication Topologies

Here's the honest answer on topology: Primary-Replica (Star) is the correct default for 90% of systems. It's simple, scales well, and has clear failure semantics. Stick with it until you have a specific reason not to.

Chained topologies add operational complexity without proportionate benefit. Dual-primary (bidirectional) introduces write conflict complexity that most applications can't handle. Unless your interviewer specifically asks for fault tolerance across all replicas or global write distribution, choose star topology and move on.

My recommendation: state "primary-replica" in your first sentence on topology. It signals you know the correct default without over-engineering it.

flowchart TB
  subgraph Star["⭐ Star: Primary-Replica (default for 90% of systems)"]
    direction TB
    SP[("🟢 Primary\nAll writes")]
    SR1[("🔵 Replica A")]
    SR2[("🔵 Replica B")]
    SR3[("🔵 Replica C")]
    SP -.->|"WAL stream"| SR1
    SP -.->|"WAL stream"| SR2
    SP -.->|"WAL stream"| SR3
  end

  subgraph Chain["🔗 Chained: Cascading Replica"]
    direction TB
    CP[("🟢 Primary")]
    CI[("🔵 Intermediate\nReplica · 1× lag")]
    CL1[("🔵 Leaf · 2× lag")]
    CL2[("🔵 Leaf · 2× lag")]
    CP -.->|"WAL"| CI
    CI -.->|"re-fan"| CL1
    CI -.->|"re-fan"| CL2
  end

  subgraph BiDi["↔ Bidirectional: Dual-Primary"]
    direction LR
    BA[("🟢 Primary A\nus-east")]
    BB[("🟢 Primary B\neu-west")]
    BA -.->|"US→EU writes"| BB
    BB -.->|"EU→US writes"| BA
  end

Primary-Replica (Star)

One primary, many replicas. The primary receives all writes; replicas consume from the primary's log. This is the most common topology.

| Pros | Cons |
| --- | --- |
| Simple to understand and operate. | The primary is a single point of failure for writes: if it goes down, no writes are possible. |
| No cascading failures: replicas don't replicate to each other. | All replicas share the same lag bound, limited by the primary's ability to ship its log. |
| Natural sharding boundary: each replica can be a different shard. | Replica lag is proportional to network distance and replica capacity; Asia replicas lag more than EU replicas. |

Primary-Replica-Replica (Chained)

Primary → Intermediate Replica → Leaf Replicas. The intermediate replica reads from the primary and replicates to the leaf replicas.

| Pros | Cons |
| --- | --- |
| Reduces load on the primary's replication thread (the intermediate handles the fan-out). | Replication lag cascades: leaf lag ≥ intermediate lag ≥ primary lag. A single broken link breaks the whole chain. |
| Useful for geographically distributed replicas (primary in US, EU → Asia chain). | More moving parts, more operational complexity. |
| Isolates replica failures (a leaf crash doesn't affect the primary or the intermediate). | Debugging divergence is harder across three layers; the intermediate may apply writes out of order from a leaf's perspective. |

Bidirectional Replication (Dual-Primary)

Two primaries that replicate to each other. Both accept writes independently, and both ship their own writes to the peer.

| Pros | Cons |
| --- | --- |
| Active-active: either primary can accept reads and writes. | Write conflicts are inevitable: if both primaries update the same row independently, how do you merge? |
| True high availability for writes (either primary can serve traffic). | Replication lag is bidirectional; complexity doubles. |
|  | Debugging divergence requires reconciling both sides: a heavy operational lift. |
|  | Most applications don't support true multi-master writes without application-level conflict resolution. |

Consistency Levels: Read After Write

The critical question in any replication design: after I write, when can I read and see my own write?

Strong Consistency (Read-Your-Write)

After a write is confirmed to the client, a subsequent read from any replica will see that write.

How to achieve: Write hits primary, primary waits for at least one replica to apply before confirming (semi-sync). Client always reads from primary or a replica that has definitely applied the write. If reading from a specific replica, wait for replication log offset matching the write confirmation offset.

Cost: 50–200ms added per write (waiting for replica confirmation over the network). Prohibitive for high-throughput write workloads.

I often see candidates default to strong consistency for all data "to be safe." Don't. Use it only where stale data has real-world consequences: payments, auth, inventory. Applying it everywhere turns a latency advantage into a liability.

Interview tip: name your consistency level and replication lag bound

When you add replication in an interview, immediately follow it with: "We use asynchronous replication with a 50ms replication lag on average, 200ms p99. Reads from replicas are eventually consistent; users reading from other regions might see data up to 200ms behind the primary. For critical data (payments, authentication), we read from the primary to ensure immediate consistency." This signals you understand the consistency/latency trade-off and where it matters operationally.

When to use: User authentication (after password change, every read sees the change), financial transactions (after a deposit, the balance increases immediately), inventory operations (after decrement, any query sees new stock).
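The "critical data reads from the primary" rule reduces to a routing check. A minimal sketch (the table names are invented for illustration):

```typescript
// Sketch: per-query consistency routing (illustrative table names).
// Strongly consistent data always reads from the primary; everything else
// may read a (possibly stale) replica.
const STRONG_CONSISTENCY_TABLES = new Set([
  "payments",      // balances must reflect the latest transaction
  "auth_sessions", // a revoked session must be revoked everywhere, now
  "inventory",     // oversell risk if stock counts lag
]);

function readTarget(table: string): "primary" | "replica" {
  return STRONG_CONSISTENCY_TABLES.has(table) ? "primary" : "replica";
}
```

The design choice is that consistency is declared per data class, not decided ad hoc per query; it keeps the stale-read blast radius confined to data that can tolerate it.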

Eventual Consistency

After a write, a replica might serve stale data for up to replication_lag_window seconds. After that window, all replicas converge on the write.

How it works: write → primary confirms → async replication → replicas apply eventually. A client might read stale data for the lag duration.

Timeline showing T0 client writes, T0+10ms primary ACKs, T0+50ms EU replica has the write, T0+120ms Asia replica has the write. Stale data window spans 50ms between primary ACK and EU replica readiness.
A write is immediately visible on the primary, but replicas see it after their respective lag windows. Clients reading from lagged replicas during this window see stale (old) data.

Cost: Very low. No waiting: writes are confirmed as fast as the primary can persist locally (~1ms).

When to use: User feeds (slightly stale posts are fine), analytics events (100ms delay before aggregation is irrelevant), recommendations (5-second-old user activity is fine).

Monotonic Consistent Read (Session Consistency)

Within a single session, if you read version X, your next read will see version X or later, never an earlier version.

How to achieve: choose a replica at session start and read from it for the whole session (sticky reads). Or track the replication log offset of your past writes and only read from a replica whose applied offset has reached it.

Practical implementation: return the write's log offset to the client, e.g. in a cookie such as replicationOffset=&lt;offset&gt; (the name is illustrative). On the next request, route to a replica that has caught up to that offset. If none has, wait or fail over to the primary.

Cost: Medium. Reads may have to wait for a replica to reach a given log offset, or fall back to the primary.

When to use: Real-time chat (reading your own message, then friends' messages). Transactional systems where you refund an order and then verify the refund (monotonic to your own state).
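The offset-cookie idea above boils down to one comparison per replica: is its applied offset at least the session's last-write offset? A sketch under that assumption (types and names are illustrative):

```typescript
// Sketch: monotonic-read routing via a session's replication offset.
interface Replica { name: string; appliedOffset: number }

function pickReadNode(
  replicas: Replica[],
  sessionOffset: number, // from the client's cookie; 0 if no prior write
): string {
  // Any replica that has applied at least the session's offset is safe:
  // the client can never observe a version older than what it already saw.
  const caughtUp = replicas.find((r) => r.appliedOffset >= sessionOffset);
  return caughtUp ? caughtUp.name : "primary"; // nobody caught up: primary
}

const replicas: Replica[] = [
  { name: "eu-west", appliedOffset: 1042 },
  { name: "ap-southeast", appliedOffset: 1038 },
];
```

Falling back to the primary when no replica has caught up is what keeps the guarantee intact under lag spikes, at the cost of extra primary load.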

The rule across all three levels: the stronger the consistency guarantee, the higher the write latency penalty. Pick the weakest guarantee your use case can tolerate.


Failures and Recovery

Primary Failure

The primary database crashes, goes down for maintenance, or becomes unreachable.

Immediate effect: All writes fail (no write target). Reads from replicas continue unaffected.

Recovery options:

  1. Promote a replica to primary: a human or an automation tool elects one replica as the new primary. That replica stops consuming the replication stream and starts accepting writes.
    • Data loss: any writes confirmed to clients but not yet replicated to the promoted replica are lost.
    • Example: the primary crashes with 200ms of unreplicated writes in flight. Those writes are gone. Clients that received "write confirmed" now see their writes rolled back when they read from the new primary: a very bad user experience, and very hard to hide.
(Figure: 1) original state: primary in us-east, replicas in eu-west (50ms lag) and ap-southeast (120ms lag); 2) the primary crashes; 3) the EU replica, having the least lag, is promoted to new primary, losing the writes from the final 50ms window.)
When choosing which replica to promote, pick the one with the least replication lag. That minimizes data loss, but some writes are still lost (those between the primary crash and the promotion).
  2. Failover with semi-sync replication: configure the primary to wait for at least one replica ACK before confirming writes. If the primary crashes, at least one replica is guaranteed to have every confirmed write.
    • Trade-off: writes take 50–100ms longer (a network round-trip to the replica). High-throughput write workloads suffer.

Promoting a replica always risks data loss

Even with semi-sync replication, if multiple replicas lag, you must choose which one becomes primary. Promote the replica with the most recent applied writes; anything the crashed primary confirmed beyond that point is gone, and every other replica has even less. You're choosing which data to discard: there is no "safe" option, only "least bad."
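The "least bad" choice is mechanical: promote the replica with the highest applied log offset, and everything past that offset is the write-off. A sketch (illustrative types; real failover tooling also fences the old primary):

```typescript
// Sketch: choosing a promotion candidate by highest applied log offset.
interface ReplicaState { name: string; appliedOffset: number }

function choosePromotionCandidate(replicas: ReplicaState[]): ReplicaState {
  if (replicas.length === 0) throw new Error("no replica to promote");
  return replicas.reduce((best, r) =>
    r.appliedOffset > best.appliedOffset ? r : best,
  );
}

// Writes between the candidate's offset and the crashed primary's last
// confirmed offset are permanently lost.
function lostWrites(primaryLastOffset: number, candidate: ReplicaState): number {
  return primaryLastOffset - candidate.appliedOffset;
}

const survivors: ReplicaState[] = [
  { name: "eu-west", appliedOffset: 990 },
  { name: "ap-southeast", appliedOffset: 940 },
];
const promoted = choosePromotionCandidate(survivors); // eu-west (990 > 940)
const discarded = lostWrites(1000, promoted);         // 10 log entries gone
```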

Replica Failure

A replica database crashes or falls behind the primary.

Immediate effect: Read traffic destined for this replica must failover to another replica or the primary. If this replica was the bottleneck, read latency spikes.

Recovery: Once the replica comes back online, it reconnects to the primary's replication stream and catches up. This can take seconds to hours depending on how far behind it fell and how busy the primary's replication log is.

During catch-up: The replica might be slower than usual (rebuilding filesystem caches while applying the backlog of writes recorded during its downtime). It's standard practice to keep it out of the read rotation for a few minutes until it's fully caught up.

I treat a recovering replica like a new deployment: give it a few minutes on monitoring before flagging it healthy and adding it back to the read rotation.

Network Partition (Primary and Replicas Lose Contact)

The network between primary and replicas is interrupted. Primary and replicas can't communicate.

Immediate effect: The primary accepts writes normally. Replicas stop consuming the replication log and continue serving increasingly stale data: lag accumulates without bound for as long as the partition lasts, rather than plateauing.

Queries about the same row on primary vs. replica diverge. A client writes to primary, reads from the now-isolated replica, and gets an old value. This violates read-your-write consistency expectations and is a bad user experience.

Operational handling: Detect partition via continuous lag monitoring. Alert if lag > threshold (e.g., 60 seconds) continuously for > 5 minutes.

Most production systems choose to serve stale data (with an HTTP header like X-Data-Lag: 45s to signal to clients that freshness is not guaranteed). A smaller percentage fail replica queries and force reads to primary, using a circuit breaker to prevent primary overload. Your choice depends on workload: feeds can serve 2-minute-old data; payments cannot.

Set your alert threshold based on your consistency requirements: if your SLA requires sub-second freshness, alert at 1s lag; if eventual consistency is acceptable, alert at 10s lag.
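Those thresholds can be encoded as a tiny policy function. The values below are the examples from this section, not universal defaults:

```typescript
// Sketch: lag-based read policy with illustrative thresholds.
type LagAction = "ok" | "alert" | "fail_reads_to_primary";

function evaluateLag(
  lagMs: number,
  alertThresholdMs = 1_000,    // freshness SLA at risk: page someone
  failoverThresholdMs = 5_000, // stop serving replica reads entirely
): LagAction {
  if (lagMs >= failoverThresholdMs) return "fail_reads_to_primary";
  if (lagMs >= alertThresholdMs) return "alert";
  return "ok";
}
```

In practice this runs against a measured lag signal (binlog offset delta, LSN delta, or a heartbeat table), and the "fail reads to primary" branch should sit behind a circuit breaker so a partition doesn't stampede the primary.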

Resolution: Once the network heals, replicas reconnect and catch up from the last replication log offset they saw. Data converges again. Monitor catch-up time: if a replica takes more than 30 minutes to catch up after a partition, it's likely experiencing other problems (slow disk, high CPU) and may need manual intervention.

Every minute of partition is another minute of unbounded lag. Treat it like any other production outage.

Network partitions expose a hard truth

A network partition doesn't break replication logic; it breaks the assumption that lag is bounded. During a partition, lag is unbounded. Your system must be designed to handle it: either accept stale data (with alerting), or fail fast (with a circuit breaker). Choosing neither and hoping the partition heals is how silent data inconsistencies creep into production.


When to Use It

So when does replication actually earn its keep? Here's the honest breakdown.

Use replication when:

  • Your read traffic exceeds write traffic by more than 3–5×, and your primary database is hitting throughput ceilings.
  • You need to serve reads from multiple geographic regions and network latency is a factor (> 100ms one-way).
  • You want to run backups or analytics queries without impacting production write traffic (use a replica as the backup/analytics target).
  • You need high availability (failover if the primary dies). This requires a replica to promote, so replication is a prerequisite.
  • Your data is read-heavy and does not require strong consistency everywhere; eventual consistency is acceptable.

The short version: if reads outnumber writes by 3× or more and you can tolerate some staleness, replication is the right tool. If your workload doesn't fit that profile, keep reading.

Be very careful (or avoid) replication if:

  • Your data is write-heavy (write scale ≈ read scale). Replication doesn't help; every write still routes through the primary. If your design is 50K writes/s and 50K reads/s, replicas won't solve your scaling problem: you need sharding.
  • Your application requires strong read-your-write consistency for all queries and all data. In that case replicas add complexity without benefit; you'd route all reads to the primary anyway. Exception: if only a subset of data requires strong consistency (e.g. user auth, payments), replicate everything but read the critical subset from the primary.
  • You have fewer than 3 nodes total. Replication needs at least a primary plus 2 replicas to survive a single failure; with fewer nodes you're adding complexity without resilience benefit.
  • Your dataset is small (< 10GB) and a single, well-indexed primary can handle the load on its own. Replication overhead isn't worth it.

Interview tip: replicas don't solve write-heavy workloads

If your interviewer describes a system with balanced or write-heavy load ("50K reads, 50K writes per second"), don't suggest replication as your scaling solution. Every write still routes through the single primary; replicas are overhead. The right answer is sharding (horizontal write splitting) possibly combined with replication. Show you can recognize when a tool isn't the right fit.


Trade-offs

| Pros | Cons |
| --- | --- |
| Scales read throughput horizontally: add replicas to add read capacity. | Replication lag is fundamental: replicas are always slightly stale. |
| Distributes reads geographically: much lower latency for distant users. | The primary is still a bottleneck for writes: all writes funnel through a single node. |
| Provides failover capability: promote a replica on primary failure. | Replica failures reduce read capacity and can cascade if not monitored. |
| Offloads backups/analytics from the primary (no impact on production). | Debugging divergence requires correlating primary and replica state: operational complexity. |
| Natural fit for read-heavy workloads (OLAP, dashboards, feeds). | Write conflicts in multi-master designs require application-level resolution. |
| Cheap horizontal scaling (replicas can run on cheaper hardware than the primary). | The replication stream is another log to manage and retain (grows unbounded without pruning). |
|  | Data loss risk on primary failure (with non-semi-sync replication). |

The fundamental tension here is read scale vs. write consistency: every replica you add spreads read load but introduces another replication lag window, and the primary remains the write bottleneck no matter how many replicas you add. Replicas buy you read scale and a failover candidate, not write scale.


Interview Guide: Replication in System Design

When replication comes up in a design interview, most candidates mention it and move on. Don't: the interviewer is listening for whether you understand the failure modes, not just the happy path.

When asked to design a system with replicas, use this framework:

  1. Name your consistency level. "We use eventually consistent replication with a 50ms replica lag for most data. Auth requires read-your-write, so we read from the primary for that."

  2. Specify the topology. "Primary in us-east handles all writes. Read replicas in eu-west and ap-southeast serve regional reads. One backup replica in us-west for failover."

  3. Bound the replication lag via network and hardware. "Replicas are 50ms behind primary on average. Lag spikes to 500ms under heavy write load or network congestion."

  4. Address failure modes. "On primary failure, we promote the eu-west replica (semi-sync, so it has all writes). We lose at most 10ms of unreplicated transactions. On replica failure, reads failover to the primary (cache or circuit breaker to prevent overload)."

  5. Name a metric & a threshold. "We alert if replica lag exceeds 1 second. At that lag, we see stale reads > 1 second old. If lag exceeds 5 seconds, we fail replica queries to prevent serving 5-second-old data to the client."

| Interviewer asks | Strong answer |
| --- | --- |
| "How do you handle reads immediately after a write?" | "Route the read to the primary for 200–500ms post-write, or use a replication-offset cookie: return the WAL offset of the write in the response header, and route the next read to a replica only once its applied offset exceeds it. For session tokens, always read from the primary. No exceptions." |
| "What happens if the primary crashes mid-write?" | "With async replication, writes confirmed but not yet shipped to any replica are permanently lost when promotion happens. We use semi-synchronous replication for tier-1 data: the primary waits for at least one replica ACK before confirming. That adds ~50ms per write but closes the data-loss window for a primary-only failure." |
| "How many replicas before you shard instead?" | "Beyond 5–7 replicas per primary, the primary's replication fan-out starts saturating its own NIC: 50K writes/s × 7 replicas × 1KB/write = 350MB/s of outbound replication traffic. I'd shard before adding that eighth replica. More replicas also multiply the operational surface area." |
| "When would you choose multi-master over primary-replica?" | "Almost never for standard OLTP. Multi-master only makes sense when sub-100ms write latency is required in multiple regions AND writes rarely touch the same rows. For payments or user profiles, single primary always: the write-conflict resolution complexity isn't worth it." |
| "How do you detect a replica falling behind before it causes visible stale reads?" | "Monitor replication lag continuously: binlog offset delta for MySQL, sent_lsn - replay_lsn in pg_stat_replication for PostgreSQL. Alert at 1s lag for consistency-sensitive tables; page on-call at 5s. If lag grows without a write surge, the replica has an I/O or disk problem and needs immediate triage." |

Replication vs. Caching

| Aspect | Replication | Caching |
| --- | --- | --- |
| Purpose | Scale reads geographically and provide failover | Reduce DB load by serving hot data from memory |
| Consistency | Eventual (replicas lag) | Very eventual (TTLs are seconds to minutes) |
| Data coverage | Every row in the database | Only frequently accessed rows |
| Write cost | Adds minimal overhead (async) | Cache invalidation overhead on writes |
| Hardware | Same hardware as primary (full DB engine) | Cheap commodity hardware (in-memory store) |
| Query flexibility | Full SQL / query language supported | Simple key-value lookups |
| Latency | 50–200ms (network to distant replica) | < 1ms (local or same-datacenter) |

Use both: replicas for geographical distribution and failover. Cache for reducing hot-path latency on the primary.
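The combined read path can be sketched as a layered lookup. This is a toy under assumed names: plain dicts stand in for Redis, a read replica, and the primary, and the `strong` flag marks consistency-critical reads that must bypass both layers:

```python
import time

CACHE_TTL = 30  # seconds a hot row stays in memory

cache = {}                          # key -> (value, expires_at); Redis stand-in
replica_db = {"user:1": "alice"}    # read-replica stand-in
primary_db = {"user:1": "alice"}    # primary stand-in

def read(key, *, strong=False):
    """Layered read path: cache -> replica -> (for strong reads) primary only."""
    if strong:
        return primary_db.get(key)          # e.g. session tokens, balances
    hit = cache.get(key)
    if hit and hit[1] > time.time():
        return hit[0]                       # ~sub-ms: in-memory hit
    value = replica_db.get(key)             # ~tens of ms: nearest replica
    cache[key] = (value, time.time() + CACHE_TTL)
    return value

assert read("user:1") == "alice"                 # miss: replica, populates cache
assert read("user:1") == "alice"                 # hit: served from cache
assert read("user:1", strong=True) == "alice"    # bypasses cache and replica
```

Note what each layer buys you: the cache removes load and latency for hot keys, the replica covers every row at regional latency, and the primary remains the single source of truth for reads that cannot tolerate staleness.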


Real-World Examples

AWS RDS Multi-Region Read Replicas โ€” Amazon RDS offers managed read replicas across AWS regions with replication lag typically < 100ms in the same region and 1โ€“2 seconds across continents. Netflix uses read replicas to absorb feed reads (billions of queries per second for user activity timelines) while keeping the primary focused on writes (user preference updates, recommendation ingestion). Without replicas, a single primary would need connection pools with tens of thousands of concurrent connections โ€” operationally infeasible.

Stripe's Payment Systems โ€” Stripe uses semi-synchronous replication for payment records to prevent data loss on primary failure. Replicas exist in multiple regions for geographic redundancy, but replication lag is strictly bounded < 50ms because financial transaction durability is non-negotiable.

A write is confirmed to the client only after both the primary and at least one replica have persisted it to disk. This adds write latency but guarantees zero data loss on primary hardware failure. When the cost of losing data exceeds the cost of adding latency, you pay the latency.
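That confirmation rule can be sketched as a small simulation. This is a hedged toy, not Stripe's implementation: Python lists stand in for durable logs, an append models a successful fsync-plus-ACK, and `min_acks=1` mirrors "at least one replica":

```python
def write_semi_sync(record, primary, replicas, min_acks=1):
    """Confirm a write only after the primary AND at least `min_acks`
    replicas have persisted it (semi-synchronous replication)."""
    primary.append(record)          # primary persists first
    acks = 0
    for replica in replicas:
        replica.append(record)      # ship over the replication stream
        acks += 1                   # replica ACKs after persisting
        if acks >= min_acks:
            return True             # confirm to the client; the remaining
                                    # replicas catch up asynchronously
    return False                    # no replica ACKed: do not confirm

primary, r1, r2 = [], [], []
ok = write_semi_sync({"id": 42, "amount_cents": 1999}, primary, [r1, r2])

assert ok
assert primary == r1 == [{"id": 42, "amount_cents": 1999}]
assert r2 == []   # second replica receives the record later, asynchronously
```

The latency cost is visible in the structure: the client's confirmation sits behind one full primary-to-replica round trip, which is exactly the ~50ms the text prices in for tier-1 data.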

Twitter/X Feed Infrastructure โ€” Before investing heavily in caching (Redis), Twitter relied on MySQL read replicas to absorb feed read traffic. Posts are replicated to replicas in US, EU, and APAC regions (lag 100โ€“500ms depending on write volume). Feeds tolerate eventual consistency โ€” a 500ms delay before a post appears in followers' feeds is acceptable in a real-time platform.

Replicas handle ~70% of feed read traffic; only critical ops (new followers, block updates) read from the primary. This is textbook replication: eventual consistency where staleness is tolerable, strong consistency only where correctness is required.

Uber's Schemaless Datastore — Uber's custom key-value store uses read replicas in each regional datacenter to serve location lookups (driver positions, pickup locations). Replication lag is 50–100ms across continents. Drivers accept slight location staleness (a position up to ~100ms old is close enough for navigation); passengers see more frequent updates by reading from the primary during active ride flows.


Related Concepts

  • Consistency Models โ€” Go deeper into CAP theorem, causal consistency, linearizability, and other guarantees beyond eventual/strong consistency. Replication forces a trade-off; this article shows it in action.
  • Caching โ€” Complements replication. Caches serve hot reads from memory (< 1ms); replicas distribute cold reads geographically (50โ€“100ms). Use both for maximum scale.
  • Load Balancing โ€” Required to route read traffic across replicas by geography and health. Replica selection logic lives in the load balancer.
  • CAP Theorem — During a network partition, replication forces you to choose between Consistency and Availability. This article shows that trade-off in action.
  • Sharding — When replicas alone can't scale write throughput, combine sharding (splitting the write load across multiple primaries) with replication (scaling reads within each shard).

Quick Recap

  • Replication copies data from a primary (writer) to replicas (readers). Replication lag is the staleness window.
  • Scale reads horizontally by adding replicas. The primary remains the write bottleneck.
  • Consistency vs. Availability: Eventual consistency allows fast, distributed reads but requires accepting stale data. Strong consistency requires routing reads to the primary, losing latency benefits.
  • Replica lag is your operational dial. Tune it via network latency, replica hardware, and write throughput. It's a real number, not a theoretical abstraction.
  • Data loss on primary failure is the critical failure mode of async replication. Semi-sync replication reduces data loss at the cost of write latency. Choose your replication mode based on your data's recoverability tier.
  • Use replication when: reads >> writes, replicas can be in different regions, failover is required. Avoid when: writes and reads are balanced, strong consistency is non-negotiable for all data, or your dataset is tiny.
