Two-phase locking (2PL)
How 2PL achieves serializability by splitting every transaction into a lock-growing phase and a lock-shrinking phase. Strict 2PL in practice, how deadlock detection works with real examples, and exactly when MVCC is the better choice.
The problem
A bank's account table has one row: account_id=1, balance=1000. Two transactions start concurrently.
Transaction A reads the balance: 1000. Transaction A calculates a new balance: 1000 - 800 = 200. Transaction B reads the balance: 1000. Transaction B calculates a new balance: 1000 - 800 = 200. Transaction A writes 200. Transaction B writes 200. Both transactions commit successfully. The account now has a balance of 200, but two withdrawals of 800 each were supposed to leave it at -600. Customers withdrew 1600 in total while only 800 was deducted: the bank lost $800.
This is the lost update anomaly. It happens because Transaction B read the account's pre-A state and then overwrote A's result with its own calculation. The final balance reflects only one withdrawal instead of two.
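The interleaving above can be reproduced in a few lines of Python (a simulation of the race, not database code):

```python
# Simulate the lost-update interleaving: both transactions read
# the balance before either one writes.
balance = 1000

read_a = balance          # T-A reads 1000
read_b = balance          # T-B reads 1000 (the stale read)

new_a = read_a - 800      # T-A computes 200
new_b = read_b - 800      # T-B computes 200

balance = new_a           # T-A writes 200
balance = new_b           # T-B writes 200, silently overwriting A's update

print(balance)            # 200, although two withdrawals should leave -600
```

Any schedule where B's read happens before A's write produces the same loss; serializing the two transactions in either order produces -600.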
A second classic anomaly: the phantom read. Transaction A runs SELECT COUNT(*) FROM orders WHERE user_id = 42; it gets 5. Transaction B inserts a new order for user_id 42 and commits. Transaction A reruns the same query in the same transaction and gets 6. The phantom row appeared mid-transaction. If Transaction A's logic was "if user has less than 6 orders, grant discount," the discount decision is now based on inconsistent state read within a single transaction.
Both anomalies happen because concurrent transactions can observe each other's intermediate state. The correct guarantee is serializability: for any concurrent execution of N transactions, the result is equivalent to running those transactions in some serial order, one at a time. Two-phase locking is the classical protocol for enforcing it.
What it is
Two-phase locking (2PL) is a transaction isolation protocol that achieves serializability by enforcing one rule: once a transaction releases any lock, it may not acquire new locks.
This splits every transaction's lifetime into exactly two phases. In the growing phase, the transaction acquires shared locks for reads and exclusive locks for writes. It does not release anything. In the shrinking phase, the transaction releases locks. Once shrinking starts, no new locks can be added.
Think of it like checking books out of a library before a study session. You gather every book you need (growing phase) before sitting down to study. Once you start returning books (shrinking phase), you cannot grab new ones. This ensures that if two students are using the same set of books, their sessions cannot interleave in a way that corrupts their notes. One student's full set of books is available to them in a consistent snapshot throughout their session.
The phrase "two-phase" refers to the two lifecycle phases, not to the two-phase commit (2PC) distributed protocol. They solve different problems and share only the name.
How it works
Every row, page, or table that a transaction reads or writes must be locked before access. The lock manager maintains a global lock table mapping each resource to its current lock state and the queue of waiting transactions.
Apply this to the bank example. Without 2PL, both transactions read 1000 simultaneously and both write 200, losing one withdrawal. With 2PL, T1 acquires an exclusive lock before reading; T2 blocks until T1 commits, then reads the post-T1 balance of 200 and correctly produces -600.
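A minimal sketch of how an exclusive per-row lock repairs the interleaving, using a Python `threading.Lock` as a stand-in for the row's X lock:

```python
import threading

# The row's X lock: held across the whole read-compute-write sequence.
account = {"balance": 1000}
row_lock = threading.Lock()

def withdraw(amount):
    with row_lock:                          # growing phase: acquire X lock
        current = account["balance"]        # read under the lock
        account["balance"] = current - amount  # write under the lock
    # lock released at end of block ("commit"): shrinking phase

t1 = threading.Thread(target=withdraw, args=(800,))
t2 = threading.Thread(target=withdraw, args=(800,))
t1.start(); t2.start()
t1.join(); t2.join()
print(account["balance"])  # -600: both withdrawals applied
```

Whichever thread acquires the lock second necessarily reads the first thread's write, which is exactly the serializability guarantee.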
The lock table structure looks like this in memory:
```
lock_table: map<ResourceID, LockEntry>

struct LockEntry:
    lock_mode: UNLOCKED | SHARED | EXCLUSIVE
    owners:    list<TransactionID>   // transactions currently holding the lock
    waiters:   queue<WaitRequest>    // transactions waiting for the lock

struct WaitRequest:
    transaction_id:     TransactionID
    requested_mode:     SHARED | EXCLUSIVE
    condition_variable: CV           // woken when lock is granted
```
The lock point is the moment the transaction holds its maximum set of locks. In Strict 2PL (the variant used in practice by PostgreSQL, InnoDB, and SQL Server), write locks are held until commit or rollback; they are never released piecemeal during the transaction. This is a stronger variant than basic 2PL and prevents the "dirty read on partial rollback" anomaly that basic 2PL allows.
Lock types and the compatibility matrix
Not all locks conflict with each other. Shared locks exist because multiple readers can safely operate simultaneously. The lock manager consults a compatibility matrix before granting each request.
Lock Compatibility Matrix
(Row = held lock, Column = requested lock)
|               | SHARED (S) | EXCLUSIVE (X) |
|---------------|------------|---------------|
| SHARED (S)    | COMPATIBLE | CONFLICT      |
| EXCLUSIVE (X) | CONFLICT   | CONFLICT      |
Rules:
- Multiple transactions can hold S locks on the same row simultaneously
- An X lock can only be granted when no other transaction holds any lock on the row
- Upgrading S to X requires waiting for all other S holders to release
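The grant decision can be read straight off the matrix. An illustrative sketch (not any engine's actual code): a request is granted only if it is compatible with every lock currently held on the resource.

```python
# Compatibility matrix: (held mode, requested mode) -> can coexist?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested_mode, held_modes):
    """Grant only if the request is compatible with every held lock."""
    return all(COMPATIBLE[(held, requested_mode)] for held in held_modes)

print(can_grant("S", ["S", "S"]))  # True: readers share
print(can_grant("X", ["S"]))       # False: writer waits for readers
print(can_grant("S", ["X"]))       # False: reader waits for writer
print(can_grant("X", []))          # True: unlocked resource
```

The S-to-X upgrade rule falls out of the same check: the upgrading transaction's X request conflicts with every other S holder, so it waits until it is the sole owner.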
Real systems add intention locks for multi-granularity locking (row-level vs table-level locking decisions without scanning every row). The full matrix in PostgreSQL and InnoDB includes:
| Lock mode | Abbreviation | Used for |
|---|---|---|
| Shared | S | Plain reads in serializable isolation |
| Exclusive | X | Writes (INSERT, UPDATE, DELETE) |
| Intent Shared | IS | Signal: "I will place S locks on rows below" |
| Intent Exclusive | IX | Signal: "I will place X locks on rows below" |
| Shared Intent Exclusive | SIX | Table locked S overall, some rows X |
The critical production insight: PostgreSQL uses S locks only at the SERIALIZABLE isolation level (via predicate locking / SSI). At REPEATABLE READ and READ COMMITTED, it uses MVCC for reads and only acquires X locks for writes. This means SELECT statements in a BEGIN; ... COMMIT; block do NOT hold a shared lock by default. To lock a row for reading, you must use SELECT ... FOR SHARE or SELECT ... FOR UPDATE explicitly.
```
Transaction T1:
    acquire lock(A)  --+
    read A             |  Growing phase
    acquire lock(B)    |
    write B          --+
                          (lock point: maximum locks held)
    release lock(A)  --+
    release lock(B)    |  Shrinking phase
    commit           --+  (in Strict 2PL: happens at commit, all at once)
```
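The two-phase rule itself is small enough to encode directly. A toy transaction object that enforces it (illustrative only, no real lock manager behind it):

```python
class TwoPhaseTxn:
    """Enforces the 2PL rule: no acquisition after any release."""

    def __init__(self):
        self.held = set()
        self.shrinking = False

    def acquire(self, resource):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        self.held.add(resource)          # growing phase

    def release(self, resource):
        self.shrinking = True            # first release starts shrinking
        self.held.discard(resource)

txn = TwoPhaseTxn()
txn.acquire("A")
txn.acquire("B")       # growing phase
txn.release("A")       # lock point passed; shrinking begins
try:
    txn.acquire("C")   # illegal under 2PL
except RuntimeError as e:
    print(e)           # 2PL violation: acquire after release
```

Strict 2PL is the special case where `release` is only ever called from commit/rollback, so the shrinking phase collapses to a single point.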
Deadlocks: detection and prevention
2PL introduces deadlocks as an unavoidable consequence of holding locks for the duration of a transaction. If Transaction A holds a lock on row X and wants row Y, while Transaction B holds row Y and wants row X, neither can proceed. This is a deadlock cycle.
The wait-for graph is the canonical representation. Each node is a transaction. An edge from T1 to T2 means "T1 is waiting for a lock that T2 holds." A cycle in the wait-for graph is a deadlock.
Databases detect deadlocks by checking the wait-for graph for cycles. PostgreSQL does not scan continuously: a backend that has waited on a lock for longer than the configurable deadlock_timeout (default 1 second) runs the cycle check itself. When a cycle is found, one transaction is selected as the victim, rolled back, and the cycle is broken. The other transaction can then proceed.
```
// Deadlock detection algorithm (simplified)
function detect_deadlocks():
    graph = build_wait_for_graph(lock_table)
    cycles = find_cycles(graph)          // DFS with cycle detection
    for cycle in cycles:
        victim = select_victim(cycle)    // lowest cost to abort
        abort_transaction(victim)
        release_all_locks(victim)
        notify_waiters(victim)
```
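The find_cycles step is a standard DFS cycle check over the wait-for graph. A sketch (real engines walk in-memory lock structures rather than a plain dict):

```python
# wait_for maps each transaction to the transactions it waits on
# (edge T1 -> T2 means "T1 waits for a lock T2 holds").
def has_deadlock(wait_for):
    WHITE, GRAY, BLACK = 0, 1, 2            # unvisited / on stack / done
    color = {t: WHITE for t in wait_for}

    def dfs(t):
        color[t] = GRAY
        for holder in wait_for.get(t, []):
            if color.get(holder, WHITE) == GRAY:    # back edge => cycle
                return True
            if color.get(holder, WHITE) == WHITE and dfs(holder):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in list(color))

# T1 waits for T2 and T2 waits for T1: the classic two-transaction deadlock.
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))   # True
print(has_deadlock({"T1": ["T2"], "T2": []}))       # False
```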
The victim selection heuristic matters in practice. InnoDB picks the transaction that has inserted, updated, or deleted the fewest rows, minimizing wasted work; PostgreSQL simply aborts the transaction whose lock wait triggered the detection. Applications must be written to handle deadlock errors (SQLSTATE 40P01 in PostgreSQL) and retry the entire transaction from the beginning.
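The retry requirement usually lands in a small application-side wrapper. A hedged sketch, where `DeadlockDetected` and `run_txn` are hypothetical stand-ins for your driver's 40P01 error class and your transaction function:

```python
import random
import time

class DeadlockDetected(Exception):
    """Stand-in for the driver's SQLSTATE 40P01 error."""

def with_deadlock_retry(run_txn, max_attempts=5):
    """Run run_txn, retrying from scratch if it is a deadlock victim."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_txn()
        except DeadlockDetected:
            if attempt == max_attempts:
                raise
            # Jittered backoff so retrying victims don't re-collide.
            time.sleep(random.uniform(0, 0.05 * attempt))

# Simulated transaction: chosen as victim once, then succeeds.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise DeadlockDetected("simulated 40P01")
    return "committed"

print(with_deadlock_retry(flaky_txn))  # committed
```

The whole transaction must be re-run, not just the failing statement: the victim's work was rolled back, so earlier reads within it are stale.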
Deadlock prevention (the lock ordering approach): The most reliable prevention strategy is to always acquire locks in a fixed global order. If every transaction always locks orders before shipments, the T1/T2 cycle cannot form. This is the single most important operational tip for teams experiencing frequent deadlocks: audit your transactions and enforce a canonical lock acquisition order.
```sql
-- Deadlock-prone: T1 locks user then account, T2 locks account then user
-- Fix: always lock in alphabetical order: accounts before users
BEGIN;
SELECT * FROM accounts WHERE account_id = 42 FOR UPDATE;  -- lock first
SELECT * FROM users    WHERE user_id    = 7  FOR UPDATE;  -- lock second
-- ... proceed with writes
COMMIT;
```
Deadlock prevention (wait-die / wound-wait): Distributed databases that cannot afford the cost of scanning a global wait-for graph use timestamp-based protocols:
- Wait-die: a transaction requesting a lock held by a younger transaction waits; one requesting a lock held by an older transaction aborts immediately. "Old wait, young die."
- Wound-wait: older transactions preempt younger ones; younger transactions wait. "Old wound, young wait."
Both guarantee deadlock freedom at the cost of aborting more transactions than a detection approach would.
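Both rules reduce to a timestamp comparison at the moment a lock request blocks. An illustrative sketch (lower timestamp = older transaction, i.e. it started earlier):

```python
def wait_die(requester_ts, holder_ts):
    """Older requesters wait; younger requesters die (abort and retry)."""
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    """Older requesters wound (abort) the holder; younger requesters wait."""
    return "wound_holder" if requester_ts < holder_ts else "wait"

print(wait_die(1, 2))    # older asks younger:  wait
print(wait_die(2, 1))    # younger asks older:  die
print(wound_wait(1, 2))  # older asks younger:  wound_holder
print(wound_wait(2, 1))  # younger asks older:  wait
```

Deadlock freedom follows because every edge in the implied wait-for graph points the same way by age, so no cycle can form. Aborted transactions restart with their original timestamp, which guarantees they eventually become the oldest and make progress.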
Strict 2PL vs. Rigorous 2PL vs. Conservative 2PL
Three named variants of 2PL offer different trade-offs between deadlock risk, abort behavior, and practical implementability.
| Variant | Lock release time | Deadlock risk | Abort risk | Used by |
|---|---|---|---|---|
| Basic 2PL | Any time after lock point | High | High (cascading aborts) | Academic only |
| Strict 2PL (S2PL) | Write locks held until commit; read locks may release earlier | Medium | Low (no dirty reads) | PostgreSQL, InnoDB default |
| Rigorous 2PL (R2PL) | ALL locks held until commit | Medium | None | SQL Server SERIALIZABLE |
| Conservative 2PL | All locks acquired before transaction starts | Zero deadlocks | Impractical (must know all data upfront) | Theoretical/batch ETL |
Basic 2PL allows cascading aborts: if T1 releases a read lock and T2 reads the data before T1 commits, and T1 aborts, T2 has read dirty data and must also abort. This cascade can propagate through a chain of transactions.
Strict 2PL eliminates dirty reads by holding write locks until commit. Since no uncommitted data is released, other transactions cannot read it. This is the default in all production databases.
Rigorous 2PL holds read locks too, enabling repeatable reads (the same SELECT returns the same data twice in the same transaction). This costs more lock contention but is required for SERIALIZABLE isolation level in locking-based systems.
Conservative 2PL requires knowing all data accessed before the transaction begins. This is impossible for interactive workloads but is used in batch ETL pipelines where access patterns are fully deterministic.
2PL vs. MVCC
2PL and MVCC (Multi-Version Concurrency Control) are the two dominant approaches to transaction isolation. They make fundamentally different trade-offs.
| Dimension | 2PL | MVCC |
|---|---|---|
| Read behavior | Blocks on write locks | Never blocks (reads old version) |
| Write behavior | Blocks on read/write locks | Checks for write conflicts at commit |
| Deadlocks | Yes, requires detection/prevention | No conventional deadlocks (but serialization failures) |
| Write-heavy workload | Better (conflict detected early, before work is done) | Worse (work done upfront, then aborted at commit) |
| Read-heavy workload | Worse (readers may block) | Better (reads never wait) |
| Disk/memory overhead | Lock table only | Lock table + old row versions |
| Isolation levels reached | SERIALIZABLE (Rigorous 2PL) | REPEATABLE READ naturally; SERIALIZABLE via SSI |
PostgreSQL is a concrete example of both: it uses MVCC for reads (no reader ever waits for a writer) and 2PL-style exclusive locks for writes. SELECT FOR UPDATE explicitly acquires a 2PL write lock on top of MVCC. SERIALIZABLE isolation adds Serializable Snapshot Isolation (SSI), a variant of optimistic concurrency control that detects serialization anomalies at commit time rather than using predicate locks.
The practical rule: choose 2PL (pessimistic locking) when you have high write contention on specific rows (hot aggregates, counters, account balances). The lock detects the conflict early, before executing expensive business logic. Choose MVCC (optimistic) when reads dominate and write conflicts are rare, because skipping locks on reads removes almost all waiting.
Production usage
| System | Isolation mechanism | 2PL usage | Notable behavior |
|---|---|---|---|
| PostgreSQL | MVCC + SSI + advisory locks | X locks on writes; SELECT FOR UPDATE/SHARE for explicit 2PL reads | Deadlock check after a lock wait exceeds deadlock_timeout (default 1s); victim aborted |
| InnoDB (MySQL) | MVCC + record/gap locks | Strict 2PL for write locks; next-key locks for range queries | Gap locks on range queries prevent phantom inserts; cause most deadlocks |
| SQL Server | MVCC (RCSI) + traditional locking | Full 2PL under SERIALIZABLE; RCSI uses row versions for RC | READ_COMMITTED_SNAPSHOT_ISOLATION removes most lock waits at RC level |
| Oracle | MVCC | 2PL write locks; optimistic reads via undo segments | Unique: never blocks readers with writers; uses undo logs for old versions |
| DB2 | Traditional 2PL | Writer blocks readers by default | Lock escalation from row to page to table when row lock count exceeds threshold |
| CockroachDB | MVCC + distributed 2PL | Write intents as distributed locks; timestamp ordering | Distributed deadlock detection via wait-queue gossip; "write skew" detected via SSI |
Limitations and when NOT to use it
- Deadlocks are unavoidable in basic 2PL. Any system that holds multiple locks across multiple rows for the duration of a transaction can deadlock. Applications must handle SQLSTATE 40P01 and retry. Systems that cannot retry transactions (non-idempotent operations without compensating logic) are dangerous with 2PL.
- Lock contention is a scalability wall for hot rows. A product's stock count row updated by every purchase will serialize all concurrent purchases through a single exclusive lock. At 10,000 orders/second, that one row becomes a global serialization point. The fix is not better locking; it is sharding the counter or switching to an optimistic approach (reservation + inventory reconciliation).
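The counter-sharding fix can be sketched in a few lines (the shard count and the list standing in for N counter rows are illustrative):

```python
import random

# Split one hot counter into N shard "rows" so concurrent writers
# rarely contend on the same lock; aggregate the shards on read.
N_SHARDS = 8
shards = [0] * N_SHARDS   # stands in for N rows in a counter table

def increment(amount=1):
    # Each writer picks a random shard, locking only that shard's row
    # (in SQL: UPDATE counter_shards SET n = n + 1 WHERE shard_id = ?).
    shards[random.randrange(N_SHARDS)] += amount

def read_total():
    # Readers aggregate (in SQL: SELECT SUM(n) FROM counter_shards).
    return sum(shards)

for _ in range(1000):
    increment()
print(read_total())  # 1000
```

The trade: writes now contend on 1/N of the old rate, but reads must touch N rows and are only as fresh as the aggregation moment.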
- Long-running transactions hold locks for their full duration. A transaction that reads a row at T=0 and does not commit until T=30s holds its lock for 30 seconds. Any writer waiting for that row waits 30 seconds. For long analytical queries mixed with OLTP writes, 2PL is catastrophic. The solution is read replicas for analytics, or MVCC isolation that separates read and write paths.
- Conservative 2PL is impractical for interactive workloads. Requiring a transaction to declare all data it will access before starting is only feasible in batch pipelines with fully predetermined access patterns.
- 2PL does not prevent all anomalies without Rigorous 2PL. Basic 2PL suffers cascading aborts. Even Strict 2PL allows phantom reads unless predicate/gap locks are used. Using REPEATABLE READ in InnoDB with 2PL-style semantics still requires that you understand next-key locks to avoid surprises.
- 2PL is wrong for distributed systems using separate databases. Holding a 2PL lock in one database while waiting for a response from another database (for cross-service transactions) will lead to lock contention across service boundaries. Use the Saga pattern with compensating transactions instead of distributed 2PL.
Interview cheat sheet
- When asked how databases prevent dirty reads, explain Strict 2PL: write locks are held until commit. No other transaction can read uncommitted data because the write lock blocks them. Then immediately contrast with MVCC: PostgreSQL skips the read-blocking entirely by giving readers a snapshot of old versions.
- When asked about deadlocks, describe the wait-for graph: nodes are transactions, edges are "waiting for" relationships, a cycle is a deadlock. State that PostgreSQL detects cycles after deadlock_timeout (default 1s) and aborts the victim. Say the prevention rule: acquire locks in a consistent global order across all code paths.
- When asked about the difference between Strict 2PL and Rigorous 2PL, state that Strict holds write locks until commit (blocks dirty reads) and Rigorous holds all locks (also blocks non-repeatable reads). Rigorous = SERIALIZABLE via locking. Strict = REPEATABLE READ for writers, lower concurrency cost than Rigorous.
- When asked why SELECT FOR UPDATE exists in PostgreSQL, explain: PostgreSQL normally uses MVCC for reads (no lock). FOR UPDATE force-acquires a 2PL exclusive lock, serializing the read against concurrent writes. Use it when "read, modify, write" logic must see the absolute latest value (not a snapshot) and must block concurrent writers.
- When asked about lock contention on a hot row, the answer is to eliminate the need for a single serializing lock: counter sharding (split into N rows, aggregate on read), rate limiting at the application layer so you batch updates, or switching the architecture from synchronous decrement to reservation + async reconciliation.
- When asked to compare 2PL with MVCC for a read-heavy analytics workload, always say MVCC wins: readers never block, writers never block readers, and old version cleanup (VACUUM in PostgreSQL) is an async background process. 2PL would make every read potentially contend with writes.
- When asked what isolation level SERIALIZABLE requires, distinguish: InnoDB SERIALIZABLE adds range/gap locks on top of Strict 2PL (full predicate locking). PostgreSQL SERIALIZABLE uses Serializable Snapshot Isolation (SSI), not traditional locking, and detects write skew at commit time.
- When naming the four classic transaction anomalies, give them in order of severity: dirty read (read uncommitted data), lost update (a write based on a stale read overwrites another transaction's update), non-repeatable read (same query, different result in same txn), phantom read (new rows appear mid-txn). Strict 2PL prevents the first three; Rigorous 2PL + predicate locks prevent all four.
Quick recap
- 2PL splits every transaction into a growing phase (acquire locks only) and a shrinking phase (release locks only), guaranteeing that concurrent transactions produce a result equivalent to some serial execution order.
- Strict 2PL (used by all production databases) holds write locks until commit, preventing dirty reads without requiring explicit lock-release logic in application code.
- The lock compatibility matrix determines whether a new lock request can be granted immediately (S+S compatible) or must wait (any X involved conflicts).
- Deadlocks are unavoidable with 2PL; detect them via wait-for graph cycle detection (PostgreSQL: after a lock wait exceeds deadlock_timeout, default 1s) or prevent them by always acquiring locks in a canonical global order.
- MVCC is better than 2PL for read-heavy workloads (readers never wait); 2PL is better for write-heavy workloads with high contention (conflict detected early, before wasted work accumulates).
- Long-running transactions with large lock sets are the most dangerous 2PL pattern: keep transactions short and lock sets small, or switch to Saga-based compensation for multi-step batch operations.
Related concepts
- MVCC -- The alternative to 2PL for read isolation: each transaction sees a snapshot of committed data, readers and writers never block each other, and old row versions are cleaned by VACUUM.
- Database locking -- Covers row vs. page vs. table lock granularity, deadlock detection in depth, and advisory locks alongside 2PL.
- Consistency models -- Serializability (what 2PL guarantees) in the broader spectrum from eventual consistency to linearizability.