MVCC (Multi-Version Concurrency Control)
Learn how MVCC lets readers and writers proceed without blocking each other by keeping multiple row versions, what vacuum does in PostgreSQL, and why MVCC does not eliminate all lock contention.
The problem with simple locking
The naive approach to concurrency: lock a row for reading, lock it for writing. One transaction at a time.
This prevents consistency problems but creates a new one. In a read-heavy system (most web applications), long-running reads block writes and writes block reads. A reporting query that takes 5 seconds to scan 10 million rows holds read locks for 5 seconds. Any write to those rows must wait. Your order processing system pauses because the analytics dashboard is running.
The goal: reads and writes should not interfere with each other. Readers should not wait for writers. Writers should not wait for readers.
MVCC is how every major relational database solves this.
What MVCC is
Multi-Version Concurrency Control keeps multiple versions of each row simultaneously. When a transaction reads a row, it sees the version that was current at the transaction's start, not whatever the current live state is. Writers create new versions rather than modifying rows in place.
Analogy: Think of a Google Doc with revision history. When you open the doc to read it, you see the version from when you opened it. While you're reading, someone else edits the doc, creating a new version. You still see the same version you started with. Neither of you is blocked. When you're done reading, the editor's new version is visible to anyone who opens the doc fresh.
How MVCC works in PostgreSQL
Every row in PostgreSQL carries two hidden system columns:
xmin: the transaction ID that created this row version.xmax: the transaction ID that deleted or updated this row (0 if the row is still live).
-- You can see these explicitly:
SELECT xmin, xmax, id, name FROM users WHERE id = 1;
-- xmin=1042 xmax=0 id=1 name='Alice'
When a transaction updates a row:
- PostgreSQL does NOT modify the existing row in place.
- It marks the old row's
xmaxwith the current transaction ID (logically "deleted"). - It inserts a new row version with
xminset to the current transaction ID.
-- Transaction txn_id=5000 runs: UPDATE users SET name='Alice B.' WHERE id=1;
-- Before:
-- xmin=1042 xmax=0 id=1 name='Alice' β original row, still visible
-- After:
-- xmin=1042 xmax=5000 id=1 name='Alice' β old version, deleted by txn 5000
-- xmin=5000 xmax=0 id=1 name='Alice B.' β new version, created by txn 5000
spawnSync d2 ENOENT
Three concurrent transactions, one row, two versions in the heap. T1 and T3 see different states of the same row because their snapshots were taken at different times. Neither had to wait. No locks were taken for the reads.
How reads see consistent data: Every transaction has a "snapshot" taken at transaction start. A row is visible if:
xminis β€ the current transaction's snapshot horizon (the row existed when we started)xmaxis either 0 (row is still live) orxmaxis greater than the snapshot horizon (deleted after we started)
This means a concurrent reader always sees the row as it was at the moment their transaction began, regardless of how many updates happen while they are reading.
Isolation levels and what they change
MVCC alone does not define the isolation level. The snapshot rules determine it:
| Isolation level | Snapshot taken at | What it prevents | What it allows |
|---|---|---|---|
| Read Committed | Start of each statement | Dirty reads | Non-repeatable reads, phantom reads |
| Repeatable Read | Start of transaction | Dirty + non-repeatable reads | Phantom reads (mostly) |
| Serializable | Start of transaction + SSI | All anomalies | Nothing (full ACID) |
Read Committed (PostgreSQL default): each statement sees a fresh snapshot. If you run the same query twice in one transaction, you might get different results if another transaction committed between your two queries.
Repeatable Read: the snapshot is taken at the first statement of the transaction. Repeated reads of the same rows return the same data, even if other transactions commit in between.
-- Transaction A: REPEATABLE READ
BEGIN;
SELECT balance FROM accounts WHERE id = 1; -- sees $100
-- Transaction B (concurrent): commits UPDATE accounts SET balance=200 WHERE id = 1;
SELECT balance FROM accounts WHERE id = 1; -- still sees $100 (same snapshot!)
COMMIT;
The VACUUM problem
MVCC creates old row versions every time a row is updated or deleted. These old versions are needed by transactions that still have them in their snapshots. Once no active transaction needs the old version, it becomes "dead" and takes up space without serving any purpose.
Without cleanup, tables grow indefinitely. Index entries for dead rows remain. Every query scans dead rows it must then filter. Eventually, a table with heavy updates has more dead rows than live ones.
VACUUM is PostgreSQL's background process that reclaims dead row space. It:
- Scans the table for rows where
xmaxis set and the deleting transaction is known committed and no active snapshot needs the old version. - Marks those pages as free (available for reuse).
- Updates index entries (VACUUM FULL rewrites the whole table; regular VACUUM in-place).
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.