MVCC (Multi-Version Concurrency Control)
Learn how MVCC lets readers and writers proceed without blocking each other by keeping multiple row versions, what vacuum does in PostgreSQL, and why MVCC does not eliminate all lock contention.
The problem with simple locking
The naive approach to concurrency: lock a row for reading, lock it for writing. One transaction at a time.
This prevents consistency problems but creates a new one. In a read-heavy system (most web applications), long-running reads block writes and writes block reads. A reporting query that takes 5 seconds to scan 10 million rows holds read locks for 5 seconds. Any write to those rows must wait. Your order processing system pauses because the analytics dashboard is running.
The goal: reads and writes should not interfere with each other. Readers should not wait for writers. Writers should not wait for readers.
MVCC is how every major relational database solves this.
What MVCC is
Multi-Version Concurrency Control keeps multiple versions of each row simultaneously. When a transaction reads a row, it sees the version that was current at the transaction's start, not whatever the current live state is. Writers create new versions rather than modifying rows in place.
Analogy: Think of a Google Doc with revision history. When you open the doc to read it, you see the version from when you opened it. While you're reading, someone else edits the doc, creating a new version. You still see the same version you started with. Neither of you is blocked. When you're done reading, the editor's new version is visible to anyone who opens the doc fresh.
How MVCC works in PostgreSQL
sequenceDiagram
participant T1 as Transaction T1
participant DB as Postgres DB
participant T2 as Transaction T2
participant T3 as Transaction T3
T1->>DB: BEGIN · snapshot at txid=100
T1->>DB: SELECT balance WHERE id=1
DB-->>T1: xmin=50 · balance=500
T2->>DB: BEGIN · UPDATE balance=600
Note over DB: old row: xmin=50, xmax=200<br/>new row: xmin=200, balance=600
T1->>DB: SELECT balance WHERE id=1 (repeat)
DB-->>T1: still xmin=50 · balance=500
Note over T1,DB: T1 snapshot frozen at txid=100<br/>xmin=200 row invisible to T1
T2->>DB: COMMIT
T1->>DB: SELECT balance WHERE id=1 (repeat)
DB-->>T1: still balance=500
T1->>DB: COMMIT
T3->>DB: BEGIN · snapshot after T2 commit
T3->>DB: SELECT balance WHERE id=1
DB-->>T3: xmin=200 ยท balance=600
Note over T3,DB: T3 sees T2 version now visible
Every row in PostgreSQL carries two hidden system columns:
- `xmin`: the transaction ID that created this row version.
- `xmax`: the transaction ID that deleted or updated this row version (0 if the row is still live).
-- You can see these explicitly:
SELECT xmin, xmax, id, name FROM users WHERE id = 1;
-- xmin=1042 xmax=0 id=1 name='Alice'
When a transaction updates a row:
- PostgreSQL does NOT modify the existing row in place.
- It sets the old row's `xmax` to the current transaction ID (logically "deleted").
- It inserts a new row version with `xmin` set to the current transaction ID.
-- Transaction txn_id=5000 runs: UPDATE users SET name='Alice B.' WHERE id=1;
-- Before:
-- xmin=1042 xmax=0 id=1 name='Alice' ← original row, still visible
-- After:
-- xmin=1042 xmax=5000 id=1 name='Alice' ← old version, deleted by txn 5000
-- xmin=5000 xmax=0 id=1 name='Alice B.' ← new version, created by txn 5000
Three concurrent transactions, one row, two versions in the heap. T1 and T3 see different states of the same row because their snapshots were taken at different times. Neither had to wait. No locks were taken for the reads.
How reads see consistent data: Every transaction has a "snapshot" taken at transaction start. A row is visible if:
- `xmin` is ≤ the current transaction's snapshot horizon (the row existed when we started), and
- `xmax` is either 0 (the row is still live) or greater than the snapshot horizon (the row was deleted after we started).
This means a concurrent reader always sees the row as it was at the moment their transaction began, regardless of how many updates happen while they are reading.
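The visibility rule can be sketched as a small function. This is an illustrative simplification: the real PostgreSQL check also consults the commit log and the snapshot's list of in-progress transactions, which are omitted here.

```python
# Simplified sketch of PostgreSQL's tuple visibility rule.
# Assumes every transaction ID at or below the snapshot horizon is
# committed; the real check also consults commit status.

def is_visible(xmin: int, xmax: int, snapshot_horizon: int) -> bool:
    """Return True if this row version is visible to the snapshot."""
    if xmin > snapshot_horizon:
        return False               # created after our snapshot: invisible
    if xmax == 0:
        return True                # still live: visible
    return xmax > snapshot_horizon  # deleted after our snapshot: still visible

# T1's snapshot at txid=100, from the sequence diagram above:
print(is_visible(xmin=50, xmax=200, snapshot_horizon=100))  # True: old version
print(is_visible(xmin=200, xmax=0, snapshot_horizon=100))   # False: T2's new version
```

Both checks must pass: the version existed before the snapshot and was not yet deleted as of the snapshot.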
Isolation levels and what they change
MVCC alone does not define the isolation level. The snapshot rules determine it:
| Isolation level | Snapshot taken at | What it prevents | What it allows |
|---|---|---|---|
| Read Committed | Start of each statement | Dirty reads | Non-repeatable reads, phantom reads |
| Repeatable Read | Start of transaction | Dirty + non-repeatable reads | Write skew; phantoms per the SQL standard (PostgreSQL's snapshot implementation actually prevents them) |
| Serializable | Start of transaction + SSI | All anomalies, including write skew | Nothing (at the cost of serialization aborts that must be retried) |
Read Committed (PostgreSQL default): each statement sees a fresh snapshot. If you run the same query twice in one transaction, you might get different results if another transaction committed between your two queries.
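The non-repeatable read looks like this in a transcript. A minimal sketch assuming an accounts table where row 1 starts at $100:

```sql
-- Transaction A: READ COMMITTED (the default)
BEGIN;
SELECT balance FROM accounts WHERE id = 1;  -- sees $100
-- Transaction B (concurrent): commits UPDATE accounts SET balance=200 WHERE id = 1;
SELECT balance FROM accounts WHERE id = 1;  -- now sees $200 (fresh snapshot per statement)
COMMIT;
```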
Repeatable Read: the snapshot is taken at the first statement of the transaction. Repeated reads of the same rows return the same data, even if other transactions commit in between.
-- Transaction A: REPEATABLE READ
BEGIN;
SELECT balance FROM accounts WHERE id = 1; -- sees $100
-- Transaction B (concurrent): commits UPDATE accounts SET balance=200 WHERE id = 1;
SELECT balance FROM accounts WHERE id = 1; -- still sees $100 (same snapshot!)
COMMIT;
The VACUUM problem
MVCC creates old row versions every time a row is updated or deleted. These old versions are needed by transactions that still have them in their snapshots. Once no active transaction needs the old version, it becomes "dead" and takes up space without serving any purpose.
Without cleanup, tables grow indefinitely. Index entries for dead rows remain. Every query scans dead rows it must then filter. Eventually, a table with heavy updates has more dead rows than live ones.
VACUUM is PostgreSQL's background process that reclaims dead row space. It:
- Scans the table for row versions whose `xmax` is set, whose deleting transaction is known committed, and that no active snapshot can still see.
- Marks the space those versions occupied as free for reuse.
- Removes the corresponding dead entries from the table's indexes. (Regular VACUUM works in place and keeps the freed space for reuse; VACUUM FULL rewrites the whole table and returns the space to the operating system.)
Table state after 1 million updates without VACUUM:
Live rows: 1,000,000
Dead rows: 1,000,000 ← same number, wasted space
Table bloat: 2x expected size
After VACUUM:
Live rows: 1,000,000
Dead rows: 0
Table size: Expected size (approximately)
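You can watch dead-tuple accumulation yourself through the pg_stat_user_tables statistics view, and trigger a manual cleanup:

```sql
-- How many dead row versions has each table accumulated?
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Reclaim dead rows in one table (does not block readers or writers):
VACUUM (VERBOSE) users;
```

A high n_dead_tup with a stale last_autovacuum is the classic sign that autovacuum needs tuning for that table.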
Autovacuum must not be disabled
PostgreSQL runs autovacuum automatically. Disabling it (or setting vacuum thresholds too high) causes table bloat, increases scan times, and eventually leads to transaction ID wraparound, a catastrophic condition in which PostgreSQL refuses all writes to prevent data corruption. Never disable autovacuum in production.
Transaction ID wraparound
PostgreSQL uses 32-bit transaction IDs (xmin, xmax). With ~4 billion possible values, and systems performing thousands of transactions per second, IDs eventually wrap around.
When a transaction ID wraps, old rows with xmin values in the "future" relative to the wrapped counter would appear as not-yet-created. PostgreSQL prevents this by periodically freezing old rows: replacing their xmin with a special "frozen" value that is always visible to any transaction.
VACUUM handles freezing. If autovacuum falls behind (because it's disabled, under-resourced, or tables are growing faster than it can clean), PostgreSQL first logs warnings and then, as the wraparound point nears, stops assigning new transaction IDs entirely; recovering requires the administrator to run an aggressive VACUUM, historically in single-user mode.
In practice: if autovacuum is running correctly, you never see this. It becomes a problem when autovacuum is disabled "for performance" or when write volume exceeds the vacuum worker's capacity to keep up.
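The standard way to monitor wraparound risk is to check each database's oldest unfrozen transaction ID. Autovacuum begins forced freezing once this age exceeds autovacuum_freeze_max_age (default 200 million):

```sql
-- Age of the oldest unfrozen transaction ID in each database.
-- Values approaching ~2 billion mean forced shutdown is imminent;
-- a healthy autovacuum keeps this well below 200 million.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```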
Long-running transactions delay VACUUM and cause table bloat
VACUUM cannot reclaim a dead row version if any open transaction has a snapshot older than that version's deletion. A single analytics query running for 8 hours prevents VACUUM from reclaiming any dead row created during those 8 hours. On a system processing 10,000 updates per second, that is 288 million dead rows accumulating on disk. The fix: set idle_in_transaction_session_timeout to a reasonable limit (for example, 30 minutes), monitor pg_stat_activity ordered by xact_start, and terminate long-running transactions that are no longer doing useful work.
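The monitoring and the timeout mentioned above look like this in practice:

```sql
-- Find transactions that have been open for more than 30 minutes:
SELECT pid, state, xact_start, now() - xact_start AS duration, query
FROM pg_stat_activity
WHERE xact_start < now() - interval '30 minutes'
ORDER BY xact_start;

-- Automatically kill sessions that sit idle inside an open transaction:
ALTER SYSTEM SET idle_in_transaction_session_timeout = '30min';
SELECT pg_reload_conf();
```

The timeout only terminates sessions that are idle inside a transaction; an actively running query must be cancelled separately (pg_cancel_backend or pg_terminate_backend).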
What MVCC does not solve
Write-write conflicts. Two transactions updating the same row must still coordinate. MVCC does not eliminate the need for row-level locks when two writers target the same row. The second writer waits for the first to commit or roll back.
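A sketch of that write-write contention, two sessions targeting the same row:

```sql
-- Session 1:
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 1;  -- takes the row lock

-- Session 2 (concurrent):
BEGIN;
UPDATE accounts SET balance = balance + 10 WHERE id = 1;  -- blocks here...

-- Session 1:
COMMIT;  -- ...Session 2's UPDATE now proceeds against the committed version
```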
Serializable anomalies. At Repeatable Read isolation, write skew is possible. Two transactions each read state, make independent decisions based on what they see, then write. If both reads produced the same consistent snapshot but both writes together produce an invalid state, you have write skew. Only the Serializable isolation level (SSI in PostgreSQL) prevents this.
-- Write skew example: each doctor checks if enough doctors are on call
-- Doctor A: SELECT count(*) FROM on_call WHERE shift='night'; -- sees 2
-- Doctor B: SELECT count(*) FROM on_call WHERE shift='night'; -- sees 2
-- Doctor A: DELETE FROM on_call WHERE doctor_id=1 AND shift='night'; -- now 1
-- Doctor B: DELETE FROM on_call WHERE doctor_id=2 AND shift='night'; -- now 0
-- Invariant violated: 0 doctors on call, but each transaction saw 2 when it started
MVCC does not prevent write skew at Repeatable Read
Write skew happens when two transactions each see a consistent snapshot, make individually valid decisions, and their combined writes violate a shared invariant. Real examples: overselling the last inventory item, double-booking a time slot, two on-call doctors simultaneously going off shift. Use SELECT ... FOR UPDATE to lock the rows you intend to modify, or use SERIALIZABLE isolation to let PostgreSQL detect and abort the conflicting transaction. Read Committed and Repeatable Read do not prevent write skew.
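One way to close the write-skew window in the on-call example is explicit locking. A sketch assuming the same on_call table; note that PostgreSQL rejects FOR UPDATE combined with aggregates, so you lock the rows first and count in the application:

```sql
BEGIN;
-- Lock every night-shift row; a concurrent transaction running the same
-- check blocks here until we commit:
SELECT doctor_id FROM on_call WHERE shift = 'night' FOR UPDATE;
-- Application counts the rows returned and proceeds only if >= 2:
DELETE FROM on_call WHERE doctor_id = 1 AND shift = 'night';
COMMIT;
```

FOR UPDATE locks only rows that already exist, so this protects delete-based invariants like the one above; invariants threatened by concurrent inserts still need SERIALIZABLE.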
Production usage
| System | Usage | Notable behavior |
|---|---|---|
| PostgreSQL | Full MVCC for all reads and writes | Dead tuples accumulate on every UPDATE and DELETE. VACUUM reclaims space. Autovacuum runs automatically but must be tuned for high-write tables. |
| MySQL InnoDB | MVCC with undo log for read consistency | Old versions stored in the undo log, not in the main table file. Purge thread cleans up after transactions close. |
| CockroachDB | Distributed MVCC with HLC timestamps | Each key stores multiple timestamped versions. Garbage collection runs at configurable intervals to remove versions older than the GC TTL (default 25 hours). |
| Oracle | Multi-version read consistency via undo segments | "Read consistency" is Oracle's branding for the same mechanism. Queries always see a consistent snapshot as of the query start time. |
Choosing the right isolation level
flowchart TD
Q1{"Do reads and writes need\nto proceed without blocking?"}
Q1 -->|"Yes (most web apps)"| MVCC["MVCC-based isolation\n(PostgreSQL, MySQL InnoDB)"]
Q1 -->|"No, strict serialization OK"| Lock["SERIALIZABLE with shared locks\n(rare, high contention)"]
MVCC --> Q2{"Same-row reads must stay\nconsistent within a transaction?"}
Q2 -->|"No, freshest data\non each statement"| RC["READ COMMITTED\n(PostgreSQL default)\nFast, allows non-repeatable reads"]
Q2 -->|"Yes, snapshot-stable reads"| Q3{"Check-then-act patterns\non shared rows?"}
Q3 -->|"No"| RR["REPEATABLE READ\nSnapshot at txn start\nWrite skew still possible"]
Q3 -->|"Yes, must prevent\nwrite skew"| Q4{"Tolerate transaction\naborts and retries?"}
Q4 -->|"Yes"| Ser["SERIALIZABLE (SSI)\nAll anomalies prevented\nRequires retry on abort"]
Q4 -->|"No"| ForUpdate["REPEATABLE READ\n+ SELECT FOR UPDATE\nManual invariant protection"]
The practical default: READ COMMITTED for most OLTP. Add SELECT FOR UPDATE when you need check-then-act atomicity on specific rows. Use SERIALIZABLE only when the invariant spans multiple rows in a way that explicit locking cannot cleanly address.
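Because SERIALIZABLE aborts conflicting transactions, the application needs a retry loop. A generic sketch: with a real driver such as psycopg you would catch its serialization error (SQLSTATE 40001) instead of the placeholder exception class defined here, and txn_fn would open and commit its own transaction.

```python
# Generic retry-on-serialization-failure helper. SerializationFailure is a
# placeholder for the driver's SQLSTATE-40001 error class.

class SerializationFailure(Exception):
    """Stands in for the driver's serialization-failure error."""

def run_serializable(txn_fn, max_attempts: int = 5):
    """Run txn_fn, retrying if the database aborts it with a serialization failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up after too many conflicts

# Demo: a transaction that conflicts twice, then succeeds.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure()
    return "committed"

print(run_serializable(flaky_txn))  # committed on the third attempt
```

Production versions usually add exponential backoff with jitter between attempts so that repeatedly conflicting transactions do not re-collide immediately.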
Interview cheat sheet
- MVCC stores multiple row versions simultaneously. Readers see the version current at their transaction start. Writers create new versions rather than modifying in place.
- The core benefit: reads never block writes; writes never block reads. This is the correct answer to "how does PostgreSQL serve high concurrency without lock contention?"
- Dead rows accumulate from updates/deletes and must be reclaimed by VACUUM. Table bloat = query slowdown.
- Autovacuum is not optional. Disabling it leads to bloat and eventually transaction ID wraparound, which stops all writes.
- MVCC does not prevent write-write conflicts. Two transactions writing the same row still contend for a row-level lock.
- Write skew is possible at Repeatable Read. Use Serializable isolation to prevent it, or use explicit locking (`SELECT FOR UPDATE`).
Quick recap
- MVCC stores multiple row versions per row. Readers see the snapshot from their transaction start, writers create new versions rather than modifying in place.
- The core property: readers never block writers; writers never block readers. This is the mechanism behind PostgreSQL's high read/write concurrency.
- Dead row versions accumulate and must be reclaimed by VACUUM. Without autovacuum, tables bloat and eventually transaction ID wraparound stops all writes.
- Long-running transactions hold old snapshots, preventing VACUUM from reclaiming dead rows created during that transaction's lifetime.
- MVCC does not prevent write-write conflicts on the same row, or write skew at Repeatable Read isolation.
- Serializable isolation (SSI) prevents all anomalies including write skew, at the cost of transaction aborts that require application retry logic.
Related concepts
- Database Locking โ MVCC reduces lock contention but does not eliminate it. Row-level locks, table locks, and advisory locks interact with MVCC in ways that are essential to understand for write-heavy workloads.
- B-tree Indexes โ Dead row versions from MVCC also create dead index entries. VACUUM maintains both the heap (table data) and the associated indexes. Understanding this explains index bloat.
- Databases โ MVCC is one of the fundamental mechanisms that distinguishes production-grade databases from simple key-value stores. It enables the "D" in ACID without sacrificing concurrent read throughput.