How PostgreSQL executes a query
How PostgreSQL parses SQL, plans execution with the cost-based optimizer, chooses between sequential and index scans, and manages MVCC for concurrent reads and writes.
The Interview Question
Interviewer: "Your dashboard query was fast in staging with 10K rows but takes 12 seconds in production with 50 million rows. You ran EXPLAIN ANALYZE and saw a sequential scan instead of an index scan. Walk me through how PostgreSQL decides which execution plan to use, and why it sometimes picks the 'wrong' one."
This question tests whether you understand what happens between typing SQL and getting rows back. The interviewer wants to hear about the query planner, cost estimation, statistics, and MVCC visibility rules. Surface-level answers like "it uses indexes" will not cut it.
What to Clarify Before Answering
You: "Before I dive in, let me scope this properly..."
- "Should I walk through the full pipeline from SQL text to result rows, or focus specifically on the planner?"
- "Are we talking about a single query on a standalone Postgres instance, or should I cover parallel query execution and replication?"
- "Should I include how MVCC affects which rows the query can see, or keep it to the execution path only?"
- "Is the interviewer interested in the write path (WAL, buffer pool) or the read path only?"
Why this matters: PostgreSQL query execution touches parsing, planning, execution, memory management, concurrency control, and durability. A strong candidate scopes the answer before diving in, showing they understand the breadth of the system.
The 30-Second Answer
PostgreSQL processes every query through a five-stage pipeline: parse, analyze, rewrite, plan, execute. The parser converts SQL text into a parse tree. The analyzer resolves table and column names against the catalog. The rewriter applies view definitions and rules. The cost-based optimizer then generates candidate execution plans, estimates their cost using table statistics (row counts, value distributions, correlation), and picks the cheapest one. Finally, the executor walks the plan tree, fetching rows through sequential scans, index scans, hash joins, or merge joins as the plan dictates. Concurrent access is managed through MVCC, where every tuple carries transaction visibility information (xmin/xmax), and each transaction sees a consistent snapshot of the database.
The Architecture Overview
PostgreSQL follows a process-per-connection model. When a client connects, the postmaster (main process) forks a new backend process dedicated to that connection. This is different from MySQL's thread-per-connection model. The process model provides stronger isolation (a crash in one backend does not affect others) at the cost of higher memory per connection (~5-10MB per backend process). Connection poolers like PgBouncer are essential for applications that need thousands of connections.
When a query arrives, it enters a dedicated backend process (one per connection). The parser tokenizes the SQL and builds a parse tree. The analyzer resolves all names against the system catalog, checking that tables and columns exist. The rewriter expands views and applies rewrite rules. The planner is where the real work happens: it generates multiple candidate plans, estimates costs using statistics from pg_statistic, and picks the cheapest one.
The executor then walks the chosen plan tree node by node. Each node (SeqScan, IndexScan, HashJoin, Sort) is a pull-based iterator: the parent calls ExecProcNode() on its child, which returns one tuple at a time. Data pages are fetched through shared_buffers (a shared memory cache), and writes go through the WAL buffer for durability.
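The pull-based model is easy to see in miniature. The sketch below models plan nodes as Python generators: each node yields one tuple per request from its parent, so a Limit node pulls only as many rows as it needs from the scan below it. The node names are illustrative, not PostgreSQL's actual C API.

```python
# Volcano-style pull execution: each plan node is a generator that yields
# one tuple per "call" from its parent node.

def seq_scan(table):
    """Leaf node: produce table rows one at a time."""
    for row in table:
        yield row

def filter_node(child, predicate):
    """Pull from the child, pass through only matching rows."""
    for row in child:
        if predicate(row):
            yield row

def limit_node(child, n):
    """Stop pulling from the child after n rows."""
    for _, row in zip(range(n), child):
        yield row

table = [{"id": i} for i in range(1_000_000)]
plan = limit_node(filter_node(seq_scan(table), lambda r: r["id"] % 2 == 0), 3)
result = list(plan)
print(result)  # only a handful of rows are ever pulled from the scan
```

Because the scan is lazy, satisfying the LIMIT touches only the first few tuples of the million-row table, which is exactly why `LIMIT` queries can return instantly even over huge relations.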
I will now walk through each stage in detail.
Parse, Analyze, Rewrite: From SQL Text to Query Tree
The first three stages transform raw SQL text into an internal representation the planner can work with.
Parsing
The parser uses a Bison grammar (over 15,000 lines of grammar rules) to convert SQL text into a parse tree. At this stage, Postgres has no idea whether the tables or columns actually exist. It only checks syntax.
-- Input SQL
SELECT u.name, COUNT(o.id)
FROM users u JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.name
HAVING COUNT(o.id) > 5;
-- Parser output (simplified parse tree):
-- SelectStmt
-- targetList: [ColumnRef(u.name), FuncCall(COUNT, ColumnRef(o.id))]
-- fromClause: [JoinExpr(users AS u, orders AS o, ON u.id = o.user_id)]
-- whereClause: [OpExpr(>, ColumnRef(u.created_at), Const('2024-01-01'))]
-- groupClause: [ColumnRef(u.name)]
-- havingClause: [OpExpr(>, FuncCall(COUNT, ...), Const(5))]
Analysis
The analyzer resolves every name in the parse tree against the system catalog (pg_class, pg_attribute, pg_type). It verifies that users exists, that users.name is a valid column, that COUNT is a valid aggregate, and that the types are compatible. The output is a query tree with fully resolved OIDs (internal object identifiers).
Rewriting
The rewriter applies transformation rules. The most common use case: if users is actually a view, the rewriter replaces the table reference with the view's underlying query. It also applies row-level security policies and certain system-defined rules.
Why this matters in production
If your query suddenly changes plans after a seemingly unrelated schema change, check whether a view definition changed or a new rule was added. The rewriter transforms the query before the planner ever sees it, so EXPLAIN shows the rewritten version, not your original SQL.
The Cost-Based Optimizer: How Postgres Chooses a Plan
This is the most important part of the entire pipeline. The planner generates candidate execution plans and estimates their cost using a model based on I/O and CPU costs.
How Cost Estimation Works
Every plan node has two cost numbers: startup cost (cost before the first row is returned) and total cost (cost to return all rows). These are measured in abstract cost units that roughly correspond to sequential page reads.
The key tuning knobs:
| Parameter | Default | Meaning |
|---|---|---|
| seq_page_cost | 1.0 | Cost of reading one page sequentially |
| random_page_cost | 4.0 | Cost of reading one random page |
| cpu_tuple_cost | 0.01 | Cost of processing one row |
| cpu_index_tuple_cost | 0.005 | Cost of processing one index entry |
| cpu_operator_cost | 0.0025 | Cost of evaluating one operator |
| effective_cache_size | 4GB | Planner's estimate of total cache available |
The ratio of random_page_cost to seq_page_cost (4:1 by default) is why Postgres often prefers sequential scans on small-to-medium tables. A sequential scan reads pages in order (leveraging OS readahead), while an index scan jumps to random pages.
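You can reproduce the crossover behavior with back-of-the-envelope arithmetic. The sketch below uses the real cost-constant names but a deliberately crude model (it ignores correlation, caching, and the planner's actual index costing formulas): a seq scan pays for every page plus every tuple, while an index scan is modeled as roughly one random page fetch per matching row.

```python
# Simplified plan costing with the default cost constants. This mirrors
# the *shape* of the planner's arithmetic, not its exact formulas.

SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005

def seq_scan_cost(pages, rows):
    # Read every page sequentially, process every tuple.
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST

def index_scan_cost(matching_rows):
    # Crude model: one random heap page fetch per matching row,
    # plus index-entry and tuple processing CPU.
    return matching_rows * (RANDOM_PAGE_COST + CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST)

PAGES, ROWS = 100_000, 10_000_000  # hypothetical large table

for selectivity in (0.001, 0.05, 0.5):
    matching = int(ROWS * selectivity)
    seq, idx = seq_scan_cost(PAGES, ROWS), index_scan_cost(matching)
    winner = "index scan" if idx < seq else "seq scan"
    print(f"{selectivity:>5.1%}: seq={seq:,.0f} idx={idx:,.0f} -> {winner}")
```

Even this toy model lands near the real-world behavior: at 0.1% selectivity the index wins easily, while at 5% the sequential scan is already cheaper, because the 4:1 random-to-sequential cost ratio penalizes scattered heap fetches.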
Sequential Scan vs. Index Scan Decision
The planner uses table statistics to estimate how many rows a WHERE clause will filter. These statistics come from ANALYZE (run automatically by autovacuum or manually):
- n_distinct: Number of distinct values in a column
- most_common_vals / most_common_freqs: The top N values and their frequencies
- histogram_bounds: Equal-frequency histogram dividing the remaining values
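Here is how those statistics turn a WHERE clause into a row estimate. The statistic shapes follow what pg_stats exposes, but the numbers and the exact interpolation are simplified illustrations, not the planner's real code.

```python
# Selectivity estimation from pg_stats-style statistics (illustrative data).

N_ROWS = 1_000_000
most_common_vals  = ["active", "inactive"]
most_common_freqs = [0.70, 0.25]           # these two values cover 95% of rows
n_distinct = 12                             # total distinct values in the column
histogram_bounds = [0, 10, 50, 200, 1000]   # equal-frequency buckets (numeric column)

def eq_selectivity(value):
    """Equality: use the MCV list, else spread leftover mass over the rest."""
    if value in most_common_vals:
        return most_common_freqs[most_common_vals.index(value)]
    leftover = 1.0 - sum(most_common_freqs)
    return leftover / (n_distinct - len(most_common_vals))

def lt_selectivity(c):
    """Range (v < c): each histogram bucket holds an equal fraction of rows;
    interpolate linearly inside the bucket containing c."""
    buckets = len(histogram_bounds) - 1
    for i in range(buckets):
        lo, hi = histogram_bounds[i], histogram_bounds[i + 1]
        if c <= hi:
            return (i + (c - lo) / (hi - lo)) / buckets
    return 1.0

print(int(N_ROWS * eq_selectivity("active")))  # common value: straight from MCVs
print(int(N_ROWS * eq_selectivity("banned")))  # rare value: leftover / remaining distinct
print(int(N_ROWS * lt_selectivity(30)))        # range: 1.5 of 4 equal buckets
```

This is why stale statistics cause bad plans: if the MCV list or histogram no longer reflects the data, every row estimate downstream (and therefore every cost comparison) is wrong.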
Join Strategies
When a query joins two or more tables, the planner chooses between three join algorithms:
| Join Type | Best When | Complexity | Memory Use |
|---|---|---|---|
| Nested Loop | Inner table is small or has an index | O(N * M) or O(N * log M) with index | Minimal |
| Hash Join | No useful index, both tables fit in work_mem | O(N + M) | O(smaller table) |
| Merge Join | Both inputs are already sorted (index or explicit sort) | O(N log N + M log M) | O(sort buffers) |
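A hash join is simple enough to sketch end to end: build a hash table on one input, then probe it with each row of the other. This toy version is in-memory only (a real hash join partitions into batches once the build side exceeds work_mem), and the column names are invented for the example.

```python
# Hash join in miniature: O(M) build phase + O(N) probe phase.

def hash_join(probe_rows, build_rows, probe_key, build_key):
    build = {}
    for row in build_rows:                        # build phase (smaller input)
        build.setdefault(build_key(row), []).append(row)
    for row in probe_rows:                        # probe phase (larger input)
        for match in build.get(probe_key(row), []):
            yield {**row, **match}

users  = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]
orders = [{"user_id": 1, "total": 30}, {"user_id": 2, "total": 5},
          {"user_id": 1, "total": 10}]

joined = list(hash_join(orders, users,
                        probe_key=lambda r: r["user_id"],
                        build_key=lambda r: r["id"]))
print(joined)
```

Note the asymmetry: the build side must fit in memory (work_mem in Postgres), which is why the planner puts the smaller estimated input on the build side.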
The planner also considers join order. For joins of fewer than 12 tables, it evaluates all possible orderings using dynamic programming. At 12 tables or more, it switches to the Genetic Query Optimizer (GEQO), which uses a genetic algorithm to search a subset of orderings. This threshold is controlled by geqo_threshold (default 12).
Common trap: work_mem too low
The default work_mem is 4MB. When a hash join's build side exceeds work_mem, Postgres spills to disk in batches, which can be 10-100x slower. If your queries do large joins or sorts, increase work_mem per-session. But be careful: each sort or hash operation gets its own work_mem allocation, so a complex query with 10 hash joins could use 10x work_mem total.
MVCC: How Concurrent Reads and Writes Coexist
PostgreSQL uses Multi-Version Concurrency Control so that readers and writers operate without blocking each other. Every tuple (row version) on disk carries visibility information.
Tuple Header Visibility Fields
Each heap tuple has a header with these critical fields:
| Field | Size | Purpose |
|---|---|---|
| xmin | 4 bytes | Transaction ID that inserted this tuple |
| xmax | 4 bytes | Transaction ID that deleted/updated this tuple (0 if live) |
| t_ctid | 6 bytes | Pointer to the next version of this tuple (for updates) |
| t_infomask | 2 bytes | Hint bits for committed/aborted status |
When a transaction runs a SELECT, it does not see the "latest" version of every row. It sees the version that was visible at its snapshot.
Snapshot Isolation
A snapshot records:
- xmin: The lowest still-running transaction ID at snapshot time
- xmax: The next transaction ID that will be assigned
- xip_list: Array of transaction IDs that were in-progress at snapshot time
A tuple is visible to a snapshot if:
- xmin is committed, AND xmin < the snapshot's xmax, AND xmin is not in xip_list (the inserting transaction committed before the snapshot), AND
- xmax is zero (not deleted), OR xmax is aborted or not yet committed, OR xmax is in xip_list or >= the snapshot's xmax (the deleting transaction is not visible to the snapshot)
This visibility check happens for every tuple during a scan. For a sequential scan of 10 million rows, Postgres evaluates these rules 10 million times. The t_infomask hint bits optimize this by caching the committed/aborted status so Postgres does not need to look up the pg_xact (CLOG) file for every tuple.
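The visibility rules can be written out as a small function. This is a simplified model: `committed` stands in for the pg_xact lookup, subtransactions and hint bits are ignored, and all the transaction IDs are invented for the example.

```python
# Simplified snapshot visibility check for one tuple version.

def tuple_visible(xmin, xmax, snap_xmax, xip, committed):
    # Inserter must be committed and visible to the snapshot:
    # not started after the snapshot, not in-progress at snapshot time.
    if xmin >= snap_xmax or xmin in xip or not committed(xmin):
        return False
    # Never deleted: tuple is live.
    if xmax == 0:
        return True
    # Deleter aborted/still running, was in-progress at snapshot time,
    # or started after the snapshot: the deletion is invisible,
    # so the tuple still is visible.
    if not committed(xmax) or xmax in xip or xmax >= snap_xmax:
        return True
    return False

committed = {100, 105, 110}.__contains__    # stand-in for the pg_xact (CLOG) lookup
snap = dict(snap_xmax=112, xip={108})       # snapshot: next XID 112, XID 108 in-progress

print(tuple_visible(100, 0,   **snap, committed=committed))  # live, old insert
print(tuple_visible(100, 110, **snap, committed=committed))  # deleted by committed xact
print(tuple_visible(100, 108, **snap, committed=committed))  # deleter was in-progress
print(tuple_visible(108, 0,   **snap, committed=committed))  # inserter was in-progress
```

Running this logic per tuple is cheap once the hint bits are set, but the CLOG lookup on the slow path is exactly what the t_infomask caching avoids.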
Why readers never block writers
Unlike traditional locking systems where a SELECT can block an UPDATE (or vice versa), MVCC allows both to proceed simultaneously. The SELECT reads the old tuple version, and the UPDATE creates a new version. Neither waits for the other. The only scenario where transactions block each other is two concurrent UPDATEs to the same row, where the second UPDATE must wait for the first to commit or abort.
The VACUUM Process
When tuples are updated or deleted, the old versions remain on disk (they might still be visible to other transactions). VACUUM reclaims space from dead tuples.
| VACUUM Type | When It Runs | What It Does | Impact |
|---|---|---|---|
| Regular VACUUM | Autovacuum trigger | Marks dead tuple space as reusable | Does not shrink file on disk |
| VACUUM FULL | Manual only | Rewrites entire table, reclaims disk space | Exclusive lock, very slow |
| Autovacuum | When dead tuples > threshold | Runs regular VACUUM automatically | Minimal impact if tuned |
| Anti-wraparound | When xid age > 200M | Prevents transaction ID wraparound | Can freeze system if neglected |
The transaction ID wraparound problem
PostgreSQL uses 32-bit transaction IDs: about 4 billion values in total, of which only ~2 billion can be "in the past" at any moment. VACUUM freezes old tuples to reclaim transaction IDs. If autovacuum falls behind, Postgres logs increasingly urgent warnings and, as the wraparound point nears, stops assigning new transaction IDs, halting all writes to prevent data corruption. This is the single most critical operational failure mode in PostgreSQL.
Index Internals: How B-Trees Speed Up Lookups
PostgreSQL's default index type is the B-tree (balanced tree). Understanding how it works explains why some queries use indexes and others do not.
B-Tree Structure
A B-tree index is a sorted, balanced tree where:
- Leaf pages contain index entries (key value + pointer to heap tuple)
- Internal pages contain separator keys that guide the search
- Root page is the starting point for every lookup
- All leaf pages are linked in a doubly-linked list for range scans
For a table with 10 million rows and a B-tree on an integer column, the tree is typically 3-4 levels deep. Each lookup requires 3-4 page reads (one per level) plus one heap page read to fetch the actual tuple.
// Simplified B-tree lookup for WHERE id = 42
function btree_search(root, key=42):
page = root // Level 0: root page
while page.is_internal:
// Binary search within page for correct child pointer
child_ptr = binary_search(page.entries, key)
page = read_page(child_ptr) // Level 1, 2: internal pages
// Now at leaf level
entry = binary_search(page.entries, key)
if entry.key == key:
return read_heap_tuple(entry.tid) // Fetch actual row from heap
return NULL
Index Types Beyond B-Tree
| Index Type | Best For | Structure | Example Use Case |
|---|---|---|---|
| B-tree | Equality and range queries | Balanced tree | WHERE id = 42, WHERE date > '2024-01-01' |
| Hash | Equality only | Hash buckets | WHERE email = 'foo@bar.com' (rarely better than B-tree) |
| GIN | Multi-valued columns | Inverted index | Full-text search, JSONB containment, array overlap |
| GiST | Spatial and range types | Generalized search tree | PostGIS geometry, range types, nearest-neighbor |
| BRIN | Large naturally-ordered tables | Block range summaries | Time-series data on append-only tables |
When BRIN beats B-tree
For append-only time-series tables with billions of rows, a BRIN index on the timestamp column can be 1,000x smaller than a B-tree (a few KB versus several GB). BRIN stores the min/max value per block range (128 pages by default). The tradeoff is that BRIN is useless if the data is not physically ordered by the indexed column. I reach for BRIN when the table is append-only and queries always filter by the ordering column.
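A BRIN index is small enough to model in a few lines: store one (min, max) summary per block range and skip every range whose summary cannot overlap the predicate. The layout below is illustrative (rows per page, integer "timestamps"), not the on-disk format.

```python
# BRIN in miniature: (min, max) per block range, used to prune ranges.

PAGES_PER_RANGE = 128  # PostgreSQL's default pages_per_range

def build_brin(ordered_values, rows_per_page=100):
    rows_per_range = PAGES_PER_RANGE * rows_per_page
    summaries = []
    for start in range(0, len(ordered_values), rows_per_range):
        chunk = ordered_values[start:start + rows_per_range]
        summaries.append((min(chunk), max(chunk)))  # one tiny entry per range
    return summaries

def ranges_to_scan(summaries, lo, hi):
    """Block ranges whose [min, max] summary overlaps the query interval."""
    return [i for i, (rmin, rmax) in enumerate(summaries)
            if rmax >= lo and rmin <= hi]

timestamps = list(range(10_000_000))   # append-only, naturally ordered column
index = build_brin(timestamps)         # hundreds of entries for 10M rows
hits = ranges_to_scan(index, 3_000_000, 3_100_000)
print(len(index), len(hits))           # tiny index, few ranges to actually read
```

The pruning only works because the data is physically ordered: shuffle the column and every (min, max) summary spans nearly the whole value domain, so no range can ever be skipped.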
WAL: How Writes Become Durable
The Write-Ahead Log is PostgreSQL's durability mechanism. The rule is simple: before any data page change is written to disk, the WAL record describing that change must be written and fsynced first.
The Write Path
The sequence for a write operation:
- The backend generates a WAL record describing the change (essentially a diff of the page)
- The WAL record is appended to the WAL buffer in shared memory
- The heap page in shared_buffers is modified (marked dirty)
- At COMMIT, the WAL buffer is fsynced to the WAL segment file on disk
- The dirty data page is written to disk later by the background writer or checkpointer
This separation is critical. WAL writes are sequential (fast), while data page writes are random (slow). By writing WAL first, Postgres can acknowledge a commit quickly and batch data page writes.
WAL Record Format
Each WAL record contains:
- Resource manager ID: Which subsystem generated the record (heap, btree, transaction, etc.)
- Transaction ID: Which transaction this change belongs to
- LSN (Log Sequence Number): A monotonically increasing pointer into the WAL stream
- Page diff data: The actual bytes that changed on the data page
The LSN is critical. Every data page header stores the LSN of the most recent WAL record that modified it. During recovery, Postgres compares the page LSN with the WAL record LSN. If the page LSN is already >= the WAL record LSN, that change was already flushed to disk and the record is skipped. This makes recovery idempotent.
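The LSN comparison is worth seeing concretely. This sketch models pages and WAL records as dicts (a deliberate simplification of the real structures) and shows why replaying the same WAL twice is harmless.

```python
# Idempotent WAL replay: apply a record only if the page's LSN shows
# the change has not yet reached the data file.

def replay(pages, wal):
    """pages: {page_no: {"lsn": int, "data": str}}; wal: records in LSN order."""
    applied = 0
    for rec in wal:
        page = pages[rec["page"]]
        if page["lsn"] >= rec["lsn"]:
            continue                   # change already on disk: skip the record
        page["data"] = rec["new_data"] # redo the change
        page["lsn"] = rec["lsn"]       # stamp the page with this record's LSN
        applied += 1
    return applied

# Crash scenario: page 1 was flushed through LSN 500, page 2 only through 100.
pages = {1: {"lsn": 500, "data": "v2"}, 2: {"lsn": 100, "data": "v1"}}
wal = [{"lsn": 400, "page": 1, "new_data": "v2"},   # already on disk: skipped
       {"lsn": 600, "page": 2, "new_data": "v2"},   # lost in crash: replayed
       {"lsn": 700, "page": 1, "new_data": "v3"}]   # lost in crash: replayed

print(replay(pages, wal), pages)
```

A second call to replay with the same WAL applies nothing, which is the idempotence property: recovery can be interrupted and restarted safely.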
Checkpoints and Recovery
A checkpoint flushes all dirty pages from shared_buffers to disk and writes a checkpoint record to WAL. After a crash, recovery starts from the last checkpoint and replays WAL records forward.
| Parameter | Default | Purpose |
|---|---|---|
| checkpoint_timeout | 5 min | Maximum time between checkpoints |
| max_wal_size | 1 GB | WAL size that triggers a checkpoint |
| wal_level | replica | How much detail to write (minimal, replica, logical) |
| synchronous_commit | on | Whether to wait for WAL fsync before returning |
Buffer Pool and Shared Memory
PostgreSQL caches data pages in a shared memory region called shared_buffers. Every read and write goes through this cache.
How the Buffer Pool Works
The buffer pool is an array of 8KB buffer slots. Each slot holds one data page (heap page, index page, or other). When a backend needs a page:
- Check the buffer tag hash table for the page
- If found (cache hit), pin the buffer and return it
- If not found (cache miss), find a free or evictable buffer slot
- Read the page from disk into the slot
- Insert into the hash table and return
The buffer tag is a combination of (relation file node, fork number, block number). The hash table uses partition-based locking (128 partitions), so multiple backends can look up different pages concurrently without blocking each other.
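The lookup path can be sketched as a tag-keyed cache in front of fixed-size slots. This toy version leaves out pin counts, partition locks, and clock-sweep (it evicts an arbitrary slot instead), and the relation names and page contents are invented.

```python
# Buffer pool lookup sketch: buffer tag (relation, fork, block) -> slot.

class BufferPool:
    def __init__(self, num_slots, disk):
        self.disk = disk          # stand-in for the data files on disk
        self.num_slots = num_slots
        self.slots = {}           # tag -> cached page contents
        self.hits = self.misses = 0

    def read_page(self, tag):
        if tag in self.slots:     # cache hit: no I/O needed
            self.hits += 1
            return self.slots[tag]
        self.misses += 1          # cache miss: fetch from "disk"
        if len(self.slots) >= self.num_slots:
            # Naive eviction of an arbitrary slot (the real buffer
            # manager uses clock-sweep, described below).
            self.slots.pop(next(iter(self.slots)))
        page = self.disk[tag]
        self.slots[tag] = page
        return page

disk = {("users", "main", blk): f"page-{blk}" for blk in range(10)}
pool = BufferPool(num_slots=4, disk=disk)
for blk in [0, 1, 0, 2, 0, 3]:   # block 0 is hot and gets re-read
    pool.read_page(("users", "main", blk))
print(pool.hits, pool.misses)
```

The tag is the whole point: because it identifies a page by (relation, fork, block) rather than by file offset, every backend sharing the pool resolves the same page to the same slot.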
Double Buffering: shared_buffers + OS Page Cache
PostgreSQL relies on a two-tier caching strategy. shared_buffers is the first tier, managed by Postgres with pin counts and usage tracking. The OS page cache is the second tier, caching any file data Postgres reads from disk.
When Postgres reads a page not in shared_buffers, the read goes to the OS. If the OS has the page in its page cache, the read completes from RAM (just a memory copy, ~microseconds). If not, it goes to disk (~milliseconds). This means the effective cache size is shared_buffers + OS page cache, which is why effective_cache_size should be set to ~75% of total RAM.
Ring Buffers for Large Scans
When a sequential scan reads a table larger than 1/4 of shared_buffers, Postgres uses a ring buffer (256KB, just 32 pages) instead of the main buffer pool. This prevents a single large table scan from evicting all cached pages that other queries depend on. Without this mechanism, a single SELECT * FROM huge_table would flush the entire buffer pool, causing every other concurrent query to re-read its data from disk.
The same ring buffer strategy applies to bulk writes (COPY, CREATE TABLE AS) and VACUUM operations. This is one of the most elegant design decisions in the buffer manager.
Clock-Sweep Eviction
When the buffer pool is full, Postgres uses a clock-sweep algorithm to find a page to evict. Each buffer has a usage counter (0-5). The clock hand sweeps through buffers:
- If usage_count > 0, decrement it and move on
- If usage_count == 0, evict this buffer (write it to disk first if dirty)
This approximates LRU without the overhead of maintaining a linked list. Frequently accessed pages accumulate higher usage counts and survive longer.
// Simplified clock-sweep eviction
function find_victim_buffer():
while true:
buf = buffers[clock_hand]
clock_hand = (clock_hand + 1) % num_buffers
if buf.usage_count == 0:
if buf.is_dirty:
write_page_to_disk(buf)
return buf
else:
buf.usage_count -= 1
Why shared_buffers should not be too large
There is a common misconception that you should set shared_buffers to 50%+ of RAM. Postgres relies heavily on the OS page cache as a second layer of caching. Setting shared_buffers to 25% of RAM (the typical recommendation) leaves room for the OS cache, which handles sequential scan readahead and double-buffers pages that were evicted from shared_buffers. On a 64GB machine, I typically set shared_buffers to 16GB.
Parallel Query Execution
Since version 9.6, PostgreSQL can parallelize certain operations. The planner creates a Gather or Gather Merge node that spawns background workers. Each worker processes a portion of the data and sends results back to the leader process.
Operations that support parallelism:
- Sequential scans (each worker scans a subset of blocks)
- Index scans and bitmap index scans
- Hash joins and nested loop joins (inner side)
- Aggregates (partial aggregation in workers, final aggregation in leader)
| Parameter | Default | Purpose |
|---|---|---|
| max_parallel_workers_per_gather | 2 | Max workers per query |
| min_parallel_table_scan_size | 8MB | Minimum table size for parallel scan |
| parallel_tuple_cost | 0.1 | Planner cost of transferring a tuple between processes |
| max_parallel_workers | 8 | Global max parallel workers |
Parallel query helps most on large analytical queries that scan millions of rows. It does not help (and adds overhead) for OLTP point lookups that return a single row. The planner accounts for the overhead of inter-process communication when deciding whether to parallelize.
What Happens When Things Break
| Failure | What Happens | How to Detect | How to Fix |
|---|---|---|---|
| Crash mid-transaction | Uncommitted changes are lost, WAL replays from last checkpoint on restart | Server exits, pg_ctl status shows stopped | Restart Postgres, recovery is automatic |
| Stale statistics | Planner picks wrong plans (seq scan instead of index scan) | EXPLAIN ANALYZE shows estimated vs actual rows differ by 10x+ | Run ANALYZE on affected tables |
| Transaction ID wraparound | Postgres refuses new writes, eventually shuts down | WARNING: database must be vacuumed within X transactions in logs | Emergency: run VACUUM FREEZE on affected tables |
| Bloated tables from long transactions | Sequential scans take 10x longer, disk usage spikes | pg_stat_user_tables.n_dead_tup grows continuously | Kill idle transactions, run VACUUM, consider VACUUM FULL during maintenance window |
| WAL disk full | Postgres halts all writes | PANIC: could not write to file pg_wal/... | Free disk space, archive old WAL segments, increase disk |
| shared_buffers thrashing | High I/O, slow queries across the board | pg_stat_bgwriter.buffers_backend high (backends writing their own pages) | Increase shared_buffers, reduce concurrent connections |
The key insight about crash recovery
Postgres never needs a "repair" or "recovery mode" dialog like some databases. WAL-based recovery is fully automatic. The database reads the last checkpoint location from pg_control, replays all WAL records after that point, and is ready for connections. This typically takes seconds to minutes, even for large databases, because it only replays the WAL generated since the last checkpoint (not the entire database).
Performance Characteristics
| Operation | Typical Latency | Throughput | Key Factor |
|---|---|---|---|
| Point lookup via B-tree index | 0.1-1ms | 50,000-100,000 QPS | Tree depth (3-4 levels) + 1 heap fetch |
| Sequential scan (cold cache) | 100MB/s+ from disk | Limited by disk I/O | OS readahead, page size |
| Sequential scan (warm cache) | 1-5GB/s from shared_buffers | Limited by CPU | Tuple deforming, visibility checks |
| Single-row INSERT | 0.5-2ms (with fsync) | 5,000-20,000 TPS | WAL fsync latency |
| Bulk COPY load | 100K-500K rows/sec | Limited by WAL write throughput | Disable indexes during load |
| Hash join (in-memory) | O(N + M) | Limited by work_mem | Fits in work_mem vs spill to disk |
| VACUUM on 1M dead tuples | 1-10 seconds | ~100K-500K tuples/sec | I/O for reading/writing pages |
| Checkpoint (1GB dirty data) | 5-30 seconds | Spread across checkpoint_timeout | Disk write throughput |
The 5% rule of thumb for index scans
As a rough heuristic, Postgres prefers an index scan when the query returns less than ~5% of the table rows, and a sequential scan when it returns more. This threshold shifts based on random_page_cost, index correlation, and whether the data is cached. On SSDs with low random_page_cost, the crossover point can be as high as 20-30%.
How This Compares to Alternatives
| Feature | PostgreSQL | MySQL (InnoDB) | SQLite | CockroachDB |
|---|---|---|---|---|
| Query planner | Cost-based, very sophisticated | Cost-based, simpler | Simpler cost-based | Distributed cost-based |
| MVCC implementation | Heap tuple versioning (xmin/xmax) | Undo log (rollback segments) | WAL-mode with snapshots | Timestamp-based MVCC |
| Index structure | Heap + separate B-tree | Clustered index (data in PK B-tree) | B-tree | LSM-based (RocksDB) |
| Dead tuple cleanup | VACUUM (explicit process) | Purge thread (automatic) | Page freelist reuse (manual VACUUM to compact) | Garbage collection |
| Parallel query | Yes (since v9.6) | Limited (since 8.0) | No | Yes (distributed) |
| Extension ecosystem | Rich (PostGIS, pg_vector, timescaledb) | Limited | Minimal | PostgreSQL-compatible |
| Connection model | Process per connection | Thread per connection | In-process | Process per node |
I reach for PostgreSQL when I need a general-purpose relational database with strong consistency, a rich extension ecosystem, and a sophisticated query planner. I switch to MySQL when the team already has deep MySQL expertise and the workload is simple OLTP. I consider CockroachDB when I need horizontal scaling with serializable isolation, though the latency overhead of distributed consensus is real (2-10ms per write vs sub-millisecond in standalone Postgres).
SQLite is the right choice for embedded applications, mobile apps, or single-user tools where the simplicity of an in-process database outweighs the limitations of single-writer concurrency. I have used SQLite as a local cache in front of a remote Postgres database to great effect.
The InnoDB clustered index difference
The biggest architectural difference between PostgreSQL and MySQL/InnoDB is how they store data. PostgreSQL uses a heap (unordered) with separate B-tree indexes. InnoDB uses a clustered index where the primary key B-tree IS the table data. This means InnoDB primary key lookups are faster (no separate heap fetch), but secondary index lookups are slower (they must first look up the primary key, then traverse the clustered index). It also means InnoDB UPDATEs that change the primary key are expensive because they move the entire row in the B-tree.
Interview Cheat Sheet
- When asked about query execution: "Postgres has a five-stage pipeline: parse, analyze, rewrite, plan, execute. The planner is cost-based and uses statistics from pg_statistic to estimate row counts and choose between scan types."
- When asked about index vs. seq scan: "The planner compares the estimated cost of sequential I/O (reading all pages) versus random I/O (index lookups). The crossover is typically around 5-15% of table rows, but it depends on random_page_cost and whether data is cached."
- When asked about MVCC: "Every tuple has xmin and xmax fields. Each transaction gets a snapshot that determines which tuple versions it can see. Readers never block writers and vice versa."
- When asked about VACUUM: "Updates and deletes create dead tuples. VACUUM marks that space as reusable. Long-running transactions prevent vacuum from cleaning up, which causes table bloat."
- When asked about transaction ID wraparound: "Postgres uses 32-bit transaction IDs. VACUUM freezes old tuples to reclaim IDs. If vacuum falls behind, Postgres stops assigning new transaction IDs before wraparound, halting writes to prevent corruption."
- When asked about WAL: "Write-Ahead Logging ensures durability. WAL records are written and fsynced before the transaction is acknowledged. Dirty data pages are flushed later by the checkpointer."
- When asked about the buffer pool: "shared_buffers is a page cache using clock-sweep eviction. Set it to ~25% of RAM and let the OS cache handle the rest. Backends that write their own pages (buffers_backend) indicate the pool is too small."
- When asked about join strategies: "Nested loop for small inner tables with indexes, hash join for larger unindexed joins that fit in work_mem, merge join when both sides are already sorted."
- When asked about slow queries: "First: EXPLAIN ANALYZE to see actual vs estimated rows. Second: check if statistics are stale (run ANALYZE). Third: check if work_mem is causing disk spills. Fourth: check for sequential scans on large tables where an index would help."
Test Your Understanding
Quick Recap
- PostgreSQL processes every query through a five-stage pipeline: parse, analyze, rewrite, plan, execute.
- The cost-based optimizer estimates plan costs using table statistics (row counts, histograms, most common values) and cost parameters (sequential vs random I/O costs).
- The planner chooses between sequential scan, index scan, bitmap scan, and index-only scan based on selectivity, correlation, and available indexes.
- MVCC uses tuple-level xmin/xmax fields so that each transaction sees a consistent snapshot without blocking other transactions.
- VACUUM is essential for reclaiming dead tuple space and preventing transaction ID wraparound, which is the most critical operational failure in PostgreSQL.
- The Write-Ahead Log ensures durability by writing WAL records before data pages, enabling automatic crash recovery from the last checkpoint.
- The buffer pool (shared_buffers) uses clock-sweep eviction and should be set to ~25% of RAM, with the OS page cache handling the rest.
- For multi-table joins, the planner uses nested loops, hash joins, or merge joins, and switches to a genetic optimizer for queries joining more than 12 tables.
Related Concepts
- How DynamoDB works: Contrast with a fully managed NoSQL database that takes a completely different approach to query execution (no joins, no planner, partition-key routing only)
- How RocksDB compaction works: The LSM-tree storage engine used by CockroachDB and many NewSQL databases, a fundamentally different storage approach than PostgreSQL's heap + B-tree
- How garbage collection works: Similar to VACUUM in principle (reclaiming unused memory/space), but in a runtime environment rather than a database
- How ClickHouse works: A columnar database optimized for analytical queries, showing the tradeoffs between OLTP (Postgres) and OLAP (ClickHouse) query execution