PostgreSQL query planner: statistics, cost model, and EXPLAIN
How PostgreSQL's query planner chooses execution plans using column statistics and a cost model. What EXPLAIN ANALYZE output means, how to identify bad plans, and how to fix them without rewriting queries.
The problem
Your e-commerce platform runs a query that joins orders, users, and products to generate a daily sales report. In development with 10,000 rows, it takes 12ms. In production with 50 million orders, the same query takes 47 seconds.
You add an index on orders.user_id. The query still takes 47 seconds. You check the execution plan with EXPLAIN ANALYZE and discover the planner is ignoring your index entirely. It chose a sequential scan across the full orders table, then a nested loop join against users, scanning all 2 million user rows for every batch of orders.
The index exists. The query is correct. The problem is not your SQL. The problem is the planner's decision: it estimated the join would return 12 rows (based on stale statistics from last month's data) when the real result set is 380,000 rows. With that wrong estimate, nested loop looked cheaper than hash join. The plan it chose is catastrophically wrong.
This is the problem the query planner solves, and the problem you face when the planner gets it wrong. PostgreSQL's query planner is a cost-based optimizer that evaluates multiple execution strategies and picks the cheapest. When its cost estimates are accurate, it makes excellent choices. When the estimates are wrong (stale statistics, skewed data, correlated columns), the plan can be orders of magnitude slower than optimal.
What it is
The PostgreSQL query planner is a cost-based optimizer that translates a declarative SQL query into a physical execution plan: a tree of operators (scans, joins, sorts, aggregates) that the executor runs to produce results. For each query, the planner generates candidate plans, estimates their cost using stored statistics about the data, and picks the cheapest one.
Analogy: Think of a GPS navigation system. You type in a destination (the SQL query). The GPS considers multiple routes: highway (fast but longer distance), side streets (shorter but more stops), toll road (expensive but avoids traffic). It picks the route with the lowest estimated travel time based on current traffic data. If the traffic data is stale (it thinks the highway is clear when it is actually jammed), it sends you the wrong way. The PostgreSQL planner works the same way: good statistics produce good plans, stale statistics produce catastrophic plans.
Optimizer vs. planner
PostgreSQL documentation uses "planner" and "optimizer" interchangeably. Other databases (MySQL, Oracle, SQL Server) call the same component the "query optimizer." The function is identical: translate SQL into the cheapest physical execution plan.
How it works
The planner transforms a SQL query into an execution plan through four stages: parsing, rewriting, planning, and execution. The planning stage is where all the interesting decisions happen.
The planner's core loop works by dynamic programming. For each table in the query, it generates all possible access paths (sequential scan, each available index scan, bitmap scan). For each pair of tables, it considers all join strategies. It builds the plan bottom-up, caching the cheapest sub-plan at each step.
function plan_query(tables, joins, filters):
// Step 1: Generate scan paths for each table
for each table in tables:
paths[table] = []
paths[table].add(seq_scan(table))
for each index on table:
if index matches any filter on table:
paths[table].add(index_scan(table, index))
if multiple index conditions exist:
paths[table].add(bitmap_scan(table, indexes))
paths[table] = cheapest(paths[table])
// Step 2: Generate join paths for each pair
for join_size = 2 to len(tables):
for each subset S of tables with size join_size:
for each way to split S into (left, right):
for join_method in [nested_loop, hash_join, merge_join]:
cost = estimate_cost(paths[left], paths[right], join_method)
paths[S] = keep_cheapest(paths[S], cost)
// Step 3: Add sort, aggregate, limit on top
plan = paths[all_tables]
if ORDER BY: plan = add_sort(plan)
if GROUP BY: plan = add_aggregate(plan)
return plan
For queries involving more than geqo_threshold tables (default: 12), PostgreSQL switches from exhaustive dynamic programming to a Genetic Query Optimizer (GEQO) that uses randomized search. This is because the number of possible join orderings grows factorially: 12 tables produce 479 million possible orderings.
The cost model: sequential scan vs index scan
Every plan node gets a cost estimate. The cost formula combines I/O cost (pages read from disk) and CPU cost (rows processed). The cost units are arbitrary but consistent, anchored to seq_page_cost = 1.0.
Key cost parameters and their defaults:
| Parameter | Default | What it controls |
|---|---|---|
seq_page_cost | 1.0 | Cost of reading one page sequentially (baseline) |
random_page_cost | 4.0 | Cost of reading one page randomly (index lookups) |
cpu_tuple_cost | 0.01 | CPU cost of processing one row |
cpu_index_tuple_cost | 0.005 | CPU cost of processing one index entry |
cpu_operator_cost | 0.0025 | CPU cost of one operator evaluation (comparison, function) |
effective_cache_size | 4GB | Planner's estimate of available OS + PostgreSQL cache |
A concrete example: table orders has 1 million rows across 50,000 pages. An index on user_id covers 2,700 index pages.
Sequential scan cost (scanning all rows, filtering to user_id = 123):
I/O: 50,000 pages Γ 1.0 (seq_page_cost) = 50,000
CPU: 1,000,000 rows Γ 0.01 (cpu_tuple_cost) = 10,000
Total: 60,000
Index scan cost (looking up user_id = 123, estimated 50 matching rows):
Index I/O: 3 levels Γ 4.0 (random_page_cost) = 12
Heap I/O: 50 rows Γ 4.0 (random_page_cost) = 200
CPU: 50 rows Γ 0.005 = 0.25
Total: 212.25
The index scan is 280x cheaper. The planner picks it. But if random_page_cost were misconfigured to 40.0, the index scan cost jumps to 2,012, and the planner might still pick the sequential scan for smaller result sets.
On SSD storage, set random_page_cost to 1.1 (not the default 4.0). SSDs have nearly equal sequential and random read latency. The default of 4.0 was calibrated for spinning disks and causes the planner to under-use indexes on SSD-backed systems. I have seen this single misconfiguration cause 10x slowdowns on production queries.
Join algorithms
When a query joins two or more tables, the planner picks a join algorithm for each pair. PostgreSQL supports three algorithms, each optimal for different data shapes.
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.