# Over-indexing anti-pattern
Learn why adding too many indexes degrades write performance, explodes storage, and slows query planning, and how to choose only the indexes your queries actually need.
## TL;DR
- Every index you add costs a write amplification tax: for every INSERT, UPDATE, or DELETE, Postgres must also update all indexes on that table. A table with 12 indexes incurs 12 index writes per row insertion.
- Indexes are not free storage. Each B-tree index is a full copy of the indexed columns, ordered and balanced. A table with many indexes can be double or triple its base table size.
- The query planner has to choose between available indexes at query time. Too many indexes can cause the planner to make suboptimal choices, or spend extra time evaluating options.
- The discipline: add indexes that are directly justified by a specific slow query with EXPLAIN ANALYZE evidence. Do not add indexes "just in case."
- The fix: audit with pg_stat_user_indexes, drop indexes with 0 scans, consolidate single-column indexes into composites, and use partial indexes for hot subsets.
## The Problem
It's 3 a.m. and your on-call engineer gets paged. The order processing pipeline is backing up. Write latency on the orders table has crept from 2ms to 45ms over the past month. Nobody noticed because reads were still fast. The culprit? Twelve indexes on a single table, most added "just in case" during code reviews over the past year.
A developer reviewing a schema adds "helpful" indexes to a large orders table:
```sql
CREATE INDEX idx_orders_user_id ON orders (user_id);
CREATE INDEX idx_orders_status ON orders (status);
CREATE INDEX idx_orders_created_at ON orders (created_at);
CREATE INDEX idx_orders_updated_at ON orders (updated_at);
CREATE INDEX idx_orders_amount ON orders (amount);
CREATE INDEX idx_orders_user_status ON orders (user_id, status);
CREATE INDEX idx_orders_status_created ON orders (status, created_at);
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);
-- ... 4 more indexes for other query shapes
```
The table has 50 million rows. Each order insertion now updates 13 indexes (the 12 above plus the primary key). What was a 1ms primary-key insert becomes a 12ms multi-index update. Write throughput drops by 10x. WAL volume explodes, causing replication lag. VACUUM takes longer because index entries must be cleaned too. A hot promotion flow that processes 5,000 orders/second now struggles to keep up with 500/second.
I've seen this exact failure at two different companies. Both times, the fix was the same: audit every index, drop the ones without a matching query, and consolidate overlapping indexes into composites.
Here's the before-and-after from one of those incidents:
| Metric | Before (13 indexes) | After (4 indexes) |
|---|---|---|
| Insert latency (p99) | 45ms | 4ms |
| WAL generation | 120 MB/min | 35 MB/min |
| Replication lag | 8 seconds | < 1 second |
| VACUUM duration | 4 hours | 25 minutes |
| Table + index size | 180 GB | 65 GB |
### Write amplification per index
For a B-tree index:
- Every INSERT inserts a key into the B-tree: O(log n) per index
- Every DELETE removes the key from the B-tree: O(log n) per index
- Every UPDATE on an indexed column deletes the old key and inserts the new key: O(log n) per index, twice
At 50M rows, each B-tree operation is log₂(50,000,000) ≈ 26 comparisons. With 13 indexes: 338 comparisons per write. The page cache must hold 13× more index pages. Dirty pages from all 13 index structures compete for the shared buffer pool.
The math is simple: if you have N indexes, your write cost is roughly N times what it would be with just the primary key. For most OLTP tables, 3-5 well-chosen indexes is the sweet spot. Beyond that, you need a very good reason.
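You can sanity-check that arithmetic directly in psql. This is just a back-of-the-envelope illustration using the row and index counts from the example above, not a measurement of actual I/O:

```sql
-- Comparisons per write ≈ ceil(log2(rows)) per B-tree, times the index count.
-- 50M rows, 13 indexes (12 secondary + primary key):
SELECT ceil(log(2::numeric, 50000000::numeric)) * 13 AS comparisons_per_write;
-- → 338
```

The real cost is dominated by the dirty pages those traversals touch, not the comparisons themselves, but the linear-in-N scaling is the point: every added index multiplies the work done on every write.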
## Why It Happens
Every one of these decisions sounds reasonable in isolation:
- A developer adds an index during a code review because "we might query on this column someday." It takes 10 seconds to write the migration.
- The DBA adds indexes during a performance incident to fix a slow query. The crisis passes, but the indexes stay forever. Nobody wants to drop an index that "fixed" something.
- An ORM auto-generates indexes for every foreign key, and nobody reviews the migration output. I've seen Django and Rails projects with 2-3 auto-generated indexes per model relationship.
- A team copies the indexing strategy from a read-heavy OLAP workload onto a write-heavy OLTP table. Different workloads need different indexing strategies.
- Someone reads a "SQL performance tips" blog post that says "always index your WHERE columns" and applies it literally to every column.
The anti-pattern emerges because adding an index is easy and dropping one feels risky. "What if something breaks?" So indexes accumulate like barnacles on a ship's hull. Nobody owns the index lifecycle. And that's where the problem starts.
## How to Detect It
| Symptom | What It Means | How to Check |
|---|---|---|
| Write latency increasing over months | New indexes added without removing old ones | Count entries per table in pg_indexes |
| Table storage 3x larger than row data | Index bloat exceeding table size | pg_total_relation_size vs pg_relation_size |
| VACUUM taking hours on a moderate table | Too many index entries to clean | pg_stat_user_tables for last_autovacuum; enable log_autovacuum_min_duration for timings |
| Replication lag growing steadily | WAL volume inflated by index writes | Check WAL bytes/second in monitoring |
| Indexes with 0 scans in production | Unused indexes that only cost writes | pg_stat_user_indexes WHERE idx_scan = 0 |
```sql
-- Quick audit: find tables with excessive indexes
SELECT
  t.relname AS table_name,
  COUNT(i.indexrelid) AS index_count,
  pg_size_pretty(pg_total_relation_size(t.oid)) AS total_size,
  pg_size_pretty(pg_relation_size(t.oid)) AS table_size
FROM pg_class t
JOIN pg_index i ON t.oid = i.indrelid
WHERE t.relkind = 'r'
GROUP BY t.relname, t.oid
HAVING COUNT(i.indexrelid) > 5
ORDER BY COUNT(i.indexrelid) DESC;
```
If a table has more than 5 indexes, that's worth investigating. More than 10 is almost always over-indexed.
Another useful check: compare index size to table size.
```sql
-- Compare index-to-table size ratio
SELECT
  t.relname AS table_name,
  pg_size_pretty(pg_relation_size(t.oid)) AS table_size,
  pg_size_pretty(pg_total_relation_size(t.oid) - pg_relation_size(t.oid)) AS index_size,
  ROUND(100.0 * (pg_total_relation_size(t.oid) - pg_relation_size(t.oid)) /
        NULLIF(pg_relation_size(t.oid), 0), 1) AS index_to_table_pct
FROM pg_class t
WHERE t.relkind = 'r'
ORDER BY pg_total_relation_size(t.oid) DESC
LIMIT 10;
```
If your indexes are 200-300% of the table size, you're almost certainly carrying dead weight.
## The Fix
The fix is not "remove all indexes." The fix is building an evidence-based indexing discipline: every index must justify its existence with a real query that needs it.
The process: audit what you have, profile your queries, consolidate overlapping indexes into composites, drop the unused ones, and add partial indexes for hot subsets.
### Fix 1: Profile actual query patterns
```sql
-- Find slow queries in PostgreSQL
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
```
Build indexes for slow queries that have EXPLAIN ANALYZE evidence of sequential scans. Don't guess. Don't index speculatively. Every index should trace back to a specific slow query that you can show evidence for.
```sql
-- EXPLAIN ANALYZE shows where indexes help
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 12345;
-- If you see "Seq Scan on orders" with high cost, that's your signal
-- If you see "Index Scan using idx_orders_user_id", no new index needed
```
### Fix 2: Consolidate with composite indexes
Before adding a new index, check if a composite index can cover multiple query shapes:
```sql
-- Don't add both:
CREATE INDEX idx_orders_user_id ON orders (user_id);
CREATE INDEX idx_orders_user_status ON orders (user_id, status);

-- Just create the composite (it covers single-column user_id queries too)
CREATE INDEX idx_orders_user_status ON orders (user_id, status);
```
A composite index on (user_id, status) covers queries on user_id alone (leftmost prefix rule) and queries on (user_id, status) together. This is one of the most powerful techniques for reducing index count. One composite can replace two or three single-column indexes.
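The prefix rule cuts both ways: the composite only helps queries that constrain its leading column. A quick way to confirm which shapes it serves, assuming the orders schema from the examples above:

```sql
-- Served by (user_id, status): the leading column is constrained
EXPLAIN SELECT * FROM orders WHERE user_id = 12345;
EXPLAIN SELECT * FROM orders WHERE user_id = 12345 AND status = 'PENDING';

-- NOT served by the composite: filtering on status alone skips the
-- leading column, so the planner falls back to a seq scan or another index
EXPLAIN SELECT * FROM orders WHERE status = 'PENDING';
```

Before dropping the single-column index a composite replaces, run EXPLAIN on the queries that used it and confirm the plans switch to the composite.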
### Fix 3: Drop unused indexes
```sql
-- PostgreSQL: find indexes that have never been used
SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, relname;
```
An index with 0 scans has not been used since the last statistics reset and is pure write overhead. Drop it.
Two caveats before you do. First, check over a full month: some indexes support batch jobs or reporting queries that only run during billing cycles. Second, these counters only cover the server you query them on; an index can sit unused on the primary while serving reads on a replica, so check the standbys too. If idx_scan has stayed at 0 for 30+ days everywhere the workload runs, it's safe to remove.
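When you do drop, DROP INDEX CONCURRENTLY avoids the exclusive lock that would block writes on a busy table (it must run outside a transaction block). Saving the definition first makes the rollback trivial; the index name here is from the running example:

```sql
-- Save the definition first, in case the drop needs to be reversed
SELECT indexdef FROM pg_indexes WHERE indexname = 'idx_orders_amount';

-- Drop without blocking concurrent writes; cannot run inside a transaction
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_amount;
```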
### Fix 4: Use partial indexes for hot subsets
If most queries only look at recent or active records, a partial index covers the hot subset at a fraction of the cost of a full index:
```sql
-- Only index PENDING orders (the slice queries actually touch)
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'PENDING';
-- 98% of orders are COMPLETED, excluded from this index
-- Index is 50x smaller (2% of table) and just as fast for the queries that matter
```
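One gotcha worth knowing: the planner uses a partial index only when the query's WHERE clause provably implies the index predicate. Assuming the idx_orders_pending index above:

```sql
-- Can use idx_orders_pending: the clause status = 'PENDING' matches the predicate
EXPLAIN SELECT * FROM orders
WHERE status = 'PENDING' AND created_at > now() - interval '1 hour';

-- Cannot use it: nothing guarantees these rows are PENDING
EXPLAIN SELECT * FROM orders
WHERE created_at > now() - interval '1 hour';
```

If application code parameterizes the status value instead of hard-coding 'PENDING', the planner may not be able to prove the match, so verify with EXPLAIN before relying on a partial index.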
### Which fix to use?

Start with Fix 1: profiling tells you which indexes earn their keep. From there, Fix 3 (dropping unused indexes) is the cheapest, lowest-risk win. Reach for Fix 2 when several indexes share a leading column, and Fix 4 when queries concentrate on a small, well-defined slice of the table.
## Severity and Blast Radius
Over-indexing is a slow-burn problem, not a sudden outage. Write latency degrades gradually (over weeks or months), which makes it hard to attribute to a single change. By the time it's noticeable, you may have 10+ excess indexes and removing them requires careful analysis.
Blast radius is moderate but wide: write throughput drops, replication lag grows, VACUUM slows down, and storage costs creep up. Recovery takes hours to days because you need to profile queries before dropping indexes safely. You can't just drop everything and hope.
Severity rating: medium. It won't take your system down in an instant, but it will slowly choke your write path until something else tips over (a promotion, a batch import, a traffic spike).
The good news: recovery is straightforward once you've done the analysis. Dropping an unused index is instant. The hard part is knowing which ones to drop, not the act of dropping them.
## When It's Actually OK
- Read-heavy OLAP tables that are loaded in bulk and rarely updated. If writes happen once a day in a batch, 15 indexes cost almost nothing. The write amplification only matters during the batch load.
- Small reference tables (under 100K rows) where index overhead is negligible. A lookup table with 500 countries can have 10 indexes and nobody will notice.
- Covering indexes that eliminate table lookups entirely for critical queries. The write cost is justified by the read speedup, especially for high-QPS queries.
- Short-lived staging tables that exist only during ETL, then get dropped. Index them aggressively if it speeds up the pipeline.
- Development and testing environments where write performance doesn't matter. Add whatever helps you debug faster, but don't deploy the same indexes to production.
The key question: does this table receive more than a few hundred writes per second? If not, the index overhead is likely negligible regardless of count. If it does, every extra index has a measurable cost on write latency, WAL volume, and replication lag.
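To answer that question with data rather than a guess, sample the cumulative write counters for the table twice and divide by the interval. A rough sketch (counters are cumulative since the last stats reset; the table name is from the running example):

```sql
-- Run once, wait a known interval (e.g. 60 seconds), run again, then:
-- writes/sec ≈ (second_total - first_total) / seconds_elapsed
SELECT relname,
       n_tup_ins + n_tup_upd + n_tup_del AS total_writes
FROM pg_stat_user_tables
WHERE relname = 'orders';
```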
## How This Shows Up in Interviews
When designing a schema, you don't need to enumerate every index. Mention the indexes that directly support your most important queries and explain why. This shows the interviewer you think about indexes as a cost, not just a benefit.
"I'd add a compound index on (user_id, status) to cover the user order history query and the pending orders check with one index. I'd avoid indexing low-cardinality columns like status alone since the query planner often prefers a seq scan on those anyway."
A strong answer also mentions the write cost: "Every index I add slows down writes on this table. For a write-heavy orders table processing thousands of inserts per second, I'd keep the index count to 3-4 maximum and rely on composites to cover multiple query shapes."
Red flag to interviewers: a candidate who says "I'll add indexes on all the columns we query on" without considering write impact or composite coverage. That tells them you've never operated a write-heavy table at scale.
### Show your indexing discipline

> "I only add indexes backed by EXPLAIN ANALYZE evidence from a real slow query. Speculative indexes cost every write and help no reads."
## Quick Recap
- Every index adds write amplification: each INSERT, UPDATE, or DELETE must update all applicable B-tree indexes.
- Storage doubles or triples because each index is a full, ordered copy of the indexed columns.
- Too many indexes confuse the query planner and slow down VACUUM, replication, and backups.
- Add indexes based on EXPLAIN ANALYZE evidence from actual slow queries, not speculatively.
- Composite indexes cover multiple query shapes via the leftmost prefix rule.
- Partial indexes cover hot subsets at a fraction of the cost of a full index.
- Audit regularly: any index with 0 scans in production is pure write overhead.
- Set up quarterly index reviews. Treat your index list like code: it needs maintenance.
- When in doubt, start with fewer indexes. You can always add one later with EXPLAIN ANALYZE evidence. Removing one is harder because you need to prove nothing depends on it.
- Composite indexes are your best friend. One well-designed composite on `(a, b)` replaces two separate single-column indexes on `a` and `b` for most query shapes.