# Over-indexing anti-pattern
Learn why adding too many indexes degrades write performance, explodes storage, and slows query planning, and how to choose only the indexes your queries actually need.
## TL;DR
- Every index you add costs a write amplification tax: for every INSERT, UPDATE, or DELETE, Postgres must also update all indexes on that table. A table with 12 indexes incurs 12 index writes per row insertion.
- Indexes are not free storage. Each B-tree index is a full copy of the indexed columns, ordered and balanced. A table with many indexes can be double or triple its base table size.
- The query planner has to choose between available indexes at query time. Too many indexes can cause the planner to make suboptimal choices, or spend extra time evaluating options.
- The discipline: add indexes that are directly justified by a specific slow query with EXPLAIN ANALYZE evidence. Do not add indexes "just in case."
- The fix: audit with pg_stat_user_indexes, drop indexes with 0 scans, consolidate single-column indexes into composites, and use partial indexes for hot subsets.
## The Problem
It's 3 a.m. and your on-call engineer gets paged. The order processing pipeline is backing up. Write latency on the orders table has crept from 2ms to 45ms over the past month. Nobody noticed because reads were still fast. The culprit? Twelve indexes on a single table, most added "just in case" during code reviews over the past year.
A developer reviewing a schema adds "helpful" indexes to a large orders table:
```sql
CREATE INDEX idx_orders_user_id ON orders (user_id);
CREATE INDEX idx_orders_status ON orders (status);
CREATE INDEX idx_orders_created_at ON orders (created_at);
CREATE INDEX idx_orders_updated_at ON orders (updated_at);
CREATE INDEX idx_orders_amount ON orders (amount);
CREATE INDEX idx_orders_user_status ON orders (user_id, status);
CREATE INDEX idx_orders_status_created ON orders (status, created_at);
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);
-- ... 4 more indexes for other query shapes
```
The table has 50 million rows. Each order insertion now updates 13 indexes (the 12 above plus the primary key). What was a 1ms primary-key insert becomes a 12ms multi-index update. Write throughput drops by 10x. WAL volume explodes, causing replication lag. VACUUM takes longer because index entries must be cleaned too. A hot promotion flow that processes 5,000 orders/second now struggles to keep up with 500/second.
I've seen this exact failure at two different companies. Both times, the fix was the same: audit every index, drop the ones without a matching query, and consolidate overlapping indexes into composites.
Here's the before-and-after from one of those incidents:
| Metric | Before (13 indexes) | After (4 indexes) |
|---|---|---|
| Insert latency (p99) | 45ms | 4ms |
| WAL generation | 120 MB/min | 35 MB/min |
| Replication lag | 8 seconds | < 1 second |
| VACUUM duration | 4 hours | 25 minutes |
| Table + index size | 180 GB | 65 GB |
### Write amplification per index
For a B-tree index:
- Every INSERT inserts a key into the B-tree: O(log n) per index
- Every DELETE removes the key from the B-tree: O(log n) per index
- Every UPDATE on an indexed column deletes the old key and inserts the new key: O(log n) per index, twice
At 50M rows, each B-tree operation is log₂(50,000,000) ≈ 26 comparisons. With 13 indexes: 338 comparisons per write. The page cache must hold 13× more index pages. Dirty pages from all 13 index structures compete for the shared buffer pool.
The math is simple: if you have N indexes, your write cost is roughly N times what it would be with just the primary key. For most OLTP tables, 3-5 well-chosen indexes is the sweet spot. Beyond that, you need a very good reason.
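You can sanity-check that arithmetic directly in psql. This is just a back-of-the-envelope illustration using the row and index counts from the example above, not a measurement of actual I/O:

```sql
-- Comparisons per write ≈ ceil(log2(rows)) per B-tree, times the index count.
-- 50M rows, 13 indexes (12 secondary + primary key):
SELECT ceil(log(2::numeric, 50000000::numeric)) * 13 AS comparisons_per_write;
-- → 338
```

The real cost is dominated by the dirty pages those traversals touch, not the comparisons themselves, but the linear-in-N scaling is the point: every added index multiplies the work done on every write.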
## Why It Happens
Every one of these decisions sounds reasonable in isolation:
- A developer adds an index during a code review because "we might query on this column someday." It takes 10 seconds to write the migration.
- The DBA adds indexes during a performance incident to fix a slow query. The crisis passes, but the indexes stay forever. Nobody wants to drop an index that "fixed" something.
- An ORM auto-generates indexes for every foreign key, and nobody reviews the migration output. I've seen Django and Rails projects with 2-3 auto-generated indexes per model relationship.
- A team copies the indexing strategy from a read-heavy OLAP workload onto a write-heavy OLTP table. Different workloads need different indexing strategies.
- Someone reads a "SQL performance tips" blog post that says "always index your WHERE columns" and applies it literally to every column.
The anti-pattern emerges because adding an index is easy and dropping one feels risky. "What if something breaks?" So indexes accumulate like barnacles on a ship's hull. Nobody owns the index lifecycle. And that's where the problem starts.
## How to Detect It
| Symptom | What It Means | How to Check |
|---|---|---|
| Write latency increasing over months | New indexes added without removing old ones | Count entries per table in pg_indexes |
| Table storage 3x larger than row data | Index bloat exceeding table size | pg_total_relation_size vs pg_relation_size |
| VACUUM taking hours on a moderate table | Too many index entries to clean | pg_stat_user_tables for last_autovacuum; enable log_autovacuum_min_duration for timings |
| Replication lag growing steadily | WAL volume inflated by index writes | Check WAL bytes/second in monitoring |
| Indexes with 0 scans in production | Unused indexes that only cost writes | pg_stat_user_indexes WHERE idx_scan = 0 |
```sql
-- Quick audit: find tables with excessive indexes
SELECT
  t.relname AS table_name,
  COUNT(i.indexrelid) AS index_count,
  pg_size_pretty(pg_total_relation_size(t.oid)) AS total_size,
  pg_size_pretty(pg_relation_size(t.oid)) AS table_size
FROM pg_class t
JOIN pg_index i ON t.oid = i.indrelid
WHERE t.relkind = 'r'
GROUP BY t.relname, t.oid
HAVING COUNT(i.indexrelid) > 5
ORDER BY COUNT(i.indexrelid) DESC;
```
If a table has more than 5 indexes, that's worth investigating. More than 10 is almost always over-indexed.
Another useful check: compare index size to table size.
```sql
-- Compare index-to-table size ratio
SELECT
  t.relname AS table_name,
  pg_size_pretty(pg_relation_size(t.oid)) AS table_size,
  pg_size_pretty(pg_total_relation_size(t.oid) - pg_relation_size(t.oid)) AS index_size,
  ROUND(100.0 * (pg_total_relation_size(t.oid) - pg_relation_size(t.oid)) /
        NULLIF(pg_relation_size(t.oid), 0), 1) AS index_to_table_pct
FROM pg_class t
WHERE t.relkind = 'r'
ORDER BY pg_total_relation_size(t.oid) DESC
LIMIT 10;
```
If your indexes are 200-300% of the table size, you're almost certainly carrying dead weight.
## The Fix
The fix is not "remove all indexes." The fix is building an evidence-based indexing discipline: every index must justify its existence with a real query that needs it.
The process: audit what you have, profile your queries, consolidate overlapping indexes into composites, drop the unused ones, and add partial indexes for hot subsets.
### Fix 1: Profile actual query patterns
```sql
-- Find slow queries in PostgreSQL
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
```
Build indexes for slow queries that have EXPLAIN ANALYZE evidence of sequential scans. Don't guess. Don't index speculatively. Every index should trace back to a specific slow query that you can show evidence for.
```sql
-- EXPLAIN ANALYZE shows where indexes help
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 12345;
-- If you see "Seq Scan on orders" with high cost, that's your signal
-- If you see "Index Scan using idx_orders_user_id", no new index needed
```
### Fix 2: Consolidate with composite indexes
Before adding a new index, check if a composite index can cover multiple query shapes:
```sql
-- Don't add both:
CREATE INDEX idx_orders_user_id ON orders (user_id);
CREATE INDEX idx_orders_user_status ON orders (user_id, status);

-- Just create the composite (it covers single-column user_id queries too)
CREATE INDEX idx_orders_user_status ON orders (user_id, status);
```
A composite index on (user_id, status) covers queries on user_id alone (leftmost prefix rule) and queries on (user_id, status) together. This is one of the most powerful techniques for reducing index count. One composite can replace two or three single-column indexes.
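The prefix rule cuts both ways: the composite only helps queries that constrain its leading column. A quick way to confirm which shapes it serves, assuming the orders schema from the examples above:

```sql
-- Served by (user_id, status): the leading column is constrained
EXPLAIN SELECT * FROM orders WHERE user_id = 12345;
EXPLAIN SELECT * FROM orders WHERE user_id = 12345 AND status = 'PENDING';

-- NOT served by the composite: filtering on status alone skips the
-- leading column, so the planner falls back to a seq scan or another index
EXPLAIN SELECT * FROM orders WHERE status = 'PENDING';
```

Before dropping the single-column index a composite replaces, run EXPLAIN on the queries that used it and confirm the plans switch to the composite.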
### Fix 3: Drop unused indexes
```sql
-- PostgreSQL: find indexes that have never been used
SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, relname;
```
An index with 0 scans has not been used since the last statistics reset and is pure write overhead. Drop it.
Two caveats before you do. First, check over a full month: some indexes support batch jobs or reporting queries that only run during billing cycles. Second, these counters only cover the server you query them on; an index can sit unused on the primary while serving reads on a replica, so check the standbys too. If idx_scan has stayed at 0 for 30+ days everywhere the workload runs, it's safe to remove.
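When you do drop, DROP INDEX CONCURRENTLY avoids the exclusive lock that would block writes on a busy table (it must run outside a transaction block). Saving the definition first makes the rollback trivial; the index name here is from the running example:

```sql
-- Save the definition first, in case the drop needs to be reversed
SELECT indexdef FROM pg_indexes WHERE indexname = 'idx_orders_amount';

-- Drop without blocking concurrent writes; cannot run inside a transaction
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_amount;
```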
### Fix 4: Use partial indexes for hot subsets
If most queries only look at recent or active records, a partial index covers the hot subset at a fraction of the cost of a full index:
```sql
-- Only index PENDING orders (the slice queries actually touch)
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'PENDING';
-- 98% of orders are COMPLETED, excluded from this index
-- Index is 50x smaller (2% of table) and just as fast for the queries that matter
```
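One gotcha worth knowing: the planner uses a partial index only when the query's WHERE clause provably implies the index predicate. Assuming the idx_orders_pending index above:

```sql
-- Can use idx_orders_pending: the clause status = 'PENDING' matches the predicate
EXPLAIN SELECT * FROM orders
WHERE status = 'PENDING' AND created_at > now() - interval '1 hour';

-- Cannot use it: nothing guarantees these rows are PENDING
EXPLAIN SELECT * FROM orders
WHERE created_at > now() - interval '1 hour';
```

If application code parameterizes the status value instead of hard-coding 'PENDING', the planner may not be able to prove the match, so verify with EXPLAIN before relying on a partial index.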
### Which fix to use?

Start with Fix 1: profiling tells you which indexes earn their keep. From there, Fix 3 (dropping unused indexes) is the cheapest, lowest-risk win. Reach for Fix 2 when several indexes share a leading column, and Fix 4 when queries concentrate on a small, well-defined slice of the table.
## Severity and Blast Radius
Over-indexing is a slow-burn problem, not a sudden outage. Write latency degrades gradually (over weeks or months), which makes it hard to attribute to a single change. By the time it's noticeable, you may have 10+ excess indexes and removing them requires careful analysis.
Blast radius is moderate but wide: write throughput drops, replication lag grows, VACUUM slows down, and storage costs creep up. Recovery takes hours to days because you need to profile queries before dropping indexes safely. You can't just drop everything and hope.
Severity rating: medium. It won't take your system down in an instant, but it will slowly choke your write path until something else tips over (a promotion, a batch import, a traffic spike).
The good news: recovery is straightforward once you've done the analysis. Dropping an unused index is instant. The hard part is knowing which ones to drop, not the act of dropping them.
## When It's Actually OK
- Read-heavy OLAP tables that are loaded in bulk and rarely updated. If writes happen once a day in a batch, 15 indexes cost almost nothing. The write amplification only matters during the batch load.
- Small reference tables (under 100K rows) where index overhead is negligible. A lookup table with 500 countries can have 10 indexes and nobody will notice.
- Covering indexes that eliminate table lookups entirely for critical queries. The write cost is justified by the read speedup, especially for high-QPS queries.
- Short-lived staging tables that exist only during ETL, then get dropped. Index them aggressively if it speeds up the pipeline.
- Development and testing environments where write performance doesn't matter. Add whatever helps you debug faster, but don't deploy the same indexes to production.
The key question: does this table receive more than a few hundred writes per second? If not, the index overhead is likely negligible regardless of count. If it does, every extra index has a measurable cost on write latency, WAL volume, and replication lag.
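To answer that question with data rather than a guess, sample the cumulative write counters for the table twice and divide by the interval. A rough sketch (counters are cumulative since the last stats reset; the table name is from the running example):

```sql
-- Run once, wait a known interval (e.g. 60 seconds), run again, then:
-- writes/sec ≈ (second_total - first_total) / seconds_elapsed
SELECT relname,
       n_tup_ins + n_tup_upd + n_tup_del AS total_writes
FROM pg_stat_user_tables
WHERE relname = 'orders';
```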
## How This Shows Up in Interviews
When designing a schema, you don't need to enumerate every index. Mention the indexes that directly support your most important queries and explain why. This shows the interviewer you think about indexes as a cost, not just a benefit.
"I'd add a compound index on (user_id, status) to cover the user order history query and the pending orders check with one index. I'd avoid indexing low-cardinality columns like status alone since the query planner often prefers a seq scan on those anyway."
A strong answer also mentions the write cost: "Every index I add slows down writes on this table. For a write-heavy orders table processing thousands of inserts per second, I'd keep the index count to 3-4 maximum and rely on composites to cover multiple query shapes."
Red flag to interviewers: a candidate who says "I'll add indexes on all the columns we query on" without considering write impact or composite coverage. That tells them you've never operated a write-heavy table at scale.
### Show your indexing discipline

> "I only add indexes backed by EXPLAIN ANALYZE evidence from a real slow query. Speculative indexes cost every write and help no reads."
## Quick Recap
- Every index adds write amplification: each INSERT, UPDATE, or DELETE must update all applicable B-tree indexes.
- Storage doubles or triples because each index is a full, ordered copy of the indexed columns.
- Too many indexes confuse the query planner and slow down VACUUM, replication, and backups.
- Add indexes based on EXPLAIN ANALYZE evidence from actual slow queries, not speculatively.
- Composite indexes cover multiple query shapes via the leftmost prefix rule.
- Partial indexes cover hot subsets at a fraction of the cost of a full index.
- Audit regularly: any index with 0 scans in production is pure write overhead.
- Set up quarterly index reviews. Treat your index list like code: it needs maintenance.
- When in doubt, start with fewer indexes. You can always add one later with EXPLAIN ANALYZE evidence. Removing one is harder because you need to prove nothing depends on it.
- Composite indexes are your best friend. One well-designed composite on `(a, b)` replaces two separate single-column indexes on `a` and `b` for most query shapes.