Normalization vs. denormalization
When to normalize your database schema for integrity and storage efficiency vs. when to denormalize for query performance, the practical decision criteria, and the cost of getting it wrong.
TL;DR
| Scenario | Normalize | Denormalize |
|---|---|---|
| Write-heavy OLTP | One update, one place, strong integrity | Writes touch multiple tables/documents to keep copies in sync |
| Read-heavy at scale | JOINs across 5+ tables become expensive | Pre-joined rows, single-table reads, no JOIN cost |
| Unknown future queries | Flexible schema supports new access patterns | Locked into the access patterns you denormalized for |
| Strong consistency required | One source of truth, no stale copies | Risk of one copy updated, other copies stale |
| Analytics/reporting | Normalized base + materialized views for queries | Pre-aggregated tables for dashboard reads |
Default instinct: start normalized, denormalize surgically where reads are the bottleneck. Denormalization is an optimization, not a starting point. Premature denormalization creates update anomalies that haunt you for years.
The Framing
Your e-commerce team ships a product catalog with a fully normalized schema. Orders, products, users, addresses, and categories each live in their own clean table with proper foreign keys. The schema is textbook-correct.
Then the product detail page slows down to 400ms. The query joins products, categories, brands, reviews (aggregated), inventory, and pricing_tiers across six tables. The query planner does its best, but six JOINs with a WHERE clause on a 50-million-row products table is fundamentally expensive. The DBA indexes everything aggressively. It drops to 200ms. Not enough.
A senior engineer proposes: "Let's create a product_detail_view table that pre-joins everything the product page needs into a single row." Reads drop to 3ms. But now every product update, price change, review submission, and inventory adjustment must also update this denormalized table. The team writes an event-driven sync pipeline. Two months later, product pages occasionally show stale prices because the sync lagged during a Kafka rebalance.
This is the trade-off. Normalization protects your writes and your sanity. Denormalization accelerates your reads and complicates everything else.
I have seen this exact story play out at three different companies. The pattern is always the same: start normalized, hit a read performance wall, denormalize the hot read path, then spend the next year managing the consistency of the denormalized copy.
How Each Works
Normalization: one fact, one place
Normalization eliminates redundancy by decomposing data into separate tables, each storing one type of fact. The normal forms define increasing levels of decomposition.
1NF: No repeating groups. Each column holds a single atomic value.
Bad: orders(id, items: "Widget,Gadget,Bolt")
Good: order_items(order_id, product_name)
2NF: Every non-key column depends on the ENTIRE primary key.
Bad: order_items(order_id, product_id, product_name)
product_name depends only on product_id, not the full key.
Good: products(product_id, product_name) + order_items(order_id, product_id)
3NF: No transitive dependencies. Non-key columns depend only on the key.
Bad: employees(id, department_id, department_name)
department_name depends on department_id, not on employee id.
Good: departments(id, name) + employees(id, department_id)
BCNF: Every determinant is a candidate key.
Handles edge cases where 3NF still allows anomalies in multi-column keys.
The benefit is update consistency. When a user changes their email, you update one row in the users table. Every query that needs that email joins to users and gets the current value. There is zero risk of a stale copy somewhere.
The cost is read complexity. Assembling a user's order history with product names, category labels, and shipping addresses requires joining 4-6 tables. At millions of rows per table, these JOINs become the bottleneck.
-- Normalized: assembling an order history page
SELECT o.id, o.created_at, u.name, u.email,
oi.quantity, p.name AS product_name, p.price,
c.name AS category_name
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON oi.product_id = p.id
JOIN categories c ON p.category_id = c.id
WHERE o.user_id = 'usr_456'
ORDER BY o.created_at DESC
LIMIT 20;
-- 5 tables, 4 JOINs -- at 50M orders: 80-200ms even with indexes
Denormalization: pre-compute the read path
Denormalization intentionally duplicates data so that the read query hits a single table (or document) with no JOINs. The query result is pre-assembled at write time.
-- Denormalized: single-table order history
CREATE TABLE order_history (
order_id UUID,
user_id UUID,
user_name TEXT, -- duplicated from users
user_email TEXT, -- duplicated from users
product_name TEXT, -- duplicated from products
category_name TEXT, -- duplicated from categories
quantity INT,
price_cents INT,
created_at TIMESTAMP
);
-- Read: one table, one index scan, 2-3ms
SELECT * FROM order_history
WHERE user_id = 'usr_456'
ORDER BY created_at DESC
LIMIT 20;
The benefit is read speed. No JOINs. The query planner uses a simple index scan on user_id. Even at 100M rows, this returns in single-digit milliseconds.
The cost is write amplification. Changing a user's name requires updating every row in order_history that references that user. Changing a product name requires updating every order that contains that product. If either update fails halfway, you have inconsistent data across the normalized and denormalized tables.
Head-to-Head Comparison
| Dimension | Normalized | Denormalized | Verdict |
|---|---|---|---|
| Read latency | JOINs required, grows with table count | Single-table scan, sub-10ms | Denormalized |
| Write simplicity | One row updated, one place | Multiple tables/rows must stay in sync | Normalized |
| Storage efficiency | No duplication | Duplicated columns across tables | Normalized |
| Data integrity | Foreign keys enforce referential integrity | Application must enforce consistency | Normalized |
| Query flexibility | Can answer unanticipated queries via JOINs | Locked into pre-computed access patterns | Normalized |
| Scale (reads) | JOINs degrade at 10M+ rows per table | Scales linearly with single-table indexes | Denormalized |
| Scale (writes) | One write per mutation | Write amplification grows with duplication | Normalized |
| Schema evolution | Add a column to one table | Add a column and backfill every denormalized copy | Normalized |
| Debugging | One source of truth, easy to audit | Stale copies create subtle bugs | Normalized |
The fundamental tension: read performance vs. write simplicity and data integrity. Every denormalization is a bet that your read pattern will not change, because changing it means rebuilding the denormalized structure.
When Normalization Wins
Normalize when your workload is write-heavy or your access patterns are unpredictable. These are the scenarios where duplication creates more problems than it solves.
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.