Read vs. write optimization
How to design databases and caches for either read-heavy or write-heavy workloads, covering denormalization, read replicas, write-ahead logs, CQRS, and the tradeoffs of optimizing for one at the expense of the other.
TL;DR
| Scenario | Optimize for reads | Optimize for writes |
|---|---|---|
| E-commerce product catalog | Denormalize, add indexes, cache in Redis | Overkill; writes are rare compared to page views |
| IoT sensor telemetry | Not the bottleneck; reads are infrequent | Append-only LSM, batch inserts, minimal indexing |
| Social media feed | Precompute feeds, read replicas, CDN | Fan-out-on-write for celebrities creates write storms |
| Financial audit log | Occasional reads for investigations | Append-only, immutable, WAL-optimized |
| Real-time analytics dashboard | Materialized views, OLAP column store | Raw event ingestion must be write-optimized |
Default instinct: optimize for reads. Most web applications have a 100:1 read-to-write ratio. But know when you are in the minority: IoT, logging, event sourcing, and analytics ingestion are write-heavy by nature. The right answer depends on your read/write ratio, not on what sounds more sophisticated.
The Framing
Your e-commerce platform launches. The product table has a few thousand rows. Reads and writes are both fast. No one thinks about optimization.
Then the catalog grows to 2 million products. The homepage query that joins products, categories, reviews, and inventory takes 800ms. You add a B-tree index on category_id. It drops to 15ms. Problem solved.
Six months later, the warehouse integration pushes 50,000 inventory updates per minute. Each UPDATE touches the products table, which now has 12 indexes. Every write must update every index. Write latency jumps from 2ms to 45ms. The index that saved your reads is now strangling your writes.
This is the fundamental tension: every structure you add to speed up reads (indexes, materialized views, denormalized columns) becomes overhead on writes. Every optimization you make for writes (append-only logs, minimal indexing, batching) makes reads more expensive. You cannot optimize for both simultaneously on the same data path.
I have seen teams add indexes reactively (one for every slow query) until they have 15+ indexes on a single table and wonder why writes are crawling. The fix is not "remove all indexes." The fix is understanding which path (read or write) is your bottleneck and optimizing intentionally.
The honest answer: neither side is "better." Your read/write ratio determines where to invest. A 100:1 ratio means read optimization pays 100x more per unit of engineering effort. A 1:10 ratio means every write optimization matters far more than shaving milliseconds off reads.
How Each Works
Read optimization: make reads fast at the cost of write overhead
Read-optimized systems pre-compute and pre-organize data so that reads require minimal work. The B-tree is the canonical read-optimized data structure: data is always sorted, binary search finds any key in O(log n), and range scans follow sequential leaf pointers.
B-tree index lookup (PostgreSQL, MySQL):
1. Query: SELECT * FROM products WHERE category_id = 42
2. Traverse B-tree: root -> internal node -> leaf node (3-4 disk reads)
3. Leaf node contains row pointer -> fetch row from heap
4. Total: 3-5 random reads, predictable latency
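The lookup path above can be sketched with a sorted array and binary search, the same O(log n) idea a B-tree applies at each node (a toy in-memory stand-in, not a disk-backed tree; the keys and row pointers are hypothetical):

```python
import bisect

# Sorted (key, row_pointer) pairs stand in for B-tree leaf entries
# on an index over category_id.
index = [(10, "row-a"), (42, "row-b"), (42, "row-c"), (99, "row-d")]
keys = [k for k, _ in index]

def lookup(category_id):
    """Return all row pointers for a key, like an index scan on category_id."""
    lo = bisect.bisect_left(keys, category_id)   # O(log n) descent
    hi = bisect.bisect_right(keys, category_id)
    return [ptr for _, ptr in index[lo:hi]]      # sequential "leaf" scan

print(lookup(42))  # ['row-b', 'row-c']
```

In a real B-tree each binary-search step is a node fetch, which is why the lookup costs 3-5 reads rather than one.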
Cost of maintaining the B-tree on writes:
1. INSERT: find correct leaf -> insert key -> possibly split node
2. UPDATE: delete the old index entry, insert a new one (if the indexed column changes)
3. Each index on the table = one additional B-tree update per write
4. 12 indexes = 12 B-tree updates per INSERT
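The multiplier above can be sketched by inserting one row into a table with several indexes, each modeled as a sorted list standing in for an on-disk B-tree (column names are hypothetical):

```python
import bisect

class Table:
    def __init__(self, indexed_columns):
        self.rows = []
        # One sorted (value, row_id) list per index, standing in for a B-tree.
        self.indexes = {col: [] for col in indexed_columns}

    def insert(self, row):
        row_id = len(self.rows)
        self.rows.append(row)
        updates = 0
        for col, idx in self.indexes.items():    # every index pays on every write
            bisect.insort(idx, (row[col], row_id))
            updates += 1
        return updates

t = Table(["category_id", "price", "sku"])       # 3 indexes
cost = t.insert({"category_id": 42, "price": 19.99, "sku": "A-100"})
print(cost)  # 3 index updates for a single INSERT
```

With 12 indexes, the same insert would return 12: the write-side cost scales linearly with index count.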
Beyond indexes, read optimization includes denormalization (store redundant data to avoid joins), materialized views (pre-computed query results), caching (store hot data in Redis), and read replicas (horizontally scale read throughput). Every one of these techniques adds work on the write side.
For your interview: when someone says "our reads are slow," the optimization ladder is indexes, then cache, then read replicas, then denormalize. Each step trades write complexity for read speed.
Write optimization: make writes fast at the cost of read overhead
Write-optimized systems prioritize ingestion speed. The LSM tree (Log-Structured Merge Tree) is the canonical write-optimized data structure: writes go to an in-memory buffer (memtable), which is periodically flushed to disk as an immutable sorted file (SSTable). Reads must check multiple SSTables and merge results.
LSM tree write path (Cassandra, RocksDB, LevelDB):
1. Write arrives -> append to Write-Ahead Log (sequential I/O, fast)
2. Insert into in-memory memtable (sorted skip list or red-black tree)
3. When memtable is full (e.g., 64MB) -> flush to disk as immutable SSTable
4. Total write cost: 1 sequential append + 1 in-memory insert
5. No index updates, no page splits, no random I/O
LSM tree read path:
1. Check memtable (in-memory, fast)
2. Check each SSTable level, newest first
3. Bloom filter per SSTable: skip if key is definitely not present
4. Merge results from multiple SSTables
5. Total: potentially multiple disk reads across levels
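The two paths above can be sketched together: a sequential WAL append plus an in-memory memtable on the write side, and newest-first lookups across immutable SSTables on the read side (a minimal in-process sketch with no real disk I/O or compaction):

```python
class LSMStore:
    def __init__(self, memtable_limit=2):
        self.wal = []          # stand-in for the sequential write-ahead log
        self.memtable = {}     # sorted only at flush time, like a skip list snapshot
        self.sstables = []     # immutable flushed segments, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.wal.append((key, value))          # 1 sequential append
        self.memtable[key] = value             # 1 in-memory insert
        if len(self.memtable) >= self.memtable_limit:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}                 # flush as an immutable "SSTable"

    def get(self, key):
        if key in self.memtable:               # 1. check memtable first
            return self.memtable[key]
        for sst in reversed(self.sstables):    # 2. then SSTables, newest first
            if key in sst:
                return sst[key]
        return None

db = LSMStore()
db.put("a", 1); db.put("b", 2)   # second put triggers a flush
db.put("a", 99)                   # newer value shadows the flushed one
print(db.get("a"))  # 99
```

Note how the write path never touches old data, while the read path may have to walk every segment; that is the trade the section describes.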
Compaction runs in the background, merging SSTables to reduce the number of files a read must check. This is the write amplification tradeoff: the data is written once to the memtable, then rewritten during each compaction level. RocksDB's leveled compaction can produce 10-30x write amplification (each byte is rewritten 10-30 times across compaction levels).
Bloom filters are the critical optimization that makes LSM reads tolerable. A Bloom filter answers "is this key possibly in this SSTable?" with a configurable false positive rate (typically 1%). If the filter says "no," the SSTable is skipped entirely. Without Bloom filters, every read would scan every SSTable at every level.
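A minimal Bloom filter can be sketched with a bit array and a few hash functions (the sizes here are hypothetical; real engines derive bit count and hash count from the target false-positive rate):

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)   # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes positions by salting one hash function.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means definitely absent; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("user:42")
print(bf.might_contain("user:42"))  # True
print(bf.might_contain("user:43"))  # almost certainly False -> skip this SSTable
```

One such filter sits in front of each SSTable; a "no" answer saves the disk read entirely, which is what keeps LSM read amplification tolerable.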
Head-to-Head Comparison
| Dimension | Read-optimized (B-tree) | Write-optimized (LSM tree) | Verdict |
|---|---|---|---|
| Point read latency | O(log n), 3-5 disk reads, predictable | Must check memtable + multiple SSTable levels | B-tree |
| Range scan | Sequential leaf traversal, excellent | Must merge across SSTables | B-tree |
| Write throughput | Random I/O per index, degrades with index count | Sequential I/O only, scales linearly | LSM |
| Write latency | 2-10ms (depends on index count) | Sub-ms (memtable insert + WAL append) | LSM |
| Space amplification | ~1x (data stored once, indexes add 20-40%) | 1.1-2x (duplicates during compaction) | B-tree (slightly) |
| Write amplification | 1x per index (N indexes = N updates per write) | 10-30x (compaction rewrites at each level) | B-tree (fewer total rewrites) |
| Read amplification | 1 (data is in one place per index) | 1-5 (check multiple levels, mitigated by Bloom) | B-tree |
| Storage I/O pattern | Random reads and writes | Sequential writes, random reads | LSM (write path) |
| Compaction overhead | None (data sorted in-place) | Background CPU and I/O spikes | B-tree |
The fundamental tension: read latency vs. write throughput. B-trees keep data sorted for fast reads but pay on every write. LSM trees buffer writes for speed but defer the sorting cost to reads and background compaction.
When Read Optimization Wins
Choose read optimization when your workload is read-heavy and read latency directly affects user experience. This is most web applications.
E-commerce product catalog. A catalog with 2M products and 100K page views per minute generates a 1000:1 read-to-write ratio. Products change a few times per day. Indexes, denormalized views, Redis caching, and read replicas are all justified. The write-side cost (slower product updates) is trivial compared to the benefit of sub-millisecond reads.
User profile lookups. Authentication and profile services read user records on every request. Writes happen only during registration and profile edits. A composite index on (email, status), a Redis cache, and 2-3 read replicas handle millions of reads per second.
Search and discovery. Full-text search (Elasticsearch), autocomplete, and recommendation results are read-heavy by definition. The indexing cost is paid once during ingestion; the search benefit is multiplied across every user query.
OLAP data warehouses. Analytical queries scan millions of rows but the data is loaded in batch (nightly ETL). Column-oriented storage (Redshift, BigQuery, ClickHouse) is a read optimization: columns are stored contiguously, compression is better, and analytical queries read only the columns they need.
When Write Optimization Wins
Choose write optimization when your workload is write-heavy and write throughput is the bottleneck. These workloads often accept higher read latency in exchange for ingestion speed.
IoT and sensor telemetry. A fleet of 100K devices reporting metrics every 10 seconds generates 10K writes/sec. Reads are infrequent (dashboard queries, incident investigations). An LSM-based time-series database (TimescaleDB, InfluxDB) or append-only log (Kafka + ClickHouse) handles this efficiently.
Event sourcing and audit logs. These workloads are append-only by design. Events are never updated or deleted; they are only appended. An LSM tree or append-only log is the natural fit. Reads reconstruct state by replaying events, which is expensive but infrequent.
Metrics and logging. Services generating 50K log lines per second need a write path that does not block the application. Writes go to a buffer (Kafka, Fluentd), then batch-insert into a write-optimized store (ClickHouse, Elasticsearch). Reads happen during investigations, not on the hot path.
High-throughput ingestion pipelines. Data lakes ingest raw events from multiple sources. The priority is "get it on disk fast" rather than "make it queryable immediately." Append to Parquet files in S3, then build read-optimized views later.
The Nuance
The "read vs. write" framing is often a false dichotomy. Most real systems need both fast reads and fast writes. The question is not "pick one" but "how do I split the read path from the write path so each can be optimized independently?"
CQRS: separate the read and write models
Command Query Responsibility Segregation (CQRS) is the pattern that resolves this tension. Writes go to a write-optimized store (normalized, minimal indexes, append-only). Reads go to a separate read-optimized store (denormalized, heavily indexed, cached). An asynchronous sync mechanism keeps the read model updated.
The write side uses a normalized PostgreSQL schema with minimal indexes. The read side uses a denormalized schema (or even a different database like Elasticsearch) with aggressive indexing and caching. The Kafka-based sync introduces a staleness window (typically 100ms to a few seconds), but for most use cases this is acceptable.
I have seen CQRS reduce write latency by 3-5x (fewer indexes on the write path) while simultaneously improving read latency by 10x (denormalized read model + cache). The cost is operational complexity: you maintain two data stores and a sync pipeline.
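The split can be sketched in-process: commands hit a minimal write store and emit events, and a separate projector builds the denormalized read model asynchronously (a toy sketch with an in-memory queue standing in for Kafka/CDC; field names are hypothetical):

```python
from collections import deque

write_store = []        # normalized, append-friendly write model
event_bus = deque()     # stands in for the Kafka / CDC stream
read_model = {}         # denormalized view, keyed for fast lookup

def handle_command(cmd):
    """Write path: persist minimally, emit an event, return fast."""
    write_store.append(cmd)
    event_bus.append(cmd)

def run_projector():
    """Async consumer: builds the read-optimized view; may lag behind writes."""
    while event_bus:
        evt = event_bus.popleft()
        key = evt["product_id"]
        view = read_model.setdefault(key, {"product_id": key, "events": 0})
        view.update(evt)            # denormalize latest state into the view
        view["events"] += 1

handle_command({"product_id": "p1", "price": 10})
handle_command({"product_id": "p1", "price": 12})
# Before the projector runs, the read model is stale (empty).
run_projector()
print(read_model["p1"]["price"])  # 12
```

The gap between `handle_command` returning and `run_projector` catching up is exactly the staleness window the paragraph above describes.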
Read/write ratio analysis
Before optimizing, measure your actual ratio:
Read/write ratio estimation (PostgreSQL):
-- Check reads (scan counts approximate read traffic; use
-- seq_tup_read + idx_tup_fetch for row-level counts):
SELECT seq_scan + idx_scan AS total_reads
FROM pg_stat_user_tables WHERE relname = 'products';
-- Check writes:
SELECT n_tup_ins + n_tup_upd + n_tup_del AS total_writes
FROM pg_stat_user_tables WHERE relname = 'products';
Common ratios:
Product catalog: 1000:1 reads:writes -> optimize reads aggressively
User sessions: 100:1 -> cache + replicas
Order processing: 10:1 -> balanced, lean toward reads
Event ingestion: 1:100 -> optimize writes, build read models later
Audit log: 1:1000 -> pure write optimization
The rule of thumb: if your ratio exceeds 10:1 in either direction, optimize for the dominant side. If it is between 1:10 and 10:1, consider CQRS to split the paths.
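The rule of thumb can be captured in a small helper (the thresholds are the article's; the function name and return strings are hypothetical):

```python
def strategy(reads, writes):
    """Pick an optimization strategy from observed read/write counts."""
    if writes == 0 or reads / writes >= 10:
        return "optimize reads: indexes -> cache -> replicas -> denormalize"
    if reads == 0 or writes / reads >= 10:
        return "optimize writes: LSM/append-only, minimal indexes, batching"
    return "balanced: consider CQRS to split read and write paths"

print(strategy(100_000, 100))   # read-heavy, 1000:1 -> optimize reads
print(strategy(50, 5_000))      # write-heavy, 1:100 -> optimize writes
print(strategy(300, 100))       # 3:1 -> balanced, consider CQRS
```

Feed it the totals from the pg_stat queries above, measured over the same time window, so the ratio reflects real traffic rather than a guess.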
Real-World Examples
Facebook TAO (read-optimized, extreme). TAO (The Associations and Objects store) is Facebook's read-optimized graph cache. It serves billions of reads per second for the social graph (friends, likes, comments). Writes go to a MySQL primary, which replicates to thousands of TAO cache servers storing denormalized graph data. TAO accepts 1-2 second replication lag because social graph reads tolerate eventual consistency. The write path is intentionally simple (normalized MySQL) while the read path is massively scaled across multiple datacenters.
Cassandra (write-optimized, LSM-based). Cassandra uses LSM trees and was designed for write-heavy workloads (originally built for Facebook's inbox search). It achieves millions of writes per second via commit log + memtable only, no random I/O, minimal indexing (partition key only, secondary indexes are expensive and discouraged), and background compaction. The trade-off: reads on non-partition-key columns are slow. Cassandra requires you to design your table schema around read patterns (one denormalized table per query), making the application layer responsible for read optimization.
Uber Schemaless (CQRS-inspired). Uber built Schemaless on top of MySQL for their write-heavy ride data (trip events, GPS pings, payment events). The write path appends immutable cells optimized for high throughput. Read-optimized views are built asynchronously via triggers that populate secondary indexes and materialized views. This separation allowed Uber to scale write throughput to millions of events per second while serving reads from purpose-built views with sub-10ms latency.
How This Shows Up in Interviews
This trade-off appears in nearly every system design interview. Whenever you choose a database, add an index, introduce caching, or discuss data modeling, you are implicitly making a read-vs-write optimization decision. The interviewer wants to see that you make this choice deliberately.
What they are testing: Do you understand why adding an index speeds up reads but slows writes? Can you identify whether a workload is read-heavy or write-heavy and choose the appropriate optimization? Do you know when CQRS is justified and when it is overkill?
Depth expected at senior level:
- Explain B-tree vs LSM tree trade-offs with specific numbers (write amplification, read amplification)
- Know that over-indexing is a real production problem, not just a textbook concern
- Articulate CQRS as the resolution when both read and write performance matter
- Understand Bloom filters and why they make LSM reads practical
- Reference specific databases and their optimization leanings (PostgreSQL = B-tree/read, Cassandra = LSM/write)
| Interviewer asks | Strong answer |
|---|---|
| "How would you optimize this slow read query?" | "First, check the read/write ratio. If it is 100:1, I would add a composite index, then a cache layer, then read replicas. If closer to 1:1, I would consider CQRS to avoid slowing writes with more indexes." |
| "Why not just add more indexes?" | "Each index is a write-time cost. With 10 indexes, every INSERT updates 10 B-trees. At high write throughput, index maintenance becomes the bottleneck. Audit existing indexes with pg_stat_user_indexes first." |
| "This system ingests 100K events/sec. How would you store them?" | "Write-optimized: append to a WAL-backed LSM store like Cassandra or ClickHouse. Minimal indexes on the write path. Build read-optimized views asynchronously via CDC or materialized views." |
| "What is CQRS and when would you use it?" | "Separate write model from read model. Writes go to a normalized store with minimal indexes. Reads go to a denormalized store with heavy indexing and caching. Sync via Kafka/CDC. Justified when both paths are critical and 100ms-5s staleness is acceptable." |
Interview tip: state the read/write ratio explicitly
When designing any data layer, say: "The read/write ratio here is approximately X:1, so I will optimize for [reads/writes]." This shows you make data structure choices based on workload analysis, not gut feeling.
Quick Recap
- Read-optimized structures (B-trees, indexes, materialized views, caches, read replicas) make reads fast at the cost of write overhead. Every index added is a B-tree that every write must update.
- Write-optimized structures (LSM trees, WALs, append-only logs, minimal indexing, batching) make writes fast at the cost of read complexity. Reads must check multiple levels and merge results.
- B-trees keep data sorted in-place for fast reads with random I/O writes. LSM trees buffer writes sequentially, deferring sort cost to compaction.
- Bloom filters make LSM reads practical by skipping SSTables that definitely do not contain the target key, cutting read amplification from one check per SSTable per level down to 1-2 disk reads for most point lookups.
- CQRS resolves the tension by splitting the write path (normalized, minimal indexes) from the read path (denormalized, heavily indexed, cached), synced via CDC or event streaming.
- In interviews, always state the read/write ratio before choosing an optimization strategy. A 100:1 ratio means read optimization pays 100x more per unit of effort.
Related Trade-offs
- Normalization vs. denormalization for the data modeling side of this trade-off
- SQL vs. NoSQL for how database engine choice reflects read/write optimization philosophy
- Caching vs. freshness for the read-optimization technique that introduces staleness
- Read replicas vs. caching for comparing two specific read optimization techniques
- CQRS for the full pattern that separates read and write paths