Materialized view

TL;DR

A materialized view stores pre-computed query results as a physical table, trading storage and staleness for sub-millisecond reads on expensive aggregations.
Regular views are just saved queries (re-executed every time). Materialized views store the actual result data.
Refresh strategies range from scheduled (simple, stale) to CDC-driven incremental (complex, near-real-time). Pick based on your freshness SLA.
The pattern extends beyond databases: CQRS read models in Redis, Elasticsearch indexes, and denormalized cache entries are all materialized views.
The fundamental trade-off is query speed vs data freshness and the operational cost of keeping materialized results current.

The Problem

Your e-commerce platform has a dashboard showing each user's lifetime order count and total spend. The query JOINs the users table with orders and runs GROUP BY with SUM and COUNT.

At 100K orders, the query takes 50ms. At 10M, it takes 3 seconds. At 100M orders, it takes 15 seconds. The dashboard loads on every admin page view.

Two hundred concurrent admins means 200 copies of this 15-second query running simultaneously. Your database CPU pins at 100%. Read latency spikes for the entire application, not just the dashboard.

Your database is normalized for writes. That's correct for data integrity. But your read queries pay the full cost of re-computing JOINs and aggregations on every single request, even when the underlying data barely changed since the last query.

Indexes help with filtering, but they can't eliminate the cost of aggregating 100M rows into a summary. Application-layer caching works for simple key-value lookups, but complex aggregations with multiple dimensions (group by date, segment, and region) don't map cleanly to cache keys.

The mistake I see most often in interviews: candidates jump to "add an index" when the real problem is an expensive aggregation. Indexes don't fix GROUP BY over 100M rows. You need to pre-compute the result.

Regular views don't help here

A regular database view (CREATE VIEW) is just a saved query. It re-executes the full JOIN and GROUP BY on every SELECT. It makes your code cleaner but does nothing for performance. A materialized view stores the actual result data as a physical table.

One-Line Definition

A materialized view pre-computes a query result and stores it as a physical table, giving reads constant-time speed regardless of the source data size.

Analogy

Think of a restaurant's daily specials board. The chef calculates ingredient costs and portions once each morning (the refresh), writes the final prices on the board (the materialized result), and customers read the board instantly. Nobody asks the kitchen to recalculate prices for each order. The board might not reflect a mid-day supplier price change until tomorrow's refresh, but for 99% of customers it's accurate enough. The trade-off: instant reads at the cost of occasional staleness.

Solution Walkthrough

The core mechanism is straightforward: compute the expensive query once, store the result, and serve reads from the stored copy.

Regular View vs Materialized View

-- Regular view: re-executes the full JOIN on every SELECT
CREATE VIEW order_summary AS
SELECT u.id, u.name, COUNT(o.id) AS order_count, SUM(o.total) AS lifetime_value
FROM users u LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;

-- Materialized view: stores the result as a physical table
CREATE MATERIALIZED VIEW order_summary_mv AS
SELECT u.id, u.name, COUNT(o.id) AS order_count, SUM(o.total) AS lifetime_value
FROM users u LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
WITH DATA;

-- Concurrent refresh: no lock on readers during rebuild
REFRESH MATERIALIZED VIEW CONCURRENTLY order_summary_mv;
-- Requires a UNIQUE INDEX on the MV

A query against order_summary_mv reads a pre-computed table. The duration is 2ms whether the source tables have 1M or 100M rows.

Refresh Strategies

The core design decision is how and when to keep the materialized view current. This choice defines the staleness window, write overhead, and operational complexity.

Scheduled full refresh runs a cron job that re-computes the entire view. Simple to implement. Appropriate for dashboards where hourly or daily data is fine (sales totals, analytics reports). The downside: the refresh itself is expensive on large tables, and data stays stale for the entire interval.

CDC-driven incremental refresh uses change data capture to apply row-level updates. When an order is inserted, a CDC event triggers an update to just that user's summary row. Far cheaper than re-aggregating 100M rows, but implementation complexity is significant.

Event-driven application-level refresh maintains the materialized view in application code, typically in Redis or a dedicated read store. Events from the write path drive real-time updates. This gives sub-millisecond reads with near-zero staleness, at the cost of building and maintaining the update logic yourself.

Streaming Materialized View Pipeline

For systems requiring near-real-time freshness at scale, a streaming pipeline continuously maintains the materialized view:

Tools like ksqlDB and Materialize specialize in this: they let you write SQL queries that are continuously and incrementally maintained as source data changes. Think of it as CREATE MATERIALIZED VIEW but across distributed Kafka topics instead of database tables.

I find streaming MVs to be the sweet spot for most production systems that need second-level freshness. The operational complexity is real, but the combination of Debezium + Kafka + Flink is well-proven at companies like LinkedIn and Uber.

Refresh Strategy Comparison

Strategy	Staleness	Write Overhead	Complexity	Best For
Scheduled full refresh	Hours	None	Low	CEO dashboards, weekly reports
Periodic concurrent	Minutes	Low	Low	Admin dashboards, near-real-time
CDC incremental	Seconds	Low	High	Search indexes, feed rendering
Event-driven (Redis)	Real-time	Medium	Medium	User-facing counters, leaderboards
Streaming (Flink/ksqlDB)	Sub-second	Low	Very high	Financial dashboards, fraud detection

Implementation Sketch

TL;DR

A materialized view stores pre-computed query results as a physical table, trading storage and staleness for sub-millisecond reads on expensive aggregations.
Regular views are just saved queries (re-executed every time). Materialized views store the actual result data.
Refresh strategies range from scheduled (simple, stale) to CDC-driven incremental (complex, near-real-time). Pick based on your freshness SLA.
The pattern extends beyond databases: CQRS read models in Redis, Elasticsearch indexes, and denormalized cache entries are all materialized views.
The fundamental trade-off is query speed vs data freshness and the operational cost of keeping materialized results current.

The Problem

Your e-commerce platform has a dashboard showing each user's lifetime order count and total spend. The query JOINs the users table with orders and runs GROUP BY with SUM and COUNT.

At 100K orders, the query takes 50ms. At 10M, it takes 3 seconds. At 100M orders, it takes 15 seconds. The dashboard loads on every admin page view.

Two hundred concurrent admins means 200 copies of this 15-second query running simultaneously. Your database CPU pins at 100%. Read latency spikes for the entire application, not just the dashboard.

Regular views don't help here

One-Line Definition

A materialized view pre-computes a query result and stores it as a physical table, giving reads constant-time speed regardless of the source data size.

Analogy

Solution Walkthrough

The core mechanism is straightforward: compute the expensive query once, store the result, and serve reads from the stored copy.

Regular View vs Materialized View

-- Regular view: re-executes the full JOIN on every SELECT
CREATE VIEW order_summary AS
SELECT u.id, u.name, COUNT(o.id) AS order_count, SUM(o.total) AS lifetime_value
FROM users u LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;

-- Materialized view: stores the result as a physical table
CREATE MATERIALIZED VIEW order_summary_mv AS
SELECT u.id, u.name, COUNT(o.id) AS order_count, SUM(o.total) AS lifetime_value
FROM users u LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
WITH DATA;

-- Concurrent refresh: no lock on readers during rebuild
REFRESH MATERIALIZED VIEW CONCURRENTLY order_summary_mv;
-- Requires a UNIQUE INDEX on the MV

A query against order_summary_mv reads a pre-computed table. The duration is 2ms whether the source tables have 1M or 100M rows.

Strategy	Staleness	Write Overhead	Complexity	Best For
Scheduled full refresh	Hours	None	Low	CEO dashboards, weekly reports
Periodic concurrent	Minutes	Low	Low	Admin dashboards, near-real-time
CDC incremental	Seconds	Low	High	Search indexes, feed rendering
Event-driven (Redis)	Real-time	Medium	Medium	User-facing counters, leaderboards
Streaming (Flink/ksqlDB)	Sub-second	Low	Very high	Financial dashboards, fraud detection

Materialized view

TL;DR

The Problem

One-Line Definition

Analogy

Solution Walkthrough

Regular View vs Materialized View

Refresh Strategies

Streaming Materialized View Pipeline

Refresh Strategy Comparison

Implementation Sketch

Continue Reading with Premium

Comments

Materialized view

TL;DR

The Problem

One-Line Definition

Analogy

Solution Walkthrough

Regular View vs Materialized View

Refresh Strategies

Streaming Materialized View Pipeline

Refresh Strategy Comparison

Implementation Sketch

Continue Reading with Premium

Comments