How materialized views pre-compute and cache expensive query results for fast reads, when to use them, how to keep them fresh, and the consistency tradeoffs of each refresh strategy.
24 min read · 2026-04-04 · medium
Tags: materialized-view, caching, databases, cqrs, read-optimization
A materialized view stores pre-computed query results as a physical table, trading storage and staleness for sub-millisecond reads on expensive aggregations.
Regular views are just saved queries (re-executed every time). Materialized views store the actual result data.
Refresh strategies range from scheduled (simple, stale) to CDC-driven incremental (complex, near-real-time). Pick based on your freshness SLA.
The pattern extends beyond databases: CQRS read models in Redis, Elasticsearch indexes, and denormalized cache entries are all materialized views.
The fundamental trade-off is query speed vs data freshness and the operational cost of keeping materialized results current.
Your e-commerce platform has a dashboard showing each user's lifetime order count and total spend. The query JOINs the users table with orders and runs GROUP BY with SUM and COUNT.
At 100K orders, the query takes 50ms. At 10M, it takes 3 seconds. At 100M orders, it takes 15 seconds. The dashboard loads on every admin page view.
Two hundred concurrent admins means 200 copies of this 15-second query running simultaneously. Your database CPU pins at 100%. Read latency spikes for the entire application, not just the dashboard.
Your database is normalized for writes. That's correct for data integrity. But your read queries pay the full cost of re-computing JOINs and aggregations on every single request, even when the underlying data barely changed since the last query.
Indexes help with filtering, but they can't eliminate the cost of aggregating 100M rows into a summary. Application-layer caching works for simple key-value lookups, but complex aggregations with multiple dimensions (group by date, segment, and region) don't map cleanly to cache keys.
The mistake I see most often in interviews: candidates jump to "add an index" when the real problem is an expensive aggregation. Indexes don't fix GROUP BY over 100M rows. You need to pre-compute the result.
Regular views don't help here
A regular database view (CREATE VIEW) is just a saved query. It re-executes the full JOIN and GROUP BY on every SELECT. It makes your code cleaner but does nothing for performance. A materialized view stores the actual result data as a physical table.
A materialized view pre-computes a query result and stores it as a physical table, giving reads constant-time speed regardless of the source data size.
Think of a restaurant's daily specials board. The chef calculates ingredient costs and portions once each morning (the refresh), writes the final prices on the board (the materialized result), and customers read the board instantly. Nobody asks the kitchen to recalculate prices for each order. The board might not reflect a mid-day supplier price change until tomorrow's refresh, but for 99% of customers it's accurate enough. The trade-off: instant reads at the cost of occasional staleness.
-- Regular view: re-executes the full JOIN on every SELECT
CREATE VIEW order_summary AS
SELECT u.id, u.name, COUNT(o.id) AS order_count, SUM(o.total) AS lifetime_value
FROM users u LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;

-- Materialized view: stores the result as a physical table
CREATE MATERIALIZED VIEW order_summary_mv AS
SELECT u.id, u.name, COUNT(o.id) AS order_count, SUM(o.total) AS lifetime_value
FROM users u LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
WITH DATA;

-- Concurrent refresh: no lock on readers during rebuild
REFRESH MATERIALIZED VIEW CONCURRENTLY order_summary_mv;
-- Requires a UNIQUE INDEX on the MV
A query against order_summary_mv reads a pre-computed table: it takes about 2ms whether the source tables hold 1M or 100M rows.
The core design decision is how and when to keep the materialized view current. This choice defines the staleness window, write overhead, and operational complexity.
Scheduled full refresh runs a cron job that re-computes the entire view. Simple to implement. Appropriate for dashboards where hourly or daily data is fine (sales totals, analytics reports). The downside: the refresh itself is expensive on large tables, and data stays stale for the entire interval.
CDC-driven incremental refresh uses change data capture to apply row-level updates. When an order is inserted, a CDC event triggers an update to just that user's summary row. Far cheaper than re-aggregating 100M rows, but implementation complexity is significant.
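To make the incremental idea concrete, here is a minimal sketch of applying Debezium-style change events to a user-summary view held as a plain dict. The envelope fields ('op', 'after', 'before') follow Debezium's conventions; everything else (field names, the in-memory store) is illustrative, not a production implementation.

```python
# Apply CDC change events incrementally to a user-summary materialized view.
# 'op' codes follow Debezium: 'c' = create (insert), 'd' = delete.
# Amounts are in integer cents to avoid float drift.

summary = {}  # user_id -> {"order_count": ..., "lifetime_value_cents": ...}

def apply_change(event):
    op = event["op"]
    # Inserts carry the new row in 'after'; deletes carry the old row in 'before'.
    row = event["after"] if op == "c" else event["before"]
    s = summary.setdefault(row["user_id"],
                           {"order_count": 0, "lifetime_value_cents": 0})
    sign = 1 if op == "c" else -1
    # Only this one user's row changes -- no re-aggregation of 100M orders.
    s["order_count"] += sign
    s["lifetime_value_cents"] += sign * row["total_cents"]

apply_change({"op": "c", "after": {"user_id": 7, "total_cents": 1999}})
apply_change({"op": "c", "after": {"user_id": 7, "total_cents": 501}})
apply_change({"op": "d", "before": {"user_id": 7, "total_cents": 501}})
print(summary[7])  # {'order_count': 1, 'lifetime_value_cents': 1999}
```

Note that deletes are handled by subtracting, which works for SUM and COUNT; non-invertible aggregates like MAX need a different approach (recompute or keep auxiliary state).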
Event-driven application-level refresh maintains the materialized view in application code, typically in Redis or a dedicated read store. Events from the write path drive real-time updates. This gives sub-millisecond reads with near-zero staleness, at the cost of building and maintaining the update logic yourself.
For systems requiring near-real-time freshness at scale, a streaming pipeline continuously maintains the materialized view: CDC events flow from the source database into a stream, and a stream processor applies each change incrementally to the materialized result.
Tools like ksqlDB and Materialize specialize in this: they let you write SQL queries that are continuously and incrementally maintained as source data changes. Think of it as CREATE MATERIALIZED VIEW but across distributed Kafka topics instead of database tables.
I find streaming MVs to be the sweet spot for most production systems that need second-level freshness. The operational complexity is real, but the combination of Debezium + Kafka + Flink is well-proven at companies like LinkedIn and Uber.
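Here is a sketch of the event-driven update path for an application-level materialized view. With real Redis (redis-py), the two increments would run inside pipeline(transaction=True), which issues a MULTI/EXEC block; a tiny in-memory stub stands in here so the example is self-contained, and all names are illustrative.

```python
# Event-driven incremental update of an application-level MV.
# In production this would target Redis hashes via HINCRBY inside a
# MULTI/EXEC transaction; FakeRedisHashes mimics just enough of that API.

class FakeRedisHashes:
    """Stand-in for the one Redis hash command the handler needs."""
    def __init__(self):
        self.data = {}

    def hincrby(self, key, field, amount):
        row = self.data.setdefault(key, {})
        row[field] = row.get(field, 0) + amount
        return row[field]

def on_order_created(store, event):
    # One event touches exactly one user's summary row: two O(1)
    # increments instead of re-aggregating the orders table.
    key = f"user_summary:{event['user_id']}"
    store.hincrby(key, "order_count", 1)
    store.hincrby(key, "lifetime_value_cents", event["total_cents"])

store = FakeRedisHashes()
on_order_created(store, {"user_id": 42, "total_cents": 1999})
on_order_created(store, {"user_id": 42, "total_cents": 501})
print(store.data["user_summary:42"])  # {'order_count': 2, 'lifetime_value_cents': 2500}
```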
Each event modifies only the affected user's row in the materialized view; no full recomputation is needed. Wrapping the count and value increments in a Redis MULTI/EXEC transaction ensures they apply atomically.
For a database-level MV in PostgreSQL, the "implementation" is simpler:
-- Create the MV with a unique index for concurrent refresh
CREATE MATERIALIZED VIEW order_summary_mv AS
SELECT u.id, u.name, COUNT(o.id) AS order_count, SUM(o.total) AS ltv
FROM users u LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
WITH DATA;

CREATE UNIQUE INDEX ON order_summary_mv (id);

-- Cron job calls this every 5 minutes:
REFRESH MATERIALIZED VIEW CONCURRENTLY order_summary_mv;
Read-heavy workloads (100:1 read-to-write ratio or higher) where the same expensive aggregation runs repeatedly
Dashboard and reporting queries that JOIN multiple large tables with GROUP BY, SUM, COUNT, or AVG
CQRS read models where you maintain separate write-optimized and read-optimized stores
Search use cases where Elasticsearch serves as a materialized view over normalized relational data
Sub-millisecond latency requirements where even indexed queries on large tables miss the SLA
Multi-source aggregations that combine data from multiple services or databases
For your interview: if you see a design with expensive analytical queries hitting the OLTP database, say "I'd introduce a materialized view here" and immediately clarify whether you mean a database-level MV or an application-level one (Redis, Elasticsearch, etc.).
The Stale Read Surprise. A scheduled materialized view refreshes nightly. A user places an order at 9am, checks their dashboard at 9:05am, and their order count hasn't changed. Staleness is inherent to the pattern, and you must design the UX around it. If the use case requires immediate consistency after a user's own writes, use "read-your-own-writes" logic: check the event timestamp against the MV's last refresh, and fall back to the source query for that specific user.
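The fallback logic can be sketched in a few lines. This is a hedged illustration, not a prescribed API: the timestamp source, function names, and reader callables are all hypothetical.

```python
# Read-your-own-writes fallback: if the user's latest write postdates the
# MV's last refresh, the MV cannot contain it yet, so serve that one user
# from the live (slow) aggregation instead of the pre-computed row.

MV_LAST_REFRESH_TS = 1_000_000  # epoch seconds of the last REFRESH (illustrative)

def get_order_summary(user_id, user_last_write_ts, read_mv, read_live):
    if user_last_write_ts > MV_LAST_REFRESH_TS:
        return read_live(user_id)  # slow path: reflects the user's own write
    return read_mv(user_id)        # fast path: pre-computed summary row

# Fake readers that just label which path was taken.
read_mv = lambda uid: ("mv", uid)
read_live = lambda uid: ("live", uid)

print(get_order_summary(1, 999_500, read_mv, read_live))    # ('mv', 1)
print(get_order_summary(2, 1_000_200, read_mv, read_live))  # ('live', 2)
```

The key design point is that only users who wrote recently pay the slow-path cost; everyone else stays on the fast pre-computed read.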
The Full Refresh Lock. PostgreSQL's REFRESH MATERIALIZED VIEW takes an exclusive lock, blocking all reads for the duration. On a 100M-row aggregation, this can take minutes. The fix: use REFRESH MATERIALIZED VIEW CONCURRENTLY, which builds a new copy and swaps atomically. This requires a unique index on the view and doubles storage temporarily, but eliminates read blocking entirely.
Cascading Refresh Dependencies. Materialized view B depends on materialized view A. If A's refresh is delayed, B computes from stale data even after its own refresh completes. PostgreSQL has no built-in dependency management for MV refresh ordering. You need to manage refresh sequences explicitly, or use a streaming pipeline where dependencies are handled by topic ordering.
Unbounded Storage Growth. Each materialized view duplicates data. Five materialized views over a 100M-row table can consume significant storage if the views aren't aggregating aggressively. Monitor MV sizes and set up alerts for unexpected growth. Drop views that are no longer queried.
The Index Maintenance Trap. A materialized view is a table, so you can (and should) index it. But indexes on MVs are rebuilt during every full refresh. More indexes means slower refreshes. Keep MV indexes minimal: one for the primary lookup pattern, one for the concurrent refresh requirement.
Pros:
Decouples read performance from source table size
Enables polyglot read stores (Redis, ES, etc.)
Reduces OLTP database load from analytical queries
MVs are indexable and queryable (unlike simple caches)

Cons:
Data can be stale (window depends on refresh strategy)
Storage cost: duplicated data plus indexes
Operational complexity of refresh pipelines
Cascading refresh dependencies are hard to manage
Full refresh can lock reads (without CONCURRENTLY)
The fundamental tension is query speed vs data freshness. You can have instant reads or perfectly current data, but getting both requires increasingly complex streaming infrastructure.
LinkedIn uses materialized views extensively for feed rendering. The social graph data is normalized across multiple services, but the feed requires pre-joined data: who posted, their profile info, engagement counts, and relevance scores. LinkedIn maintains materialized views in Espresso (their custom data platform) that are incrementally updated by events. Feed rendering hits a single pre-computed store instead of fanning out to 6+ services per request.
Stripe builds its merchant dashboard on materialized views. Payment volume, dispute rates, and revenue breakdowns are aggregated from billions of transaction rows into pre-computed summaries updated in near-real-time. The raw transactions table is optimized for ACID writes while the dashboard reads from materialized aggregations in a separate read store. Stripe keeps dashboard load times under 200ms regardless of merchant transaction volume.
Netflix maintains materialized views of its content catalog for different consumption patterns: search (Elasticsearch), recommendations (custom serving infrastructure), and content delivery metadata. When a title's metadata changes in the primary content database, CDC streams the change to each materialized view. This replaced dozens of team-specific polling jobs, processing billions of events per day through their CDC pipelines.
Interview shortcut: name the refresh strategy
Don't just say "materialized view." Immediately say which refresh strategy: "I'd schedule a nightly refresh" or "I'd use CDC-driven incremental updates." The strategy choice is what separates a surface-level answer from a thoughtful one.
Materialized views appear whenever a design involves expensive analytical queries hitting an OLTP database. The cue: any dashboard, leaderboard, aggregation endpoint, or reporting feature that scans large tables.
When to bring it up: "The dashboard query JOINs orders with users across 100M rows. I'd create a materialized view, either database-level with REFRESH CONCURRENTLY on a schedule, or application-level in Redis updated by events, depending on the freshness requirement."
Depth expected:
At senior level: know that MVs exist, explain the staleness trade-off, mention REFRESH CONCURRENTLY
At staff level: compare refresh strategies, discuss CDC-driven incremental updates, explain how CQRS read models are materialized views
At principal level: design streaming MV pipelines (Flink, ksqlDB), discuss cascading dependencies, capacity plan storage for multiple MVs
Interviewer asks: "The dashboard query is too slow"
Strong answer: "I'd create a materialized view. For hourly-fresh data, a scheduled refresh. For real-time, an event-driven Redis aggregation. The query goes from 15s to 2ms."

Interviewer asks: "Won't the MV be stale?"
Strong answer: "Yes, by design. I'd set the refresh interval to match the business requirement. For a CEO dashboard, hourly is fine. For a user's own order count, I'd use read-your-own-writes to fall back to the live query."

Interviewer asks: "What about storage costs?"
Strong answer: "Each MV duplicates data, but aggregated MVs are often 100x smaller than the source. A 100M-row orders table might produce a 5M-row user summary MV. Monitor size, drop unused MVs."

Interviewer asks: "How does this differ from caching?"
Strong answer: "A cache stores the result of a specific query keyed by parameters. A materialized view stores a query result as a table you can query with different WHERE clauses. MVs are queryable and indexable; caches are key-value lookups."
A materialized view stores pre-computed query results as a physical table, converting O(N) aggregation queries into O(1) lookups.
Regular views are just saved queries (re-executed every time). Materialized views store the actual result data and must be explicitly refreshed.
Refresh strategies span a spectrum: scheduled full refresh (simple, stale) to CDC-driven incremental (complex, near-real-time). Match the strategy to the freshness SLA.
Use REFRESH MATERIALIZED VIEW CONCURRENTLY in PostgreSQL to avoid blocking reads during refresh. Always do this for MVs with live traffic.
Application-level materialized views (Redis, Elasticsearch) extend the pattern beyond the database, giving sub-millisecond reads with event-driven updates.
The fundamental trade-off is query speed vs data freshness. You can optimize for one, but getting both requires streaming infrastructure (Debezium, Flink, ksqlDB).
CQRS is the distributed-systems generalization of materialized views: separate write-optimized and read-optimized stores, connected by an event stream.