CQRS (Command Query Responsibility Segregation)
Learn how CQRS separates reads from writes into independent models so each can be optimized, scaled, and evolved without the other paying the cost.
TL;DR
- CQRS separates every system operation into two categories: Commands (intent to change state) and Queries (requests for data), with independent models, schemas, and code paths for each.
- The root driver is asymmetric load: production systems run 10–100× more reads than writes, yet a unified model forces both to compete on the same schema, the same indexes, and the same database connections.
- Commands go through validation and domain logic before mutating state. Queries skip all of that: they hit a denormalized read projection that is pre-computed exactly for what the UI needs. No JOINs. No aggregate reconstruction.
- The fundamental tension is consistency vs. performance: the read model is asynchronously updated after write events, meaning reads return a snapshot that may be milliseconds to seconds behind the latest write. This is eventual consistency, by design.
- CQRS adds real complexity: a second data store, an event bus, projection infrastructure, and operational overhead. Only reach for it when read and write workloads genuinely differ in volume, shape, or scaling requirements.
The Problem
It's 11:59 a.m. on a flash sale day. Your e-commerce platform is serving 50,000 product-page views per minute from users refreshing to see whether their target item is in stock. At the same time, your order flow is trying to write at 400 orders/minute.
Those seem like different problems at different scales. But they both touch the same Product aggregate, the same Inventory table, and ultimately the same PostgreSQL instance.
Here's where it breaks: your product-page query does a four-table JOIN across products, inventory, pricing, and promotions to assemble the display view. This query runs 833 times per second at peak. Each one takes 8ms.
Meanwhile, an order write needs to lock the inventory row to decrement stock atomically. The reads hold shared row locks. The inventory write queues behind them.
Orders start failing. Retry storms amplify the contention. By 12:01 p.m. your error rate is 12% and climbing, not because you ran out of database capacity, but because a complex read query and a critical write path are competing on the same row.
This is the pattern I see in almost every read-scaling incident before teams reach for CQRS: the bottleneck isn't hardware, it's data model contention.
The deeper issue: a single model can't be optimal for reading and writing simultaneously. Normalized schemas favor writes (less duplication, fewer update anomalies); denormalized schemas favor reads (fewer JOINs, pre-computed answers). No single table design can optimize for both.
Adding read replicas helps with read throughput, but doesn't help with query complexity: your expensive four-table JOIN still runs on every replica, every request. You need a different shape of data, not more copies of the same shape.
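The "different shape of data" point can be made concrete with a toy in-memory sketch (all names and values here are hypothetical, not from any real schema): the normalized read assembles the view on every request, while a projection answers the same question with a single lookup.

```typescript
// Toy stand-ins for normalized tables (one product each).
const products = new Map([['p1', { name: 'Widget' }]]);
const inventory = new Map([['p1', { stock: 3 }]]);
const pricing = new Map([['p1', { price: 9.99 }]]);

// Normalized read: the "multi-table JOIN" done in code — per-request work.
function productViewNormalized(id: string) {
  return {
    name: products.get(id)!.name,
    stock: inventory.get(id)!.stock,
    price: pricing.get(id)!.price,
  };
}

// Denormalized projection: the same answer, pre-computed at write time,
// read back with one key lookup.
const productViewProjection = new Map([
  ['p1', { name: 'Widget', stock: 3, price: 9.99 }],
]);

const fromJoin = productViewNormalized('p1');
const fromProjection = productViewProjection.get('p1')!;
// Both carry the same data; only the read-time cost differs.
```

The projection trades write-time work and storage for constant-time reads, which is exactly the trade CQRS makes at scale.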
One-Line Definition
CQRS routes all write operations through a Command Model that enforces business rules and domain logic and routes all read operations through a Query Model purpose-built for the exact data shape each UI view needs, so each side can be independently optimized, scaled, and evolved.
CQRS takes its name from Command-Query Separation (CQS), a method-level principle by Bertrand Meyer stating that every function is either a command (changes state, returns nothing) or a query (returns data, never changes state), never both. CQRS applies that principle at the architectural level: entire code paths, data models, and infrastructure are separated by role. Knowing this lineage matters in interviews because some interviewers use "CQS" when they mean "CQRS"; clarifying the distinction signals depth.
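Meyer's method-level CQS can be shown in a few lines (a hypothetical Counter class, not tied to any framework): every method is either a command or a query, never both.

```typescript
class Counter {
  private count = 0;

  // Command: changes state, returns nothing.
  increment(): void {
    this.count += 1;
  }

  // Query: returns data, never changes state.
  current(): number {
    return this.count;
  }
}

const c = new Counter();
c.increment();
c.increment();
const value = c.current(); // 2
```

CQRS lifts this same split from individual methods to whole code paths, models, and stores.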
Analogy
Think of a busy hospital's medical records system.
When a surgeon files an operation note, it goes through a structured, validated, legally-binding intake process: specific form fields, mandatory signatures, routing to the patient's chart by code. That's the command side: business rules enforced, domain model applied, state changed on a transaction boundary.
When the billing department pulls a patient summary, they don't read the raw operation notes. They read a billing projection: a pre-assembled view that aggregates diagnosis codes, procedure codes, and dates from dozens of underlying records into a format the billing software can consume directly. No JOINs required at read time. No surgeon filing a note has any idea this projection exists.
Same underlying patient data. Two completely different models. Because the shape of data needed for a surgical note and the shape needed for a billing view are completely different problems.
CQRS applies the same logic to software: separate changing state from reading state. They have different validation requirements, different data shapes, and, at scale, fundamentally different load profiles.
The Full Architecture
The critical design constraint: the command side and query side must never talk to each other synchronously. Commands write to the Write Store, emit a domain event, and are done. Projections listen to events and maintain their own optimized read stores. This one-way async coupling is what gives each side the freedom to evolve and scale independently.
For your first interview: sketch that two-path diagram in the first 3 minutes, label the event bus, and move on. That one sketch separates CQRS-literate candidates from everyone else who just says "separate reads and writes."
Solution Walkthrough: The Command Side
A command is an expression of intent: PlaceOrder, CancelShipment, UpdateInventoryCount. It carries everything the handler needs to execute the intent and nothing else. It does not return data; it returns success/failure.
```typescript
// command-types.ts - SKETCH
// Commands are named by intent, not by CRUD verb.
// "UpdateProduct" isn't a command name. "AdjustProductPrice" is.
interface PlaceOrderCommand {
  readonly type: 'PlaceOrder';
  readonly commandId: string; // idempotency key - critical for retry safety
  readonly userId: string;
  readonly items: Array<{ productId: string; quantity: number }>;
  readonly shippingAddressId: string;
}

interface AdjustInventoryCommand {
  readonly type: 'AdjustInventory';
  readonly commandId: string;
  readonly productId: string;
  readonly delta: number; // positive = restock, negative = sale
  readonly reason: 'sale' | 'return' | 'manual-adjustment';
}

type OrderCommand = PlaceOrderCommand | AdjustInventoryCommand;
```
The command handler is the entry point. It validates the command, loads the aggregate from the Write Store, applies the command to the aggregate (which enforces domain rules), persists the result, and publishes a domain event.
```typescript
// place-order-handler.ts - SKETCH
// Real implementations use a command bus / dispatcher.
// This shows the mechanics without the plumbing.
class PlaceOrderHandler {
  constructor(
    private readonly orderRepo: OrderRepository,
    private readonly inventoryService: InventoryService,
    private readonly eventBus: EventBus,
    private readonly idempotencyStore: IdempotencyStore,
  ) {}

  async handle(cmd: PlaceOrderCommand): Promise<OrderId> {
    // 1. Idempotency guard - a retry of the same commandId is a no-op
    const existing = await this.idempotencyStore.get(cmd.commandId);
    if (existing) return existing.orderId;

    // 2. Validate inputs at the boundary (not inside the aggregate)
    if (cmd.items.length === 0) throw new ValidationError('Order must have items');

    // 3. Domain logic: check stock inside a transaction
    const order = await this.orderRepo.withTransaction(async (tx) => {
      // Load + reserve - aggregate enforces business invariants
      const reserved = await this.inventoryService.reserveItems(cmd.items, tx);
      if (!reserved.success) throw new InsufficientStockError(reserved.unavailable);

      const newOrder = Order.create({
        userId: cmd.userId,
        items: cmd.items,
        shippingAddressId: cmd.shippingAddressId,
      });
      await this.orderRepo.save(newOrder, tx);
      await this.idempotencyStore.set(cmd.commandId, { orderId: newOrder.id }, tx);
      return newOrder;
    });

    // 4. Publish event OUTSIDE the transaction - projection updates are async.
    // (Production systems pair this with the Outbox pattern so a crash here
    // can't lose the event; see Related Patterns.)
    await this.eventBus.publish({
      type: 'OrderPlaced',
      orderId: order.id,
      userId: order.userId,
      items: order.items,
      placedAt: new Date().toISOString(),
    });
    return order.id;
  }
}
```
```mermaid
sequenceDiagram
    participant C as Client
    participant CH as Command Handler
    participant AG as Order Aggregate
    participant WDB as Write Store
    participant EB as Event Bus
    C->>CH: PlaceOrder{commandId, items, userId}
    CH->>CH: Check idempotency store
    Note over CH: commandId not seen, proceed
    CH->>AG: Order.create(items, userId)
    activate AG
    AG->>AG: Check business invariants<br/>(stock, address validity)
    AG-->>CH: Order{id, state:PENDING}
    deactivate AG
    CH->>WDB: BEGIN transaction
    CH->>WDB: INSERT orders + reserve inventory
    CH->>WDB: SET idempotency(commandId)
    CH->>WDB: COMMIT
    WDB-->>CH: saved · orderId=abc123
    CH->>EB: publish OrderPlaced{orderId, items}
    EB-->>CH: acknowledged
    CH-->>C: HTTP 202 Accepted · orderId=abc123
    Note over C: Response has NO order data<br/>Query it separately
```
The command side never returns the object it just created. It returns an identifier. The client queries the read model to get the full view. This is not an accident; it's the contract.
Solution Walkthrough: The Query Side
A query expresses the shape of data a UI view needs. GetOrderSummaryForUser, GetProductListingPage, GetRecentActivityFeed. It carries filter parameters and returns a pre-assembled data structure. It contains zero business logic.
```typescript
// query-types.ts - SKETCH
interface GetOrderSummaryQuery {
  readonly type: 'GetOrderSummary';
  readonly orderId: string;
  readonly requestingUserId: string; // for authorization check only
}

// The query handler reads directly from the read projection.
// No JOIN. No aggregate load. Just a key-value lookup or a
// single flat table SELECT optimized for this exact query shape.
class GetOrderSummaryHandler {
  constructor(private readonly readDb: ReadDatabase) {}

  async handle(query: GetOrderSummaryQuery): Promise<OrderSummaryView | null> {
    // Authorization check ONLY - projections contain no business logic
    const view = await this.readDb.queryOne<OrderSummaryView>(
      'SELECT * FROM order_summary_projection WHERE order_id = $1',
      [query.orderId]
    );
    if (!view || view.userId !== query.requestingUserId) return null;
    return view; // already in the shape the UI needs - no mapping
  }
}
```
```mermaid
sequenceDiagram
    participant C as Client
    participant QH as Query Handler
    participant RP as Read Projection
    participant WDB as Write Store
    Note over C,WDB: Happy path - projection up to date
    C->>QH: GetOrderSummary{orderId: abc123}
    QH->>RP: SELECT * FROM order_summary WHERE id=abc123
    RP-->>QH: {id, status, items[], total, tracking} · < 2ms
    QH-->>C: HTTP 200 · OrderSummaryView
    Note over C,WDB: Edge case - query before projection updates
    C->>QH: GetOrderSummary{orderId: abc123}
    Note over QH: Order placed 50ms ago<br/>Projection lag: ~200ms
    QH->>RP: SELECT * FROM order_summary WHERE id=abc123
    RP-->>QH: (null) - not yet projected
    QH-->>C: HTTP 404 or stale previous state
    Note over C: Client sees "Order not found"<br/>even though it was placed.<br/>This is the "read your own write" problem.
```
That second path, where the client immediately queries their newly placed order and gets a 404, is the most common reason teams regret adopting CQRS without planning for it. I'll cover the mitigation in Failure Modes.
Read Model Projections
A projection is a continuously-updated, denormalized view of your data, computed by processing the stream of domain events. Key properties:
| Property | What it means |
|---|---|
| Purpose-built | Each projection stores exactly what one query type needs: no more, no less |
| Denormalized | Data is duplicated intentionally: the OrderSummary projection might store the user's full name even though it also lives in the users table, because that was the name at the time of the order |
| Append-friendly | Projections are updated by processing new events, never by reading the current state and merging |
| Rebuildable | If a projection is wrong or corrupt, delete it and replay all events from the beginning; the final state is identical because events are the source of truth |
| Storage-agnostic | Each projection can live in Redis (< 1ms key lookups), Elasticsearch (full-text search), Redshift (analytics), or Postgres (complex queries); choose the tool that best fits the query pattern |
| Idempotent | Processing the same event twice must produce the same output; mandatory because at-least-once delivery guarantees duplicate events will arrive. Use UPSERT keyed on the event's id in every projection handler. |
The insight most candidates miss: you can have as many projections as you have query patterns. A single OrderPlaced event might update OrderSummary, UserOrderHistory, InventoryLevels, RevenueMetrics, and SearchIndex β five independent stores, each optimized for a completely different query shape. The same event fans out to all of them.
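That fan-out, plus the idempotency property from the table above, can be sketched in memory (the store names, event shape, and eventId field are all hypothetical stand-ins): one event updates several projections, and a duplicate delivery is a no-op.

```typescript
type OrderPlaced = { eventId: string; orderId: string; userId: string; total: number };

// Three toy "projections", each a different shape for a different query.
const processed = new Set<string>();              // idempotency guard
const orderSummary = new Map<string, { total: number }>();
const userOrders = new Map<string, string[]>();
let revenue = 0;

function project(event: OrderPlaced): void {
  // At-least-once delivery means the same event can arrive twice;
  // key the guard on the event's id so a replay changes nothing.
  if (processed.has(event.eventId)) return;
  processed.add(event.eventId);

  orderSummary.set(event.orderId, { total: event.total });
  userOrders.set(event.userId, [...(userOrders.get(event.userId) ?? []), event.orderId]);
  revenue += event.total;
}

const e = { eventId: 'e1', orderId: 'o1', userId: 'u1', total: 40 };
project(e);
project(e); // duplicate delivery - a no-op
```

Note the running `revenue` total: without the guard, a redelivered event would double-count it, which is exactly why the table lists idempotency as mandatory.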
Implementation Sketch: The Full Loop
```typescript
// projection-handler.ts - SKETCH
// Shows how a projection handler processes an event and updates its read store.
interface OrderPlacedEvent {
  type: 'OrderPlaced';
  orderId: string;
  userId: string;
  items: Array<{ productId: string; name: string; quantity: number; price: number }>;
  placedAt: string;
}

class OrderSummaryProjectionHandler {
  constructor(private readonly readDb: ReadDatabase) {}

  async handle(event: OrderPlacedEvent): Promise<void> {
    // Compute the denormalized view from the event data directly.
    // No database reads needed - the event carries everything.
    const total = event.items.reduce((sum, item) => sum + item.price * item.quantity, 0);
    await this.readDb.upsert('order_summary_projection', {
      order_id: event.orderId,
      user_id: event.userId,
      status: 'PENDING',
      total_amount: total,
      item_count: event.items.length,
      items_json: JSON.stringify(event.items),
      placed_at: event.placedAt,
    });
    // That's it. No JOIN. No aggregate load. Pure event-to-projection.
  }
}
```
The projection handler's job is mechanical: event in, read store updated. It has no business logic. It cannot reject an event (the event already happened). If it fails, it retries from the event. This is what makes the read side eventually consistent: it's just a consumer catching up to a stream.
When It Shines
So when does adopting CQRS actually pay off? The short answer: when your read and write workloads genuinely differ in volume, complexity, or shape, and you have the team capacity to operate the resulting infrastructure.
CQRS is the right answer when:
- Read traffic is 10× or more than write traffic, and read queries are hitting normalized tables that need denormalized views (analytics dashboards, search pages, feed aggregations)
- Different UI views need radically different data shapes from the same underlying domain objects: a product listing page, a checkout cart view, an order history timeline, and a seller analytics dashboard all read from the same core Order data but need it in completely different shapes
- Write complexity is high: rich domain models with business invariants, inventory locks, multi-step validations where keeping that logic in one place (the aggregate) is architecturally sound
- You need to consume domain events in multiple downstream systems (billing, notifications, analytics, search indexing), and a unified event stream is cleaner than each system polling the DB
- You need an audit trail of what happened and why, not just the current state
Avoid CQRS when:
- Your team is fewer than 6–8 engineers who will need to maintain this: the operational overhead is real and will dominate sprint capacity
- The application is CRUD at heart: a back-office admin tool, a configuration portal, a CMS where reads and writes are roughly in balance
- You already have eventual consistency problems you haven't solved: CQRS adds more eventual consistency to deal with, not less
- You're at an early stage where the domain model is still changing rapidly: projection schemas are expensive to migrate
The rule I use: if three engineers can't sketch the complete event flow, the command handlers, and the projection rebuild strategy in under 20 minutes, you're not ready to ship CQRS to production.
Failure Modes & Pitfalls
1. The "Read Your Own Write" Problem
You place an order. You immediately redirect to /orders/abc123. The projection hasn't updated yet; it takes 80–200ms. The page shows 404. From the user's perspective, their order vanished.
The fix: Your command handler returns the orderId. Your UI displays an "Order placed. Loading confirmation..." state and polls the read API with exponential backoff for up to 2 seconds. If your read side has event-driven updates (Kafka consumer lag ~50ms), this window is very small. You can also expose a "version token" with the command response: check that the projection's last-applied version is ≥ the command's version before returning data.
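The version-token check reduces to a pure freshness predicate the client (or API layer) runs before trusting a projection row. A minimal sketch, with hypothetical field names like lastAppliedVersion:

```typescript
// A projection row tracks the version of the last write it has applied.
interface ProjectionRow {
  lastAppliedVersion: number;
  status: string;
}

// The command response carries the version the write produced; the
// projection is trustworthy only once it has caught up to that version.
function isFresh(row: ProjectionRow | undefined, commandVersion: number): boolean {
  return row !== undefined && row.lastAppliedVersion >= commandVersion;
}

const commandVersion = 7; // returned alongside the 202 Accepted

const notProjected = isFresh(undefined, commandVersion);                          // false: keep polling
const stillStale = isFresh({ lastAppliedVersion: 6, status: 'PENDING' }, commandVersion); // false: keep polling
const ready = isFresh({ lastAppliedVersion: 7, status: 'PENDING' }, commandVersion);      // true: safe to render
```

The polling loop simply re-queries until `isFresh` returns true or the backoff budget is exhausted, at which point the UI falls back to the "pending" placeholder state.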
The most common CQRS UX bug
If users can't immediately see the result of their own action, they'll retry, causing duplicate commands. Command idempotency (the commandId field in your command type) is the only safe defense. Without it, your "place order" button becomes a "place multiple orders" button under any network hiccup.
2. Command Idempotency Is Not Optional
HTTP requests are retried by proxies, clients, and service meshes. A PlaceOrder command arriving twice means two orders. Every command handler must be idempotent: the same commandId arriving a second time must return the same result as the first time without re-executing side effects.
Store the commandId and its result in a durable idempotency store (Redis with TTL, or a side table in your write DB) inside the same transaction as the state change. Never store the idempotency record separately: that creates a TOCTOU (time-of-check to time-of-use) race.
3. Eventual Consistency Surprises in Multi-Step Flows
A customer places an order, and the shipping service needs to read the order details 50ms later to generate a shipping label. If the shipping service reads from the projection and the projection hasn't updated yet, it gets stale or missing data.
This isn't a projection bug; it's an architecture misunderstanding. Services that are part of the same transactional flow should consume events, not read from projections. The shipping service should subscribe to OrderPlaced events directly rather than polling the read model. Projections are for user-facing reads, not service-to-service coordination.
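The shipping example looks like this in miniature (the in-memory bus and handler names are hypothetical; in production this would be a Kafka or SQS consumer): the service gets the order data from the event itself, so projection lag is irrelevant to it.

```typescript
type OrderPlaced = { type: 'OrderPlaced'; orderId: string; items: string[] };
type Handler = (e: OrderPlaced) => void;

// Toy synchronous bus standing in for a real message broker.
class InMemoryBus {
  private handlers: Handler[] = [];
  subscribe(h: Handler): void {
    this.handlers.push(h);
  }
  publish(e: OrderPlaced): void {
    this.handlers.forEach((h) => h(e));
  }
}

const labels: string[] = [];
const bus = new InMemoryBus();

// Shipping service: reacts to the event directly — no dependency on
// the read projection having caught up.
bus.subscribe((e) => labels.push(`label-for-${e.orderId}`));

bus.publish({ type: 'OrderPlaced', orderId: 'abc123', items: ['p1'] });
```

The event carries everything the label needs, which is the same "event carries everything" property the projection handler relies on.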
4. Aggregate Boundary Mistakes
This is the highest-leverage decision in CQRS implementation, and where I see the most mistakes in design reviews.
Too-large aggregates (e.g., Order that contains Shipment, Invoice, PaymentMethod, and Promotions) become transaction hot-spots. Every command on any of those sub-concerns locks the entire aggregate. Write throughput craters.
Too-small aggregates (e.g., separate OrderAggregate and ShipmentAggregate that both need to be consistent) push you toward distributed transactions or saga patterns, which are more complex than what you tried to avoid.
The rule: an aggregate boundary should equal one transaction boundary and one unit of invariant enforcement. If two concepts must always be consistent with each other, they belong in the same aggregate. If they can be eventually consistent, they can be separate aggregates. The tests: "can these two be updated in parallel without a conflict?" and "does a business rule span both?"
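"One unit of invariant enforcement" can be made concrete with a tiny hypothetical Inventory aggregate (not from the earlier sketches): the invariant "stock never goes negative" lives in exactly one place, and every command either satisfies it or is rejected.

```typescript
class InventoryAggregate {
  constructor(private stock: number) {}

  // The single place the invariant is enforced.
  adjust(delta: number): void {
    const next = this.stock + delta;
    if (next < 0) throw new Error('Insufficient stock');
    this.stock = next;
  }

  currentStock(): number {
    return this.stock;
  }
}

const inv = new InventoryAggregate(2);
inv.adjust(-2); // ok - stock hits exactly 0

let rejected = false;
try {
  inv.adjust(-1); // would go negative - the aggregate rejects it
} catch {
  rejected = true;
}
```

If a second concept (say, reservations) had to be consistent with this stock count on every command, it would belong inside the same aggregate; if it only needs to catch up eventually, it should be a separate aggregate fed by events.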
5. Schema Evolution for Events: The Hardest Part of CQRS
Once an event is published and consumed by projections, changing its schema is a production incident waiting to happen. Events are the contract between the command side and every downstream consumer.
Add a field? Consumers must handle its absence in old events. Remove a field? Every consumer that reads it breaks. Rename a field? All existing events in the event store have the old name.
The technique: event upcasting. When you read an old event from the store, you pass it through an upcaster: a function that transforms the old schema to the new schema before the projection handler sees it. The event store keeps the original; the upcaster makes it appear as the new schema to all consumers.
```typescript
// event-upcaster.ts - SKETCH
function upcastOrderPlacedV1toV2(event: OrderPlacedV1): OrderPlacedV2 {
  return {
    ...event,
    type: 'OrderPlaced',
    version: 2,
    // V1 had "shippingAddress" as a string; V2 split it into an object
    shippingAddress: parseAddressString(event.shippingAddress),
    // New field added in V2 - default value for all historical events
    loyaltyPointsUsed: 0,
  };
}
```
This is manageable with a few events. At 50+ event types in a long-lived system, upcaster maintenance becomes a non-trivial engineering burden. Plan for it explicitly.
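One way to keep that burden organized is to chain upcasters by version, so a stored event of any age is stepped forward to the current schema. A hedged sketch under assumed field names (version, loyaltyPointsUsed, channel are all hypothetical):

```typescript
// Every stored event carries the schema version it was written with.
type StoredEvent = { version: number; [key: string]: unknown };

// One upcaster per version step; each takes vN to vN+1.
const upcasters: Record<number, (e: StoredEvent) => StoredEvent> = {
  1: (e) => ({ ...e, version: 2, loyaltyPointsUsed: 0 }), // field added in V2
  2: (e) => ({ ...e, version: 3, channel: 'web' }),       // field added in V3
};

// Step the event forward until it reaches the current schema version.
function upcastToCurrent(event: StoredEvent, current = 3): StoredEvent {
  let e = event;
  while (e.version < current) e = upcasters[e.version](e);
  return e;
}

const v1Event: StoredEvent = { version: 1, orderId: 'o1' };
const upcasted = upcastToCurrent(v1Event);
// upcasted now looks like a V3 event, with defaults for V2 and V3 fields
```

The chain means each schema change only requires writing one new single-step upcaster, rather than one converter per historical version.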
Trade-offs
| Pros | Cons |
|---|---|
| Read and write sides scale independently: add read replicas or optimize read stores without touching write logic | Two data stores, an event bus, and projection infrastructure: roughly 3× more moving parts than CRUD |
| Read queries hit purpose-built projections: no JOINs, sub-millisecond reads, tailored exactly to UI needs | Eventual consistency is real: reads may be milliseconds behind writes; the UI must handle this gracefully |
| Write side enforces clean domain invariants without read concerns polluting the model | Event schema evolution is hard: changing published event formats is a production migration, not a refactor |
| Projection bugs are recoverable: replay all events to rebuild any projection from scratch | Command idempotency is mandatory: every command handler needs a durable idempotency mechanism |
| Domain events create a natural audit log of what happened and why | Debugging distributed event flows is harder than debugging a single function call stack |
| Multiple projections from the same events: search, analytics, and the main app all read from the same event stream | Team learning curve is steep: developers must understand aggregates, events, projections, and eventual consistency simultaneously |
The fundamental tension here is correctness on writes vs. performance on reads. CQRS accepts weaker read consistency guarantees in exchange for letting each side be as efficient as it can possibly be. If your system requires strong read-after-write consistency for every user action, CQRS will fight you at every step.
CQRS + Event Sourcing
CQRS and Event Sourcing (ES) are frequently paired but are independent patterns. You can have CQRS without ES, and ES without CQRS; they just work particularly well together.
CQRS without Event Sourcing: State is stored in a normal write DB (Postgres, MySQL). On each command, you UPDATE the row and publish an event to the bus. Projections consume events. The current state is always directly readable from the write DB. Simpler to operate; no event store required.
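In the CQRS-without-ES setup, the state write and the event publication are a classic dual write; the usual safeguard (named in Related Patterns) is an outbox table written in the same transaction as the state, relayed asynchronously. A sketch with in-memory stand-ins for the tables and relay (all names hypothetical):

```typescript
type OutboxRow = { id: number; payload: string; published: boolean };

// Stand-ins for two tables in the same write database.
const orders = new Map<string, { status: string }>();
const outbox: OutboxRow[] = [];

function placeOrderTx(orderId: string): void {
  // In a real system these two writes commit in ONE database
  // transaction - either both land or neither does.
  orders.set(orderId, { status: 'PENDING' });
  outbox.push({ id: outbox.length + 1, payload: `OrderPlaced:${orderId}`, published: false });
}

// A background relay drains unpublished rows to the event bus.
function relay(publish: (payload: string) => void): void {
  for (const row of outbox) {
    if (!row.published) {
      publish(row.payload);
      row.published = true;
    }
  }
}

placeOrderTx('abc123');
const delivered: string[] = [];
relay((p) => delivered.push(p));
```

Because the relay may retry, downstream projections still need the idempotent-handler property: the outbox gives at-least-once delivery, not exactly-once.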
CQRS + Event Sourcing: State is stored as an append-only stream of events. To read current state, you load all events for an aggregate and replay them. No mutable state rows. The event store is the source of truth. Projections are derived views.
The event sourcing approach adds:
- Full audit trail: you can answer "what was the state of order 123 at 2:03 p.m. last Tuesday?"
- Event replay: rebuild any projection from any point in history
- Temporal queries: answer historical what-if scenarios
- Event-driven integration: downstream services subscribe to events as a natural integration point
It also adds:
- Event store capacity planning (millions of events accumulate fast)
- Snapshot strategies (replaying 10,000 events on every aggregate load is too slow; snapshots cache aggregate state every N events)
- Harder schema migration (events are immutable; upcasters are the only migration path)
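The snapshot idea reduces to: load the latest snapshot, then replay only the events after it. A minimal sketch with a numeric aggregate state (event and snapshot shapes are hypothetical):

```typescript
type DomainEvent = { seq: number; delta: number };

// Rebuild aggregate state, optionally starting from a snapshot.
function rebuild(
  events: DomainEvent[],
  snapshot?: { seq: number; state: number },
): number {
  let state = snapshot?.state ?? 0;
  const from = snapshot?.seq ?? 0;
  for (const e of events) {
    if (e.seq > from) state += e.delta; // replay only the tail
  }
  return state;
}

const events = [
  { seq: 1, delta: 5 },
  { seq: 2, delta: -2 },
  { seq: 3, delta: 4 },
];

const full = rebuild(events);                       // replay everything
const fast = rebuild(events, { seq: 2, state: 3 }); // snapshot + tail only
// Both paths must produce the same state - that equality is the
// correctness condition for any snapshot strategy.
```

The same shape applies to projection rebuilds: a checkpoint (last processed offset) plus the event tail is equivalent to a full replay, just cheaper.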
My recommendation: start with CQRS-only using Postgres as the write store and Redis or Postgres read projections. Add Event Sourcing only when you have a concrete requirement for audit trails or event replay, and when your team has enough CQRS operational experience first.
Real-World Usage
Shopify β CQRS at checkout scale
Shopify processes millions of checkout operations per day, with massive read:write asymmetry during flash sales on popular Shopify storefronts (Kylie Cosmetics, Supreme). Their checkout service separates the command path (add to cart, place order, apply discount) from the query path (product listings, cart view, order history).
The read models are denormalized projections stored in Redis and served at the edge; product data rarely changes but is read millions of times per minute. The write path is isolated in a separate service with strict transactional guarantees. The architectural separation is what allows them to absorb the 100× to 1,000× traffic spikes during flash sales without query-side complexity affecting write throughput.
Microsoft Azure DevOps (formerly VSTS)
The Azure DevOps team published their CQRS journey in 2017. Their work item system (bugs, tasks, features) had a read:write ratio of approximately 100:1. The write model enforced work item validation rules (field dependencies, area path restrictions, state transitions).
The read model served the board view, backlog view, sprint view, and query results β each a different projection with a different denormalization strategy. The insight they shared: projection overhead consumed roughly 20% of their total infrastructure spend, but enabled them to serve the board view in under 5ms without any database JOINs. Without projections, the board view JOIN took 400ms.
LinkedIn β The Feed CQRS Pattern
LinkedIn's newsfeed is a textbook CQRS implementation. A write to the "post an update" command side triggers a fan-out event that updates pre-computed feed projections for each of the author's connections. The query side for a user's feed is a single key-value lookup: O(1), no joins, under 1ms.
The famous challenge: an influencer with 30 million connections posting an update triggers 30 million projection update events. LinkedIn's solution, a hybrid push/pull where high-follower accounts use pull-on-read, is a direct consequence of projection fan-out cost at extreme scale. Even CQRS has an asymmetric load problem when one aggregate fans out to too many projections simultaneously.
The fan-out cost I see teams underestimate most consistently: CQRS has its own scaling ceiling, and it's on the projection consumer side, not the command side. LinkedIn hit it at 30 million followers. Your team will hit it much sooner if you add projections without monitoring per-projection consumer lag.
How This Shows Up in Interviews
Here's the honest picture: CQRS is asked in two modes. Either the interviewer wants you to propose it as a solution to a read-scaling problem they've described, or they ask you directly "how does CQRS work?" to probe architecture depth. The mistake I see most often is candidates describing CQRS as "just separate read and write databases", which misses every interesting part.
When to bring up CQRS proactively
In any design that has heavy read traffic + complex domain logic (social feeds, e-commerce product pages, financial dashboards), explicitly say: "I'd use CQRS here: commands go through the domain model with full validation, and I'd maintain a denormalized projection for the read path. At 95% reads, the read path will never need to touch the normalized tables." Then name your event bus (Kafka, SQS) and your projection store (Redis, Elasticsearch depending on query shape). That's enough for the first mention.
The common CQRS answer that signals junior thinking
Saying "CQRS means having separate read and write databases with read replicas for each" is wrong. Read replicas replicate the normalized write schema; you still need JOINs and indexes optimized for writes. CQRS's value is the denormalized projection that is purpose-built for the query shape. That's the key distinction. Get this wrong and the interviewer knows you've memorized the acronym, not the mechanism.
Depth expected at senior/staff level:
- Know the eventual consistency contract precisely: "the projection may be 50–200ms behind the write event, depending on event bus lag. The client must handle the case where a query immediately after a command returns a stale or absent view."
- Name the command idempotency solution: commandId + idempotency store inside the same DB transaction as the state change
- Name the aggregate boundary trade-off: too large = transaction hot-spot; too small = need saga for cross-aggregate consistency
- Know the shadow-replay pattern for fixing projection bugs in production without downtime
- Distinguish CQRS from CQRS + Event Sourcing and know when to add ES (audit trails, event replay, multi-consumer fan-out)
Common follow-up questions and strong answers:
| Interviewer asks | Strong answer |
|---|---|
| "What's the consistency guarantee for reads in CQRS?" | "Eventual consistency. The projection is updated asynchronously after the command is processed. Typical lag with Kafka is 50β200ms same-region. You need to design the UI to handle the window where a user queries their own recent write and gets a stale view β either via optimistic UI, polling with version check, or a 'pending' placeholder state." |
| "How do you handle a bug in a projection?" | "Shadow replay: deploy the fixed handler under a new projection name, replay all historical events against it in parallel with the live projection, once caught up atomically flip the read target, wait 30 mins, drop the old projection. The replay is idempotent β same events, same logic, deterministic output." |
| "Why is idempotency required in CQRS?" | "Network retries re-deliver commands. A second PlaceOrder with the same commandId must be a no-op, returning the same orderId as the first. Store commandId + result inside the same write transaction as the state change. Never store them in separate operations β that creates a race between the transaction and the idempotency record." |
| "When would you NOT use CQRS?" | "When the team is below 6β8 engineers, when the domain is simple CRUD, when eventual consistency would surface to users in ways they'd notice as bugs, or when the application is still in rapid iteration where the domain model changes weekly β projection migration debt accumulates fast." |
| "How does CQRS differ from just adding read replicas?" | "Read replicas replicate the write schema β you still need JOINs optimized for a normalized write model. CQRS projections are purpose-built: the OrderSummary projection might be a single flat table with everything the order page needs, pre-computed at write time. Projection reads are O(1) key lookups. Read replica reads are still O(n) JOINs, just on more hardware." |
Variants
Partial CQRS
Apply the command/query split at the service level without separate data stores. The same database serves both writes and reads, but the code paths are cleanly separated into command handlers (with domain logic) and query handlers (thin DTO reads). There are no projections β queries read from the same normalized tables as writes.
This is CQRS in principle without the operational overhead. Use it when you want clean code organization and future scaling optionality, but aren't yet at the read:write scale asymmetry that justifies separate data stores. Because the code paths are already split, adding proper projections later is a contained refactor rather than a rewrite.
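Partial CQRS in miniature (the in-memory "db" and all names are hypothetical stand-ins for one shared database): the command handler owns validation and domain rules and returns only an id; the query handler is a thin read that shapes a DTO for the UI.

```typescript
// One shared store - no projections, no event bus.
const db = new Map<string, { name: string; active: boolean }>();

// Command path: validation + domain logic; returns only an identifier.
function createProduct(id: string, name: string): string {
  if (!name.trim()) throw new Error('name required');
  db.set(id, { name, active: true });
  return id;
}

// Query path: zero business logic; returns a DTO shaped for the UI.
function getProductView(id: string): { id: string; label: string } | null {
  const row = db.get(id);
  return row ? { id, label: row.name } : null;
}

const id = createProduct('p1', 'Widget');
const view = getProductView(id);
```

Graduating to full CQRS later means swapping the query handler's data source from the shared tables to a projection - the calling code doesn't change.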
Event-Driven CQRS (CQRS + Event Sourcing)
The write side becomes an append-only event store; all projections are rebuilt by replaying that history. Adds full audit trails, temporal queries, and guaranteed projection recovery, at the cost of snapshot management and immutable event schema contracts. See CQRS + Event Sourcing above for the full comparison and decision criteria.
Micro-CQRS
Apply CQRS at the component level within a single service: e.g., the order management component has separate command and query interfaces, with the query interface backed by a materialized view in the same database. No event bus, no separate stores, no distributed infrastructure. Good for medium-scale applications that want the design clarity of CQRS without the distributed systems overhead.
Quick Recap
- CQRS separates commands (write intent with business rules) from queries (read requests returning pre-computed projections), each on its own model, code path, and optionally its own data store.
- The core driver is asymmetric load: reads are 10–100× more frequent than writes, but a unified model forces both to compete on the same indexes, locks, and schema, which is suboptimal for each.
- Eventual consistency is the price: projections are asynchronously updated from domain events, meaning reads may be 50–200ms behind writes. The UI must be designed to handle this window intentionally.
- Command idempotency is mandatory: network retries will re-deliver commands; every command handler needs a durable commandId check inside the same DB transaction as the state change.
- Aggregate boundaries are the highest-leverage design decision: too large = transaction hot-spots; too small = need saga coordination for cross-aggregate consistency.
- Projection bugs are recoverable via shadow replay, but only if the event store retains full history; this is the strongest operational argument for Event Sourcing alongside CQRS.
- Reach for partial CQRS first (same DB, clean code separation, no event bus) and graduate to full CQRS with separate stores only when read traffic overwhelms the normalized schema.
Related Patterns
- Event Sourcing: The natural complement to CQRS. Stores state as an immutable event log instead of mutable rows, enabling full audit trails, event replay, and guaranteed projection rebuilds from history.
- Saga Pattern: Required when CQRS aggregate boundaries split concerns that must eventually be consistent with each other. Sagas orchestrate the compensating event chains between separate aggregates.
- Outbox Pattern: The standard pairing for publishing events from command handlers. Guarantees events are published at least once without distributed transactions, by writing them transactionally to an outbox table and relaying them asynchronously; consumers handle deduplication.
- Message Queues: The infrastructure backbone of CQRS's event bus. Understanding Kafka consumer groups, offset management, and at-least-once delivery is required to operate a CQRS system reliably at scale.
- Databases: CQRS lets you choose different databases for command and query sides. Understanding when Postgres, Redis, Elasticsearch, or a columnar store is right for a given projection shapes whether CQRS gives you performance wins or operational debt.