Monolith vs microservices
Learn exactly when to split a monolith into microservices, what the real costs are (hint: it's not lines of code), and how to make that call without destroying your team.
TL;DR
- Use a monolith when your team is under 10 engineers, the domain is not yet fully understood, or you're building an MVP. A monolith is not a failure state; it's a starting point.
- Use microservices when independent teams need independent deployment cadences, when components have wildly different scaling profiles, or when a compliance boundary (PCI, HIPAA) demands physical isolation.
- The biggest microservices cost is not infrastructure. It's operational complexity: distributed tracing, eventual consistency, network failure handling, and the cognitive load of reasoning across service boundaries.
- "Distributed monolith" is the worst outcome: you split services but kept shared databases or synchronous chains of 6 blocking calls. You now have all the complexity with none of the benefits.
- A modular monolith with strong internal boundaries is almost always the right intermediate step before splitting into services.
The Framing
In 2019, a well-funded startup rewrote their 18-month-old Rails monolith into 23 microservices. The reason given in the retrospective: "We wanted to scale like Netflix."
Eighteen months later, they wrote a second post. Their p99 API latency had gone from 80ms to 380ms; on-call now required expertise in Kubernetes, Istio, distributed tracing, and 12 separate service repositories; and a bug in the user-profile service took down checkout through an undiscovered synchronous dependency chain. Three senior engineers quit.
The actual traffic when they migrated: 8,000 daily active users.
Now compare that with Shopify. For over a decade, Shopify ran one of the largest e-commerce platforms in the world on a single Ruby on Rails monolith. They served hundreds of thousands of merchants and survived Black Friday spikes that would topple most architectures.
Their secret was not microservices. It was a well-modularized monolith with relentless attention to database performance and caching.
The question is never "which is better?" It's "which is right for where we are today, given our team size, operational maturity, and domain clarity?"
I've seen this exact retrospective written at least a half-dozen times across different companies. The details change; the timeline doesn't.
How Each Works
The Monolith
A monolith is a single deployable unit where all application concerns live in the same process. Presentation logic, business logic, and data access code all run together and share the same memory space.
When UserService calls OrderService, it's a function call in the same process: sub-microsecond, no network hop, no serialization, no retry budget. When you deploy, you build one artifact, push it, and restart one process.
The operational profile is simple: one binary to monitor, one log stream, one set of metrics, one deployment pipeline. Your junior engineer on-call at 2am is dealing with one thing that's broken, not hunting through twelve service dashboards to find which one threw the first error.
The monolith's advantage is operability and simplicity. Its disadvantage is coupling: everything deploys together, and a bug anywhere can affect everything.
I've seen teams describe themselves as 'microservices shops' while every service still deploys in lockstep from the same pipeline. The architecture label changed; the operational risk didn't.
Microservices
A microservices architecture splits the application into independently deployable services, each responsible for one well-defined business capability. The services communicate over the network, typically via REST, gRPC, or an async message queue.
When Order Service needs user data, it makes an HTTP or gRPC call to User Service. That's a network hop: 1-10ms under normal conditions, potentially much longer under load or during failures. The call can time out, fail, or return stale data.
The core trade-off: you get independent deployability and independent scaling, but you take on the full complexity of distributed systems. Timeouts, retries, circuit breakers, distributed tracing, eventual consistency, and service discovery all become your problem.
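Those resilience patterns are less exotic than they sound. Here is a minimal circuit breaker sketched in Python; the threshold and reset window are illustrative defaults, and production code would normally reach for a library or a service mesh rather than hand-rolling this:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch: opens after `threshold` consecutive
    failures, short-circuits calls for `reset_after` seconds, then allows
    one probe call through (the "half-open" state)."""

    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        # While open, return the fallback instead of hammering a service
        # that is already failing.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None  # half-open: let one probe call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0  # any success closes the circuit again
        return result
```

A caller wraps each downstream request, e.g. `breaker.call(lambda: fetch_user(42), lambda: CACHED_PROFILE)`, so one failing dependency degrades gracefully instead of cascading.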
Head-to-Head Comparison
| Dimension | Monolith | Microservices |
|---|---|---|
| Deployment unit | Single artifact (JAR, binary, Docker image): deploy all or nothing | Per-service artifact, deployable independently |
| Inter-component latency | Sub-microsecond (in-process function call) | 1-10ms per network hop (compounds across chains) |
| Failure isolation | A bug anywhere can crash everything | A failing service is isolated (if circuit breakers are configured) |
| Scaling granularity | Scale the entire app vertically or clone the whole thing | Scale individual services independently (CPU-heavy vs. I/O-heavy) |
| Data consistency | ACID transactions across all data | Each service has its own DB; cross-service requires Saga (2PC is possible but generally avoided due to coordination overhead and availability impact) |
| Operational overhead | One log stream, one APM service, one deploy pipeline | Per-service logging, tracing, alerting, deployment, runbooks |
| Developer onboarding | Clone one repo, run one service, debug one process | Learn service graph, network topology, local dev orchestration |
| Organizational fit | Small, unified team. Mono-repo friendly | Multiple teams with clear bounded contexts and on-call ownership |
| Domain modeling clarity | Boundaries are internal (modules, packages), easy to refactor | Boundaries are API contracts; changing them requires coordination and versioning |
| Testing complexity | One integration test suite | Per-service tests plus contract tests plus end-to-end across service boundaries |
| Time to first production deployment | Weeks for a minimal version | Months (infrastructure, service mesh, CI/CD pipelines, runbooks must exist first) |
| Right baseline team size | 2-15 engineers | 10+ engineers (at least one platform team of 3-4) |
The fundamental tension here is independent deployability vs. operational simplicity. Microservices buy you one; monoliths give you the other.
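To make the latency row concrete, here is the back-of-envelope arithmetic for a synchronous call chain; the per-hop figures are order-of-magnitude assumptions, not measurements:

```python
# Rough, assumed figures: an in-process call costs on the order of 100ns;
# an intra-datacenter RPC costs on the order of 1-10ms (3ms as a midpoint).
IN_PROCESS_CALL_S = 1e-7
NETWORK_HOP_S = 3e-3

def chain_latency(hops, per_hop):
    """Total added latency for `hops` sequential, blocking calls."""
    return hops * per_hop

# A checkout path that fans out through 6 synchronous services:
monolith = chain_latency(6, IN_PROCESS_CALL_S)
services = chain_latency(6, NETWORK_HOP_S)
print(f"monolith: {monolith * 1000:.4f} ms, microservices: {services * 1000:.1f} ms")
# -> monolith: 0.0006 ms, microservices: 18.0 ms
```

Eighteen milliseconds added to every request is the optimistic case: it assumes no retries, no queueing under load, and no tail-latency amplification across the chain.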
When the Monolith Wins
Here's the honest answer on when to stay with (or start with) a monolith.
The domain isn't fully understood yet. Service boundaries drawn on day one are almost always wrong.
Changing one in a monolith is an afternoon refactor; changing one across services means a migration, versioned APIs, dual-write periods, and coordinated team deploys. Get the domain right first, inside the monolith.
Your team is under 10 engineers. A microservices architecture requires a platform engineering function: someone to own the CI/CD pipelines, the service mesh, the distributed tracing, the secrets management.
If this is the same person who's also writing features, they will be perpetually behind. The overhead eats the productivity gain.
Deployment cadence is shared anyway. If all your services deploy together because they share database migrations or version-matched APIs, you don't have independent deployability. You have a distributed monolith with all the operational costs and none of the benefits.
Stay in one place until you can genuinely release independently.
You're debugging your second incident this week. A monolith is much easier to debug: one log stream, one stack trace, one process to restart.
Before adding distributed systems complexity, push the monolith to its ceiling: read replicas, caching, async jobs, better indexing.
I often see teams reach for microservices when what they actually need is a better database query or a Redis cache. The root bottleneck rarely requires distributed systems.
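A read-through cache is often the actual fix hiding behind a "we need microservices" symptom. Here is a minimal in-memory sketch; in production the store would typically be Redis, but the access pattern is identical:

```python
import time

class ReadThroughCache:
    """Tiny in-memory read-through cache with a TTL, as a sketch of the
    pattern. `loader` is whatever hits the database on a cache miss."""

    def __init__(self, loader, ttl=60.0):
        self.loader = loader
        self.ttl = ttl
        self.store = {}  # key -> (value, expires_at)

    def get(self, key):
        hit = self.store.get(key)
        if hit is not None and hit[1] > time.monotonic():
            return hit[0]  # cache hit: no database round trip
        value = self.loader(key)  # cache miss: load and remember
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value
```

If the hot path is dominated by a handful of repeated reads, this removes the bottleneck for a fraction of the cost of a service extraction.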
The modular monolith is the underrated middle ground. Structure your monolith with strict module boundaries: each major domain (users, orders, payments) is a separate module with a published API surface, no direct cross-module database joins, and a separate schema. You get most of the architectural benefits without the operational cost.
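What "strict module boundaries" can look like in code, sketched with hypothetical module names: the users module exposes one published function, and the orders module consumes that instead of touching the users tables directly:

```python
# Hypothetical layout for a modular monolith (names are illustrative):
#
#   app/users/api.py        <- the ONLY file other modules may import
#   app/users/internal/...  <- models and queries: off-limits outside users/
#   app/orders/...          <- depends on users only via users.api

from dataclasses import dataclass

# --- users/api.py: the published surface of the users module ---
@dataclass(frozen=True)
class UserSummary:
    user_id: int
    email: str

def get_user_summary(user_id: int) -> UserSummary:
    """Public entry point; hides the users schema from other modules."""
    row = _load_user_row(user_id)  # internal query, never exported
    return UserSummary(row["id"], row["email"])

def _load_user_row(user_id):
    # Stand-in for a real query against the users schema.
    return {"id": user_id, "email": f"user{user_id}@example.com"}

# --- orders module: calls the API, never joins against users tables ---
def order_confirmation_email(user_id: int) -> str:
    return get_user_summary(user_id).email
```

Tools like import linters (or separate packages with explicit dependencies) can enforce that nothing outside `users/` imports from `users/internal/`, which is what keeps the boundary honest over time.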
Shopify ran this model profitably for over a decade.
When Microservices Win
So when does it actually make sense to split?
Different components have genuinely different scaling profiles. Your image-processing pipeline is CPU-bound and needs 16-core instances, your API tier is I/O-bound and runs on many small containers, and your reporting service is bursty once per day.
Forced co-location means over-provisioning everything to the worst case, or running identical instances where only 10% of the code is in active use.
Compliance requires physical isolation. PCI DSS scope reduction: if your payment processing is in a separate service, only that service's infrastructure needs to be in PCI scope. HIPAA data segmentation works the same way, and this is not a 'nice to have': it's a concrete regulatory benefit that saves months of audit work.
Teams are genuinely stepping on each other. If three teams are all making PRs to the same service and blocking each other's releases, that's a Conway's Law problem.
The cure is not just splitting the code: it's splitting ownership. Each service needs one team that fully owns it: product decisions, deployments, and the 3am on-call page. Without that assignment, you get a service without an owner and an incident without a responder.
In my experience, the team friction signal shows up before the scaling signal every time. By the time you're hitting CPU limits, you've usually already had the merge-conflict fight.
You need to experiment and ship at different rates. A/B testing, ML model updates, algorithm changes in a recommendation service should not be gated behind the review cycle of your checkout team. Independent services enable independent iteration rates.
For your interview: you need microservices when you have team-level autonomy requirements, not just technical scaling requirements. The organizational reason is almost always stronger than the performance reason.
The Nuance
The Distributed Monolith Anti-Pattern
This is the failure mode I see most often. A team splits their monolith into 10 services, but:
- Order Service directly queries User Service's database (shared schema, just a different process)
- Checkout Service makes 6 synchronous HTTP calls to 6 other services in sequence before returning a response
- All 10 services deploy from the same pipeline in lockstep because service-b depends on service-a's v2 API
You now have all the complexity of microservices (distributed failure modes, network overhead, tracing requirements) and none of the benefits (independent scaling, independent deployability). This is worse than the original monolith.
The 'distributed monolith' is worse than both alternatives
If any two of the above are true, you have all the complexity of distributed systems with zero of the autonomy. Fix boundaries before adding more services.
The distributed monolith is the worst outcome: you paid the migration cost and ended up slower and more fragile than before.
The Modular Monolith: The Option Nobody Talks About
A modular monolith sits between "spaghetti code" and "microservices". It's a monolith where domain boundaries are enforced at the code level:
- Each domain (users, orders, payments) is a separate module with a published interface
- Cross-module access happens only through that interface, never direct DB access
- Each module has its own schema namespace (separate tables, ideally separate DB credentials)
- Module interfaces look like service interfaces; you're essentially pre-practicing the API contract
When you're ready to extract a service, you almost don't have to change the interface: just deploy it separately and switch in-process calls for HTTP calls. The modular monolith is the architecturally honest intermediate step.
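That extraction story can be sketched as one interface with two interchangeable implementations; the names, URL shape, and injected HTTP client below are all illustrative:

```python
from typing import Protocol

class UserDirectory(Protocol):
    """The module interface; callers never know which transport backs it."""
    def get_email(self, user_id: int) -> str: ...

class InProcessUserDirectory:
    """Phase 1: the users module, called directly inside the monolith."""
    def get_email(self, user_id: int) -> str:
        return f"user{user_id}@example.com"  # stand-in for a local query

class HttpUserDirectory:
    """Phase 2: same contract, now a network call to the extracted service.
    (A real version also needs timeouts, retries, and error handling.)"""
    def __init__(self, base_url: str, http_get):
        self.base_url = base_url
        self.http_get = http_get  # injected client, e.g. requests.get

    def get_email(self, user_id: int) -> str:
        return self.http_get(f"{self.base_url}/users/{user_id}/email")

def send_receipt(directory: UserDirectory, user_id: int) -> str:
    # Caller code is unchanged whether the directory is local or remote.
    return f"receipt sent to {directory.get_email(user_id)}"
```

The switch from phase 1 to phase 2 is a one-line change at the composition root, which is exactly the property the modular monolith buys you.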
The 'premature extraction' cost is real
Martin Fowler coined "microservices premium": the operational overhead that only becomes worth paying once you have multiple product teams with genuine service ownership. He argued the benefit only materializes above that threshold. Most teams that migrate at 10 engineers are paying the premium years before they earn the return.
The microservices premium is real. The question is not whether to pay it, but whether your team is large enough and your domain clear enough to earn the return.
You Must Be This Tall to Ride
Microservices have organizational prerequisites. If these don't exist, your migration will fail or create the distributed monolith anti-pattern:
- Platform engineering function. Someone owns the CI/CD pipelines for all services, the container orchestration layer (Kubernetes), secrets management, and service discovery. This is a full-time job at 5+ services.
- Distributed tracing from day one. Without OpenTelemetry or Jaeger, a 400ms p95 response in production is impossible to diagnose when the request touched 6 services. You will be blind.
- Each service has one team on-call owner. A service without an on-call owner is a service that goes down and nobody fixes it at 3am.
- Contract testing between services. Without Pact or similar, a schema change in User Service silently breaks Order Service in production because integration tests didn't catch it.
- Feature flags and canary deployments. When a service has 100K requests/minute, you can't just deploy and hope. You need the ability to shift 5% of traffic to the new version and watch error rates before promoting.
Get these prerequisites in place before you cut the first service boundary, not after.
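To make the contract-testing prerequisite concrete, here is the core idea behind consumer-driven contracts, sketched without Pact; the field names and response shapes are hypothetical:

```python
# The consumer declares the fields it actually depends on; the provider's
# CI checks its real response shape against that contract before deploying.

ORDER_SERVICE_EXPECTS = {  # what Order Service reads from GET /users/{id}
    "id": int,
    "email": str,
    "status": str,
}

def satisfies_contract(response: dict, contract: dict) -> bool:
    """True if every field the consumer relies on is present with the
    expected type. Extra provider fields are fine: additions are safe,
    removals and renames are what break consumers."""
    return all(
        field in response and isinstance(response[field], expected_type)
        for field, expected_type in contract.items()
    )

# Provider side: the current response shape (illustrative)
current = {"id": 42, "email": "a@example.com", "status": "active", "plan": "pro"}
assert satisfies_contract(current, ORDER_SERVICE_EXPECTS)

# Dropping or renaming `status` fails in CI, not in production:
assert not satisfies_contract({"id": 42, "email": "a@example.com"}, ORDER_SERVICE_EXPECTS)
```

Real contract-testing tools add versioning, broker storage, and provider verification workflows, but the assertion at the center is this one.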
Service Boundaries: Where Migrations Actually Fail
Wrong service boundaries are worse than a monolith. Here's how to draw them right.
The theoretically correct approach comes from Domain-Driven Design: find bounded contexts, which are domains where a term means one consistent thing and has one consistent owner. "Order" in the commerce context and "Order" in the fulfillment context are different things: they share an ID but diverge in lifecycle, state machine, and data model, and that divergence is where the service boundary belongs.
In practice, the correct signals for splitting are:
- Different scaling characteristics. Image processing wants 32 CPUs. API handling wants 64 small containers. Forced co-location means you provision for the extreme case across all instances.
- Compliance isolation. Cardholder data (PCI) must be physically separated. Health records (HIPAA) must have audited access paths. Service boundaries here are mandated by law, not preference.
- Team ownership and commit frequency. If Team A ships Payment Service 5 times a day and Team B ships Notification Service once a week, combined deployment is a drag. Split at the team boundary.
- Genuine data isolation. If Service A never needs to join its data with Service B's data in the same query, they can have separate databases. The moment you need cross-service joins constantly, your boundary is wrong.
The wrong boundary is the most expensive mistake I see in real migrations. Once it's set in stone as a service contract, fixing it requires another multi-week extraction project.
Interview tip: the boundary question
When an interviewer asks how you'd structure services for a design like Uber or DoorDash, don't just list features. Say: "I'd separate the ride-matching engine from the driver-tracking service because their scaling profiles are completely different: matching is CPU-heavy and bursty, tracking is write-heavy and continuous. I'd keep user profiles and preferences in the same service initially because they're almost always accessed together." That reasoning shows you understand the why, not just the what.
The database boundary is the architectural truth test: if two services share schema migrations, they are not actually independent services yet.
Real-World Examples
| Company | Architecture | Key learning |
|---|---|---|
| Shopify | Single Rails monolith for 12+ years, serving 600K+ merchants, $120B+ GMV annually | Proved that a monolith with excellent database design, caching (Redis), and CDN can scale to enormous traffic without microservices. They eventually introduced "modular monolith" patterns internally to enforce domain separation. The constraint was team discipline, not technology. |
| Netflix | ~1,000 microservices for video streaming, recommendation, and billing | Netflix was one of the earliest at this scale. What they don't tell you: they also built Hystrix, Eureka, Ribbon, Zuul, and Chaos Monkey because the platform had to exist before the services ran reliably. Budget 18 months of platform investment before the first service is stable. |
| Amazon | Full microservices transformation in the early 2000s (the famous Bezos API mandate) | The Bezos mandate was not "use microservices." It was "no direct database access between teams, all communication via APIs." This was primarily about organizational independence and eliminating hidden dependencies. The architecture followed the org, not the other way around. |
| Stack Overflow | Single monolith (primarily one SQL Server) handling ~15M+ page views per day | As of 2023, Stack Overflow (one of the top 50 US websites) still runs primarily on a monolith with approximately 9 web servers. The site's performance gains came from SQL optimization, caching layers, and CDN, not architectural decomposition. |
| Segment (acquired by Twilio) | Migrated FROM microservices back to a monolith, documented in their 2018 "Goodbye Microservices" post, after hitting operational overhead | At 140+ engineers, Segment had built ~130 microservices. Maintaining them consumed operational runbooks, on-call rotations, and roughly 40% of engineering time on infrastructure; the overhead exceeded the benefit. They consolidated back to a controlled monolith + async event routing. |
Segment's 'goodbye microservices' is required reading
Segment published a detailed post-mortem on their return to a monolith. The key finding: their services were too small ("nano-services"), so every feature required changes to 5+ repos, 5+ test suites, and 5+ deployments. They had split along technical lines (one data ingestion step per service) rather than business domain lines. The lesson: service granularity matters as much as the decision to split.
The pattern I see across all five: architectures built for the wrong team size or the wrong domain clarity eventually pay for it on-call.
How This Shows Up in Interviews
When to Bring It Up
You should raise the monolith vs. microservices trade-off in your system design interview as soon as the topic naturally arises, usually after defining non-functional requirements. Don't wait to be asked.
"Before I sketch the architecture, I want to flag one key decision: should this be a monolith or microservices? Given the team is still small and the domain is early, I'd start with a modular monolith and extract services only where we see scaling pressure or team friction. Does that approach make sense, or do you want me to assume we're already at scale?"
That question signals architectural maturity, not just knowledge of the pattern. Most interviewers will say "assume we're already at scale." By asking, you've already demonstrated that the correct default is to question the assumption, not blindly accept microservices as the starting point.
I've used this framing in interviews and the response is nearly always the same: the interviewer nods, says "assume scale," and the question alone has already differentiated you from the candidate who just started drawing boxes.
Depth Expected at Senior/Staff Level
- Articulate the distributed monolith anti-pattern and how to detect it
- Know the Conway's Law implication: service boundaries follow team boundaries, not feature boundaries
- Know the platform engineering prerequisites (service discovery, distributed tracing, contract testing, secret management)
- Know the Saga pattern and why cross-service ACID transactions require it
- Know when a modular monolith is the right answer and how it differs from a poorly organized monolith
- Articulate the latency tax: estimate inter-service call overhead and how it compounds across call chains
- Give real examples of companies that stayed with a monolith at scale (Stack Overflow, Shopify); this shows you're not cargo-culting Netflix
Follow-up Q&A
| Interviewer asks | Strong answer |
|---|---|
| "How do microservices communicate?" | "Synchronous: HTTP/REST for request-response where the caller needs an immediate answer. gRPC for high-throughput internal calls that benefit from binary protocol and streaming. Async: Kafka or SQS for event-driven flows where the caller doesn't need an immediate response: orders, notifications, analytics. The default I'd use: gRPC internal, REST external (client-facing), Kafka for anything post-transaction." |
| "What happens if a downstream service is unavailable?" | "Without resilience patterns, a synchronous call failure cascades upward; the caller gets a 500, and if the caller is also called synchronously, the failure propagates. The fix: circuit breaker (short-circuit calls to a failing service after N failures), timeout (never wait indefinitely), and fallback (cached or degraded response). In practice: a circuit breaker + 500ms timeout prevents one failing service from taking down everything else." |
| "How do you handle data consistency across services?" | "You give up ACID. Cross-service operations use eventual consistency via the Saga pattern: each service completes its local transaction and publishes an event. If a downstream step fails, you execute compensating transactions (refund the payment, restore the inventory). The Outbox Pattern ensures events aren't lost even if the event broker is temporarily down." |
| "When would you merge two services back together?" | "When two services change together on every PR (they should be one service), when a synchronous call between them is on the hot path and network overhead is measurable (check your trace spans; if service-to-service latency is 20% of total response time, that boundary has a cost), or when no single team owns both services but both are required for every feature. Service merging is a valid operation." |
| "How do you migrate from a monolith to microservices?" | "Strangler Fig pattern: don't rewrite, strangle. Put an API Gateway in front of the monolith. Identify the first service to extract based on a clear boundary signal (compliance, scaling, team ownership). Build the new service alongside the monolith, route a percentage of traffic to it, verify, then cut over. Never do a big-bang rewrite. Every retrospective from Amazon to Segment concludes the same: full rewrites slip, drift from the original requirements, and usually ship half-broken or get cancelled." |
Test Your Understanding
Quick Recap
- A monolith is a single deployable unit where all application concerns share one process and typically one database. It is simpler to develop, test, and operate, but couples deployment and scaling across all components.
- Microservices decompose the application into independently deployable units. Each service owns its data and its deployment, but you take on the full cost of distributed systems: network failures, eventual consistency, distributed tracing, and service discovery.
- The distributed monolith is the failure mode to avoid: services that share databases or synchronous call chains of 5+ hops have all the complexity with none of the autonomy.
- The correct default for a new product or small team is a modular monolith: one deployable, enforced module boundaries, no cross-module database access, interfaces that look like future service APIs.
- The signal to extract a service: compliance isolation requirement, genuinely different scaling profile, or team ownership friction that module separation cannot fix.
- Before going microservices, you need: a platform team, distributed tracing, per-service CI/CD, circuit breakers, and an on-call assignment for each service. Without these prerequisites, you will create a distributed monolith.
- In interviews, naming Shopify's decade-long monolith success and Segment's microservices-to-monolith reversal shows you understand this is a real tradeoff, not a ladder from worse to better.
Related Trade-offs
- Sync vs async communication: The communication pattern between microservices shapes failure modes, consistency guarantees, and latency. Choosing between REST, gRPC, and Kafka is the first decision after choosing microservices.
- API gateway: The API Gateway is the entry point to a microservices cluster and handles cross-cutting concerns (auth, rate limiting, routing) that would otherwise be duplicated in every service.
- Saga pattern: When microservices need to coordinate state changes across multiple databases, the Saga pattern is the standard replacement for ACID transactions. Required knowledge if you're designing any transactional microservices flow.
- Service mesh: At 10+ services, a service mesh (Istio, Linkerd) automates the circuit breakers, retries, mutual TLS, and observability that you'd otherwise build per-service. The point at which it becomes worth the overhead.
- CQRS: When microservices diverge on read versus write access patterns, CQRS provides the pattern to separate these paths without coupling service deployments to shared query models.