Strangler fig pattern
Learn how the Strangler Fig pattern replaces legacy monoliths incrementally, which seams to cut first, and why database migration is harder than the API layer will ever be.
TL;DR
- The Strangler Fig pattern replaces a legacy monolith by extracting functionality piece by piece into new services, routing traffic incrementally through a facade, until the old system handles nothing and can be decommissioned.
- The pattern gets its name from the strangler fig (genus Ficus): a plant that germinates on a host tree, grows down and around its trunk, and takes over its structural role until the host rots away, leaving the fig standing as the new tree.
- Almost every team gets Phase 1 (add the facade) and Phase 2 (extract a few services) right. The teams that fail do so in Phase 3 (database migration) and Phase 4 (actually decommissioning the monolith, not just ignoring it).
- The strangler facade is not a long-term architecture. It is a temporary routing layer that should get thinner over time, not fatter. If your facade is growing in feature logic, your migration is going backwards.
- Use this when the monolith is in active development, too risky to rewrite, and needs to coexist with new services during a multi-month migration. If you can fully rewrite in under three months, that is almost always cleaner.
The Problem
Your e-commerce platform was built in 2014. It is a Django monolith: 240,000 lines, one database with 340 tables, 12 engineers who deploy together every Tuesday at midnight, and one engineer who is the only person who understands the payment processing code. It has worked fine until now.
Now the CEO wants real-time inventory, a checkout supporting multiple payment providers, and per-region pricing. Every one of those features touches at least three modules in the monolith. The last feature like this took four months and introduced two production incidents.
The team proposes a rewrite. The tech lead estimates six months. The business rejects it: six months of zero delivery with unknown risk is not a trade they will make.
The strangler fig pattern is the engineering answer to this constraint: modernize incrementally, never stop shipping, keep the old system running as a fallback at every step.
One-Line Definition
The Strangler Fig pattern intercepts all incoming traffic via a facade, routes requests to either the legacy monolith or newly extracted services based on URL path or feature flags, and incrementally shifts routing until the monolith receives zero traffic and can be decommissioned.
Analogy
Strangler fig trees germinate in the canopy of a host tree, far above the ground. The seed sprouts roots that grow downward around the host trunk, wrapping tighter each year. The strangler gradually takes over the structural role of the host tree. Eventually the host rots away inside the strangler's root cage, leaving a hollow column with the new tree standing in its place.
Software migration works identically. The facade is the strangler seed: it wraps the existing system and intercepts all traffic. Services are the roots: they grow alongside the monolith, taking over specific routes one at a time. The monolith is the host tree: still running for months or years, processing less and less until it can be shut down quietly.
Users never experience a cutover moment. The host tree just stops being needed.
Solution Walkthrough
Phase 1: Introduce the strangler facade
You add a thin HTTP proxy (an Nginx config, an API gateway routing rule, or a small Node.js proxy) in front of the monolith. At this point, the facade routes 100% of traffic to the monolith. Nothing changes for users. This step is about getting the infrastructure in place before you need it.
The facade becomes the single entry point for all traffic going forward. I always tell teams to do this step in week one before writing a single line of new service code. If installing the proxy causes problems, you learn that early when nothing else is at risk.
Phase 2: Extract the first service (choose the right seam)
You identify the first functionality to extract and build a new service that implements it. The facade now routes specific paths (e.g., /auth/*) to the new service while everything else still goes to the monolith.
The choice of what to extract first matters. Pick something that is: genuinely isolated with minimal data dependencies, high-value to the business (demonstrates the new architecture working), and low-risk (not in the critical transaction path). Search services and reporting modules are often good first candidates.
Phase 3: The incremental extraction loop
Each sprint, one module gets extracted. The facade routing table grows. The monolith handles fewer routes; new services handle more. This loop continues until the monolith hosts only the hard-to-extract core.
sequenceDiagram
participant U as User Request
participant F as Strangler Facade
participant M as Legacy Monolith
participant S as New Service
U->>F: GET /auth/login
Note over F: Route /auth/* to Auth Service (extracted)
F->>S: Forward request
S-->>F: 200 OK, JWT token
F-->>U: Response (transparent to caller)
U->>F: POST /checkout
Note over F: Route /checkout/* to Monolith (not yet extracted)
F->>M: Forward request
M-->>F: 200 OK, Order created
F-->>U: Response from monolith (transparent)
The facade is completely transparent to callers. From any client's perspective, they are talking to one coherent system throughout the entire migration.
Phase 4: Decommission the monolith
Once all routes are migrated, verify with observability that the monolith receives zero traffic for at least one full sprint. Keep it deployed as a safety net for two more sprints. Then shut it down.
Most teams never fully reach Phase 4 because of the database problem. That makes the database migration plan the first thing to establish, not the last.
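The zero-traffic verification can be sketched as a simple guard over per-route request counts. This is a sketch; the metrics source and the function name are our assumptions, not a specific monitoring API.

```typescript
// Decommission guard (sketch): the monolith may be shut down only when every
// route it still owns saw zero requests across the whole verification window.
// `countsPerDay` maps route -> daily request counts over the window.
function canDecommission(countsPerDay: Record<string, number[]>): boolean {
  return Object.values(countsPerDay).every((days) =>
    days.every((count) => count === 0)
  );
}
```

A single stray request on any day fails the check, which is the point: "almost zero traffic" means some client still depends on the monolith.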
Implementation Sketch
Here is a minimal strangler facade in TypeScript. This is a sketch; production implementations use Nginx, Kong, AWS API Gateway, or Envoy with proper observability.
// strangler-facade.ts - SKETCH
// Production note: use a proper API gateway (Nginx, Kong, Envoy) rather than
// self-managed Node.js for anything beyond local development.
// This sketch shows routing logic only, not the production mechanism.
interface IncomingRequest {
  path: string; // e.g. "/auth/login"
}

interface RouteConfig {
  pattern: RegExp;
  target: "monolith" | "new-service";
  serviceUrl?: string; // Required when target is "new-service"
  featureFlag?: string; // Optional: gradual rollout via flag
}

const routeTable: RouteConfig[] = [
  // Fully extracted: all traffic goes to the new service
  { pattern: /^\/auth\//, target: "new-service", serviceUrl: "http://auth-service:3001" },
  { pattern: /^\/users\//, target: "new-service", serviceUrl: "http://user-service:3002" },
  // Partial rollout via feature flag: only flagged users hit the new service
  {
    pattern: /^\/orders\//,
    target: "new-service",
    serviceUrl: "http://order-service:3003",
    featureFlag: "orders-new-service",
  },
  // Not yet extracted: the monolith handles these
  { pattern: /^\/checkout\//, target: "monolith" },
  { pattern: /^\//, target: "monolith" }, // catch-all: monolith is the safe default
];

// Assumed helpers, elided in this sketch:
declare function isFeatureEnabled(flag: string, userId: string): boolean;
declare function forwardTo(req: IncomingRequest, baseUrl: string): Promise<Response>;

async function routeRequest(req: IncomingRequest, userId: string): Promise<Response> {
  for (const route of routeTable) {
    if (!route.pattern.test(req.path)) continue;
    if (route.featureFlag && !isFeatureEnabled(route.featureFlag, userId)) {
      // Flag is off: send to the monolith even if a new service is configured
      return forwardTo(req, "http://monolith:8080");
    }
    const target =
      route.target === "new-service" ? route.serviceUrl! : "http://monolith:8080";
    return forwardTo(req, target);
  }
  return forwardTo(req, "http://monolith:8080"); // safe fallback
}
The route table is the entire migration state at any point in time. Every extracted service adds an entry. When the only remaining entry is the catch-all pointing to the monolith, you are done.
Feature flags are your emergency brake
Every route pointing to a new service should have a feature flag. If the new service has a production bug, flipping the flag routes traffic back to the monolith within seconds. No deployment, no incident war room. Without this safety valve, a bug in an extracted service requires a deployment to roll back, which means engineer intervention at 2 a.m.
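A minimal in-memory flag store with deterministic percentage bucketing can illustrate the emergency brake. This is a sketch; the class and method names are ours, not from any specific feature-flag product.

```typescript
// Minimal flag store (sketch). Rollout is a percentage of users, bucketed
// deterministically so the same user always gets the same answer: raising the
// percentage only ever adds users, and setting it to 0 is the emergency brake.
class FlagStore {
  private rollout = new Map<string, number>(); // flag -> % of users enabled (0-100)

  setRollout(flag: string, percent: number): void {
    this.rollout.set(flag, percent);
  }

  isEnabled(flag: string, userId: string): boolean {
    const percent = this.rollout.get(flag) ?? 0; // unknown flag = safe default: off
    let hash = 0;
    for (const ch of userId) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
    return hash % 100 < percent;
  }
}
```

Flipping `setRollout("orders-new-service", 0)` sends every user back to the monolith on the next request: configuration, not deployment.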
Going Deeper: What the Tutorials Skip
This section contains the knowledge that separates someone who has read about Strangler Fig from someone who has shipped it. None of this is in the Wikipedia entry.
Database strangling: the real migration
The strangler fig pattern is almost always discussed as an API migration. Teams celebrate extracting the Auth Service. They show a demo of requests routing correctly. Then they discover: every new service they extracted is still talking to the same PostgreSQL database as the monolith.
True service independence requires each service to own its data. The path there is the expand/contract pattern:
- Expand: add the new column or table alongside the old one.
- Dual write: deploy application code that writes to both old and new schemas simultaneously for every new transaction.
- Backfill: migrate existing historical data from the old schema to the new.
- Switch reads: redirect all queries to the new schema. Dual writes continue until fully verified.
- Contract: after two sprints of monitoring, drop the old column or table.
This is measured in months per domain, not days. For a monolith with 340 tables, database strangling is a multi-year project even if the API extraction takes six months.
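The dual-write step above can be sketched as follows, with the old schema remaining the source of truth until reads are switched. The store interface and names are our assumptions for illustration.

```typescript
// Dual-write phase of expand/contract (sketch). Every write hits both schemas,
// but the old store is still authoritative: a failure writing the new schema is
// logged as divergence for the backfill job to reconcile, not surfaced to users.
interface CustomerStore {
  save(id: string, data: Record<string, unknown>): void;
}

function dualWrite(
  oldStore: CustomerStore,
  newStore: CustomerStore,
  id: string,
  data: Record<string, unknown>
): void {
  oldStore.save(id, data); // must succeed: old schema is the source of truth
  try {
    newStore.save(id, data); // best-effort during the dual-write window
  } catch (err) {
    console.error(`dual-write divergence for ${id}:`, err);
  }
}
```

The asymmetry is deliberate: until reads are switched and verified, a new-schema outage must not take down writes.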
The shared database trap is the most common anti-pattern
Teams commonly claim "We migrated to microservices" while 12 services all read the same PostgreSQL schema through shared ORM models. This is a distributed monolith: all the operational complexity of microservices with none of the independent deployability. The real test: can you upgrade, scale, and deploy each service completely independently without coordinating with any other team? If not, the database is still the monolith.
Dark launching: validate before you commit
Before routing real user traffic to a new service, send shadow requests. The facade sends each request to both the monolith (primary) and the new service (shadow), uses the monolith's response for the user, and logs any differences for analysis.
sequenceDiagram
participant U as User
participant F as Facade (Shadow Mode)
participant M as Monolith
participant S as New Orders Service
U->>F: POST /orders
F->>M: Forward request (primary, blocks response)
F->>S: Fire shadow request (async, user never waits)
M-->>F: 200 OK, total $142.00
F-->>U: Return monolith response (user sees this)
S-->>F: 200 OK, total $142.00 (match)
Note over F: Responses match, new service validated
Note over F,S: Two days later, a mismatch is detected
F->>S: Shadow request
S-->>F: total $138.00 vs monolith $142.00
Note over F: MISMATCH: log, alert, block cutover until fixed
I find that teams who skip dark launching cut over to new services and then spend the following two weeks in incident response. In my experience, two weeks of shadow mode will surface the majority of correctness bugs, pricing errors, and edge case mismatches before any user ever sees them.
Finding the right seams
Not all module boundaries are good seam candidates. The best seams are at domain boundaries, not technical ones. Domain-Driven Design calls these bounded contexts: areas of the codebase with their own consistent language and model.
The practical test: can you describe this module's responsibility in one sentence without referencing any other module? If you need to say "...and also interacts with inventory for stock reservation," it is not a clean seam.
I use three signals to find good seams in a monolith:
- Change frequency: files that change together in the same commits belong to the same domain.
- Data ownership: which module is the authoritative writer for each database table?
- Team ownership: who is actually responsible for maintaining this code day to day?
If your team of four owns the checkout module, checkout should probably be one service. If you negotiate with three other teams on every change, the boundary is drawn wrong.
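The first signal, change frequency, is easy to mine from version control: count how often each pair of files appears in the same commit. A sketch, with the input shape (one array of touched file paths per commit) as our assumption:

```typescript
// Co-change analysis (sketch): files that repeatedly change in the same commits
// probably belong to the same domain and should stay on the same side of a seam.
function coChangeCounts(commits: string[][]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const files of commits) {
    const sorted = [...files].sort(); // canonical order so "a|b" and "b|a" merge
    for (let i = 0; i < sorted.length; i++) {
      for (let j = i + 1; j < sorted.length; j++) {
        const key = `${sorted[i]}|${sorted[j]}`;
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  }
  return counts;
}
```

High-count pairs that straddle your proposed service boundary are a warning: that boundary will force coordinated deployments.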
The anticorruption layer
When you extract a new service, the new service speaks a clean domain language. But it still needs data from the monolith during the transition period. The Anticorruption Layer (ACL) is a translation adapter at the boundary between your new service and the legacy system.
The ACL translates legacy concepts (a CustomerRecord with 47 fields built up over 12 years) into your new clean domain model (a Customer with 8 fields that matter). Without an ACL, the monolith's data model bleeds into your new service's codebase, and you carry the legacy complexity forward into every new extraction.
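A minimal ACL looks like a translation function at the boundary. The legacy and clean types below are hypothetical, invented for illustration:

```typescript
// Legacy shape: decades of accreted fields (a small hypothetical subset shown)
interface LegacyCustomerRecord {
  CUST_ID: string;
  EMAIL_ADDR: string;
  LEGACY_STATUS_CD: string; // "A" = active; many historical codes exist
  MKTG_SEG_1994?: string; // long-dead field the monolith still populates
}

// Clean domain model the new service actually wants
interface Customer {
  id: string;
  email: string;
  active: boolean;
}

// Anticorruption layer: all legacy-to-domain translation lives here, so the
// legacy vocabulary never leaks into the new service's codebase.
function toCustomer(record: LegacyCustomerRecord): Customer {
  return {
    id: record.CUST_ID,
    email: record.EMAIL_ADDR.toLowerCase().trim(),
    active: record.LEGACY_STATUS_CD === "A",
  };
}
```

When the monolith is finally decommissioned, this one function is deleted; nothing else in the new service ever knew the legacy field names.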
When It Shines
So when does this actually make sense to reach for?
The strangler fig is the right choice when:
- The monolith is actively developed by multiple teams and cannot be safely frozen during a rewrite
- The system serves live production traffic with revenue at stake, where a cutover window longer than minutes is unacceptable
- You need to deliver new business features while the migration is in progress (the business will not pause for the migration)
- The codebase is too large or poorly understood to rewrite safely (over 100K LOC, or critical business logic with minimal test coverage)
- Different parts of the system have different scaling needs that the monolith cannot satisfy independently
The strangler fig is the wrong choice when:
- The monolith is small enough to fully rewrite in under three months
- The monolith is being sunset rather than replaced (just run a deprecation timeline and shut it down)
- The team does not understand the domain well enough to draw correct service boundaries yet (extracting wrong seams creates a distributed monolith that is harder to untangle than the original)
- The data model is so entangled that even database separation is not feasible (common in financial systems with decades of schema drift and cross-domain foreign keys)
The honest answer for most teams: you reach for the strangler fig when the business will not give you a rewrite window, and you know better than to claim you can do a full rewrite in three months.
Failure Modes & Pitfalls
Pitfall 1: The facade becomes the new monolith
The facade starts thin. Then someone adds authentication validation logic. Then business logic for special-casing certain users. Then per-route rate limiting rules with custom exceptions. Three years later, you have a 15,000-line Express application that nobody fully understands and that has become the new deployment bottleneck.
The rule is absolute: the facade contains routing logic only. No business logic. No data access. If you are tempted to put application logic in the facade, that logic belongs in a service and you should build that service.
Pitfall 2: The distributed monolith
You extract 12 services. But each service calls seven other services via synchronous REST for every user request. Order Service calls User Service, Inventory Service, Pricing Service, and Notification Service synchronously before returning. A 45ms checkout operation becomes 4 serial network hops of 45ms each.
Worse: a deployment of Inventory Service breaks Order Service because they share data contracts so tightly that any breaking change in one propagates through all callers. Services are not independently deployable in practice.
This is the distributed monolith: all the operational complexity of microservices with none of the independence benefits. The fix is proper domain design before extraction, not incident response after.
Pitfall 3: DB migration gets abandoned
Teams extract API layers successfully. They ship the facade. They migrate 80% of routes. Then the database migration is scoped at 18 more months, business pressure pushes them to new features, and the migration is quietly shelved.
The result: 12 new services and the old monolith all reading the same PostgreSQL database. You have not reduced coupling; you have increased operational complexity while maintaining all the original coupling.
My recommendation: start the database migration in Sprint 2, not Sprint 20 or "after the API extraction." Migrate one or two tables per sprint alongside the API work, making parallel progress rather than leaving the hardest part for last.
Pitfall 4: Strangling the wrong seams first
Teams often extract what is technically easy rather than what is architecturally correct. They extract user authentication first because it has a clean contract and minimal DB dependencies. But authentication is called by every other service, which means every subsequent extraction has a runtime dependency on the Auth Service before it can even be tested.
Extract leaf modules first. A leaf module is one that nothing else depends on at runtime. Start with inventory lookups, product search, or reporting. Auth, notifications, logging, and configuration management are extracted last, after you have removed all the other dependencies around them.
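Leaf-first ordering can be computed mechanically from a runtime dependency graph. A sketch, with hypothetical module names; `deps[m]` lists the modules `m` calls at runtime:

```typescript
// Leaf-first extraction order (sketch): repeatedly peel off modules that no
// remaining module depends on. Cross-cutting modules (auth, notifications)
// naturally sink to the end because everything points at them.
function extractionOrder(deps: Record<string, string[]>): string[] {
  const remaining = new Set(Object.keys(deps));
  const order: string[] = [];
  while (remaining.size > 0) {
    const leaves = [...remaining].filter(
      (m) => ![...remaining].some((o) => o !== m && deps[o].includes(m))
    );
    if (leaves.length === 0) break; // dependency cycle: no clean seam remains
    for (const leaf of leaves.sort()) {
      order.push(leaf);
      remaining.delete(leaf);
    }
  }
  return order;
}
```

An empty `leaves` round means a dependency cycle, which is itself a useful finding: those modules need refactoring inside the monolith before either can be extracted.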
Pitfall 5: The migration that never reaches Phase 4
The final 20% of the monolith is always the hardest. It typically contains the transactional core: the financial calculations, the legacy payment processing, the business rules that nobody documented. Teams declare "migration complete" at 80% and leave the monolith running indefinitely.
Set a decommission date at the start of the migration. Write it into the project charter. The pressure of a real deadline forces teams to properly migrate or formally decide to rewrite the remaining pieces, rather than leaving them running as a zombie system costing operational overhead forever.
Pitfall 6: The facade is a single point of failure
Every request in the system routes through the strangler facade. If the facade goes down with a single-instance deployment, 100% of user traffic fails. This negates the zero-downtime guarantee the entire pattern is designed to provide.
Deploy the facade with the same availability requirements as the monolith it replaces: at minimum two instances behind a load balancer, health checks, automated failover, and its own incident alerting. An unmonitored Node.js proxy running as a single process is not a production-grade strangler facade.
Trade-offs
| Advantage | Concern |
|---|---|
| Zero-downtime migration throughout | Running two systems doubles operational complexity and cost |
| Reversible at every step via routing config or feature flags | Migration duration is months to years, not weeks |
| Teams ship new features on new services while migration runs | Facade adds an extra network hop to every request |
| Service boundaries can be corrected if drawn wrong | Database migration is independent and harder than API migration |
| Reduced risk vs. big-bang rewrite | Requires mature observability and routing tooling to manage safely |
| Monolith stays warm as a live rollback target at every stage | Monolith must not receive new feature development during extraction |
The fundamental tension here is speed vs. safety. The strangler fig is the safest migration path available. It is also the slowest. Big-bang rewrites succeed faster when they succeed, but they fail catastrophically at a rate that most businesses cannot absorb. For systems with active users and revenue, the cost of a failed six-month rewrite is almost always higher than the cost of an 18-month strangler migration with continuous delivery throughout.
Real-World Usage
Netflix (2009-2012): Netflix migrated from a J2EE monolith after a critical database corruption incident caused a 3-day outage in 2008. They used a strangler approach, routing traffic through an API tier they built before the term "API gateway" became standard. A key insight from their public post-mortems: they moved customer data to Cassandra for specific services before extracting those service APIs. Migrating the data first gave them genuine independence that pure API extraction would not have provided.
Booking.com: Has been running a strangler fig migration since approximately 2015 on their Perl monolith. By their own accounts at engineering conferences, they still run portions of the original Perl monolith alongside over 1,000 microservices. This is the most common real outcome at enterprise scale: partial migration that runs indefinitely as an operational steady state. Their lesson: accept partial migration as reality and manage it deliberately rather than treating it as a failure.
Shopify: Took a hybrid approach before full service extraction. By 2021 they were running over 200 distinct Rails Engines (bounded components) within a single monorepo rather than immediately extracting separate services. Select high-traffic components were later promoted to standalone services. Their public lesson: for teams under 100 engineers, a modular monolith is often a better endpoint than full microservices, and the strangler fig can stop at modularization rather than full extraction.
How This Shows Up in Interviews
Most candidates mention Strangler Fig when asked "how would you migrate from a monolith?" and explain it correctly at a surface level. Staff-level candidates are expected to go further.
Bring it up proactively when:
- The interviewer describes a system with high change rate and significant legacy debt
- The question involves migrating a working system without downtime
- The design question involves a company that has outgrown its current architecture
- Any scenario involving a big-bang rewrite with estimated delivery risk
Depth expected at a staff level:
- Explain the four phases with specific detail about what happens at each
- Identify database migration as the hardest part and describe expand/contract by name
- Know what a "distributed monolith" is and how to avoid it
- Describe dark launching and when you would use it before committing to cutover
- Explain how you pick the first extraction seam (leaf modules, not cross-cutting concerns)
- Know the Anticorruption Layer pattern and why it matters at domain boundaries
Follow-up Q&A:
| Interviewer asks | Strong answer |
|---|---|
| "How do you choose which module to extract first?" | Extract leaf nodes: functionality with minimal outbound dependencies and clean data ownership. Authentication and logging seem simple but are called by everything, making them bad first extractions. Start with inventory lookups or search where failures are fully isolated. |
| "What happens if the new service has a bug after cutover?" | Feature flags on every route. Flip the flag and the facade routes back to the monolith within seconds. The monolith stays warm in production precisely for this reason. Recovery is configuration, not deployment. |
| "How do you handle database migration without downtime?" | Expand/contract. Add the new column alongside the old one. Deploy dual writes. Backfill existing data. Verify reads migrate to the new column. Drop the old column. Each step is independently safe to roll back. |
| "How long does a strangler fig migration typically take?" | API extraction: 3-12 months for a midsize monolith. Database migration: 1-3 years for the same system. Most companies never fully finish the database layer at enterprise scale. Being honest about this rather than claiming a clean endpoint is the staff-level signal. |
| "What is the biggest risk in strangler fig?" | The distributed monolith: extracting services without correct domain boundaries gives you all the complexity of microservices with none of the independence. The second risk is facade bloat: the proxy accumulating business logic until it becomes the new monolith. |
The distributed monolith trap in interviews
Candidates lose credibility when they propose extracting 12 services but leave all 12 reading the same database through shared ORM models. If an interviewer asks "how do you ensure independent deployability?" and you have no answer about database separation, you have described a distributed monolith. Know the distributed monolith anti-pattern by name and describe how you would avoid it before you are asked.
Interview anchor: the database migration is the real challenge
Every interviewer who asks about Strangler Fig is testing whether you understand this is a 2-3 year commitment, not a 3-month project. Anchor your answer on: "The API extraction is the visible work. The database migration is what determines whether you actually get independent deployability, and it takes 3-5x longer." That framing signals you have lived the pattern, not just read about it.
Test Your Understanding
Quick Recap
- The Strangler Fig replaces a legacy monolith incrementally by routing all traffic through a facade and shifting individual routes to new services over months or years, with zero downtime for users throughout.
- The four phases are: install the facade (week 1), extract the first leaf service, run the incremental extraction loop per sprint, and decommission the monolith when it receives zero traffic.
- Database migration is independent from API extraction and is always harder. Use expand/contract (add new schema, dual write, backfill, switch reads, drop old) and start it in Sprint 2.
- The most dangerous failure mode is the distributed monolith: services extracted with incorrect domain boundaries, calling each other synchronously, with no real deployment independence gained.
- Use dark launching (shadow mode) to run the new service against production traffic before any cutover. Block cutover until the response mismatch rate is below 0.01% for business-critical operations.
- The strangler facade must contain routing logic only. Any business logic in the facade is the beginning of a new monolith.
- In interviews, the staff-level signal is naming database migration as the harder challenge, describing expand/contract, and being honest that most migrations never fully reach Phase 4 at enterprise scale.
Variants
Event interception strangler
Instead of proxying HTTP requests, the monolith publishes domain events to a message queue (via the Outbox Pattern). New services subscribe and build their own read models. This works well when the new service primarily needs to react to monolith changes rather than serve synchronous requests. Good for analytics services, recommendation engines, and audit systems. Does not work for operations requiring an immediate response.
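In this variant the new service is just a fold over the event stream. A sketch with a hypothetical event shape and read model:

```typescript
// Event interception (sketch): the monolith publishes domain events; a new
// analytics service builds its own read model by folding over the stream,
// never querying the monolith's database directly.
interface DomainEvent {
  type: "OrderPlaced" | "OrderCancelled";
  orderId: string;
  amountCents: number;
}

// Read model: net revenue, rebuilt purely from the event stream. Because it is
// a pure fold, the service can replay history to rebuild state at any time.
function revenueReadModel(events: DomainEvent[]): number {
  return events.reduce(
    (total, e) =>
      e.type === "OrderPlaced" ? total + e.amountCents : total - e.amountCents,
    0
  );
}
```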
Database-first strangler
Extract the data layer before the API layer. Move each domain's tables to a separate database, enforce access only through a service API, then extract the HTTP boundary. This creates genuine independence from day one but is significantly harder to execute because it requires eliminating joins, managing referential integrity, and running dual writes across database boundaries.
Branch by abstraction (inside-out variant)
Rather than extracting services immediately, create an abstraction layer inside the monolith. Split the payment module into an interface and a legacy implementation. Deploy the monolith with the abstraction. Then build a new service implementing the same interface. Route through the abstraction first, then swap the implementation to the new service. Slower than external extraction but does not require a facade or a parallel runtime from day one, making it safer for teams new to service extraction.
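The payment example above can be sketched as an interface with two swappable implementations. The type and class names are hypothetical:

```typescript
// Branch by abstraction (sketch): the monolith depends only on the interface,
// so swapping the legacy implementation for a service-backed one is a config
// change, not a rewrite.
interface PaymentProcessor {
  charge(orderId: string, cents: number): string; // returns a confirmation id
}

class LegacyPaymentProcessor implements PaymentProcessor {
  charge(orderId: string, cents: number): string {
    return `legacy-${orderId}-${cents}`; // stands in for the old in-monolith code path
  }
}

class ServicePaymentProcessor implements PaymentProcessor {
  charge(orderId: string, cents: number): string {
    return `svc-${orderId}-${cents}`; // stands in for a call to the new payment service
  }
}

function makeProcessor(useNewService: boolean): PaymentProcessor {
  return useNewService ? new ServicePaymentProcessor() : new LegacyPaymentProcessor();
}
```

The boolean here would be a feature flag in practice, giving the same instant-rollback property as facade routing.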
Related Patterns
- Microservices โ The target architecture that Strangler Fig migrations move toward. Understanding service boundaries, data ownership, and operational trade-offs is prerequisite knowledge for planning an extraction.
- API Gateway โ The strangler facade is often implemented on top of an API gateway. Understanding routing, middleware, and gateway capabilities informs how sophisticated your facade can become and what it can safely do.
- Outbox Pattern โ The correct mechanism for publishing domain events from the monolith to new services. Required reading if you plan to use event-driven extraction as a migration strategy.
- CQRS โ Frequently adopted alongside Strangler Fig. Once you extract read models to new services via event streams, CQRS naturally follows as a way to separate read and write paths.
- Event Sourcing โ New services extracted via the event interception variant often adopt event sourcing as their data model, since they are already building from streamed domain events emitted by the monolith.