The Strangler Fig Pattern: Migrating Legacy Monoliths to Modern Architecture

Most enterprises don't get to build the modern architecture greenfield. They have to migrate into it — from a decade-old monolith, from a sprawling SOA with bespoke point-to-point integrations, from a vendor system that was never designed to be extended the way the business now wants. The architecture pieces I've written about — ESB + event-driven, CQRS + event sourcing, sagas, the outbox pattern — describe target states. This piece is about how to get there from where you actually are.

The pattern that works, almost without exception, is the strangler fig: incrementally route traffic away from the legacy system, replacing it service by service while the old system continues to handle everything not yet migrated. The name comes from the strangler fig tree, which grows around a host tree until eventually the host dies and only the fig remains. Applied to software, the metaphor is exact — the new architecture grows around the legacy system until eventually the legacy can be retired.

This is the practitioner's guide to running a strangler fig migration in production. It covers why big-bang rewrites fail where this approach works, the routing facade that makes the pattern possible, service-by-service extraction, the unavoidable data co-existence problem, and the cut-over patterns that finally let the legacy system go.

Why Rewrite Projects Fail

Before the strangler fig makes sense, the alternative has to be visible. The big-bang rewrite — build the new system in parallel, switch over on a flag day — is the default instinct, and the historical failure rate is staggering. Most ambitious rewrites either never ship, ship at a fraction of the original system's functionality, or ship and then drag for years as the team rediscovers all the edge cases the legacy system silently handled.

   The big-bang rewrite trap:

   Year 0:  Decide to rewrite. Start new system in parallel.
            Legacy keeps running. New team builds the replacement.

   Year 1:  New system has 30% of features. Legacy continues to evolve
            (because the business doesn't stop). New team falls behind
            on parity.

   Year 2:  New system has 50% of features. Legacy has grown new features
            that new system has never seen. New team's scope has doubled.

   Year 3:  Decision point. Cancel the rewrite, ship at reduced scope,
            or keep going. Most projects cancel or ship in a degraded state.

   Year 4:  Legacy is still running. New system is either dead or has
            become a second legacy in parallel to the first.

Three structural reasons rewrites fail:

The legacy is a moving target. The business doesn't pause feature development during the rewrite. Every quarter the new system has to ship more than it started with just to catch up.
Hidden complexity surfaces only in production. The legacy handles edge cases that nobody documented. Migration scope expands as the team discovers them, and the discovery happens during the migration, not before.
The cut-over is too big. Switching everything at once is a single point of failure with massive blast radius. Teams fear it correctly, and the fear delays the cut-over indefinitely.

The strangler fig sidesteps all three. The migration is incremental — each piece is small, ships independently, and adds business value on its own. The legacy continues to handle anything not yet migrated. There is no flag day; there are dozens of small cut-overs, each one reversible.

The Strangler Fig Metaphor and Core Idea

The pattern's central insight: route traffic to the new system gradually, one capability at a time, while the legacy system continues to handle everything else. A routing facade sits in front of both systems and decides — request by request — which one should handle each call.

   Initial state (everything legacy):

      [Clients] --> [Routing Facade] --> [Legacy Monolith]


   Mid-migration:

      [Clients] --> [Routing Facade] -+--> [Legacy Monolith]
                                       |    (handles most things)
                                       +--> [New Service A]
                                       +--> [New Service B]
                                            (handle their slices)


   Final state:

      [Clients] --> [Routing Facade] -+--> [New Service A]
                                       +--> [New Service B]
                                       +--> [New Service C]
                                       ...
                                       (legacy retired)

The migration moves through these states one service at a time. At any given moment, most of the system is still handled by the legacy. Each new service ships when ready and starts taking its share of traffic. The facade abstracts the choice away from clients — they don't know or care whether their request hits legacy or new.

The strengths are structural:

Each step is small and reversible. A new service can be shipped, exposed to a fraction of traffic, monitored, and rolled back without affecting anything else.
The legacy keeps running. New feature development on the legacy doesn't have to stop. The migration happens alongside normal business operations.
Risk stays contained. A bug in one new service doesn't break the rest of the system. The blast radius of each step is bounded.
Business value lands incrementally. Each migrated service can add capabilities the legacy didn't have. The migration produces value continuously, not just at the end.

The challenge is that the architecture is intentionally messy in the middle. For potentially years, you're running two systems in parallel, with the facade orchestrating between them. This is the trade — temporary mess in exchange for permanently avoiding the big-bang risk.

The Routing Facade

The facade is the load-bearing component of the entire pattern. It exposes the same interface clients used to talk to the legacy, but internally routes each request to either the legacy or the new service that handles it.

                       +------------------------------+
   [Clients] --------->|        ROUTING FACADE        |
                       |                              |
                       |  rules:                      |
                       |    /api/orders/*   -> new    |
                       |    /api/users/*    -> new    |
                       |    /api/billing/*  -> legacy |
                       |    everything else -> legacy |
                       +-------+--------------+-------+
                               |              |
                               v              v
                         +-----------+  +-----------+
                         | New Svcs  |  | Legacy    |
                         +-----------+  +-----------+
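
The rule table in the diagram can be sketched as a small lookup. This is an illustration only, not how any particular gateway implements routing: the names `Rule` and `choose_backend` are hypothetical, and the "longest matching prefix wins" tie-break is an assumption added here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    prefix: str   # path prefix this rule matches
    backend: str  # "new" or "legacy"


# Mirrors the rules in the diagram above.
RULES: List[Rule] = [
    Rule("/api/orders/", "new"),
    Rule("/api/users/", "new"),
    Rule("/api/billing/", "legacy"),
]

# "everything else -> legacy"
DEFAULT_BACKEND = "legacy"


def choose_backend(path: str, rules: List[Rule] = RULES) -> str:
    """Return the backend for a request path.

    Assumption: when several prefixes match, the longest one wins,
    so rule order in the list does not matter.
    """
    matches = [rule for rule in rules if path.startswith(rule.prefix)]
    if not matches:
        return DEFAULT_BACKEND
    return max(matches, key=lambda rule: len(rule.prefix)).backend
```

Defaulting to legacy means an unmapped path behaves exactly as it did before the migration started, which keeps each new rule an explicit, reviewable change.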

The facade's responsibilities:

Routing rules. Per-route configuration of which backend handles each request. As more capabilities are extracted, the rules accumulate.
Protocol abstraction. The legacy might speak SOAP; the new services speak gRPC or REST. The facade translates so clients don't need to.
Authentication and authorisation. A single security boundary, applied consistently regardless of which backend handles the request.
Observability. Every request flows through the facade; logging, tracing, and metrics all originate here.
Traffic shaping. Percentage-based routing for canary releases, A/B testing, gradual rollout of new services.
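
Percentage-based routing is the responsibility most often hand-rolled, so here is a minimal sketch of one way to do it. It assumes a stable key per caller (a user id, say) and hashes it into one of 100 buckets; `bucket` and `route_canary` are hypothetical names, and real gateways expose their own weighting mechanisms.

```python
import hashlib


def bucket(key: str, buckets: int = 100) -> int:
    """Map a stable key (for example a user id) to a bucket in [0, buckets)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route_canary(key: str, percent_new: int) -> str:
    """Send roughly percent_new percent of keys to the new service.

    The same key always lands in the same bucket, so a caller does not
    flip between backends from one request to the next.
    """
    if not 0 <= percent_new <= 100:
        raise ValueError("percent_new must be between 0 and 100")
    return "new" if bucket(key) < percent_new else "legacy"
```

Because the bucket is derived from the key rather than from a random draw, raising the percentage only ever moves callers from legacy to new; nobody who was already on the new service is sent back by a rollout increase.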

Implementation options range from "an API gateway with routing rules" to "a custom-built proxy with sophisticated logic." For most teams, an API gateway (Kong, Tyk, AWS API Gateway, Apigee) is the right tool. For teams already running the ESB + event-driven architecture, the ESB itself can act as the facade — it already handles routing, transformation, and security.

The critical discipline: the facade must be highly available, low-latency, and operationally boring. It is in the path of every single client request. A flaky facade is worse than no migration at all.

Service-by-Service Extraction

The migration progresses by extracting one capability at a time from the legacy. The process for each extraction:

   Step 1: Identify a slice
      Pick a capability that's reasonably self-contained.
      Examples: order placement, user profile, payment processing.

   Step 2: Build the new service in parallel
      Implement the capability in the new architecture.
      Tests pass; observability is in place.

   Step 3: Dual-write phase
      Both legacy and new service handle requests.
      Compare outputs in production; reconcile drift.

   Step 4: Read-from-new phase
      Reads route to the new service.
      Writes still go to both. Confidence builds.

   Step 5: Cut over
      Writes route to the new service.
      Legacy stops handling this capability.

   Step 6: Decommission
      Legacy code for this capability is removed.
      Eventually, the legacy database table can be dropped too.

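Every step up to the cut-over is reversible. If something goes wrong in step 4, revert to step 3. If something goes wrong in step 5, revert to step 4, which stays safe for as long as the legacy data is kept current. Step 6 is the exception: once the legacy code and tables are removed, the fallback is gone. The facade configuration controls each transition, and it can be changed in seconds.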
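
One way to picture the facade configuration behind steps 3 to 5 is as a per-capability phase that determines where reads and writes go. The sketch below is an assumption-laden illustration: the `Phase` names and the `targets`, `advance`, and `roll_back` helpers are hypothetical, and it reads step 3 as "reads still served by legacy", which the steps imply but do not state outright.

```python
from enum import Enum
from typing import Dict, Tuple


class Phase(Enum):
    LEGACY_ONLY = "legacy_only"      # steps 1-2: new service not in the request path
    DUAL_WRITE = "dual_write"        # step 3
    READ_FROM_NEW = "read_from_new"  # step 4
    NEW_ONLY = "new_only"            # step 5


# Enum members iterate in definition order, which is the migration order.
_ORDER = list(Phase)

# phase -> (backend that serves reads, backends that receive writes)
_PLAN: Dict[Phase, Tuple[str, Tuple[str, ...]]] = {
    Phase.LEGACY_ONLY: ("legacy", ("legacy",)),
    Phase.DUAL_WRITE: ("legacy", ("legacy", "new")),
    Phase.READ_FROM_NEW: ("new", ("legacy", "new")),
    Phase.NEW_ONLY: ("new", ("new",)),
}


def targets(phase: Phase) -> Tuple[str, Tuple[str, ...]]:
    """Return (read backend, write backends) for a capability in this phase."""
    return _PLAN[phase]


def advance(phase: Phase) -> Phase:
    """Move one phase forward; stays put at the final phase."""
    return _ORDER[min(_ORDER.index(phase) + 1, len(_ORDER) - 1)]


def roll_back(phase: Phase) -> Phase:
    """Move one phase back; stays put at the first phase."""
    return _ORDER[max(_ORDER.index(phase) - 1, 0)]
```

Keeping the phase as data rather than code is what makes a rollback a configuration change instead of a deployment.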

The order of extraction matters more than people expect. Three guidelines:

Start with the most-changed capability. Whichever piece of the legacy the business keeps wanting to change is the highest-leverage candidate. Migrating it first means subsequent feature work goes into the modern architecture, not into adding more weight to the legacy.
Avoid the central data table early. Most monoliths have one or two tables (users, accounts, transactions) that everything depends on. Extracting these last is fine; trying to extract them first creates a data co-existence problem you don't yet have the maturity to handle.
Start with read-heavy capabilities if possible. Reads are easier to dual-run than writes. Building confidence in the new system on read paths makes the eventual write migration less risky.

The mid-migration topology looks deliberately uneven — some slices extracted, others not, the facade routing each request appropriately:

   +--------------------------------+
   |       Routing Facade           |
   +-+-----+-----+-----+-----+------+
     |     |     |     |     |
     v     v     v     v     v
   [New] [New] [Leg] [New] [Leg]    <-- some new, some legacy
    A     B     C     D     E
     \     \    /    /     /
      \     \  /    /     /
       v     v v   v     v
       (data co-existence — discussed next)

This is normal for the duration of the migration. The facade hides the messiness from clients.

The Data Co-Existence Problem

The hardest part of any strangler fig migration is data. The legacy has one database with all the data. The new services want their own databases, scoped to their own bounded contexts. During the migration, both systems need access to the same data, and they need to stay in sync.

Three patterns for handling this, each with different trade-offs:

   Pattern 1: Shared database (early migration)

      [New Service]     [Legacy Monolith]
            \                  /
             \                /
              \              /
               v            v
            +-----------------+
            | shared database |
            +-----------------+

   Both systems read and write the same database.
   Simple but couples the new service to legacy schema.
   Right starting point; wrong long-term home.


   Pattern 2: Database view / API (transitional)

      [New Service]                [Legacy Monolith]
            |                              |
            | reads/writes via legacy API  |
            +---------+--------------------+
                      v
            +----------------+
            | Legacy DB      |
            +----------------+

   New service treats legacy as a backing service.
   Reads through views or APIs; writes through legacy
   write paths. Reduces schema coupling.


   Pattern 3: Separate databases with sync (long-term)

      [New Service]                [Legacy Monolith]
            |                              |
            v                              v
      +-----------+                  +-----------+
      | New DB    | <--CDC/events--> | Legacy DB |
      +-----------+                  +-----------+

   Each system has its own database.
   Changes synchronised via events or CDC.
   Decoupled but operationally complex; final destination.
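
For Pattern 3, the consuming side of the sync has to tolerate duplicate and out-of-order change events. The sketch below shows one common way to do that with a per-row version check. The event shape (`key`, `version`, `op`, `row`) is hypothetical and is not the envelope of Debezium or any other CDC tool; the in-memory dict stands in for the new service's database.

```python
from typing import Any, Dict


def apply_change(store: Dict[str, Dict[str, Any]], event: Dict[str, Any]) -> bool:
    """Apply one change event to the new service's store.

    Returns True if the store changed, False if the event was a
    duplicate or older than what the store already holds.
    """
    key = event["key"]
    current = store.get(key)
    if current is not None and current["version"] >= event["version"]:
        return False  # duplicate or out-of-order delivery: ignore

    if event["op"] == "delete":
        # Keep a tombstone so a late, older upsert cannot resurrect the row.
        store[key] = {"version": event["version"], "row": None}
    else:
        store[key] = {"version": event["version"], "row": event["row"]}
    return True
```

The version check is what makes replaying the event stream safe, which matters because most delivery mechanisms in this space are at-least-once.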

The migration through these patterns typically takes longer than the surface-level service extraction. Most teams underestimate how long they'll be living with Pattern 1 or 2 before they can credibly move to Pattern 3.

The synchronisation problem in Pattern 3 is exactly the problem the transactional outbox pattern solves. Each system writes to its own database and publishes events; the other system consumes those events to keep its own data current. The same CDC infrastructure that powers the outbox can power the legacy-to-new sync during migration.
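
A minimal sketch of the outbox idea, using SQLite so it runs anywhere. Two caveats: the table and function names (`orders`, `outbox`, `place_order`, `relay`) are hypothetical, and the relay here polls the outbox table rather than tailing the database log through CDC, which is a simpler stand-in for the CDC-based approach described above.

```python
import json
import sqlite3
from typing import Any, Callable, Dict


def init(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            topic     TEXT NOT NULL,
            payload   TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )


def place_order(conn: sqlite3.Connection, order_id: int) -> None:
    # One transaction: the business row and its event commit together,
    # or neither does. "with conn" commits on success, rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, "placed")
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps({"order_id": order_id, "status": "placed"})),
        )


def relay(conn: sqlite3.Connection,
          publish: Callable[[str, Dict[str, Any]], None]) -> int:
    """Publish pending outbox rows in id order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        # A crash between publish and the UPDATE means the row is sent again
        # on the next run: delivery is at-least-once, so consumers must dedupe.
        publish(topic, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

The point of the pattern is visible in `place_order`: there is no moment at which the order exists but its event does not.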

For workflows that span both systems during the migration period, sagas become unavoidable — orchestrating writes across legacy and new with proper compensation is exactly the cross-system consistency problem sagas address.
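
The compensation logic can be sketched in a few lines. This is a bare orchestration loop under simplifying assumptions: steps run synchronously in one process, compensations themselves do not fail, and the `run_saga` name and step tuple shape are hypothetical. A production saga also needs durable state so it can resume after a crash.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); the shape is an assumption for this sketch.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse.

    Returns (succeeded, log of what happened).
    """
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo everything that already succeeded, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return False, log
        completed.append(step)
        log.append(f"done:{name}")
    return True, log
```

During a migration, one step might reserve stock in the legacy system while the next creates an order in a new service; if the second fails, the first is released rather than left dangling.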

Parallel Run and Cut-Over Patterns

The transition from "legacy handles this" to "new service handles this" is the moment of highest risk in each extraction. The pattern that contains the risk is the parallel run: both systems handle the same requests for a period; their outputs are compared; only when they agree consistently does traffic fully shift.

   Parallel run flow:

   1. Both legacy and new receive the request.
   2. Both produce a response.
   3. The facade returns the legacy response to the client.
   4. The facade logs both responses; a comparator runs offline.
   5. Discrepancies are investigated and resolved.
   6. When discrepancy rate < threshold for N days, cut over.

   +-----------+
   | Request   |
   +-----+-----+
         |
         v
   +-----------+
   |  Facade   | -- request --> [Legacy] -- response -->
   +-----------+ -- request --> [New]    -- response -->
         |
         |  legacy response returned to client
         |  new response logged for comparison
         v
   [Client receives legacy response]

   (Offline:)  comparator -> drift report
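
Steps 1 to 4 of the flow can be sketched as a single function in the facade. This is a simplified, synchronous illustration with hypothetical names (`parallel_run`, the record keys); in practice the call to the new service would usually run off the client's critical path, and the new service must not trigger real side effects such as emails or payments while it is only shadowing.

```python
from typing import Any, Callable, Dict, List


def parallel_run(
    request: Dict[str, Any],
    legacy: Callable[[Dict[str, Any]], Any],
    new: Callable[[Dict[str, Any]], Any],
    log: List[Dict[str, Any]],
) -> Any:
    """Send the request to both systems, log both results, return legacy's."""
    # A legacy failure propagates to the client, exactly as before the migration.
    legacy_response = legacy(request)

    # A failure in the new service must never reach the client.
    try:
        new_response, new_error = new(request), None
    except Exception as exc:
        new_response, new_error = None, repr(exc)

    log.append(
        {
            "request": request,
            "legacy": legacy_response,
            "new": new_response,
            "new_error": new_error,
        }
    )
    return legacy_response
```

The asymmetry is deliberate: the legacy path keeps its existing failure behaviour, while the new path can only ever add a log entry.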

The parallel run is uncomfortable to operate: double the request load, complex comparison logic, and real engineering work just to ship "no user-visible change yet." It's also the single pattern that prevents most cut-over disasters. Skip it, and the new service's subtle behavioural differences, the ones nobody caught in test, surface as production incidents that do customer-facing damage before anyone notices.
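
The offline comparator and the "discrepancy rate < threshold for N days" gate can be sketched as two small functions. Assumptions: each logged record is a dict with hypothetical keys `legacy`, `new`, and `new_error`; responses are dicts; and fields such as `timestamp` are expected to differ and are ignored. Real comparators usually need far more normalisation than this.

```python
from typing import Any, Dict, List, Sequence


def discrepancy_rate(
    records: List[Dict[str, Any]], ignore: Sequence[str] = ("timestamp",)
) -> float:
    """Fraction of logged request pairs where the two systems disagree."""
    if not records:
        raise ValueError("no records to compare")

    def normalise(response: Any) -> Any:
        # Drop fields that legitimately differ between the two systems.
        if isinstance(response, dict):
            return {k: v for k, v in response.items() if k not in ignore}
        return response

    mismatches = sum(
        1
        for r in records
        if r.get("new_error") or normalise(r["legacy"]) != normalise(r["new"])
    )
    return mismatches / len(records)


def ready_to_cut_over(daily_rates: List[float], threshold: float, days: int) -> bool:
    """True when each of the last `days` daily rates is below `threshold`."""
    if len(daily_rates) < days:
        return False
    return all(rate < threshold for rate in daily_rates[-days:])
```

Treating an error in the new service as a mismatch, and refusing to compute a rate from zero records, both err on the side of delaying the cut-over rather than approving it on thin evidence.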

The final cut-over, when it happens, is a routing rule change. The new service has been handling the load alongside the legacy for weeks. Confidence is high. The facade is updated to send traffic only to the new system; the legacy code path is left intact for a few more weeks as a fallback; eventually the legacy code is removed and its database tables can be dropped.

The fully-migrated final state:

   +-----------+
   | Clients   |
   +-----+-----+
         |
         v
   +---------------+
   | Facade        |  (now optional — could be retired or stay)
   +---+--+--+--+--+
       |  |  |  |
       v  v  v  v
   +---+--+--+--+----+
   |  New service    |
   |  architecture   |
   |  (ESB, events,  |
   |   sagas, CQRS,  |
   |   outbox, etc.) |
   +-----------------+

   [Legacy: retired]

The facade may continue to exist post-migration as the system's edge layer (it's well-positioned to keep handling auth, rate-limiting, and observability), or it may be retired in favour of a direct architecture. Either choice is reasonable.

What This Pattern Solves

The pros are worth making explicit. The strangler fig addresses several concrete problems that big-bang rewrites don't:

Risk is bounded per step. A single failed extraction affects one capability, not the whole system. The blast radius of failure stays small throughout the migration.
Business value ships continuously. Each migrated service can add capabilities the legacy didn't have. The migration produces value at every step, not just at the end.
The legacy keeps running. No flag day. No "feature freeze on the legacy while we rewrite." Normal feature development continues, and migration runs alongside it.
Each step is reversible. A bad extraction can be reverted by flipping a routing rule. No prolonged outage.
Learning compounds. The first extraction is hard. The fifth is faster. The team develops genuine expertise in the new architecture by the time they tackle the hard pieces, rather than gambling on it during the big-bang cut-over.
Stakeholder buy-in is easier. "We'll migrate over two years, capability by capability, with continuous demonstrable progress" is a much easier sell than "we'll rewrite for three years and then switch."

The compounding benefit: in my experience, the strangler fig is the migration pattern most likely to actually complete. Big-bang rewrites have a famously high failure rate. Strangler-fig migrations fare much better, mostly because the structural properties of the pattern align with how complex systems actually behave under change.

When Not to Use It

Honest limits. Three cases where the strangler fig is the wrong choice:

The legacy is small. If the system is a couple of hundred lines of business logic and a few tables, a clean rewrite in a week is faster than setting up the facade and the parallel-run infrastructure. Don't strangler-fig a contact form.
The legacy is genuinely throwaway and replaceable in one sprint. If a small team can credibly replace the system end-to-end in a few weeks, just do that. The strangler fig's infrastructure overhead is unjustified at that scale.
The legacy is too entangled to extract anything. Some legacies are so deeply tangled — circular dependencies between every component, shared mutable state everywhere, no clean module boundaries — that no extractable slice exists. In these cases, the choice is either a heroic refactor of the legacy first (creating modules so the strangler fig can attach) or an honest acceptance that the system can't be migrated and a different strategy is needed. The Spring Boot modular monolith piece describes an intermediate stop on the migration path — for some legacy systems, moving to a modular monolith first creates the boundaries needed before a service-by-service extraction is even possible.

The cleanest indicator that strangler fig is the right pattern: you can name three to five capabilities you could extract this quarter without depending on a months-long refactor of unrelated code. If that list exists, the pattern fits. If it doesn't, you need to make it exist first.

Implementation Stack Choices

Common production stacks for strangler fig migrations:

API gateway as the facade. Kong, Tyk, AWS API Gateway, Apigee. The right starting point for most teams. Full-featured routing, auth, observability, traffic management.
Service mesh routing. If you're already running a service mesh, the mesh's routing can act as the facade for internal traffic. External traffic still typically benefits from an explicit gateway.
ESB as facade. If you're already running an ESB, it's well-positioned to act as the routing layer. Many enterprise migrations use the ESB this way during the transition.
CDC for data sync. Debezium / Kafka Connect for keeping legacy and new databases in sync during the transition.
Feature flags for gradual rollout. LaunchDarkly, Unleash, hand-rolled flag systems. The flags drive percentage-based routing in the facade.
Comparator infrastructure. Often hand-rolled — a process that reads both legacy and new logs and reports discrepancies. Critical for parallel-run phases.

The decision criteria: the routing layer should be production-grade because it's in the path of every request; the comparator infrastructure should be cheap because it's temporary; the CDC layer should be robust because it carries data integrity.

The Practitioner's Take

The strangler fig is the only migration pattern I trust to actually complete. The big-bang rewrite is seductive — clean architecture, no compromises, ship the dream — and almost always ends in either cancellation or a degraded compromise. The incremental migration is harder to romanticise but vastly more likely to succeed.

The right move is to commit to the pattern explicitly and structurally. Build the facade as production infrastructure, not as a stopgap. Plan the extraction order deliberately, starting with the highest-leverage slices and leaving the gnarliest shared data tables for last. Invest in the comparator and the data-sync layers because you'll be living with them for years, not months. Set realistic timelines: most strangler-fig migrations of meaningful enterprise systems take 18–36 months, not 6.

The teams that succeed with this pattern treat it as a long, deliberate, structural change to how the organisation builds software. The teams that fail with it treat it as a series of one-off project decisions and rediscover, slowly, that the migration has stalled because nobody is responsible for finishing it. The pattern works. The pattern requires commitment. Both are true.

The modern architecture from the earlier pieces (the ESB + event-driven backbone, CQRS + event sourcing, sagas, the outbox pattern, the right cross-cutting layer) is the destination. The strangler fig is the path. Most enterprises will need both.