Service Mesh vs ESB: Where Cross-Cutting Concerns Belong

Service mesh and Enterprise Service Bus address overlapping concerns. Routing, observability, security, traffic management, retries, circuit breaking: both architectures handle all of them. The interesting question is not which is better, because that's a meaningless comparison. The question is which layer of the stack the concerns belong in: the application layer (ESB) or the infrastructure layer (service mesh).

The answer matters more than the framing usually allows. Putting the concerns in the wrong layer doesn't just produce a suboptimal architecture — it produces an architecture that resists evolution, accumulates duplicated implementations, and creates ambiguity about who owns what when something breaks at 2am. Picking deliberately is one of the higher-leverage architectural decisions a team will make.

This piece makes that comparison, framed around real production trade-offs rather than vendor positioning. For the underlying ESB framework, the ESB + event-driven piece is the reference; this one is about where the comparison actually lands.

The Concerns Both Address

Both architectures address the same underlying set of cross-cutting concerns — the operational and architectural needs that every service-to-service call has, that don't belong in business logic:

Routing. Sending the call to the right instance, possibly based on version, load, locality, or canary configuration.
Observability. Capturing metrics, distributed traces, and structured logs for every call without each service doing the work.
Security. Mutual authentication between services, encryption in transit, per-call authorisation policies.
Resilience. Retries with backoff, timeouts, circuit breakers, bulkheads.
Traffic management. Rate limiting, throttling, traffic shaping, canary/blue-green deployments.
Protocol concerns. Negotiating protocols (HTTP/1.1, HTTP/2, gRPC), handling versioning, content negotiation.

The reason both architectures address these: every one is a concern that shows up in every service the moment you have more than a handful. Implementing them per-service produces inconsistency, duplicated code, and bugs that show up differently in different services. Both ESB and service mesh exist to factor these concerns out of individual services and into a shared layer.
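To make the duplication concrete, here is a minimal sketch of the resilience logic a single service would otherwise carry itself: retries with jittered exponential backoff, plus a simple circuit breaker. The class and function names, thresholds, and delays are illustrative assumptions, not taken from any particular ESB or mesh product.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then rejects calls
    until `reset_after` seconds have passed (values are illustrative)."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: allow one trial call. A single failure
            # during this half-open state reopens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def retry_with_backoff(fn: Callable[[], T], attempts: int = 3,
                       base_delay: float = 0.1,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `fn` up to `attempts` times with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # retrying against an open circuit is pointless
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Multiply this by every service and every language in the estate, each with slightly different thresholds and bugs, and the case for a shared layer makes itself.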

The disagreement is about which shared layer.

The ESB Approach: Application-Layer Mediation

The ESB places these concerns in a central application-layer component. Every service-to-service call goes through the bus; the bus handles routing, transformation, security, observability, and traffic management on the way through.

   +-----------+     +-----------+     +-----------+
   | Service A |     | Service B |     | Service C |
   +-----+-----+     +-----+-----+     +-----+-----+
         |                 |                 |
         v                 v                 v
   +--------------------------------------------------+
   |                                                  |
   |          ENTERPRISE SERVICE BUS                  |
   |  routing | transformation | security | retries   |
   |  observability | rate-limiting | mediation       |
   |                                                  |
   +--------------------------------------------------+
         |                 |                 |
         v                 v                 v
   +-----+-----+     +-----+-----+     +-----+-----+
   | Service D |     | Service E |     | Service F |
   +-----------+     +-----------+     +-----------+

Services know the bus exists and route their calls through it. The bus is a first-class component of the application architecture — operated, deployed, and reasoned about as application infrastructure.

The shape of the layer matters. The ESB lives at the application layer — it speaks the application's protocols (HTTP, SOAP, JMS, AMQP), understands the application's data formats (XML, JSON, EDI), and applies transformations that depend on application semantics (route this customer to the legacy system, transform this date format, enforce this business-level authorisation rule).

This makes the ESB powerful for use cases that need application-aware logic, and clumsy for use cases where the concerns are purely transport-level.
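As a rough illustration of application-aware mediation, the sketch below routes on business attributes of a customer record. The `Customer` fields, the rules, and the destination names are all hypothetical, and treating rule order as precedence is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Customer:
    opened: date        # account opening date
    jurisdiction: str   # hypothetical jurisdiction code
    tier: str           # e.g. "standard" or "premium"


# Each rule pairs a predicate over business data with a destination name.
Rule = Tuple[Callable[[Customer], bool], str]

RULES: List[Rule] = [
    (lambda c: c.opened < date(2024, 1, 1), "legacy-service"),
    (lambda c: c.jurisdiction == "X", "regional-system"),
    (lambda c: c.tier == "premium", "dedicated-cluster"),
]


def route(customer: Customer, rules: List[Rule] = RULES,
          default: str = "default-service") -> str:
    """Return the destination of the first matching rule (order = precedence)."""
    for predicate, destination in rules:
        if predicate(customer):
            return destination
    return default
```

Every predicate here needs to read fields from inside the message, which is exactly the knowledge a transport-level proxy does not have.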

The Service Mesh Approach: Sidecar-Per-Service

The service mesh places the same concerns in a sidecar proxy attached to every service instance. Calls between services don't go through a central component — they go through the mesh's data plane, which is a collection of sidecars co-located with each service.

   +-------------------+        +-------------------+
   | +---------------+ |        | +---------------+ |
   | |  Service A    | |        | |  Service B    | |
   | +-------+-------+ |        | +-------+-------+ |
   |         |         |        |         |         |
   | +-------v-------+ |        | +-------v-------+ |
   | |   Sidecar     | |<------>| |   Sidecar     | |
   | |   (Envoy)     | |        | |   (Envoy)     | |
   | +---------------+ |        | +---------------+ |
   +-------------------+        +-------------------+
            |                            |
            |                            |
            +-----------+----------------+
                        |
                        v
              +-------------------+
              |   Control Plane   |
              |    (Istio /       |
              |    Linkerd)       |
              |                   |
              |  policy, config,  |
              |  observability    |
              +-------------------+

Each service instance ships with a sidecar proxy (Envoy in Istio, linkerd2-proxy in Linkerd). The sidecar intercepts all incoming and outgoing network traffic for that instance. The sidecars together form the data plane, the layer that actually handles service-to-service traffic. A separate control plane (Istio's istiod, Linkerd's control-plane components) configures the sidecars centrally: routing rules, security policies, observability targets.

The service code itself does nothing special. It makes a normal HTTP call to another service's address; the sidecar intercepts it, applies mTLS, captures metrics, enforces retries, and forwards it on. The application is unaware that any of this is happening.

The shape of the layer is fundamentally different. The mesh lives at the infrastructure layer: it speaks network protocols (TCP, HTTP/1.1, HTTP/2, gRPC) and can act on connection and request metadata, but it doesn't know or care about application semantics. Transformations, business-logic routing, and partner-specific adaptation are not the mesh's job.
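Conceptually, the sidecar is a wrapper around the call path that the application never sees. The toy sketch below models only the metrics side of that idea in a single process; a real sidecar is a separate proxy process, and the `SidecarSketch` name and the dict-shaped request are assumptions made for illustration.

```python
import time
from collections import Counter
from typing import Callable, Dict

Request = Dict[str, str]
Transport = Callable[[Request], int]  # returns an HTTP-style status code


class SidecarSketch:
    """Wraps an upstream call and records per-status counts and total time.

    The wrapped service code is unchanged: it still just receives a request
    and returns a status code.
    """

    def __init__(self, upstream: Transport,
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self.upstream = upstream
        self.clock = clock
        self.status_counts: Counter = Counter()
        self.total_seconds = 0.0

    def __call__(self, request: Request) -> int:
        started = self.clock()
        try:
            status = self.upstream(request)
        except Exception:
            self.status_counts["error"] += 1
            raise
        finally:
            # Timing is recorded whether the call succeeded or failed.
            self.total_seconds += self.clock() - started
        self.status_counts[str(status)] += 1
        return status
```

Nothing in the wrapper inspects the payload's meaning; it only sees that a call happened, how long it took, and how it ended.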

The Architectural Difference, Visualised

The architectures look superficially similar from the service's perspective — both abstract cross-cutting concerns into a shared layer. Side by side, the difference becomes clear:

   ESB (centralised application layer):

       [Service A]
            |
            v
       +---------+
       |   ESB   |  <-- application layer; one component
       +---------+
            |
            v
       [Service B]


   Service Mesh (distributed infrastructure layer):

       [Service A] -- [Sidecar A] ---->  [Sidecar B] -- [Service B]
                              \                /
                               \              /
                                v            v
                              +----------------+
                              | Control Plane  |  <-- configures sidecars
                              +----------------+

The ESB is a hop — traffic flows through the central component. The service mesh is a wrapper — traffic flows directly between services via the sidecars, with the control plane standing aside and configuring behaviour rather than handling traffic.

This difference cascades:

Latency profile. ESB adds a network hop per call. Service mesh adds two local proxy hops (through sidecar A and sidecar B, each a separate process reached over the local interface) but no extra network hop.
Failure domain. ESB is a centralised availability target: if it goes down, all service-to-service calls are affected. Service mesh has distributed failure modes: a single sidecar failing only affects the service instance it is attached to.
Capabilities. ESB can apply rich application-layer transformations. Service mesh can apply rich transport-layer policies but generally not application transformations.
Operational ownership. ESB sits in application-team territory. Service mesh sits in platform/SRE territory.

These differences are what make one or the other right for different use cases.

Where Service Mesh Wins

Latency-critical workloads

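Adding a network hop through a central ESB is meaningful overhead for workloads with tight latency budgets. Sub-10ms call paths, high-frequency trading systems, real-time bidding, gaming backends: these often can't afford the ESB hop. The mesh's sidecar runs as a separate process next to the service and is reached over the local interface, so its overhead is typically small compared with an extra network round trip through a shared component, though it is not free and should be measured. For these workloads, the mesh is usually the better fit of the two.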
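A back-of-the-envelope way to compare the two against a latency budget is sketched below. The per-hop figures are assumed, illustrative numbers only, not measurements of any product; substitute values from your own environment.

```python
# Assumed, illustrative overheads in milliseconds; measure your own.
ESB_HOP_MS = 2.0       # extra network round trip plus mediation work
SIDECAR_HOP_MS = 0.25  # one pass through one local proxy


def added_latency_ms(calls_in_chain: int, per_call_overhead_ms: float) -> float:
    """Overhead added to a request that makes `calls_in_chain` sequential
    service-to-service calls."""
    return calls_in_chain * per_call_overhead_ms


def esb_overhead_ms(calls_in_chain: int) -> float:
    # One trip through the bus per call.
    return added_latency_ms(calls_in_chain, ESB_HOP_MS)


def mesh_overhead_ms(calls_in_chain: int) -> float:
    # Two sidecars per call: the caller's and the callee's.
    return added_latency_ms(calls_in_chain, 2 * SIDECAR_HOP_MS)
```

Under these assumed numbers, a request that fans through five sequential calls picks up 10.0 ms via the bus and 2.5 ms via the mesh. Against a 10 ms budget, the first figure consumes all of it before any service has done its work.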

Cloud-native and Kubernetes-shaped ecosystems

Service mesh deployment is native to Kubernetes. Sidecars run as containers inside each service's pod, usually injected automatically. The control plane runs as ordinary Kubernetes workloads. Configuration is expressed as custom resources. Metrics and traces are typically exported in forms that Prometheus and Jaeger-style tracing backends can consume. If your platform is already Kubernetes-shaped, adopting a mesh is largely a matter of installing the control plane and configuring policies. Adopting an ESB in the same environment is meaningfully more operational lift.

Polyglot teams

The mesh is language-agnostic. It works at the transport layer, so a Go service, a Java service, and a Python service all participate identically. An ESB requires application-layer clients that often need language-specific support (libraries, connectors, transformation engines tied to specific runtimes). For polyglot teams, the mesh's protocol-level neutrality is a real win — applied to the patterns I've written about for Go gRPC microservices in production, it just works.

Transport-level concerns dominate

If your service-to-service communication is mostly clean RPC with consistent protocols (gRPC + Protobuf, REST + JSON), and the cross-cutting concerns you care about are routing, mTLS, observability, retries, and traffic splitting — concerns that live entirely at the transport layer — the mesh handles them naturally without ever needing application-layer logic.
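Traffic splitting is a good example of a purely transport-level concern. The sketch below shows the underlying idea, weighted random selection between backend versions; the version names and weights are made up, and a real mesh would express this as declarative configuration rather than application code.

```python
import random
from typing import Dict, Optional


def pick_backend(weights: Dict[str, int],
                 rng: Optional[random.Random] = None) -> str:
    """Choose a backend name with probability proportional to its weight."""
    if not weights:
        raise ValueError("at least one backend is required")
    if rng is None:
        rng = random.Random()
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

For a canary, something like `pick_backend({"v1": 90, "v2": 10})` sends roughly a tenth of calls to the new version. The decision needs no knowledge of what the request contains.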

Where ESB Wins

Complex protocol transformations

If your system has to translate between SOAP and REST, between EDI and JSON, between mainframe COBOL records and modern Protobuf messages, the mesh is the wrong layer. These are application-semantic transformations, and the ESB lives at the layer where they belong. Service meshes have no general equivalent: they proxy application protocols but don't interpret or reshape payloads, and shouldn't.
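A small sketch of what such a transformation looks like in practice: translating a legacy XML message into JSON and normalising a date format on the way through. The `<order>` schema, the field names, and the DD/MM/YYYY source format are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from datetime import datetime


def legacy_xml_to_json(xml_payload: str) -> str:
    """Translate a hypothetical legacy <order> message into JSON,
    converting DD/MM/YYYY dates to ISO 8601."""
    root = ET.fromstring(xml_payload)
    placed_raw = root.findtext("placedOn")
    if placed_raw is None:
        raise ValueError("order is missing <placedOn>")
    placed = datetime.strptime(placed_raw, "%d/%m/%Y")
    message = {
        "orderId": root.findtext("id"),
        "customerId": root.findtext("customer"),
        "placedOn": placed.date().isoformat(),
    }
    return json.dumps(message, sort_keys=True)
```

The function has to know that `placedOn` is a date and which way round the day and month go. That is application knowledge, and it is why this work sits in the mediation layer rather than in a proxy.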

Partner integration with format heterogeneity

External partners almost always come with their own preferred protocols, schemas, and message shapes. Some send SOAP. Some send EDI. Some send partner-specific JSON dialects. The ESB's adapter layer absorbs this heterogeneity. The mesh has nothing analogous because partners don't speak mesh protocols. This is the use case where the ESB still wins decisively — and it's exactly why partner-integration architectures are organised around the ESB's adapter layer, not around a service mesh.

Regulated environments needing centralised audit

When compliance requires "every integration call between every service is logged, audited, and policy-enforced in one place," a centralised mediation layer is structurally easier to audit than a distributed sidecar deployment. Both architectures can produce the necessary audit trail, but the ESB's centralisation makes the regulatory conversation simpler.

Heavy business-logic routing

Rules such as "route customers opened before 2024 to the legacy service, customers in jurisdiction X to the regional system, premium-tier customers to the dedicated cluster" are application-semantic routing decisions. The mesh can route on what it sees at the request level (host, path, headers, weights); it generally cannot make decisions that depend on payload content or business attributes unless the application surfaces them as request metadata. The ESB can.

The Hybrid Architecture

A meaningful number of production systems run both, and it's often the right answer. The pattern:

    [External Partners]
             |
             v
   +-------------------+
   |   API Gateway     |  <-- external edge
   +---------+---------+
             |
             v
   +-------------------+
   |       ESB         |  <-- application-layer mediation for partner
   +---------+---------+      integration and complex transformations
             |
             v
   +-------------------+
   |   Service Mesh    |  <-- infrastructure-layer concerns for
   |   (data plane)    |      internal service-to-service traffic
   +---------+---------+
             |
   +---------+---------+---------+
   |         |         |         |
   v         v         v         v
[Svc A]   [Svc B]   [Svc C]   [Svc D]
   |         |         |         |      <-- each has a mesh sidecar
[sidecar] [sidecar] [sidecar] [sidecar]

The ESB handles partner integration and complex transformations at the application boundary. The service mesh handles routing, mTLS, observability, and resilience for internal service-to-service traffic. Each system owns the layer where its strengths apply.

This isn't a compromise — it's a deliberate division of responsibilities. The two architectures aren't redundant when each owns a distinct piece of the stack.

The risk to manage: clarity about which concerns live where. Without explicit guidelines ("partner protocols go through the ESB; internal mTLS and observability go through the mesh"), teams duplicate behaviour across both layers and lose the simplification both were supposed to provide.
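One lightweight way to keep those guidelines explicit is to write the ownership down as data and check each layer's configuration against it. The sketch below assumes a hypothetical `OWNERSHIP` map and a list of (layer, concern) pairs extracted from your ESB and mesh configuration by some means not shown here.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical ownership map: which layer is allowed to implement a concern.
OWNERSHIP: Dict[str, str] = {
    "partner-protocols": "esb",
    "payload-transformation": "esb",
    "mtls": "mesh",
    "observability": "mesh",
    "retries": "mesh",
}


def find_violations(configured: Iterable[Tuple[str, str]],
                    ownership: Dict[str, str] = OWNERSHIP) -> List[str]:
    """Report concerns configured in a layer that does not own them.

    `configured` holds (layer, concern) pairs found in each layer's config.
    """
    problems: List[str] = []
    for layer, concern in configured:
        owner = ownership.get(concern)
        if owner is None:
            problems.append(f"{concern}: no owner declared")
        elif owner != layer:
            problems.append(
                f"{concern}: configured in {layer}, owned by {owner}")
    return problems
```

Run as part of configuration review, a check like this surfaces the moment someone adds retries to the ESB that the mesh already performs.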

Migration Considerations

If you're already on one and considering the other, the question is rarely "migrate everything across" — it's "which concerns can be moved without breaking the things that depend on them."

Common migration paths:

ESB to mesh. Pull out the routing, observability, and security concerns layer by layer into the mesh, leaving the ESB responsible for the application-layer transformations and partner integration that the mesh can't do. The ESB becomes smaller and more focused over time.
Mesh to ESB. Less common, but happens when the system needs to start integrating with partners using legacy protocols or when complex transformations start appearing. The mesh stays in place for internal traffic; the ESB is added at the edge for the new concerns.
Greenfield decision. For new systems, the decision is structural. Modern microservices in Kubernetes with internal-only communication: service mesh. Enterprise integration with partners, complex transformations, regulatory compliance: ESB. Both: the hybrid above.

The Spring Boot modular monolith piece is the right reference for systems where neither is justified — small enough teams or systems where the operational complexity of either architecture isn't repaid by the benefit.

What Hasn't Changed

The unchanging caveats:

Operational discipline still matters. Both architectures shift complexity around; neither eliminates it. The team's ability to operate either system is what determines whether it earns its keep.
Cross-cutting concerns don't disappear with a tool choice. Picking the layer doesn't eliminate the need to think about routing, security, and observability. It just decides where to think about them.
Vendor lock-in is real in both directions. MuleSoft ESB and Istio mesh both produce real lock-in. Pick deliberately. The same boring-architecture principle applies: boring, well-understood architectures beat clever ones the team can't run.

The Practitioner's Take

The service-mesh-vs-ESB framing is misleading because it treats them as alternatives competing for the same problem. They're not. They're tools that address overlapping concerns at different layers of the stack, and the right answer for most systems with serious complexity is to use each one where its layer is the right one.

The decision criteria that matter: do you have substantial transport-level concerns between internal services? Use a mesh. Do you have substantial application-level integration and transformation needs at the edge? Use an ESB. Do you have both? Use both, with clear delineation about which layer owns which concern.
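Those criteria are simple enough to write down as a function. This is only a restatement of the paragraph above in code, with a "neither" branch for systems where no shared layer is justified; the function and parameter names are illustrative.

```python
def recommend(internal_transport_concerns: bool,
              edge_integration_needs: bool) -> str:
    """Map the two placement questions onto a recommendation."""
    if internal_transport_concerns and edge_integration_needs:
        return "both, with explicit ownership per concern"
    if internal_transport_concerns:
        return "service mesh"
    if edge_integration_needs:
        return "esb"
    return "neither"
```

The inputs are questions about where concerns live, not about products, which is the point of treating this as a placement decision.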

Almost every team I've watched make this decision well treated it as a placement question ("where in the stack should this concern live?") rather than as a vendor question. The teams that treated it as a vendor decision ("we picked Istio over MuleSoft") ended up with the wrong tool applied to the wrong layer, paying for capabilities they didn't need while still implementing the capabilities they did need in the wrong place.

The right architectural question is which layer should own the concern. The vendor follows from the answer.