CQRS + Event Sourcing: A Production Architecture Guide

The companion problem to integration sprawl — which the ESB + event-driven architecture piece addresses — is state management at scale. CQRS (Command Query Responsibility Segregation) and event sourcing are the two patterns most teams reach for when their traditional CRUD-shaped data layer starts collapsing under conflicting demands from reads and writes, audit and replay, history and current state.

Used together, the two patterns combine into one of the most powerful — and operationally expensive — production architectures available. They give you complete history for free, replayable state, naturally separated scaling axes for reads and writes, and an audit trail that compliance teams will quietly thank you for. They also impose real complexity that punishes teams that adopt them without the operational muscle to run them.

This piece is the practitioner's guide to building the combined CQRS + event sourcing architecture: what each pattern does on its own, why they pair so naturally, how to wire one up, what it solves at scale, and the cases where it's the wrong choice.

The Problem: Reads and Writes Pull in Opposite Directions

In a traditional CRUD architecture, the same data model serves both reads and writes. One database. One schema. One set of indexes. The same table that handles the write path also handles the read queries.

                +---------------------+
   [Writes]---->|   Single Database   |<----[Reads]
                |   (CRUD model)      |
                +---------------------+
                         |
                         v
              one schema, one set of indexes,
              one optimisation target

This works fine at small scale. At larger scale, it stops working — because reads and writes want fundamentally different things from the data layer.

Writes want consistency and normalised structure. Strict invariants, foreign keys, atomic transactions, normalised tables that make updates safe and predictable.
Reads want denormalised, query-shaped data. Pre-joined views, materialised projections, indexes optimised for the specific queries the UI runs.

The traditional response — add indexes, add read replicas, add materialised views — papers over the conflict without resolving it. Indexes slow down writes. Read replicas have lag. Materialised views need to be refreshed, and the refresh path is a constant source of staleness bugs. The deeper you go, the more the single-model architecture starts to feel like it's fighting itself.

The architecture that resolves the conflict by acknowledging it: split the read path and the write path into entirely separate models, each optimised for its own job. That's CQRS.

CQRS: Splitting the Two Halves

The core idea of CQRS is to maintain two distinct models — one for commands (the write path) and one for queries (the read path). Each is optimised independently. They communicate via events.

                                                 +-----------------+
                            +------writes------> |  Command Model  |
                            |                    |  (normalised,   |
                            |                    |   consistent,   |
                            |                    |   strict        |
                            |                    |   invariants)   |
                            |                    +--------+--------+
                            |                             |
                            |                             | events
                            |                             v
   [Application]            |                    +-----------------+
                            |                    | Query Models    |
                            |                    | (denormalised,  |
                            |                    |  pre-joined,    |
                            |                    |  query-shaped)  |
                            |                    +--------+--------+
                            |                             |
                            +<-----reads------------------+

The benefits of the split are immediate and concrete:

Each model has one job. The command model can be aggressively normalised and consistency-first without paying for read performance. The query models can be aggressively denormalised and query-shaped without paying for write complexity.
Reads and writes scale independently. Heavy read traffic doesn't pressure the write path. Heavy write traffic doesn't pressure the read indexes.
You can have many query models. Different consumers — the customer-facing UI, an internal dashboard, an analytics pipeline, a partner-facing API — each get their own read model tuned for their specific access patterns.
Schema evolution is decoupled. Adding a new query view doesn't require touching the command model. Adding new write invariants doesn't ripple through every read path.

The trade is operational complexity. You now have two systems that need to stay coherent. Events flowing from the command side to the query side need to be reliable. Read models lag behind writes by some amount, and your UI has to deal with that. The eventual-consistency window between "write committed" and "read updated" is a real thing engineers and users now have to reason about.
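The split can be sketched in a few lines. This is a minimal in-process illustration, not a production design: the OrderPlaced event, the CommandModel and CustomerSpendView classes, and the order domain are all hypothetical names chosen for the example. Events are delivered synchronously here, so there is no lag; a real deployment would deliver them asynchronously, which is where the eventual-consistency window comes from.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OrderPlaced:
    order_id: int
    customer: str
    total_cents: int


class CommandModel:
    """Write side: owns the invariants and the normalised current state."""

    def __init__(self, publish: Callable[[OrderPlaced], None]) -> None:
        self._orders: dict[int, tuple[str, int]] = {}
        self._publish = publish

    def place_order(self, order_id: int, customer: str, total_cents: int) -> None:
        # Invariants are checked before anything is stored or published.
        if order_id in self._orders:
            raise ValueError("order already exists")
        if total_cents <= 0:
            raise ValueError("total must be positive")
        self._orders[order_id] = (customer, total_cents)
        self._publish(OrderPlaced(order_id, customer, total_cents))


class CustomerSpendView:
    """Read side: denormalised and shaped for exactly one query."""

    def __init__(self) -> None:
        self.spend_by_customer: dict[str, int] = {}

    def on_event(self, event: OrderPlaced) -> None:
        current = self.spend_by_customer.get(event.customer, 0)
        self.spend_by_customer[event.customer] = current + event.total_cents


view = CustomerSpendView()
commands = CommandModel(publish=view.on_event)
commands.place_order(1, "alice", 2500)
commands.place_order(2, "alice", 1000)
total_for_alice = view.spend_by_customer["alice"]
```

The read side never touches the command model's storage. It only sees events, which is what lets a second or third view be added without changing the write path.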

The architectural follow-up question is: what's the source of truth on the command side? Is it the current-state row in a database, or is it the sequence of events that produced that row? The answer that pairs naturally with CQRS — and that compounds its benefits — is event sourcing.

Event Sourcing: The Event Store as Source of Truth

Event sourcing inverts the standard relationship between events and state. Instead of storing the current state and emitting events as a side-effect, you store the events as the durable record and derive current state from them.

                              +-------+-------+-------+-------+
                              | event | event | event | event |  ...
                              +-------+-------+-------+-------+
                                              |
                                              | replay
                                              v
                              +-------------------------------+
                              |    Current state derived      |
                              |    from event sequence        |
                              +-------------------------------+

A typical sequence for an Account aggregate might look like:

   Event 1: AccountOpened       { id: 42, owner: "alice", balance: 0 }
   Event 2: MoneyDeposited      { id: 42, amount: 100 }
   Event 3: MoneyDeposited      { id: 42, amount: 50 }
   Event 4: MoneyWithdrawn      { id: 42, amount: 30 }
   Event 5: AccountSuspended    { id: 42, reason: "fraud-check" }

The current state of account 42 (balance 0 + 100 + 50 - 30 = 120, suspended) is derived by replaying these five events, in order, through the aggregate's state-transition logic. The events themselves are the durable, immutable, append-only record. Nothing ever updates an event. Nothing ever deletes one. The event store grows monotonically.
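As a rough sketch, the replay for the five events above is a left fold over the history. The Event and Account types and the apply and replay functions are illustrative names, not taken from any particular framework.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Optional


@dataclass(frozen=True)
class Event:
    type: str
    data: dict


@dataclass(frozen=True)
class Account:
    id: Optional[int] = None
    owner: Optional[str] = None
    balance: int = 0
    suspended: bool = False


def apply(state: Account, event: Event) -> Account:
    """Pure state transition: (old state, event) -> new state."""
    d = event.data
    if event.type == "AccountOpened":
        return Account(id=d["id"], owner=d["owner"], balance=d["balance"])
    if event.type == "MoneyDeposited":
        return replace(state, balance=state.balance + d["amount"])
    if event.type == "MoneyWithdrawn":
        return replace(state, balance=state.balance - d["amount"])
    if event.type == "AccountSuspended":
        return replace(state, suspended=True)
    raise ValueError(f"unknown event type: {event.type}")


HISTORY = [
    Event("AccountOpened", {"id": 42, "owner": "alice", "balance": 0}),
    Event("MoneyDeposited", {"id": 42, "amount": 100}),
    Event("MoneyDeposited", {"id": 42, "amount": 50}),
    Event("MoneyWithdrawn", {"id": 42, "amount": 30}),
    Event("AccountSuspended", {"id": 42, "reason": "fraud-check"}),
]


def replay(events: list) -> Account:
    # Fold the events, in order, starting from an empty account.
    return reduce(apply, events, Account())


account = replay(HISTORY)
```

Because apply is a pure function of the previous state and one event, replaying the same history always produces the same state.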

Three properties of this design are transformative:

Complete history for free. Every state change is preserved. You can answer "what was the balance of this account on January 15th" by replaying events up to that date. Audit, compliance, debugging, and analytics all benefit.
State is reproducible. A bug in the projection that builds current state doesn't lose data — fix the projection, replay the events, rebuild the state. The events themselves are safe.
New views are cheap to build. Want a new read model that didn't exist before? Subscribe to the event stream, replay history, build the projection. No back-fill scripts, no data migrations.
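The "balance on January 15th" question can be sketched as a replay that stops at a cut-off. The dates below are invented for illustration (the year and the days are assumptions), and the tuple layout and balance_as_of helper are hypothetical.

```python
from datetime import datetime, timezone


def at(day: int) -> datetime:
    # Hypothetical timestamps: January of an arbitrary year, in UTC.
    return datetime(2024, 1, day, tzinfo=timezone.utc)


# (recorded_at, event_type, amount), ordered by recorded_at.
EVENTS = [
    (at(5), "MoneyDeposited", 100),
    (at(10), "MoneyDeposited", 50),
    (at(20), "MoneyWithdrawn", 30),
]


def balance_as_of(events: list, as_of: datetime) -> int:
    """Replay only the events recorded at or before the cut-off."""
    balance = 0
    for recorded_at, event_type, amount in events:
        if recorded_at > as_of:
            break  # the log is ordered, so nothing later can qualify
        balance += amount if event_type == "MoneyDeposited" else -amount
    return balance


balance_mid_january = balance_as_of(EVENTS, at(15))
balance_end_january = balance_as_of(EVENTS, at(31))
```

The same loop with a different accumulator is all a new read model needs: start from the first event, apply each one, and stop when the end of the log is reached.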

The trade is that querying current state directly is now expensive — you can't just SELECT balance FROM accounts WHERE id = 42 because the events are the truth and balance is derived. The natural fix: combine event sourcing with CQRS, where the query side maintains pre-computed projections.

The Combined CQRS + Event Sourcing Architecture

The two patterns reinforce each other almost perfectly. Event sourcing on the command side gives you a durable, immutable, replayable record of every state change. CQRS gives you optimised read models that consume those events and pre-compute the views the UI needs.

                          +-------------------+
        Commands -------> | Command Handler   |
                          | (validates,       |
                          |  applies          |
                          |  business rules)  |
                          +---------+---------+
                                    |
                                    | appends events
                                    v
                          +---------------------+
                          |    EVENT STORE      |   <--- source of truth
                          |  (append-only log)  |
                          +---------+-----------+
                                    |
                              event stream
                                    |
                    +---------------+---------------+
                    |               |               |
                    v               v               v
              +---------+     +---------+     +---------+
              | Query   |     | Query   |     | Query   |
              | Model A |     | Model B |     | Model C |
              | (UI)    |     | (Admin) |     | (Analy) |
              +----+----+     +----+----+     +----+----+
                    |               |               |
                    v               v               v
              [Reads:UI]      [Reads:Admin]   [Reads:Analytics]

The flow:

The application sends a command — DepositMoney { accountId: 42, amount: 100 }.
The command handler validates the request, applies business rules, and on success appends one or more events to the event store: MoneyDeposited { accountId: 42, amount: 100, timestamp: ... }.
The event store durably persists the event in the append-only log.
Subscribers — the query models — consume the event stream and update their respective projections.
Reads hit the query models directly, getting pre-computed, query-shaped data with no derivation cost at read time.
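The five steps above can be sketched end to end with an in-memory event store. Everything here is illustrative: the EventStore and BalanceView classes, the handler names, and the "no overdraft" business rule are assumptions made for the example, and subscribers are called synchronously rather than over a real event stream.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    account_id: int
    type: str  # "MoneyDeposited" or "MoneyWithdrawn"
    amount: int


def signed(event: Event) -> int:
    return event.amount if event.type == "MoneyDeposited" else -event.amount


class EventStore:
    """Append-only log that notifies subscribers after each append."""

    def __init__(self) -> None:
        self._log: list[Event] = []
        self._subscribers: list[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def read(self, account_id: int) -> list[Event]:
        return [e for e in self._log if e.account_id == account_id]

    def append(self, event: Event) -> None:
        self._log.append(event)
        for handler in self._subscribers:
            handler(event)


class BalanceView:
    """Query model: a pre-computed balance per account."""

    def __init__(self) -> None:
        self.balances: dict[int, int] = {}

    def on_event(self, event: Event) -> None:
        current = self.balances.get(event.account_id, 0)
        self.balances[event.account_id] = current + signed(event)


def handle_deposit(store: EventStore, account_id: int, amount: int) -> None:
    if amount <= 0:
        raise ValueError("amount must be positive")
    store.append(Event(account_id, "MoneyDeposited", amount))


def handle_withdraw(store: EventStore, account_id: int, amount: int) -> None:
    # The command side derives state from the events, never from a view.
    balance = sum(signed(e) for e in store.read(account_id))
    if amount <= 0 or amount > balance:
        raise ValueError("invalid or overdrawn withdrawal")
    store.append(Event(account_id, "MoneyWithdrawn", amount))


store = EventStore()
view = BalanceView()
store.subscribe(view.on_event)
handle_deposit(store, 42, 100)
handle_withdraw(store, 42, 30)
```

A rejected command appends nothing, so the log only ever contains facts that passed validation. Reads go to view.balances and never replay anything.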

The architectural elegance comes from each component doing exactly one thing well. The command side enforces consistency and invariants. The event store is durable and immutable. The query models are optimised for reads. Nothing is fighting itself.

In practice this slots cleanly into an event-driven architecture: for most teams the event store either doubles as the event bus or publishes onto a Kafka-shaped event bus that downstream consumers subscribe to. The pattern composes well with the broader ESB + event-driven architecture because both are organised around events as the durable communication primitive.

Projections and Read Models

The query models are usually called projections in event-sourced systems. Each projection is a small service that subscribes to the event stream, maintains its own optimised storage, and exposes queries against that storage.

                      +--------------------+
   [Event stream] --->|  Projection        |
                      |  - subscribes      |
                      |  - applies event   |
                      |  - updates store   |
                      +---------+----------+
                                |
                                v
                      +--------------------+
                      |  Optimised store   |
                      |  (postgres, redis, |
                      |   elastic, etc.)   |
                      +---------+----------+
                                |
                                v
                          [Read queries]

A few practical patterns worth knowing:

Projections are independently buildable and rebuildable. If you want a new view of the data, write a new projection, point it at the start of the event stream, let it catch up. No coordination with the command side needed.
Projections can use any storage technology. SQL for relational queries, Elasticsearch for full-text search, Redis for hot-path lookups, a graph database for relationship queries. Each projection picks the right tool for its specific job.
Projection failures are recoverable. If a projection has a bug that corrupts its store, you drop the store, fix the bug, and rebuild from the event stream. The truth never lived in the projection; it lived in the event store.
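A hedged sketch of the rebuild story: a projection that tracks a checkpoint (how many events it has applied), catches up incrementally, and can be dropped and rebuilt from the log. The BalanceProjection class and the tuple-shaped log are hypothetical, and the store is a plain dict standing in for whatever database the projection would really use.

```python
# (event_type, account_id, amount), in log order.
LOG = [
    ("MoneyDeposited", 42, 100),
    ("MoneyDeposited", 42, 50),
    ("MoneyWithdrawn", 42, 30),
]


class BalanceProjection:
    def __init__(self) -> None:
        self.store: dict[int, int] = {}
        self.checkpoint = 0  # number of log entries already applied

    def apply(self, event: tuple) -> None:
        event_type, account_id, amount = event
        delta = amount if event_type == "MoneyDeposited" else -amount
        self.store[account_id] = self.store.get(account_id, 0) + delta

    def catch_up(self, log: list) -> None:
        """Apply only the events after the checkpoint."""
        for event in log[self.checkpoint:]:
            self.apply(event)
            self.checkpoint += 1

    def rebuild(self, log: list) -> None:
        """Drop the derived store and replay from the start of the log."""
        self.store = {}
        self.checkpoint = 0
        self.catch_up(log)


projection = BalanceProjection()
projection.catch_up(LOG)
projection.catch_up(LOG)      # nothing new past the checkpoint, so a no-op
projection.store[42] = -999   # simulate a bug corrupting the read store
projection.rebuild(LOG)       # the log is untouched, so the view recovers
```

In a real system the checkpoint would be persisted alongside the projection's store, ideally in the same transaction, so that a restart resumes from the right position.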

For teams used to "the database is the source of truth and you must never lose it," the mental shift is significant. In an event-sourced system the projections are a cache: useful, performant, and query-optimised, but a cache nonetheless. The actual truth is the immutable event log.

Snapshots: The Performance Optimisation

One operational concern that reliably comes up at scale is replay cost. If an aggregate has a million events in its history, deriving current state by replaying all million on every command is unworkable.

The standard fix is snapshots — periodic checkpoints of an aggregate's state, taken every N events.

   events 1..1000  ->  [Snapshot at event 1000]
   events 1001..2000 -> [Snapshot at event 2000]
   events 2001..2030 -> [current; need to replay only these 30]

When the command side needs to load aggregate state, it loads the most recent snapshot and replays only the events after it. The amount of replay work is bounded by the snapshot interval, not by total history length.
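A minimal sketch of that load path, mirroring the 2030-event example above. The load function and the (version, balance) snapshot shape are assumptions for illustration; real snapshots would carry the full serialised aggregate state.

```python
from typing import Optional

Event = tuple[str, int]     # (event_type, amount)
Snapshot = tuple[int, int]  # (version, balance): state after `version` events


def apply(balance: int, event: Event) -> int:
    event_type, amount = event
    return balance + amount if event_type == "MoneyDeposited" else balance - amount


def load(events: list, snapshot: Optional[Snapshot] = None) -> tuple:
    """Return (balance, version, events_replayed).

    Starts from the snapshot if one is given, otherwise from empty state,
    and replays only the events recorded after the snapshot's version.
    """
    version, balance = snapshot if snapshot is not None else (0, 0)
    tail = events[version:]
    for event in tail:
        balance = apply(balance, event)
    return balance, version + len(tail), len(tail)


EVENTS = [("MoneyDeposited", 1)] * 2030

full = load(EVENTS)                            # replays all 2030 events
snapshot = (2000, load(EVENTS[:2000])[0])      # checkpoint taken at event 2000
fast = load(EVENTS, snapshot)                  # replays only the last 30
```

Both paths arrive at the same state. The snapshot only changes how much work it takes to get there, which is why it can be thrown away and rebuilt at any time.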

Snapshots are an optimisation, not a source of truth. The events themselves remain authoritative. If a snapshot is corrupted or a serialisation format changes, you rebuild snapshots from the event stream the same way you rebuild projections.

For most aggregates, snapshots are unnecessary until the per-aggregate event count crosses a few thousand. Don't optimise prematurely: many event-sourced systems run for years without needing them.

What This Architecture Solves

The pros worth being explicit about. After running this pattern across several engagements, the concrete problems it addresses:

Audit and compliance. Every state change is preserved, and as long as each event records its timestamp, actor, and context as metadata, regulatory questions about "who changed what when" have a clean answer that requires no separate audit infrastructure.
Temporal queries. "What was the state of this account on January 15th?" becomes a replay query, not an impossible question. Historical reconstruction is a first-class capability.
Bug recovery. A bug in projection logic that corrupts the read store is recoverable without data loss — fix the bug, rebuild from the event log. The bug doesn't lose truth; it loses derived state.
New views on demand. A new product feature that needs a new query shape is a new projection, not a database migration. The cost of adding a view is writing one projection and replaying history, not coordinating a schema change across the write path.
Independent scaling. The write path and each read path can scale on its own profile. Heavy reads don't pressure writes; heavy writes don't pressure read indexes.
Natural pairing with event-driven architecture. The events that drive projections are the same events the broader system can consume. Sagas, integrations, analytics pipelines, and downstream services all hang off the same event stream.
The system tells you what happened. Debugging "how did we get into this state" stops being archaeology — the event log is the answer.

The compounding benefit: every one of these is built into the architecture, not a feature you have to add. Compliance, audit, debugging, historical analysis, and new view development all become easier because the underlying data model is shaped right.

When Not to Use It

Honest limits. Three cases where this architecture is the wrong choice:

Simple CRUD applications. If your domain is straightforward record-keeping with no meaningful business logic on the write side and no audit requirements, a standard database with read replicas is far simpler and the right answer. Don't event-source a contact form.
Strong real-time read-after-write requirements. The eventual-consistency window between command commit and projection update is often tens of milliseconds in a healthy system, but it is never zero, and it grows whenever a projection falls behind. If your product genuinely requires "the user immediately sees the result of their own write with zero lag," you'll need workarounds (read-your-own-writes patterns, in-memory caching of recent commands) that add complexity.
Teams without the operational muscle. Event-sourced systems require real care: schema evolution on events, snapshot strategy, projection rebuild infrastructure, observability into projection lag, replay testing. Teams that adopt this pattern without dedicating real engineering attention to the operational layer end up worse off than they would have been with a simpler architecture. The boring-architecture-beats-clever-architecture principle applies as strongly here as anywhere — adopt this only when the benefit justifies the operational cost.
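One common shape for the read-your-own-writes workaround mentioned above: the command returns the log position of the event it wrote, and the read waits until the projection's checkpoint has reached that position. The sketch below is an assumption-heavy illustration. The BalanceView class, the deposit helper, and the 50 ms simulated lag are all made up, and the projection runs on a timer thread purely to imitate asynchronous delivery.

```python
import threading


class BalanceView:
    """Read model that exposes how far through the log it has processed."""

    def __init__(self) -> None:
        self.balance = 0
        self.checkpoint = 0
        self._cond = threading.Condition()

    def apply(self, amount: int) -> None:
        with self._cond:
            self.balance += amount
            self.checkpoint += 1
            self._cond.notify_all()

    def read_after(self, position: int, timeout_s: float) -> int:
        """Block until the view has applied `position` events, then read."""
        with self._cond:
            caught_up = self._cond.wait_for(
                lambda: self.checkpoint >= position, timeout=timeout_s
            )
            if not caught_up:
                raise TimeoutError("projection has not caught up yet")
            return self.balance


log: list = []
view = BalanceView()


def deposit(amount: int) -> int:
    """Append to the log and return the position of the new event."""
    log.append(amount)
    position = len(log)
    # Simulated lag: the projection sees the event 50 ms later.
    threading.Timer(0.05, view.apply, args=(amount,)).start()
    return position


position = deposit(100)
balance = view.read_after(position, timeout_s=2.0)
```

The timeout matters. If the projection is badly behind, the caller needs a fallback (an error, a stale-data indicator, or a direct replay) rather than an indefinite wait.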

For most systems sitting on the boundary, the right move is to start with a regular database, identify the bounded contexts where audit/history/temporal queries actually matter, and migrate just those subsystems to CQRS+ES. Most production CQRS+ES systems I've seen are hybrid — event-sourced where the business benefit is real, traditional CRUD where it isn't.

The broader migration question, moving incrementally from a legacy CRUD monolith to this kind of architecture, is its own topic. The strangler fig pattern is the playbook for that work.

Implementation Stack Choices

Common production stacks for CQRS + event sourcing:

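EventStoreDB. Purpose-built event store with native streams, subscriptions, and projections. Snapshots are, as far as I know, not a built-in feature; the usual approach is to write them from application code, for example to a separate stream. The default choice when starting fresh and you want a system designed for this pattern from the ground up.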
Axon Framework. JVM-focused framework with first-class CQRS+ES support, sagas, distributed command bus, and tight Spring integration. The right answer for Java/Kotlin teams already in the Spring ecosystem.
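Kafka as event store. Many teams use Kafka topics as the durable event log. Works well when you're already deeply invested in Kafka and want one technology for both the event store and the broader event-driven backbone. Two caveats: Kafka has no per-aggregate expected-version check on append, and loading one aggregate typically means reading through a partition it shares with many others, so command-side invariant enforcement usually needs extra machinery. Schema registry becomes essential.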
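PostgreSQL with an events table. The simplest implementation. An append-only events table with (stream_id, sequence, event_type, payload, timestamp), a unique constraint on (stream_id, sequence) so that concurrent appends to the same stream conflict instead of interleaving, and aggregate-level transactions for invariant enforcement. LISTEN/NOTIFY works as a wake-up signal for subscriptions, but a session that is disconnected when a notification is sent never receives it, so projections should still track a checkpoint and read the table to catch up. Works at surprisingly large scale before it stops working. Pairs naturally with PostgreSQL as the broader data layer.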
Marten (.NET). For .NET teams, Marten on Postgres is a strong choice — a thin event-sourcing library layered over Postgres.

For most teams, "Postgres with an events table" is the right starting point — it requires no new infrastructure, has predictable operational properties, and works at the scale most teams actually have. Migrate to a purpose-built event store only when the volume genuinely justifies it.
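A sketch of the events-table approach with optimistic concurrency. To keep it runnable without a database server it uses Python's built-in sqlite3 as a stand-in for Postgres; the table shape carries over, but the SQL types and driver calls would differ. The column name recorded_at (in place of timestamp), the append and read_stream helpers, and the ConcurrencyError class are assumptions for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        stream_id   TEXT    NOT NULL,
        sequence    INTEGER NOT NULL,
        event_type  TEXT    NOT NULL,
        payload     TEXT    NOT NULL,
        recorded_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (stream_id, sequence)
    )
    """
)


class ConcurrencyError(Exception):
    """Another writer appended to the stream first."""


def append(stream_id: str, expected_version: int, event_type: str, payload: dict) -> None:
    """Append one event at expected_version + 1, or fail if that slot is taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events (stream_id, sequence, event_type, payload) "
                "VALUES (?, ?, ?, ?)",
                (stream_id, expected_version + 1, event_type, json.dumps(payload)),
            )
    except sqlite3.IntegrityError as exc:
        raise ConcurrencyError(f"{stream_id} is past version {expected_version}") from exc


def read_stream(stream_id: str) -> list:
    rows = conn.execute(
        "SELECT sequence, event_type, payload FROM events "
        "WHERE stream_id = ? ORDER BY sequence",
        (stream_id,),
    ).fetchall()
    return [(seq, event_type, json.loads(payload)) for seq, event_type, payload in rows]


append("account-42", 0, "AccountOpened", {"owner": "alice", "balance": 0})
append("account-42", 1, "MoneyDeposited", {"amount": 100})
stream = read_stream("account-42")
```

The primary key does the concurrency work: two command handlers that both loaded version 1 will both try to insert sequence 2, and exactly one of them succeeds. The loser reloads the stream, re-validates, and retries.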

For the projection side, pick the storage technology that matches each projection's access pattern: relational for joins, search engine for full-text, key-value for hot lookups, document store for nested structures.

The Practitioner's Take

CQRS + event sourcing is one of the highest-leverage architectures available for systems where audit, temporal queries, and complete history are real business requirements. The cost is operational complexity — schema evolution, projection rebuilds, snapshot strategy, eventual-consistency reasoning. The benefit is that everything about the system becomes more observable, more recoverable, and more flexible at the same time.

The pattern that works is: adopt CQRS+ES in the bounded contexts where the business value is clearest (audit-heavy, history-sensitive, multi-view-requiring), keep the rest of the system simple, and grow the event-sourced footprint deliberately rather than greenfield-everything. The teams I've watched succeed with this architecture treat it as a tool in their kit, not a religion. They use it where it pays back and use simpler patterns where it doesn't.

The teams I've watched fail with it adopted it as a universal solution and discovered, six months in, that they'd taken on operational complexity without commensurate business benefit. The architecture rewards selective adoption and punishes maximalist adoption. The right move is to internalise that distinction before you commit, not after.