Composer 2.5 + MCP: The Production Integration Guide (2026)

A pattern from the production audits I have run over the last fortnight: teams switch to Composer 2.5, see the headline 10× cost saving in their per-token bill, and watch the same agent loops still take three or four attempts to land each task. The cost-per-completed-task barely moves. The expectation was that the cheaper model would compound; the reality is that retry tax eats most of the saving.

The fix is not a different model. It is the context the model receives. Composer 2.5 is trained to use tools (that is the whole point of its training stack), and it is trained on tool transcripts that look specifically like Cursor's harness. MCP is the production interface that lets you feed Composer the grounded inputs its training distribution expects, so the model does the work in one pass instead of three.

This is the practitioner's guide to wiring Composer 2.5 to MCP servers in production. It is not a survey of the protocol; there are plenty of those. It is the integration playbook for this specific model, written from the experience of running it at scale.

Why MCP Changes Composer 2.5's Task Economics

The single most useful number to internalise about Composer 2.5 is that the per-task cost is dominated by retries, not by the per-token rate. On the workloads I have measured, a one-pass completion runs roughly $0.50–$0.80 on the standard tier. A three-pass completion (same task, model gets it wrong twice) runs $1.80–$2.40. The cost ratio between the two outcomes is roughly 3× at the midpoints of those ranges and approaches 5× at the extremes, which is large enough to eat most of the per-token saving the model offers over the frontier.
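The arithmetic behind that ratio is worth checking by hand. Below is a minimal sketch in Python that uses only the dollar ranges quoted above; the function name `cost_ratio_range` is mine, and the exact spread depends on which ends of the two ranges you pair.

```python
# Per-task cost ranges quoted above (USD, standard tier).
ONE_PASS = (0.50, 0.80)
THREE_PASS = (1.80, 2.40)


def cost_ratio_range(cheap, expensive):
    """Return (lowest, midpoint, highest) ratio of expensive to cheap cost."""
    lowest = expensive[0] / cheap[1]    # cheapest retry run vs priciest clean run
    highest = expensive[1] / cheap[0]   # priciest retry run vs cheapest clean run
    midpoint = (sum(expensive) / 2) / (sum(cheap) / 2)
    return lowest, midpoint, highest


low, mid, high = cost_ratio_range(ONE_PASS, THREE_PASS)
print(f"ratio: {low:.2f}x to {high:.2f}x, midpoint {mid:.2f}x")
```

Whichever pairing you choose, the retry outcome costs a multiple of the clean outcome, and that multiple is the number to track.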

MCP affects this directly because the failure modes that cause retries are concentrated in a small set of categories that grounded context eliminates. Composer guesses an API field name → retry. Composer invents a column in your database → retry. Composer assumes a Linear ticket structure that does not match reality → retry. Each of these is a hallucination the model should not be making in the first place, and each is one that disappears when the model can read the real schema, the real ticket, the real error.

The right mental model is that MCP is not a feature, it is a retry-tax-reduction mechanism, and its ROI is measured in cost-per-completed-task rather than in capability gains on benchmarks. The same model gets meaningfully cheaper to run because it is wrong less often.

The same discipline I have written about in the cost-engineering playbook for Composer 2.5 at production scale applies here: the 10× headline turns into the real number only when the orchestration layer captures the savings. MCP is one of the cleanest mechanisms in that orchestration layer.

Why MCP Matters for Composer 2.5 Specifically

Composer 2.5 was trained on tool-heavy transcripts. The Cursor team has been explicit that its training distribution is weighted toward agentic loops that include file reads, terminal commands, test runs, and edits in sequence — the shape of work a Cursor user actually does. That training is the reason Composer 2.5 hits 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench despite being a smaller and cheaper model than the frontier alternatives.

What that training does not cover is the long tail of integrations that production teams actually depend on. The model does not know your Postgres schema. It does not know what your Linear ticket templates look like. It does not know which Sentry issue corresponds to the bug you are debugging right now. Without MCP, the agent guesses; the guesses cost retries; the retries cost more than the per-token bill saved by choosing Composer over Opus in the first place.

MCP closes that gap. When the model can call a tool to read the real schema, the real ticket, the real error stack, the guesses go away — and Composer 2.5's training distribution is exactly the right shape to make use of those tools, because the training transcripts that produced the model look very similar to a Composer + MCP loop in production.

Two things to internalise. First, Composer 2.5 gets disproportionate benefit from MCP compared to a more general-purpose model, because its training already expects this shape of input. Second, the configuration cost of MCP is the cheap part of the work; the real cost is in choosing the right servers and disciplining the rules around them.

The Server Inventory

The MCP server ecosystem has gone from "a few proof-of-concept servers" to a real catalogue over the last six months. Below is the production-grade inventory I would install first, ordered by ROI on retry-tax reduction.

Postgres MCP. The single highest-ROI server for any team that runs a Postgres-backed product. It exposes the live schema, lets the agent run EXPLAIN-bounded read queries, and removes the entire category of "agent invents a column name" failures. Read-only mode is the right default; write access is something to gate behind explicit approval. Install this first.

GitHub MCP. Lets the agent read PRs, diffs, issues, and CI status without having to construct shell commands or scrape the web. Particularly useful for code-review agent loops where the model needs to see the context of a change before commenting on it. The official server is well-maintained and supports both Personal Access Token and GitHub App authentication.

Linear MCP. For teams that run their planning in Linear, this server is the cleanest way to feed the agent the actual issue text, acceptance criteria, and conversation history. Agents that read Linear before writing code make demonstrably fewer "did the wrong thing because the spec was ambiguous" errors. The mid-article point in my Composer 2.5 builder's guide on ambiguous-spec failures applies directly here — Linear MCP is the most direct way to ground the spec.

Sentry MCP. Sentry's own MCP server, maintained by the team, exposes issues, error stacks, and Seer analysis to the agent. The killer use-case is "given this production error, find the cause and propose a fix" — a workload that Composer 2.5 can run end-to-end when it has access to both the real stack trace and the codebase. Without Sentry MCP, the agent guesses what the error meant; with it, the agent reads what the error actually was.

Apidog MCP. For teams maintaining REST or GraphQL APIs, the Apidog MCP server pipes the live API specification to the agent. Schema-grounded codegen — types, request shapes, integration tests — runs on the real spec instead of the model's guess at the spec. This is the single highest-ROI server for API-heavy work; the before/after on token spend is documented in the next section.

Brave Search MCP (or equivalent web-search server). Lets the agent run real searches when it needs context the codebase does not contain — current library versions, error-message lookups, security advisories. Lower ROI than the in-codebase servers but still useful for the long tail of "I need to look something up" steps.

If you install nothing else from this list, install Postgres MCP and either Linear or Sentry depending on which one your team lives in. Between them, Postgres, Linear, and Sentry cover the bulk of the retry-tax-reduction surface; everything else is incremental.

The `.cursor/rules` + `.cursor/mcp.json` Pattern

The configuration pattern that has held up across the audits I have run separates two concerns cleanly: rules govern what the agent should do, MCP servers govern what the agent can see. Both live in the repository, both go through code review, and both should be treated as infrastructure rather than as developer preferences.

.cursor/mcp.json lives in the .cursor directory at the project root and configures the MCP servers the agent has access to. The basic shape is well documented; the production shape looks like this:

{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"],
      "env": {
        "POSTGRES_URL": "${POSTGRES_READONLY_URL}"
      }
    },
    "linear": {
      "command": "npx",
      "args": ["-y", "@linear/mcp-server"],
      "env": {
        "LINEAR_API_KEY": "${LINEAR_API_KEY}"
      }
    },
    "sentry": {
      "url": "https://mcp.sentry.dev/v1",
      "transport": "streamable-http",
      "auth": { "type": "oauth" }
    }
  }
}

Three things to call out about that snippet. First, every secret is referenced as ${VAR_NAME} and resolved from the environment; never inline the secret. Second, the Sentry server uses Streamable HTTP transport, not stdio; for remote environments and team-shared servers, stdio is fragile and HTTP is the right transport in production. Third, the Postgres URL points to a read-only role. The default should always be read-only, with write access added explicitly as a separate server entry if and only if the workflow requires it.

.cursor/rules/*.md files live alongside the MCP config and govern agent behaviour. The pattern that works is one rules file per concern: data-access.md covers what to do when querying the database, pr-conventions.md covers how to format PR descriptions, error-handling.md covers how to read Sentry issues. The agent reads all rules files on every loop; keeping each one short and concern-focused keeps the system prompt manageable.

The cardinal discipline: treat .cursor/mcp.json changes in Git like infrastructure changes. The same review rigour you apply to a Terraform file or a GitHub Actions workflow applies here. A misconfigured MCP server is a production incident; the review process should treat it as one.
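One way to back that review with automation is a small lint step in CI. The sketch below is hypothetical: `lint_mcp_config` and its two rules are my own reading of the points above, not a Cursor feature, and it assumes the file shape shown in the snippet (an `mcpServers` object whose entries carry either `command` or `url`).

```python
import json
import re

# A value counts as "interpolated" if it is exactly one ${VAR_NAME} placeholder.
PLACEHOLDER = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*\}$")


def lint_mcp_config(text):
    """Return a list of human-readable problems found in an mcp.json document."""
    problems = []
    servers = json.loads(text).get("mcpServers", {})
    for name, cfg in servers.items():
        # Rule 1 (assumed policy): env values are placeholders, never inline secrets.
        for key, value in cfg.get("env", {}).items():
            if not PLACEHOLDER.match(str(value)):
                problems.append(f"{name}: env {key} is not a ${{VAR}} placeholder")
        # Rule 2 (assumed policy): an entry is local (command) or remote (url), not both.
        if ("command" in cfg) == ("url" in cfg):
            problems.append(f"{name}: expected exactly one of 'command' or 'url'")
    return problems


good = '{"mcpServers": {"postgres": {"command": "npx", "env": {"POSTGRES_URL": "${POSTGRES_READONLY_URL}"}}}}'
bad = '{"mcpServers": {"postgres": {"command": "npx", "env": {"POSTGRES_URL": "postgres://user:pw@db/prod"}}}}'

print(lint_mcp_config(good))
print(lint_mcp_config(bad))
```

A check like this cannot tell whether the Postgres role is really read-only; that still needs a human reviewer and a database-side grant.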

Schema-Grounded API Codegen: A Before-and-After

The cleanest worked example of what MCP does to Composer 2.5's task economics is API codegen. Take a realistic task: generate a typed TypeScript client and integration test suite for the /orders/{id}/refunds endpoint.

Without MCP. Composer reads your codebase, sees other endpoint clients, infers the likely shape of the refunds endpoint, and writes the client. The inference is mostly right but gets three properties wrong: refundReason instead of reason_code, processedAt instead of processed_at_unix, and one field omitted as optional that the spec actually requires. The tests run, three of them fail, the agent fixes the fields, and the suite passes on the third attempt. Total: roughly 38K tokens input, 14K output, $2.40 in raw cost on the standard tier across three loops.

With Apidog MCP. Composer reads the actual API spec from the MCP server, generates the client against the real field names, writes the tests against the real response shape, and the suite passes on the first attempt. Total: roughly 22K tokens input (the spec replaces a lot of codebase-context guessing), 8K output, $0.85 in raw cost on a single loop. The cost drops by roughly 65%, and the wall-clock drops by more than that because the retries are gone entirely.

The dynamic is the same across every server in the inventory. The agent that reads the real input does the work right the first time. The agent that guesses pays the retry tax.
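To make the failure concrete, here is a minimal sketch of the mismatch a guessed client produces against a real spec. Only `reason_code`, `processed_at_unix`, `refundReason`, and `processedAt` come from the example above; the other fields, the `SPEC_FIELDS` shape, and `check_fields` are hypothetical stand-ins for whatever your spec server returns.

```python
# Hypothetical slice of the refunds spec: field name -> required flag.
SPEC_FIELDS = {
    "reason_code": True,
    "processed_at_unix": True,
    "amount_cents": True,   # stands in for the required field the guess omitted
    "note": False,
}


def check_fields(generated, spec):
    """Compare generated field names with the spec; return (unknown, missing_required)."""
    unknown = sorted(set(generated) - set(spec))
    missing = sorted(name for name, required in spec.items()
                     if required and name not in generated)
    return unknown, missing


guessed = ["refundReason", "processedAt", "note"]
grounded = ["reason_code", "processed_at_unix", "amount_cents", "note"]

print(check_fields(guessed, SPEC_FIELDS))
print(check_fields(grounded, SPEC_FIELDS))
```

Every name in the first result is a test failure waiting to happen, and each one costs a loop to discover when the spec is not in context.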

The corollary — and this is the bit teams get wrong — is that the saving compounds with task complexity. On a one-line change, the retry tax is small in absolute terms even if it is large in proportion. On a multi-file refactor with five interconnected dependencies, the retry tax becomes enormous, and MCP becomes correspondingly more valuable. The right places to install MCP first are the places where the agent runs the longest, most expensive loops — not the places where the agent does small isolated edits.

Multi-Tool Agent Run Discipline

Composer 2.5 + MCP is genuinely good at long agent runs. The textual-feedback RL training I covered in the credit-assignment piece means the model can stay coherent across more sequential tool calls than its predecessor. That coherence is a capability; without discipline, it is also a liability.

The discipline that has held up:

Cap the tool-call budget per loop. A loop that runs more than 30 sequential tool calls is almost always either doing something wrong or doing something you should have decomposed into smaller tasks. Set a budget; surface the breach as an error rather than letting the loop continue indefinitely.
Cap the active server count. Cursor has a documented ceiling of roughly 40 active tools across all MCP servers combined, and every server you add spends part of that budget. Once you cross the ceiling, the agent starts losing access to tools silently. The right number is well below that: pick the servers that earn their slot and drop the rest.
Retry policy that does not burn tokens. When a tool call fails (network error, schema-validation failure, downstream timeout), the default should be one structured retry with the error message in context, not an open-ended "try again." The model is better at fixing the failure when it sees what went wrong; it is worse at it when the harness papers over the error.
Per-server timeout. A slow MCP server poisons the whole loop. The 5–10 second timeout that feels fine in development becomes a 30-second user wait in production once you compound a few of them. Configure tight timeouts per server and have the agent fall back to "without this tool" rather than block.
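The first and third disciplines in that list can be sketched in a few lines of harness code. This is an illustration under assumptions, not Cursor's implementation: `ToolRunner`, `ToolBudgetExceeded`, and the `revise` callback (a stand-in for the model reading the error and issuing a corrected call) are all hypothetical names.

```python
class ToolBudgetExceeded(RuntimeError):
    """Raised when a loop asks for more tool calls than its budget allows."""


class ToolRunner:
    def __init__(self, max_calls=30):
        self.max_calls = max_calls
        self.calls = 0

    def call(self, tool, *args):
        """Run one tool call, counting it against the budget."""
        if self.calls >= self.max_calls:
            raise ToolBudgetExceeded(f"budget of {self.max_calls} tool calls spent")
        self.calls += 1
        return tool(*args)

    def call_with_retry(self, tool, args, revise):
        """Allow exactly one retry, and only after the error text has been seen."""
        try:
            return self.call(tool, *args)
        except ToolBudgetExceeded:
            raise  # a budget breach is surfaced, never retried
        except Exception as exc:
            new_args = revise(args, str(exc))   # the model sees the real error
            return self.call(tool, *new_args)   # a second failure propagates


def lookup_column(name):
    """Toy tool: fails loudly on a column that does not exist."""
    columns = {"reason_code", "processed_at_unix"}
    if name not in columns:
        raise KeyError(f"unknown column {name}; known: {sorted(columns)}")
    return name


def revise(args, error):
    # Stand-in for the model: read the error, pick a column that exists.
    return ("reason_code",)


runner = ToolRunner(max_calls=3)
result = runner.call_with_retry(lookup_column, ("refundReason",), revise)
print(result, runner.calls)
```

The retry counts against the same budget as the first attempt, so a loop cannot hide its failures behind free retries.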

Since January 2026, Cursor has shipped dynamic context management that drops MCP-related token usage by roughly 47% when multiple servers are active. That is real. It is also not a substitute for the disciplines above — it makes a well-configured stack cheaper; it does not rescue a poorly configured one.

Observability and Cost Attribution

The Composer 2.5 + MCP stack is one of the easier ones to make observable, because every tool call is a structured event and every server has a name. The minimum viable instrumentation:

Log every tool call with name, server, latency, token cost, and success / failure. A simple JSONL stream is sufficient; you do not need a vendor tracer to start. The point is to have the data when you need to ask "which server is eating my budget."
Aggregate to per-task cost. A task is a coherent unit of work — a single user request, a single CI run, a single scheduled job. Sum the token cost and the tool-call cost into one per-task number. That number is what you optimise.
Surface the unit economics in a dashboard the business reads. This was the line in the cost-engineering piece and it applies again: the team that manages production at the unit-economics level wins; the team that manages it at the per-token level wastes the savings.
Tie cost events into your product analytics. If you are running custom GA4 events or a similar product-analytics stack, push the per-task cost number into the same stream. The business side wants to see "the cost of doing X" alongside "the volume of X being done"; the same dashboard answers both.
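The first two items need very little code. Below is a minimal sketch, assuming a flat event shape of my own choosing; the field names, the tool names, and the helpers `log_tool_call` and `total_cost_by` are illustrative, and `cost_usd` stands in for whatever token-cost figure your harness reports.

```python
import io
import json
from collections import defaultdict


def log_tool_call(stream, task_id, server, tool, latency_ms, cost_usd, ok):
    """Append one tool-call event to a JSONL stream."""
    event = {"task_id": task_id, "server": server, "tool": tool,
             "latency_ms": latency_ms, "cost_usd": cost_usd, "ok": ok}
    stream.write(json.dumps(event) + "\n")


def total_cost_by(lines, key):
    """Sum cost_usd over JSONL lines, grouped by the given event field."""
    totals = defaultdict(float)
    for line in lines:
        if line.strip():
            event = json.loads(line)
            totals[event[key]] += event["cost_usd"]
    return dict(totals)


# In production the stream would be a file opened in append mode.
stream = io.StringIO()
log_tool_call(stream, "task-1", "postgres", "query", 120, 0.004, True)
log_tool_call(stream, "task-1", "sentry", "get_issue", 900, 0.010, True)
log_tool_call(stream, "task-2", "postgres", "query", 95, 0.003, False)

lines = stream.getvalue().splitlines()
per_task = total_cost_by(lines, "task_id")     # the number you optimise
per_server = total_cost_by(lines, "server")    # "which server is eating my budget"
print(per_task)
print(per_server)
```

Grouping the same stream by a different key answers a different question, which is why a plain JSONL log is enough to start with.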

When MCP Makes Things Worse

The honest counter-case. MCP is not a free win; there are configurations where it makes the system slower, more brittle, or more expensive.

Too many servers. Past roughly 6–8 active MCP servers, the agent spends a measurable share of its planning budget on tool selection — "which tool do I want for this step" becomes its own cost. The 40-tool ceiling is an outer limit; the practical ceiling is much lower. Fewer, better-chosen servers beat more servers almost every time.
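Both ceilings are easy to check mechanically once you have a list of the tools each server exposes. This is a sketch only: the inventory below is invented, `check_caps` is a hypothetical helper, and the defaults simply restate the 6–8 server and roughly 40 tool figures from the text.

```python
MAX_SERVERS = 8    # upper end of the practical ceiling suggested above
MAX_TOOLS = 40     # approximate Cursor tool ceiling quoted above


def check_caps(tools_by_server, max_servers=MAX_SERVERS, max_tools=MAX_TOOLS):
    """Return warnings when the server count or total tool count exceeds its cap."""
    warnings = []
    total_tools = sum(len(tools) for tools in tools_by_server.values())
    if len(tools_by_server) > max_servers:
        warnings.append(f"{len(tools_by_server)} servers active, cap is {max_servers}")
    if total_tools > max_tools:
        warnings.append(f"{total_tools} tools active, cap is {max_tools}")
    return warnings


# Hypothetical inventory: server name -> tool names it exposes.
inventory = {
    "postgres": ["query", "list_tables"],
    "linear": ["get_issue", "search_issues"],
    "kitchen_sink": [f"tool_{i}" for i in range(40)],
}
print(check_caps(inventory))
```

Note that three servers are enough to breach the tool ceiling here; one sprawling server can cost more slots than several well-scoped ones.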

Slow servers. One slow MCP server, especially one that the agent calls in the inner loop, poisons the whole task latency. The agent's wall-clock per task ends up dominated by the slowest tool. Profile your servers and either fix the slow ones or drop them.

Server-as-API trap. Not everything that has an API is a good MCP server. The mental shortcut "we have an internal API, let's wrap it as MCP and let the agent call it" produces servers that the agent uses badly, because the API was designed for human / programmatic consumption and not for the shape of context the model actually needs. A good MCP server presents the data the way an agent reads it, with concise descriptions, well-scoped tools, and predictable shapes. A wrapped REST API rarely does.

Production server reliability. When the agent depends on a remote MCP server and that server goes down, every task that touches the server fails. Design the agent loop to degrade gracefully — "this tool is unavailable, do the best you can without it" is a better failure mode than the loop halting.
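Slow servers and unavailable servers can share one wrapper: bound the call, and hand the loop a fallback value instead of an exception. A minimal sketch using Python's standard `concurrent.futures`; the helper `call_with_timeout`, the toy tools, and the fallback message are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared pool, so a stuck call does not block the caller while it finishes.
_POOL = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(tool, args, timeout_s, fallback):
    """Run tool(*args); on timeout or error return fallback instead of blocking."""
    future = _POOL.submit(tool, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # best effort; an already running thread is not interrupted
        return fallback
    except Exception:
        return fallback


def slow_lookup(issue_id):
    time.sleep(0.5)  # simulates a slow remote server
    return f"stack trace for {issue_id}"


def fast_lookup(issue_id):
    return f"stack trace for {issue_id}"


UNAVAILABLE = "tool unavailable, continue without it"
slow_result = call_with_timeout(slow_lookup, ("ISSUE-1",), 0.05, UNAVAILABLE)
fast_result = call_with_timeout(fast_lookup, ("ISSUE-1",), 1.0, UNAVAILABLE)
print(slow_result)
print(fast_result)
```

The timed-out call keeps running in its worker thread until it returns, so this pattern protects the loop's latency rather than the server's load.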

Routing Recommendation

The framework I have settled on after running the stack for the last several weeks:

Install Postgres MCP first. It is the highest-ROI server for almost every team that runs a Postgres-backed product. Read-only mode by default; production-grade timeouts; treat the configuration as infrastructure.
Install one of Linear or Sentry depending on where your team lives. Either ticket-grounding or error-grounding will reduce retry tax materially; both is incrementally better but install one before agonising over the second.
Install Apidog or your spec server only if your work is API-heavy. The before/after on token spend is real, but only realised on API-shaped tasks.
Cap the active server count at 6–8 and prune ruthlessly. Past that ceiling, the agent's tool-selection cost outweighs the marginal benefit. Fewer, better-chosen servers win.
Treat .cursor/mcp.json as infrastructure. Code review, secrets in environment variables, transport choice deliberate, timeouts configured. The configuration is part of your production system.

The interesting engineering work here is not picking MCP servers — it is the discipline around which ones earn their slot, how the rules layer governs their use, and how the observability surfaces the unit economics. Most teams pick a server list and ship; the teams that capture the saving are the ones that prune.

What Has Not Changed

The unchanging caveats:

MCP does not make the model smarter. It makes the model better-informed. The capability ceiling is still Composer 2.5's training; MCP just stops the model wasting that capability on guesses.
Schema validation still matters. The agent can read the real schema through MCP and still misinterpret it. Output validation at the application boundary is non-negotiable; do not skip it just because the input is now grounded.
MCP servers are attack surface. A misconfigured Postgres server with write access is a production-incident-in-waiting. The same security discipline you apply to any privileged service applies here, more loudly than usual.
The protocol is moving. MCP is still settling — transport options, auth patterns, server capabilities are all evolving. Plan for the configuration to drift over the next few quarters and treat the integration as a maintained system rather than a one-time setup.

The Practitioner's Take

The honest read on Composer 2.5 + MCP is that the two are co-designed in spirit even though they were built by different teams. Composer's training distribution expects tool-heavy loops; MCP is the cleanest production interface for feeding those loops. Teams that wire them together capture the cost saving the per-token rate card promises. Teams that ship Composer 2.5 without MCP discover at month-end that the per-token bill dropped by 80% but the per-task cost barely moved, because the retry tax ate the saving.

The right investment over the next quarter is not "switch to Composer 2.5 and check the bill." It is "wire Composer 2.5 to the three MCP servers that ground the agent in your actual systems, treat the configuration as infrastructure, and watch the cost-per-completed-task come down because the model stops guessing."

The interesting engineering work is the routing layer that picks tools for the model, the rules layer that disciplines their use, and the observability that surfaces the unit economics. The benchmarks tell you Composer is capable. MCP is what lets that capability survive contact with production.