Composer 2.5 in Cursor 3: Parallel Agent Workflows at a Tenth of the Cost

Composer 2.5 and Cursor 3 shipped within twenty-four hours of each other last week, and the combination is doing more for my daily output than either one would alone. Composer 2.5 collapses the per-task cost of coding agents to under a dollar; Cursor 3's Agents Window collapses the friction of running many agents in parallel. Together they make a specific workflow — four-track parallel work with a model mix — economically reasonable for the first time.

This is the practical guide to the combination. For the underlying release detail, Composer 2.5 in practice and the Cursor 3 agents-window guide are the references. This piece is about the workflow patterns that the two unlock together.

Why Composer 2.5 + Cursor 3 Is the Combination

The intersection that matters: the IDE shifted from "one agent per chat, file-tree at the centre" to "many agents in parallel, agents at the centre" — and the model shifted from "frontier-class but expensive" to "frontier-adjacent and roughly ten times cheaper." Either change alone would be a meaningful productivity bump. Together they change which workflows are economically rational.

Pre-Cursor-3, running four agents in parallel on the same task across different models was technically possible but operationally painful — multiple terminals, multiple sessions, manual reconciliation. Pre-Composer-2.5, running four agents in parallel was economically irrational at any meaningful volume — the per-task cost on the frontier models meant four parallel runs cost four times the price of one. Both barriers are gone now.

The four-track workflow is the result. One agent on the feature you're actually shipping. One on the bug fix your teammate flagged. One on test coverage for a recently-touched module. One investigating something exploratory — refactor possibility, library upgrade, dependency audit. All running in parallel, mostly on Composer 2.5, with frontier models escalated for the specific steps where they matter.

The /best-of-n Pattern, Recalibrated

/best-of-n is Cursor 3's most interesting command: it runs the same task in parallel across multiple models, each in its own isolated worktree, and lets you compare the outcomes side by side. Pre-Composer-2.5, the cost of running three frontier models in parallel was rarely justifiable. With Composer 2.5 in the mix, the math changes.

When to include Composer 2.5 in the mix

For most coding tasks, the right /best-of-n mix is now Composer 2.5 (standard tier) plus one or two frontier models. The Composer 2.5 slot costs roughly ten percent of what a frontier slot costs, so adding it doesn't materially change the total. It does add one more perspective on the task, and on the bulk of coding work I've watched it produce the winning solution often enough to be worth the slot.

A typical mix:

/best-of-n "Extract the billing logic into a service.
Preserve the public API."

Models: composer-2-5 (standard), claude-opus-4-7, gpt-5-5

Three approaches in parallel. Pick the cleanest. Discard the other two worktrees. Total cost is only about five percent more than running Opus and GPT alone, because the Composer 2.5 slot costs roughly a tenth of a frontier slot.

When to make Composer 2.5 the default

For routine work — bug fixes inside a known subsystem, refactors of a single module, test additions to a recently-touched file — running /best-of-n with three Composer 2.5 instances (different temperature, slight prompt variation) is now the economically rational default. Three diverse attempts at $0.50 each is a dollar fifty for a meaningfully better outcome than a single $0.50 run. The total still beats what one frontier-model run would have cost.

This is the pattern that didn't exist before. "Run the same task three times and pick the winner" was viable in research but never in production. Composer 2.5's pricing changes that.

Mixing tiers across worktrees

A specific recipe I've started using on tricky tasks: run /best-of-n with one Composer 2.5 (standard, cheap), one Composer 2.5 (fast tier, faster wall-clock), and one frontier model. The first two cost ~$0.50 and ~$3 respectively; the frontier slot costs ~$7. Total about $10.50. What I get back is fast-to-first-output (the Composer 2.5 fast tier), thorough-but-slow (the Composer 2.5 standard tier), and top-of-distribution (the frontier slot). I can start reviewing as soon as the fast slot completes and dig into the others as they finish.
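The arithmetic behind these mixes is simple enough to sketch. The snippet below is a rough estimator, assuming the approximate per-task prices quoted in this article; the slot labels are hypothetical names for illustration, not Cursor model identifiers.

```python
# Approximate per-task prices quoted in this article (USD). Estimates only.
# The keys are hypothetical labels, not real Cursor model identifiers.
SLOT_COST = {
    "composer-standard": 0.50,
    "composer-fast": 3.00,
    "frontier": 7.00,
}


def mix_cost(slots):
    """Sum the estimated cost of one /best-of-n run over the given slots."""
    return sum(SLOT_COST[slot] for slot in slots)


# The mixed-tier recipe: one standard, one fast, one frontier slot.
mixed_tiers = mix_cost(["composer-standard", "composer-fast", "frontier"])

# The routine-work default: three Composer 2.5 standard slots.
composer_only = mix_cost(["composer-standard"] * 3)

print(f"mixed tiers:   ${mixed_tiers:.2f}")
print(f"composer only: ${composer_only:.2f}")
```

Swapping in your own observed prices is the point; the structure matters more than these particular numbers.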

This is roughly the HTML-as-plan workflow I've written about, extended into the parallel-agents domain. The plan is shared; the implementations diverge; the review happens against the diff between them.

Parallel Agents at Scale: the Four-Track Workflow

The pattern that has become my default since both products shipped:

Track 1: the feature I'm shipping. Composer 2.5 standard tier in a worktree. The main work. I check in on it every fifteen minutes; the rest of the time it's running.
Track 2: a bug fix. Composer 2.5 standard tier in a different worktree. Lower priority, fire-and-forget — when it comes back with a PR, I review and either merge or send it back for revision.
Track 3: test coverage on a recently-touched module. Composer 2.5 standard, even lower priority. I look at the output once a day.
Track 4: exploratory work. Refactor possibility, library upgrade evaluation, dependency audit. Composer 2.5 standard, frontier escalation for hard sub-steps. I look at this when I have spare attention.

All four worktrees are git-isolated, so they can't interfere with each other. All four agents run on Composer 2.5 standard by default, with escalation to a frontier model for specific sub-steps that need it. Daily inference spend across the four tracks lands around $5–$10 — which would have been $50–$100 a month ago.
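For readers who haven't used worktrees directly, here is a minimal sketch of what "git-isolated" means at the git level. Cursor manages its own worktrees, so this is not what the IDE runs internally; the track names, branch prefix, and directory layout are assumptions made up for the example. The only real interface used is the standard `git worktree add -b <branch> <path> <base>` form.

```python
import shlex

# Hypothetical track names for the four-track workflow.
TRACKS = ["feature", "bugfix", "tests", "explore"]


def worktree_command(track, base="main", root="../worktrees"):
    """Build the `git worktree add` command for one track's isolated checkout.

    Each track gets its own branch and its own directory, so edits made
    in one track never touch the files of another.
    """
    branch = f"agent/{track}"  # assumed branch naming scheme
    path = f"{root}/{track}"   # assumed directory layout
    return ["git", "worktree", "add", "-b", branch, path, base]


# Print the commands instead of running them, so nothing changes on disk.
for track in TRACKS:
    print(shlex.join(worktree_command(track)))
```

Printing rather than executing keeps the sketch safe to run anywhere; passing each list to `subprocess.run` inside a repository would create the checkouts.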

The reason this is more than just a cost optimisation: the four-track pattern represents a structural shift in how much speculative work I'm running at once. Tracks 2, 3, and 4 are things I would never have done before because they weren't worth the inference bill. Now they are. The compounding effect (bugs fixed faster, test coverage rising without explicit effort, exploratory refactors landing as PRs without me writing them) should become meaningful within a month.

This is roughly the Level 4 / Level 5 pattern from my five-levels-of-Claude framework, with the parallel-agents bottleneck eliminated. Cost was the constraint; cost is no longer the constraint.

Pairing With the HTML Plan Format

The single highest-leverage pairing in this combination is using the HTML-plan format I've written about as input to the parallel agents. The pattern:

Generate an HTML plan for the task — mockups, file system layout, code excerpts, decision rules. Costs a few cents.
Pass the same HTML plan to all four parallel agents (or all three /best-of-n slots). Each agent reads the same artifact and works from the same shared understanding.
The diffs between the agents' implementations are now comparable. They're solving the same problem under the same constraints with the same context. Where they diverge is informative.

This was technically possible before. It was not economically reasonable. One frontier-model run with a long HTML plan as context cost real money; four such runs in parallel cost four times as much. Composer 2.5 makes the experiment trivially cheap.
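One way to make "where they diverge is informative" concrete is to score how similar each pair of implementations is before reading the diffs. The sketch below uses Python's standard `difflib`; the agent names and the toy sources are placeholders, and a line-level similarity ratio is only a crude first signal, not a substitute for review.

```python
import difflib
from itertools import combinations


def pairwise_similarity(implementations):
    """Return {(name_a, name_b): ratio} for every pair of implementations.

    `implementations` maps an agent name to the source text it produced.
    The ratio is difflib's line-level similarity, from 0.0 to 1.0.
    """
    scores = {}
    for (a, src_a), (b, src_b) in combinations(sorted(implementations.items()), 2):
        matcher = difflib.SequenceMatcher(None, src_a.splitlines(), src_b.splitlines())
        scores[(a, b)] = matcher.ratio()
    return scores


# Toy outputs standing in for two agents' versions of the same function.
outputs = {
    "composer": "def total(xs):\n    return sum(xs)\n",
    "frontier": "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n",
}

for pair, ratio in pairwise_similarity(outputs).items():
    print(pair, f"{ratio:.2f}")
```

Pairs with a low ratio are the ones worth reading first: the agents started from the same plan and still made different choices.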

What to Watch

Two specific risks worth knowing about before the workflow becomes muscle memory:

Per-second billing on long-running agents

Cloud agents in Cursor 3 are billed per second of runtime. Composer 2.5's per-token cost is low; the per-second cost on long agent loops is not. A four-track workflow where each track runs for hours per day adds up faster than the token math suggests. Watch the daily spend for the first week and recalibrate which tracks belong in the cloud versus local.
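To see why runtime can outweigh tokens, it helps to put both terms in one formula. The sketch below is a back-of-envelope estimator only: the per-second rate is a placeholder I invented for illustration, not Cursor's actual price, and the per-track token costs and runtimes are made-up inputs.

```python
SECONDS_PER_HOUR = 3600


def track_daily_cost(token_cost_usd, runtime_hours, rate_per_second_usd):
    """Token spend plus runtime spend for one track over one day."""
    runtime_cost = runtime_hours * SECONDS_PER_HOUR * rate_per_second_usd
    return token_cost_usd + runtime_cost


# PLACEHOLDER rate, not Cursor's actual price. Substitute the figure from your bill.
HYPOTHETICAL_RATE = 0.001  # USD per second of cloud-agent runtime

# (token cost in USD, hours of runtime per day) for each track. Illustrative only.
tracks = {
    "feature": (2.50, 3.0),
    "bugfix": (1.00, 1.0),
    "tests": (0.50, 0.5),
    "explore": (1.00, 2.0),
}

total = sum(track_daily_cost(tokens, hours, HYPOTHETICAL_RATE) for tokens, hours in tracks.values())
print(f"estimated daily total: ${total:.2f}")
```

With these invented inputs the runtime term is several times the token term, which is the shape of the risk: hours of agent runtime scale the bill even when the per-token price is low.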

Harness lock-in

Composer 2.5 is only available inside Cursor's harness. The four-track workflow described above is Cursor-shaped — it depends on Cursor 3's Agents Window, on Cursor's worktree management, and on Composer 2.5 being available where the IDE expects it. If your team's medium-term roadmap involves multi-IDE workflows or provider-neutral coding agents, this workflow doesn't carry across cleanly. Use it knowing that's the trade.

What's Not Changed

The unchanging caveats:

The model still hallucinates. Running four parallel agents doesn't eliminate hallucination; it gives you three more places it might happen. Review discipline is still load-bearing.
Plan and prompt discipline still matter. A vague plan produces four vague outputs in parallel. The HTML-plan workflow exists precisely because vague specs were the bottleneck.
Daily spend visibility matters more, not less. When four tracks are running, the bill compounds faster than intuition suggests. Per-day cost monitoring is non-optional at this workflow's level of parallelism.
Multi-provider risk is unchanged. A Cursor-shaped workflow on a Cursor-only model is a single-vendor bet. Keep your architecture flexible enough to fall back to a different harness if you need to.
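The daily-spend caveat above is easy to automate in a few lines. This is a minimal sketch, assuming you can export spend as (day, track, dollars) rows; that row format, the track names, and the $10 default budget are my assumptions for the example, not anything Cursor provides.

```python
from collections import defaultdict


def spend_by_day(entries):
    """Aggregate (day, track, usd) rows into a {day: total_usd} mapping."""
    totals = defaultdict(float)
    for day, _track, usd in entries:
        totals[day] += usd
    return dict(totals)


def over_budget(entries, daily_budget_usd=10.0):
    """Return the days, in sorted order, whose total spend exceeds the budget."""
    totals = spend_by_day(entries)
    return sorted(day for day, total in totals.items() if total > daily_budget_usd)


# Hypothetical spend log covering two days of the four-track workflow.
log = [
    ("day-1", "feature", 4.0),
    ("day-1", "bugfix", 2.5),
    ("day-1", "tests", 1.0),
    ("day-2", "feature", 6.0),
    ("day-2", "explore", 5.5),
]

print(over_budget(log))
```

Running a check like this once a day is enough to catch a track that has quietly started costing more than it is worth.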

The Practitioner's Take

The Composer-2.5-plus-Cursor-3 combination is the most consequential dev-workflow change I've made this year. Not because either product is individually transformative — both are good, neither is revolutionary on its own — but because the intersection of cheap-but-good model and parallel-agents-friendly IDE unlocks a specific workflow that was previously economically irrational.

The teams that internalise this workflow first capture a meaningful productivity gap over teams that adopt one of the two products without the other. The four-track parallel pattern, the /best-of-n Composer-only variant, and the HTML-plan-plus-mixed-models pairing are the patterns that compound. A week into running them, I can't imagine going back to single-threaded coding with one frontier model.

The window where this combination is novel is the window where adopting it confers an outsized advantage. That window is now.