
Google's Managed Agents API: A Production Agent in One Call

Google's Managed Agents API gives you a sandboxed Linux agent with tool use and code execution in one call. Here is what that changes about the build-vs-rent calculus.

Domain: Software
Format: essay
Published: 20 May 2026
Tags: google · gemini · managed-agents

The most consequential developer announcement from I/O isn't a new model. It's a new primitive: the Managed Agents API. One API call spins up a fully provisioned agent — sandboxed Linux environment, tool use, code execution, persistent context — running on Gemini 3.5 Flash. Google has compressed what used to be weeks of infrastructure work into a single endpoint.

That changes the build-vs-rent calculus on every agentic feature I have shipped in the last year. This is the practitioner read on what the primitive actually does, what it's competing with, and where I'd use it versus where I'd keep rolling my own.

What Managed Agents Does in One Call

The pitch is exactly what it sounds like. With one API call you get:

  • A remote Linux sandbox with filesystem access, package install, and persistent state across the run.
  • A tool-use runtime that can call any tool you register, plus a default cast for common operations (file I/O, HTTP, code execution).
  • Code execution inside the sandbox, with the agent reading stdout/stderr and iterating until tasks pass.
  • A default agent harness running on Gemini 3.5 Flash, which posts 76.2% on Terminal-Bench and 1656 Elo on agentic evals.
  • No infrastructure to provision yourself. No sandbox setup, no container orchestration, no credential management for tool access, no per-run cleanup.

In a typical request you pass the agent a prompt and a set of tools. The harness handles the rest — provisioning the sandbox, executing tool calls, capturing intermediate state, returning the final output. You pay per second of agent runtime and per token of model usage, billed together.
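
To make "a prompt and a set of tools" concrete, here is a minimal sketch of what such a request body could look like. Everything in it is an assumption for illustration: the `ToolSpec` and `AgentRunRequest` classes, every field name, and the `max_runtime_seconds` cap are hypothetical and are not taken from Google's actual schema or SDK.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ToolSpec:
    """A tool the agent may call. Field names are illustrative only."""
    name: str
    description: str
    parameters: dict  # JSON-Schema-style description of the arguments


@dataclass
class AgentRunRequest:
    """Hypothetical request body: a prompt plus the registered tools."""
    prompt: str
    tools: list[ToolSpec] = field(default_factory=list)
    max_runtime_seconds: int = 600  # assumed knob for capping billed runtime


def build_payload(request: AgentRunRequest) -> str:
    """Serialize the request to JSON, as it might be sent over HTTP."""
    return json.dumps(asdict(request), indent=2)


request = AgentRunRequest(
    prompt="Clone the repo, run the test suite, and summarise any failures.",
    tools=[
        ToolSpec(
            name="post_summary",
            description="Post a text summary to the team channel.",
            parameters={
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        )
    ],
)
payload = build_payload(request)
print(payload)
```

The point of the sketch is how little the caller specifies: intent and tools. Sandbox provisioning, tool execution, and cleanup do not appear anywhere in the request.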

The reason this matters: it removes the part of agent development that was actually hard. Defining intent, choosing tools, validating outputs — those are interesting problems. Provisioning ephemeral Linux environments, masking credentials, isolating sandbox state per run — those are infrastructure problems that every team kept solving from scratch.

What It's Competing With

Managed Agents lands in a market that already has direct alternatives. The honest comparison:

Capability                   | Managed Agents          | Claude Agent SDK      | OpenAI Assistants
One-call provisioning        |                         | Self-host or hosted   |
Default Linux sandbox        |                         | Build your own        | Limited (code interpreter only)
Tool registration model      | Open                    | Open                  | Function calling
Default model                | Gemini 3.5 Flash        | Claude family         | GPT family
SDK in Python/TypeScript     | ✓ (via Antigravity SDK) |                       |
Multi-model routing          | Single provider         | Hosted-vs-self choice | Single provider
Persistent state across runs |                         | Build your own        | Threads

The interesting part of this table isn't the row-by-row comparison — all three platforms are now broadly capable. It's that the unit of abstraction differs. Managed Agents sells you an agent. Claude Agent SDK sells you the building blocks for an agent. OpenAI Assistants sells you a conversation-shaped wrapper. The right choice depends on how much of the harness you want to own.

Cost and Trust Trade-Offs

Three things to be honest about before you commit production traffic to this primitive.

The single-vendor problem

The Managed Agents API runs on Google's infrastructure with a Google model by default. The simplicity that makes the one-call provisioning compelling also means your agent's behaviour, latency, and availability are now coupled to one provider. The tool-use architectures I've written about for SMB workflows generally route across multiple providers for exactly this reason. Single-vendor agentic primitives are convenient. Multi-vendor architectures are resilient. Pick deliberately.
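
A minimal sketch of what "route across multiple providers" can mean in practice: try the managed path first and fall back to another runner if it fails. The runner functions here are stubs I made up for the example; in a real system each would wrap an actual provider call.

```python
from typing import Callable, Sequence

# A runner takes a prompt and returns the agent's final output.
AgentRunner = Callable[[str], str]


class AllProvidersFailed(Exception):
    """Raised when every configured runner has failed."""


def run_with_fallback(
    prompt: str, runners: Sequence[tuple[str, AgentRunner]]
) -> tuple[str, str]:
    """Try each runner in order; return (provider_name, output) from the first success."""
    errors = []
    for name, runner in runners:
        try:
            return name, runner(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub runners standing in for real provider integrations (hypothetical).
def managed_agent_stub(prompt: str) -> str:
    raise TimeoutError("simulated provider outage")


def self_hosted_stub(prompt: str) -> str:
    return f"handled: {prompt}"


provider, result = run_with_fallback(
    "summarise the build log",
    [("managed", managed_agent_stub), ("self_hosted", self_hosted_stub)],
)
print(provider, result)
```

The ordering of the list is the policy: the convenient path goes first, the resilient path stays wired in behind it.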

Per-second billing changes the cost calculus

A sandbox that lives for the duration of a run is billed accordingly. Long-horizon tasks that previously cost only inference tokens now also cost runtime. Cheap when the agent finishes in two minutes; less cheap when it loops for thirty. The cost model rewards short, focused agents and penalises sprawling ones — which is probably the right incentive, but worth knowing before the first invoice.
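
The arithmetic is worth doing before the first invoice rather than after. The sketch below compares a two-minute run with a thirty-minute one under the runtime-plus-tokens model described above. Both rates and both token counts are placeholders I invented for the example, not Google's pricing; substitute the published numbers.

```python
def estimate_run_cost(
    runtime_seconds: float,
    tokens: int,
    rate_per_second: float,
    rate_per_million_tokens: float,
) -> float:
    """Total cost of one run: sandbox runtime plus model tokens."""
    runtime_cost = runtime_seconds * rate_per_second
    token_cost = tokens / 1_000_000 * rate_per_million_tokens
    return runtime_cost + token_cost


# Placeholder rates, NOT real pricing.
RATE_PER_SECOND = 0.0005
RATE_PER_MILLION_TOKENS = 1.00

# A focused run: two minutes, modest token usage (assumed figures).
short_run = estimate_run_cost(120, 200_000, RATE_PER_SECOND, RATE_PER_MILLION_TOKENS)

# A looping run: thirty minutes, and more tokens because it keeps iterating.
long_run = estimate_run_cost(1800, 3_000_000, RATE_PER_SECOND, RATE_PER_MILLION_TOKENS)

print(f"short run: ${short_run:.2f}")
print(f"long run:  ${long_run:.2f}")
print(f"ratio:     {long_run / short_run:.1f}x")
```

A looping agent pays twice, once for the extra seconds and once for the extra tokens it burns while looping, which is why a hard runtime cap is a sensible default.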

Trust and sandbox isolation

A managed sandbox is convenient. It also means I'm trusting Google's sandboxing rather than mine. For the bulk of workloads — internal tools, content pipelines, customer-facing agents that don't touch sensitive systems — this is fine. For high-stakes work (financial transactions, healthcare data, anything where a bad agent action has lasting consequences) the calculus is different. Most of the production failure modes I've written about for AI agents still apply — the harness reduces failure surface, it doesn't eliminate it.

When to Use the Managed Primitive vs Roll Your Own

The decision rule I'm settling on, after a week of running both side by side on small workloads:

Use Managed Agents when

  • The agent is short-lived. Sub-fifteen-minute runs are where the per-second billing is generous.
  • You don't already have agent infrastructure. Greenfield agentic feature, no in-house sandbox layer to integrate with — start here.
  • You want to ship in days, not weeks. Time-to-first-working-agent is the metric this primitive optimises for. Use it.
  • The workload is non-sensitive. Internal tooling, content generation, batch processing, automation glue. Anywhere the worst case is a wasted run, not a security incident.

Roll your own (or use Claude Agent SDK with self-hosted infrastructure) when

  • You're operating across multiple providers. A primitive that only runs Google models is the wrong abstraction for a multi-vendor stack.
  • Per-run cost will be the dominant line item. High-volume, sustained workloads where the runtime billing compounds faster than you'd like.
  • You need bespoke sandbox isolation. Regulated industries, sensitive data, anywhere your security team's review will not accept "Google's sandbox is fine."
  • The agent is long-running by design. Multi-hour autonomous loops where per-second billing crosses the threshold of inconvenient.
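
The two lists above collapse into a small function, which is a useful thing to check a new feature against. This is a rough sketch of my own rule, not an official rubric: the `Workload` fields are names I chose, and treating anything over fifteen minutes as "long" is a deliberately crude threshold you should tune.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    expected_runtime_minutes: float
    handles_sensitive_data: bool
    needs_multiple_providers: bool
    runtime_cost_dominates: bool


def choose_path(
    workload: Workload, short_run_minutes: float = 15.0
) -> tuple[str, list[str]]:
    """Return ("managed" or "roll_your_own", reasons for the choice)."""
    reasons = []
    if workload.needs_multiple_providers:
        reasons.append("multi-provider stack")
    if workload.runtime_cost_dominates:
        reasons.append("runtime billing would dominate cost")
    if workload.handles_sensitive_data:
        reasons.append("needs bespoke sandbox isolation")
    if workload.expected_runtime_minutes >= short_run_minutes:
        reasons.append("run is longer than the short-run threshold")
    if reasons:
        return "roll_your_own", reasons
    return "managed", ["short, non-sensitive, single-provider workload"]


# Example workloads (invented for illustration).
automation_glue = Workload(5, False, False, False)
overnight_loop = Workload(240, True, False, True)

print(choose_path(automation_glue))
print(choose_path(overnight_loop))
```

Any single disqualifier is enough to push a workload off the managed path; returning the reasons keeps the decision reviewable later.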

Worth running both

For most teams, the right move is to use Managed Agents for the agentic features you'd otherwise not ship, and to keep the existing self-hosted or Claude-Agent-SDK-based agents for the cases where the trade-offs go the other way. The two paths aren't mutually exclusive.

What Hasn't Changed

The unchanging caveats:

  • You still need evals. A managed primitive doesn't reduce the need to validate that your agent does what it says it does on your specific workload.
  • Prompt and tool design still matter. The default tool cast is a starting point, not a finished system. Bad tools produce bad agents regardless of harness.
  • Multi-provider risk is unchanged. A convenient primitive is not a reason to remove your routing layer or your fallback path.
  • Closed-loop discipline still applies. Agentic features that don't capture outcomes for downstream improvement plateau in quality fast — the closed-loop framework I've written about applies whether the underlying harness is managed or self-hosted.
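
The evals and closed-loop points are harness-agnostic, and a sketch shows why: the eval loop only needs a function from prompt to output. The `EvalCase` type, the stub agent, and the two cases below are invented for the example; a real suite would call the managed or self-hosted agent and use checks specific to your workload.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def run_evals(
    agent: Callable[[str], str], cases: list[EvalCase]
) -> tuple[float, list[dict]]:
    """Run every case; return the pass rate and per-case records."""
    records = []
    for case in cases:
        output = agent(case.prompt)
        records.append(
            {"prompt": case.prompt, "output": output, "passed": case.check(output)}
        )
    passed = sum(1 for record in records if record["passed"])
    rate = passed / len(records) if records else 0.0
    return rate, records


# Stub agent standing in for a real managed or self-hosted agent call.
def stub_agent(prompt: str) -> str:
    return prompt.upper()


cases = [
    EvalCase("hello", lambda out: out == "HELLO"),
    EvalCase("abc", lambda out: out == "xyz"),  # deliberately failing case
]
rate, records = run_evals(stub_agent, cases)
print(f"pass rate: {rate:.0%}")
```

Keeping the per-case records, not just the pass rate, is what makes the loop closed: the failures are the input to the next round of prompt and tool changes.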

The Practitioner's Take

The Managed Agents API is the kind of release that doesn't generate headlines but quietly resets the build-vs-rent default for an entire category of features. A year ago, "agent that runs in a Linux sandbox" was a multi-week infrastructure project. Today it's an API call.

That changes which agentic features are worth shipping. The ones I deferred because the infrastructure cost was unjustifiable at the expected usage volume are now back on the roadmap. The ones I shipped on bespoke infrastructure are not getting rebuilt — but the next one I would have built that way is probably going to start here and migrate later if the workload demands it.

This is the move Google should have made twelve months ago, and the move that the entire frontier-model market has been heading toward. Whoever ships the simplest, most reliable "one-call agent" wins a meaningful share of the agentic-features market — not because the model is better, but because the time-to-first-working-agent is lower. Google has just made that case for Gemini 3.5 Flash. Expect Anthropic and OpenAI to match within the quarter.