The Claude Agent SDK is Anthropic's higher-level layer for building agents: it handles the loop, tool execution, session management, and the surrounding infrastructure you'd otherwise build yourself. Combined with Opus 4.7's improvements in long-running stability and 1M context, it's the most productive path to a production agent today.
This is the production playbook: the patterns that hold up under real usage.
What the SDK Actually Gives You
The raw Messages API gives you a model and a tool-calling protocol. Everything else (looping, error handling, streaming, session persistence, parallelism) is yours to build. The Agent SDK provides:
- The agent loop, with sensible defaults for stop conditions and error recovery.
- Session management, so an agent can resume a conversation cleanly.
- A tool registration system that handles both your custom tools and built-in capabilities.
- Streaming and event hooks for observability and UX.
- Sub-agent composition, so you can combine specialised agents into a larger system.
You'd build all of this anyway for any non-trivial agent. Skipping that work and going straight to the agent's actual behaviour is the productivity win.
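For a sense of scale, here is a stripped-down version of just the loop, with hypothetical `model` and `tools` stand-ins. This is illustrative, not the SDK's internals:

```typescript
// A minimal, synchronous sketch of the agent loop the SDK implements for you.
// `Model` and `Tools` are hypothetical stand-ins, not SDK types.
type ToolCall = { name: string; input: unknown };
type ModelTurn = { text?: string; toolCall?: ToolCall };
type Model = (history: string[]) => ModelTurn;
type Tools = Record<string, (input: unknown) => unknown>;

function runLoop(model: Model, tools: Tools, userMessage: string, maxIterations = 12): string {
  const history = [`user: ${userMessage}`];
  for (let i = 0; i < maxIterations; i++) {
    const turn = model(history);
    if (turn.toolCall) {
      // Execute the requested tool and feed the result back to the model.
      const result = tools[turn.toolCall.name](turn.toolCall.input);
      history.push(`tool(${turn.toolCall.name}): ${JSON.stringify(result)}`);
      continue;
    }
    return turn.text ?? ""; // plain text means the agent is done
  }
  throw new Error("maxIterations exceeded");
}
```

Even this toy version omits retries, streaming, cost accounting, and persistence; that is the work the SDK absorbs.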
Minimum Viable Setup
The shape of an agent in the SDK:
```ts
import { ClaudeAgent, Tool } from "@anthropic-ai/claude-agent-sdk";

const searchOrders: Tool = {
  name: "search_orders",
  description: "Search past orders by customer email or date range.",
  inputSchema: {
    type: "object",
    properties: {
      email: { type: "string" },
      from: { type: "string", format: "date" },
      to: { type: "string", format: "date" },
    },
  },
  execute: async ({ email, from, to }) => {
    return await db.orders.findMany({ where: { email, date: { gte: from, lte: to } } });
  },
};
```
```ts
const agent = new ClaudeAgent({
  model: "claude-opus-4-7",
  systemPrompt: `You are a customer support agent for Acme.
Use the available tools to look up information before answering.
Cite the order ID when discussing past orders.`,
  tools: [searchOrders, getOrderDetails, refundOrder],
  maxIterations: 12,
  costBudgetUsd: 0.50,
});

const result = await agent.run({
  message: "Find my last order under jane@example.com and tell me the status",
});
```
Three production defaults to set on day one:
- `maxIterations`: a hard cap on agent loops. 5-15 is typical for most use cases.
- `costBudgetUsd`: terminates the run if cumulative cost exceeds the budget. Critical for any agent exposed to user input.
- `model: "claude-opus-4-7"`: the right default for production agents in 2026. Drop to Sonnet only if you've measured equivalent quality on your task.
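Of the three, the cost budget is the simplest to reason about: track cumulative spend per run and stop once it's exhausted. A standalone sketch of the accounting (the SDK does this internally; the class here is illustrative):

```typescript
// Illustrative cost guard, mirroring what a costBudgetUsd setting enforces.
class CostBudget {
  private spentUsd = 0;

  constructor(private budgetUsd: number) {}

  // Record a completed call's cost; returns false once the budget is exhausted.
  record(callCostUsd: number): boolean {
    this.spentUsd += callCostUsd;
    return this.spentUsd <= this.budgetUsd;
  }

  get remainingUsd(): number {
    return Math.max(0, this.budgetUsd - this.spentUsd);
  }
}
```

The useful property is that the check happens per call, so a runaway loop stops within one iteration of crossing the budget rather than at the end of the run.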
System Prompt Discipline
The system prompt is the agent's contract. Three rules I enforce on every codebase:
1. Define the agent's identity, scope, and boundaries explicitly.
```
You are a customer support agent for Acme Cloud.
Scope: You can look up customer orders, billing history, and account status.
You can issue refunds up to $500 with proper documentation.
Out of scope: Account password resets (direct to /reset). Legal questions
(direct to legal@acme.com). Sales questions (direct to sales).
Tone: Professional, concise. Match the customer's level of formality.
Never invent policies. If unsure, say so and escalate.
```
2. Specify how the agent should handle uncertainty. The default failure mode of a confident model is confident hallucination. Counter it explicitly: "If you don't have enough information, ask the user. Don't guess."
3. Document the tools at the system level, not just in tool descriptions. "When the user asks about a past order, use search_orders first to confirm it exists, then get_order_details for specifics." This guides the model through your intended workflow.
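One way to keep that contract honest is to assemble the prompt from named sections, so a missing piece is a type error rather than a silent omission. A hypothetical helper (not an SDK feature):

```typescript
// Hypothetical prompt builder: each part of the contract is a required field,
// so identity, scope, and uncertainty handling can't be dropped by accident.
interface PromptContract {
  identity: string;
  scope: string[];
  outOfScope: string[];
  uncertainty: string;
  toolGuidance: string[];
}

function buildSystemPrompt(c: PromptContract): string {
  return [
    c.identity,
    `Scope: ${c.scope.join(" ")}`,
    `Out of scope: ${c.outOfScope.join(" ")}`,
    c.uncertainty,
    ...c.toolGuidance,
  ].join("\n");
}
```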
Tool Design Within the SDK
The SDK's Tool interface lets you focus on behaviour. Two patterns worth adopting:
1. Tool composition. A complex tool can call other tools internally:
```ts
const issueRefund: Tool = {
  name: "issue_refund",
  description: "Issue a refund on an order.",
  inputSchema: { /* ... */ },
  execute: async ({ orderId, amount, reason }, { agent }) => {
    const order = await agent.callTool("get_order_details", { orderId });
    if (amount > order.total) throw new Error("Refund exceeds order total");
    return await paymentService.refund({ orderId, amount, reason });
  },
};
```
2. Side-effect annotations for guardrails.
```ts
const deleteAccount: Tool = {
  name: "delete_account",
  description: "Permanently delete a user account.",
  destructive: true,      // SDK can require an explicit confirmation flow
  requiresApproval: true, // SDK pauses for human approval before executing
  // ...
};
```
These hooks let the SDK enforce safety constraints at the framework level rather than requiring discipline in every prompt.
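At the framework level, that guard reduces to a check before execution. A self-contained sketch of the pattern (the shapes here are illustrative, not the SDK's actual types):

```typescript
// Sketch of a framework-level guard: tools flagged requiresApproval only
// execute after an approval callback says yes.
interface GuardedTool {
  name: string;
  requiresApproval?: boolean;
  execute: (input: unknown) => unknown;
}

function runTool(
  tool: GuardedTool,
  input: unknown,
  approve: (toolName: string, input: unknown) => boolean, // e.g. a human-in-the-loop prompt
): unknown {
  if (tool.requiresApproval && !approve(tool.name, input)) {
    throw new Error(`Tool ${tool.name} rejected by approver`);
  }
  return tool.execute(input);
}
```

Because the check lives in the tool runner rather than the prompt, a prompt injection can't talk its way past it.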
Sessions and Resumption
For agents that have multi-turn conversations with users, the SDK's session model handles persistence:
```ts
const session = await agent.startSession({ userId: "user_123" });

const r1 = await session.send("What's the status of my last order?");

// User goes away for 30 minutes

const r2 = await session.send("Great. Can you ship the next one to a different address?");
```
The session preserves the full message history and tool call results across requests. With Opus 4.7's 1M context, sessions can run very long without manual compression.
For very long sessions, configure compression strategies:
```ts
const agent = new ClaudeAgent({
  model: "claude-opus-4-7",
  sessionPolicy: {
    maxContextTokens: 800_000,
    onOverflow: "summarize-old-tool-results",
  },
});
```
Three strategies work well in practice:
- `summarize-old-tool-results`: replaces large tool-result blocks with summaries when context approaches capacity. Cheapest; preserves reasoning history.
- `truncate-oldest`: drops the earliest messages. Simple, but loses conversation context.
- `summarize-then-restart`: produces a "session so far" summary and restarts. Best for very long sessions where most early context is no longer relevant.
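The first strategy, summarizing old tool results, is mechanical enough to sketch. Assuming each message carries a rough token count, walk oldest-first and shrink bulky tool results until the total fits (illustrative; the `summarize` function is a hypothetical stand-in for a cheap model call):

```typescript
// Sketch of summarize-old-tool-results: replace bulky tool results, oldest
// first, with summaries until the running total fits the token budget.
interface Msg {
  role: "user" | "assistant" | "tool";
  content: string;
  tokens: number;
}

function compress(
  msgs: Msg[],
  maxTokens: number,
  summarize: (content: string) => string, // hypothetical: would call a cheap model
): Msg[] {
  let total = msgs.reduce((n, m) => n + m.tokens, 0);
  return msgs.map((m) => {
    if (total <= maxTokens || m.role !== "tool") return m; // stop once we fit
    const summary = summarize(m.content);
    const summarized: Msg = { ...m, content: summary, tokens: Math.ceil(summary.length / 4) };
    total -= m.tokens - summarized.tokens;
    return summarized;
  });
}
```

User and assistant turns are left intact, which is why this strategy preserves the conversational thread better than truncation.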
Sub-Agents and Composition
The SDK supports composing specialised agents. The pattern that works:
```ts
const researchAgent = new ClaudeAgent({
  model: "claude-sonnet-4-6",
  systemPrompt: "You research topics. Return a structured summary.",
  tools: [webSearch, readUrl],
});

const writerAgent = new ClaudeAgent({
  model: "claude-opus-4-7",
  systemPrompt: "You write essays from research summaries.",
  tools: [],
});

const coordinator = new ClaudeAgent({
  model: "claude-opus-4-7",
  systemPrompt: `You coordinate research and writing tasks.
First, dispatch research. Then, dispatch writing with the research output.`,
  tools: [
    researchAgent.asTool({ name: "research", description: "Run a research task" }),
    writerAgent.asTool({ name: "write_essay", description: "Write an essay from research output" }),
  ],
});
```
The discipline: only split agents when they have genuinely different prompts, models, or permissions. A "planner agent" + "executor agent" with the same tools and context just adds latency. A research agent on Sonnet (cheap, fast for many calls) handing off to a writer on Opus (one expensive, careful call) is a real split.
Evaluation: The Discipline That Separates Demos From Products
Every production agent should have an evaluation suite. The minimum:
```ts
const evals = [
  {
    name: "finds-order-by-email",
    input: "What's the status of my last order? My email is jane@example.com",
    assert: (output) => output.toolCalls.includes("search_orders") && /Order #\d+/.test(output.text),
  },
  {
    name: "refuses-out-of-scope",
    input: "Can you reset my password?",
    assert: (output) => output.text.includes("/reset") && !output.toolCalls.length,
  },
  // 30-50 of these
];

await runEvals(agent, evals);
```
Run these on every PR that touches the agent's prompt or tools. Quality regressions are easy to ship and hard to detect without an eval suite.
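The harness itself is small. Assuming the agent's output has the `{ text, toolCalls }` shape the cases above assert against, a minimal `runEvals` sketch (the runner signature is illustrative):

```typescript
// Minimal eval harness sketch. `runAgent` stands in for a call like agent.run;
// a thrown assertion or agent error counts as a failed case, not a crash.
interface EvalOutput {
  text: string;
  toolCalls: string[];
}

interface EvalCase {
  name: string;
  input: string;
  assert: (output: EvalOutput) => boolean;
}

async function runEvals(
  runAgent: (input: string) => Promise<EvalOutput>,
  cases: EvalCase[],
): Promise<{ name: string; passed: boolean }[]> {
  const results: { name: string; passed: boolean }[] = [];
  for (const c of cases) {
    let passed = false;
    try {
      passed = c.assert(await runAgent(c.input));
    } catch {
      passed = false;
    }
    results.push({ name: c.name, passed });
  }
  return results;
}
```

Wire the failures into CI output so a regressed case names itself in the PR check.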
Observability
The SDK emits events; instrument them.
```ts
agent.on("toolCallStart", (e) => logger.info("tool.start", { tool: e.name, agentId: e.agentId }));
agent.on("toolCallEnd", (e) => logger.info("tool.end", { tool: e.name, durationMs: e.durationMs, error: e.error }));
agent.on("iteration", (e) => logger.info("agent.iteration", { iteration: e.n, inputTokens: e.usage.inputTokens, outputTokens: e.usage.outputTokens, costUsd: e.costUsd }));
```
The metrics that matter for ongoing operations:
- p95 iterations per task
- p95 cost per task
- Tool call success rate (per tool)
- Stop reason distribution (end_turn vs max_iterations vs cost_budget)
Build dashboards. Alert on regressions.
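Agree on one percentile definition so every dashboard reports the same number. A small nearest-rank sketch over recorded per-task values (iterations, cost, latency):

```typescript
// Nearest-rank percentile: the smallest recorded value with at least p% of
// samples at or below it. Deterministic and easy to reproduce in SQL or a UI.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}
```

Nearest-rank is a deliberate choice over interpolation: it always returns a value that actually occurred, which makes alerts easier to trace back to a concrete run.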
The Production Checklist
Before any agent ships:
- System prompt covers identity, scope, tone, uncertainty handling
- Tools have clear descriptions and tight input schemas
- Destructive tools marked `requiresApproval` or guarded in the prompt
- `maxIterations` and `costBudgetUsd` set
- Eval suite of 30+ cases passing
- Observability instrumented for iterations, tools, costs
- Manual review of 50 real runs to catch behaviours not covered by evals
The SDK doesn't make agents safe by default; it makes the patterns that produce safe agents easier to apply. Apply them.