Production agents
Learn why production agents fail when demos succeed, how to reduce blast radius through sandboxing and cost limits, and what reliability patterns make AI agents safe to deploy.
TL;DR
- A 10-step agent at 95% per-step reliability has a $(0.95)^{10} \approx 0.60$ success rate. At 1,000 runs/day that is roughly 400 failures every single day.
- Production agents fail in five specific ways: tool hallucination, infinite loops, irreversible actions, context overflow, and cascading hallucinations. Design a defense for each.
- Sandboxing is non-negotiable: code runs in Docker with no network access, API calls go through a logging proxy, and database access is limited to a read-only replica.
- Blast radius control means starting read-only, escalating permissions explicitly, capping tool calls at 20 per run, and enforcing a per-run cost budget.
- The minimum viable production agent is narrow in scope, runs 3-5 steps at most, puts every irreversible action behind human approval, and traces every step to a structured log.
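The arithmetic behind the first bullet is worth making explicit. The following is a minimal sketch, assuming every step succeeds independently with the same probability (an idealization, since real steps are often correlated). The function names are illustrative.

```python
def run_success_rate(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


def expected_daily_failures(per_step: float, steps: int, runs_per_day: int) -> float:
    """Expected number of failed runs per day under the same assumption."""
    return runs_per_day * (1.0 - run_success_rate(per_step, steps))


if __name__ == "__main__":
    # Compare short and long workflows at the same per-step reliability.
    for steps in (3, 5, 10):
        rate = run_success_rate(0.95, steps)
        fails = expected_daily_failures(0.95, steps, 1000)
        print(f"{steps:>2} steps: {rate:.3f} success, ~{fails:.0f} failures/day")
```

Under this model a 10-step run succeeds about 59.9% of the time, while a 3-step run succeeds about 85.7% of the time. That is one way to read the 3-5 step ceiling in the last bullet.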
The problem it solves
The demo works. The agent completes a five-step workflow flawlessly every time you show it. You deploy it to production on Friday. On Saturday morning, it sends a duplicate email to 300 customers. The email tool succeeded on the first attempt, but the HTTP response timed out before reaching the agent, so the agent assumed failure and retried. Both sends went through. By the time anyone notices, 300 customers have two identical "Your order has shipped" messages in their inboxes.
On day five, the agent enters a loop. It is checking an order status API. Every response says "processing." The agent doesn't recognize this as a terminal wait state and keeps polling. Forty API calls in one hour. $85 in unexpected charges, plus rate-limit bans from the API provider.
These are not edge cases. They are predictable failure modes that demos systematically hide. Demos use clean inputs at low frequency with an engineer watching. Production uses messy, unpredictable inputs at high volume with no one watching.
The gap between those two environments is where production incidents live.
What is it?
A production agent is an AI agent deployed for real user traffic, with the reliability, safety, and observability that implies. The technical challenge shifts from "can it complete the task" to "does it fail safely, can I debug it when it breaks, and does it stay within cost and safety constraints."
Think of it like a software deployment: a script that works on your laptop still needs proper error handling, logging, rate limiting, and rollback before it runs on 10,000 customers' data. An agent is a more capable script with a much larger blast radius.
Most of the work in taking an agent to production is not improving the LLM or the prompts. It is building the surrounding infrastructure: sandboxing, structured logging, circuit breakers, idempotency, and cost controls.
How it works
The five failure modes
Every production agent incident traces back to one of five root causes. Know the detection signal and the mitigation for each before you deploy.
Tool hallucination happens when the agent generates a tool call with arguments that don't match the tool's JSON Schema, or invents a non-existent tool name entirely. The mitigation is strict schema validation before execution. Reject the call and return the schema error back to the agent so it can self-correct. Do not silently swallow the error.
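A rough sketch of validate-then-reject might look like the following. It hand-rolls a check of required keys and primitive types to stay dependency-free. A real deployment would more likely use a full JSON Schema validator. The tool names and the `TOOL_SCHEMAS` registry are hypothetical.

```python
from typing import Any

# Hypothetical registry: tool name -> {argument name: (expected type, required)}.
TOOL_SCHEMAS: dict[str, dict[str, tuple[type, bool]]] = {
    "send_email": {"to": (str, True), "subject": (str, True), "body": (str, True)},
    "get_order_status": {"order_id": (str, True), "verbose": (bool, False)},
}


def validate_tool_call(name: str, args: dict[str, Any]) -> list[str]:
    """Return human-readable errors; an empty list means the call is valid."""
    if name not in TOOL_SCHEMAS:
        return [f"unknown tool '{name}'; available tools: {sorted(TOOL_SCHEMAS)}"]
    schema = TOOL_SCHEMAS[name]
    errors = []
    for key, (expected, required) in schema.items():
        if key not in args:
            if required:
                errors.append(f"missing required argument '{key}'")
        elif not isinstance(args[key], expected):
            errors.append(
                f"argument '{key}' should be {expected.__name__}, "
                f"got {type(args[key]).__name__}"
            )
    for key in args:
        if key not in schema:
            errors.append(f"unexpected argument '{key}'")
    return errors


def execute_or_reject(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Validate before execution and hand any errors back as an observation."""
    errors = validate_tool_call(name, args)
    if errors:
        # Returned to the model so it can self-correct; never silently dropped.
        return {"status": "rejected", "errors": errors}
    return {"status": "accepted"}  # a real dispatcher would call the tool here
```

Returning the error list as the tool's observation gives the model something concrete to correct against on its next step.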
Infinite loops occur when the agent calls the same tool repeatedly because it misinterprets the response or the task is ambiguous. Track (tool_name, hash(args)) tuples within a run. If the same tuple, or one with semantically similar arguments, appears three or more times, trigger a circuit breaker and route the run to human review.
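One possible shape for that circuit breaker is sketched below. It only catches exact repeats after canonicalizing the arguments. Detecting semantically similar arguments would need extra normalization that is not shown. `LoopBreaker` and `LoopDetected` are illustrative names.

```python
import hashlib
import json
from collections import Counter
from typing import Any


class LoopDetected(Exception):
    """Raised when a run repeats the same tool call too many times."""


class LoopBreaker:
    """Counts (tool_name, hash(args)) pairs within a single run."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.seen = Counter()

    @staticmethod
    def _fingerprint(args: dict[str, Any]) -> str:
        # sort_keys makes the hash independent of argument order.
        canonical = json.dumps(args, sort_keys=True, default=str)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def record(self, tool_name: str, args: dict[str, Any]) -> None:
        """Call before executing each tool; raises on the max_repeats-th repeat."""
        key = (tool_name, self._fingerprint(args))
        self.seen[key] += 1
        if self.seen[key] >= self.max_repeats:
            raise LoopDetected(
                f"{tool_name} called {self.seen[key]} times with identical arguments"
            )
```

The orchestrator would catch `LoopDetected`, stop the run, and hand the trace to a human reviewer. In the polling incident described earlier, the third identical status check would have tripped it.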
Irreversible actions create duplicate effects when the agent retries after a timeout: the send succeeded, but the response never arrived. This is the most dangerous failure mode in production. The mitigation is idempotency keys on every side-effecting tool call (covered in detail below).
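As a rough illustration of the idea, the sketch below derives a key from the run, the logical step, the tool, and its arguments, so a retry of the same step produces the same key. The deduplication has to live on the side that performs the effect, not inside the agent. The in-memory dictionary stands in for what would need to be a durable, shared store. All names are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable


class IdempotentExecutor:
    """Runs each side-effecting action at most once per idempotency key."""

    def __init__(self) -> None:
        # In-memory for illustration only; a real system needs durable storage.
        self._results: dict[str, Any] = {}

    @staticmethod
    def make_key(run_id: str, step: int, tool_name: str, args: dict[str, Any]) -> str:
        # Same run, step, tool, and arguments always yield the same key.
        payload = json.dumps([run_id, step, tool_name, args], sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def execute(self, key: str, action: Callable[[], Any]) -> Any:
        if key in self._results:
            # A retry of work already done: return the stored result, do nothing.
            return self._results[key]
        result = action()
        self._results[key] = result
        return result
```

With this in place, the retry in the duplicate-email incident would have received the stored result of the first send instead of triggering a second one.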
Context overflow is subtle. When the accumulated observation history approaches the model's context limit, the model silently starts ignoring the earliest parts of the context, which usually hold the system prompt and task instructions. Add explicit context budget management: summarize or truncate observation history when it approaches 60% of the context limit.
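A minimal sketch of the truncation variant follows. The four-characters-per-token estimate is a crude assumption standing in for a real tokenizer, and dropping old observations stands in for summarizing them. The function names are illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: roughly four characters per token."""
    return max(1, len(text) // 4)


def trim_history(
    system_prompt: str,
    observations: list[str],
    context_limit: int,
    budget_fraction: float = 0.6,
) -> list[str]:
    """Drop the oldest observations until prompt plus history fits the budget.

    The system prompt is never dropped. Dropped items are replaced by a single
    marker so the model can tell that history was elided.
    """
    budget = int(context_limit * budget_fraction)
    kept = list(observations)
    dropped = 0

    def total() -> int:
        return estimate_tokens(system_prompt) + sum(estimate_tokens(o) for o in kept)

    # Always keep at least the most recent observation.
    while len(kept) > 1 and total() > budget:
        kept.pop(0)
        dropped += 1
    if dropped:
        kept.insert(0, f"[{dropped} earlier observations omitted]")
    return kept
```

Calling this before every model request keeps the instructions inside the window. Swapping the marker for a model-written summary of the dropped items is the natural next step.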
Cascading hallucinations compound across steps. The agent misidentifies a user in step 2. Every subsequent step (send email, update record, generate report) operates on the wrong user. Add validation checkpoints after data-producing tool calls, especially ones whose output is used to drive downstream operations.
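A checkpoint can be as small as comparing a lookup result against the original request before anything downstream consumes it. The sketch below assumes a user record with `user_id` and `email` fields. Both field names and the function itself are hypothetical.

```python
class CheckpointFailed(Exception):
    """Raised when a tool result does not match what the task asked for."""


def checkpoint_user_lookup(requested_email: str, record: dict) -> dict:
    """Verify a user lookup before later steps act on it."""
    if not record or "user_id" not in record:
        raise CheckpointFailed("lookup returned no user record")
    found = str(record.get("email", "")).strip().lower()
    if found != requested_email.strip().lower():
        # Stop here rather than email, update, and report on the wrong user.
        raise CheckpointFailed(
            f"lookup returned {found!r}, expected {requested_email!r}"
        )
    return record
```

Failing at the checkpoint confines the error to one step instead of letting every later step inherit it.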
I have seen teams spend weeks debugging what looked like LLM quality problems when every failure was actually one of these five modes. Build explicit detections and mitigations for each before launch.
Sandboxing layers
Every agent action that touches an external system must pass through a controlled interface. The goal is to constrain what the agent can do, log everything it does, and retain the ability to kill it externally.
Code execution: Run LLM-generated code in Docker containers with no network access, a limited filesystem mount, and hard resource limits (256 MB RAM, 10-second CPU timeout). Never execute agent-generated code in the host process. Not once.
File system access: Proxy file access through a permission-checked layer that maps agent-visible paths to a sandboxed directory on disk. The agent thinks it is writing to /reports/final.csv. It is actually writing to /sandbox/run_abc123/reports/final.csv. Validate every path before read or write; reject paths with .. traversal sequences.
API calls: Route all agent-initiated HTTP calls through an API proxy that logs every request with the run's trace_id, enforces per-tool rate limits, and can be killed externally. The proxy is your kill switch: if an agent goes rogue, you cut the proxy, and the agent can no longer reach the internet.
Database access: Agents should never hold production write credentials. Provide a read-only replica for queries. For write operations, have the agent generate a SQL statement or structured action that a validation layer reviews and executes separately.
```python
from pathlib import Path


class SandboxedFileAccess:
    """File access confined to one sandbox directory per agent run."""

    def __init__(self, allowed_base_path: str, run_id: str):
        self.base = Path(allowed_base_path).resolve()
        self.run_id = run_id

    def _resolve(self, path: str) -> Path:
        # Resolve symlinks and ".." first, then compare path components.
        # A plain string-prefix check would accept /sandbox_evil for /sandbox.
        full = (self.base / path).resolve()
        if not full.is_relative_to(self.base):  # requires Python 3.9+
            raise PermissionError(
                f"Run {self.run_id} attempted path traversal: {path}"
            )
        return full

    def read_file(self, path: str) -> str:
        return self._resolve(path).read_text()

    def write_file(self, path: str, content: str) -> None:
        full = self._resolve(path)
        full.parent.mkdir(parents=True, exist_ok=True)
        full.write_text(content)
```
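For the code execution layer, one way to apply those limits is to build the `docker run` invocation explicitly. The sketch below is an assumption-laden example, not a hardened runner. The flags are standard Docker CLI options, but the mount layout, the `--cpus` and `--pids-limit` values, and the function names are illustrative. The timeout shown is wall-clock time enforced by `subprocess`, which is only an approximation of a CPU-time limit.

```python
import subprocess


def build_sandbox_command(image: str, script_path: str, memory: str = "256m") -> list[str]:
    """Build a docker run command with no network and hard resource limits.

    script_path must be an absolute host path; it is mounted read-only.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",      # no network access from inside the container
        "--memory", memory,       # hard memory cap
        "--cpus", "1",            # at most one CPU's worth of time
        "--pids-limit", "64",     # guards against fork bombs
        "--read-only",            # root filesystem is not writable
        "--volume", f"{script_path}:/work/main.py:ro",
        image, "python", "/work/main.py",
    ]


def run_sandboxed(image: str, script_path: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run the script in the sandbox; raises subprocess.TimeoutExpired on timeout.

    Note: on timeout only the docker client process is killed. A production
    runner should also name the container and stop it explicitly.
    """
    cmd = build_sandbox_command(image, script_path)
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
```

A call such as `run_sandboxed("python:3.12-slim", "/tmp/job/main.py")` would execute the generated script with every limit applied, assuming Docker is installed and the image is available locally.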
Blast radius tiers
Blast radius is the maximum damage an agent can cause in a single run. Minimize it through a tiered permission model.
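Two of the limits named in the TL;DR, the cap of 20 tool calls per run and the per-run cost budget, can be enforced by a small guard that every tool call passes through. This is a sketch under assumptions: the $1.00 default is a placeholder rather than a recommendation, and `RunBudget` is a hypothetical name.

```python
class BudgetExceeded(Exception):
    """Raised when a run would exceed its tool-call or cost limit."""


class RunBudget:
    """Per-run guard: charge it once before every tool call."""

    def __init__(self, max_tool_calls: int = 20, max_cost_usd: float = 1.00):
        self.max_tool_calls = max_tool_calls
        self.max_cost_usd = max_cost_usd
        self.tool_calls = 0
        self.cost_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one tool call, refusing it if either limit would be exceeded."""
        if self.tool_calls + 1 > self.max_tool_calls:
            raise BudgetExceeded(f"tool-call cap of {self.max_tool_calls} reached")
        if self.cost_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost budget of ${self.max_cost_usd:.2f} reached")
        self.tool_calls += 1
        self.cost_usd += cost_usd
```

Because the check happens before the call is counted, a refused call never runs and never adds to the bill.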