Sandboxed tool authorization
Run each tool call inside an isolated sandbox with the minimum permissions needed, so a compromised or hallucinating agent cannot damage production systems.
TL;DR
- Every tool call executes inside an isolated sandbox (container, microVM, or Wasm module) with only the permissions that specific operation requires. Permissions expire when the call completes.
- Three permission layers stack together: tool-level (what the tool can do), session-level (what this user's agent can do), and operation-level (what this specific call can do). The intersection of all three determines the effective permission set.
- Credential injection happens at sandbox creation time from a secrets vault. The LLM never sees raw API keys, database passwords, or tokens.
- Dangerous operations (DELETE, financial transactions, infrastructure changes) trigger a human approval gate before the sandbox executes. Latency cost: 0 for pre-approved operations, seconds to minutes for gated ones.
- Limitation: sandbox overhead adds 50-200ms per tool call for container-based isolation, and the permission model requires upfront investment in defining what "minimum permissions" means for each tool.
The Problem It Solves
Your AI coding agent is refactoring a database migration script. The LLM decides the cleanest approach is to drop the old table and recreate it with the new schema. It calls the execute_sql tool with DROP TABLE users. In production. During business hours.
The agent wasn't malicious. It wasn't even wrong in a theoretical sense: dropping and recreating a table is a valid migration strategy in development. But nothing stopped the tool call from executing a destructive operation against a production database. The tool had full database credentials, no execution boundary, and no permission scoping.
I've watched this exact scenario play out at a startup where an agent had broad DB access for "flexibility." The post-mortem took longer than the actual data recovery. The root cause wasn't the LLM hallucinating. It was that the tool execution environment had no concept of least privilege.
The fundamental issue: when an LLM calls a tool, the tool executes with whatever permissions the hosting process has. If your agent server has write access to production databases, every tool call has write access to production databases. A single hallucinated argument, a prompt injection attack, or even a well-meaning but misguided plan can cause irreversible damage.
What Is It?
Sandboxed tool authorization wraps every tool call in an isolated execution environment (the sandbox) that enforces a strict, pre-defined permission boundary. The sandbox controls what the tool can access (files, network, databases), what operations it can perform (read vs. write), and how many resources it can consume (CPU, memory, time). Permissions are scoped to the individual call and expire when execution completes.
Think of it as a bank safe-deposit box. You don't hand the customer the vault key and say "grab whatever you need." Instead, a clerk escorts them to one specific box, opens it with a time-limited key, watches while they access only that box, and locks it when they're done. The customer (tool call) gets exactly the access they need, nothing more, and the access disappears when the interaction ends.
How It Works
The three-layer permission model
Permissions resolve through three layers that intersect to produce the smallest possible privilege set. Because the result is an intersection, no layer can grant anything that another layer denies.
Tool-level permissions define the maximum capability of a tool regardless of who calls it. The query_database tool might allow SELECT on specific tables. The write_file tool might allow writes only to /tmp/agent-workspace/. These are defined once when you register the tool and they never change at runtime.
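A minimal sketch of a tool-level ceiling, assuming a hypothetical in-process registry. The `ToolSpec` class, the `db:select` / `fs:write` permission strings, and the `path_allowed` helper are illustrative names, not part of any specific framework. The frozen dataclass mirrors the idea that the ceiling is set once at registration and never changes at runtime.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: the ceiling cannot be changed after registration
class ToolSpec:
    name: str
    allowed_actions: frozenset
    allowed_path_prefix: Optional[str] = None  # None means no filesystem access


# Hypothetical registry, filled once at startup.
TOOLS = {
    "query_database": ToolSpec("query_database", frozenset({"db:select"})),
    "write_file": ToolSpec(
        "write_file", frozenset({"fs:write"}), "/tmp/agent-workspace/"
    ),
}


def path_allowed(spec: ToolSpec, path: str) -> bool:
    """Return True only if `path` stays inside the tool's permitted directory."""
    if spec.allowed_path_prefix is None:
        return False
    root = posixpath.normpath(spec.allowed_path_prefix)
    target = posixpath.normpath(path)  # collapses "..", so traversal is caught
    return target == root or target.startswith(root + "/")
```

This check is purely lexical: it does not follow symlinks, so it complements OS-level enforcement inside the sandbox rather than replacing it.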
Session-level permissions constrain what a specific user's agent session can do. An admin user's session might allow DELETE operations. A read-only analyst's session restricts all tools to read-only mode. Session permissions are resolved at session creation from user roles, org policies, and environment (dev vs. prod).
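As a rough illustration of resolving session permissions once at session creation, the sketch below combines a role policy with an environment policy. The role names, the `db:*` permission strings, and the rule that production blocks DDL are assumptions made for the example, not policies from this article.

```python
from dataclasses import dataclass

ALL_DB = frozenset({"db:select", "db:insert", "db:update", "db:delete", "db:ddl"})

# Hypothetical policies: what each role may do, and what each environment permits.
ROLE_PERMS = {
    "admin": ALL_DB,
    "analyst": frozenset({"db:select"}),
}
ENV_PERMS = {
    "dev": ALL_DB,
    "prod": ALL_DB - {"db:ddl"},  # assumed org policy: no schema changes in prod
}


@dataclass(frozen=True)
class Session:
    user: str
    permissions: frozenset  # fixed for the lifetime of the session


def create_session(user: str, role: str, env: str) -> Session:
    """Resolve permissions once; unknown roles or environments get nothing."""
    perms = ROLE_PERMS.get(role, frozenset()) & ENV_PERMS.get(env, frozenset())
    return Session(user, perms)
```

Defaulting unknown roles and environments to the empty set keeps the resolver fail-closed: a typo in a role name removes access instead of granting it.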
Operation-level permissions scope a single tool call. Even if the tool supports writes and the session allows writes, this specific call might only need read access. The policy engine inspects the tool arguments (the SQL query, the file path, the API endpoint) and grants the narrowest permission that satisfies the request.
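One way to picture argument inspection for a SQL tool is a classifier that maps the statement to the single permission it needs. This is a deliberately naive sketch: the `required_permission` function and the permission names are hypothetical, and a real policy engine would use a proper SQL parser instead of string splitting.

```python
# Hypothetical mapping from the leading SQL keyword to the permission it needs.
_VERB_TO_PERM = {
    "select": "db:select",
    "insert": "db:insert",
    "update": "db:update",
    "delete": "db:delete",
    "drop": "db:ddl",
    "alter": "db:ddl",
    "create": "db:ddl",
    "truncate": "db:ddl",
}


def required_permission(sql: str) -> str:
    """Return the narrowest permission a single SQL statement needs.

    Fails closed: multiple statements or unknown verbs are rejected.
    Splitting on ';' is naive (it misreads semicolons inside string
    literals), but the error is on the safe side because such input is refused.
    """
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    if len(statements) != 1:
        raise ValueError("exactly one SQL statement is allowed per call")
    verb = statements[0].split(None, 1)[0].lower()
    if verb not in _VERB_TO_PERM:
        raise ValueError(f"unrecognized SQL verb: {verb!r}")
    return _VERB_TO_PERM[verb]
```

Under these assumptions, the `DROP TABLE users` call from the opening scenario would be classified as `db:ddl`, which the other two layers can then refuse.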
The effective permission is always the intersection: effective = tool_perms ∩ session_perms ∩ operation_perms. If any layer denies an action, it's denied.
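The intersection rule maps directly onto set intersection. The sketch below assumes permissions are modeled as plain strings such as `db:select`; the function names are illustrative.

```python
def effective_permissions(tool_perms, session_perms, operation_perms) -> frozenset:
    """Intersect the three layers; anything missing from one layer is dropped."""
    return (
        frozenset(tool_perms)
        & frozenset(session_perms)
        & frozenset(operation_perms)
    )


def is_allowed(action, tool_perms, session_perms, operation_perms) -> bool:
    """An action is permitted only if every layer grants it."""
    return action in effective_permissions(tool_perms, session_perms, operation_perms)
```

For example, a tool that allows `db:select` and `db:insert`, called from a read-only session, ends up with only `db:select`, and a `db:delete` request resolves to the empty set no matter what the operation layer asks for.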
Sandbox lifecycle: create, inject, execute, destroy
Every tool call follows a four-phase lifecycle. The sandbox exists only for the duration of that single call.
Phase 1: Create. The policy engine resolves effective permissions and spins up an isolated environment. For container-based sandboxes, this means pulling a pre-warmed container from a pool (cold start: 200-500ms; warm pool: 20-50ms). For Wasm-based sandboxes, instantiation takes under 5ms. The environment has a filesystem snapshot, network rules, and resource limits baked in.
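The warm-pool idea can be sketched without a real container runtime. In the example below, `Sandbox` is a placeholder object standing in for a container or microVM, and `SandboxPool` is a hypothetical class: `acquire` hands out a pre-created sandbox when one is ready and falls back to a cold start otherwise.

```python
import itertools
import queue
from dataclasses import dataclass


@dataclass
class Sandbox:
    sandbox_id: int
    warm: bool  # True if it was created ahead of time


class SandboxPool:
    """Keeps pre-created sandboxes ready so most calls skip the cold start."""

    def __init__(self, size: int):
        self._ids = itertools.count(1)
        self._ready = queue.SimpleQueue()
        for _ in range(size):
            self.refill()

    def _create(self, warm: bool) -> Sandbox:
        # Placeholder: a real implementation would start a container or microVM.
        return Sandbox(next(self._ids), warm)

    def refill(self) -> None:
        """Add one fresh sandbox; typically run in the background."""
        self._ready.put(self._create(warm=True))

    def acquire(self) -> Sandbox:
        """Take a warm sandbox if available, otherwise pay the cold-start cost."""
        try:
            return self._ready.get_nowait()
        except queue.Empty:
            return self._create(warm=False)
```

Used sandboxes are never returned to the pool in this sketch. They are destroyed after the call and replaced with fresh ones via `refill`, which keeps the pool consistent with the single-use lifecycle.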
Phase 2: Inject. The sandbox retrieves credentials from a secrets vault (HashiCorp Vault, AWS Secrets Manager, or a local encrypted store). Credentials are mounted as environment variables or files inside the sandbox. They never appear in the tool call arguments, the tool response, or the audit log, so the LLM never sees them. I've seen teams make the mistake of passing API keys through the LLM's tool arguments. That means the key appears in your prompt history, your logs, and potentially in the model provider's training data.
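Here is a small sketch of injection through environment variables, using a child Python process as a stand-in for the sandbox. `FAKE_VAULT`, `TOOL_SECRET_REFS`, and `run_tool` are hypothetical; a real setup would call a vault client instead of reading a dictionary. The secret is chosen by tool registration, never by the LLM, and the audit log stores only the secret's reference name.

```python
import json
import os
import subprocess
import sys

# Hypothetical stand-in for a real secrets vault client.
FAKE_VAULT = {"db/readonly": "s3cr3t-password"}

# Tool registration decides which secret a tool receives: (env var, vault ref).
TOOL_SECRET_REFS = {"query_database": ("DB_PASSWORD", "db/readonly")}

# Only these host variables are passed through; everything else is dropped.
PASSTHROUGH_VARS = ("PATH", "SYSTEMROOT", "LD_LIBRARY_PATH")

audit_log = []


def run_tool(tool_name: str, llm_args: dict, tool_code: str) -> str:
    """Run `tool_code` in a child process with the secret injected via env."""
    env_name, secret_ref = TOOL_SECRET_REFS[tool_name]
    secret = FAKE_VAULT[secret_ref]
    env = {k: os.environ[k] for k in PASSTHROUGH_VARS if k in os.environ}
    env[env_name] = secret  # visible to the tool process only
    proc = subprocess.run(
        [sys.executable, "-c", tool_code, json.dumps(llm_args)],
        env=env, capture_output=True, text=True, timeout=10,
    )
    # The log records the reference name, never the secret value.
    audit_log.append({"tool": tool_name, "args": llm_args,
                      "secret_ref": secret_ref, "exit_code": proc.returncode})
    # Defense in depth: scrub the secret if the tool echoes it back.
    return proc.stdout.replace(secret, "[REDACTED]")
```

The final `replace` is only a backstop. The main protection is that the secret travels from the vault to the process environment without ever passing through the model's context.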
Phase 3: Execute. The tool runs inside the sandbox with enforced limits: CPU quota, memory ceiling, execution timeout, network allowlist. If the tool tries to access a file outside its permitted paths, the OS-level sandbox (seccomp, AppArmor, gVisor) blocks the syscall. If it exceeds the timeout, the sandbox is killed.
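The timeout and CPU quota can be approximated with the Python standard library on POSIX systems. This sketch is far weaker than seccomp, AppArmor, or gVisor: it limits resources but does not filter syscalls or file access. `run_with_limits` is a hypothetical helper, and the default limits are arbitrary values chosen for the example.

```python
import resource  # POSIX only
import subprocess
import sys


def _limit_cpu(cpu_seconds: int):
    """Build a hook that caps CPU time in the child before it starts running."""
    def apply():
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        # Never request more than the existing hard limit allows.
        soft = cpu_seconds if hard == resource.RLIM_INFINITY else min(cpu_seconds, hard)
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))
    return apply


def run_with_limits(tool_code: str, timeout_s: float = 5.0, cpu_seconds: int = 2) -> dict:
    """Run `tool_code` in a child process under a wall-clock and CPU limit."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", tool_code],
            capture_output=True, text=True,
            timeout=timeout_s,                   # wall clock: child is killed on expiry
            preexec_fn=_limit_cpu(cpu_seconds),  # CPU quota enforced by the kernel
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": ""}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "stdout": proc.stdout}
```

A memory ceiling could be added the same way with `resource.RLIMIT_AS`. A network allowlist needs OS-level or container-level controls and is outside what this sketch covers.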
Phase 4: Destroy. After execution completes (or the timeout fires), the sandbox is torn down. All filesystem changes, network connections, and temporary state are discarded. The only things that survive are the tool's return value and the audit log entry. This prevents state accumulation across tool calls, which is critical for blocking prompt injection attacks that try to persist a foothold.
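The destroy phase can be illustrated with a temporary directory standing in for the sandbox filesystem. `run_ephemeral` is a hypothetical wrapper: whatever the tool writes to its workspace disappears when the call ends, and only the return value and one audit entry remain.

```python
import tempfile
from pathlib import Path


def run_ephemeral(tool_fn, args: dict, audit_log: list):
    """Run `tool_fn` in a throwaway workspace that is deleted afterwards.

    Only the return value and the audit entry outlive the call, even if
    the tool raises.
    """
    entry = {"tool": tool_fn.__name__, "args": args, "status": "error"}
    try:
        with tempfile.TemporaryDirectory(prefix="sandbox-") as workspace:
            result = tool_fn(Path(workspace), args)
        # Reaching this line means the workspace has already been removed.
        entry["status"] = "ok"
        return result
    finally:
        audit_log.append(entry)  # written on success and on failure
```

A real sandbox would also tear down the process, network namespace, and injected credentials. The temporary directory covers only the filesystem part of that cleanup.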