AI Security

AI Agents Keep Escaping Sandboxes

April 21, 2026 · 3 min read · 539 words

The sandbox was always optimistic

Every coding agent ships with a sandbox story: "the agent can only touch this directory," "the container isolates the model from your real system," "the tool runs with reduced permissions." In practice, 2025 and 2026 have produced a steady drip of reports where those boundaries turned out to be softer than the marketing implied.

The pattern is not a single bug. It is a class of bugs with a shared root cause: sandboxes that were designed against honest users, then deployed against adversarial prompts.

The recurring escape vectors

Across disclosed issues in the last year, the escape paths look familiar:

  • Symlink traversal from an "allowed" project directory into the rest of the filesystem
  • Environment variable leakage from the host into the agent's execution context
  • Git hooks, build scripts, or test runners that execute with fewer restrictions than the agent itself
  • MCP servers or plugin systems that run outside the agent's sandbox entirely
  • Path normalization bugs that accept ../ once the input is URL-encoded or mixed-case
  • Network rules that allow "just the model provider" but let proxy headers reach internal services

None of these are exotic. They are all variations of "the sandbox trusts something it should not have trusted."
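The symlink case is the easiest to see concretely. A minimal sketch (the function names here are illustrative, not from any specific agent): a naive prefix check on the requested path looks safe, while resolving symlinks reveals the real target.

```python
import os
import tempfile
from pathlib import Path

def naive_allows(path: str, sandbox: str) -> bool:
    # Broken: checks the literal path string, not what it points to
    return os.path.abspath(path).startswith(os.path.abspath(sandbox) + os.sep)

def hardened_allows(path: str, sandbox: str) -> bool:
    # Resolve symlinks and ".." components before comparing
    real = Path(path).resolve()
    root = Path(sandbox).resolve()
    return root == real or root in real.parents

sandbox = tempfile.mkdtemp()
outside = tempfile.mkdtemp()

# Attacker plants a symlink inside the "allowed" project directory
link = os.path.join(sandbox, "project")
os.symlink(outside, link)
target = os.path.join(link, "escape.txt")

print(naive_allows(target, sandbox))     # the literal path looks in-bounds
print(hardened_allows(target, sandbox))  # resolution shows it is not
```

The same resolve-before-compare discipline covers most of the `../` normalization bugs too, since `resolve()` collapses dot segments along with links.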

Why agent sandboxes are harder than container sandboxes

A normal container sandbox has to resist code running inside it. An agent sandbox has to resist prompts that describe code, which then runs inside it, often with the user's implicit blessing. The attack surface is the model's willingness to take instructions from any text it encounters — pull request descriptions, issue comments, README files, scraped web content, dependency metadata, even code comments.

That makes prompt injection part of the sandbox threat model, not a separate "AI safety" concern. If a malicious README can get your agent to write a file to ~/.ssh/authorized_keys, the sandbox failed, regardless of whether the model was "jailbroken" in a traditional sense.
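The authorized_keys case is checkable mechanically. A sketch of a write gate that expands `~` and resolves symlinks before comparing against a denylist of sensitive paths (the denylist below is illustrative, not complete):

```python
from pathlib import Path

# Illustrative denylist - a real agent would need a much broader set
SENSITIVE = [Path("~/.ssh").expanduser().resolve(), Path("/etc").resolve()]

def write_allowed(target: str) -> bool:
    # Expand "~" and resolve symlinks before checking, so the
    # comparison happens on the real destination of the write
    real = Path(target).expanduser().resolve()
    return not any(real == p or p in real.parents for p in SENSITIVE)

print(write_allowed("~/.ssh/authorized_keys"))  # denied
print(write_allowed("./notes.txt"))             # allowed
```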

What to actually check

For any coding agent running on operator workstations:

  • Confirm the sandbox resists symlink and junction traversal — test it, do not assume
  • Audit which environment variables the agent can read at runtime
  • Treat every tool the agent can invoke as a potential escape vector, including git, make, npm, and test runners
  • Restrict network egress to the model provider, and block host-header smuggling into the proxy
  • Log every tool invocation with enough context to reconstruct what the agent did
  • Keep credentials for production systems out of any environment an agent can reach

The short version: assume the agent will do exactly what a prompt tells it, and ask whether the sandbox holds when that prompt is adversarial.
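Two of the checklist items, the environment audit and the invocation logging, can be combined at the point where the agent spawns tools. A minimal sketch, assuming an explicit allowlist of variables (the names and helper functions are examples, not any agent's real API):

```python
import os
import subprocess

# Assumption: these are the only variables the spawned tools actually need
ENV_ALLOWLIST = {"PATH", "HOME", "LANG", "TERM"}

def scrubbed_env() -> dict:
    # Drop everything not explicitly allowed, so cloud keys, tokens,
    # and SSH agent sockets never reach the child process
    return {k: v for k, v in os.environ.items() if k in ENV_ALLOWLIST}

def run_tool(cmd: list[str]) -> subprocess.CompletedProcess:
    # Every invocation is logged with the full argument vector,
    # and runs with the scrubbed environment rather than the host's
    print(f"tool-invocation: {cmd}")
    return subprocess.run(cmd, env=scrubbed_env(), capture_output=True, text=True)

# Simulate a leaked credential on the host and confirm it is stripped
os.environ["FAKE_AWS_SECRET"] = "hunter2"
assert "FAKE_AWS_SECRET" not in scrubbed_env()

out = run_tool(["echo", "sandboxed"])
```

Allowlisting beats denylisting here for the same reason it does everywhere else: the set of variables a tool needs is small and knowable, while the set of secrets that might be lying around in the host environment is not.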

The design goal

A sandbox that only works against well-behaved inputs is not a sandbox. The bar for agent tooling in 2026 should be the same bar we already apply to browser renderers and container runtimes: the boundary holds even when the content inside is actively hostile.

Treat every agent like it has already been jailbroken. Then the sandbox story gets honest.

Source note

This field note reflects patterns across disclosed issues in Cursor, Windsurf, Claude Code, and Copilot Workspace through early 2026, plus the SoK paper on prompt injection against agentic coding assistants and ongoing research on MCP tool poisoning.
