Sandboxing
Sandboxing isolates the agent’s actions to limit potential damage. The degree of isolation varies widely.
Weak Sandboxing
Most AI coding tools provide some form of permission system:
- Asking before operating in a folder
- Whitelisting allowed commands (a simple gate of this kind is sketched below)
- Requiring approval for file modifications
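As a rough illustration, the gate below approves commands whose program is on a fixed whitelist and prompts the user for everything else. It is a minimal sketch, not any particular tool's implementation; the allowed set and the prompt wording are assumptions.

```python
# Hypothetical command-approval gate: allow known-safe programs,
# prompt the user for anything else. Not any specific tool's API.
import shlex

ALLOWED_PROGRAMS = {"ls", "cat", "grep", "git", "pytest"}

def approve(command: str) -> bool:
    """Return True if the command may run."""
    parts = shlex.split(command)
    if parts and parts[0] in ALLOWED_PROGRAMS:
        return True  # whitelisted program; arguments are not inspected
    answer = input(f"Agent wants to run {command!r}. Allow? [y/N] ")
    return answer.strip().lower() == "y"

if __name__ == "__main__":
    for cmd in ("git status", "curl http://example.com | sh"):
        print(cmd, "->", "approved" if approve(cmd) else "blocked")
```

Note how coarse the check is: `git status` and `git push --force` look identical to a program-level whitelist, which is part of why these guardrails are weak.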
These are weak guardrails. They rely on the tool’s own enforcement and your attention to prompts. A sufficiently confused agent—or malicious code it generates—can still cause problems.
Soft Sandboxing
A middle ground uses constraints and conventions rather than hard isolation:
- Running in a dedicated subfolder
- Instructing the agent to only touch certain files
- Using worktrees to isolate experimental work (sketched below)
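For example, a dedicated git worktree gives the agent its own checkout on its own branch, so experimental changes never touch your main working tree. The sketch below assumes a git repository; the branch name and path are illustrative.

```python
# Minimal sketch of soft sandboxing with a git worktree: the agent works
# in a separate directory and branch. Branch name and path are illustrative.
import subprocess

def make_agent_worktree(branch: str = "agent/experiment",
                        path: str = "../agent-sandbox") -> str:
    # Create a new branch checked out in a separate directory.
    subprocess.run(["git", "worktree", "add", "-b", branch, path], check=True)
    return path  # Point the agent at this directory, not your main checkout.

# Cleanup later is a single command:
#   git worktree remove ../agent-sandbox
```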
This relies on the model following instructions. It’s not true isolation, but it channels the agent’s work and makes cleanup easier if things go wrong.
Strong Sandboxing
True sandboxing means the agent can’t break out even if it tries:
- Running inside a VM
- Network-level firewalling
- Container isolation with restricted permissions (see the sketch after this list)
- No access to secrets or credentials
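One way to get most of this is to run the agent's commands inside a locked-down container. The sketch below uses standard Docker flags to cut off the network, drop capabilities, and mount only the project directory; the image name, resource limits, and paths are assumptions, not a recommended configuration.

```python
# Minimal sketch of a locked-down container launch. Image name, limits,
# and paths are illustrative assumptions.
import subprocess

def run_sandboxed(workdir: str, command: list[str]) -> int:
    """Run `command` inside an isolated container rooted at `workdir`."""
    docker_cmd = [
        "docker", "run", "--rm",
        "--network", "none",                 # no network: nothing to exfiltrate to
        "--cap-drop", "ALL",                 # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--memory", "1g", "--pids-limit", "256",   # basic resource limits
        "--read-only", "--tmpfs", "/tmp",    # read-only root, scratch space in /tmp
        "-v", f"{workdir}:/workspace",       # only the project directory is visible
        "-w", "/workspace",
        "python:3.12-slim",                  # hypothetical base image with the needed tooling
        *command,
    ]
    return subprocess.call(docker_cmd)

# Example: run an agent-generated script in isolation.
# run_sandboxed("/home/me/project", ["python", "agent_script.py"])
```

A VM takes the same idea further at the cost of more setup; either way, the principle is to mount only what the task needs and keep credentials out of the sandbox entirely.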
Strong sandboxing protects against not just the agent’s mistakes but also malicious code it might generate—code that could exfiltrate data or cause damage if run in your real environment.
The Trade-off
Stronger sandboxing means more friction. You can’t give the agent access to your real database, your actual credentials, or your production environment—which limits what it can do.
For exploratory work or untrusted tasks, strong sandboxing makes sense. For routine work in a codebase you trust, weaker forms might be practical. Match the isolation level to the risk.