"OpenClaw's Agent Sandbox Architecture Goes Viral: From Technology Choices to a Security Story Ordinary People Can Understand"

1. Two Modes
Imagine hiring a security guard to watch your home. You have two options:
• Option 1: The guard lives inside your home, but the toolbox is locked in a safe. He can move around and see everything in the house, but he cannot get the keys.
• Option 2: The guard lives in a sentry box outside, and he carries nothing of value. Whenever he needs something, he has to ask your housekeeper.
Browser Use, which runs millions of Web Agents, chose Option 2. Their story is relevant to everyone who uses AI.
2. How Browser Use Started
They initially used Option 1: the Agent ran on their own servers, with code execution placed in an isolated sandbox. Sounds safe, right? But there was a problem: the Agent itself still lived on the server, where it could see environment variables, API keys, and database credentials. What if the Agent decided to steal something?
3. So they rewrote the entire architecture:
• Agent completely isolated: each Agent runs in its own Unikraft micro-VM, starting in less than a second
• Control plane as the steward: all external communication (calling the LLM, storing files, billing) goes through the control plane, which holds all credentials
• Sandbox knows nothing: the Agent receives only three environment variables — a session token, the control plane URL, and a session ID. No AWS keys, no database credentials
• Disposability: Agent dead? Start another. State lost? The control plane has the complete context. The Agent has nothing worth stealing and no state to keep.
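The "sandbox knows nothing" idea can be sketched in a few lines. Everything below — the function name, the variable names, the internal URL — is an illustrative assumption, not Browser Use's actual API; it only shows how small the agent's view of the world is.

```python
# A minimal sketch of the "sandbox knows nothing" principle.
# All names (function, env var keys, URL) are hypothetical.

def build_sandbox_env(session_token: str, control_plane_url: str, session_id: str) -> dict:
    """The complete environment a new micro-VM receives: three values, no cloud keys."""
    return {
        "SESSION_TOKEN": session_token,       # short-lived, scoped to this session
        "CONTROL_PLANE_URL": control_plane_url,  # the only endpoint it may talk to
        "SESSION_ID": session_id,             # identifies the session's stored context
    }

env = build_sandbox_env("tok-abc", "https://control-plane.internal", "sess-42")
print(sorted(env))  # the agent's entire view of the outside world
```

Everything else — the real LLM key, storage credentials, billing — stays behind the control plane, which validates the session token on every request.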
4. Technical details
In production, each agent runs in a Unikraft micro-VM (scale-to-zero: it suspends when idle); in development, Docker containers are used. The same image runs everywhere.
From an ordinary person's perspective: what does this have to do with me? You may not know what a "micro-VM" or a "presigned URL" is, but whenever you use AI, you are dealing with this kind of architecture.
5. A sense of security: when you use an AI service to write code or query data, it actually runs your request in an isolated VM. If the architecture is poorly designed (Option 1), the AI Agent can in theory see all of the service's secrets: database passwords, API keys, and other users' data.
6. Cost and speed: Option 2 has a price — wait, no em-dashes — a price: one extra network hop per operation. Compared to LLM response time, though, this latency is almost negligible. Better still, the VM suspends when the agent is idle, so the cost drops close to zero.
Data privacy: how are your files stored? The sandbox asks the control plane for a presigned URL and uploads the file directly to S3. The sandbox never sees AWS keys, so your storage credentials are never exposed to the agent.
7. My Thoughts: Local vs. Cloud
My current setup (OpenClaw + LM Studio + x-reader) is a typical "standalone" version:
• Model runs locally (Qwen3.5-35B on an RTX 3090)
• The agent is not isolated (it lives on your own computer)
• Data stays completely local

Compared with Browser Use's design:

| Dimension | Local Agent (us) | Cloud Isolated Agent (Browser Use) |
|---|---|---|
| Privacy | Data never leaves the machine | Data goes to the cloud, but the agent cannot get the keys |
| Safety | Relies on local protection | Agents are fully isolated, with nothing to steal |
| Cost | One-time hardware investment | Pay-as-you-go (scale-to-zero) |
| Scalability | Limited by local hardware | Near-unlimited scaling, multi-agent parallelism |
| Latency | Zero network latency | One extra network hop (but negligible) |
8. My verdict: the future is a hybrid model.
• Simple tasks run locally: writing a script, checking some data, organizing files. These can all be done locally, with good privacy and speed
• Complex tasks run in the cloud: when you need many agents in parallel, large amounts of data, or long-running jobs, something like Browser Use is the better fit
9. "Originally there is not a single thing; where could dust alight?" Your agent should have nothing to steal and no state to keep. In plain language:
• Nothing worth stealing: the agent knows no secrets. Needs a token to call the LLM? The control plane hands one over, and it is discarded after use. Wants to save a file? The presigned URL is temporary and becomes invalid once it expires.
• No state to keep: Agent dead? Start a new one. The context it remembered? The full record lives in the control plane's database.
This is the Zero Trust architecture applied to the age of AI: trust no component, not even an agent you wrote yourself.
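The "no state to keep" half can be sketched too. The class and method names below are hypothetical, and the in-memory dict stands in for what would be a real database in production; the point is only that a brand-new agent can be handed the full session context after a crash.

```python
# Sketch of disposability: the sandbox holds no durable state, so killing
# and restarting it loses nothing. Names are illustrative assumptions.

class ControlPlane:
    def __init__(self) -> None:
        # Durable session store; a production system would use a database.
        self._sessions: dict[str, list[str]] = {}

    def record_step(self, session_id: str, step: str) -> None:
        """Every agent action is logged here, not inside the VM."""
        self._sessions.setdefault(session_id, []).append(step)

    def rehydrate(self, session_id: str) -> list[str]:
        """Hand a freshly started agent the complete context of its session."""
        return list(self._sessions.get(session_id, []))

cp = ControlPlane()
cp.record_step("sess-42", "opened dashboard")
cp.record_step("sess-42", "exported report")
# The micro-VM crashes... start a brand-new agent and restore everything:
context = cp.rehydrate("sess-42")
print(context)
```

Because the VM is just a throwaway execution shell, "restart on failure" becomes the entire recovery strategy.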
10. How should AI newcomers respond?
1. Choosing AI tools: when using a cloud AI service, ask yourself: if this agent goes rogue, what can it get at? A good architecture makes it "know nothing".
2. Privacy awareness: run simple tasks with local AI (OpenClaw, LM Studio) so sensitive data never goes to the cloud. Run complex tasks in cloud isolation, but be aware that the data leaves your machine.
3. Future workflows: one person collaborating with many agents is the trend (Karpathy: Tab → Agent → Parallel Agents → Agent Teams). But every agent should be quarantined, never allowed to "live in your home".
11. The trade-off between safety and efficiency
Browser Use's solution isn't perfect: three more services to deploy, and one extra network hop per operation. But compared to the risk of an agent stealing all the keys, these costs pay for themselves. For those of us running local AI setups, the lessons are:
• Simple scenarios: keep using the local solution (OpenClaw + LM Studio), with good privacy and low cost
• Complex scenarios: in the future it may make sense to use a cloud isolation agent service and let professionals do professional things
AI security is not metaphysics; it is architecture design. Good design leaves agents with nothing: no secrets to steal and no state to rely on.
12. This is probably what the future of AI infrastructure looks like: agents are disposable, control planes are trusted, and user data is protected. As for us? Keep using OpenClaw to run local agents, and when you one day need to run dozens or hundreds of agents in parallel, consider an architecture like Browser Use's. Tomorrow will be better.