# Running Untrusted LLM Code Without Fear: Building a MicroVM Sandbox

Every time an LLM generates code, you face a choice: trust it blindly or spend hours reviewing it. Neither option scales.

I built Fission to solve this. It's a high-performance sandbox that executes untrusted code in ephemeral Firecracker microVMs. Boot time: 125ms. Overhead: about 5MB per VM. Isolation: hardware-level.

Here's how to run LLM code without losing sleep.

## The Trust Problem

LLMs are getting scary good at writing code. But they also hallucinate, make mistakes, and occasionally suggest `rm -rf /`. Even worse, malicious prompts can inject harmful code that looks innocent.

Traditional solutions don't work:

- **Docker**: shared kernel; container escapes are possible
- **VMs**: too slow, too heavy (2-3 second boot times)
- **Language sandboxes**: limited to one language, often bypassable
- **Static analysis**: can't catch runtime behavior

I needed something that was:

1. **Fast**: sub-second execution for interactive use
2. **Secure**: true isolation, not just namespace separation
3. **Language-agnostic**: Python, Node, Go, whatever
4. **Ephemeral**: no state persistence between runs

Enter Firecracker.

## Firecracker: VMs at Container Speed

Amazon built Firecracker to run Lambda functions. It creates microVMs with:

- 125ms boot time
- 5MB memory overhead
- Hardware-level isolation
- No shared kernel surfaces

Perfect for sandboxing untrusted code.

Here's the architecture I built:

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Management API │────▶│  VM Orchestrator │────▶│   Firecracker   │
│    (gRPC)       │     │  (Rust Service)  │     │   MicroVMs      │
└─────────────────┘     └──────────────────┘     └─────────────────┘
         ▲                                                │
         │                                                ▼
┌─────────────────┐                              ┌─────────────────┐
│   MCP Adapter   │                              │ Ephemeral VMs   │
│   (JSON-RPC)    │                              │ • Python env    │
└─────────────────┘                              │ • Node.js env   │
                                                 │ • Isolated FS   │
                                                 └─────────────────┘
```
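
Both front doors, the gRPC API and the MCP adapter, funnel into the same orchestrator entry point. Roughly, as a sketch (every name below is illustrative, not Fission's actual API):

```rust
// Illustrative shapes only; Fission's real types and names may differ.
#[derive(Clone, Copy)]
pub enum Language { Python, Node, Go }

pub struct ExecRequest {
    pub language: Language,
    pub code: String,
    pub timeout_secs: u32,
}

pub struct ExecResult {
    pub exit_code: i32,
    pub stdout: Vec<u8>,
    pub stderr: Vec<u8>,
}

#[derive(Debug)]
pub enum ExecError { Timeout, Boot(String), Io(std::io::Error) }

pub trait Orchestrator {
    // One call per execution: acquire a microVM, run the code, destroy the VM.
    async fn execute(&self, req: ExecRequest) -> Result<ExecResult, ExecError>;
}
```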

## Real-World Usage

### Basic Execution

```bash
# Run Python code
fission run --lang python "print('Hello from isolation!')"

# Output appears, VM destroyed, no trace left
```

Behind the scenes:

1. VM boots (125ms)
2. Code executes in isolation
3. Output captured
4. VM destroyed completely

### Complex Scenarios

**Running untrusted data analysis:**

```bash
fission run --lang python "
import pandas as pd
# LLM-generated analysis code
df = pd.read_csv('/data/input.csv')
print(df.groupby('category').sum())
"
```

**Testing package combinations:**

```bash
fission install --lang python \
  --packages numpy,scipy,matplotlib \
  --command "python analysis.py"
```

**Clone and test repositories:**

```bash
fission repo https://github.com/untrusted/code \
  --command "pytest" \
  --lang python
```

Each execution is completely isolated. No cross-contamination possible.

## Security Deep Dive

### Layer 1: Hardware Isolation

Firecracker uses KVM for hardware-level isolation:

```rust
// Each VM gets its own virtualized hardware
let vm_config = VmConfig {
    vcpu_count: 1,
    mem_size_mib: 128,
    kernel: "vmlinux-5.10",
    rootfs: "python-rootfs.ext4",
};
```
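
Under the hood, that config becomes a handful of PUTs to Firecracker's HTTP API, served over a unix socket before the VM boots. The JSON field names below are Firecracker's real API fields; the structure around them is just a sketch:

```rust
use serde_json::{json, Value};

// The three pre-boot PUTs the orchestrator issues to Firecracker's API
// socket. Field names match Firecracker's public API; socket handling
// and error paths are elided.
fn firecracker_payloads() -> (Value, Value, Value) {
    let machine_config = json!({            // PUT /machine-config
        "vcpu_count": 1,
        "mem_size_mib": 128
    });
    let boot_source = json!({               // PUT /boot-source
        "kernel_image_path": "vmlinux-5.10",
        "boot_args": "console=ttyS0 reboot=k panic=1"
    });
    let rootfs = json!({                    // PUT /drives/rootfs
        "drive_id": "rootfs",
        "path_on_host": "python-rootfs.ext4",
        "is_root_device": true,
        "is_read_only": false
    });
    (machine_config, boot_source, rootfs)
}
```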

### Layer 2: Network Isolation

By default, no network access:

```rust
pub struct NetworkConfig {
    enabled: bool,
    allowed_hosts: Vec<String>,  // Empty by default
}
```

Can be selectively enabled:

```bash
fission run --network --allow-host api.example.com \
  --lang python "import requests; print(requests.get('https://api.example.com'))"
```
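
The cleanest way to enforce a default-deny is to never attach a network device in the first place. A sketch of that approach (`NetDevice` and its constructor are illustrative, not Fission's API):

```rust
// Sketch: default-deny means no guest NIC at all. Names are illustrative.
fn network_devices(cfg: &NetworkConfig) -> Vec<NetDevice> {
    if !cfg.enabled {
        return Vec::new(); // no tap device attached: no network path exists
    }
    // When enabled, allowed_hosts is enforced on the host side of the tap
    // (e.g., per-host firewall rules), not trusted to the guest.
    vec![NetDevice::tap_with_allowlist(&cfg.allowed_hosts)]
}
```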

### Layer 3: Resource Limits

Hard limits prevent resource exhaustion:

```rust
pub struct ResourceLimits {
    cpu_shares: u32,      // Default: 1024
    memory_mb: u32,       // Default: 128
    timeout_secs: u32,    // Default: 30
    disk_mb: u32,         // Default: 512
}
```

### Layer 4: Syscall Filtering

Seccomp filters block dangerous syscalls:

```rust
let seccomp*rules = vec![
    // Block kernel module operations
    Rule::new(libc::SYS*init*module, Action::Errno(libc::EPERM)),
    Rule::new(libc::SYS*delete*module, Action::Errno(libc::EPERM)),

    // Block raw network access
    Rule::new(libc::SYS*socket,
        if domain != AF*INET { Action::Errno(libc::EPERM) }),
];

## Performance Optimization

### Pre-warmed VM Pool

I maintain a pool of pre-booted VMs:

```rust
struct VmPool {
    python_vms: Vec<PrewarmedVm>,
    node_vms: Vec<PrewarmedVm>,
    go_vms: Vec<PrewarmedVm>,
}

async fn get_vm(lang: Language) -> Vm {
    // Instant VM from the pool, or boot a new one
    match pool.take(lang).await {
        Some(vm) => vm,
        None => boot_vm(lang).await,
    }
}
```

**Result**: <50ms from request to execution start.
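
The pool only stays fast if something keeps it full. A background refill task, sketched (the target size, `prewarm_vm`, and the pool methods are assumptions):

```rust
use std::sync::Arc;
use tokio::time::{sleep, Duration};

// Sketch: keep each language's pool topped up in the background.
async fn replenish(pool: Arc<VmPool>, lang: Language, target: usize) {
    loop {
        while pool.len(lang) < target {
            let vm = prewarm_vm(lang).await; // boot now so requests don't pay for it
            pool.put(lang, vm);
        }
        sleep(Duration::from_millis(50)).await;
    }
}
```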

### Copy-on-Write Rootfs

Base filesystem shared via CoW:

```rust
// Base image (read-only)
let base_rootfs = "/var/lib/fission/base/python.ext4";

// Create a CoW overlay for this execution
let overlay = create_overlay(base_rootfs, execution_id);
```

**Impact**: 5MB per VM instead of 500MB.
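
One way to implement `create_overlay` cheaply is a reflink clone on a CoW-capable filesystem like XFS or Btrfs; only the blocks a VM actually writes consume new space. A sketch under that assumption (the run-directory path is also assumed):

```rust
use std::process::Command;

// Sketch: reflink-clone the read-only base image for one execution.
// The /var/lib/fission/run layout is an assumption for illustration.
fn create_overlay(base_rootfs: &str, execution_id: &str) -> std::io::Result<String> {
    let overlay = format!("/var/lib/fission/run/{execution_id}.ext4");
    let status = Command::new("cp")
        .args(["--reflink=always", base_rootfs, &overlay])
        .status()?;
    if !status.success() {
        return Err(std::io::Error::other("reflink clone failed"));
    }
    Ok(overlay)
}
```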

### Efficient Output Streaming

Real-time output via virtio-serial:

```rust
let (tx, rx) = channel();
vm.attach_serial(move |data| {
    tx.send(Output::Stdout(data)).ok();
});
```

No polling, no buffering, instant feedback.
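
The consumer side just drains the receiver as chunks arrive; a minimal sketch:

```rust
// Sketch: stream stdout chunks to the caller the moment they land.
for msg in rx {
    if let Output::Stdout(chunk) = msg {
        print!("{}", String::from_utf8_lossy(&chunk));
    }
}
```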

## Building Language-Specific Images

Each language gets a minimal rootfs:

```dockerfile
# Python image
FROM scratch
COPY --from=python:3.11-slim /usr/local /usr/local
COPY --from=python:3.11-slim /lib /lib
COPY --from=alpine:latest /bin/sh /bin/sh

# Only essential packages
RUN pip install --no-cache-dir numpy pandas requests

# Total size: ~50MB
```

Build process:

```bash
cd images
sudo make python  # Builds python rootfs
sudo make all     # Builds all languages

## Integration Patterns

### With LLM Agents

```python
# Agent generates code
code = llm.generate("Analyze this CSV and find anomalies")

# Execute safely
result = fission.run(
    language="python",
    code=code,
    timeout=30,
    memory_mb=256
)

# Use results with confidence
if result.success:
    print(result.output)
```

### CI/CD Pipeline Testing

```yaml
# .github/workflows/test-generated.yml
- name: Test LLM-generated solutions
  run: |
    fission run --lang python \
      --file generated_solution.py \
      --timeout 60
```

### Interactive Development

```python
# Real-time code execution for IDEs
async def execute_cell(code: str) -> Result:
    async with fission.connect() as client:
        return await client.run(
            language="python",
            code=code,
            stream_output=True
        )
```

## Lessons Learned

### 1. MicroVMs Are the Sweet Spot

Not as light as containers, not as heavy as VMs. Perfect balance of:

- Security (hardware isolation)
- Performance (sub-second boot)
- Flexibility (any language/runtime)

### 2. Rust + Firecracker = ❤️

The entire orchestrator is 3,000 lines of Rust. Memory safe, blazing fast, perfect for security-critical code:

```rust
// Type-safe VM lifecycle
enum VmState {
    Booting,
    Ready(VmHandle),
    Executing(ExecutionHandle),
    Terminated,
}
```

### 3. Ephemeral by Design

No cleanup needed. VM dies = everything gone:

- No leaked processes
- No orphaned files
- No network connections
- No memory artifacts

### 4. Pre-warming Changes Everything

That 125ms boot time? With pre-warming, execution starts in <50ms. Fast enough for interactive use, secure enough for production.

## What's Next

I'm working on:

- **GPU passthrough**: For ML workloads
- **Distributed execution**: Spread across multiple hosts
- **Persistent volumes**: Optional mounted directories
- **WebAssembly support**: Even lighter isolation

## Try It Yourself

```bash
# Clone and build
git clone https://github.com/haasonsaas/Fission
cd Fission
cargo build --release

# Build VM images
cd images && sudo make all

# Start the service
./target/release/fission-management

# Run untrusted code fearlessly
./target/release/fission run --lang python "
# Your sketchy LLM code here
"
```

The future of AI assistants is code generation. The future of code generation is secure execution.

Build with confidence. Execute without fear.

---

*Using Fission in production? I'd love to hear about your use cases and security requirements.*
