The Container Security Lie You’ve Been Told
Somewhere along the way, “containers are not VMs” became “containers are not secure.” Both sentences are true, but only one of them matters to you personally.
If you’re running your own homelab, your own apps, your own CI pipelines — standard runc containers are probably fine. The kernel is shared, yes. But the process running in that container is yours, and you presumably trust yourself not to try kernel exploits on your own box at 2 AM.
The moment that changes — untrusted code, shared multi-tenant infrastructure, compliance auditors breathing down your neck — you need something stronger. That’s where Sysbox, gVisor, and Kata Containers come in. Three different answers to the same question: what do you actually put between a container and the host kernel?
Let’s break them down.
What “Container Isolation” Actually Means
A standard container shares the host kernel. runc sets up namespaces and cgroups, which give the illusion of separation, but a container with the right capabilities or a kernel CVE can escape. The host kernel’s attack surface is fully exposed to every container running on it.
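If you want to see those namespaces for yourself, here is a minimal sketch. On Linux, each process exposes its namespaces as symlinks under /proc/<pid>/ns, and two processes share a namespace when the links resolve to the same target. The helper name shares_ns is made up for this example, and reading another user's process usually needs root.

```shell
# Succeeds if two processes share the given namespace type
# (pid, mnt, net, user, uts, ipc, cgroup).
# Returns 2 if either link cannot be read.
shares_ns() {
  a=$(readlink "/proc/$1/ns/$3") || return 2
  b=$(readlink "/proc/$2/ns/$3") || return 2
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Something like `sudo sh -c '. ./ns.sh; shares_ns 1 <container-pid> mnt'` (with the function saved in a hypothetical ns.sh) should fail for a containerized process, since runc gives it its own mount namespace.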
Think of it like this: runc gives you separate rooms in a house. The walls are real, the locks work fine for normal use — but they’re drywall. A determined person with the right tool goes straight through.
Sysbox, gVisor, and Kata each build a thicker wall. They just build it differently.
Sysbox: The “Actually Feels Like a VM” Container Runtime
What it is: An OCI-compatible runtime from Nestybox (acquired by Docker) that replaces runc. It uses Linux user namespaces aggressively to give containers a sandboxed view of the system — including the ability to run systemd, nested Docker, and even k3s inside the container.
The trick: Sysbox shifts the container’s root user into an unprivileged user namespace on the host. From inside the container, you’re root. From the host’s perspective, you’re UID 100000+. The kernel sees an unprivileged process doing unprivileged things.
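To make the UID shift concrete, here is a small sketch of the translation the kernel performs. Each line of /proc/<pid>/uid_map has three fields: first ID inside the namespace, first ID outside, and how many IDs the range covers. The helper name host_uid_for and the example mapping `0 100000 65536` are assumptions for illustration; Sysbox picks the actual ranges itself.

```shell
# Translate a UID seen inside a user namespace to the UID the host sees.
# Each uid_map line is: <first id inside> <first id outside> <count>
# Prints the host UID, or returns 1 if the UID is not mapped.
host_uid_for() {
  uid="$1"
  map_file="${2:-/proc/self/uid_map}"
  awk -v uid="$uid" '
    uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
    END { if (!found) exit 1 }
  ' "$map_file"
}
```

With a mapping of `0 100000 65536`, `host_uid_for 0` gives 100000 and `host_uid_for 1000` gives 101000. Run it inside a Sysbox container with no second argument and it reads the container's real map.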
Install Sysbox
```shell
# Ubuntu 22.04 / 24.04
# Note: the pinned filename has to match whatever release /latest/ currently points to
wget https://github.com/nestybox/sysbox/releases/latest/download/sysbox-ce_0.6.4-0.linux_amd64.deb
sudo apt install ./sysbox-ce_0.6.4-0.linux_amd64.deb

# Verify it installed
sudo systemctl status sysbox
sysbox-runc --version
```

Running a Container With Sysbox
```shell
# Run a container that can run Docker inside itself
docker run --runtime=sysbox-runc --rm -it ubuntu:22.04 bash
```

Inside that container, you can install and start Docker normally. No --privileged. No bind-mounting the Docker socket (which is the worst security decision you can make in containers, incidentally).
Sysbox With Docker Compose
```yaml
services:
  ci-runner:
    image: nestybox/ubuntu-jammy-systemd-docker
    runtime: sysbox-runc
    hostname: runner
    volumes:
      - /var/lib/sysbox:/var/lib/sysbox
```

Sysbox as a Kubernetes RuntimeClass
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: sysbox-runc
handler: sysbox-runc
---
apiVersion: v1
kind: Pod
metadata:
  name: nested-docker-pod
spec:
  runtimeClassName: sysbox-runc
  containers:
    - name: dind
      image: nestybox/ubuntu-jammy-systemd-docker
      command: ["/sbin/init"]
```

Kernel requirements: Linux 5.12+ with user namespaces enabled (CONFIG_USER_NS=y). Most modern distros qualify.
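A quick way to check the kernel requirement before installing anything is to compare `uname -r` against 5.12. This is only a sketch: kernel_at_least is a hypothetical helper, and it looks at the major and minor numbers only, so distro backports are invisible to it.

```shell
# Succeeds if the kernel release is at least <major>.<minor>.
# Third argument defaults to the running kernel (uname -r).
kernel_at_least() {
  want_major="$1"
  want_minor="$2"
  release="${3:-$(uname -r)}"
  major="${release%%.*}"        # "5.15.0-91-generic" -> "5"
  rest="${release#*.}"          # -> "15.0-91-generic"
  minor="${rest%%[!0-9]*}"      # -> "15"
  [ "$major" -gt "$want_major" ] ||
    { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

if kernel_at_least 5 12; then
  echo "kernel is new enough for Sysbox"
else
  echo "kernel is older than 5.12"
fi
```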
Performance: Near-native. Sysbox doesn’t intercept syscalls or add a virtualization layer. The overhead is essentially zero once the container is running.
Where it shines: Nested workloads. Running k3s-in-a-container for ephemeral CI. Running systemd in a container for integration tests. Docker-in-Docker without --privileged. If you’re building a homelab CI system, Sysbox is the cleanest answer.
Where it falls short: It’s not a multi-tenant isolation play. You’re still on the same kernel. A kernel exploit still reaches the host. Sysbox narrows the attack surface significantly via user namespaces, but it’s not a hard boundary.
gVisor: Google’s Userspace Kernel
What it is: A userspace kernel written in Go by Google (it sits behind products like Cloud Run) that intercepts container syscalls before they reach the host kernel. When your containerized app calls read(), gVisor’s Sentry component handles it in userspace rather than passing it directly to the Linux kernel.
The trick: The host kernel’s attack surface shrinks dramatically because the container never talks to it directly. gVisor implements just enough of the Linux syscall interface to run most apps.
Install gVisor (runsc)
```shell
# Add the gVisor APT repo
curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" | sudo tee /etc/apt/sources.list.d/gvisor.list
sudo apt update && sudo apt install runsc

# Docker users: register runsc as a Docker runtime (this edits /etc/docker/daemon.json)
sudo runsc install
sudo systemctl restart docker

# containerd users: add the config from the next section, then restart
sudo systemctl restart containerd
```

containerd Config for gVisor
```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc.options]
  TypeUrl = "io.containerd.runsc.v1.options"
```

Kubernetes RuntimeClass for gVisor
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-app
spec:
  runtimeClassName: gvisor
  containers:
    - name: app
      image: your-untrusted-app:latest
```

Kernel requirements: Standard x86_64. The KVM platform (recommended) needs /dev/kvm on the host; pure ptrace mode is slower but needs no KVM. Recent gVisor releases default to a third platform, systrap, which also runs without KVM. Inside a cloud VM, the KVM platform only works if nested virtualization is enabled.
Performance: This is where you feel it. gVisor adds roughly 20-30% overhead on syscall-heavy workloads — anything doing lots of fork(), open(), network I/O. Databases are notably worse. A SELECT * that takes 1ms under runc might take 1.3ms under gVisor. Doesn’t sound like much until you’re doing 10,000 of them.
Fork-heavy workloads (PHP-FPM, CGI-style apps, Python multiprocessing) take the hardest hit. Stateless HTTP apps and Go services tend to fare better.
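The 1 ms versus 1.3 ms example is easier to feel once you multiply it out. Here is a back-of-the-envelope sketch; added_seconds is a made-up helper, and the numbers are the illustrative ones from above, not measurements.

```shell
# Extra wall-clock seconds spent when each of N calls gets slower.
# Args: call count, baseline ms per call, sandboxed ms per call.
added_seconds() {
  awk -v n="$1" -v base="$2" -v slow="$3" \
    'BEGIN { printf "%.1f\n", n * (slow - base) / 1000 }'
}

added_seconds 10000 1.0 1.3   # 10,000 queries: prints 3.0
```

Three extra seconds per 10,000 sequential queries is invisible in a batch job and very visible in a request path that issues them one after another.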
Where it shines: Multi-tenant SaaS platforms. Running untrusted user code (online judges, code sandboxes, serverless functions). Anywhere where the code running in the container is supplied by someone other than you. Google runs it at scale for Cloud Run — that’s a pretty solid endorsement.
Where it falls short: Not every syscall is implemented. Some apps that do unusual things at the kernel level just don’t work. ptrace-based debugging doesn’t work inside gVisor. Some eBPF use cases break. If your app does anything exotic, test it thoroughly before trusting gVisor to sandbox it in production.
Kata Containers: The “Just Use a VM” Container Runtime
What it is: An OCI-compatible runtime that spins up a lightweight microVM (using QEMU, Firecracker, or Cloud Hypervisor under the hood) for each container — or each pod, in Kubernetes terms. The container’s workload runs inside a real VM with a real kernel. The host kernel never sees the workload’s syscalls.
The trick: Hardware virtualization creates a hard boundary. Even if the workload exploits its guest kernel completely, it’s still inside the VM. Getting from there to the host requires a hypervisor escape, which is a much harder and rarer class of vulnerability.
Install Kata Containers
```shell
# Ubuntu via snap (simplest)
sudo snap install kata-containers --classic

# Or via the upstream kata-manager.sh installer script
bash -c "$(curl -fsSL https://raw.githubusercontent.com/kata-containers/kata-containers/main/utils/kata-manager.sh)"

# Check QEMU/KVM availability (required)
kata-runtime check
```

Configure containerd for Kata
```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
  runtime_type = "io.containerd.kata.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options]
  ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration.toml"
```

Kubernetes RuntimeClass for Kata
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-containers
handler: kata
overhead:
  podFixed:
    memory: "60Mi"
    cpu: "250m"
---
apiVersion: v1
kind: Pod
metadata:
  name: isolated-workload
spec:
  runtimeClassName: kata-containers
  containers:
    - name: app
      image: nginx:stable-alpine
      resources:
        requests:
          memory: "64Mi"
          cpu: "100m"
```

Note the overhead in the RuntimeClass: that’s not optional padding. Kata genuinely adds ~50-60MB RAM per pod for the VM and guest kernel. With overhead declared, the scheduler counts it on top of the pod’s own requests; leave it out and your scheduler will lie to you.
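To see what the scheduler actually reserves for the example pod, add the RuntimeClass overhead to the container's requests. A trivial sketch with a hypothetical helper name, using plain integers (Mi for memory, millicores for CPU):

```shell
# Sum a pod's own request and the RuntimeClass podFixed overhead.
# Values are plain integers: Mi for memory, millicores for CPU.
effective_request() {
  echo $(( $1 + $2 ))
}

effective_request 64 60     # memory in Mi: prints 124
effective_request 100 250   # CPU in millicores: prints 350
```

So the nginx pod that asks for 64Mi and 100m is scheduled as if it asked for 124Mi and 350m. Nearly half the memory and most of the CPU reservation is the VM, not the app.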
Kernel requirements: KVM support on the host, with nested virtualization if you’re running inside a VM yourself. Hardware virt is non-negotiable. If your homelab box is an old Celeron with no VT-x, you’re out.
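Before installing Kata, it is worth a ten-second check that the box can do hardware virt at all. This sketch looks for the Intel (vmx) or AMD (svm) CPU flag and for the /dev/kvm device node. The function names are made up, the flag check is x86-only, and a present /dev/kvm does not guarantee your user has permission to open it.

```shell
# True if the CPU advertises hardware virtualization (Intel vmx or AMD svm).
cpu_has_virt() {
  cpuinfo="${1:-/proc/cpuinfo}"
  grep -Eq '^flags[[:space:]]*:.*[[:space:]](vmx|svm)([[:space:]]|$)' "$cpuinfo" 2>/dev/null
}

# True if the KVM device node exists as a character device.
kvm_device_present() {
  [ -c "${1:-/dev/kvm}" ]
}

if cpu_has_virt && kvm_device_present; then
  echo "hardware virt looks available"
else
  echo "no usable KVM here"
fi
```

If the flag is there but /dev/kvm is missing, virtualization is probably disabled in firmware, or you are inside a VM without nested virt.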
Startup time: Slower. runc starts containers in milliseconds. Kata starts microVMs — figure 500ms to 1s cold start depending on the hypervisor backend. Firecracker is fastest, QEMU is most compatible.
Performance: CPU-bound workloads are close to native inside the VM. The hit is on RAM (overhead per pod) and startup time. Sustained throughput for long-running services is acceptable.
Where it shines: Compliance-mandated VM isolation. Untrusted code execution where gVisor’s syscall compatibility is insufficient. Multi-tenant infra where tenants are paying customers with SLAs attached. Any environment where “it’s a real VM” is a checkbox on an audit form.
Where it falls short: Resource cost adds up fast. 60MB overhead per pod is nothing for 10 pods, brutal for 500. Cold starts hurt latency-sensitive workloads. And you still need hardware virt — you can’t run Kata inside most standard cloud VMs without nested virt enabled.
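The "nothing for 10, brutal for 500" point is just multiplication, but it helps to see it in GiB. A rough sketch, assuming the 60Mi per-pod figure from the RuntimeClass above; fleet_overhead is a hypothetical helper.

```shell
# Total fixed memory overhead for a fleet of pods, in Mi and GiB.
# Args: pod count, per-pod overhead in Mi (defaults to 60).
fleet_overhead() {
  awk -v pods="$1" -v per_pod_mi="${2:-60}" \
    'BEGIN { mi = pods * per_pod_mi; printf "%d Mi (%.1f GiB)\n", mi, mi / 1024 }'
}

fleet_overhead 10    # prints 600 Mi (0.6 GiB)
fleet_overhead 500   # prints 30000 Mi (29.3 GiB)
```

That second number is memory you pay for before a single application byte is allocated.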
The Numbers, Side by Side
| | Sysbox | gVisor | Kata |
|---|---|---|---|
| Isolation mechanism | User namespaces | Userspace kernel (syscall intercept) | MicroVM (hardware virt) |
| Attack surface reduction | Moderate | High | Very High |
| Runtime overhead | ~0% | ~20-30% (syscall-heavy) | ~0% (once running) |
| RAM overhead per container | ~0 MB | ~5-10 MB | ~50-60 MB |
| Startup time | Fast (ms) | Fast (ms) | Slow (500ms–1s) |
| Nested Docker/systemd | Yes — first-class | No | Limited |
| KVM required | No | Optional (faster with it) | Yes |
| OCI / runc compatible | Yes (drop-in) | Yes | Yes |
| Kubernetes RuntimeClass | Yes | Yes | Yes |
| App compatibility | High | Medium (some syscalls missing) | High |
| Escape-to-host difficulty | Moderate | Hard | Very Hard |
Which One Do You Actually Need
Pick Sysbox if: You’re running CI pipelines, nested k3s clusters, or anything that needs Docker-in-Docker or systemd inside a container. Your workloads are trusted, you own the infra, and you want the container-feels-like-a-VM experience without the VM overhead. Homelab Kubernetes with self-hosted runners? Sysbox.
Pick gVisor if: You’re running untrusted code at scale — a code execution service, a SaaS platform where users upload and run things, serverless functions. You care more about isolation than raw performance, and you’ve confirmed your workload is compatible. This is Google’s production runtime for Cloud Run for a reason.
Pick Kata if: Compliance demands a real VM boundary. You’re running regulated workloads where “containers are not VMs” is a finding that needs remediation. You need maximum isolation and you have the hardware budget and RAM headroom for it. Or your security team won’t approve anything less than hypervisor isolation and you need something to put in the architecture doc.
Pick none of them if: You’re running your own apps on your own homelab. Standard runc with good image hygiene, a non-root user, read-only filesystems, and dropped capabilities gets you 90% of the way there with zero overhead. The biggest container security win for most people isn’t swapping out runtimes — it’s stopping the use of --privileged and latest tags.
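If you want to catch those habits before they ship, a crude grep goes a long way. This is a sketch, not a real linter: lint_container_config is a made-up name, the patterns are plain text matches on a Compose file or manifest, and an image with no tag at all (which also resolves to latest) slips straight past it.

```shell
# Flag the usual self-inflicted wounds in a Compose file or manifest.
# Prints one line per finding; returns non-zero if anything was found.
lint_container_config() {
  file="$1"
  findings=0
  if grep -Eq 'privileged:[[:space:]]*true|--privileged' "$file"; then
    echo "privileged mode enabled"
    findings=$((findings + 1))
  fi
  if grep -q 'docker\.sock' "$file"; then
    echo "Docker socket mounted"
    findings=$((findings + 1))
  fi
  if grep -Eq "image:.*:latest([\"'[:space:]]|\$)" "$file"; then
    echo "image uses the :latest tag"
    findings=$((findings + 1))
  fi
  [ "$findings" -eq 0 ]
}
```

Calling `lint_container_config docker-compose.yml` in a pre-commit hook or CI step would fail the run on any finding.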
The Real Talk
These runtimes solve real problems, but they’re also genuinely niche. Most production Kubernetes clusters run runc. Most homelab installs run runc. The security improvements matter enormously in specific contexts — multi-tenant SaaS, CI systems that build untrusted PRs, compliance-heavy environments.
For everything else, you’re adding operational complexity and overhead to solve a threat model that doesn’t match your actual environment.
Know your threat model before you reach for these. If the answer to “who runs code in my containers?” is “me and my team,” you probably don’t need any of them. If the answer is “our customers, via their uploaded code,” you need gVisor or Kata yesterday.
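The rule of thumb above is small enough to write down as a case statement. This is only a sketch of this article's advice, not an official decision tool; the function name and the input labels are invented for the example.

```shell
# Encode the rule of thumb. Inputs are made-up labels:
#   who:  "self" (you and your team) or "others" (customers, untrusted PRs)
#   need: "nested" (Docker-in-Docker or systemd), "vm-boundary" (audit demands a VM), or "none"
pick_runtime() {
  who="$1"
  need="${2:-none}"
  case "$who:$need" in
    self:vm-boundary|others:vm-boundary) echo "kata" ;;
    others:*)                            echo "gvisor or kata" ;;
    self:nested)                         echo "sysbox" ;;
    self:*)                              echo "runc" ;;
    *) echo "unknown input" >&2; return 1 ;;
  esac
}
```

`pick_runtime self` answers runc, `pick_runtime self nested` answers sysbox, and `pick_runtime others` answers "gvisor or kata", which is the point where you go test your workload against gVisor's syscall coverage.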
Your 2 AM self will thank you for not over-engineering this when you didn’t need to — and will really, really thank you for having the right runtime in place when you did.