The Silent Problem Nobody Talks About Until 3 AM
Picture this: you’ve deployed four containers on a shiny new VPS. Everything runs beautifully for three days. Then, on a quiet Tuesday night, your monitoring goes dark. SSH takes forty-five seconds to connect. When you finally get in, top reveals that your Elasticsearch container has casually consumed 14 GB of RAM on a 16 GB server, and the OOM killer has been on a rampage, taking out your database, your API, and somehow your SSH agent.
You didn’t set resource limits. You basically handed your containers an unlimited credit card and said “go wild.”
This is not a hypothetical. This is a rite of passage. Almost every Docker user learns about resource limits the hard way — usually at 3 AM, usually on a weekend, and usually when they really need that server to be working.
Let’s make sure you skip that particular rite.
Why Default Docker Has No Limits (And Why That’s Terrifying)
By default, a Docker container has access to all of the host’s resources. Every byte of RAM. Every CPU cycle. The entire swap partition. There’s no fence, no guardrail, no “hey buddy, maybe leave some for the rest of us.”
This design choice makes sense if you think about it from Docker’s perspective — it’s trying to stay out of your way. But in practice, running containers without limits is like renting out rooms in your house without setting any ground rules. Eventually, someone’s going to fill the bathtub with rubber ducks and flood the first floor.
Here’s what a completely unconstrained container looks like from the kernel’s perspective:
# Check a container's current resource limits
docker inspect --format='{{.HostConfig.Memory}}' my_container
# Output: 0 (which means "unlimited" -- yikes)
docker inspect --format='{{.HostConfig.NanoCpus}}' my_container
# Output: 0 (also unlimited -- double yikes)
Zero means unlimited. Zero means chaos.
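If you want to audit a whole host for this, the logic is simple enough to script. The sketch below is hypothetical — `flag_unlimited` and the sample values are mine — but the commented loop shows how you would feed it real `docker ps` / `docker inspect` output:

```shell
# Flags a container as unlimited when its memory limit is 0 bytes.
# On a real host you would drive this with something like:
#   for c in $(docker ps --format '{{.Names}}'); do
#     flag_unlimited "$c" "$(docker inspect --format='{{.HostConfig.Memory}}' "$c")"
#   done
flag_unlimited() {
  name=$1; mem_bytes=$2
  if [ "$mem_bytes" -eq 0 ]; then
    echo "UNLIMITED: $name"
  else
    echo "OK: $name ($mem_bytes bytes)"
  fi
}

# Sample values standing in for docker inspect output:
flag_unlimited elasticsearch 0     # -> UNLIMITED: elasticsearch
flag_unlimited postgres 2147483648 # -> OK: postgres (2147483648 bytes)
```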
Memory Limits: The Big One
Memory is the resource that will burn you first, fastest, and hardest. A CPU-starved container runs slowly. A memory-starved container gets murdered by the kernel. There’s a meaningful difference between “annoying” and “dead.”
Setting Hard Memory Limits
The --memory (or -m) flag sets a hard ceiling on how much memory a container can use:
# Limit container to 512 MB of RAM
docker run -d --memory=512m --name my_app nginx
# Limit to 2 GB
docker run -d --memory=2g --name my_db postgres:16
When the container tries to exceed this limit, one of two things happens:
- First, the kernel tries to reclaim memory from the container (dropping page cache, swapping if allowed) to keep it under the limit
- If reclaim can’t keep up and the container keeps pushing, the OOM killer steps in and terminates the process
That second one sounds scary, and it is. But it’s controlled scary. The OOM killer only takes out your container, not your entire server. That’s the whole point.
Memory Reservations (Soft Limits)
A memory reservation is a softer, gentler suggestion. It tells Docker “try to keep this container around this amount of memory, but don’t kill it if it goes over.”
docker run -d \
--memory=1g \
--memory-reservation=512m \
--name my_app node:20-slim
Here’s how reservations work in practice:
- When the host has plenty of free memory, the container can use up to the hard limit (1 GB)
- When the host starts running low on memory, the kernel reclaims memory from the container, pushing it back toward the reservation (512 MB)
- The hard limit is never exceeded regardless
Think of it like a hotel room: the reservation is the room size they aim to hold for you when the hotel fills up, the hard limit is the biggest room they’ll ever give you, and the OOM killer is the guy who throws you out for trashing the minibar.
Checking Memory Usage
The docker stats command is your best friend here:
docker stats --no-stream
Output looks something like:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O
a1b2c3d4e5f6 postgres 0.50% 156.3MiB / 512MiB 30.53% 1.2kB / 0B 8.2MB / 4.1MB
f6e5d4c3b2a1 redis 0.15% 12.4MiB / 256MiB 4.84% 500B / 0B 0B / 0B
1a2b3c4d5e6f node_app 1.20% 287.1MiB / 1GiB 28.04% 5.6kB / 2.1kB 0B / 12.3MB
That MEM USAGE / LIMIT column is where the action is. If you see a container consistently sitting at 95%+ of its limit, it’s time to either optimize the app or raise the limit. If it’s sitting at 5%, you’re being too generous and could reclaim that memory for something else.
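You can script that check, too. The following sketch (`check_mem` is a hypothetical helper) parses lines in the MEM USAGE / LIMIT shape shown above, assuming both values use the same unit, and flags anything over 85%:

```shell
# Warn when a container sits above 85% of its memory limit.
# The sample lines stand in for real `docker stats --no-stream` output;
# this assumes usage and limit share the same unit (e.g. both MiB).
check_mem() {
  printf '%s\n' "$1" | awk '
    {
      # Fields: name, "usedMiB / limitMiB" -> strip the unit suffixes
      used = $2;  sub(/[A-Za-z]+$/, "", used)
      limit = $4; sub(/[A-Za-z]+$/, "", limit)
      pct = used / limit * 100
      if (pct > 85) printf "WARN %s at %.0f%% of limit\n", $1, pct
    }'
}

check_mem "postgres 490.0MiB / 512MiB
redis 12.4MiB / 256MiB"
# -> WARN postgres at 96% of limit
```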
The OOM Killer: Docker’s Grim Reaper
The Linux Out-Of-Memory killer is the kernel’s last line of defense when the system runs out of memory. Without container resource limits, the OOM killer picks victims based on an internal scoring algorithm — and it doesn’t care that your database is more important than your log shipper.
How OOM Priority Works in Docker
Docker lets you adjust the OOM killer’s priority for each container:
# Make this container LESS likely to be killed (lower priority)
docker run -d --oom-score-adj=-500 --name critical_db postgres:16
# Make this container MORE likely to be killed (higher priority)
docker run -d --oom-score-adj=500 --name expendable_worker my_worker
# Disable OOM killer entirely for this container (dangerous!)
docker run -d --oom-kill-disable --memory=2g --name untouchable_app my_app
Warning about --oom-kill-disable: If you use this without a memory limit, you’re creating a container that can eat all your RAM and cannot be stopped by the kernel. This is how you turn a memory leak into a full server crash. Always pair --oom-kill-disable with --memory. Note also that this flag is only supported on cgroups v1 hosts; on cgroups v2 (the default on modern distros) Docker won’t honor it.
What Happens When OOM Fires
When a container gets OOM-killed, you’ll see it in the container’s exit code:
docker inspect --format='{{.State.OOMKilled}}' my_container
# Output: true
docker inspect --format='{{.State.ExitCode}}' my_container
# Output: 137 (128 + 9, where 9 = SIGKILL)
Exit code 137 is Docker’s way of saying “the kernel killed your process.” If you’re seeing this in production, you need to either raise the memory limit or fix the memory leak. Probably both. Definitely the leak first.
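The arithmetic is easy to verify in a shell: any exit code above 128 is 128 plus the signal number, and kill -l maps the number back to a name:

```shell
# Exit codes above 128 mean "killed by signal <code - 128>".
exit_code=137
if [ "$exit_code" -gt 128 ]; then
  sig=$((exit_code - 128))
  echo "killed by signal $sig ($(kill -l "$sig"))"
fi
# -> killed by signal 9 (KILL)
```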
CPU Limits: Shares, Quotas, and Periods
CPU limits are less immediately catastrophic than memory limits — a CPU-starved container just runs slowly rather than getting killed. But slow can cascade into timeouts, which cascade into retries, which cascade into more CPU usage. It’s turtles all the way down.
CPU Shares (Relative Weight)
CPU shares are a relative weighting system. They don’t guarantee a specific amount of CPU; they determine how CPU time is divided when there’s contention.
# Default share weight is 1024
docker run -d --cpu-shares=512 --name low_priority my_worker
docker run -d --cpu-shares=2048 --name high_priority my_api
With these settings:
- When both containers want CPU at the same time, high_priority gets 4x more CPU time than low_priority (2048 vs 512)
- When only one container wants CPU, it gets all available CPU regardless of share weight
- Shares only matter during contention
This is important: CPU shares do nothing when the system isn’t busy. They’re purely a conflict-resolution mechanism. Think of it like priority boarding on a plane — when the plane is half empty, everyone gets on fine regardless of their boarding group.
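The expected split under full contention is just each container's weight divided by the sum of the competing weights. A quick sanity check of the two containers above:

```shell
# Under full contention, each container's expected CPU fraction is
# its share weight over the sum of all competing weights.
awk 'BEGIN {
  high = 2048; low = 512
  printf "high_priority: %.0f%% low_priority: %.0f%%\n",
    high / (high + low) * 100, low / (high + low) * 100
}'
# -> high_priority: 80% low_priority: 20%
```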
CPU Quotas (Hard Limits)
If you want actual hard limits on CPU usage, you need CPU quotas. These use the CFS (Completely Fair Scheduler) to limit how much CPU time a container gets within a specific period.
# Limit to 1.5 CPU cores
docker run -d --cpus=1.5 --name my_app my_image
# Same thing, but using the period/quota syntax
docker run -d --cpu-period=100000 --cpu-quota=150000 --name my_app my_image
The --cpus flag is the friendly version. Under the hood, Docker translates it to period/quota values:
- --cpus=1.5 means “in every 100ms period, this container can use 150ms of CPU time across all cores”
- --cpus=0.5 means “use at most half a CPU core”
- --cpus=4 means “use at most 4 cores”
Unlike CPU shares, quotas are enforced even when the system is idle. If you set --cpus=1 and the system has 8 idle cores, your container still only gets 1 core. This prevents noisy-neighbor problems but can leave resources unused.
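The translation Docker performs is just multiplication. A small helper makes the mapping explicit (cpus_to_quota is my own name for illustration, not a Docker command):

```shell
# Translate a --cpus value into the CFS period/quota pair Docker sets.
cpus_to_quota() {
  awk -v cpus="$1" 'BEGIN {
    period = 100000   # default CFS period: 100ms, expressed in microseconds
    printf "--cpu-period=%d --cpu-quota=%d\n", period, cpus * period
  }'
}

cpus_to_quota 1.5   # -> --cpu-period=100000 --cpu-quota=150000
cpus_to_quota 0.5   # -> --cpu-period=100000 --cpu-quota=50000
```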
CPU Pinning
For performance-sensitive workloads, you can pin containers to specific CPU cores:
# Run only on cores 0 and 1
docker run -d --cpuset-cpus="0,1" --name latency_sensitive my_app
# Run on cores 0 through 3
docker run -d --cpuset-cpus="0-3" --name compute_heavy my_ml_model
CPU pinning is useful when you want to avoid cache thrashing or when you need predictable latency. It’s mostly relevant for real-time or high-frequency workloads. For your typical web app, --cpus is plenty.
Swap Limits: The Safety Net’s Safety Net
Swap is what the kernel uses when physical RAM runs out — it moves less-used memory pages to disk. It’s much slower than RAM but prevents immediate OOM situations.
# Allow 512 MB RAM + 512 MB swap (1 GB total)
docker run -d --memory=512m --memory-swap=1g --name my_app my_image
# Allow 512 MB RAM + NO swap at all
docker run -d --memory=512m --memory-swap=512m --name my_app my_image
# Allow 512 MB RAM + unlimited swap (not recommended)
docker run -d --memory=512m --memory-swap=-1 --name my_app my_image
The math is a little counterintuitive: --memory-swap is the total of RAM + swap, not just the swap amount. So --memory=512m --memory-swap=1g means 512 MB RAM and 512 MB swap.
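If the subtraction trips you up, a few lines of shell make it concrete. to_bytes is a hypothetical helper that only handles the m and g suffixes:

```shell
# --memory-swap is RAM + swap, so actual swap = memory-swap minus memory.
to_bytes() {
  case $1 in
    *m) echo $(( ${1%m} * 1024 * 1024 )) ;;
    *g) echo $(( ${1%g} * 1024 * 1024 * 1024 )) ;;
    *)  echo "$1" ;;
  esac
}

mem=$(to_bytes 512m)
memswap=$(to_bytes 1g)
echo "swap available: $(( (memswap - mem) / 1024 / 1024 )) MB"
# -> swap available: 512 MB
```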
Pro tip: For databases, set --memory-swap equal to --memory to disable swap entirely. Databases on swap perform approximately as well as a screen door on a submarine.
# PostgreSQL: NO swap, ever
docker run -d \
--memory=2g \
--memory-swap=2g \
--name postgres postgres:16
PIDs Limit: Preventing Fork Bombs
A less-discussed but important limit is the PIDs limit, which caps how many processes a container can create. Without this, a container with a fork bomb (or a badly written recursive script) can exhaust the kernel’s PID space and take down the entire host.
# Limit to 200 processes
docker run -d --pids-limit=200 --name my_app my_image
For most applications, 200-500 PIDs is more than enough. A typical web server runs 10-50 processes. If you’re hitting a PID limit of 200, something has gone very wrong and you probably want the container to be stopped.
Docker Compose: The deploy.resources Block
If you’re using Docker Compose (and you should be), resource limits live under the deploy.resources section. This syntax works with both docker compose up (as of Compose v2) and Docker Swarm deployments.
services:
postgres:
image: postgres:16
deploy:
resources:
limits:
cpus: "2.0"
memory: 2G
pids: 200
reservations:
cpus: "0.5"
memory: 512M
environment:
POSTGRES_PASSWORD: changeme
redis:
image: redis:7-alpine
deploy:
resources:
limits:
cpus: "1.0"
memory: 256M
pids: 100
reservations:
cpus: "0.25"
memory: 64M
api:
build: ./api
deploy:
resources:
limits:
cpus: "1.5"
memory: 1G
pids: 300
reservations:
cpus: "0.5"
memory: 256M
depends_on:
- postgres
- redis
worker:
build: ./worker
deploy:
resources:
limits:
cpus: "1.0"
memory: 512M
pids: 100
reservations:
cpus: "0.25"
memory: 128M
A few things to note:
- CPU values must be strings (quoted) in Compose files
- Memory supports suffixes: B, K, M, G
- Reservations are soft limits; limits are hard ceilings
- PIDs limit in Compose requires Docker Engine 20.10+
The Old Way vs The New Way
You might see older Compose files using the v2 syntax with mem_limit and cpus directly on the service:
# OLD v2 syntax (still works but deprecated)
services:
app:
image: my_app
mem_limit: 512m
cpus: 1.5
# NEW v3+ syntax (use this)
services:
app:
image: my_app
deploy:
resources:
limits:
memory: 512M
cpus: "1.5"
Stick with deploy.resources. It’s the current standard and works across both standalone Compose and Swarm mode.
Cgroups v1 vs v2: The Plumbing Underneath
Docker resource limits aren’t magic — they’re implemented through Linux cgroups (control groups). And there are two versions, because of course there are.
Cgroups v1 (Legacy)
Cgroups v1 has been around since 2008. It uses separate hierarchies for each resource type:
/sys/fs/cgroup/memory/docker/<container_id>/memory.limit_in_bytes
/sys/fs/cgroup/cpu/docker/<container_id>/cpu.cfs_quota_us
/sys/fs/cgroup/pids/docker/<container_id>/pids.max
Each resource controller (memory, CPU, PIDs) is its own directory tree. It works, but it’s messy and has some weird edge cases around nested groups and resource accounting.
Cgroups v2 (Unified)
Cgroups v2, available since Linux kernel 4.5 and default in most modern distros (Ubuntu 22.04+, Fedora 31+, Debian 11+), uses a unified hierarchy:
/sys/fs/cgroup/system.slice/docker-<container_id>.scope/memory.max
/sys/fs/cgroup/system.slice/docker-<container_id>.scope/cpu.max
/sys/fs/cgroup/system.slice/docker-<container_id>.scope/pids.max
Everything lives under one tree. It’s cleaner, more consistent, and supports features that v1 doesn’t, like unified memory+swap accounting and pressure stall information (PSI).
How to Check Which Version You’re Running
# Check cgroup version
stat -fc %T /sys/fs/cgroup/
# "cgroup2fs" = v2
# "tmpfs" = v1
# Or check Docker's info
docker info | grep -i cgroup
# Cgroup Driver: systemd
# Cgroup Version: 2
Does It Matter for You?
Mostly no. Docker abstracts the differences away. The --memory, --cpus, and other flags work the same regardless of cgroup version. But there are a few cases where it matters:
- Rootless Docker: Requires cgroups v2 for full resource limit support
- Memory+swap accounting: More accurate on v2
- PSI (Pressure Stall Information): Only on v2; lets you detect when a container is resource-starved before it hits limits
- Nested containers (Docker-in-Docker): Much cleaner on v2
If you’re running a modern distro, you’re probably already on v2 and don’t need to think about it.
Real-World Sizing: What to Actually Set
Theory is great, but what numbers should you actually use? Here are battle-tested starting points for common services. These assume you’re running on a VPS or dedicated server — not a Raspberry Pi and not a 256 GB monster.
PostgreSQL
deploy:
resources:
limits:
cpus: "2.0"
memory: 2G
reservations:
cpus: "0.5"
memory: 1G
PostgreSQL’s shared_buffers should be about 25% of the container’s memory limit. For a 2 GB container, set shared_buffers = 512MB. Also set effective_cache_size to about 75% of the limit (1536 MB). Disable swap — Postgres on swap is a disaster.
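Those percentages are easy to derive from the limit rather than hard-coding them. A sketch, assuming a 2 GB (2048 MB) container limit:

```shell
# Derive the Postgres settings from the container's memory limit:
# 25% for shared_buffers, 75% for effective_cache_size.
limit_mb=2048
echo "shared_buffers = $(( limit_mb * 25 / 100 ))MB"
echo "effective_cache_size = $(( limit_mb * 75 / 100 ))MB"
# -> shared_buffers = 512MB
# -> effective_cache_size = 1536MB
```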
Redis
deploy:
resources:
limits:
cpus: "1.0"
memory: 256M
reservations:
cpus: "0.25"
memory: 128M
Redis keeps everything in memory, so the limit should be your maximum dataset size plus ~30% overhead for fragmentation and copy-on-write during persistence. Set maxmemory in Redis config to about 80% of the Docker memory limit to leave room for the process itself.
Node.js / Express API
deploy:
resources:
limits:
cpus: "1.0"
memory: 512M
reservations:
cpus: "0.25"
memory: 128M
Node.js’s V8 heap has historically defaulted to roughly 1.5 GB on 64-bit systems (newer Node versions size it based on available memory). Set --max-old-space-size in your Dockerfile’s CMD to about 75% of the Docker memory limit:
CMD ["node", "--max-old-space-size=384", "server.js"]
This gives Node room for the heap while leaving memory for buffers, native code, and child processes.
Nginx (Reverse Proxy)
deploy:
resources:
limits:
cpus: "0.5"
memory: 128M
reservations:
cpus: "0.1"
memory: 32M
Nginx is remarkably lightweight. Unless you’re doing heavy Lua processing or serving massive files from disk cache, 128 MB is generous. If you’re just proxying requests, 64 MB is often enough.
Elasticsearch
deploy:
resources:
limits:
cpus: "4.0"
memory: 4G
reservations:
cpus: "1.0"
memory: 2G
Elasticsearch is the container most likely to eat your entire server if left unchecked. Set the JVM heap to exactly 50% of the container limit (2 GB in this case) using ES_JAVA_OPTS=-Xms2g -Xmx2g. The other 50% is used by Lucene for file system cache. Going above 50% JVM heap actually hurts performance.
Grafana
deploy:
resources:
limits:
cpus: "1.0"
memory: 256M
reservations:
cpus: "0.1"
memory: 64M
Grafana is pretty lightweight unless you’re rendering dozens of dashboards simultaneously. 256 MB handles most setups comfortably.
A Complete Production-Ready Example
Let’s put it all together. Here’s a full Compose file for a typical web application stack with proper resource limits:
services:
traefik:
image: traefik:v3.0
ports:
- "80:80"
- "443:443"
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- ./traefik:/etc/traefik
deploy:
resources:
limits:
cpus: "0.5"
memory: 128M
pids: 100
reservations:
cpus: "0.1"
memory: 32M
restart: unless-stopped
api:
build: ./api
environment:
- DATABASE_URL=postgres://app:secret@postgres:5432/myapp
- REDIS_URL=redis://redis:6379
deploy:
resources:
limits:
cpus: "1.5"
memory: 1G
pids: 300
reservations:
cpus: "0.5"
memory: 256M
depends_on:
postgres:
condition: service_healthy
redis:
condition: service_started
restart: unless-stopped
postgres:
image: postgres:16
volumes:
- pgdata:/var/lib/postgresql/data
environment:
POSTGRES_USER: app
POSTGRES_PASSWORD: secret
POSTGRES_DB: myapp
deploy:
resources:
limits:
cpus: "2.0"
memory: 2G
pids: 200
reservations:
cpus: "0.5"
memory: 1G
healthcheck:
test: ["CMD-SHELL", "pg_isready -U app -d myapp"]
interval: 10s
timeout: 5s
retries: 5
restart: unless-stopped
redis:
image: redis:7-alpine
command: redis-server --maxmemory 200mb --maxmemory-policy allkeys-lru
volumes:
- redisdata:/data
deploy:
resources:
limits:
cpus: "1.0"
memory: 256M
pids: 100
reservations:
cpus: "0.25"
memory: 64M
restart: unless-stopped
worker:
build: ./worker
environment:
- DATABASE_URL=postgres://app:secret@postgres:5432/myapp
- REDIS_URL=redis://redis:6379
deploy:
resources:
limits:
cpus: "1.0"
memory: 512M
pids: 150
reservations:
cpus: "0.25"
memory: 128M
depends_on:
- postgres
- redis
restart: unless-stopped
volumes:
pgdata:
redisdata:
Total resource budget for this stack:
- CPU limits: 6.0 cores (works great on a 4-core box since not all containers peak simultaneously)
- Memory limits: 3.9 GB (comfortably fits on an 8 GB VPS with room for the host OS)
- Memory reservations: 1.5 GB (the soft floor Docker tries to hold for containers under memory pressure)
You can over-commit CPU limits because CPU is time-sliced — containers take turns. You generally should not over-commit memory limits because memory is not time-sliced — if everyone needs their limit simultaneously, the OOM killer shows up.
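It’s worth doing that memory addition explicitly before you deploy. The values below mirror the limits in the Compose file above; host_mb is an assumed 8 GB box:

```shell
# Sum the stack's memory limits (in MB) and compare against host RAM.
# Limits: traefik 128, api 1024, postgres 2048, redis 256, worker 512.
host_mb=8192
total=$(printf '%s\n' 128 1024 2048 256 512 | awk '{s += $1} END {print s}')
echo "memory limits total: ${total} MB of ${host_mb} MB host RAM"
# -> memory limits total: 3968 MB of 8192 MB host RAM
```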
Monitoring and Alerting
Setting limits is half the battle. You also need to know when containers are approaching them.
Quick CLI Monitoring
# Live stats for all containers
docker stats
# One-shot stats (great for scripts)
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.CPUPerc}}"
Prometheus + cAdvisor
For production monitoring, use cAdvisor to expose container metrics to Prometheus:
cadvisor:
image: gcr.io/cadvisor/cadvisor:latest
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
deploy:
resources:
limits:
cpus: "0.5"
memory: 256M
ports:
- "8080:8080"
Key metrics to alert on:
- container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.85 — container using 85%+ of memory limit
- container_cpu_usage_seconds_total exceeding quota consistently
- container_oom_events_total — any OOM kills should trigger an alert
The Docker Events Stream
Docker also publishes events when containers get OOM-killed:
# Watch for OOM events in real time
docker events --filter event=oom
Pipe this into your alerting system. If a container gets OOM-killed once, it’s probably a spike. If it gets OOM-killed repeatedly, you have a memory leak.
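The “once is a spike, repeatedly is a leak” rule is easy to encode. The sample names below stand in for the container names a real docker events stream would emit:

```shell
# Turn a stream of OOM events into alerts: a single kill is probably a
# spike; repeats suggest a memory leak worth paging someone about.
printf '%s\n' api api worker |
  sort | uniq -c |
  awk '$1 > 1 { printf "ALERT: %s OOM-killed %d times\n", $2, $1 }'
# -> ALERT: api OOM-killed 2 times
```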
Common Mistakes (And How to Avoid Them)
1. Setting limits too tight. If your app normally uses 400 MB, don’t set the limit to 410 MB. Give it headroom — 512 MB or even 768 MB. Memory usage spikes during garbage collection, connection bursts, and report generation.
2. Forgetting that child processes count. A Node.js app that spawns worker threads or a Python app with multiprocessing — all those child processes share the container’s memory limit. Size accordingly.
3. Ignoring swap. If you set a 512 MB memory limit but don’t configure swap, the container gets 512 MB of RAM plus whatever swap the host has. On a server with 16 GB of swap, that’s not really a limit.
4. Over-committing memory. It’s fine to over-commit CPU (the scheduler handles it gracefully). Over-committing memory is asking for OOM kills. Add up all your containers’ memory limits and make sure the total fits within your available RAM.
5. Not setting limits at all. The biggest mistake. Even loose limits (2 GB for an app that uses 200 MB) are infinitely better than no limits. They’re a safety net, not a straitjacket.
6. Copy-pasting limits without profiling. The sizing examples above are starting points, not gospel. Run your actual workload, watch docker stats for a week, and adjust based on real data.
The Golden Rule
Here it is, the one rule to take away from this entire article:
Every container in production should have a memory limit. No exceptions.
CPU limits are nice. PID limits are smart. Swap configuration is prudent. But the memory limit is the one that prevents your 3 AM wake-up call. Set it first. Set it always. Set it even if you’re not sure what value to use — a generous guess is better than no limit at all.
Your server (and your sleep schedule) will thank you.