The 3am Incident You’ve Either Had or Will Have
It goes like this: you wake up to alerts. Or you don’t wake up, and you find out in the morning. Either way, the server is unresponsive, the OOM killer has been running amok through your process table, and one service — you know which one — has consumed every available megabyte of RAM and taken everything else with it.
Node.js is a frequent offender. So is Java with no heap limit set. Elasticsearch out of the box will attempt to consume everything it can touch. A misconfigured backup job reading an unexpectedly large dataset. A memory leak in a long-running service that nobody caught in development.
The frustrating part: Linux has had the tools to prevent this for decades. ulimit for per-process limits. Cgroups for hierarchical resource control. Systemd for integrating both into service management. These tools are present, documented, and largely ignored until something goes wrong.
ulimit: Per-Process Resource Limits
ulimit is a shell builtin that sets resource limits for the current shell and all processes it spawns. These limits are enforced by the kernel.
# See all current limits
ulimit -a
# Common limits:
ulimit -n # Open file descriptors
ulimit -u # Max user processes
ulimit -m # Max resident set size (in KB; not enforced by modern Linux kernels)
ulimit -v # Virtual memory (in KB)
ulimit -s # Stack size (in KB)
ulimit -t # CPU time (seconds)
Hard vs. Soft Limits
Every limit has two values:
- Soft limit: The current active limit. A process can raise this up to the hard limit.
- Hard limit: The ceiling. Only root can raise the hard limit.
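The relationship between the two values is easiest to see by changing them. The following is a minimal sketch, assuming bash on Linux and a hard nofile limit of at least 256; the function name demo_soft_hard is made up for illustration. It runs in a subshell so the calling shell's limits are left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show that the soft limit can be lowered and then
# raised again, but only as far as the hard limit.
demo_soft_hard() {
  (
    hard=$(ulimit -H -n)        # the ceiling for this shell
    ulimit -S -n 256            # lower the soft limit (assumes hard >= 256)
    echo "lowered soft: $(ulimit -S -n)"
    ulimit -S -n "$hard"        # raising back up to the hard limit needs no root
    if [ "$(ulimit -S -n)" = "$hard" ]; then
      echo "restored soft to hard: yes"
    else
      echo "restored soft to hard: no"
    fi
  )
}

demo_soft_hard
```

Because everything happens inside the parentheses, the limits of the shell you ran it from are unchanged afterwards.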
# See soft limits
ulimit -S -a
# See hard limits
ulimit -H -a
# Set soft limit for file descriptors
ulimit -Sn 65536
# Set both to same value
ulimit -n 65536
Why File Descriptors Matter So Much
The default open file descriptor limit (nofile) is often 1024 on older systems. For a busy web server or database, this is laughably small. Each network connection is a file descriptor. A server handling 5,000 concurrent connections needs at least 5,000 file descriptors, plus headroom for actual files.
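To judge how close a process is to that ceiling, you can compare its open descriptor count with its soft limit. This is a rough sketch, assuming Linux with /proc mounted and permission to read the target process's entries; fd_headroom is a hypothetical helper name, not a standard tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<open fds> <soft limit> <percent used>" for a PID.
fd_headroom() {
  local pid=$1
  local open soft
  # One directory entry per open descriptor.
  open=$(( $(ls "/proc/$pid/fd" | wc -l) ))
  # Fourth column of the "Max open files" row is the soft limit.
  # (On Linux the nofile limit is always numeric, never "unlimited".)
  soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
  echo "$open $soft $(( open * 100 / soft ))"
}

# Example: inspect the current shell.
fd_headroom $$
```

A percentage that creeps upward over days is a typical sign of a descriptor leak, and is worth alerting on well before it reaches 100.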
# Current limits for running process (PID 1234)
cat /proc/1234/limits
# See current fd usage (plain ls, so the count has no "total", "." or ".." lines)
ls /proc/1234/fd | wc -l
# System-wide: see open fd count
cat /proc/sys/fs/file-nr
# Returns: [currently-allocated] [allocated-but-unused, 0 on modern kernels] [max-allowed]
/etc/security/limits.conf: Persistent Limits
Shell ulimit commands last only for the current shell session and its children, and they don’t apply to services. For persistent per-user limits:
# Format: <domain> <type> <item> <value>
# Specific user - high file descriptor limit
myapp-user soft nofile 65536
myapp-user hard nofile 65536
# Specific user - memory limit (in KB, so 4GB)
myapp-user soft as 4194304
myapp-user hard as 4194304
# All users - max processes
* soft nproc 1024
* hard nproc 2048
# Group syntax: @group applies to every member of the group
@developers soft nofile 32768
Drop-in files in /etc/security/limits.d/ are also processed (and won’t get overwritten by package upgrades):
myapp-user soft nofile 65536
myapp-user hard nofile 65536
myapp-user soft nproc 512
myapp-user hard nproc 512
Important: limits.conf is processed by PAM’s pam_limits module, which means it applies to login sessions. It does not automatically apply to systemd services; those need separate configuration.
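Since every entry follows the same `<domain> <type> <item> <value>` layout, a quick format check can catch typos before you log out and back in to test. The function below is an illustrative sketch only: check_limits_line is a hypothetical name, the item list is a deliberately small subset, and pam_limits itself remains the authority on what is valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check one limits.conf-style line.
check_limits_line() {
  local domain type item value extra
  read -r domain type item value extra <<< "$1"

  # Blank lines and comments carry no limit.
  case $domain in ''|'#'*) echo "skip"; return 0 ;; esac

  if [ -z "$value" ] || [ -n "$extra" ]; then
    echo "invalid: expected exactly 4 fields"; return 1
  fi
  case $type in
    soft|hard|-) ;;
    *) echo "invalid: type must be soft, hard, or -"; return 1 ;;
  esac
  # Subset of items, chosen for this sketch; the real list is longer.
  case $item in
    nofile|nproc|as|core|memlock|stack|cpu) ;;
    *) echo "invalid: item not in this sketch's list: $item"; return 1 ;;
  esac
  case $value in
    unlimited|infinity|-1) ;;
    *[!0-9]*) echo "invalid: value must be a number or unlimited"; return 1 ;;
  esac
  echo "ok"
}

check_limits_line 'myapp-user soft nofile 65536'
check_limits_line '@developers soft nofile lots'
```

The first call prints ok; the second is rejected because its value is not numeric.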
systemd Service Limits
For services managed by systemd, use directives in the unit file:
[Service]
ExecStart=/usr/bin/myapp
User=myapp-user
# File descriptor limit
LimitNOFILE=65536
# Max processes
LimitNPROC=512
# Core dump size (0 = no core dumps)
LimitCORE=0
# Max memory locked into RAM
LimitMEMLOCK=512M
Available Limit* directives mirror ulimit options. After editing:
systemctl daemon-reload
systemctl restart myapp
# Verify the limits
cat /proc/$(systemctl show -p MainPID myapp | cut -d= -f2)/limits
Cgroups v2: The Modern Approach
While ulimit limits individual processes, cgroups (control groups) provide hierarchical resource management — you can assign groups of processes to cgroups and set collective limits on CPU, memory, I/O, and more.
Linux 4.5+ supports cgroups v2 (the unified hierarchy). Most modern distributions use it by default:
# Check if cgroups v2 is active
mount | grep cgroup2
# cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
# Or check that the running kernel supports cgroup2 (support alone does not mean it is mounted)
cat /proc/filesystems | grep cgroup
The Unified Hierarchy
In cgroups v1, different resources were in separate subsystem trees (memory controller, CPU controller, etc.) — a process could be in different groups for different resources. cgroups v2 uses a single unified hierarchy. All controllers apply to the same group.
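The difference shows up directly in /proc/PID/cgroup: under v2 a process has a single `0::/path` entry, while under v1 it has one line per controller hierarchy. The sketch below classifies a file in that format; cgroup_mode is a hypothetical helper name, and the "hybrid" label is used here for a system that has both kinds of entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report v1, v2, or hybrid from a /proc/<pid>/cgroup file.
cgroup_mode() {
  local file=${1:-/proc/self/cgroup}
  awk -F: '
    $1 == 0 && $2 == "" { v2++ }   # "0::/path" is the unified (v2) entry
    $2 != ""            { v1++ }   # a controller list means a v1 hierarchy
    END {
      if (v2 && !v1)     print "v2"
      else if (v2 && v1) print "hybrid"
      else if (v1)       print "v1"
      else               print "unknown"
    }' "$file"
}

# Example: classify the current shell.
cgroup_mode
```

On a v1 system the same process can appear under different paths on different lines, which is exactly the split the unified hierarchy removes.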
# The cgroup filesystem
ls /sys/fs/cgroup/
# Your shell's cgroup
cat /proc/self/cgroup
# 0::/user.slice/user-1000.slice/session-1.scope
CPU Limits
# Create a cgroup (as root; the cpu and memory controllers must be enabled in the parent's cgroup.subtree_control)
mkdir /sys/fs/cgroup/myapp
# Set CPU weight (1-10000, default 100)
echo 50 > /sys/fs/cgroup/myapp/cpu.weight
# Set CPU quota (in microseconds per period)
# 200000 us of CPU time per 1000000 us period = 20% of one CPU
echo "200000 1000000" > /sys/fs/cgroup/myapp/cpu.max
# Move a process into the cgroup
echo $PID > /sys/fs/cgroup/myapp/cgroup.procs
Memory Limits
# Hard memory limit: the OOM killer runs inside the cgroup if usage cannot be reclaimed below it
echo 512M > /sys/fs/cgroup/myapp/memory.max
# Throttle limit: above this, processes are slowed down and their memory is reclaimed aggressively
echo 256M > /sys/fs/cgroup/myapp/memory.high
# See memory usage
cat /sys/fs/cgroup/myapp/memory.current
# Memory events (OOM kills, etc.)
cat /sys/fs/cgroup/myapp/memory.events
systemd Slices, Scopes, and Service Resource Control
Directly manipulating cgroup files is low-level. Systemd abstracts this cleanly.
Systemd’s Cgroup Hierarchy
system.slice ← All system services
├── ssh.service
├── nginx.service
└── myapp.service
user.slice ← All user sessions
└── user-1000.slice
    └── session-1.scope
machine.slice ← VMs and containers
Setting Resource Limits in Service Units
[Service]
ExecStart=/usr/bin/myapp
User=myapp-user
# CPU
# Max 20% of one CPU
CPUQuota=20%
# Relative weight (default 100)
CPUWeight=50
# Memory
# Hard limit: OOM kill if exceeded
MemoryMax=512M
# Soft limit: throttle above this
MemoryHigh=384M
# No swap for this service
MemorySwapMax=0
# IO
# Relative IO weight
IOWeight=50
# 50MB/s read max
IOReadBandwidthMax=/dev/sda 50M
# 25MB/s write max
IOWriteBandwidthMax=/dev/sda 25M
# Tasks (processes + threads)
TasksMax=256
Creating Custom Slices
For grouping related services:
# webapp.slice
[Unit]
Description=Web Application Services
Before=slices.target
[Slice]
CPUQuota=60%
MemoryMax=2G
# myapp.service
[Unit]
Description=My App
After=network.target
[Service]
# Put this service in the webapp slice
Slice=webapp.slice
ExecStart=/usr/bin/myapp
Now all services in webapp.slice collectively can’t use more than 60% of one CPU or 2GB RAM.
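CPUQuota percentages are relative to a single CPU, so 60% on a four-core machine is 15% of total capacity, and values above 100% are allowed. Systemd turns the percentage into the cgroup's cpu.max pair. Here is a small sketch of that arithmetic, assuming systemd's default period of 100ms (it changes if CPUQuotaPeriodSec is set); quota_to_cpu_max is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a CPUQuota percentage into a "quota period"
# pair in microseconds, as it would appear in cpu.max.
quota_to_cpu_max() {
  local percent=$1
  local period_us=${2:-100000}   # assumed default period: 100ms
  echo "$(( percent * period_us / 100 )) $period_us"
}

quota_to_cpu_max 60     # prints: 60000 100000
quota_to_cpu_max 200    # prints: 200000 100000 (two full CPUs)
```

You can compare the result with what the kernel actually holds by reading /sys/fs/cgroup/webapp.slice/cpu.max once the slice is active.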
# View resource usage by slice
systemd-cgtop
# Detailed cgroup information
systemctl status webapp.slice
Docker and Cgroups
Docker uses cgroups under the hood for container resource limits. When you set memory or CPU limits in Docker, it creates cgroup entries:
# Run container with resource limits
docker run \
  --memory="512m" \
  --memory-swap="512m" \
  --cpus="0.5" \
  --cpu-shares=512 \
  nginx
# See the limits Docker recorded for the container
docker inspect CONTAINER_ID | jq ".[0].HostConfig | {Memory, CpuShares, NanoCpus}"
# Find the cgroup directly (this path assumes cgroups v2 with the systemd cgroup driver)
CONTAINER_FULL_ID=$(docker inspect CONTAINER_ID --format "{{.Id}}")
cat /sys/fs/cgroup/system.slice/docker-${CONTAINER_FULL_ID}.scope/memory.max
In docker-compose.yml:
services:
  webapp:
    image: myapp
    deploy:
      resources:
        limits:
          cpus: "0.50"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 256M
Note: with the legacy docker-compose v1 and version 3 compose files, deploy.resources only applied in Docker Swarm mode (or with the --compatibility flag). Docker Compose v2 applies these limits on a single host as well. On older setups, use mem_limit and cpus:
services:
  webapp:
    image: myapp
    mem_limit: 512m
    cpus: 0.5
Practical Example: Taming Node.js
[Unit]
Description=My Node.js Application
After=network.target
# Give up if the service starts more than 3 times within 60 seconds
StartLimitIntervalSec=60s
StartLimitBurst=3
[Service]
Type=simple
User=nodejs
WorkingDirectory=/opt/myapp
ExecStart=/usr/bin/node /opt/myapp/server.js
# Node.js specific: set heap limit before systemd would kick in
Environment="NODE_OPTIONS=--max-old-space-size=1024"
# systemd resource limits
# 1.5GB hard limit
MemoryMax=1536M
# Throttle at 1GB
MemoryHigh=1024M
CPUQuota=50%
TasksMax=256
# Restart on failure, but not if it fails too fast (see the start limits in [Unit])
Restart=on-failure
RestartSec=10s
# Security hardening (bonus)
NoNewPrivileges=true
ProtectSystem=strict
PrivateTmp=true
[Install]
WantedBy=multi-user.target
This ensures Node.js can’t eat more than 1.5GB of RAM. If it tries, it gets OOM-killed and restarted. The NODE_OPTIONS setting caps V8’s old-generation heap at 1GB, so Node.js garbage-collects harder and, failing that, exits with its own out-of-memory error before the kernel has to kill it, which is friendlier.
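To confirm whether the service has actually hit its limits, the counters in the cgroup's memory.events file are more reliable than grepping logs. The following sketch reads one counter by name; memory_event is a hypothetical helper, and the path in the usage comment assumes the unit is named myapp.service and lives in system.slice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one counter from a cgroup v2 memory.events file.
# Prints 0 when the key is not present.
memory_event() {
  local key=$1 file=$2
  awk -v k="$key" '
    $1 == k { print $2; found = 1 }
    END     { if (!found) print 0 }
  ' "$file"
}

# Usage (path is an assumption about where the unit's cgroup lives):
#   memory_event oom_kill /sys/fs/cgroup/system.slice/myapp.service/memory.events
#   memory_event high     /sys/fs/cgroup/system.slice/myapp.service/memory.events
```

A rising high counter means the service is being throttled at MemoryHigh; a non-zero oom_kill means processes in the cgroup have been killed by the OOM killer.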
Monitoring Resource Limits in Action
# Real-time cgroup resource usage, updating every 2 seconds
systemd-cgtop -d 2
# Specific service
systemctl status myapp.service
# Memory events for a cgroup (OOM kills, throttling)
journalctl -u myapp.service | grep -i "memory\|oom\|killed"
# Kernel OOM killer messages
dmesg | grep -i "oom\|killed process"
# Current memory usage
cat /sys/fs/cgroup/system.slice/myapp.service/memory.current
cat /sys/fs/cgroup/system.slice/myapp.service/memory.events
Resource limits are the infrastructure equivalent of putting a fence around something that’s caused trouble before. You don’t need to perfectly understand cgroups v2’s unified hierarchy on day one. Start with systemd’s MemoryMax and CPUQuota in your service units. That covers 80% of the problem with 20% of the complexity. Add ulimit configuration in limits.conf for non-systemd processes. Revisit slices when you want to group multiple services under collective limits. Save the raw cgroup manipulation for when you’re curious or when systemd doesn’t expose what you need.
Your 3am incidents don’t need to be dramatic. They can just quietly hit a limit, restart, and get logged. That’s better than taking the whole server down.