Your app is running fine. Memory usage looks normal. Then, without warning, the process dies. No error message. No crash dump. Just gone. You check the logs and find nothing. But if you look at the kernel logs, you’ll see it: your process was killed by the OOM killer.
OOM stands for “Out of Memory.” When Linux exhausts its RAM and can’t reclaim or allocate any more, the kernel invokes the OOM killer, which terminates processes to free up memory. The question is: which process gets the axe? The answer is surprisingly sophisticated.
How the OOM Killer Works
When the system runs out of memory, the kernel runs a scoring algorithm on every process. It calculates an oom_score for each one based on:
- How much memory the process uses (higher score = more likely to be killed)
- The process’s oom_score_adj value (you can adjust this manually)
- Whether the process is essential (some get protected)
The process with the highest score gets killed. This frees up its memory. If the system is still out of memory, the next highest-scoring process gets killed, and so on.
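You can watch this ranking from userspace: every process exposes its current score in /proc. A minimal sketch (Linux only; the names and scores will of course vary on your system):

```shell
# List the top 5 OOM-kill candidates, highest oom_score first.
# Each process exposes its current score in /proc/<pid>/oom_score.
for pid in /proc/[0-9]*/; do
    pid=${pid#/proc/}; pid=${pid%/}
    score=$(cat "/proc/$pid/oom_score" 2>/dev/null) || continue
    comm=$(cat "/proc/$pid/comm" 2>/dev/null) || continue
    printf '%6s  %6s  %s\n' "$score" "$pid" "$comm"
done | sort -rn | head -n 5
```

The top line of this output is, roughly, the process the kernel would kill first if memory ran out right now.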
Reading the Evidence
When the OOM killer strikes, it logs to the kernel ring buffer. Check dmesg:
```
$ dmesg | tail -n 50
Out of memory: Kill process 1234 (myapp) score 800 or sacrifice child
Killed process 1234 (myapp) total-vm:2097152kB, anon-rss:1048576kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:2048kB oom_score_adj:0
```

There it is. Process 1234 (myapp) was killed because it had the highest oom_score. The kernel tells you:

- total-vm: Total virtual memory
- anon-rss: Anonymous (heap) memory actually in RAM
- oom_score_adj: The adjustment value (0 = default)
Understanding oom_score
Check a process’s current score:
```
$ cat /proc/$(pgrep myapp)/oom_score
800
```

The score is calculated roughly as:

```
oom_score = (total_memory_used / total_system_memory) * 1000 + oom_score_adj
```

(The kernel’s actual “badness” calculation counts resident memory plus swap and page-table pages, but this approximation is close enough to reason with.) A process using 800 MB on a 1 GB system gets a score around 800. If it’s the largest process, it’s the first candidate for death.
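You can sanity-check the formula yourself. This sketch estimates the current shell’s score from its resident set size and its adjustment value; it won’t match the kernel’s number exactly, since the real calculation also counts swap and page-table pages:

```shell
# Estimate oom_score for the current shell ($$) using the simplified
# formula: (memory_used / total_memory) * 1000 + oom_score_adj.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
rss_kb=$(awk '/^VmRSS:/ {print $2}' /proc/$$/status)
adj=$(cat /proc/$$/oom_score_adj)
echo $(( rss_kb * 1000 / total_kb + adj ))
```

For an idle shell on a machine with gigabytes of RAM, expect a number at or near zero.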
Protecting Critical Processes
You can adjust oom_score_adj to protect important processes. Lower scores = less likely to be killed. You can even go negative.
For a Running Process
```
# Protect your database (make it much less likely to be killed)
$ sudo bash -c "echo -500 > /proc/$(pgrep postgres)/oom_score_adj"

# Verify it worked
$ cat /proc/$(pgrep postgres)/oom_score_adj
-500

# Check the new score
$ cat /proc/$(pgrep postgres)/oom_score
100   # Much lower than before
```

Negative values make the process much less likely to be killed. -1000 is special: it disables OOM killing for that process entirely.
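One useful asymmetry to know: a process may raise its own oom_score_adj without any privileges (volunteering to be killed first), but lowering it below its current value requires root (CAP_SYS_RESOURCE). A sketch that marks the current shell as expendable:

```shell
# Mark this shell as a preferred OOM victim. Raising the value needs no
# privileges; only lowering it is restricted.
echo 500 > /proc/$$/oom_score_adj
cat /proc/$$/oom_score_adj   # prints 500 (assuming it started at <= 500)
```

This is handy for batch jobs or caches that you would rather lose than your database.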
For a Systemd Service
Set it permanently in the service file:
```
[Service]
OOMScoreAdjust=-900
ExecStart=/usr/bin/postgres
```

Reload and restart:

```
sudo systemctl daemon-reload
sudo systemctl restart postgres
```

When the OOM Killer Is Triggered
The kernel runs the OOM killer when:
- No free memory left — Allocation fails
- No page cache to drop — Can’t reclaim memory from caches
- No swap space, or swap is full — Can’t push memory to disk
On modern systems with plenty of RAM, this is rare. But on constrained systems (small VMs, containers), it happens.
Preventing the OOM Killer
1. Add Swap Space
Swap lets the kernel push memory to disk, buying time:
```
# Create a 4GB swap file
$ sudo fallocate -l 4G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile

# Make it permanent
$ echo "/swapfile none swap sw 0 0" | sudo tee -a /etc/fstab
```

Swap is slower than RAM, but it prevents the OOM killer from firing unless you’re really out of memory.
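To confirm swap is active, read /proc/swaps (the same data swapon --show formats). A quick sketch that totals the configured swap:

```shell
# /proc/swaps lists each active swap area; a header-only file means no
# swap is configured. Column 3 is the size in kB.
cat /proc/swaps
awk 'NR > 1 { total += $3 } END { print "total swap:", total + 0, "kB" }' /proc/swaps
```

If the total is 0 kB after you expected swapon to succeed, the swap file never actually came online.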
2. Monitor Memory Usage
Don’t let memory creep up unexpectedly:
```
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       8.5Gi       1.2Gi       256Mi       5.2Gi       6.0Gi
Swap:          4.0Gi       0.0Gi       4.0Gi
```

available is what actually matters: it’s free memory plus page cache that can be reclaimed. If it’s consistently low, your workload is too big for your system.
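For unattended monitoring, the same available figure is exposed as MemAvailable in /proc/meminfo, which makes a trivial cron-able check. The 500 MB threshold here is an arbitrary example; tune it to your workload:

```shell
# Warn when available memory falls below a threshold (example: 500 MB).
threshold_kb=$(( 500 * 1024 ))
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
if [ "$avail_kb" -lt "$threshold_kb" ]; then
    echo "WARNING: only $(( avail_kb / 1024 )) MB available"
else
    echo "OK: $(( avail_kb / 1024 )) MB available"
fi
```

Run it from cron and alert on the WARNING line, and you’ll usually hear about memory pressure before the OOM killer does.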
3. Limit Container/Process Memory
If you’re running containers or processes with memory limits (cgroups), set them realistically:
```
# Limit a process to 2GB
$ systemd-run --scope -p MemoryMax=2G ./myapp
```

(MemoryMax is the cgroup v2 directive; older systemd versions used MemoryLimit.) With a limit, a runaway process is killed when it exceeds its own budget, instead of consuming unlimited memory and dragging the whole system into an OOM.
Real-World Scenario
You run a database on a 4 GB VM. The database caches data in RAM, gradually consuming memory. Other processes also use memory. Eventually, free memory drops to near zero.
The OOM killer runs. It sees:
- postgres: 3.5 GB (oom_score: 850)
- nginx: 0.2 GB (oom_score: 50)
- your app: 0.1 GB (oom_score: 25)
It kills postgres because it has the highest score. But postgres is critical! Your app crashes when it tries to connect.
Solution: Protect postgres:
```
$ sudo bash -c "echo -900 > /proc/$(pgrep postgres)/oom_score_adj"
```

Now postgres’s score drops to near zero, and nginx gets killed instead (it can be restarted). Critical services survive.
The Warning Signs
Before the OOM killer fires, you’ll usually see:
- System becomes unresponsive (all memory consumed, kernel swapping)
- Load average spikes
- Disk I/O goes crazy (kernel writing to swap)
- One by one, processes get killed
Check dmesg frequently on systems under memory pressure:
```
$ dmesg | grep -i "out of memory" | tail -n 5
```

Key Takeaway
The OOM killer is a last-resort survival mechanism. It keeps the system alive at the cost of killing processes. On a properly sized system, you should never see it. But if you’re running on constrained hardware (small VMs, shared hosting, containers), understanding OOM is essential.
Monitor memory, add swap, protect critical processes with negative oom_score_adj, and size your workloads realistically. That’s how you avoid mysterious 3 AM process deaths.