It was a Tuesday morning in February when an AWS engineer’s on-call alert fired. Their PostgreSQL cluster, running 1.2 million queries per day on a beefy EC2 instance, had just tanked. Throughput dropped 50%. Query latencies spiked. The database server itself looked fine—CPU wasn’t pinned, memory wasn’t exhausted, disk I/O was normal.
The commit log showed one change: they’d upgraded the host kernel to Linux 7.0 over the weekend.
Not a Postgres version bump. Not a query rewrite. The operating system changed beneath their database, and the database paid the price.
This isn’t some edge case. PostgreSQL’s entire architecture—multi-process model, heavy shared memory, constant context switching—lives and dies by Linux kernel tuning. Most Postgres performance problems aren’t Postgres problems at all. They’re OS problems wearing a database disguise.
Here’s what matters, and why your production database needs you to understand this stuff.
Why Postgres Is Sensitive to Kernel Behavior
PostgreSQL doesn’t use a thread-per-connection model like MySQL, or run embedded in your application like SQLite. Instead, each connection spawns its own operating-system process. That means your 200-connection cluster is 200 separate processes, all sharing a block of kernel-managed memory and fighting for scheduler time.
When the kernel scheduler changes (like the EEVDF algorithm shift in Linux 7.0), Postgres’s context-switching overhead explodes. When memory management gets too aggressive, pages get swapped to disk mid-query. When the I/O scheduler batches requests wrong, SSDs suddenly behave like spinning rust.
The database can’t fix this. Only your kernel can.
Huge Pages: The Biggest Win
If you’re only going to tune one thing, tune this.
Huge pages let Postgres allocate memory in 2MB chunks instead of 4KB pages. This shrinks the kernel page table and dramatically reduces TLB (translation lookaside buffer) misses. For a 16GB shared_buffers, that’s the difference between roughly 4.2 million page table entries and 8,192 huge page entries.
Real-world result: 15–30% throughput improvement on heavy workloads.
Calculate Your Huge Pages
```
# Check current shared_buffers (from postgresql.conf)
sudo -u postgres psql -c "SHOW shared_buffers;"
```
```
# Calculate pages needed: (shared_buffers in bytes) / (2MB = 2097152)
# Example: 16GB shared_buffers = 16384 MB
# 16384 MB / 2 MB = 8192 huge pages
```
Then set it:
```
vm.nr_hugepages = 8192
```
And in postgresql.conf:
```
huge_pages = try
```
Postgres will use huge pages if available and fall back gracefully if not.
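If your shared_buffers differs, the same arithmetic generalizes. A minimal sketch, assuming the x86-64 default 2MB huge page size (`hugepages_for` is a made-up helper name, not a standard tool):

```shell
# hugepages_for: convert a shared_buffers size in MB into a
# vm.nr_hugepages value, assuming 2MB huge pages (x86-64 default).
hugepages_for() {
    shared_buffers_mb=$1
    huge_page_mb=2
    echo $(( shared_buffers_mb / huge_page_mb ))
}

hugepages_for 16384   # 16GB shared_buffers -> 8192 huge pages
```

In practice you may want a few percent of headroom on top, since the shared memory segment also holds WAL buffers and lock tables.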
Shared Memory: The Foundation
Postgres stores its entire buffer pool, WAL buffers, and lock tables in shared memory. If your kernel won’t let it allocate enough, Postgres won’t even start.
```
# Check limits
sysctl kernel.shmmax kernel.shmall
```
```
# For 32GB total shared memory:
# shmmax should be at least 32GB in bytes
# shmall should be at least 32GB divided by the 4096-byte page size
```
Set these large:
```
kernel.shmmax = 34359738368   # 32GB in bytes
kernel.shmall = 8388608       # 32GB / 4KB pages
```
PostgreSQL computes its shared memory request from shared_buffers, max_connections, work_mem, and assorted overhead, but always give the kernel more than you think you’ll need. Postgres is conservative in its allocation; the kernel should be generous.
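The shmmax/shmall arithmetic above can be sketched the same way, assuming the standard 4KB base page size:

```shell
# Derive kernel.shmmax / kernel.shmall for a target shared-memory budget.
shm_gb=32                                   # total shared memory budget
page_bytes=4096                             # standard base page size
shmmax=$(( shm_gb * 1024 * 1024 * 1024 ))   # budget in bytes
shmall=$(( shmmax / page_bytes ))           # budget in base pages

echo "kernel.shmmax = $shmmax"
echo "kernel.shmall = $shmall"
```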
Kill Transparent Huge Pages (THP)
Transparent huge pages sound great. The kernel automatically promotes 4KB pages to 2MB pages without your asking. Free performance!
Except Postgres can’t predict when THP will kick in. When it does, especially during heavy sequential scans, the kernel pauses Postgres processes while it compacts memory. The result: latency spikes of 10–100ms, even on a quiet morning.
Restrict THP to explicit madvise requests, so it never fires automatically:
```
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
```
Make it stick:
```
# In /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="... transparent_hugepage=madvise"
```

```
sudo update-grub
sudo reboot
```

Memory Overcommit and Swappiness
PostgreSQL does not play well with the OOM killer. If the kernel starts swapping Postgres pages to disk mid-query, your database is now serving cold storage. Query time goes from 50ms to 5 seconds.
```
vm.overcommit_memory = 2    # No overcommit; OOM only if truly out of RAM
vm.overcommit_ratio = 100   # Use all available RAM
vm.swappiness = 5           # Don't swap unless desperate
```
On a dedicated database server, swap usage should be nearly zero. On a shared system, set swappiness to 5–10. Never go above 20 for Postgres.
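Under vm.overcommit_memory = 2 the kernel enforces CommitLimit = swap + RAM × overcommit_ratio / 100. A quick sketch of that arithmetic; the 32GB RAM and 2GB swap figures are illustrative assumptions, not recommendations:

```shell
# CommitLimit the kernel enforces under vm.overcommit_memory = 2:
#   CommitLimit = swap + ram * overcommit_ratio / 100
ram_gb=32               # assumed physical RAM
swap_gb=2               # assumed swap size
overcommit_ratio=100    # matches the setting above

commit_limit_gb=$(( swap_gb + ram_gb * overcommit_ratio / 100 ))
echo "CommitLimit: ${commit_limit_gb}GB"
```

With little or no swap, the commit limit is effectively your RAM, which is exactly the behavior you want on a dedicated database host.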
I/O Scheduler: SSDs Prefer None
The default I/O schedulers (bfq, mq-deadline, and the legacy cfq) were designed for spinning disks: they batch and reorder requests to minimize seeking. SSDs have no seek time, so the batching just adds latency.
For SSDs, use none (the null scheduler). For NVMe, mq-deadline is fine but none is better.
Check current:
```
cat /sys/block/nvme0n1/queue/scheduler
```
Set per-device:
```
echo none | sudo tee /sys/block/nvme0n1/queue/scheduler
```
Make it permanent with a udev rule:
```
ACTION=="add|change", KERNEL=="nvme0n1", ATTR{queue/scheduler}="none"
```

Network Tuning for Long Connections
PostgreSQL connections can sit idle for long stretches (think pooled web-app connections). The kernel needs to know not to drop them.
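The keepalive knobs below combine into a concrete detection time for a dead peer: the idle time before the first probe, plus the probe interval times the probe count. A sketch using the values this article sets (Postgres-side idle and interval, kernel-side probe count):

```shell
# Time until the kernel gives up on a silent, dead connection:
#   detection = idle + interval * probes
idle=600       # seconds before the first probe (tcp_keepalives_idle)
interval=30    # seconds between probes (tcp_keepalives_interval)
probes=5       # unanswered probes before the socket is closed

echo "Dead peer detected after $(( idle + interval * probes )) seconds"
```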
```
net.core.somaxconn = 2048
net.ipv4.tcp_max_syn_backlog = 2048

# TCP keepalive for idle connections
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 5
```
And in postgresql.conf:
```
tcp_keepalives_idle = 600
tcp_keepalives_interval = 30
```

The Linux 7.0 Scheduler Issue
The shift from CFS (Completely Fair Scheduler) to EEVDF (Earliest Eligible Virtual Deadline First) upset Postgres’s multi-process balance. CFS treated all processes equally; EEVDF prioritizes by time slices, causing uneven load distribution across parallel queries.
If you hit this in late 2026, apply the kernel workaround or pin to Linux 6.x LTS until your Postgres version adds EEVDF awareness (planned for 18.0).
Complete /etc/sysctl.d/99-postgres.conf
```
# SumGuy's PostgreSQL Kernel Tuning
# Apply with: sudo sysctl -p /etc/sysctl.d/99-postgres.conf

# Huge pages (calculate based on your shared_buffers)
vm.nr_hugepages = 8192

# Shared memory (for 32GB systems)
kernel.shmmax = 34359738368
kernel.shmall = 8388608

# Memory management
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
vm.swappiness = 5

# Network
net.core.somaxconn = 2048
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 5

# Filesystem
fs.file-max = 2097152
fs.aio-max-nr = 1048576
```
Apply it:
```
sudo sysctl -p /etc/sysctl.d/99-postgres.conf
```

What This Fixes
Properly tuned, you’ll see:
- 15–30% throughput gain from huge pages alone
- Flat tail latencies (no surprise spikes from swapping or THP)
- Stable query times even under sustained load
- No OOM kills of random Postgres processes
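A minimal spot-check sketch to confirm a few of these values actually took after a reboot; the `check` helper is a hypothetical name, and the expected values are the ones from this article:

```shell
# Compare a live sysctl value against the expected one; print OK or MISMATCH.
check() {
    key=$1; want=$2
    got=$(sysctl -n "$key" 2>/dev/null || echo "unreadable")
    if [ "$got" = "$want" ]; then
        echo "OK       $key = $got"
    else
        echo "MISMATCH $key = $got (want $want)"
    fi
}

check vm.swappiness 5
check vm.overcommit_memory 2
check vm.nr_hugepages 8192
```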
The AWS engineer who hit the Linux 7.0 wall? These settings brought them back to baseline. The kernel had changed, but the database could adapt.
Your 2 AM self will thank you for getting this right before something breaks.