Sysctl Tuning: The Linux Kernel Settings Nobody Told You About

Your Linux server is running with a kernel tuned as if it were a desktop from 2004. The default sysctl values ship tuned for general workloads: modest network buffers, conservative memory policies, and connection limits that made sense when 1 Gbps was fast.

Then you bolt a self-hosted Docker swarm, a Prometheus scraper, or a Nextcloud instance on top, and the kernel starts fighting you. Dropped packets. Connection timeouts. Memory thrashing. Your 2 AM self tweets “why is my Compose file so slow?” and the answer is: you didn’t tell the kernel what job you actually hired it for.

Kernel tuning isn’t magic, and it isn’t a sacred art that only ops veterans understand. It’s reading a handful of sysctl parameters, understanding what each one does, and setting them to values that match how your server actually works.

What Is sysctl and How Do You Use It?

sysctl is the interface to Linux kernel parameters. They live in /proc/sys/ as a filesystem (which is wild, honestly), but you manage them via the sysctl command or by editing /etc/sysctl.d/ config files.
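Because the parameters are just files, a dotted key maps directly onto a path under /proc/sys/. A minimal sketch of that mapping follows; sysctl_path is a hypothetical helper name, and it assumes no component of the key contains a dot itself (VLAN interface names such as eth0.100 would break it).

```shell
# sysctl_path <dotted.key>: print the /proc/sys file behind a sysctl key.
# Hypothetical helper; assumes key components contain no literal dots.
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

sysctl_path net.core.somaxconn   # -> /proc/sys/net/core/somaxconn

# On a live system the value can then be read like any other file:
# cat "$(sysctl_path net.core.somaxconn)"
```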

Two ways to apply settings, plus one to check them:

# Temporary (lost on reboot)
sysctl -w net.core.somaxconn=4096

# Persistent approach: drop a file in /etc/sysctl.d/
# Edit /etc/sysctl.d/99-tuning.conf, then reload just that file:
sysctl -p /etc/sysctl.d/99-tuning.conf
# Or reload every sysctl config file in boot order:
sysctl --system

# Or just check what's set now:
sysctl net.core.somaxconn

The /etc/sysctl.d/ approach is better for production: files are processed in lexical order at boot, so a 99- prefix is applied last, and the file documents itself.

Network Performance: Buffers and Congestion Control

Many performance problems on self-hosted infrastructure are network problems. The kernel’s default TCP buffer ceilings are small for fast or high-latency links.

TCP read/write memory buffers:

# net.ipv4.tcp_rmem = min default max
# Typical default: 4096 87380 6291456 (4KB / 85KB / 6MB); varies by kernel version and RAM
# Higher ceiling for servers with fast or high-latency links:
net.ipv4.tcp_rmem = 4096 87380 134217728

# net.ipv4.tcp_wmem = min default max
net.ipv4.tcp_wmem = 4096 65536 134217728

What this does: raises the ceiling on per-connection TCP buffers to 128 MB. The kernel autotunes each socket between the min and max values, so idle connections stay small and only fast, high-latency transfers grow toward the limit. On a 1 Gbps link pulling data from a database or S3, this removes an artificial bottleneck, and your Prometheus scraper is less likely to time out waiting for metrics.
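How much buffer a connection can actually use is bounded by the bandwidth-delay product: link speed multiplied by round-trip time. The sketch below does that arithmetic in plain shell; bdp_bytes is a hypothetical helper, and the result is a lower bound because the kernel reserves part of each buffer for bookkeeping.

```shell
# bdp_bytes <bandwidth in Mbit/s> <round-trip time in ms>
# Prints the bandwidth-delay product in bytes (hypothetical helper).
bdp_bytes() {
  # Mbit/s to bytes per millisecond: x * 1,000,000 / 8 / 1000 = x * 125
  echo $(( $1 * 125 * $2 ))
}

bdp_bytes 1000 1     # 1 Gbps, 1 ms RTT (LAN)   -> 125000
bdp_bytes 1000 50    # 1 Gbps, 50 ms RTT (WAN)  -> 6250000
```

At 1 ms the default ceiling is already ample. At 50 ms the product is about the size of the typical 6 MB default maximum, which is where a higher ceiling starts to matter.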

BBR congestion control replaces the default loss-based algorithm (CUBIC on most distributions) with a model-based one:

net.ipv4.tcp_congestion_control = bbr

BBR models bottleneck bandwidth and RTT instead of reacting only to packet loss. Google developed it and runs it in production. Available since kernel 4.9, provided the tcp_bbr module is built in or loadable.
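Before setting the value, it is worth checking that the running kernel offers BBR at all. A small sketch, assuming the list comes from net.ipv4.tcp_available_congestion_control; has_cc is a hypothetical helper name.

```shell
# has_cc <algorithm> <space-separated list>
# Succeeds if the algorithm appears in the list (hypothetical helper).
has_cc() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# On a live system (not executed here):
# if has_cc bbr "$(sysctl -n net.ipv4.tcp_available_congestion_control)"; then
#   echo "bbr is available"
# else
#   echo "bbr missing; the tcp_bbr module may need loading (modprobe tcp_bbr)"
# fi
```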

TCP Fast Open:

net.ipv4.tcp_fastopen = 3

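The value 3 enables Fast Open for both outgoing (1) and incoming (2) connections. It lets a returning client send data in the SYN packet, which saves a round trip on repeat connections. Applications still have to opt in through socket options, so this only helps software that supports it. Good for high-connection-rate services (Prometheus, API gateways).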

Connection Handling: Backlog and TIME_WAIT Reuse

A Docker host running 50 services trying to connect to each other hits the kernel’s listen queue limits fast.

Listen queue depth:

# Max pending connections per socket
net.core.somaxconn = 4096
# Also bump this: the queue of half-open (SYN received) connections
net.ipv4.tcp_max_syn_backlog = 4096

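If clients see timeouts or stalled connects while the port is listening, the listen queue is probably full: by default Linux silently drops the handshake instead of refusing it. Bump these, and remember that somaxconn only caps the backlog an application passes to listen(); it does not raise a small value the application chose itself.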

TIME_WAIT reuse: after a connection closes, Linux holds the socket on the side that closed first in TIME_WAIT for 60 seconds (RFC compliance). On a high-churn setup (load balancer, API gateway), you can exhaust ephemeral ports:

# Allow reuse of TIME_WAIT sockets for new connections
net.ipv4.tcp_tw_reuse = 1

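This is safe in modern kernels and saves you from port exhaustion on high-connection setups. It only affects outgoing connections and relies on TCP timestamps (net.ipv4.tcp_timestamps = 1, the default). It is not the old tcp_tw_recycle setting, which broke clients behind NAT and was removed in kernel 4.12.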
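The port-exhaustion arithmetic is easy to check. The sketch below assumes the ephemeral range is read from net.ipv4.ip_local_port_range (commonly 32768 to 60999) and uses the 60-second TIME_WAIT mentioned above; both helper names are hypothetical.

```shell
# port_range_size "<low> <high>": number of ephemeral ports in the range.
port_range_size() {
  set -- $1            # split "low high" into $1 and $2
  echo $(( $2 - $1 + 1 ))
}

# sustained_rate "<low> <high>" <time_wait seconds>
# Rough ceiling on new outgoing connections per second to one destination
# address and port when every closed socket lingers in TIME_WAIT.
sustained_rate() {
  echo $(( $(port_range_size "$1") / $2 ))
}

sustained_rate "32768 60999" 60   # -> 470

# On a live system (not executed here):
# sustained_rate "$(sysctl -n net.ipv4.ip_local_port_range)" 60
# ss -Htan state time-wait | wc -l    # sockets currently in TIME_WAIT
```

Roughly 470 connections per second to a single backend is well within reach of a busy reverse proxy, which is why reuse matters there.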

Memory Management: Swappiness, Dirty Buffers, and Overcommit

Don’t blindly set vm.swappiness = 10 because you read it on a forum. The default is 60, which lets the kernel swap out idle application memory when that keeps more filesystem cache in RAM. That’s often right.

However, if you’re running a Nextcloud + database on a single machine with 8 GB RAM, swap is your last resort:

# Lower = avoid swapping application memory, higher = swap more readily
vm.swappiness = 10

# How much dirty data before writes block synchronously (default 20%)
vm.dirty_ratio = 15

# When background flusher threads start writing (default 10%)
vm.dirty_background_ratio = 5

Lower ratios mean smaller, more frequent writeback and less chance of one huge flush stalling your application. On NVMe, low values cost little. On spinning disks, frequent small flushes compete with reads, so lower them more cautiously.
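To get a feel for what the percentages mean in bytes, the sketch below converts a ratio into megabytes. It is an approximation: the kernel applies the ratio to available memory, not total RAM, so real thresholds are somewhat lower. dirty_limit_mb is a hypothetical helper.

```shell
# dirty_limit_mb <memory in MB> <ratio in percent>
# Approximate amount of dirty data allowed before the threshold trips.
dirty_limit_mb() {
  echo $(( $1 * $2 / 100 ))
}

dirty_limit_mb 8192 15   # dirty_ratio on an 8 GB box            -> 1228
dirty_limit_mb 8192 5    # dirty_background_ratio on the same box -> 409
```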

Overcommit:

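# 0 = heuristic overcommit (default): refuse only obviously excessive requests
# 1 = always overcommit: never refuse an allocation
# 2 = strict accounting: refuse allocations beyond the commit limit
vm.overcommit_memory = 1

Docker loves overcommit. Apps request memory they’ll never use, and the kernel trusts them. Set to 2 only if you need strict memory guarantees and your applications cope with failed allocations. With 1, the OOM killer is the only backstop when memory really runs out.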
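If you are weighing strict accounting, /proc/meminfo shows how much the system has already promised. A sketch that reads meminfo-style text on stdin and prints CommitLimit minus Committed_AS in kB; commit_headroom_kb is a hypothetical helper, and the limit is only enforced when vm.overcommit_memory is 2.

```shell
# commit_headroom_kb: read /proc/meminfo-style text on stdin and print
# CommitLimit minus Committed_AS (both in kB). A negative number means
# more memory has been promised than strict accounting would allow.
commit_headroom_kb() {
  awk '$1 == "CommitLimit:"  { limit = $2 }
       $1 == "Committed_AS:" { used  = $2 }
       END { print limit - used }'
}

# On a live system (not executed here):
# commit_headroom_kb < /proc/meminfo
```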

File Descriptors and Inotify Watches

You have 50 Docker containers, each with a service spamming logs, Prometheus scraping, and file watchers. The defaults you hit first are a soft limit of 1024 open files per process (a ulimit, not a sysctl) and, on older kernels, 8192 inotify watches per user.

# System-wide file descriptor limit
fs.file-max = 2097152

# Inotify watches per user (crucial for Docker + systemd + Prometheus)
fs.inotify.max_user_watches = 524288
fs.inotify.max_queued_events = 32768

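If your app logs complain about “no space left on device” while the disk has room, or mention the inotify watch limit, the inotify settings are the culprit. “Too many open files” points at the descriptor limits instead.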
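To see how close you are to the watch limit, you can count watches from the fdinfo files under /proc, where each watch appears as a line starting with "inotify". This is a rough sketch: count_inotify_watches is a hypothetical helper, it needs root to see other users' processes, and it counts across all users while the limit is per user.

```shell
# count_inotify_watches [proc root]
# Counts "inotify wd:..." lines in every process's fdinfo files.
# The optional argument exists only so the function can be pointed at a
# test directory; it defaults to /proc.
count_inotify_watches() {
  root=${1:-/proc}
  cat "$root"/[0-9]*/fdinfo/* 2>/dev/null | grep -c '^inotify'
}

# On a live system (not executed here):
# sudo sh -c '. ./this-file.sh; count_inotify_watches'
# sysctl fs.inotify.max_user_watches
```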

Security-Relevant Sysctls

Not everything here is about performance. Some sysctls are security decisions:

# Docker needs this for container networks
net.ipv4.ip_forward = 1

# Don't send ICMP redirects (you're a server, not a general-purpose router)
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0

# SYN flood protection (SYN cookies)
net.ipv4.tcp_syncookies = 1

# Ignore ICMP redirects (don't update routing table from untrusted sources)
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

A Production-Ready Config

Drop this in /etc/sysctl.d/99-performance.conf:

# ============================================================================
# SumGuy's Sysctl Tuning — Production Self-Hosted Linux
# ============================================================================

# NETWORKING: Buffers, congestion control, TCP tuning
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728

# Congestion control: BBR (available since kernel 4.9)
net.ipv4.tcp_congestion_control = bbr
# Cap unsent data queued per socket to keep latency down
net.ipv4.tcp_notsent_lowat = 16384

# Connection handling
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fastopen = 3

# MEMORY: Swap policy and dirty buffer tuning
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.overcommit_memory = 1

# FILE SYSTEM: Descriptors and watches (critical for Docker/Prometheus)
fs.file-max = 2097152
fs.inotify.max_user_watches = 524288
fs.inotify.max_queued_events = 32768

# SECURITY: IP forwarding, SYN cookies, ICMP hardening
net.ipv4.ip_forward = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

Then reload:

sudo sysctl -p /etc/sysctl.d/99-performance.conf

Verify:

$ sysctl net.core.somaxconn
net.core.somaxconn = 4096
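Checking one key proves little, since a typo elsewhere in the file goes unnoticed. The sketch below compares every key in a config file against the live value. Both function names are hypothetical, and the parser assumes simple "key = value" lines with whole-line comments.

```shell
# parse_sysctl_conf <file>
# Prints "key value" for each setting, skipping blank and comment lines
# and collapsing runs of whitespace inside the value.
parse_sysctl_conf() {
  grep -Ev '^[[:space:]]*([#;]|$)' "$1" |
    awk -F= '{
      k = $1; v = $2
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      gsub(/[ \t]+/, " ", v)
      print k, v
    }'
}

# check_sysctl_conf <file>
# Prints a line for every key whose live value differs from the file.
check_sysctl_conf() {
  parse_sysctl_conf "$1" | while read -r key want; do
    have=$(sysctl -n "$key" 2>/dev/null | tr -s '[:space:]' ' ' | sed 's/ $//')
    [ "$have" = "$want" ] || echo "MISMATCH $key: want '$want', have '$have'"
  done
}

# On a live system (not executed here):
# check_sysctl_conf /etc/sysctl.d/99-performance.conf
```

No output means every key matched. A mismatch on net.ipv4.tcp_congestion_control usually means BBR was not available when the file was applied.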

Testing Changes: Before and After

Don’t just set and pray. Measure.

For network throughput, use iperf3 (server and client on two machines):

# Server
iperf3 -s

# Client (before tuning)
iperf3 -c 192.168.1.100 -t 30 -P 4

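Record the bandwidth. Apply the tuning. Run again. You may see a 10-40% improvement on gigabit links, mostly where round-trip latency is high enough for buffer size to limit throughput. On a low-latency LAN the defaults often saturate the link already.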

For connection handling, watch active connections and the listen queue:

# Count current TCP/UDP sockets (the total includes the header line)
ss -tupn | wc -l

# Listening sockets: Recv-Q is the current accept queue, Send-Q is its limit
ss -ltpn | grep LISTEN
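Reading those two columns by eye gets tedious with many services. A small filter, sketched below, prints only the listening sockets whose accept queue has reached its limit. full_listen_queues is a hypothetical helper, and it assumes the usual ss -ltn column order (State, Recv-Q, Send-Q, Local Address:Port).

```shell
# full_listen_queues: read `ss -ltn` output on stdin and print sockets
# whose accept queue (Recv-Q) has reached the backlog limit (Send-Q).
full_listen_queues() {
  awk '$1 == "LISTEN" && $3 + 0 > 0 && $2 + 0 >= $3 + 0 {
         print $4, "queue=" $2, "limit=" $3
       }'
}

# On a live system (not executed here):
# ss -ltn | full_listen_queues
```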

For file descriptors, check before hitting a limit:

# System-wide open file count
cat /proc/sys/fs/file-nr
# Returns: <allocated> <allocated but unused> <max>
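Turning those three numbers into a percentage makes them easier to alert on. A minimal sketch; fd_usage_pct is a hypothetical helper that takes the file-nr line as a single argument.

```shell
# fd_usage_pct "<allocated> <unused> <max>"
# Prints allocated file handles as a whole-number percentage of the maximum.
fd_usage_pct() {
  set -- $1            # split the three fields into $1, $2, $3
  echo $(( $1 * 100 / $3 ))
}

fd_usage_pct "1048576 0 2097152"   # -> 50

# On a live system (not executed here):
# fd_usage_pct "$(cat /proc/sys/fs/file-nr)"
```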

The Skeptical Question

“Won’t bad values break my system?” Unlikely, at least with the values in this article. Most sysctl changes that miss the mark simply don’t help. The usual worst cases are the OOM killer being more aggressive or connections being dropped, and anything set with sysctl -w disappears on reboot, so try values that way before persisting them. Your 2 AM self will appreciate knowing why.

Start with the config above: it’s conservative enough for a 4-core self-hosted box, aggressive enough to stop fighting the kernel.

Then monitor. Measure. Adjust. That’s the art.
