Sysctl Tuning: The Linux Kernel Settings Nobody Told You About

By SumGuy 7 min read

Your Linux server is running with a kernel tuned like a cautious desktop. The default sysctl values ship tuned for general workloads: modest network buffers, conservative memory policies, and connection limits that made sense when 1 Gbps was fast.

Then you bolt a self-hosted Docker swarm, a Prometheus scraper, or a Nextcloud instance on top, and the kernel starts fighting you. Dropped packets. Connection timeouts. Memory thrashing. Your 2 AM self tweets “why is my Compose file so slow?” and the answer is: you didn’t tell the kernel what job you actually hired it for.

Here’s the thing: kernel tuning isn’t magic. It’s not a sacred art that only ops veterans understand. It’s reading a handful of sysctl parameters, understanding what each one does, and setting them to values that match how your server actually works.

What Is sysctl and How Do You Use It?

sysctl is the interface to Linux kernel parameters. They live in /proc/sys/ as a filesystem (which is wild, honestly), but you manage them via the sysctl command or by editing /etc/sysctl.d/ config files.

Applying and checking settings:

# Temporary (lost on reboot)
sudo sysctl -w net.core.somaxconn=4096
# Persistent: put settings in a file under /etc/sysctl.d/,
# e.g. /etc/sysctl.d/99-tuning.conf, then load that file:
sudo sysctl -p /etc/sysctl.d/99-tuning.conf
# Or reload every sysctl config file, in boot order:
sudo sysctl --system
# Check what's set now:
sysctl net.core.somaxconn

The /etc/sysctl.d/ approach is better for production: files are processed in lexical order at boot (so a 99- prefix is applied late and wins conflicts), and the file documents itself.
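
Since every parameter is also a file under /proc/sys/, the mapping from key to path is mechanical. A minimal sketch, assuming a typical Linux host; sysctl_path is a hypothetical helper, not part of any package:

```shell
# Convert a sysctl key to its /proc/sys path by swapping dots for slashes.
# sysctl_path is a hypothetical helper name used only for this example.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Read one value both ways; the two numbers should match.
key=net.core.somaxconn
path=$(sysctl_path "$key")
if [ -r "$path" ]; then
    echo "via /proc:  $(cat "$path")"
    echo "via sysctl: $(sysctl -n "$key" 2>/dev/null)"
fi
```

The simple dot-to-slash swap breaks for keys that embed an interface name containing a dot (VLAN interfaces such as eth0.100), so treat it as an illustration rather than a general parser.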

Network Performance: Buffers and Congestion Control

Many performance problems on self-hosted infrastructure are network problems, and the kernel’s default TCP buffer ceilings are small for fast, long-distance links.

TCP read/write memory buffers:

/etc/sysctl.d/99-tuning.conf
# net.ipv4.tcp_rmem = min default max (bytes)
# Typical default on recent kernels: 4096 131072 6291456 (older kernels use 87380 in the middle)
# Higher ceiling for servers with gigabit links:
net.ipv4.tcp_rmem = 4096 87380 134217728
# net.ipv4.tcp_wmem = min default max (bytes)
net.ipv4.tcp_wmem = 4096 65536 134217728

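What this does: lets each TCP connection’s buffer grow to as much as 128 MB of kernel memory. That matters when bandwidth multiplied by round-trip time exceeds the default 6 MB ceiling, for example a 1 Gbps link pulling data from S3 or a remote database over a long path. On a low-latency LAN the defaults are usually enough to fill the link. With the higher ceiling, your Prometheus scraper is less likely to time out waiting for metrics from a distant target.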
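
To judge whether the 128 MB ceiling is relevant for a given path, it helps to compute the bandwidth-delay product: the number of bytes that must be in flight to keep the link full. A rough sketch; bdp_bytes is a hypothetical helper and the link speeds and RTTs are made-up examples:

```shell
# Bandwidth-delay product in bytes.
# bdp_bytes is a hypothetical helper. Args: link speed in Mbit/s, RTT in ms.
bdp_bytes() {
    # (Mbit/s * 1,000,000 / 8) bytes per second * (ms / 1000) seconds
    # simplifies to Mbit/s * ms * 125
    echo $(( $1 * $2 * 125 ))
}

bdp_bytes 1000 1      # gigabit LAN, 1 ms RTT        -> 125000
bdp_bytes 1000 50     # gigabit path, 50 ms RTT      -> 6250000
bdp_bytes 10000 100   # 10 Gbit/s path, 100 ms RTT   -> 125000000
```

Only the longer paths come near or past the default 6 MB maximum, which is why buffer tuning pays off mostly on high-latency links.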

BBR congestion control replaces the default loss-based algorithm (CUBIC on most distros) with a model-based one:

/etc/sysctl.d/99-tuning.conf
net.ipv4.tcp_congestion_control = bbr

BBR measures bandwidth and RTT, not just packet loss. Google developed it and runs it in production. It has been in the mainline kernel since 4.9, though on many distros it ships as the tcp_bbr module rather than being built in.
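
Before setting the sysctl, it is worth checking whether the running kernel currently offers BBR. A small sketch, assuming the standard /proc/sys layout; has_cc is a hypothetical helper:

```shell
# has_cc is a hypothetical helper: succeeds if algorithm $1 appears
# in the space-separated list $2.
has_cc() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# The kernel lists the currently usable algorithms in this file.
avail_file=/proc/sys/net/ipv4/tcp_available_congestion_control
if [ -r "$avail_file" ]; then
    if has_cc bbr "$(cat "$avail_file")"; then
        echo "bbr is available"
    else
        echo "bbr not listed; try: sudo modprobe tcp_bbr"
    fi
fi
```

The list only includes algorithms that are built in or already loaded, so a missing entry often just means the module has not been loaded yet.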

TCP Fast Open:

/etc/sysctl.d/99-tuning.conf
net.ipv4.tcp_fastopen = 3

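Lets clients send data in the SYN packet, which saves a round trip on repeat connections. The value 3 enables it for both outgoing and incoming connections, but applications still have to opt in. Good for high-connection-rate services (Prometheus, API gateways).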

Connection Handling: Backlog and TIME_WAIT Reuse

A Docker host running 50 services trying to connect to each other hits the kernel’s listen queue limits fast.

Listen queue depth:

/etc/sysctl.d/99-tuning.conf
# Cap on the accept queue: completed connections waiting for accept(), per listening socket
net.core.somaxconn = 4096
# Cap on the SYN queue: half-open connections still in the handshake
net.ipv4.tcp_max_syn_backlog = 4096

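If clients see connection timeouts or slow connects while the port is listening, the accept queue may be full: by default Linux silently drops the handshake instead of refusing it. Bump these, and check that the service itself asks for a large backlog in listen(), because somaxconn is only a ceiling.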

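TIME_WAIT reuse: after a connection closes, the side that closed first holds the socket in TIME_WAIT for 60 seconds (RFC compliance). On a high-churn setup (load balancer, API gateway), you can exhaust ephemeral ports:

/etc/sysctl.d/99-tuning.conf
# Allow reuse of TIME_WAIT sockets for new outgoing connections
net.ipv4.tcp_tw_reuse = 1

This is safe in modern kernels and saves you from port exhaustion on high-connection setups. It only affects outgoing connections and relies on TCP timestamps (net.ipv4.tcp_timestamps = 1, the default). Don’t confuse it with tcp_tw_recycle, which broke clients behind NAT and was removed in kernel 4.12.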
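
To see how close a host is to port exhaustion, compare the number of sockets in TIME_WAIT with the size of the ephemeral port range. A sketch, assuming iproute2's ss is installed; port_range_size is a hypothetical helper:

```shell
# port_range_size is a hypothetical helper: number of ports between
# low ($1) and high ($2), inclusive.
port_range_size() {
    echo $(( $2 - $1 + 1 ))
}

range_file=/proc/sys/net/ipv4/ip_local_port_range
if [ -r "$range_file" ] && command -v ss >/dev/null 2>&1; then
    # The file holds two numbers: lowest and highest ephemeral port.
    read -r low high < "$range_file"
    echo "ephemeral ports: $(port_range_size "$low" "$high")"
    # -H drops the header so wc counts sockets only.
    echo "TIME_WAIT now:   $(ss -Htan state time-wait | wc -l)"
fi
```

Ports are consumed per destination address and port, so exhaustion usually shows up when one host opens many short connections to the same backend.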

Memory Management: Swappiness, Dirty Buffers, and Overcommit

Don’t blindly set vm.swappiness = 10 because you read it on a forum. The default is 60, a middle setting that lets the kernel swap out idle application memory to keep file caches warm. That’s often right.

However, if you’re running a Nextcloud + database on a single machine with 8 GB RAM, swap is your last resort:

/etc/sysctl.d/99-tuning.conf
# Lower = more reluctant to swap application memory; 100 = swap and page cache weighted equally
vm.swappiness = 10
# % of memory that may be dirty before writing processes are blocked and must flush (default 20)
vm.dirty_ratio = 15
# % at which the background flusher threads start writing (default 10)
vm.dirty_background_ratio = 5

Lower dirty_ratio = more frequent disk writes = less chance of a huge flush stall. On NVMe, aggressive is fine. On spinning disk, be gentler.
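
Because these are percentages, the absolute amount of unwritten data they allow grows with RAM. A rough sketch of the arithmetic; dirty_limit_mb is a hypothetical helper, and it applies the percentage to total RAM, whereas the kernel applies it to free plus reclaimable memory, so real thresholds are somewhat lower:

```shell
# dirty_limit_mb is a hypothetical helper that approximates a dirty-page
# threshold. Args: memory in MiB, ratio in percent.
dirty_limit_mb() {
    echo $(( $1 * $2 / 100 ))
}

dirty_limit_mb 8192 15    # 8 GiB box, dirty_ratio = 15             -> 1228
dirty_limit_mb 8192 5     # 8 GiB box, dirty_background_ratio = 5   -> 409
dirty_limit_mb 65536 20   # 64 GiB box at the default dirty_ratio   -> 13107
```

On the 64 GiB example, roughly 13 GB of dirty pages could pile up before writers are throttled, which is the kind of flush stall the lower ratios are meant to avoid.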

Overcommit:

/etc/sysctl.d/99-tuning.conf
# 0 = heuristic overcommit (kernel default)
# 1 = always allow overcommit
# 2 = strict accounting (commit limit = swap + overcommit_ratio% of RAM)
vm.overcommit_memory = 1

Docker loves overcommit. Apps request memory they’ll never use, and the kernel trusts them. Set it to 2 only if you need strict memory guarantees and can handle allocations failing outright instead of the OOM killer stepping in later.
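
To see how much memory applications have been promised, /proc/meminfo exposes Committed_AS (memory promised so far) and CommitLimit (the ceiling that applies under strict accounting). A sketch that reports one as a percentage of the other; commit_usage is a hypothetical helper:

```shell
# commit_usage is a hypothetical helper. It reads /proc/meminfo-style text
# on stdin and prints Committed_AS as a whole percentage of CommitLimit.
commit_usage() {
    awk '/^CommitLimit:/  { limit = $2 }
         /^Committed_AS:/ { used = $2 }
         END { if (limit > 0) printf "%d\n", used * 100 / limit }'
}

if [ -r /proc/meminfo ]; then
    echo "committed: $(commit_usage < /proc/meminfo)% of CommitLimit"
fi
```

The limit is only enforced in mode 2, so a figure above 100 is normal with overcommit enabled. It does show how many allocations would start failing if you switched to strict accounting.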

File Descriptors and Inotify Watches

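You have 50 Docker containers, each with a service spamming logs, Prometheus scraping, and file watchers. Typical defaults are 1024 open files per process (a ulimit, raised with ulimit -n or LimitNOFILE= in a systemd unit, not with sysctl) and, on older kernels, 8192 inotify watches per user.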

/etc/sysctl.d/99-tuning.conf
# System-wide file descriptor limit
fs.file-max = 2097152
# Inotify watches per user (crucial for Docker + systemd + Prometheus)
fs.inotify.max_user_watches = 524288
fs.inotify.max_queued_events = 32768

If your app logs complain about “too many open files in system”, or about “no space left on device” while the disk has room (the inotify watch limit in disguise), these are the culprits.

Security-Relevant Sysctls

Not just performance — some sysctls are security decisions:

/etc/sysctl.d/99-tuning.conf
# Docker needs this for container networks (the daemon usually enables it itself)
net.ipv4.ip_forward = 1
# Don't send ICMP redirects (only routers need to; old attack vector)
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
# SYN flood protection (SYN cookies)
net.ipv4.tcp_syncookies = 1
# Ignore ICMP redirects (don't update routing table from untrusted sources)
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

A Production-Ready Config

Drop this in /etc/sysctl.d/99-performance.conf:

/etc/sysctl.d/99-performance.conf
# ============================================================================
# SumGuy's Sysctl Tuning — Production Self-Hosted Linux
# ============================================================================
# NETWORKING: Buffers, congestion control, TCP tuning
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
# Congestion control: BBR (needs kernel 4.9+ and the tcp_bbr module)
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_notsent_lowat = 16384
# Connection handling
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fastopen = 3
# MEMORY: Swap policy and dirty buffer tuning
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.overcommit_memory = 1
# FILE SYSTEM: Descriptors and watches (critical for Docker/Prometheus)
fs.file-max = 2097152
fs.inotify.max_user_watches = 524288
fs.inotify.max_queued_events = 32768
# SECURITY: IP forwarding, SYN cookies, ICMP hardening
net.ipv4.ip_forward = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

Then reload:

sudo sysctl -p /etc/sysctl.d/99-performance.conf

Verify:

$ sysctl net.core.somaxconn
net.core.somaxconn = 4096

Testing Changes: Before and After

Don’t just set and pray. Measure.

For network throughput, use iperf3 (server and client on two machines):

# Server
iperf3 -s
# Client (before tuning)
iperf3 -c 192.168.1.100 -t 30 -P 4

Record the bandwidth. Apply the tuning. Run again. Expect the biggest gains on high-latency or lossy paths; on a quiet gigabit LAN the numbers may barely move, and that is a valid result too.

For connection handling, watch active connections and the listen queue:

# Snapshot count of connected TCP/UDP sockets (-H drops the header line)
ss -Htupn | wc -l
# Listening sockets: Recv-Q = connections waiting in the accept queue, Send-Q = queue limit
ss -ltpn
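
For listening sockets, ss reports the current accept queue length in the Recv-Q column and the queue limit in Send-Q, so a full queue can be spotted mechanically. A sketch that filters for it; full_listen_queues is a hypothetical helper:

```shell
# full_listen_queues is a hypothetical helper. It reads `ss -ltn` output on
# stdin and prints the local address of every listening socket whose accept
# queue (Recv-Q, column 2) has reached its limit (Send-Q, column 3).
full_listen_queues() {
    awk '$1 == "LISTEN" && $3 > 0 && $2 >= $3 { print $4 }'
}

if command -v ss >/dev/null 2>&1; then
    ss -ltn | full_listen_queues
fi
```

No output means no queue was full at that instant. Run it during peak load, since a queue can fill and drain within milliseconds.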

For file descriptors, check before hitting a limit:

# System-wide file handle count
cat /proc/sys/fs/file-nr
# Returns: <allocated> <allocated but unused> <max>
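
The three numbers are easier to act on as a percentage. A sketch that turns them into one; fd_usage_pct is a hypothetical helper:

```shell
# fd_usage_pct is a hypothetical helper. It reads the three file-nr fields
# on stdin and prints the file handles in use as a whole percentage of max.
fd_usage_pct() {
    read -r allocated unused max
    echo $(( (allocated - unused) * 100 / max ))
}

if [ -r /proc/sys/fs/file-nr ]; then
    echo "file handles in use: $(fd_usage_pct < /proc/sys/fs/file-nr)%"
fi
```

This covers only the system-wide limit. A single process can still hit its own ulimit long before this figure gets anywhere near 100.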

The Skeptical Question

“Won’t bad values break my system?” Unlikely. The Linux kernel is defensive, and bad sysctl values typically just fail to help. The worst you’ll usually get is wasted memory, the OOM killer being more aggressive, or connections being dropped. Try changes with sysctl -w first, since a reboot wipes them, and only persist what you’ve measured. Your 2 AM self will appreciate knowing why.

Start with the config above. It’s conservative enough for a 4-core self-hosted box, aggressive enough to stop fighting the kernel.

Then monitor. Measure. Adjust. That’s the art.
