Your Linux server is running with a kernel tuned for a desktop from 2004. Not metaphorically — literally. The default sysctl values ship tuned for general workloads: modest network buffers, conservative memory policies, and connection limits that made sense when 1 Gbps was fast.
Then you bolt a self-hosted Docker swarm, a Prometheus scraper, or a Nextcloud instance on top, and the kernel starts fighting you. Dropped packets. Connection timeouts. Memory thrashing. Your 2 AM self tweets “why is my Compose file so slow?” and the answer is: you didn’t tell the kernel what job you actually hired it for.
Here’s the thing: kernel tuning isn’t magic. It’s not a sacred art that only ops veterans understand. It’s reading a handful of sysctl parameters, understanding what each one does, and setting them to values that match how your server actually works.
What Is sysctl and How Do You Use It?
sysctl is the interface to Linux kernel parameters. They live in /proc/sys/ as a filesystem (which is wild, honestly), but you manage them via the sysctl command or by editing /etc/sysctl.d/ config files.
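Because every parameter is just a file, the dotted name maps straight onto a path. A minimal sketch of that mapping follows; `sysctl_to_path` is a made-up helper name for illustration, not something procps ships, and it assumes the key contains no dots of its own (some interface names do).

```shell
#!/usr/bin/env bash
# Map a dotted sysctl key to its file under /proc/sys.
# sysctl_to_path is a hypothetical helper, not part of any standard tool.
sysctl_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

path="$(sysctl_to_path net.core.somaxconn)"
echo "$path"

# Read the live value straight from the file, if this machine exposes it
if [ -r "$path" ]; then
    cat "$path"
fi
```

Reading the file and running `sysctl net.core.somaxconn` should give the same number; the command is mostly a friendlier front end.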
Three commands cover most of what you need:

# Temporary (lost on reboot)
sysctl -w net.core.somaxconn=4096

# Persistent: edit /etc/sysctl.d/99-tuning.conf, then reload it
sysctl -p /etc/sysctl.d/99-tuning.conf

# Or just check what's set now
sysctl net.core.somaxconn

The /etc/sysctl.d/ approach is better for production: files are processed in lexical order at boot, so a 99- prefix lets your settings override earlier ones, and the file documents itself.
Network Performance: Buffers and Congestion Control
Most performance problems on self-hosted infrastructure are network problems. The kernel’s default TCP buffers are tiny.
TCP read/write memory buffers:
# net.ipv4.tcp_rmem = min default max
# Typical default: 4096 87380 6291456 (4 KB / 85 KB / 6 MB); the exact values vary by kernel version
# Better for servers with gigabit links:
net.ipv4.tcp_rmem = 4096 87380 134217728

# net.ipv4.tcp_wmem = min default max
net.ipv4.tcp_wmem = 4096 65536 134217728

What this does: lets a single TCP connection buffer up to 128 MB of data in kernel memory. The maximum is a ceiling that autotuning grows toward, not memory reserved up front. On a 1 Gbps link pulling data from a database or S3, this prevents artificial bottlenecks. Your Prometheus scraper won’t time out waiting for metrics.
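To sanity-check a buffer ceiling, compare it against the bandwidth-delay product: the amount of data that can be in flight on a link at a given round-trip time. The sketch below is plain arithmetic; `bdp_bytes` is a hypothetical helper name and the RTT figures are illustrative assumptions, not measurements.

```shell
#!/usr/bin/env bash
# bdp_bytes <link_mbps> <rtt_ms>: bandwidth-delay product in bytes.
# bytes = (mbps * 1,000,000 / 8) * (rtt_ms / 1000) = mbps * rtt_ms * 125
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}

bdp_bytes 1000 1     # 1 Gbps on a LAN with ~1 ms RTT
bdp_bytes 1000 100   # 1 Gbps to a remote endpoint with ~100 ms RTT
```

At 100 ms the result is 12.5 MB, so the 128 MB maximum above leaves plenty of headroom for a gigabit link. If memory is tight, a smaller ceiling sized from your own RTTs is a reasonable alternative.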
BBR congestion control replaces the loss-based default (CUBIC on most distributions) with a model-based algorithm:

net.ipv4.tcp_congestion_control = bbr

BBR measures bandwidth and RTT, not just packet loss. It’s the reason Google’s infrastructure doesn’t melt. It has been in the mainline kernel since 4.9, so any modern distribution kernel supports it, though the tcp_bbr module may need to be loaded first.
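Before switching, it is worth checking that the running kernel actually offers bbr. The kernel lists the algorithms it can use in /proc/sys/net/ipv4/tcp_available_congestion_control. A small sketch, where `has_cc` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# has_cc <name> <space-separated list>: succeed if name appears in the list.
has_cc() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

avail_file=/proc/sys/net/ipv4/tcp_available_congestion_control
if [ -r "$avail_file" ]; then
    if has_cc bbr "$(cat "$avail_file")"; then
        echo "bbr is available"
    else
        echo "bbr not listed; try: sudo modprobe tcp_bbr"
    fi
fi
```

If bbr is missing from the list, setting the sysctl will fail with an error instead of silently falling back, so this check saves a confusing reload.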
TCP Fast Open:
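net.ipv4.tcp_fastopen = 3

Lets clients send data in the SYN packet on repeat connections, which saves a round trip. The value 3 enables it for both outgoing and incoming connections, but applications still have to opt in on their own sockets. Good for high-connection-rate services (Prometheus, API gateways).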
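The value is a bitmask: bit 0 covers the client side and bit 1 the server side. Here is a rough decoder for those two bits only; `tfo_mode` is a hypothetical helper name and it ignores the higher option bits the kernel also defines.

```shell
#!/usr/bin/env bash
# tfo_mode <value>: describe the two low bits of net.ipv4.tcp_fastopen.
tfo_mode() {
    case $(( $1 & 3 )) in
        0) echo "disabled" ;;
        1) echo "client" ;;
        2) echo "server" ;;
        3) echo "client+server" ;;
    esac
}

tfo_file=/proc/sys/net/ipv4/tcp_fastopen
if [ -r "$tfo_file" ]; then
    tfo_mode "$(cat "$tfo_file")"
fi
```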
Connection Handling: Backlog and TIME_WAIT Reuse
A Docker host running 50 services trying to connect to each other hits the kernel’s listen queue limits fast.
Listen queue depth:
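# Max completed connections waiting in a socket's accept queue
net.core.somaxconn = 4096
# Also bump this: the queue of half-open connections still mid-handshake
net.ipv4.tcp_max_syn_backlog = 4096

If clients see timeouts or sporadic connection failures while the port is clearly listening, the listen queue is probably full. Bump these. Note that somaxconn is only a cap: the application still has to ask for a large backlog in its listen() call.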
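One way to confirm a full queue is to read it off `ss`. For listening TCP sockets, Recv-Q is the current accept queue length and Send-Q is the backlog limit. The filter below is a sketch that assumes the usual `ss -ltn` column order (State, Recv-Q, Send-Q, Local Address:Port); `full_listen_queues` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# full_listen_queues: read `ss -ltn` output on stdin and print the local
# address of every listener whose accept queue (Recv-Q, column 2) has
# reached its backlog limit (Send-Q, column 3).
full_listen_queues() {
    awk 'NR > 1 && $1 == "LISTEN" && $3 + 0 > 0 && $2 + 0 >= $3 + 0 { print $4 }'
}

if command -v ss >/dev/null 2>&1; then
    ss -ltn | full_listen_queues
fi
```

Empty output means no listener is saturated right now. A queue that only fills during bursts will need repeated sampling to catch.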
TIME_WAIT reuse: after a connection closes, Linux holds the socket in TIME_WAIT for 60 seconds (RFC compliance). On a high-churn setup (load balancer, API gateway), you’ll exhaust ephemeral ports:

# Allow reuse of TIME_WAIT sockets for new outgoing connections
net.ipv4.tcp_tw_reuse = 1

This is safe in modern kernels (it relies on TCP timestamps, which are on by default) and saves you from port exhaustion on high-connection setups. Don’t confuse it with the old tcp_tw_recycle, which broke clients behind NAT and was removed in kernel 4.12.
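To see how close you are to port exhaustion, divide the ephemeral port range by the 60-second hold time. That gives a rough ceiling on sustained new connections per second to a single destination address and port when no reuse is allowed. `max_conn_rate` is a hypothetical helper name, and the first call uses the common default range of 32768 to 60999.

```shell
#!/usr/bin/env bash
# max_conn_rate <low_port> <high_port> <time_wait_secs>
# Ports available divided by how long each one stays parked in TIME_WAIT.
max_conn_rate() {
    echo $(( ($2 - $1 + 1) / $3 ))
}

# Common default range
max_conn_rate 32768 60999 60

# Same calculation with this machine's actual range, if readable
range_file=/proc/sys/net/ipv4/ip_local_port_range
if [ -r "$range_file" ]; then
    read -r low high < "$range_file"
    max_conn_rate "$low" "$high" 60
fi
```

With the default range that works out to roughly 470 connections per second to one backend, which a busy reverse proxy can exceed.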
Memory Management: Swappiness, Dirty Buffers, and Overcommit
Don’t blindly set vm.swappiness = 10 because you read it on a forum. The default is 60, which means the kernel will swap out idle application memory fairly readily to keep file caches warm. That’s often right.
However, if you’re running a Nextcloud + database on a single machine with 8 GB RAM, swap is your last resort:
# Lower = avoid swapping application memory, higher = swap more readily
vm.swappiness = 10

# Percent of memory that can be dirty before writing processes are forced to flush (default 20%)
vm.dirty_ratio = 15

# Percent at which background writeback starts (default 10%)
vm.dirty_background_ratio = 5

Lower dirty_ratio = more frequent disk writes = less chance of a huge flush stall. On NVMe, aggressive is fine. On spinning disk, be gentler.
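Percentages are easier to judge as actual sizes. The sketch below converts a ratio into kilobytes for the 8 GB box mentioned earlier. It is an upper bound: the kernel applies these ratios to available memory (free plus reclaimable), not total RAM. `dirty_threshold_kb` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# dirty_threshold_kb <memory_kb> <ratio_percent>
# Approximate amount of dirty data allowed at a given ratio.
dirty_threshold_kb() {
    echo $(( $1 * $2 / 100 ))
}

mem_kb=8388608   # 8 GiB expressed in kB

dirty_threshold_kb "$mem_kb" 15   # vm.dirty_ratio = 15
dirty_threshold_kb "$mem_kb" 5    # vm.dirty_background_ratio = 5
```

That is about 1.2 GB of dirty data before writers stall and about 400 MB before background flushing starts. On a slow disk, 1.2 GB is a long flush, which is the argument for going lower on spinning media.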
Overcommit:
# 0 = heuristic overcommit (kernel default)
# 1 = always allow overcommit
# 2 = strict accounting against a commit limit
vm.overcommit_memory = 1

Docker loves overcommit. Apps request memory they’ll never use, and the kernel trusts them. Set it to 2 only if you need strict memory guarantees and your applications can handle allocations failing outright.
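To see where a machine currently stands, read the mode and compare the kernel's commit figures in /proc/meminfo. A minimal sketch; `overcommit_mode` is a hypothetical helper name that just labels the three documented values.

```shell
#!/usr/bin/env bash
# overcommit_mode <value>: label a vm.overcommit_memory setting.
overcommit_mode() {
    case "$1" in
        0) echo "heuristic" ;;
        1) echo "always" ;;
        2) echo "strict" ;;
        *) echo "unknown" ;;
    esac
}

if [ -r /proc/sys/vm/overcommit_memory ]; then
    overcommit_mode "$(cat /proc/sys/vm/overcommit_memory)"
    # Committed_AS is what processes have asked for; CommitLimit only
    # matters in strict mode, where requests beyond it are refused.
    grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo || true
fi
```

If Committed_AS is already well above CommitLimit, switching to strict mode on a live box will likely cause allocation failures right away.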
File Descriptors and Inotify Watches
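You have 50 Docker containers, each with a service spamming logs, Prometheus scraping, and file watchers. The usual defaults are 1024 open files per process (a ulimit soft limit, raised with ulimit -n or systemd’s LimitNOFILE rather than sysctl) and, on older kernels, 8192 inotify watches per user.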
# System-wide file descriptor limit
fs.file-max = 2097152

# Inotify watches per user (crucial for Docker + systemd + Prometheus)
fs.inotify.max_user_watches = 524288
fs.inotify.max_queued_events = 32768

If your app logs complain about “no space left on device” while the disk has room, or about hitting the inotify watch limit, these are the culprits.
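To find out how many watches are in use before raising the limit, you can count them from /proc: each inotify file descriptor lists its watches as `inotify wd:...` lines in its fdinfo file. The sketch below is one way to total them. `count_inotify_watches` is a hypothetical helper name, and the optional argument exists only so the function can be pointed at a different root.

```shell
#!/usr/bin/env bash
# count_inotify_watches [proc_root]
# Count "inotify wd:" lines across every <pid>/fdinfo/<fd> file.
count_inotify_watches() {
    local root="${1:-/proc}"
    find "$root" -mindepth 3 -maxdepth 3 -path "$root/[0-9]*/fdinfo/*" \
        -exec grep -h '^inotify' {} + 2>/dev/null | wc -l | tr -d ' '
}

# Usage (run as root to see every user's processes):
#   count_inotify_watches
```

Without root you only see your own processes, so the total will undercount on a multi-user or container-heavy host.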
Security-Relevant Sysctls
Not just performance — some sysctls are security decisions:
# Docker needs this for container networks
net.ipv4.ip_forward = 1

# Don't send ICMP redirects (old attack vector)
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0

# SYN flood protection (SYN cookies)
net.ipv4.tcp_syncookies = 1

# Ignore ICMP redirects (don't update routing table from untrusted sources)
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

A Production-Ready Config
Drop this in /etc/sysctl.d/99-performance.conf:
# ============================================================================
# SumGuy's Sysctl Tuning: Production Self-Hosted Linux
# ============================================================================

# NETWORKING: Buffers, congestion control, TCP tuning
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728

# Congestion control: BBR (in mainline since kernel 4.9; may need the tcp_bbr module)
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_notsent_lowat = 16384

# Connection handling
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fastopen = 3

# MEMORY: Swap policy and dirty buffer tuning
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.overcommit_memory = 1

# FILE SYSTEM: Descriptors and watches (critical for Docker/Prometheus)
fs.file-max = 2097152
fs.inotify.max_user_watches = 524288
fs.inotify.max_queued_events = 32768

# SECURITY: IP forwarding, SYN cookies, ICMP hardening
net.ipv4.ip_forward = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

Then reload:
sudo sysctl -p /etc/sysctl.d/99-performance.conf

Verify:

$ sysctl net.core.somaxconn
net.core.somaxconn = 4096

Testing Changes: Before and After
Don’t just set and pray. Measure.
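Step zero of measuring is confirming the kernel is actually running the values in your file, since a typo or a later drop-in can silently override them. Here is a sketch of a drift check that compares each `key = value` line against the live value under /proc/sys. `normalize` and `check_sysctl_conf` are hypothetical helper names, and the optional second argument exists only so the function can be pointed at a different root.

```shell
#!/usr/bin/env bash
# normalize: collapse runs of whitespace to one space and trim both ends,
# so "4096<TAB>87380" from /proc compares equal to "4096 87380" in a file.
normalize() {
    printf '%s' "$1" | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//'
}

# check_sysctl_conf <conf_file> [proc_root]
# Prints one line per missing or mismatched key; returns 1 if any were found.
check_sysctl_conf() {
    local conf="$1" root="${2:-/proc/sys}" key want have path drift=0
    while IFS='=' read -r key want || [ -n "$key" ]; do
        key="$(normalize "$key")"
        case "$key" in
            ''|'#'*|';'*) continue ;;   # skip blank lines and comments
        esac
        want="$(normalize "$want")"
        path="$root/$(printf '%s' "$key" | tr . /)"
        if [ ! -r "$path" ]; then
            echo "MISSING $key"
            drift=1
            continue
        fi
        have="$(normalize "$(cat "$path")")"
        if [ "$have" != "$want" ]; then
            echo "DRIFT $key: want '$want', have '$have'"
            drift=1
        fi
    done < "$conf"
    return "$drift"
}

# Usage:
#   check_sysctl_conf /etc/sysctl.d/99-performance.conf && echo "all applied"
```

A MISSING line usually means the key does not exist on this kernel or its module is not loaded, which is worth knowing before blaming the tuning for a benchmark that did not move.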
For network throughput, use iperf3 (server and client on two machines):
# Server
iperf3 -s

# Client (before tuning)
iperf3 -c 192.168.1.100 -t 30 -P 4

Record the bandwidth. Apply the tuning. Run again. You may see a 10-40% improvement on gigabit links; expect the biggest gains on high-latency paths and little change on a quiet LAN.
For connection handling, watch active connections and the listen queue:
# Rough count of open sockets (the total includes a header line)
ss -tupn | wc -l

# Check listen queue depth per socket (Recv-Q = current queue, Send-Q = limit)
ss -ltpn | grep LISTEN

For file descriptors, check before hitting a limit:

# System-wide open file count
cat /proc/sys/fs/file-nr
# Returns: <allocated> <allocated but unused> <max>

The Skeptical Question
“Won’t bad values break my system?” Unlikely. The Linux kernel is defensive: values like the ones here typically just fail to help rather than actively hurt. The worst you’ll get is the OOM killer being more aggressive or connections being dropped. Your 2 AM self will appreciate knowing why.
Start with the config above. It’s conservative enough for a 4-core self-hosted box, aggressive enough to stop fighting the kernel.
Then monitor. Measure. Adjust. That’s the art.