Your Kernel Has Settings and You’re Not Using Any of Them
Linux ships with conservative defaults. The kernel parameters are tuned for a wide range of hardware and workloads — a desktop, a database server, a router, a 4GB RAM NUC running four Docker containers — all reasonable, none optimal for any specific use case.
sysctl is the interface to these parameters. It lets you read and write kernel settings at runtime without rebooting, and persist them across reboots via configuration files. The settings live in the /proc/sys/ virtual filesystem, which means you can also read and write them directly:
# Read a parameter
sysctl vm.swappiness
# or directly:
cat /proc/sys/vm/swappiness
# Write a parameter (temporary, lost on reboot)
sysctl -w vm.swappiness=10
# or directly:
echo 10 > /proc/sys/vm/swappiness
That last method of writing directly to /proc/sys/ is useful for quick tests, but don't rely on it for production changes. Use sysctl -w instead: it validates the parameter name and makes the change explicit. It also sidesteps a classic trap: sudo echo 10 > /proc/sys/vm/swappiness fails with a permission error, because the redirection is performed by your unprivileged shell, not by sudo.
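The two forms are mechanically related: a sysctl name maps to its /proc/sys path by replacing dots with slashes. A tiny helper (hypothetical, just for illustration) makes the mapping explicit:

```shell
#!/bin/sh
# Convert a sysctl name to its /proc/sys path (hypothetical helper).
# Caveat: this breaks for the rare keys whose components themselves
# contain a dot, e.g. VLAN interface names like eth0.100.
to_path() {
  echo "/proc/sys/$(echo "$1" | tr '.' '/')"
}

to_path net.core.somaxconn   # prints /proc/sys/net/core/somaxconn
```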
Making Changes Persistent
Temporary changes via sysctl -w are lost on reboot. To persist them, write to a file in /etc/sysctl.d/:
# Create a custom config file (the number prefix controls load order)
sudo nano /etc/sysctl.d/99-custom.conf
Add your parameters:
vm.swappiness = 10
net.core.somaxconn = 65535
Apply immediately without rebooting:
sudo sysctl --system
# or for a specific file:
sudo sysctl -p /etc/sysctl.d/99-custom.conf
Using /etc/sysctl.d/ instead of editing /etc/sysctl.conf directly is better practice — it keeps your customizations separate from distro-provided settings and makes it obvious which changes are yours.
Network Parameters That Actually Matter
net.core.somaxconn
This controls the maximum length of the accept queue for a listening socket: how many established connections can be waiting for the application to accept() them before the kernel starts dropping new attempts.
The default is 4096 on kernels 5.4 and newer, and 128 before that. For any server accepting significant traffic, this is too low:
net.core.somaxconn = 65535
Your web server or reverse proxy also needs to be configured to match; Nginx's backlog option on the listen directive, for example. The effective queue length is the lower of the kernel limit and the application's value.
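To make the "lower one wins" rule concrete, here is a sketch with assumed numbers: 65535 from the setting above, and 511, which is Nginx's default listen backlog when none is specified:

```shell
#!/bin/sh
# Effective accept backlog = min(kernel limit, application's listen() backlog).
somaxconn=65535    # net.core.somaxconn, as set above
app_backlog=511    # nginx's default when backlog= isn't given on listen
effective=$(( somaxconn < app_backlog ? somaxconn : app_backlog ))
echo "effective backlog: $effective"   # prints: effective backlog: 511
```

Raising the kernel limit alone buys you nothing until the application raises its side too.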
net.ipv4.tcp_tw_reuse
When a TCP connection closes, the socket enters the TIME_WAIT state (60 seconds on Linux; the spec prescribes twice the maximum segment lifetime) to ensure any late packets from the old connection don't interfere with a new one on the same port. Under heavy load, say a server handling thousands of short-lived connections per second, you can exhaust ephemeral ports waiting for TIME_WAIT sockets to expire.
tcp_tw_reuse allows the kernel to reuse TIME_WAIT sockets for new outbound connections when it’s safe to do so:
net.ipv4.tcp_tw_reuse = 1
Important: this is safe for outbound connections (your server connecting to others), and it requires TCP timestamps (net.ipv4.tcp_timestamps = 1, the default) to work. Don't confuse it with tcp_tw_recycle, which was removed in kernel 4.12 because it broke connections through NAT.
Network Buffer Sizes
The default socket receive and send buffer sizes are sized for a network from a decade ago. On modern hardware with fast networks, increasing these allows the kernel to buffer more data in flight:
# Default socket receive buffer (bytes)
net.core.rmem_default = 262144
# Maximum socket receive buffer
net.core.rmem_max = 16777216
# Default socket send buffer
net.core.wmem_default = 262144
# Maximum socket send buffer
net.core.wmem_max = 16777216
# TCP receive buffer: min, default, max
net.ipv4.tcp_rmem = 4096 262144 16777216
# TCP send buffer: min, default, max
net.ipv4.tcp_wmem = 4096 262144 16777216
These numbers are a reasonable starting point for a server with 4-8GB RAM and a Gigabit+ network. For 10GbE or higher, you’d scale up further.
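The "right" maximum is workload-dependent, but the bandwidth-delay product (BDP) gives a rough floor: the buffer should hold at least one round-trip's worth of in-flight data. A sketch with assumed figures (1 Gbit/s link, 20 ms RTT):

```shell
#!/bin/sh
# Bandwidth-delay product: bytes in flight on the path at full utilization.
bits_per_sec=1000000000   # 1 Gbit/s link (assumed)
rtt_ms=20                 # round-trip time (assumed)
bdp_bytes=$(( bits_per_sec / 8 * rtt_ms / 1000 ))
echo "BDP: $bdp_bytes bytes"   # prints: BDP: 2500000 bytes
# ~2.5 MB, so the 16 MB max above leaves headroom for longer RTTs
# or faster links.
```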
net.ipv4.tcp_syncookies
SYN flood protection. This is usually already enabled on modern distributions, but worth verifying:
net.ipv4.tcp_syncookies = 1
When the SYN backlog fills up, the kernel uses SYN cookies to complete the handshake without using backlog queue space. This mitigates SYN flood attacks. Leave it on.
net.ipv4.ip_local_port_range
The range of ports available for outbound connections. The default (32768-60999) gives you ~28,000 ephemeral ports. For servers making heavy outbound connections:
net.ipv4.ip_local_port_range = 1024 65535
This gives you ~64,000 ports. Combined with tcp_tw_reuse, this significantly increases the connection capacity for outbound-heavy workloads. One caveat: if a service listens on a fixed port inside the widened range, exclude that port via net.ipv4.ip_local_reserved_ports so the kernel doesn't hand it out as an ephemeral port.
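Back-of-the-envelope math shows why the range matters: without reuse, each closed outbound connection holds an ephemeral port for the full TIME_WAIT period, which bounds the sustainable rate of short-lived connections to any one destination (sketch, assuming Linux's 60-second TIME_WAIT):

```shell
#!/bin/sh
# Port-exhaustion math: each closed outbound connection ties up an
# ephemeral port for the TIME_WAIT period.
ports=$(( 65535 - 1024 + 1 ))   # the expanded range above
timewait_s=60                   # Linux's fixed TIME_WAIT length
rate=$(( ports / timewait_s ))
echo "$ports ports -> ~$rate short-lived conn/s per destination"
```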
Memory Parameters
vm.swappiness
This is the one everyone knows about. vm.swappiness controls how aggressively the kernel uses swap space. The default is 60, which biases the kernel toward swapping out application (anonymous) memory in order to preserve the page cache, even when there's still memory available.
For a server where you want to prioritize keeping application data in RAM:
vm.swappiness = 10
Lower values make the kernel less eager to swap. Setting it to 0 doesn’t disable swap entirely — it just tells the kernel to avoid it unless absolutely necessary. The kernel will still use swap if RAM is actually exhausted.
For a system with no swap configured, this setting is irrelevant. For a desktop or system where swap responsiveness matters, 10-20 is common. For a database server where latency spikes from swap activity are catastrophic, 1-5 is common.
Setting it to 0 was once a common recommendation that Red Hat has since walked back: 1 is better than 0, because a completely swap-avoidant kernel can OOM-kill processes even when there is memory that could be reclaimed through swap.
vm.dirty_ratio and vm.dirty_background_ratio
These control when the kernel flushes dirty pages (data written to filesystem but not yet synced to disk) to storage.
vm.dirty_background_ratio: percentage of total memory at which background flushing starts
vm.dirty_ratio: percentage at which processes writing to the filesystem are blocked until the flush completes
The defaults are typically 10% and 20% respectively. For a write-heavy workload on fast storage (NVMe), raising the hard limit reduces how often writers are forced to block on flushes:
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
For a system where you can’t afford data loss (no UPS, writing to spinning rust), keep these lower so data reaches disk sooner.
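Because the ratios are percentages of memory, the same settings mean very different byte counts on different machines. A sketch for an assumed box with 8000 MB of RAM:

```shell
#!/bin/sh
# Translate the dirty-page ratios into approximate megabytes.
ram_mb=8000                        # assumed machine size
bg_mb=$(( ram_mb * 10 / 100 ))     # background flush threshold
hard_mb=$(( ram_mb * 40 / 100 ))   # writers block past this
echo "background flush at ${bg_mb} MB, writers block at ${hard_mb} MB"
# prints: background flush at 800 MB, writers block at 3200 MB
```

On large-RAM machines, the absolute-valued vm.dirty_bytes and vm.dirty_background_bytes variants avoid the percentage math entirely.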
vm.vfs_cache_pressure
Controls how aggressively the kernel reclaims memory used for directory and inode caches. The default is 100 (balanced). Lower values cause the kernel to favor keeping these caches:
vm.vfs_cache_pressure = 50
On a server with heavy filesystem operations (lots of file opens, directory lookups), keeping more inode/dentry cache can improve performance. On a memory-constrained system, leave this at default.
Container-Relevant Parameters
If you’re running Docker, these parameters affect container networking:
# Required for container networking (usually set by Docker automatically)
net.ipv4.ip_forward = 1
# Make bridged container traffic visible to iptables (requires the br_netfilter module)
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
# Maximum number of memory map areas (needed for Elasticsearch and some JVM apps)
vm.max_map_count = 262144
The vm.max_map_count one will save you a headache — Elasticsearch refuses to start with the default value and prints a helpful error message telling you to set it. Setting it proactively means your Elasticsearch container starts without drama.
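A quick preflight sketch: check the running value against Elasticsearch's documented minimum before starting the container (262144, matching the setting above):

```shell
#!/bin/sh
# Preflight check for vm.max_map_count before launching Elasticsearch.
need=262144
have=$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)
if [ "$have" -ge "$need" ]; then
  echo "vm.max_map_count OK ($have)"
else
  echo "raise vm.max_map_count to at least $need (currently $have)"
fi
```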
A Practical sysctl.conf for a Docker Host
Here’s a ready-to-use configuration for a general-purpose Linux server running Docker:
# /etc/sysctl.d/99-docker-host.conf
# Docker host optimization
# --- Networking ---
# Increase connection backlog queue
net.core.somaxconn = 65535
# Allow reuse of TIME_WAIT sockets for outbound connections
net.ipv4.tcp_tw_reuse = 1
# SYN flood protection (usually already on)
net.ipv4.tcp_syncookies = 1
# Increase ephemeral port range
net.ipv4.ip_local_port_range = 1024 65535
# Increase socket buffer sizes
net.core.rmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_default = 262144
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 262144 16777216
net.ipv4.tcp_wmem = 4096 262144 16777216
# Container networking
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
# --- Memory ---
# Don't swap aggressively
vm.swappiness = 10
# Balance dirty page flushing
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
# Keep inode/dentry cache
vm.vfs_cache_pressure = 50
# Required for Elasticsearch and some JVM apps in containers
vm.max_map_count = 262144
Apply it:
sudo sysctl --system
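A read-only spot-check sketch to confirm the running kernel matches a few of the values from the file above (parameter names are the ones used in this article):

```shell
#!/bin/sh
# Print the live values of a few parameters we just configured.
for p in net.core.somaxconn net.ipv4.tcp_tw_reuse vm.swappiness vm.max_map_count; do
  printf '%-28s %s\n' "$p" "$(sysctl -n "$p" 2>/dev/null || echo '(not available)')"
done
```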
Testing Changes Safely
The right workflow for sysctl changes:
- Apply temporarily first: sysctl -w parameter=value
- Verify it didn't break anything: test your workload
- Benchmark if relevant: tools like iperf3 for network, fio for disk
- Persist to /etc/sysctl.d/ once you're satisfied
- Test a reboot: make sure the parameters come back correctly
To see all currently loaded parameters:
sysctl -a 2>/dev/null | grep -v "kernel.printk"
To see what your system currently has for a specific category:
sysctl -a | grep "net.ipv4.tcp"
The kernel parameters are one of those areas where “default is fine for most people” is genuinely true. You don’t need to tune sysctl on your laptop or a lightly-loaded home server. But when you start running services that handle real traffic or real data, understanding these parameters means the difference between “it’s getting slow under load” and “I know exactly which knob to turn.”