WireGuard Is Fast, But “Fast” Has a Ceiling You’re Probably Not Hitting
WireGuard’s reputation is well-earned. It’s simpler than OpenVPN, faster than IPsec, and the codebase is small enough that it has actually been audited by humans rather than simply trusted. But “WireGuard is fast” isn’t a magic incantation that makes any deployment fast.
Default WireGuard config is tuned for compatibility, not performance. MTU settings that cause fragmentation. Keepalive timers that add unnecessary overhead. Routing configs that bypass the kernel’s fast path. wireguard-go userspace implementations that voluntarily leave kernel performance behind.
Here’s how to actually benchmark what you have, and what to change to close the gap.
Baseline: What WireGuard Should Be Able to Do
Before tuning anything, measure. You cannot improve what you haven’t measured.
Install iperf3 on both ends of your tunnel:
sudo apt install iperf3
# On the "server" side (the machine you're testing against)
iperf3 -s
# On the "client" side (testing through the WireGuard tunnel)
# Replace with your WireGuard peer's tunnel IP
iperf3 -c 10.0.0.1
# Reverse test (server sends, client receives)
iperf3 -c 10.0.0.1 -R
# Bidirectional (realistic traffic pattern)
iperf3 -c 10.0.0.1 --bidir
# Multiple streams (closer to real-world saturation)
iperf3 -c 10.0.0.1 -P 4
Write down these baseline numbers before changing anything. Raw numbers without a baseline are meaningless.
Typical WireGuard performance:
- Gigabit hardware, default config: 600-900 Mbps through-tunnel on LAN
- With proper tuning: 900+ Mbps, often wire speed, limited only by the NIC
- VPS-to-VPS over internet: Limited by uplink, usually 200-800 Mbps
- Versus OpenVPN: WireGuard is typically 3-5x faster
- Versus IPsec with AES-NI: Comparable or faster depending on implementation
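To compare runs consistently, it helps to reduce iperf3’s summary to a single Mbps figure. Here’s a small awk sketch (`iperf3_mbps` is a hypothetical helper; it assumes iperf3’s default human-readable output, where the final `receiver` line carries the overall result):

```shell
# Extract the receiver-side throughput from iperf3 output as plain Mbps.
# With -P, the [SUM] line also ends in "receiver" and wins because it
# comes last.
iperf3_mbps() {
  awk '/receiver/ { rate = $(NF-2); unit = $(NF-1) }
       END {
         if (unit == "Gbits/sec") rate *= 1000
         printf "%.0f\n", rate
       }'
}

# Usage: iperf3 -c 10.0.0.1 -P 4 | iperf3_mbps
```

Logging one number per run makes before/after comparisons trivial to keep in a text file next to the config.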
MTU: The Biggest Knob You’re Not Turning
This is where most home lab WireGuard setups lose significant performance. MTU (Maximum Transmission Unit) misconfiguration causes packet fragmentation, which costs CPU and throughput.
WireGuard adds overhead to each packet:
- 20 bytes: IPv4 outer header (or 40 for IPv6)
- 8 bytes: UDP header
- 16 bytes: WireGuard transport data header (message type, receiver index, nonce counter)
- 16 bytes: Poly1305 authentication tag
Total overhead: 60 bytes for IPv4, 80 bytes for IPv6.
If your physical network MTU is 1500 (standard Ethernet), your WireGuard MTU should be 1500 - 60 = 1440 when the outer connection is IPv4-only, or 1500 - 80 = 1420 when it may be IPv6. WireGuard defaults to 1420 because it covers the IPv6 case.
The problem: many paths aren’t standard Ethernet. PPPoE (common for DSL/fiber connections) reduces the physical MTU to 1492. Some VPS providers use jumbo frames or reduced MTU internally.
# Find your actual path MTU — critical first step
# Test from OUTSIDE the tunnel to the far end
ping -c 4 -M do -s 1472 YOUR_VPS_IP # 1472 + 28 IP/ICMP overhead = 1500
# If this succeeds: your path MTU is at least 1500
ping -c 4 -M do -s 1452 YOUR_VPS_IP # Test 1480 MTU path
ping -c 4 -M do -s 1400 YOUR_VPS_IP # Test 1428 MTU path
# Decrease size until ping succeeds — that + 28 is your path MTU
# Then: WireGuard MTU = path_MTU - 60 (IPv4) or - 80 (IPv6)
# On Linux, you can also use tracepath
tracepath YOUR_VPS_IP
# Look for "pmtu XXXX" in output
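The subtraction is easy to get backwards, so here it is as a tiny helper (`wg_mtu` is a hypothetical name; the 60- and 80-byte figures are the IPv4/IPv6 overheads from the breakdown above):

```shell
# WireGuard interface MTU = path MTU minus tunnel overhead:
# 60 bytes when the outer packet is IPv4, 80 bytes when it is IPv6.
wg_mtu() {
  local path_mtu=$1 family=${2:-ipv6}   # default to the safer IPv6 figure
  case "$family" in
    ipv4) echo $(( path_mtu - 60 )) ;;
    ipv6) echo $(( path_mtu - 80 )) ;;
    *)    echo "usage: wg_mtu PATH_MTU [ipv4|ipv6]" >&2; return 1 ;;
  esac
}

wg_mtu 1500 ipv4   # standard Ethernet, IPv4 outer packets
wg_mtu 1500 ipv6   # standard Ethernet, IPv6 outer packets (the 1420 default)
wg_mtu 1492 ipv6   # PPPoE path
```

When in doubt, use the IPv6 figure: an MTU that is a little too small costs a few percent; one that is too large costs fragmentation.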
Set the MTU in your WireGuard config:
# /etc/wireguard/wg0.conf
[Interface]
PrivateKey = YOUR_PRIVATE_KEY
Address = 10.0.0.2/24
MTU = 1420 # Set this explicitly — don't rely on the default
[Peer]
PublicKey = PEER_PUBLIC_KEY
Endpoint = your.server.com:51820
AllowedIPs = 0.0.0.0/0
PersistentKeepalive = 25
After changing MTU:
sudo wg-quick down wg0
sudo wg-quick up wg0
iperf3 -c 10.0.0.1 # Compare to baseline
If you’re on a PPPoE connection:
MTU = 1412 # 1492 (PPPoE path MTU) - 80 = 1412; safe for both IPv4 and IPv6 outer packets
CPU Offloading: AES-NI and Curve25519
WireGuard uses ChaCha20-Poly1305 for encryption (not AES). ChaCha20 was chosen partly because it doesn’t require hardware acceleration to be fast — it’s efficient in pure software.
However, your CPU can still be the bottleneck on high-throughput connections. Check:
# Is your CPU doing crypto fast enough?
# Run iperf3 while watching CPU usage
htop # or
top -d 0.5
# If one CPU core is pegged at 100% during iperf3:
# → You're CPU-bound, not bandwidth-bound
# Check for hardware crypto support
grep -m1 -o 'aes\|avx2\|avx512\|sha_ni' /proc/cpuinfo
# AVX2/AVX512 acceleration helps ChaCha20 significantly
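To make the flag check repeatable, you can wrap it in a function. This is a sketch (`crypto_features` is a hypothetical helper; it assumes an x86 `/proc/cpuinfo` with a `flags` line):

```shell
# Report which crypto-relevant CPU flags are present. Pass a flags
# string explicitly, or let it read this machine's /proc/cpuinfo.
crypto_features() {
  local flags=${1:-$(grep -m1 '^flags' /proc/cpuinfo)}
  local f
  for f in aes avx2 avx512f sha_ni; do
    case " $flags " in
      *" $f "*) echo "$f: yes" ;;
      *)        echo "$f: no"  ;;
    esac
  done
}

# crypto_features              # inspect the current machine
# crypto_features "aes avx2"   # or test against an explicit flag list
```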
For very high throughput (10G+), the WireGuard kernel module performs significantly better than wireguard-go (the userspace implementation). Always prefer the kernel module:
# Check if you're using the kernel module
lsmod | grep wireguard
# If this shows "wireguard", you're using the kernel module — good.
# If you're on an older kernel without WireGuard built in:
sudo apt install wireguard-dkms
# wireguard-go is only needed on unsupported kernels or containers
# Avoid it on any system where you have a choice
The kernel module runs in kernel space and doesn’t pay the cost of context switches between user/kernel space. On fast hardware this can make a 20-40% difference in throughput.
AllowedIPs: How Routing Affects Performance
AllowedIPs isn’t just a security config — it directly affects routing performance.
# Full tunnel (all traffic through WireGuard)
AllowedIPs = 0.0.0.0/0, ::/0
# Split tunnel (only specific subnets)
AllowedIPs = 10.0.0.0/8, 192.168.1.0/24
Every outbound packet triggers a routing lookup, and WireGuard additionally matches each packet against the peer’s AllowedIPs. With a small, aggregated AllowedIPs list, both lookups are fast. With a complex route list (if you’re doing fancy things), they can add measurable overhead.
The kernel’s fast path optimization works better with simple, aggregated routes. If you have 50 /32 entries instead of one /24, you’re making the routing lookup more expensive.
# See what routes WireGuard has installed
ip route show dev wg0
# For a split tunnel, use the wg-quick AllowedIPs calculator
# to find the minimal route set for your use case:
# https://www.procustodibus.com/blog/2021/03/wireguard-allowedips-calculator/
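One way to spot aggregation candidates is to group host routes by their /24. A rough sketch (`suggest_aggregation` is a hypothetical helper, not a full CIDR aggregator; note that `ip route` prints host routes as bare addresses, without a /32 suffix):

```shell
# Group IPv4 host routes by /24 and flag prefixes with several entries,
# which may be collapsible into a single AllowedIPs line.
suggest_aggregation() {
  awk '$1 ~ /^[0-9.]+$/ {                 # bare address = /32 host route
         split($1, a, ".")
         prefix = a[1] "." a[2] "." a[3] ".0/24"
         count[prefix]++
       }
       END {
         for (p in count)
           if (count[p] > 1)
             printf "%s: %d host routes, could be one entry\n", p, count[p]
       }'
}

# Usage: ip route show dev wg0 | suggest_aggregation
```

Only merge routes you actually want reachable: AllowedIPs is also your access-control list, so aggregation trades a little precision for a cheaper lookup.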
PersistentKeepalive: When You Want It vs When You Don’t
PersistentKeepalive = 25 keeps the tunnel alive through NAT by sending a keepalive packet every 25 seconds. It’s almost always needed when the WireGuard client is behind NAT (home routers, most mobile connections).
But: every keepalive packet is tiny overhead and a NAT table refresh. For server-to-server tunnels where neither end is behind NAT, you can omit it:
# Server-to-server — no NAT, no keepalive needed
[Peer]
PublicKey = ...
Endpoint = 1.2.3.4:51820
AllowedIPs = 10.0.0.0/24
# PersistentKeepalive not set — connection only established when traffic flows
The more important performance consideration: if you’re sending traffic through a tunnel that hasn’t been used recently, the first packet after inactivity may fail or be delayed while the connection re-establishes. PersistentKeepalive prevents this at the cost of constant low-level background traffic.
For interactive use (SSH, web browsing), keepalive improves perceived performance. For batch transfers, it doesn’t matter.
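To put the keepalive cost in perspective: a keepalive is an empty transport packet, roughly 32 bytes of WireGuard framing plus 28 bytes of IPv4/UDP headers, about 60 bytes on the wire (approximate figures). The bandwidth is trivial; the real costs are NAT-table churn and, on mobile devices, radio wakeups:

```shell
# Approximate on-the-wire bandwidth of PersistentKeepalive.
# Assumes ~60 bytes per keepalive packet (IPv4); integer math suffices.
keepalive_bps() {
  local interval=${1:-25} pkt_bytes=60
  echo $(( pkt_bytes * 8 / interval ))
}

keepalive_bps 25   # → 19 (bits per second at the default 25 s interval)
```

Nineteen bits per second is nothing, which is why omitting keepalive on un-NATed server links is about avoiding state churn, not saving bandwidth.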
Kernel Module vs wireguard-go Performance
The difference is real:
- Kernel module (built into Linux 5.6+): ~900 Mbps on a modern x86 server
- wireguard-go (userspace): ~600 Mbps on the same hardware
The userspace implementation exists for platforms where the kernel module isn’t available: macOS and Windows (the official WireGuard apps use it) and containers with restricted kernel access. OpenBSD and FreeBSD now ship native kernel implementations of their own.
On Linux with kernel 5.6+, WireGuard is built in. No action needed:
# Confirm WireGuard kernel module version
sudo dmesg | grep wireguard
# Should show "wireguard: WireGuard X.X loaded."
On older kernels:
# Install DKMS module
sudo apt install wireguard-dkms
sudo modprobe wireguard
WireGuard vs OpenVPN vs IPsec: Real Numbers
These are representative comparisons on a commodity x86 server (nothing exotic):
| Protocol | Cipher | Approximate Throughput |
|---|---|---|
| WireGuard (kernel) | ChaCha20-Poly1305 | 900+ Mbps |
| WireGuard (wireguard-go) | ChaCha20-Poly1305 | 500-700 Mbps |
| IPsec (strongSwan) | AES-256-GCM + AES-NI | 700-900 Mbps |
| OpenVPN (UDP) | AES-256-GCM | 200-400 Mbps |
| OpenVPN (TCP) | AES-256-GCM | 150-250 Mbps |
OpenVPN’s relatively poor performance comes from running entirely in userspace and using a TUN/TAP abstraction that’s inherently slower than WireGuard’s approach. It’s not a bug, it’s architecture.
The IPsec comparison is interesting: with hardware AES acceleration, IPsec can nearly match WireGuard. But WireGuard’s simplicity and lower latency still make it the preferred choice for most use cases.
Quick Tuning Checklist
Run through these in order:
# 1. Measure baseline
iperf3 -c PEER_TUNNEL_IP -P 4
# 2. Find actual path MTU
ping -c 4 -M do -s 1472 PEER_PUBLIC_IP
# 3. Set correct MTU in wg0.conf
# MTU = path_MTU - 60 (IPv4) or path_MTU - 80 (IPv6)
# 4. Verify kernel module is in use (not wireguard-go)
lsmod | grep wireguard
# 5. Check CPU bottleneck during iperf3
htop # Is one core maxed?
# 6. Verify AllowedIPs isn't unnecessarily complex
ip route show dev wg0
# 7. Re-run benchmark and compare
iperf3 -c PEER_TUNNEL_IP -P 4 -R
iperf3 -c PEER_TUNNEL_IP -P 4 --bidir
The most impactful change for most setups: getting the MTU right. The second most impactful: ensuring you’re using the kernel module. Everything else is incremental.
WireGuard is fast. With correct configuration, it’s genuinely wire-speed fast on modern hardware. The gap between “default config” and “tuned config” is real, and it’s worth closing. You already did the hard part by choosing WireGuard — might as well get all the speed that came with it.