Understanding and Optimizing Performance in Proxmox VE

You’ve got your Proxmox cluster humming along, VMs spinning up, everything working. Then you run a backup, fire up a heavy workload, and suddenly everything feels sluggish. Your database queries slow to a crawl. Containers lag. You flip through the Proxmox UI, squinting at CPU graphs wondering where the bottleneck is hiding.

Here’s the thing: Proxmox’s defaults are safe, not fast. They work for light home lab loads, but the moment you care about performance — whether that’s running actual workloads or just not feeling like you wasted money on hardware — you need to know which levers actually move the needle.

Let’s walk through the specific tuning that matters: virtio drivers, CPU pinning, memory management, IO threads, storage choices, and kernel tweaks. Real examples. Real numbers. No placeholder <vmid> nonsense.

Virtio Drivers: Why SATA Emulation is Silently Destroying Your Performance

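Depending on the guest OS type you pick and how the VM was created or imported, its disks can end up on an emulated IDE or SATA controller. (Recent Proxmox versions default Linux guests to VirtIO SCSI, but Windows guests and imported VMs often land on an emulated bus.) This is fine for testing. It’s not fine if you actually care about disk performance.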

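Here’s what’s happening: SATA is fully emulated. QEMU has to imitate a real AHCI controller for every read and write. Virtio is paravirtualized instead: the guest knows it is virtualized and hands requests to the host through shared-memory queues, which cuts out most of that emulation overhead. The performance difference can be substantial. 2-3x the throughput is plausible on fast storage, though the exact gain depends on the workload and the disks underneath.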

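Let’s say you’ve already got a VM (VMID 100) whose disk is attached as sata0. With the VM shut down, detach the disk, reattach it on the SCSI bus behind a VirtIO SCSI controller, and fix the boot order:

qm set 100 --delete sata0
qm set 100 --scsihw virtio-scsi-single --scsi0 local-lvm:vm-100-disk-1,iothread=1
qm set 100 --boot order=scsi0

Detaching doesn’t delete anything: the volume shows up as unused0 in the config until you reattach it. But here are the gotchas: the boot order still points at the old sata0 entry until you change it, and a Windows guest needs the VirtIO drivers installed before the switch or it won’t find its boot disk. You do not need to rebuild the VM.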

For a new VM, just use the Proxmox GUI and explicitly select VirtIO Block or VirtIO SCSI under “Bus/Device”. Network interfaces also benefit — use VirtIO for those too, not Intel e1000 emulation.

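The gains here are real. As an illustration, a sequential-read benchmark that tops out around 300 MB/s on a SATA-backed VM can reach 600+ MB/s after switching to virtio-scsi on the same storage. Run the same benchmark before and after on your own hardware to see what you actually get. Your database snapshots will thank you.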
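Before changing anything, it helps to know which disks are still on an emulated bus. Here’s a minimal sketch, assuming the usual key: value layout that qm config prints. The function name list_emulated_disks is made up for this example, not a Proxmox command.

```shell
# Hypothetical helper: print the disk keys in a VM config that still use
# emulated IDE or SATA. Reads `qm config <vmid>` output on stdin.
# CD-ROM drives (media=cdrom) are skipped; their bus doesn't matter for
# disk throughput.
list_emulated_disks() {
    grep -E '^(ide|sata)[0-9]+:' | grep -v 'media=cdrom' | cut -d: -f1
}
```

On a Proxmox host you would run it as: qm config 100 | list_emulated_disks. No output means nothing is left on IDE or SATA.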

CPU Pinning and NUMA: When It Helps and When It’s Overkill

CPU pinning sounds advanced, and honestly, for most home labs it is overkill. But in specific situations — high-load database VMs, intensive data processing — it eliminates a whole class of performance jitter.

Here’s the concept: normally, the host scheduler is free to move QEMU’s vCPU threads between physical CPU cores. Each move has a cost, because the thread arrives at a core whose caches hold someone else’s data. Pinning restricts a VM to specific cores, which removes most of that overhead.

The catch: you need cores to spare. If you’ve got 8 cores and dedicate 4 of them to a database VM, you’ve only got 4 cores left for your other VMs. Pinning is a resource trade: you pay for consistency with flexibility.

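To pin a 4-vCPU VM to cores 0-3, first give it its cores and a higher scheduling weight:

qm set 102 --cores 4 --sockets 1 --cpuunits 2048 --cpulimit 4

None of those options pin anything: cpuunits is a relative weight and cpulimit is a cap. The pinning itself is the affinity option, available since Proxmox VE 7.3:

qm set 102 --affinity 0-3

That writes this line to the VM config file (/etc/pve/qemu-server/102.conf), which you can also add by hand:

affinity: 0-3

This restricts the whole VM process, vCPU threads included, to host cores 0-3. It is not a strict one-to-one mapping of vCPU 0→core 0, vCPU 1→core 1; the scheduler can still move threads around within that set. It also doesn’t keep other VMs off those cores unless you give them a non-overlapping affinity. The change takes effect the next time the VM starts.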

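For multi-socket systems (NUMA), things get more nuanced. A 2-socket Xeon box has local memory on each socket. If you’re pinning to cores on socket 0 but the VM’s RAM is allocated from socket 1, you get cross-socket traffic (slow). The rule: if you pin, pick cores that all sit on one NUMA node (lscpu or numactl --hardware on the host shows the layout). Linux allocates memory on the node where the requesting thread runs by default, so that gets you most of the way. For explicit binding, the VM config has a numa option and per-node numa[n] settings. But honestly? Unless you’re running a proper data center workload, skip this. The complexity isn’t worth it for a self-hosted Nextcloud instance.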
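To see whether a set of cores sits on one NUMA node, you can parse the host’s topology. This is a rough sketch, assuming the CPU,NODE output format of lscpu -p (comment lines start with #, data lines look like 3,0). The function name nodes_for_cores is invented for this example.

```shell
# Hypothetical helper: print the NUMA node(s) that the given host cores
# belong to. Reads `lscpu -p=CPU,NODE` output on stdin.
# One node in the output means the whole core set is local to that node.
nodes_for_cores() {
    wanted=" $* "
    awk -F, -v wanted="$wanted" '
        /^#/ { next }                               # skip lscpu header comments
        index(wanted, " " $1 " ") { seen[$2] = 1 }  # core is in the wanted set
        END { for (n in seen) print n }
    ' | sort -n | tr '\n' ' ' | sed 's/ $//'
}
```

On the host: lscpu -p=CPU,NODE | nodes_for_cores 0 1 2 3. If that prints more than one node number, the pinned set straddles sockets and you should pick different cores.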

When to pin:

High-concurrency database (PostgreSQL, MySQL under load)
Real-time workloads where jitter matters
Benchmark/testing where you want to eliminate variables

When to skip:

General-purpose VMs (Nextcloud, *arr services, etc.)
Bursty workloads that benefit from multicore flexibility
Any homelab where you don’t have spare cores

Memory Ballooning vs. Fixed Allocation: The 2 AM Wake-up Call

Memory ballooning is a feature in Proxmox that lets VMs dynamically shrink their working set when the host is under memory pressure. Sounds smart. Sounds like you can overcommit memory and everything just works. Narrator voice: “It doesn’t.”

Ballooning works by running a driver inside the guest that “inflates” a balloon, reducing the amount of memory available to applications, freeing it for the host. On paper, elegant. In practice, when ballooning triggers at 2 AM because you’re out of RAM, application performance nosedives because the OS suddenly has half the memory it had 5 minutes ago.

The safer approach: allocate fixed memory to each VM and avoid overcommit.

If you’ve got 128 GB of RAM and you’re running 4 VMs, you give each 24 GB (keeping 32 GB for Proxmox itself). This is boring and conservative. It’s also predictable.
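The arithmetic is simple enough to script if you want to replay it with your own numbers. A minimal sketch; per_vm_gb is just an illustrative name, and an even split is an assumption that won’t suit every mix of VMs.

```shell
# Evenly split whatever is left after the host reserve. All values in GB.
per_vm_gb() {
    total_gb=$1; host_reserve_gb=$2; vm_count=$3
    echo $(( (total_gb - host_reserve_gb) / vm_count ))
}

per_vm_gb 128 32 4   # prints 24
```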

Check a VM’s current memory settings:

qm config 102 | grep -E '^(memory|balloon):'

Output:

memory: 8192
balloon: 8192

memory is the maximum the VM can use; balloon is the minimum the host may shrink it to under memory pressure. Here the two are equal, so the host won’t reclaim anything, but the balloon device is still present in the guest. If you want to disable it entirely:

qm set 102 --balloon 0

This sets the balloon to 0 (disabled). The VM now has a fixed 8192 MB, no more and no less.

The gotcha: if you allocate fixed memory and the VM genuinely needs more, you’ve just hard-capped it. No ballooning means no flexibility. But that’s the trade: predictable is better than “oh, the DB got evicted to swap at random times.”

For a database VM, allocate what it truly needs and keep ballooning off. For a light utility VM (Caddy, a small service), a small allocation with ballooning left on is fine.
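To check whether you’re overcommitting, add up what the VM configs promise and compare it with what the host has. A sketch under a few assumptions: configs use plain memory: lines with a value in MB, VMs with no memory line are not counted, and containers are ignored. sum_vm_memory_mb is a hypothetical name.

```shell
# Sum the "memory:" values (in MB) of all VM configs in a directory.
# On a Proxmox node the directory would be /etc/pve/qemu-server.
# Snapshot sections start with a "[name]" line and repeat the memory key,
# so everything after the first "[" line in each file is ignored.
sum_vm_memory_mb() {
    dir=$1
    set -- "$dir"/*.conf
    [ -e "$1" ] || { echo 0; return 0; }   # no config files at all
    awk '
        FNR == 1                        { in_snapshot = 0 }
        /^\[/                           { in_snapshot = 1 }
        !in_snapshot && $1 == "memory:" { sum += $2 }
        END                             { print sum + 0 }
    ' "$@"
}
```

Run sum_vm_memory_mb /etc/pve/qemu-server and compare the result against the total column of free -m. If the sum is larger than what the host has (minus your reserve for Proxmox itself), you are relying on ballooning or swap to save you.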

IO Threads: The Checkbox Nobody Touches That Actually Matters

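In the Proxmox GUI, when you create a VM disk, there’s a checkbox for “IO Thread”. On older versions, and on VMs created a while ago, it’s usually off. Most people never touch it.

Without it, disk I/O for all of a VM’s disks is handled in QEMU’s main event loop, alongside everything else QEMU does. With IO Thread enabled, each disk gets its own dedicated thread, which reduces latency and improves throughput, especially under concurrent load. For SCSI disks this requires the VirtIO SCSI single controller (scsihw: virtio-scsi-single); VirtIO Block disks support it directly. This is the low-hanging fruit that most people miss.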

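Enable it on an existing disk (for SCSI disks the controller must be VirtIO SCSI single):

qm set 102 --scsihw virtio-scsi-single --scsi0 local-lvm:vm-102-disk-1,iothread=1

Or edit /etc/pve/qemu-server/102.conf and change the disk line from:

scsi0: local-lvm:vm-102-disk-1

to:

scsi0: local-lvm:vm-102-disk-1,iothread=1

Then do a full shutdown and start. A reboot from inside the guest isn’t enough, because the QEMU process has to be recreated:

qm shutdown 102 && qm start 102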

For VMs handling any real I/O (databases, media services, backup targets), iothread should be ON. The overhead is negligible and the gains are real.
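If you have more than a couple of VMs, checking each disk by eye gets old. Here’s a small sketch that flags disks without iothread=1, plus a SCSI controller that isn’t virtio-scsi-single (the controller type Proxmox requires for IO threads on SCSI disks). It assumes the key: value format of qm config; check_iothread is a made-up name.

```shell
# Hypothetical helper: reads `qm config <vmid>` output on stdin and prints
# one line per finding. No output means nothing to fix.
check_iothread() {
    awk '
        $1 == "scsihw:" { scsihw = $2 }
        /^(scsi|virtio)[0-9]+:/ && !/media=cdrom/ {
            key = $1; sub(/:$/, "", key)
            if ($0 !~ /iothread=1/) print key ": iothread not enabled"
            if (key ~ /^scsi/) has_scsi = 1
        }
        END {
            if (has_scsi && scsihw != "virtio-scsi-single")
                print "scsihw: expected virtio-scsi-single for per-disk IO threads"
        }
    '
}
```

Usage on a host: qm config 102 | check_iothread.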

Storage: dir vs. LVM-thin vs. ZFS

This isn’t a “which is objectively best” question. It’s a tradeoff matrix.

dir (directory)

Proxmox just stores disk image files (.qcow2, .raw, or .vmdk) in a directory on an ext4/XFS filesystem. Pros: simple, no LVM layer, images are plain files you can copy around, and NFS shares work the same file-based way. Cons: snapshots only work with qcow2 images (not raw), qcow2 adds some overhead of its own, and images fragment over time.

Use dir if: you’re on file-based storage such as an NFS share from a TrueNAS box, or you’ve got a simple SSD and want plain image files.

LVM-thin

LVM-thin creates thin-provisioned logical volumes. You allocate a 100 GB volume but only use 20 GB; the other 80 is “virtual” until you write to it. Pros: snapshots work beautifully, allocate-on-demand is efficient. Cons: if you don’t monitor it, you can overcommit and run the pool out of space, at which point guest writes fail and VMs stall or end up with damaged filesystems.

Use LVM-thin if: you want snapshots and live migrations, and you’re willing to monitor pool usage.

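Check your thin pool usage:

lvs -o lv_name,lv_size,data_percent,metadata_percent pve

(pve is the volume group a default install creates.) If Data% or Meta% approaches 80-90%, either grow the pool or delete some snapshots.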
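Since the whole risk with LVM-thin is not noticing, the check is worth automating. A minimal sketch that filters lvs output for anything at or above a threshold; it assumes two whitespace-separated columns (name, data percent) and a dot as the decimal separator. thin_usage_alert is an illustrative name.

```shell
# Hypothetical helper: reads `lvs --noheadings -o lv_name,data_percent`
# output on stdin and prints every volume whose data usage is at or above
# the given threshold (in percent). Volumes without a usage figure
# (regular, non-thin LVs) are skipped.
thin_usage_alert() {
    awk -v limit="$1" 'NF >= 2 && $2 + 0 >= limit + 0 { print $1 " " $2 "%" }'
}
```

From cron or a monitoring script: lvs --noheadings -o lv_name,data_percent pve | thin_usage_alert 80. Any output is your cue to grow the pool or clean up.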

ZFS

ZFS gives you snapshots, compression, copy-on-write semantics, and checksumming that detects silent corruption (and repairs it when the pool has redundancy). Cons: higher CPU usage for compression, more RAM needed for ARC caching, steeper learning curve.

Use ZFS if: you’ve got fast NVMe, plenty of RAM (16+ GB), and you want sophisticated storage features (replication, compression, checksumming).

For a home lab that’s “good enough”, LVM-thin is the sweet spot. You get snapshots, migrations, and no hidden complexity.

Kernel Tuning: The Actual Values to Use

Generic advice like “tune vm.swappiness” is useless without numbers. Here’s what to actually set on a Proxmox host running real workloads:

cat >> /etc/sysctl.conf << 'EOF'
vm.swappiness=10
vm.nr_hugepages=256
net.core.rmem_max=134217728
net.core.wmem_max=134217728
net.ipv4.tcp_rmem=4096 87380 67108864
net.ipv4.tcp_wmem=4096 65536 67108864
EOF
sysctl -p

What these do:

vm.swappiness=10: Tells the kernel to prefer keeping application memory in RAM over swapping it out. Default is 60. Lower = less swap. A value of 10 is a common choice on a host with plenty of RAM.
vm.nr_hugepages=256: Pre-allocates 256 huge pages (512 MB with 2 MB pages on x86-64). That memory is reserved whether or not anything uses it, and a VM only uses huge pages if its config sets the hugepages option, so leave this line out unless you’ve done that. Verify with: grep Huge /proc/meminfo
net.core.rmem_max / wmem_max: Upper limits for socket buffer sizes. 128 MB is a generous max for high-throughput workloads. The default is about 208 KB on most kernels, which is small for fast links.
tcp_rmem / tcp_wmem: Minimum, default, and maximum TCP buffer size per connection, in bytes. The 64 MB maximum is sized for 10 Gbps networks. If you’re on 1 Gbps, these are overkill but harmless.
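Those byte counts are easier to sanity-check once converted. A quick sketch of the unit arithmetic; both function names are made up, and the 2048 KB huge page size is the usual x86-64 default (the Hugepagesize line in /proc/meminfo shows yours).

```shell
# Convert a byte count to MiB.
bytes_to_mib() { echo $(( $1 / 1024 / 1024 )); }

# Total MiB reserved by N huge pages of a given size in KiB.
hugepages_total_mib() { echo $(( $1 * $2 / 1024 )); }

bytes_to_mib 134217728         # prints 128 (rmem_max / wmem_max)
bytes_to_mib 67108864          # prints 64  (tcp_rmem / tcp_wmem maximum)
hugepages_total_mib 256 2048   # prints 512 (vm.nr_hugepages=256)
```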

Verify they took:

sysctl vm.swappiness net.core.rmem_max

Output:

vm.swappiness = 10
net.core.rmem_max = 134217728

Apply these once and leave them. They’re reasonable starting values for a high-performance home lab setup; if a particular number matters to you, benchmark before and after.

Putting It Together: A Real Example

You’ve got a database VM (ID 105) running Postgres. It’s slow under concurrent query load.

Step 1: Switch to virtio disks (already covered above). ✓

Step 2: Enable IO threads:

qm set 105 --scsihw virtio-scsi-single --scsi0 local-lvm:vm-105-disk-1,iothread=1

Step 3: Give it fixed, adequate memory:

qm set 105 --memory 16384 --balloon 0

Step 4: Pin vCPUs if you have spare cores (let’s say you’ve got 16 cores total, and this is the only high-load VM):

Edit /etc/pve/qemu-server/105.conf and add:

affinity: 0,1,2,3

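(This restricts the VM to cores 0-3. To actually keep 4-15 for everything else, give the other VMs an affinity of 4-15; otherwise they can still be scheduled on 0-3.)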

Step 5: Inside the VM, configure Postgres with reasonable memory parameters:

shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
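work_mem = 32MB

(Assumes a 16 GB VM. Adjust down proportionally if smaller. Keep work_mem modest: it applies per sort or hash operation, per connection, so a large value multiplied by many concurrent queries can exhaust the VM’s RAM.)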
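“Adjust proportionally” can be spelled out. The 16 GB example uses a quarter of the VM’s memory for shared_buffers and three quarters for effective_cache_size, so a sketch that scales those two values looks like this. pg_memory_settings is a hypothetical helper, and the ratios are just the ones implied by the example, not a Postgres rule.

```shell
# Print the two size-dependent settings for a VM with the given RAM in MB,
# using the same ratios as the 16 GB example: 1/4 and 3/4 of VM memory.
pg_memory_settings() {
    vm_mb=$1
    echo "shared_buffers = $(( vm_mb / 4 ))MB"
    echo "effective_cache_size = $(( vm_mb * 3 / 4 ))MB"
}

pg_memory_settings 16384
# prints:
#   shared_buffers = 4096MB
#   effective_cache_size = 12288MB
```

For an 8 GB VM, pg_memory_settings 8192 gives 2048MB and 6144MB.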

Step 6: Ensure host kernel tuning is applied (as above).

Restart the VM, run your workload, and measure against the numbers you had before. If the disk bus or memory pressure was the bottleneck, latency should drop noticeably.

The Honest Truth

Proxmox is fast by default for light loads. But if you’re running actual workloads on it — not just test VMs — these tuning steps are table stakes. Virtio drivers, IO threads, fixed memory, and kernel parameters buy you tens of percent of performance for minimal effort.

CPU pinning and NUMA tuning are probably unnecessary unless you’ve genuinely got sustained, latency-sensitive workloads fighting over cores.

And if you find yourself constantly tweaking, step back: maybe your hardware is undersized for what you’re trying to do. Optimization only goes so far. Sometimes you just need more cores or faster disks.

Start with the easy wins (virtio, iothread, kernel settings). See where you stand. Pin and tune in detail only if you’re still bottlenecked. Your future 2 AM self will appreciate the stability.
