
GPU Passthrough on Proxmox: Run LLMs in a VM

By SumGuy 11 min read

Your 3090 Deserves Better Than Being Ignored by a Hypervisor

You spent a non-trivial amount of money on that GPU. Maybe a 3090, maybe a 4090, maybe something you grabbed used at a price that made your spouse give you the look. Either way, it’s sitting in your Proxmox box, and right now your LLM workloads are running on the host directly — or worse, on CPU — because getting GPU passthrough to actually work feels like debugging someone else’s cursed bash script at 2 AM.

Here’s the thing: GPU passthrough on Proxmox is absolutely worth doing. You get GPU isolation per-VM, you can run your Ollama or vLLM stack inside a clean Linux VM without contaminating the host, and if you have two GPUs you can even pass one to a Windows gaming VM and keep the other for inference. The catch is the process is annoying in very specific ways, and the NVIDIA driver situation makes everything worse by design.

This guide gets you from a working Proxmox host to nvidia-smi returning clean output inside your VM. Let’s do this.


Step 1: IOMMU in GRUB — Enable It or Nothing Else Matters

GPU passthrough relies on IOMMU (Input-Output Memory Management Unit) support in the CPU and chipset. Intel calls it VT-d, AMD calls it AMD-Vi. Both work. Neither is on by default.

Edit your GRUB config:

/etc/default/grub
# Intel
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# AMD
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

The iommu=pt flag enables passthrough mode — it tells the kernel to only use IOMMU for devices that need it, which reduces overhead for everything else. Some guides skip this. Don’t skip it.

Terminal window
update-grub
reboot

After reboot, verify it took:

Terminal window
dmesg | grep -e IOMMU -e iommu | head -20

You want to see something like:

[ 0.000000] DMAR: IOMMU enabled
[ 0.275000] pci 0000:00:00.0: Adding to iommu group 0

If you see nothing, your CPU or BIOS doesn’t have VT-d/AMD-Vi enabled. Go into BIOS and enable it before continuing.
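Before digging through BIOS menus, it's worth confirming the flags actually reached the running kernel. Proxmox installs that boot through systemd-boot (typically ZFS on UEFI) read their flags from /etc/kernel/cmdline and need proxmox-boot-tool refresh rather than update-grub, so an edit to /etc/default/grub can silently do nothing. The sketch below is one way to check; the function name check_iommu_flags is made up for this example.

```shell
# Check a kernel command line for the IOMMU flags this guide relies on.
# $1: file holding the command line (defaults to /proc/cmdline).
check_iommu_flags() {
    local cmdline_file="${1:-/proc/cmdline}"
    local cmdline status=0
    cmdline="$(cat "$cmdline_file")" || return 2

    # Pad with spaces so each flag is matched as a whole word.
    case " $cmdline " in
        *" intel_iommu=on "*|*" amd_iommu=on "*) echo "vendor flag: present" ;;
        *) echo "vendor flag: missing"; status=1 ;;
    esac
    case " $cmdline " in
        *" iommu=pt "*) echo "iommu=pt: present" ;;
        *) echo "iommu=pt: missing"; status=1 ;;
    esac
    return "$status"
}
```

Run check_iommu_flags with no arguments on the host after the reboot. A non-zero exit status means at least one flag never made it onto the command line.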


Step 2: Find Your IOMMU Groups

IOMMU groups determine what devices get passed through together. Ideally your GPU and its audio device land in their own group with nothing else. In practice, especially on consumer Intel boards, you’ll find your GPU sharing a group with other PCIe devices. That’s the pcie_acs_override situation we’ll address later.

Run this to see your groups:

Terminal window
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"
done | sort -V

You’re looking for output like this for your GPU:

IOMMU Group 14 0000:09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
IOMMU Group 14 0000:09:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)

Write down the PCI address (09:00.0 and 09:00.1) and the vendor:device IDs (10de:2204 and 10de:1aef). You’ll need both.
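If you'd rather not copy the IDs by hand, a small filter can pull them out of the listing. This is a sketch, not a required step: gpu_ids_for_slot is a hypothetical helper name, and it assumes lspci -nn style lines on stdin, where the only bracketed xxxx:xxxx token on each line is the vendor:device pair.

```shell
# Print the vendor:device IDs for every function of one PCI slot as a
# comma-separated list, reading `lspci -nn`-style lines from stdin.
# $1: bus:device prefix, e.g. "09:00".
gpu_ids_for_slot() {
    local slot="$1"
    # Keep lines for "<slot>.<function>", then extract the [xxxx:xxxx] token.
    grep -E "(^|[ :])${slot}\.[0-7] " \
        | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | tr -d '[]' \
        | paste -sd, -
}
```

On the host, lspci -nn | gpu_ids_for_slot 09:00 should print a string you can paste straight after ids= in the vfio config in Step 4.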

If your GPU shares a group with, say, a PCIe NVMe drive or your primary storage controller, that’s a problem. The ACS override patch can help, but it has security implications — more on that at the end.
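To see at a glance whether anything else lives in the GPU's group, you can filter the group listing from earlier in this step. The helper below is an illustrative sketch (group_neighbours is an invented name) and assumes input lines shaped like "IOMMU Group N address description".

```shell
# Given the "IOMMU Group N <address> ..." listing on stdin and a bus:device
# prefix such as "09:00", print every device that shares a group with that
# slot but is not one of its own functions. Empty output means a clean group.
group_neighbours() {
    awk -v slot="$1" '
        { group[NR] = $3; addr[NR] = $4; line[NR] = $0 }
        index($4, slot ".") > 0 { wanted[$3] = 1 }
        END {
            for (i = 1; i <= NR; i++)
                if ((group[i] in wanted) && index(addr[i], slot ".") == 0)
                    print line[i]
        }'
}
```

Pipe the output of the group-listing loop into group_neighbours 09:00. Anything it prints would have to be passed through along with the GPU, or separated some other way.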


Step 3: Blacklist Host GPU Drivers

You need the host Proxmox system to not touch the GPU. At all. Before the VM grabs it, before anything loads. This means blacklisting nouveau (the open-source NVIDIA driver) and the proprietary nvidia driver if installed.

Terminal window
cat > /etc/modprobe.d/blacklist-nvidia.conf << 'EOF'
blacklist nouveau
blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset
blacklist snd_hda_intel
EOF

Wait, why snd_hda_intel? Despite the name, snd_hda_intel is the generic HD Audio driver, and it is the one that binds to the NVIDIA GPU's HDMI audio function. If you're not using audio on the host, blacklist it. Otherwise it'll grab the audio function and the GPU's IOMMU group won't be fully available for passthrough.

If you need the host's onboard audio, leave snd_hda_intel out of the blacklist and let vfio-pci claim the GPU's audio function by ID instead; adding a softdep snd_hda_intel pre: vfio-pci line to the vfio config in Step 4 makes sure vfio-pci loads first. For most Proxmox-on-bare-metal setups, you don't need any audio on the host at all.


Step 4: Bind vfio-pci at Boot

vfio-pci is the kernel module that holds a device “hostage” so a hypervisor (in this case KVM via Proxmox) can pass it to a guest. You need to bind it to your GPU before any other driver claims it.

Create the vfio configuration:

/etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:2204,10de:1aef
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidia_drm pre: vfio-pci

Replace 10de:2204,10de:1aef with your actual vendor:device IDs from Step 2.

Then update initramfs and modules:

Terminal window
echo "vfio" >> /etc/modules
echo "vfio_iommu_type1" >> /etc/modules
echo "vfio_pci" >> /etc/modules
# Only needed on kernels older than 6.2; newer kernels fold vfio_virqfd into vfio
echo "vfio_virqfd" >> /etc/modules
update-initramfs -u -k all
reboot

After reboot, confirm vfio grabbed the device:

Terminal window
lspci -nnk -d 10de:2204

Expected output:

09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:1492]
Kernel driver in use: vfio-pci
Kernel modules: nouveau, nvidia_drm, nvidia

The key line is Kernel driver in use: vfio-pci. If it says nouveau or nvidia, something went wrong — re-check the blacklist and initramfs update.
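The audio function needs the same treatment as the VGA function, and it's easy to check one and forget the other. As a rough sketch, the filter below reads lspci -nnk output and fails unless every device in it is bound to vfio-pci; all_bound_to_vfio is a name invented for this example.

```shell
# Read `lspci -nnk` output on stdin and fail if any device in it reports a
# "Kernel driver in use" other than vfio-pci (or reports none at all).
all_bound_to_vfio() {
    awk '
        /^[0-9a-f]/ { devices++ }    # device header lines start with the address
        /Kernel driver in use:/ {
            if ($NF == "vfio-pci") bound++
            else printf "unexpected driver: %s\n", $NF
        }
        END { exit !(devices > 0 && bound == devices) }'
}
```

Something like lspci -nnk -s 09:00 | all_bound_to_vfio && echo "GPU is held by vfio-pci" covers both functions of the card in one go.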


Step 5: VM Configuration — Q35, OVMF, and the Args Lines

This is where most guides fall apart. The Proxmox web UI is fine for getting the basics in, but GPU passthrough needs some manual config file editing.

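Create your VM via the Proxmox web UI first. Pick q35 as the machine type, OVMF (UEFI) as the BIOS with an EFI disk, and host as the CPU type; these match the config shown further down.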

Then add your PCI device through the UI: Hardware → Add → PCI Device. Select your GPU, enable “All Functions”, enable “ROM-Bar”, and if you’re doing a primary GPU passthrough enable “Primary GPU”.

The resulting config at /etc/pve/qemu-server/100.conf should look something like this (VM ID 100 here, adjust for yours):

/etc/pve/qemu-server/100.conf
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 8
cpu: host
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:09:00,pcie=1,rombar=1,x-vga=1
machine: q35
memory: 32768
name: llm-vm
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0
numa: 1
ostype: l26
scsi0: local-lvm:vm-100-disk-1,discard=on,iothread=1,size=200G
scsihw: virtio-scsi-single
sockets: 1
vga: none

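The critical lines are bios: ovmf and machine: q35 (UEFI firmware plus a chipset model with native PCIe), cpu: host (the guest sees the real CPU rather than an emulated model), hostpci0 (the GPU itself, exposed as a PCIe device by pcie=1 and marked as the primary GPU by x-vga=1), and vga: none (no emulated display adapter competing with the real card).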
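A quick lint can catch a config that is missing one of those settings before you try to boot it. This is only a sketch: lint_passthrough_conf is a hypothetical helper, and the four patterns are the settings from the example config above, not an official Proxmox checklist.

```shell
# Read a Proxmox VM config on stdin and report which passthrough-related
# settings from this guide are missing. Returns non-zero if any are.
lint_passthrough_conf() {
    local conf pattern missing=0
    conf="$(cat)"
    for pattern in '^bios: ovmf$' '^machine: (pc-)?q35' '^cpu: host' \
                   '^hostpci[0-9]+: .*pcie=1'; do
        if ! printf '%s\n' "$conf" | grep -Eq "$pattern"; then
            echo "missing: $pattern"
            missing=1
        fi
    done
    return "$missing"
}
```

On the host, qm config 100 | lint_passthrough_conf prints nothing and exits 0 when all four settings are present.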


The Code 43 Problem (Consumer NVIDIA Cards)

Here’s where NVIDIA has historically played games: NVIDIA consumer cards (GeForce line) used to detect when they were running inside a VM and return error Code 43, disabling themselves. This was NVIDIA’s way of pushing people toward Quadro/Tesla cards for VM use.

The workaround used to be hiding the hypervisor from the guest by adding to your VM config:

/etc/pve/qemu-server/100.conf
args: -cpu 'host,kvm=off,hv_vendor_id=proxmoxKVM'

The good news: starting with driver 465, NVIDIA removed the Code 43 behavior for most consumer cards. If you're running a recent Ubuntu 22.04/24.04 guest with a current driver branch such as 535 or 550, you probably don't need kvm=off anymore.

The bad news: Some older driver versions, some specific card SKUs, and some edge cases still hit it. If your guest shows an error code 43 in Device Manager (Windows) or the card doesn’t initialize in Linux, add the args line above as your first troubleshooting step.

For Linux guests running Ollama or vLLM, you typically won’t hit Code 43 at all — it was primarily a Windows/GeForce gaming restriction.
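If you're not sure which side of that line a guest falls on, comparing the driver's major version against 465 is enough. A minimal sketch, with predates_vm_support as an invented name:

```shell
# Succeed (exit 0) when the given NVIDIA driver version predates 465, the
# release cited above as dropping the Code 43 check for VMs.
# $1: driver version string such as "550.54.15".
predates_vm_support() {
    local major="${1%%.*}"
    [ "$major" -lt 465 ]
}
```

Inside a Linux guest, modinfo -F version nvidia prints the installed driver version even when the card failed to initialize, so predates_vm_support "$(modinfo -F version nvidia)" && echo "try the args line" is one way to wire it up.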


Resizable BAR Drama

Resizable BAR (ReBAR) lets the CPU access the full GPU VRAM directly instead of through a 256MB window. It’s a performance feature, and modern GPUs support it. Inside a VM, it can cause problems.

If your VM fails to start with errors about BAR size, one workaround is to hand QEMU a dump of the card's VBIOS through romfile (Proxmox resolves the file name relative to /usr/share/kvm/):

/etc/pve/qemu-server/100.conf
hostpci0: 0000:09:00,pcie=1,rombar=1,x-vga=1,romfile=gpu.rom

Or, simpler: disable ReBAR in your BIOS for the slot your GPU lives in. For LLM workloads, the ReBAR performance gain is marginal compared to the passthrough headache it causes.


NUMA Pinning for Serious Performance

If your system has multiple NUMA nodes (common on EPYC, Threadripper, or dual-Xeon boards), make sure your VM’s vCPUs and memory are on the same NUMA node as the GPU. A mismatch tanks throughput because every GPU DMA operation crosses an interconnect.

Check your NUMA topology:

Terminal window
lscpu | grep -i numa

And find which NUMA node your GPU lives on:

Terminal window
cat /sys/bus/pci/devices/0000:09:00.0/numa_node

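If it returns 1, your GPU is on NUMA node 1. In Proxmox, bind the VM's memory to that node and pin its vCPUs to that node's host cores (8-15 in this example). Note that cpus= in the numa0 line lists guest vCPU IDs (0-7 for an 8-core VM), while hostnodes= and affinity refer to the host:

/etc/pve/qemu-server/100.conf
numa: 1
numa0: cpus=0-7,hostnodes=1,memory=32768,policy=bind
affinity: 8-15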

For a single-socket consumer box (AM5, LGA1700), you have one NUMA node and can ignore all of this. Lucky you.
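For the multi-node case, the two sysfs lookups can be combined so you get the host core range for the GPU's node directly. A sketch under stated assumptions: cpus_near_device is a hypothetical name, and the optional second argument exists only so the function can be pointed at a different sysfs root.

```shell
# Print the host CPU list that is local to a PCI device.
# $1: PCI address (e.g. 0000:09:00.0); $2: sysfs root (default /sys).
cpus_near_device() {
    local addr="$1" sysfs="${2:-/sys}"
    local node
    node="$(cat "$sysfs/bus/pci/devices/$addr/numa_node")" || return 1
    # -1 means no NUMA affinity is reported (typical single-socket box).
    if [ "$node" -lt 0 ]; then
        node=0
    fi
    cat "$sysfs/devices/system/node/node$node/cpulist"
}
```

cpus_near_device 0000:09:00.0 prints something like a core range for that node, which is the set of host cores you want the VM's vCPUs to run on.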


Verifying It Works Inside the Guest

Boot your VM. If you’re using a headless Linux guest (which is the right call for Ollama/vLLM anyway), SSH in and install the NVIDIA driver normally:

Terminal window
# Ubuntu guest, run as root (Debian packages the driver as nvidia-driver instead)
apt install nvidia-driver-550 nvidia-utils-550
reboot

After reboot:

Terminal window
nvidia-smi

Expected output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:06:10.0 Off | N/A |
| 30% 42C P8 25W / 350W | 0MiB / 24576MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+

If you see your GPU listed: you’re done. Go install Ollama:

Terminal window
curl -fsSL https://ollama.com/install.sh | sh
ollama run llama3.2

And watch the GPU memory climb as the model loads. Genuinely satisfying.
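If you want a number rather than a table to watch, nvidia-smi can emit memory usage as CSV, and a one-line filter turns that into a percentage. A small sketch; vram_percent is an invented helper name.

```shell
# Convert "used, total" lines (MiB, as printed by nvidia-smi in CSV mode
# without header or units) into a whole-number percentage per GPU.
vram_percent() {
    awk -F', *' '{ printf "%d\n", ($1 * 100) / $2 }'
}
```

Inside the guest, nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | vram_percent prints one line per GPU; wrap it in watch -n1 while the model loads.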


pcie_acs_override — When Your IOMMU Groups Are a Mess

If your GPU is stuck in a group with other devices you can’t pass through and can’t remove, the ACS (Access Control Services) override patch forces the kernel to split IOMMU groups. Proxmox ships with it available via a kernel boot flag:

/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"

Use this only if you need it. The ACS override weakens isolation between devices — it’s the reason IOMMU groups exist in the first place. On a home lab machine that isn’t running untrusted VMs with direct hardware access, it’s fine. On a production multi-tenant server, it’s not something you want to explain to a security auditor.

If you’re on a modern X570/B550 or Z690/Z790 board with a decent PCIe topology, you probably don’t need it. The ACS override is mostly a problem on older boards and budget chipsets that dump everything into one or two groups.


Single GPU vs Two GPU: The Decision

Here’s the honest breakdown:

Single GPU (passthrough only)

Two GPU (one for host/gaming, one for VMs)

For a dedicated LLM server where you SSH in and don’t need a desktop: single GPU is fine. Run Proxmox headless, pass the GPU to your Linux VM permanently, and forget the host has a display. The VM runs Ollama/vLLM, you hit it over the network, done.

For a dual-use box where you also want to game: two GPUs is the right answer. Don’t fight the single-GPU situation for gaming — it works, but every time you switch you’re rebooting VMs and rebinding drivers, and that gets old faster than you’d think.


When Things Go Wrong

Quick reference for the most common failure modes:

Your 2 AM self will appreciate having this list bookmarked.

