GPU Passthrough on Proxmox: Run LLMs in a VM

Your 3090 Deserves Better Than Being Ignored by a Hypervisor

You spent a non-trivial amount of money on that GPU. Maybe a 3090, maybe a 4090, maybe something you grabbed used at a price that made your spouse give you the look. Either way, it’s sitting in your Proxmox box, and right now your LLM workloads are running on the host directly, or worse, on CPU, because getting GPU passthrough to actually work feels like debugging someone else’s cursed bash script at 2 AM.

GPU passthrough on Proxmox is absolutely worth doing. You get GPU isolation per-VM, you can run your Ollama or vLLM stack inside a clean Linux VM without contaminating the host, and if you have two GPUs you can even pass one to a Windows gaming VM and keep the other for inference. The catch is the process is annoying in very specific ways, and the NVIDIA driver situation makes everything worse by design.

This guide gets you from a working Proxmox host to nvidia-smi returning clean output inside your VM. Let’s do this.

Step 1: IOMMU in GRUB, Enable It or Nothing Else Matters

GPU passthrough relies on IOMMU (Input-Output Memory Management Unit) support in the CPU and chipset. Intel calls it VT-d; AMD calls it AMD-Vi. Both work, but don’t count on either being enabled out of the box: check the BIOS setting and the kernel command line.

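Edit GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. (Hosts that boot through systemd-boot, such as UEFI installs with ZFS on root, keep the command line in /etc/kernel/cmdline and apply it with proxmox-boot-tool refresh instead of update-grub.)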

# Intel
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

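# AMD (amd_iommu=on is widely copied but is not a documented kernel value; iommu=pt is the part that matters)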
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

The iommu=pt flag enables passthrough mode: devices that stay with the host get a 1:1 identity mapping and skip DMA translation, so only the devices you hand to a guest pay the IOMMU cost. Some guides skip this. Don’t skip it.

update-grub
reboot

After reboot, verify it took:

dmesg | grep -e IOMMU -e iommu | head -20

You want to see something like:

[    0.000000] DMAR: IOMMU enabled
[    0.275000] pci 0000:00:00.0: Adding to iommu group 0

If you see nothing, VT-d/AMD-Vi is most likely disabled in the BIOS, or the kernel flags never made it onto the command line. Go into BIOS and enable it before continuing.
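Before rebooting into the BIOS, it can help to confirm the flags actually reached the running kernel. Here is a small sketch for that; the function name check_iommu_cmdline is made up for this example, and it only inspects the command line, not whether the hardware honoured it.

```bash
# Report which IOMMU-related parameters appear on a kernel command line.
# Reads /proc/cmdline by default; pass a file path to check saved output.
check_iommu_cmdline() {
  local file="${1:-/proc/cmdline}" hits
  # One parameter per line, then keep only the IOMMU ones.
  hits=$(tr -s ' ' '\n' < "$file" | grep -E '^(intel_iommu|amd_iommu|iommu)=' || true)
  if [ -z "$hits" ]; then
    echo "no IOMMU parameters on the command line"
    return 1
  fi
  printf '%s\n' "$hits" | sed 's/^/found: /'
}
```

Run check_iommu_cmdline with no arguments on the host. If it reports nothing after you edited GRUB, the usual culprit is a forgotten update-grub.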

Step 2: Find Your IOMMU Groups

IOMMU groups determine what devices get passed through together. Ideally your GPU and its audio device land in their own group with nothing else. In practice, especially on consumer Intel boards, you’ll find your GPU sharing a group with other PCIe devices. That’s the pcie_acs_override situation we’ll address later.

Run this to see your groups:

for d in /sys/kernel/iommu_groups/*/devices/*; do
  n=${d#*/iommu_groups/*}; n=${n%%/*}
  printf 'IOMMU Group %s ' "$n"
  lspci -nns "${d##*/}"
done | sort -V

You’re looking for output like this for your GPU:

IOMMU Group 14 0000:09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
IOMMU Group 14 0000:09:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)

Write down the PCI address (09:00.0 and 09:00.1) and the vendor:device IDs (10de:2204 and 10de:1aef). You’ll need both.

If your GPU shares a group with, say, a PCIe NVMe drive or your primary storage controller, that’s a problem. The ACS override patch can help, but it has security implications; more on that at the end.
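On a box with dozens of groups, eyeballing the listing gets old. The sketch below filters it for you: feed it the output of the loop above and the GPU’s slot, and it prints any device that shares the GPU’s group without being one of the GPU’s own functions. The name group_strangers is invented for this example, and it assumes the listing format shown above (group number in the third field, PCI address in the fourth).

```bash
# List devices that share an IOMMU group with the GPU but are not
# one of its own functions.
# stdin: lines like "IOMMU Group 14 0000:09:00.0 VGA compatible controller ..."
# $1:    GPU slot as it appears in the listing, minus the function (e.g. 0000:09:00)
group_strangers() {
  awk -v slot="$1" '
    { group[NR] = $3; addr[NR] = $4; line[NR] = $0 }
    index($4, slot ".") == 1 { gpu_group = $3 }   # remember the GPU group
    END {
      for (i = 1; i <= NR; i++)
        if (group[i] == gpu_group && index(addr[i], slot ".") != 1)
          print line[i]
    }'
}
```

Empty output means the GPU and its audio function have the group to themselves. Anything printed is a device that would have to go to the VM along with the GPU.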

Step 3: Blacklist Host GPU Drivers

You need the host Proxmox system to not touch the GPU. At all. Before the VM grabs it, before anything loads. This means blacklisting nouveau (the open-source NVIDIA driver) and the proprietary nvidia driver if installed.

cat > /etc/modprobe.d/blacklist-nvidia.conf << 'EOF'
blacklist nouveau
blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset
blacklist snd_hda_intel
EOF

Wait, why snd_hda_intel? Despite the name, it is the generic HD Audio driver, and it is what claims the NVIDIA GPU’s HDMI audio function (09:00.1 above). If the host grabs that function, the IOMMU group can’t be handed to the VM as a whole. If you’re not using audio on the host, blacklist it.

If you need the host’s onboard audio, be more surgical: leave snd_hda_intel off the blacklist and let vfio-pci claim the GPU’s audio function by ID first (Step 4), adding a softdep snd_hda_intel pre: vfio-pci line so the load order is guaranteed. For most Proxmox-on-bare-metal setups, you don’t need any audio on the host at all.

Step 4: Bind vfio-pci at Boot

vfio-pci is the kernel module that holds a device “hostage” so a hypervisor (in this case KVM via Proxmox) can pass it to a guest. You need to bind it to your GPU before any other driver claims it.

Create the vfio configuration in a file under /etc/modprobe.d/, for example /etc/modprobe.d/vfio.conf:

options vfio-pci ids=10de:2204,10de:1aef
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidia_drm pre: vfio-pci

Replace 10de:2204,10de:1aef with your actual vendor:device IDs from Step 2.
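If you would rather generate that file than hand-edit it, a sketch like the following works. The function name make_vfio_conf is made up here, and the softdep list simply mirrors the three drivers used above.

```bash
# Print a vfio.conf body for the given vendor:device IDs.
make_vfio_conf() {
  local id ids drv
  # Reject anything that is not a vvvv:dddd hex pair (e.g. a PCI address).
  for id in "$@"; do
    printf '%s\n' "$id" | grep -qE '^[0-9a-fA-F]{4}:[0-9a-fA-F]{4}$' \
      || { echo "not a vendor:device ID: $id" >&2; return 1; }
  done
  ids=$(IFS=,; echo "$*")          # join the arguments with commas
  printf 'options vfio-pci ids=%s\n' "$ids"
  for drv in nouveau nvidia nvidia_drm; do
    printf 'softdep %s pre: vfio-pci\n' "$drv"
  done
}
```

Something like make_vfio_conf 10de:2204 10de:1aef > /etc/modprobe.d/vfio.conf writes the file. The validation step is mostly there to catch the classic mistake of pasting the PCI address (09:00.0) where the ID belongs.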

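Then load the vfio modules at boot and rebuild the initramfs:

echo "vfio" >> /etc/modules
echo "vfio_iommu_type1" >> /etc/modules
echo "vfio_pci" >> /etc/modules
# Only needed on kernels older than 6.2; newer kernels fold vfio_virqfd into vfio
echo "vfio_virqfd" >> /etc/modules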

update-initramfs -u -k all
reboot
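One wrinkle with the echo >> approach: run it twice and /etc/modules ends up with duplicate lines. A minimal idempotent variant might look like this (ensure_modules is a name invented for the example):

```bash
# Append each module name to a modules file only if it is not already listed.
# $1: file to edit (e.g. /etc/modules); remaining arguments: module names
ensure_modules() {
  local file="$1" mod
  shift
  touch "$file"
  for mod in "$@"; do
    # -x matches whole lines, so "vfio" does not count as "vfio_pci"
    grep -qxF "$mod" "$file" || echo "$mod" >> "$file"
  done
}
```

Calling ensure_modules /etc/modules vfio vfio_iommu_type1 vfio_pci is then safe to repeat. You still need the update-initramfs run afterwards.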

After reboot, confirm vfio grabbed the device:

lspci -nnk -d 10de:2204

Expected output:

09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:1492]
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau, nvidia_drm, nvidia

The key line is Kernel driver in use: vfio-pci. If it says nouveau or nvidia, something went wrong; re-check the blacklist and the initramfs update.
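If you want that check in a script (say, before starting the VM from a cron job or hook), here is a rough sketch that reads lspci -nnk output and succeeds only when every reported driver is vfio-pci. The function name all_bound_to_vfio is hypothetical, and a device with no driver line at all is not detected.

```bash
# Succeed only if every "Kernel driver in use" line on stdin says vfio-pci.
# Fails when no driver line is present at all.
all_bound_to_vfio() {
  local drivers
  drivers=$(sed -n 's/^[[:space:]]*Kernel driver in use: //p')
  [ -n "$drivers" ] || return 1
  # grep -v finds any line that is NOT exactly vfio-pci; invert that result.
  ! printf '%s\n' "$drivers" | grep -qvx 'vfio-pci'
}
```

Usage would be along the lines of lspci -nnk -d 10de: | all_bound_to_vfio && echo "GPU is ready for passthrough", where 10de: matches every NVIDIA function on the host.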

Step 5: VM Configuration, Q35, OVMF, and the Args Lines

This is where most guides fall apart. The Proxmox web UI is fine for getting the basics in, but GPU passthrough needs some manual config file editing.

Create your VM via the Proxmox web UI first. Pick:

Machine type: Q35 (not i440fx; Q35 gives the guest a real PCIe topology, which the pcie=1 option requires)
BIOS: OVMF (UEFI), the safe choice for GPU passthrough on consumer cards
CPU: Host (you want the real CPU to show up in the guest for performance)

Then add your PCI device through the UI: Hardware → Add → PCI Device. Select your GPU, enable “All Functions”, enable “ROM-Bar”, and if you’re doing a primary GPU passthrough enable “Primary GPU”.

The resulting config at /etc/pve/qemu-server/100.conf should look something like this (VM ID 100 here, adjust for yours):

agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 8
cpu: host
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:09:00,pcie=1,rombar=1,x-vga=1
machine: q35
memory: 32768
name: llm-vm
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0
numa: 1
ostype: l26
scsi0: local-lvm:vm-100-disk-1,discard=on,iothread=1,size=200G
scsihw: virtio-scsi-single
sockets: 1
vga: none

The critical lines:

hostpci0: 0000:09:00,pcie=1,rombar=1,x-vga=1: passes the GPU. Leaving the function suffix off the address (09:00 rather than 09:00.0) is how the “All Functions” checkbox is stored, and it covers the audio device automatically. pcie=1 enables PCIe semantics instead of PCI. x-vga=1 means this is the primary VGA device.
vga: none: this goes with x-vga=1. You’re telling Proxmox the passed-through GPU is your only display output. For a headless inference VM you can leave x-vga off and keep the default virtual display, so the Proxmox console still works.
numa: 1: important for multi-socket or high-core-count systems (more on this shortly).
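After a round of hand-editing, a quick sanity check on the config file can save a failed boot. This is only a sketch: check_vm_conf is a made-up helper, it looks at the four settings this guide leans on, and it does not understand snapshot sections in the file.

```bash
# Check a Proxmox VM config for the settings this walkthrough relies on.
# Prints one line per missing setting; returns non-zero if any is missing.
check_vm_conf() {
  local conf="$1" rc=0
  grep -qE '^machine:.*q35' "$conf"          || { echo "missing: machine: q35"; rc=1; }
  grep -qE '^bios:[[:space:]]*ovmf' "$conf"  || { echo "missing: bios: ovmf"; rc=1; }
  grep -qE '^cpu:[[:space:]]*host' "$conf"   || { echo "missing: cpu: host"; rc=1; }
  grep -qE '^hostpci[0-9]+:.*pcie=1' "$conf" || { echo "missing: hostpciN entry with pcie=1"; rc=1; }
  return "$rc"
}
```

Point it at your own file, for example check_vm_conf /etc/pve/qemu-server/100.conf. Silence means all four settings are present.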

The Code 43 Problem (Consumer NVIDIA Cards)

Here’s where NVIDIA has historically played games: NVIDIA consumer cards (GeForce line) used to detect when they were running inside a VM and return error Code 43, disabling themselves. This was NVIDIA’s way of pushing people toward Quadro/Tesla cards for VM use.

The workaround used to be hiding the hypervisor from the guest by adding to your VM config:

args: -cpu 'host,kvm=off,hv_vendor_id=proxmoxKVM'

The good news: Starting with NVIDIA driver 465, this Code 43 behavior was removed for most consumer cards. If you’re running a recent Ubuntu 22.04/24.04 with a 535-series or 550-series driver inside the guest, you probably don’t need kvm=off anymore.

The bad news: Some older driver versions, some specific card SKUs, and some edge cases still hit it. If your guest shows an error code 43 in Device Manager (Windows) or the card doesn’t initialize in Linux, add the args line above as your first troubleshooting step.

For Linux guests running Ollama or vLLM, you typically won’t hit Code 43 at all; it was primarily a Windows/GeForce gaming restriction.

Resizable BAR Drama

Resizable BAR (ReBAR) lets the CPU access the full GPU VRAM directly instead of through a 256MB window. It’s a performance feature, and modern GPUs support it. Inside a VM, it can cause problems.

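If your VM fails to start with errors about BAR size, one thing to try is handing QEMU a dumped copy of the card’s vBIOS through romfile (the file name is relative to /usr/share/kvm/, and gpu.rom here stands in for your own dump):

hostpci0: 0000:09:00,pcie=1,rombar=1,x-vga=1,romfile=gpu.rom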

Or, simpler: disable ReBAR in your BIOS for the slot your GPU lives in. For LLM workloads, the ReBAR performance gain is marginal compared to the passthrough headache it causes.
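To see whether ReBAR is actually in play, look at the BAR sizes the host reports for the card. The sketch below pulls them out of lspci -vv output; bar_sizes is a name made up for this example. As a rough guide, a 256M BAR1 typically means ReBAR is off, while a BAR1 in the tens of gigabytes means it is on.

```bash
# Print "BARn size" for every Region line in `lspci -vv` output on stdin.
bar_sizes() {
  sed -n 's/^[[:space:]]*Region \([0-9]\): .*\[size=\([0-9]*[KMGT]\{0,1\}\)].*/BAR\1 \2/p'
}
```

Run it as lspci -vv -s 09:00.0 | bar_sizes on the host (as root, so lspci can read the full config space), once before and once after flipping the BIOS setting.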

NUMA Pinning for Serious Performance

If your system has multiple NUMA nodes (common on EPYC, Threadripper, or dual-Xeon boards), make sure your VM’s vCPUs and memory are on the same NUMA node as the GPU. A mismatch tanks throughput because every GPU DMA operation crosses an interconnect.

Check your NUMA topology (lscpu lists the node count and which host CPUs belong to each node):

lscpu | grep -i numa

And find which NUMA node your GPU lives on:

cat /sys/bus/pci/devices/0000:09:00.0/numa_node

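If it returns 1, your GPU is on NUMA node 1 (a value of -1 means the platform reports no NUMA affinity, which is normal on single-node systems). In Proxmox, bind the VM’s memory to that node and restrict its vCPU threads to that node’s host cores:

numa: 1
affinity: 8-15
numa0: cpus=0-7,hostnodes=1,memory=32768,policy=bind

In numa0, cpus lists guest vCPU indices (0-7 for the 8 cores configured earlier), while affinity lists host cores. The 8-15 range is an example; use the cores your host reports for that node.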

For a single-socket consumer box (AM5, LGA1700), you have one NUMA node and can ignore all of this. Lucky you.
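For the multi-node case, here is a sketch that prints the pinning lines from a few inputs so you don’t have to work out the ranges by hand. It assumes Proxmox’s numa0 and affinity options (affinity needs a reasonably recent Proxmox VE release), and numa_pin_lines is a name invented for this example.

```bash
# Print Proxmox config lines that keep a VM on one host NUMA node.
# $1: host NUMA node   $2: host CPU list for that node (e.g. 8-15)
# $3: guest vCPU count $4: guest memory in MiB
numa_pin_lines() {
  local node="$1" host_cpus="$2" vcpus="$3" mem="$4"
  printf 'numa: 1\n'
  # affinity takes host core numbers
  printf 'affinity: %s\n' "$host_cpus"
  # numa0 cpus= takes guest vCPU indices, which start at 0
  printf 'numa0: cpus=0-%d,hostnodes=%s,memory=%s,policy=bind\n' \
    "$((vcpus - 1))" "$node" "$mem"
}
```

On the host, numa_pin_lines 1 "$(cat /sys/devices/system/node/node1/cpulist)" 8 32768 fills in the real core list for node 1. Paste the result into the VM config rather than appending blindly, since numa: 1 may already be there.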

Verifying It Works Inside the Guest

Boot your VM. If you’re using a headless Linux guest (which is the right call for Ollama/vLLM anyway), SSH in and install the NVIDIA driver normally:

# Ubuntu guest (package names differ on Debian)
apt update
apt install nvidia-driver-550 nvidia-utils-550
reboot

After reboot:

nvidia-smi

Expected output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:06:10.0 Off |                  N/A |
| 30%   42C    P8              25W / 350W |       0MiB / 24576MiB  |      0%      Default |
+-----------------------------------------+------------------------+----------------------+

If you see your GPU listed: you’re done. Go install Ollama:

curl -fsSL https://ollama.com/install.sh | sh
ollama run gemma3

And watch the GPU memory climb as the model loads. Genuinely satisfying.
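If you’d rather have a one-line readout than the full nvidia-smi table, the CSV query mode is handy. The sketch below formats one line of that output; vram_summary is a made-up name, and it expects exactly the three queried fields in the order shown.

```bash
# Summarize lines produced by:
#   nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv,noheader,nounits
# Each input line looks like: "NVIDIA GeForce RTX 3090, 6144, 24576"
vram_summary() {
  awk -F', ' '{ printf "%s: %d/%d MiB (%.0f%%)\n", $1, $2, $3, 100 * $2 / $3 }'
}
```

Pipe the query into it, or wrap the whole thing in watch -n 2 while a model loads to see the number climb.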

pcie_acs_override, When Your IOMMU Groups Are a Mess

If your GPU is stuck in a group with other devices you can’t pass through and can’t remove, the ACS (Access Control Services) override patch forces the kernel to split IOMMU groups. Proxmox ships with it available via a kernel boot flag:

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"

Use this only if you need it. The ACS override weakens isolation between devices, and that isolation is the reason IOMMU groups exist in the first place. On a home lab machine that isn’t running untrusted VMs with direct hardware access, it’s fine. On a production multi-tenant server, it’s not something you want to explain to a security auditor.

If you’re on a modern X570/B550 or Z690/Z790 board with a decent PCIe topology, you probably don’t need it. The ACS override is mostly a problem on older boards and budget chipsets that dump everything into one or two groups.
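If you do end up needing it, this is a sketch of adding the flag to the GRUB file without clobbering what is already there. The helper name add_kernel_flag is invented for this example, and the "already present" test is a plain substring match on the parameter name, which is fine for something as distinctive as pcie_acs_override but too loose for short names like iommu.

```bash
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT unless its name
# already appears on that line.
# $1: grub defaults file (e.g. /etc/default/grub)  $2: parameter to add
add_kernel_flag() {
  local file="$1" flag="$2" tmp
  grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${flag%%=*}" "$file" && return 0
  tmp=$(mktemp) || return 1
  # Insert the flag just before the closing quote of the existing value.
  sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${flag}\"|" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

After add_kernel_flag /etc/default/grub pcie_acs_override=downstream,multifunction, run update-grub and reboot, then re-run the group listing from Step 2 to see whether the split actually helped.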

Single GPU vs Two GPU: The Decision

Here’s the honest breakdown:

Single GPU (passthrough only)

Host has no display output once the VM claims the GPU
You lose the GPU on VM shutdown: it doesn’t auto-rebind to the host
Workarounds exist (hook scripts to rebind, virtual displays) but they’re fragile
Best when: dedicated inference box, no gaming VM needed, headless operation

Two GPU (one for host/gaming, one for VMs)

GPU 1: stays on host or goes to Windows gaming VM
GPU 2: passed through to Linux inference VM
Clean separation, both VMs can run simultaneously
Best when: you want to game on Windows and run inference concurrently
Cost: you need two PCIe x16 slots and two GPUs

For a dedicated LLM server where you SSH in and don’t need a desktop: single GPU is fine. Run Proxmox headless, pass the GPU to your Linux VM permanently, and forget the host has a display. The VM runs Ollama/vLLM, you hit it over the network, done.

For a dual-use box where you also want to game: two GPUs is the right answer. Don’t fight the single-GPU situation for gaming. It works, but every time you switch you’re rebooting VMs and rebinding drivers, and that gets old faster than you’d think.

When Things Go Wrong

Quick reference for the most common failure modes:

VM won’t start, “rombar” error: Try removing rombar=1 from the hostpci line, or dump the vBIOS from the host before binding vfio and pass it as romfile.
Code 43 in Windows guest: Add args: -cpu 'host,kvm=off,hv_vendor_id=proxmoxKVM' to the VM config.
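nvidia-smi works but CUDA apps fail: Check that the CUDA version your app or container was built against is no newer than the “CUDA Version” nvidia-smi reports for the driver. For containers, also make sure the NVIDIA Container Toolkit is installed in the guest.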
GPU not showing in VM, lspci empty: vfio-pci didn’t bind on the host. Check lspci -nnk and verify initramfs was rebuilt.
VM hangs on boot with GPU: Often a ReBAR issue. Disable it in BIOS or pass romfile explicitly.

Your 2 AM self will appreciate having this list bookmarked.
