Your 3090 Deserves Better Than Being Ignored by a Hypervisor
You spent a non-trivial amount of money on that GPU. Maybe a 3090, maybe a 4090, maybe something you grabbed used at a price that made your spouse give you the look. Either way, it’s sitting in your Proxmox box, and right now your LLM workloads are running on the host directly — or worse, on CPU — because getting GPU passthrough to actually work feels like debugging someone else’s cursed bash script at 2 AM.
Here’s the thing: GPU passthrough on Proxmox is absolutely worth doing. You get GPU isolation per-VM, you can run your Ollama or vLLM stack inside a clean Linux VM without contaminating the host, and if you have two GPUs you can even pass one to a Windows gaming VM and keep the other for inference. The catch is the process is annoying in very specific ways, and the NVIDIA driver situation makes everything worse by design.
This guide gets you from a working Proxmox host to nvidia-smi returning clean output inside your VM. Let’s do this.
Step 1: IOMMU in GRUB — Enable It or Nothing Else Matters
GPU passthrough relies on IOMMU (Input-Output Memory Management Unit) support in the CPU and chipset. Intel calls it VT-d, AMD calls it AMD-Vi. Both work. Neither is on by default.
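Edit your GRUB config in /etc/default/grub:
# Intel
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# AMD
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
The iommu=pt flag enables passthrough mode: it tells the kernel to only use IOMMU translation for devices that need it, which reduces overhead for everything else. Some guides skip this. Don’t skip it. On AMD, the kernel turns the IOMMU on by itself whenever the firmware exposes it, so amd_iommu=on is redundant there, though harmless.
Apply the change and reboot. If your host boots via systemd-boot rather than GRUB (typical for ZFS-on-UEFI installs), put the same flags in /etc/kernel/cmdline and run proxmox-boot-tool refresh instead of update-grub.
update-grub
reboot
After reboot, verify it took:
dmesg | grep -e IOMMU -e iommu | head -20
You want to see something like:
[    0.000000] DMAR: IOMMU enabled
[    0.275000] pci 0000:00:00.0: Adding to iommu group 0
If you see nothing, VT-d/AMD-Vi is either disabled in the BIOS or not supported by your CPU. Go into the BIOS and enable it before continuing.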
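If dmesg has already scrolled past the boot messages, checking the running kernel's command line is a reasonable second opinion. The helper below is a minimal sketch; check_iommu_cmdline is a made-up name, and it only reports whether the flags from this step are present, not whether the hardware honored them.

```shell
# Report which IOMMU-related kernel parameters appear in a command line.
# The function name is hypothetical; the parameters are the ones from Step 1.
check_iommu_cmdline() {
    local cmdline=" $1 "   # pad so every parameter is space-delimited
    local param
    for param in intel_iommu=on amd_iommu=on iommu=pt; do
        case "$cmdline" in
            *" $param "*) echo "present: $param" ;;
            *)            echo "missing: $param" ;;
        esac
    done
}
```

On the host you would call it as check_iommu_cmdline "$(cat /proc/cmdline)". Seeing one of the vendor flags reported as missing is expected, since you only set the one for your CPU.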
Step 2: Find Your IOMMU Groups
IOMMU groups determine what devices get passed through together. Ideally your GPU and its audio device land in their own group with nothing else. In practice, especially on consumer Intel boards, you’ll find your GPU sharing a group with other PCIe devices. That’s the pcie_acs_override situation we’ll address later.
Run this to see your groups:
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}
    n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"
done | sort -V
You’re looking for output like this for your GPU:
IOMMU Group 14 0000:09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
IOMMU Group 14 0000:09:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
Write down the PCI address (09:00.0 and 09:00.1) and the vendor:device IDs (10de:2204 and 10de:1aef). You’ll need both.
If your GPU shares a group with, say, a PCIe NVMe drive or your primary storage controller, that’s a problem. The ACS override patch can help, but it has security implications — more on that at the end.
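On a box with dozens of groups, it can help to filter the listing down to just the GPU's neighbors. This is a rough sketch that assumes the "IOMMU Group N address description" line format produced by the loop in this step; group_mates is a hypothetical helper name, and the address has to be passed exactly as it appears in that listing.

```shell
# Print every device that shares an IOMMU group with the given PCI address.
# Reads "IOMMU Group <n> <address> <description>" lines on stdin.
# Returns non-zero if the address does not appear in the input.
group_mates() {
    awk -v addr="$1" '
        $1 == "IOMMU" && $2 == "Group" {
            group[NR] = $3; line[NR] = $0
            if ($4 == addr) target = $3
        }
        END {
            if (target == "") exit 1          # address not found
            for (i = 1; i <= NR; i++)
                if (i in group && group[i] == target) print line[i]
        }
    '
}
```

Pipe the loop's output into group_mates followed by your GPU's address. If anything other than the GPU and its audio function comes back, that is the shared-group problem described above.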
Step 3: Blacklist Host GPU Drivers
You need the host Proxmox system to not touch the GPU. At all. Before the VM grabs it, before anything loads. This means blacklisting nouveau (the open-source NVIDIA driver) and the proprietary nvidia driver if installed.
cat > /etc/modprobe.d/blacklist-nvidia.conf << 'EOF'
blacklist nouveau
blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset
blacklist snd_hda_intel
EOF
Wait, why snd_hda_intel? Because the GPU’s HDMI audio function is a standard HD Audio controller, and snd_hda_intel is the kernel driver that claims those, NVIDIA’s included. If you’re not using audio on the host, blacklist it. Otherwise it’ll grab the audio function, and the GPU’s IOMMU group can’t be handed over as a whole.
If you need the host’s actual onboard audio, be more surgical: leave snd_hda_intel off the blacklist and rely on Step 4 instead, where vfio-pci claims the GPU’s audio function by its vendor:device ID. Adding softdep snd_hda_intel pre: vfio-pci to the vfio config there makes sure vfio-pci gets to the device first. For most Proxmox-on-bare-metal setups, you don’t need any audio on the host at all.
Step 4: Bind vfio-pci at Boot
vfio-pci is the kernel module that holds a device “hostage” so a hypervisor (in this case KVM via Proxmox) can pass it to a guest. You need to bind it to your GPU before any other driver claims it.
Create the vfio configuration in a modprobe file such as /etc/modprobe.d/vfio.conf (any .conf name in that directory works):
options vfio-pci ids=10de:2204,10de:1aef
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidia_drm pre: vfio-pci
Replace 10de:2204,10de:1aef with your actual vendor:device IDs from Step 2.
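A typo in the ID list is an easy way to lose an evening, because modprobe won't complain and vfio-pci simply never matches. The sketch below generates the same four lines from an ID list and refuses anything that doesn't look like vvvv:dddd. The function name make_vfio_conf is invented for this example, and the output file name is up to you.

```shell
# Emit vfio modprobe config for a comma-separated vendor:device ID list.
# Each ID must be four hex digits, a colon, and four hex digits.
make_vfio_conf() {
    local ids="$1" id drv
    [ -n "$ids" ] || { echo "no IDs given" >&2; return 1; }
    for id in $(printf '%s' "$ids" | tr ',' ' '); do
        printf '%s\n' "$id" | grep -Eqx '[0-9a-fA-F]{4}:[0-9a-fA-F]{4}' || {
            echo "invalid vendor:device ID: $id" >&2
            return 1
        }
    done
    printf 'options vfio-pci ids=%s\n' "$ids"
    # Make sure vfio-pci loads before any driver that could claim the GPU.
    for drv in nouveau nvidia nvidia_drm; do
        printf 'softdep %s pre: vfio-pci\n' "$drv"
    done
}
```

Redirect the output to a file under /etc/modprobe.d/, for example make_vfio_conf 10de:2204,10de:1aef > /etc/modprobe.d/vfio.conf, and read it back before rebuilding the initramfs.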
Then add the vfio modules to /etc/modules and rebuild the initramfs:
echo "vfio" >> /etc/modules
echo "vfio_iommu_type1" >> /etc/modules
echo "vfio_pci" >> /etc/modules
echo "vfio_virqfd" >> /etc/modules
update-initramfs -u -k all
reboot
On kernel 6.2 and newer (which includes Proxmox VE 8), vfio_virqfd has been folded into the core vfio module, so that last echo line can be dropped.
After reboot, confirm vfio grabbed the device:
lspci -nnk -d 10de:2204
Expected output:
09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:1492]
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau, nvidia_drm, nvidia
The key line is Kernel driver in use: vfio-pci. If it says nouveau or nvidia, something went wrong. Re-check the blacklist and the initramfs update.
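If you'd rather have a script check this after each reboot, the driver line is easy to pull out of lspci -nnk output. A minimal sketch follows, with driver_in_use as a hypothetical helper name. It reads the lspci text on stdin, so it can be tried against saved output too.

```shell
# Print the "Kernel driver in use" value(s) from `lspci -nnk` output on stdin.
# Prints nothing and returns 1 when no device in the input has a driver bound.
driver_in_use() {
    awk -F': ' '
        /Kernel driver in use:/ { print $2; found = 1 }
        END { exit !found }
    '
}
```

Running lspci -nnk -s 09:00 | driver_in_use should print vfio-pci twice, once for the VGA function and once for the audio function. Adjust the address to match your card.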
Step 5: VM Configuration — Q35, OVMF, and the Args Lines
This is where most guides fall apart. The Proxmox web UI is fine for getting the basics in, but GPU passthrough needs some manual config file editing.
Create your VM via the Proxmox web UI first. Pick:
- Machine type: Q35 (not i440fx — Q35 handles PCIe passthrough correctly)
- BIOS: OVMF (UEFI) — required for GPU passthrough on consumer cards
- CPU: Host (you want the real CPU to show up in the guest for performance)
Then add your PCI device through the UI: Hardware → Add → PCI Device. Select your GPU, enable “All Functions”, enable “ROM-Bar”, and if you’re doing a primary GPU passthrough enable “Primary GPU”.
The resulting config at /etc/pve/qemu-server/100.conf should look something like this (VM ID 100 here, adjust for yours):
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 8
cpu: host
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:09:00,pcie=1,rombar=1,x-vga=1
machine: q35
memory: 32768
name: llm-vm
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0
numa: 1
ostype: l26
scsi0: local-lvm:vm-100-disk-1,discard=on,iothread=1,size=200G
scsihw: virtio-scsi-single
sockets: 1
vga: none
The critical lines:
- hostpci0: 0000:09:00,pcie=1,rombar=1,x-vga=1 passes the GPU. There is no separate "all functions" key in the config file: ticking “All Functions” in the UI drops the function suffix from the address (09:00 rather than 09:00.0), and that covers the audio device automatically. pcie=1 enables PCIe semantics instead of PCI. x-vga=1 means this is the primary VGA device.
- vga: none goes with x-vga=1. You’re telling Proxmox the passed-through GPU is your only display output.
- numa: 1 is important for multi-socket or high-core-count systems (more on this shortly).
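Before the first boot attempt, a quick sanity check of the config file can catch the usual omissions. The sketch below only looks for the three things this step depends on (Q35, OVMF, and a hostpci entry with pcie=1). The name lint_passthrough_conf is made up, and it doesn't try to understand snapshot sections or anything else in the file.

```shell
# Check a Proxmox VM config (on stdin) for the settings this step relies on.
# Prints one line per problem; returns 0 only if none were found.
lint_passthrough_conf() {
    awk '
        /^machine:.*q35/   { q35 = 1 }
        /^bios:[ \t]*ovmf/ { ovmf = 1 }
        /^hostpci[0-9]+:/  { pci = 1; if ($0 ~ /pcie=1/) pcie = 1 }
        END {
            if (!q35)       { print "machine type is not q35";    bad = 1 }
            if (!ovmf)      { print "bios is not ovmf";           bad = 1 }
            if (!pci)       { print "no hostpci entry";           bad = 1 }
            else if (!pcie) { print "hostpci entry lacks pcie=1"; bad = 1 }
            exit bad
        }
    '
}
```

On the host, lint_passthrough_conf < /etc/pve/qemu-server/100.conf should print nothing for a config like the one above.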
The Code 43 Problem (Consumer NVIDIA Cards)
Here’s where NVIDIA has historically played games: NVIDIA consumer cards (GeForce line) used to detect when they were running inside a VM and return error Code 43, disabling themselves. This was NVIDIA’s way of pushing people toward Quadro/Tesla cards for VM use.
The workaround used to be hiding the hypervisor from the guest by adding to your VM config:
args: -cpu 'host,kvm=off,hv_vendor_id=proxmoxKVM'
Proxmox also exposes the same hiding through its own CPU option, cpu: host,hidden=1, which avoids a raw args line.
The good news: As of NVIDIA driver 465, this Code 43 behavior was removed for most consumer cards. If you’re running a recent Ubuntu 22.04/24.04 with a 500-series or 550-series driver inside the guest, you probably don’t need kvm=off anymore.
The bad news: Some older driver versions, some specific card SKUs, and some edge cases still hit it. If your guest shows an error code 43 in Device Manager (Windows) or the card doesn’t initialize in Linux, add the args line above as your first troubleshooting step.
For Linux guests running Ollama or vLLM, you typically won’t hit Code 43 at all — it was primarily a Windows/GeForce gaming restriction.
Resizable BAR Drama
Resizable BAR (ReBAR) lets the CPU access the full GPU VRAM directly instead of through a 256MB window. It’s a performance feature, and modern GPUs support it. Inside a VM, it can cause problems.
If your VM fails to start with errors about BAR size, add this to your VM config:
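hostpci0: 0000:09:00,pcie=1,rombar=1,x-vga=1,romfile=gpu.rom
Here gpu.rom stands for a vBIOS image you’ve dumped from the card. Proxmox resolves romfile relative to /usr/share/kvm/, so that’s where the file needs to live.
Or, simpler: disable ReBAR in your BIOS for the slot your GPU lives in. For LLM workloads, the ReBAR performance gain is marginal compared to the passthrough headache it causes.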
NUMA Pinning for Serious Performance
If your system has multiple NUMA nodes (common on EPYC, Threadripper, or dual-Xeon boards), make sure your VM’s vCPUs and memory are on the same NUMA node as the GPU. A mismatch tanks throughput because every GPU DMA operation crosses an interconnect.
Check your NUMA topology:
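lscpu | grep -i numa
And find which NUMA node your GPU lives on:
cat /sys/bus/pci/devices/0000:09:00.0/numa_node
If it returns 1, your GPU is on NUMA node 1. (A value of -1 means the platform reports no NUMA affinity for the device, which is normal on single-node systems.) In Proxmox, bind the VM’s memory to that node and pin the VM to host cores on it:
numa: 1
numa0: cpus=0-7,hostnodes=1,memory=32768,policy=bind
affinity: 8-15
Here cpus=0-7 refers to the guest’s eight vCPUs, not host cores. hostnodes=1 with policy=bind keeps the VM’s memory on host node 1, and affinity: 8-15 restricts the VM to host cores 8-15, assumed in this example to be the cores on node 1. The cpuunits option is only a scheduling weight and doesn’t pin anything.
For a single-socket consumer box (AM5, LGA1700), you have one NUMA node and can ignore all of this. Lucky you.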
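If you'd rather not count cores by hand, the host core list for a node can be derived from lscpu's parsable output. This is a small sketch built on lscpu -p=CPU,NODE; cpus_on_node is a hypothetical helper name. The result is a comma-separated host-core list of the kind Proxmox's affinity option accepts.

```shell
# Read `lscpu -p=CPU,NODE` output on stdin and print the host CPUs that
# belong to the requested NUMA node as a comma-separated list.
# Returns non-zero if no CPU is on that node.
cpus_on_node() {
    awk -F',' -v node="$1" '
        /^#/ { next }                       # skip lscpu header comments
        $2 == node { list = list (list == "" ? "" : ",") $1 }
        END { if (list == "") exit 1; print list }
    '
}
```

For example, lscpu -p=CPU,NODE | cpus_on_node "$(cat /sys/bus/pci/devices/0000:09:00.0/numa_node)" prints the cores that share a node with the GPU at that address.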
Verifying It Works Inside the Guest
Boot your VM. If you’re using a headless Linux guest (which is the right call for Ollama/vLLM anyway), SSH in and install the NVIDIA driver normally:
# Ubuntu/Debian guest
apt install nvidia-driver-550 nvidia-utils-550
reboot
After reboot:
nvidia-smi
Expected output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:06:10.0 Off |                  N/A |
| 30%   42C    P8             25W /  350W |       0MiB /  24576MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
If you see your GPU listed: you’re done. Go install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
ollama run llama3.2
And watch the GPU memory climb as the model loads. Genuinely satisfying.
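For watching that memory climb without the full table, nvidia-smi's CSV query mode is handy. The sketch below condenses its output to one line per GPU. vram_report is a made-up name, and it assumes the query fields name, memory.used and memory.total with the csv,noheader,nounits format.

```shell
# Summarise the output of
#   nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv,noheader,nounits
# (read on stdin) as "<name>: <used>/<total> MiB (<percent>%)" per GPU.
vram_report() {
    awk -F', ' '{
        pct = ($3 > 0) ? $2 * 100 / $3 : 0   # guard against a zero total
        printf "%s: %d/%d MiB (%d%%)\n", $1, $2, $3, pct
    }'
}
```

Inside the guest, pipe the nvidia-smi query above into vram_report while a model loads. Wrapping that pipeline in a loop with sleep 1 gives a cheap live view.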
pcie_acs_override — When Your IOMMU Groups Are a Mess
If your GPU is stuck in a group with other devices you can’t pass through and can’t remove, the ACS (Access Control Services) override patch forces the kernel to split IOMMU groups. Proxmox ships with it available via a kernel boot flag:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"
Use this only if you need it. The ACS override weakens the isolation between devices that IOMMU groups exist to guarantee: devices the kernel now reports as separate may still be able to reach each other with peer-to-peer DMA. On a home lab machine that isn’t running untrusted VMs with direct hardware access, it’s fine. On a production multi-tenant server, it’s not something you want to explain to a security auditor.
If you’re on a modern X570/B550 or Z690/Z790 board with a decent PCIe topology, you probably don’t need it. The ACS override is mostly a problem on older boards and budget chipsets that dump everything into one or two groups.
Single GPU vs Two GPU: The Decision
Here’s the honest breakdown:
Single GPU (passthrough only)
- Host has no display output once the VM claims the GPU
- You lose the GPU on VM shutdown — it doesn’t auto-rebind to the host
- Workarounds exist (hook scripts to rebind, virtual displays) but they’re fragile
- Best when: dedicated inference box, no gaming VM needed, headless operation
Two GPU (one for host/gaming, one for VMs)
- GPU 1: stays on host or goes to Windows gaming VM
- GPU 2: passed through to Linux inference VM
- Clean separation, both VMs can run simultaneously
- Best when: you want to game on Windows and run inference concurrently
- Cost: you need two PCIe x16 slots and two GPUs
For a dedicated LLM server where you SSH in and don’t need a desktop: single GPU is fine. Run Proxmox headless, pass the GPU to your Linux VM permanently, and forget the host has a display. The VM runs Ollama/vLLM, you hit it over the network, done.
For a dual-use box where you also want to game: two GPUs is the right answer. Don’t fight the single-GPU situation for gaming — it works, but every time you switch you’re rebooting VMs and rebinding drivers, and that gets old faster than you’d think.
When Things Go Wrong
Quick reference for the most common failure modes:
- VM won’t start, “rombar” error: rombar is on by default, so set rombar=0 on the hostpci line to turn it off, or dump the vBIOS from the host before binding vfio and pass it as romfile.
- Code 43 in Windows guest: Add args: -cpu 'host,kvm=off,hv_vendor_id=proxmoxKVM' to the VM config.
- nvidia-smi works but CUDA apps fail: Make sure you installed nvidia-utils and that the CUDA version your container/app expects is not newer than the one the driver supports (the “CUDA Version” in the nvidia-smi header).
- GPU not showing in VM, lspci empty: vfio-pci didn’t bind on the host. Check lspci -nnk and verify initramfs was rebuilt.
- VM hangs on boot with GPU: Often a ReBAR issue. Disable it in BIOS or pass romfile explicitly.
Your 2 AM self will appreciate having this list bookmarked.