
Immich Hardware Acceleration: Stop Cooking Your CPU

By SumGuy 10 min read

The 4 AM Fan Noise Problem

You set up Immich on Friday night. By Saturday morning your server sounds like a Dyson with grievances, every CPU core is pinned at 100%, and the job queue still says “Smart Search: 38,412 remaining.”

Welcome to the part of self-hosting Google Photos that nobody warns you about. Immich does two genuinely heavy things by default: it transcodes every video you upload (so the mobile app doesn’t have to deal with HEVC on a five-year-old phone), and it runs a stack of ML models — CLIP for smart search, a face detection model, a face recognition model — over every single asset.

On a pure CPU path, that’s hours per thousand photos. On hardware-accelerated paths, it’s minutes. The catch is that Immich’s hardware acceleration is split across two separate containers and two separate config knobs, and the docs assume you already know which one does what. So that’s what we’re fixing first.

Full example: Working compose snippets for QSV, VAAPI, and NVENC live at github.com/KingPin/sumguy-examples/self-hosting/immich-hardware-acceleration.

What Actually Gets Accelerated

Two different containers, two different jobs.

immich-server handles video transcoding. Anything with frames. When you upload a 4K HEVC clip from your phone, this is the container that re-encodes it to a friendlier mobile preview. Hardware acceleration here means QSV (Intel iGPU), VAAPI (Intel/AMD), NVENC (NVIDIA), or RKMPP (Rockchip).

immich-machine-learning handles ML inference. CLIP embeddings (for the “show me photos with dogs” search), face detection, face recognition. Hardware acceleration here means CUDA (NVIDIA), OpenVINO (Intel iGPU/CPU AVX), ROCm (AMD), ARM NN (Arm Mali GPUs), or RKNN (Rockchip NPU).

Notice the asymmetry: a single Intel N100 mini PC can hardware-accelerate both paths — QSV for transcoding and OpenVINO for ML — using the same iGPU and no discrete GPU at all. That’s the configuration nobody talks about, and it’s the cheapest way to make Immich behave on home lab hardware.

Path 1: Intel iGPU (the home-lab sweet spot)

If you’re on an N100, N150, an old NUC, or any Intel chip with a UHD/Iris GPU, this is your path. Cheap, low-power, and good enough.

Transcoding with QSV

Drop a hwaccel.transcoding.yml next to your main compose file. Immich ships these as separate compose fragments by design — you pick the one for your hardware and include it.

hwaccel.transcoding.yml
services:
  immich-server:
    devices:
      - /dev/dri:/dev/dri
    group_add:
      - "${VIDEO_GID:-44}"
      - "${RENDER_GID:-104}"

Then in your main docker-compose.yml, extend the server service:

docker-compose.yml
services:
  immich-server:
    image: ghcr.io/immich-app/immich-server:release
    extends:
      file: hwaccel.transcoding.yml
      service: immich-server
    # rest of your config...

Find the right group IDs once with:

Terminal window
getent group video render
# video:x:44:
# render:x:104:

Drop those into a .env next to your compose file and you’re done at the host level. In the Immich admin UI go to Administration → System Settings → Video Transcoding → Hardware Acceleration and pick QSV. Re-run the “Transcode Videos” job — you’ll see GPU utilization in intel_gpu_top instead of cores melting.
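If you'd rather not copy the numbers by hand, the lookup can be scripted. This is a minimal sketch, assuming `getent` exists on the host and that your compose files read `VIDEO_GID` and `RENDER_GID` from `.env` as in the fragment above; `gid_line` is a helper name made up for this example.

```shell
#!/bin/sh
# gid_line NAME GROUP_ENTRY
# Turns one line of `getent group` output (name:x:gid:members)
# into a NAME=gid line suitable for a .env file.
gid_line() {
  printf '%s=%s\n' "$1" "$(printf '%s\n' "$2" | cut -d: -f3)"
}

# Print both lines; review them, then append them to .env yourself.
if command -v getent >/dev/null 2>&1; then
  gid_line VIDEO_GID "$(getent group video)"
  gid_line RENDER_GID "$(getent group render)"
fi
```

An empty value after the `=` means that group doesn't exist on this host, in which case the `:-` default in the compose fragment is what ends up being used.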

ML with OpenVINO

Swap the ML container image. There’s a separate tag specifically for OpenVINO:

docker-compose.yml
  immich-machine-learning:
    image: ghcr.io/immich-app/immich-machine-learning:release-openvino
    devices:
      - /dev/dri:/dev/dri
    group_add:
      - "${RENDER_GID:-104}"
    # rest of config...

Then redeploy the container. The upstream docs list no separate toggle under Administration → System Settings → Machine Learning for this: the OpenVINO image picks its execution provider on its own once it can see /dev/dri. The container will pull its model weights on first run (a few hundred MB), then smart search and face detection run on the iGPU.

On an N100 this turned a 28,000-asset reindex from ~7 hours to ~55 minutes, going by my notes. Your mileage will vary with image resolution and model choice, but the order of magnitude is real.
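To put those N100 numbers in per-minute terms, here is the arithmetic as a small sketch. It uses only the approximate figures quoted above (28,000 assets, ~7 hours, ~55 minutes); the function names are made up for illustration.

```shell
#!/bin/sh
# rate_per_min ASSETS MINUTES: assets per minute, rounded to a whole number
rate_per_min() {
  awk -v a="$1" -v m="$2" 'BEGIN { printf "%.0f\n", a / m }'
}

# speedup BEFORE_MIN AFTER_MIN: ratio of the two durations, one decimal
speedup() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", b / a }'
}

rate_per_min 28000 420   # CPU run, ~7 hours      -> 67
rate_per_min 28000 55    # OpenVINO run, ~55 min  -> 509
speedup 420 55           #                        -> 7.6
```

Roughly 67 assets per minute on CPU against roughly 509 on the iGPU, a bit under 8x. Plug in your own library size and timings to see where you land.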

Path 2: NVIDIA GPU

If you’ve got a discrete NVIDIA card — anything from a Quadro P400 you bought used for $40 up to a 4090 you definitely don’t need — this is the bigger hammer.

Prerequisites

You need the NVIDIA Container Toolkit installed on the host:

Terminal window
# Debian/Ubuntu
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Sanity check:

Terminal window
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi

If nvidia-smi prints your card, you’re good.

Transcoding with NVENC

hwaccel.transcoding.yml
services:
  immich-server:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
                - compute
                - video

In the admin UI set transcoding hardware acceleration to NVENC. One note: consumer NVIDIA cards have historically capped concurrent NVENC sessions, at anywhere from 2 to 8 depending on driver version. The community-maintained patched driver removes that cap, but it’s not officially supported. For a single-user Immich, the stock limit is unlikely to ever bite you.

ML with CUDA

docker-compose.yml
  immich-machine-learning:
    image: ghcr.io/immich-app/immich-machine-learning:release-cuda
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu

Redeploy and the CUDA image uses the card on its own; the upstream docs list no separate admin-UI toggle for it. On a $40 used Quadro P400 (2GB, single-slot, low-profile), CLIP embedding for 10K images takes about 8 minutes. On a 3060 it’s under two. Past that you’re mostly waiting on disk.

Path 3: AMD (the “it depends” path)

AMD’s two acceleration paths in Immich are VAAPI (for transcoding, works fine on basically any Polaris-or-newer card with the kernel amdgpu driver) and ROCm (for ML, which is where things get spicy).

Transcoding with VAAPI

Identical device passthrough as Intel:

hwaccel.transcoding.yml
services:
  immich-server:
    devices:
      - /dev/dri:/dev/dri
    group_add:
      - "${VIDEO_GID:-44}"
      - "${RENDER_GID:-104}"

Pick VAAPI in the admin UI. Done. AMD’s video encode quality is generally a step behind Intel/NVIDIA for the same bitrate, but for phone-preview transcodes nobody will care.

ML with ROCm

docker-compose.yml
  immich-machine-learning:
    image: ghcr.io/immich-app/immich-machine-learning:release-rocm
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    group_add:
      - "${VIDEO_GID:-44}"
      - "${RENDER_GID:-104}"

ROCm in containers works, when it works. Driver/kernel version pairing matters a lot. If you’ve already got working ROCm on the host (you’d know, because you’d have suffered through it), this’ll just go. If you don’t, and your AMD card is anything other than a current-gen RDNA3/RDNA4, you may be happier letting ML run on the plain CPU image and saving the ROCm exorcism for another weekend.

Path 4: Rockchip (Pi-class boards with NPUs)

If you’re running Immich on an Orange Pi 5, Radxa Rock 5B, or similar RK3588-class board, there’s a Rockchip-specific ML image (release-rknn) that uses the on-chip NPU. The setup is fiddly — you need the right kernel modules and the right librknnrt.so mounted in — but once it works, smart search runs on a $150 board at speeds that embarrass much beefier x86 setups for this specific workload.

I’ve documented the exact RK3588 setup separately because it’s enough material for its own post. The short version: it’s possible, it’s worth doing if you’re already on that hardware, and it’s not the path I’d pick first if you’re starting from scratch.

Verifying It’s Actually Working

The classic mistake: configure hardware acceleration, see job throughput go up slightly, declare victory, never actually check what the GPU is doing. Don’t be that person.

Intel:

Terminal window
# Install first: sudo apt install intel-gpu-tools
sudo intel_gpu_top

You should see the Video and Render/3D engines busy when transcoding/ML is running. If everything’s at 0% during a transcode job, the container fell back to CPU — check logs.

NVIDIA:

Terminal window
watch -n 1 nvidia-smi

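You want to see the containers’ processes in the bottom table and non-zero GPU utilization. They typically show up under their own process names (ffmpeg for transcodes, a Python worker for ML) rather than as immich-server or immich-machine-learning.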
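If you want a yes/no answer instead of eyeballing the table, the check can be scripted. This is a rough sketch: `any_gpu_busy` is a helper invented for this example, and it only looks at a single sample of `utilization.gpu`, so short bursts can be missed and encoder-only work may register low.

```shell
#!/bin/sh
# any_gpu_busy: reads one utilization percentage per line on stdin
# and succeeds if at least one value is above zero.
any_gpu_busy() {
  awk '$1 + 0 > 0 { busy = 1 } END { exit busy ? 0 : 1 }'
}

# Sample once; run this while a transcode or ML job is in progress.
if command -v nvidia-smi >/dev/null 2>&1; then
  if nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | any_gpu_busy; then
    echo "GPU is doing work"
  else
    echo "GPU idle: the job may have fallen back to CPU"
  fi
fi
```

Run it a few times in a row before drawing conclusions, since one idle sample between assets proves nothing.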

AMD:

Terminal window
radeontop
# or
rocm-smi

In the Immich admin UI, the Jobs page also shows throughput. A QSV-accelerated thumbnail job on an N100 should run at hundreds of assets per minute, not single digits.

The Gotchas Nobody Warns You About

Group IDs aren’t portable. video is 44 on Debian/Ubuntu but 27 on Alpine and 39 on Fedora. render is 104 on Debian, 109 on Ubuntu 24.04 in some images, different again on Arch. Always check with getent group on the actual host and put the right number in .env. If the container can’t open /dev/dri/renderD128, it falls back to CPU silently. No error, just slow.
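Because that failure is silent, it's worth testing device access directly. A minimal sketch, meant to be run inside the container: `can_open` is a made-up helper, and the container name in the comment is an assumption, so check `docker ps` for yours.

```shell
#!/bin/sh
# can_open PATH: succeeds if the current user can read and write PATH.
can_open() {
  [ -r "$1" ] && [ -w "$1" ]
}

# Intended to be pasted into a shell inside the container, e.g.
#   docker exec -it immich_server sh
# (the container name is an assumption; yours may differ).
NODE=/dev/dri/renderD128
if can_open "$NODE"; then
  echo "render node OK: $NODE"
else
  echo "cannot open $NODE; effective groups: $(id -G)"
fi
```

If the render GID from your host doesn't appear in the printed group list, the `group_add` value in `.env` is the first thing to fix.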

Privileged Proxmox LXC. If you’re running Immich inside an LXC container on Proxmox, you need to pass the GPU device into the LXC, then into Docker. That’s two layers: device-allow lines (lxc.cgroup2.devices.allow on cgroup v2 hosts) and lxc.mount.entry lines in the LXC config, plus the normal Docker device mounts inside it. The simple route is a privileged container with GPU passthrough enabled. Unprivileged LXCs with GPU passthrough are doable but involve UID mapping for the render group, and at that point a VM is probably less pain.
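For the privileged case, the LXC side usually comes down to a few lines in the container's config file. The sketch below just prints them. It assumes a cgroup v2 host (older setups use `lxc.cgroup.devices.allow` instead), the standard DRM character-device major of 226, and the usual minors of 0 for card0 and 128 for renderD128; confirm yours with `ls -l /dev/dri` on the Proxmox host. `lxc_gpu_lines` is a name invented for this example.

```shell
#!/bin/sh
# lxc_gpu_lines MINOR...
# Prints LXC config lines that allow the given DRM device minors
# (character-device major 226) and bind-mount /dev/dri into the guest.
lxc_gpu_lines() {
  for minor in "$@"; do
    printf 'lxc.cgroup2.devices.allow: c 226:%s rwm\n' "$minor"
  done
  printf 'lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir\n'
}

# Review the output before adding it to the container's config.
lxc_gpu_lines 0 128
```

After restarting the LXC, /dev/dri should exist inside it, and the Docker device mounts from the earlier sections apply as usual.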

The ML image tag matters. release is the CPU image. release-cuda, release-openvino, release-rocm, release-rknn are the accelerated variants. People paste the upstream compose file, leave the tag as release, pass the GPU through, and then wonder why nothing’s faster. The image has to match the hardware.
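A quick way to catch that mismatch is to compare the image string from your compose file with the backend you intend to use. This is a sketch only: `tag_matches` is a hypothetical helper, and it knows just the four accelerated suffixes listed above.

```shell
#!/bin/sh
# tag_matches IMAGE BACKEND
# Succeeds if IMAGE's tag ends in "-BACKEND" (cuda, openvino, rocm, rknn),
# or, for BACKEND=cpu, if the tag carries none of those suffixes.
tag_matches() {
  tag=${1##*:}
  case "$2" in
    cpu)
      case "$tag" in
        *-cuda|*-openvino|*-rocm|*-rknn) return 1 ;;
        *) return 0 ;;
      esac ;;
    *)
      case "$tag" in
        *-"$2") return 0 ;;
        *) return 1 ;;
      esac ;;
  esac
}

IMAGE="ghcr.io/immich-app/immich-machine-learning:release"
if tag_matches "$IMAGE" cuda; then
  echo "image matches backend"
else
  echo "mismatch: $IMAGE is not a cuda build"
fi
```

With the plain release tag shown here, the check reports a mismatch for cuda, which is exactly the copy-paste situation described above.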

First-run model download is on CPU. When you switch to an accelerated image, the model may get downloaded again in a format that suits the new backend. That first run will look slow even though acceleration is working, so give it a round to settle.

Mixed concurrency caps. In System Settings → Job → Concurrency you can crank up concurrent jobs. Don’t. On a small iGPU, two concurrent transcodes will fight for the same encode block and run slower than one at a time. Start at concurrency 1 for the heavy jobs (transcoding, smart search, face detection) and only raise it if intel_gpu_top / nvidia-smi show the GPU is genuinely idle between assets.

What Hardware to Actually Buy

If you’re picking a box specifically to run Immich and your library is, say, 50-100K assets:

There’s more general guidance on what to put in the rack in Home Lab Hardware Guide 2026 — it covers power, noise, and the “is it worth a used R730” question that hits every home labber eventually.

Bottom Line

Immich on stock CPU works. Immich on the right acceleration path is a different product. An iGPU you already own can do both transcoding and ML — that’s the configuration most people miss because the docs treat them as separate features. Match the ML image tag to the execution provider, get your group IDs right, watch intel_gpu_top or nvidia-smi to confirm the GPU is actually doing the work, and your fans will stop sounding like a hairdryer at 4 AM.

Your photos are still on your own disks, still yours, still indexed by something other than Google. They just got there in less than a weekend now.

