Immich Hardware Acceleration: Stop Cooking Your CPU

The 4 AM Fan Noise Problem

You set up Immich on Friday night. By Saturday morning your server sounds like a Dyson with grievances, every CPU core is pinned at 100%, and the job queue still says “Smart Search: 38,412 remaining.”

Welcome to the part of self-hosting Google Photos that nobody warns you about. Immich does two genuinely heavy things by default: it transcodes uploaded videos (so the mobile app doesn’t have to deal with HEVC on a five-year-old phone), and it runs a stack of ML models (CLIP for smart search, a face detection model, and a face recognition model) over every single asset.

On a pure CPU path, a large library means many hours of indexing. On hardware-accelerated paths, the same work finishes in a fraction of that. The catch is that Immich’s hardware acceleration is split across two separate containers and two separate config knobs, and the docs assume you already know which one does what. So that’s what we’re fixing first.

Full example: Working compose snippets for QSV, VAAPI, and NVENC live at github.com/KingPin/sumguy-examples/self-hosting/immich-hardware-acceleration.

What Actually Gets Accelerated

Two different containers, two different jobs.

immich-server handles video transcoding. Anything with frames. When you upload a 4K HEVC clip from your phone, this is the container that re-encodes it to a friendlier mobile preview. Hardware acceleration here means QSV (Intel iGPU), VAAPI (Intel/AMD), NVENC (NVIDIA), or RKMPP (Rockchip).

immich-machine-learning handles ML inference. CLIP embeddings (for the “show me photos with dogs” search), face detection, face recognition. Hardware acceleration here means CUDA (NVIDIA), OpenVINO (Intel iGPU/CPU AVX), ROCm (AMD), ARM NN (Arm Mali GPUs), or RKNN (Rockchip NPU).

Notice the asymmetry: a single Intel N100 mini PC can hardware-accelerate both paths, QSV for transcoding and OpenVINO for ML, using the same iGPU and no discrete GPU at all. That’s the configuration nobody talks about, and it’s the cheapest way to make Immich behave on home lab hardware.

Path 1: Intel iGPU (the home-lab sweet spot)

If you’re on an N100, N150, an old NUC, or any Intel chip with a UHD/Iris GPU, this is your path. Cheap, low-power, and good enough.

Transcoding with QSV

Drop a hwaccel.transcoding.yml next to your main compose file. Immich ships these as separate compose fragments by design: you pick the one for your hardware and include it.

services:
  immich-server:
    devices:
      - /dev/dri:/dev/dri
    group_add:
      - "${VIDEO_GID:-44}"
      - "${RENDER_GID:-104}"

Then in your main docker-compose.yml, extend the server service:

services:
  immich-server:
    image: ghcr.io/immich-app/immich-server:release
    extends:
      file: hwaccel.transcoding.yml
      service: immich-server
    # rest of your config...

Find the right group IDs once with:

getent group video render
# video:x:44:
# render:x:104:

Drop those into a .env next to your compose file and you’re done at the host level. In the Immich admin UI go to Administration → System Settings → Video Transcoding → Hardware Acceleration and pick QSV. Re-run the “Transcode Videos” job and you should see GPU utilization in intel_gpu_top instead of cores melting.
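If you'd rather not copy the numbers by hand, the lookup can be scripted. This is a minimal sketch, assuming a host where getent is available; gid_from_group_line and env_line_for_group are names I made up for illustration, and VIDEO_GID / RENDER_GID are the variables the compose fragment above reads.

```shell
#!/bin/sh
# Sketch: turn `getent group` output into .env entries.
# Both helper names are hypothetical, not something Immich ships.

# Print the numeric GID (third colon-separated field) of one group line,
# e.g. "render:x:104:" -> 104.
gid_from_group_line() {
  printf '%s\n' "$1" | cut -d: -f3
}

# Print "NAME_GID=<gid>" for a group, or nothing if the lookup fails.
env_line_for_group() {
  line=$(getent group "$1") || return 0
  printf '%s_GID=%s\n' \
    "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')" \
    "$(gid_from_group_line "$line")"
}

# Usage (run next to your compose file):
#   { env_line_for_group video; env_line_for_group render; } >> .env
```

Check the resulting .env before restarting the stack; if a group doesn't exist on the host, the helper writes nothing for it and the compose defaults (44 and 104) apply.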

ML with OpenVINO

Swap the ML container image. There’s a separate tag specifically for OpenVINO:

  immich-machine-learning:
    image: ghcr.io/immich-app/immich-machine-learning:release-openvino
    devices:
      - /dev/dri:/dev/dri
    group_add:
      - "${RENDER_GID:-104}"
    # rest of config...

Then in Administration → System Settings → Machine Learning flip Execution Provider to OpenVINO. The container will pull the OpenVINO-flavored model weights on first run (a few hundred MB), then smart search and face detection run on the iGPU.

On an N100 this turned a 28,000-asset reindex from ~7 hours to ~55 minutes, going by my notes. Your mileage will vary with image resolution and model choice, but the order of magnitude is real.
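To put those figures in per-minute terms, here is the back-of-the-envelope math using only the numbers above (28,000 assets, roughly 420 minutes versus 55). The inputs are approximate, so treat the results the same way; rate and speedup are just illustrative helper names.

```shell
#!/bin/sh
# Rough throughput math from the figures quoted above (approximate inputs).

# rate ASSETS MINUTES -> assets per minute, rounded to a whole number.
rate() {
  awk -v a="$1" -v m="$2" 'BEGIN { printf "%.0f\n", a / m }'
}

# speedup BEFORE_MIN AFTER_MIN -> ratio with one decimal place.
speedup() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", b / a }'
}

rate 28000 420    # CPU path: about 67 assets/min
rate 28000 55     # OpenVINO path: about 509 assets/min
speedup 420 55    # about 7.6x
```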

Path 2: NVIDIA GPU

If you’ve got a discrete NVIDIA card, anything from a Quadro P400 you bought used for $40 up to a 4090 you definitely don’t need, this is the bigger hammer.

Prerequisites

You need the NVIDIA Container Toolkit installed on the host:

# Debian/Ubuntu
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Sanity check:

docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi

If nvidia-smi prints your card, you’re good.
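If it doesn't, one thing worth checking is whether the nvidia-ctk step actually registered the runtime with Docker. A small sketch: docker info can print its runtimes as JSON, and has_nvidia_runtime (a hypothetical helper, doing a plain string match rather than real JSON parsing) looks for an "nvidia" key in it.

```shell
#!/bin/sh
# Sketch: check whether Docker lists an "nvidia" runtime.
# has_nvidia_runtime is a hypothetical helper; it reads the JSON printed by
# `docker info --format '{{json .Runtimes}}'` on stdin and does a crude
# string match on the "nvidia" key.
has_nvidia_runtime() {
  grep -q '"nvidia"[[:space:]]*:'
}

# Usage:
#   docker info --format '{{json .Runtimes}}' | has_nvidia_runtime \
#     && echo "nvidia runtime registered" \
#     || echo "nvidia runtime missing: re-run nvidia-ctk runtime configure"
```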

Transcoding with NVENC

services:
  immich-server:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
                - compute
                - video

In the admin UI set transcoding hardware acceleration to NVENC. One note: consumer NVIDIA cards historically capped concurrent NVENC sessions at 3-8 depending on generation. The community-maintained patched driver removes that cap, but it’s not officially supported. For a single-user Immich, the stock limit is never going to bite you.

ML with CUDA

  immich-machine-learning:
    image: ghcr.io/immich-app/immich-machine-learning:release-cuda
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu

Set execution provider to CUDA in the admin UI. On a $40 used Quadro P400 (2GB, single-slot), CLIP embedding for 10K images takes about 8 minutes. On a 3060 it’s under two. Past that you’re mostly waiting on disk.

Path 3: AMD (the “it depends” path)

AMD’s two acceleration paths in Immich are VAAPI (for transcoding, works fine on basically any Polaris-or-newer card with the kernel amdgpu driver) and ROCm (for ML, which is where things get spicy).

Transcoding with VAAPI

Same device passthrough as Intel:

services:
  immich-server:
    devices:
      - /dev/dri:/dev/dri
    group_add:
      - "${VIDEO_GID:-44}"
      - "${RENDER_GID:-104}"

Pick VAAPI in the admin UI. Done. AMD’s video encode quality is generally a step behind Intel/NVIDIA for the same bitrate, but for phone-preview transcodes nobody will care.

ML with ROCm

  immich-machine-learning:
    image: ghcr.io/immich-app/immich-machine-learning:release-rocm
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    group_add:
      - "${VIDEO_GID:-44}"
      - "${RENDER_GID:-104}"

ROCm in containers works, when it works. Driver/kernel version pairing matters a lot. If you’ve already got working ROCm on the host (you’d know, you’d have suffered through it), this’ll just go. If you don’t, and your AMD card is anything other than a current-gen RDNA3/RDNA4, you may be happier letting ML run on the CPU and saving the ROCm exorcism for another weekend.
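Before blaming the container, it's worth confirming the host actually exposes the device nodes the snippet above maps in. A minimal sketch: missing_devices is a hypothetical helper, and renderD128 is an assumption, since the render node number can differ on multi-GPU hosts.

```shell
#!/bin/sh
# Sketch: list which of the given device nodes are missing on this host.
# missing_devices is a hypothetical helper name. It prints one missing path
# per line and prints nothing when every path exists.
missing_devices() {
  for dev in "$@"; do
    [ -e "$dev" ] || printf '%s\n' "$dev"
  done
}

# Usage on the Docker host, before starting the ROCm image:
#   missing_devices /dev/kfd /dev/dri/renderD128
```

If /dev/kfd shows up in the output, the compute side of the amdgpu driver isn't available on the host, and no amount of compose editing will fix that.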

Path 4: Rockchip (Pi-class boards with NPUs)

If you’re running Immich on an Orange Pi 5, Radxa Rock 5B, or similar RK3588-class board, there’s a Rockchip-specific ML image (release-rknn) that uses the on-chip NPU. The setup is fiddly (you need the right kernel modules and the right librknnrt.so mounted in), but once it works, smart search runs on a $150 board at speeds that embarrass much beefier x86 setups for this specific workload.

I’ve documented the exact RK3588 setup separately because it’s enough material for its own post. The short version: it’s possible, it’s worth doing if you’re already on that hardware, and it’s not the path I’d pick first if you’re starting from scratch.

Verifying It’s Actually Working

The classic mistake: configure hardware acceleration, see job throughput go up slightly, declare victory, never actually check what the GPU is doing. Don’t be that person.

Intel:

# Install first: sudo apt install intel-gpu-tools
sudo intel_gpu_top

You should see the Video and Render/3D engines busy when transcoding/ML is running. If everything’s at 0% during a transcode job, the container fell back to CPU; check the logs.

NVIDIA:

watch -n 1 nvidia-smi

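You want to see the immich-server or immich-machine-learning processes in the bottom table (they typically appear under their process names, such as ffmpeg or python, not the container names) and non-zero GPU utilization.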

AMD:

radeontop
# or
rocm-smi

In the Immich admin UI, the Jobs page also shows throughput. A QSV-accelerated thumbnail job on an N100 should run at hundreds of assets per minute, not single digits.
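One more check that complements the host-side tools: asking ffmpeg inside the server container which acceleration methods it knows about. A sketch, assuming the container is named immich_server (as in the upstream compose file) and that ffmpeg is on its PATH; supports_hwaccel is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: check whether ffmpeg lists a given hardware acceleration method.
# supports_hwaccel is a hypothetical helper; it reads `ffmpeg -hwaccels`
# output on stdin and matches whole lines only.
supports_hwaccel() {
  grep -qx "$1"
}

# Usage (adjust the container name if yours differs):
#   docker exec immich_server ffmpeg -hide_banner -hwaccels \
#     | supports_hwaccel qsv && echo "ffmpeg knows about qsv"
```

A method appearing in that list only means ffmpeg was built with it, not that the device is reachable from the container, so treat this as a first filter and still watch the GPU tools above during a real job.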

The Gotchas Nobody Warns You About

Group IDs aren’t portable. video is 44 on Debian/Ubuntu but 39 on Fedora and 27 on Alpine. render is 104 on Debian, 109 on Ubuntu 24.04 in some images, different again on Arch. Always check with getent group on the actual host and put the right number in .env. If the container can’t open /dev/dri/renderD128, it falls back to CPU silently. No error, just slow.
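A quick way to catch a mismatch is to compare the group that owns the render node on the host with the groups the container process actually has. A minimal sketch, assuming GNU or BusyBox stat (for -c '%g'), a container named immich_server, and renderD128 as the render node; gid_in_list is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: succeed if a GID appears in a space-separated list of GIDs,
# such as the output of `id -G`. gid_in_list is a hypothetical helper.
gid_in_list() {
  for g in $2; do
    if [ "$g" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

# Usage: host-side device group versus the container's groups.
#   dev_gid=$(stat -c '%g' /dev/dri/renderD128)
#   ctr_gids=$(docker exec immich_server id -G)
#   gid_in_list "$dev_gid" "$ctr_gids" || echo "container lacks GID $dev_gid"
```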

Privileged Proxmox LXC. If you’re running Immich inside an LXC container on Proxmox, you need to pass the GPU device into the LXC, then into Docker. That means lxc.cgroup2.devices.allow (lxc.cgroup.devices.allow on older cgroup v1 hosts) and lxc.mount.entry lines in the LXC config, plus the normal Docker device mounts on top. Privileged container, GPU passthrough enabled. Unprivileged LXCs with GPU passthrough are doable but involve UID mapping for the render group, and at that point a VM is probably less pain.
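For reference, the LXC side usually looks something like the lines this sketch prints. It assumes a cgroup v2 host (where the key is spelled lxc.cgroup2.devices.allow), the standard Linux DRM major number 226, and minor 128 for renderD128; lxc_gpu_lines is a hypothetical helper, and you should confirm the numbers with ls -l /dev/dri before pasting anything into the container's config.

```shell
#!/bin/sh
# Sketch: print LXC config lines that allow and bind-mount one DRM device.
# lxc_gpu_lines is a hypothetical helper. 226 is the Linux DRM character
# device major number; the minor (128 for renderD128) is an assumption to
# confirm with `ls -l /dev/dri` on the Proxmox host.
lxc_gpu_lines() {
  minor=$1
  path=$2   # absolute device path on the Proxmox host
  printf 'lxc.cgroup2.devices.allow: c 226:%s rwm\n' "$minor"
  # The mount target is relative to the container rootfs: no leading slash.
  printf 'lxc.mount.entry: %s %s none bind,optional,create=file\n' \
    "$path" "${path#/}"
}

lxc_gpu_lines 128 /dev/dri/renderD128
```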

The ML image tag matters. release is the CPU image. release-cuda, release-openvino, release-rocm, release-rknn are the accelerated variants. People paste the upstream compose file, leave the tag as release, set the execution provider to CUDA in the UI, and then wonder why nothing’s faster. The image has to match the provider.
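That mismatch is easy to check mechanically. A minimal sketch, assuming the ML container is named immich_machine_learning (as in the upstream compose file); tag_matches_provider is a hypothetical helper that only knows the tag suffixes listed above.

```shell
#!/bin/sh
# Sketch: check that an ML image reference carries the suffix that matches
# the intended provider. tag_matches_provider is a hypothetical helper.
tag_matches_provider() {
  image=$1
  provider=$2   # cuda, openvino, rocm, rknn, or cpu
  case "$provider" in
    cpu)
      # The CPU image is the one WITHOUT an accelerator suffix.
      case "$image" in
        *-cuda|*-openvino|*-rocm|*-rknn) return 1 ;;
        *) return 0 ;;
      esac ;;
    *)
      case "$image" in
        *-"$provider") return 0 ;;
        *) return 1 ;;
      esac ;;
  esac
}

# Usage (adjust the container name if yours differs):
#   image=$(docker inspect --format '{{.Config.Image}}' immich_machine_learning)
#   tag_matches_provider "$image" cuda || echo "image $image is not a CUDA build"
```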

First-run model download is on CPU. When you flip the execution provider, the model gets re-downloaded in the new format. That first run will look slow even though acceleration is working, so give it a round to settle.

Mixed concurrency caps. In System Settings → Job → Concurrency you can crank up concurrent jobs. Don’t. On a small iGPU, two concurrent transcodes will fight for the same encode block and run slower than one at a time. Start at concurrency 1 for the heavy jobs (transcoding, smart search, face detection) and only raise it if intel_gpu_top / nvidia-smi show the GPU is genuinely idle between assets.

What Hardware to Actually Buy

If you’re picking a box specifically to run Immich and your library is, say, 50-100K assets:

Cheapest competent option: Used Intel N100 / N150 mini PC ($150-250). QSV for transcoding, OpenVINO for ML, both on the iGPU. Sips power. Will index your whole library in an evening.
One step up: A used Dell OptiPlex or HP EliteDesk Micro with an 8th-gen-or-newer Intel CPU. Same iGPU acceleration path, more cores for everything else you’re inevitably going to run on the same box.
If you already have NVIDIA: Any Pascal-or-newer card works. A used Quadro P400/P1000 is the cheapest dedicated NVENC path that fits in a small chassis. Past a 1660-class card, you’re overspending for Immich alone.
Avoid: Older Pi-class boards without NPUs (Pi 4 / Pi 5 base model). Technically functional, practically painful for ML. The Rockchip RK3588 boards are the exception because of the on-chip NPU.

There’s more general guidance on what to put in the rack in Home Lab Hardware Guide 2026. It covers power, noise, and the “is it worth a used R730” question that hits every home labber eventually.

Related Reading

Immich vs PhotoPrism: Escape Google Photos, the big picture comparison if you’re still picking
Home Lab Hardware Guide 2026: what to actually run all this on
GPU Memory Math: if you’ve got the same GPU doing Immich and an LLM

Bottom Line

Immich on stock CPU works. Immich on the right acceleration path is a different product. An iGPU you already own can do both transcoding and ML; that’s the configuration most people miss because the docs treat them as separate features. Match the ML image tag to the execution provider, get your group IDs right, watch intel_gpu_top or nvidia-smi to confirm the GPU is actually doing the work, and your fans will stop sounding like a hairdryer at 4 AM.

Your photos are still on your own hardware, still yours, still indexed by something other than Google. They just got there in less than a weekend now.
