cAdvisor + Prometheus: Per-Container Metrics Done Right

You Have 30 Containers and One Mystery Memory Hog

You’ve been watching your server slowly choke for three days. free -h says you’re at 94% memory. You do the reasonable thing: you guess. Probably Jellyfin. Maybe it’s that Postgres container you spun up for something you’ve since forgotten. Could be Nextcloud doing its hourly scan of your 2 TB photo library.

You guess wrong. It’s a tiny Uptime Kuma instance that somehow ballooned to 800 MB because of a bug you’d have caught immediately if you had per-container metrics.

This is the exact problem cAdvisor and Prometheus solve. Not “here’s a vague overview of your host”, but exactly which container, exactly how much, and exactly when it started.

Let’s wire it up properly.

What cAdvisor Actually Is

cAdvisor (Container Advisor) is a Google-maintained exporter that reads Linux cgroups and translates container resource usage into Prometheus metrics. It mounts the Docker socket and cgroup filesystem, walks your running containers, and exposes a /metrics endpoint with everything labeled by container name, image, and Docker labels.

The key metric families you’ll care about:

container_cpu_usage_seconds_total: cumulative CPU time by container
container_memory_rss: RSS (resident set size), the real memory number, not the inflated container_memory_usage_bytes, which also counts page cache
container_network_receive_bytes_total / container_network_transmit_bytes_total
container_network_receive_drop_total: packet drops, great for spotting network congestion
container_fs_reads_bytes_total / container_fs_writes_bytes_total

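cAdvisor also ships a built-in web UI on port 8080, which is fine for a quick look but not something you’ll use long-term. (The /metrics/cadvisor path you’ll see in Kubernetes guides belongs to the kubelet’s embedded cAdvisor; the standalone container serves everything at /metrics.)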
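To make the label layout concrete, here is a minimal Python sketch that reads exposition text of the kind /metrics returns and totals container_memory_rss per container. The SAMPLE text is hand-written for illustration, not captured output, and the function name rss_by_container is made up for this example. It assumes the standalone cAdvisor layout, where Docker containers carry a name label and the root cgroup does not.

```python
import re
from collections import defaultdict

# One exposition line looks like: metric_name{label="v",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<metric>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def rss_by_container(metrics_text):
    """Return {container name: RSS bytes} from cAdvisor exposition text."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue  # HELP and TYPE comment lines
        m = SAMPLE_RE.match(line)
        if not m or m.group("metric") != "container_memory_rss":
            continue
        labels = dict(LABEL_RE.findall(m.group("labels")))
        name = labels.get("name", "")
        if name:  # series without a name are not Docker containers
            totals[name] += float(m.group("value"))
    return dict(totals)


# Hand-written sample data (illustrative only)
SAMPLE = """# HELP container_memory_rss Size of RSS in bytes.
# TYPE container_memory_rss gauge
container_memory_rss{id="/"} 6.1e+09
container_memory_rss{id="/docker/aaa",image="uptime-kuma:1",name="uptime-kuma"} 8.388608e+08 1700000000000
container_memory_rss{id="/docker/bbb",image="postgres:16",name="postgres"} 1.2e+08 1700000000000
container_cpu_usage_seconds_total{cpu="total",id="/docker/bbb",image="postgres:16",name="postgres"} 42.5 1700000000000
"""

by_size = sorted(rss_by_container(SAMPLE).items(), key=lambda kv: kv[1], reverse=True)
for name, rss in by_size:
    print(f"{name}: {rss / 2**20:.0f} MiB")
```

In practice Prometheus does this parsing for you; the sketch is only meant to show what is on the wire before relabeling touches it.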

The Full Stack: Compose File

Here’s a production-usable Compose stack. This is everything: cAdvisor, Node Exporter, Prometheus, and Grafana.

services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.56.2
    container_name: cadvisor
    privileged: true
    restart: unless-stopped
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
      - /dev/disk:/dev/disk:ro
    devices:
      - /dev/kmsg
    ports:
      - "8080:8080"
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:v1.11.1
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
      - "--path.rootfs=/rootfs"
      - "--path.sysfs=/host/sys"
      - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
    ports:
      - "9100:9100"
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:v3.12.0
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=15d"
      - "--web.enable-lifecycle"
    ports:
      - "9090:9090"
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:11.6.1
    container_name: grafana
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=changeme
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"
    depends_on:
      - prometheus
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:

networks:
  monitoring:
    driver: bridge

The privileged: true on cAdvisor isn’t great from a security standpoint, but it’s required to read cgroup data on most Linux setups. On newer kernels with cgroup v2, you can get away without privileged by mounting just the cgroups you need, but honestly, this is your homelab and that rabbit hole isn’t worth chasing today.

Prometheus Config with Label Relabeling

The default cAdvisor scrape dumps every metric including some you’ll never look at. More importantly, the default label set doesn’t give you Docker Compose service names, just raw container names like myproject_web_1 or the newer myproject-web-1.

Here’s a Prometheus config that pulls in Docker labels and does some cleanup:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: "cadvisor"
    scrape_interval: 15s
    static_configs:
      - targets: ["cadvisor:8080"]
    metric_relabel_configs:
      # Drop noisy per-device filesystem metrics you don't query
      - source_labels: [__name__]
        regex: "container_fs_io_current|container_blkio_.*"
        action: drop

      # Pull the compose service name from container_label_com_docker_compose_service
      - source_labels: [container_label_com_docker_compose_service]
        target_label: compose_service

      # Pull the compose project name
      - source_labels: [container_label_com_docker_compose_project]
        target_label: compose_project

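      # Drop series for any Kubernetes-managed containers on this host (pause containers, etc.)
      - source_labels: [container_label_io_kubernetes_container_name]
        regex: ".+"
        action: drop

      # Drop series with no container name (host-level and raw cgroup data).
      # Standalone cAdvisor puts the Docker container name in the `name` label;
      # matching on a label that doesn't exist would drop every series.
      - source_labels: [name]
        regex: ""
        action: drop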

The metric_relabel_configs section runs after the scrape but before samples are written to storage, which lets you rename, drop, or create labels based on existing ones. The key ones here:

container_label_com_docker_compose_service → compose_service: This is what Docker injects automatically when you use Compose. Now your PromQL can filter by service name rather than the ugly container name.
container_label_com_docker_compose_project: The project (directory name or --project-name value). Useful when you have multiple Compose stacks on one host.
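If the relabel semantics feel opaque, this rough Python model of the two actions used above (replace, the default, and drop) may help. It is a sketch under simplifying assumptions, not Prometheus's implementation: it only supports the default "$1" replacement, it uses Python's re module rather than RE2, and apply_relabel is a name invented for this example.

```python
import re


def apply_relabel(labels, rules):
    """Run relabel rules over one series' labels.

    Returns the rewritten label dict, or None if a rule dropped the series.
    Only the `replace` (default) and `drop` actions are modeled.
    """
    labels = dict(labels)
    for rule in rules:
        # Missing source labels count as empty strings, joined with ";"
        value = ";".join(labels.get(src, "") for src in rule["source_labels"])
        # Relabel regexes are fully anchored, so fullmatch is the closest analogue
        match = re.fullmatch(rule.get("regex", "(.*)"), value, re.DOTALL)
        action = rule.get("action", "replace")
        if action == "drop":
            if match:
                return None
        elif action == "replace" and match:
            # Default replacement is the first capture group ("$1")
            new_value = match.group(1) if match.re.groups else ""
            if new_value:
                labels[rule["target_label"]] = new_value
            else:
                # An empty value is the same as the label not existing
                labels.pop(rule["target_label"], None)
    return labels


RULES = [
    {"source_labels": ["__name__"],
     "regex": "container_fs_io_current|container_blkio_.*",
     "action": "drop"},
    {"source_labels": ["container_label_com_docker_compose_service"],
     "target_label": "compose_service"},
    {"source_labels": ["container_label_com_docker_compose_project"],
     "target_label": "compose_project"},
]

series = {
    "__name__": "container_memory_rss",
    "name": "monitoring-grafana-1",
    "container_label_com_docker_compose_service": "grafana",
    "container_label_com_docker_compose_project": "monitoring",
}
print(apply_relabel(series, RULES))
```

Two details are worth noticing: a missing source label behaves like an empty string, and the regex has to match the whole value, not just part of it.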

Real PromQL Queries That Actually Matter

Open Prometheus at http://your-host:9090 and try these.

Top 5 containers by RSS memory:

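topk(5, container_memory_rss{name!=""})

That name!="" filter drops the host-level and raw cgroup series cAdvisor also exports. (Standalone cAdvisor labels Docker containers with name; the container label only exists when the metrics come from a Kubernetes kubelet.) Without the filter, you’ll see a confusing entry for your whole system at the top.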
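The same query can be run outside the web UI through Prometheus's HTTP API (GET /api/v1/query). The sketch below is a minimal client using only the standard library; the function names are invented for this example, SAMPLE_RESPONSE is hand-written data in the API's instant-vector shape, and run_query is defined but not called because it needs a reachable Prometheus.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def parse_vector(payload):
    """Turn an instant-vector response into (labels, value) pairs, largest first."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    rows = [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def run_query(base_url, promql, timeout=10):
    """Fetch and parse one instant query (requires a reachable Prometheus)."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))


# Hand-written response in the shape the API returns (illustrative values)
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"name": "postgres"}, "value": [1700000000, "120000000"]},
        {"metric": {"name": "uptime-kuma"}, "value": [1700000000, "838860800"]},
    ]},
}

for labels, value in parse_vector(SAMPLE_RESPONSE):
    print(labels.get("name", "?"), f"{value / 2**20:.0f} MiB")
```

Against a live stack you would call run_query("http://your-host:9090", ...) with the top-5 query from this section, label filter included.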

CPU hot loops (containers using more than 20% of a core):

rate(container_cpu_usage_seconds_total{name!=""}[5m]) > 0.2

This gives you per-container CPU rate as a fraction of one core. A value of 1.0 means it’s pegging a full CPU. Multiply by 100 to get percentage. Wrap it in topk(5, ...) to find your worst offenders.
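The arithmetic behind that number is just counter increase divided by elapsed time. Here is a simplified Python version; the real rate() also handles counter resets and extrapolates to the window edges, which this sketch deliberately leaves out. The sample values are made up.

```python
def simple_rate(samples):
    """Per-second increase of a counter between the first and last sample.

    samples: list of (unix_seconds, counter_value) pairs, oldest first.
    Ignores counter resets and window extrapolation, unlike PromQL's rate().
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)


# Made-up samples: 60 CPU-seconds consumed over a 5-minute (300 s) window
window = [(0, 1000.0), (150, 1031.0), (300, 1060.0)]

cores = simple_rate(window)
print(f"{cores:.2f} cores, {cores * 100:.0f}% of one core")
```

Sixty CPU-seconds over 300 wall-clock seconds works out to a fifth of a core, which is exactly the threshold the query uses.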

Network receive drops by container (non-zero only):

rate(container_network_receive_drop_total{name!=""}[5m]) > 0

If any container shows up here consistently, something is saturating the network path: either the container itself is generating too much traffic or the underlying host NIC is the bottleneck.

Memory usage as percentage of a limit (only if you set limits):

container_memory_rss{name!=""}
  / (container_spec_memory_limit_bytes{name!=""} > 0)
  * 100

This only returns useful data for containers with a memory limit set. cAdvisor typically reports a limit of 0 for containers without one, and in PromQL dividing by 0 yields +Inf rather than dropping the series, so the > 0 filter on the denominator is what removes unlimited containers from the results.
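For comparison, here is the same calculation as a small Python sketch over two plain dictionaries. It assumes a reported limit of 0 means "no limit set" and skips those containers explicitly; the function name and the byte values are illustrative.

```python
def memory_percent(rss_bytes, limit_bytes):
    """Map container name -> RSS as a percentage of its memory limit.

    Containers with no limit (missing or 0) are skipped instead of divided by zero.
    """
    result = {}
    for name, rss in rss_bytes.items():
        limit = limit_bytes.get(name, 0)
        if limit > 0:
            result[name] = rss / limit * 100
    return result


MIB = 2**20
# Illustrative values: postgres has a 512 MiB limit, uptime-kuma has none
rss = {"postgres": 256 * MIB, "uptime-kuma": 800 * MIB}
limits = {"postgres": 512 * MIB, "uptime-kuma": 0}

print(memory_percent(rss, limits))
```

Only the container with a real limit survives, which is the behavior you want from the PromQL version too.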

Disk write rate per container:

topk(5, rate(container_fs_writes_bytes_total{name!=""}[5m]))

This one surfaces your I/O heavy hitters. Great for catching runaway log writers or databases doing more work than expected.

Grafana Dashboard

Don’t build the dashboard from scratch. Use cAdvisor’s official Grafana dashboard, Dashboard ID 19792 (search Grafana.com for “cAdvisor exporter”). It’s maintained by the cAdvisor team and covers all the main container_* metrics out of the box.

To import it:

Open Grafana → Dashboards → Import
Enter 19792 in the “Import via grafana.com” field
Select your Prometheus data source
Done

The dashboard gives you a per-container memory/CPU/network overview with dropdown filters. If you did the relabeling above, add a variable for compose_service and filter by service name instead of digging through raw container names.

The High-Cardinality Warning You Actually Need to Read

Here’s the honest talk: cAdvisor generates a lot of metrics. For each container, across all label combinations, you can easily end up with 200 to 300 metric series per container. With 30 containers, that’s 6,000 to 9,000 active series before Node Exporter adds its own pile.

For a homelab Prometheus with 15-day retention, this is completely fine. A modern server with 16 GB RAM handles a series count in this range without breaking a sweat. But there are a few traps:

Ephemeral containers are the real killer. If you run containers that spin up and shut down frequently (CI runners, one-off tasks, Docker build jobs), each unique container name becomes a new label combination. After a week of ephemeral builds, you can have 50,000 dead series that Prometheus is still tracking. They’ll age out at the retention window, but they hurt query performance in the meantime.

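Fix: drop the series for short-lived containers with a metric_relabel_configs rule keyed on their name or image label, so they never reach storage. Prometheus applies retention globally, so a shorter retention window for a single job isn’t an option.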

Drop what you don’t use. The filesystem metrics per device are the noisiest. If you have 5 disk partitions and 30 containers, you get 150 label combinations just for filesystem reads. The cadvisor job above already carries this rule in its metric_relabel_configs; extend the regex with any other metric families you never query:

- source_labels: [__name__]
  regex: "container_fs_io_current|container_blkio_.*"
  action: drop
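The estimates in this section are plain multiplication, so it is easy to rerun them with your own numbers. In the sketch below, the 30 containers, 200 to 300 series per container, and 5 partitions come from the text; the churn rate of 36 short-lived containers per day is a hypothetical figure chosen to show how a week of builds can pass the 50,000 mark.

```python
def active_series(containers, series_per_container):
    """Series held for long-running containers."""
    return containers * series_per_container


def churned_series(containers_per_day, days, series_per_container):
    """Series left behind by short-lived containers, each with a unique name."""
    return containers_per_day * days * series_per_container


low = active_series(30, 200)
high = active_series(30, 300)
fs_combinations = 5 * 30  # partitions x containers, per filesystem metric

# Hypothetical churn: 36 ephemeral containers a day for a week
dead = churned_series(containers_per_day=36, days=7, series_per_container=200)

print(f"steady state: {low} to {high} series")
print(f"filesystem label combinations per metric: {fs_combinations}")
print(f"dead series after a week of churn: {dead}")
```

The steady-state number stays small; it is the churn term that grows with every day of retention.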

Tune the Scrape Interval for Your Use Case

The default 15-second scrape interval is right for general monitoring. You’ll catch most issues within a scrape cycle.

For tight CPU debugging (say you’re trying to catch a container that spikes for 3 seconds and goes quiet), drop to 5 seconds:

- job_name: "cadvisor"
  scrape_interval: 5s

This triples your ingest rate for that job, so don’t leave it there permanently. Set it tight, catch your bug, reset to 15s.

For anything less than 5 seconds, you’re fighting cAdvisor’s own collection frequency (it samples every 1 to 2 seconds internally) and diminishing returns kick in fast.
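To put a number on the ingest cost, samples per second is roughly series count divided by scrape interval. This sketch uses the upper estimate of 9,000 series from the cardinality section; the function name is made up for the example, and the figures ignore everything Prometheus does to compress samples on disk.

```python
def samples_per_second(series, scrape_interval_s):
    """Approximate ingest rate for one job: every series yields one sample per scrape."""
    return series / scrape_interval_s


SERIES = 9000  # upper estimate for 30 containers

at_15s = samples_per_second(SERIES, 15)
at_5s = samples_per_second(SERIES, 5)

print(f"15s interval: {at_15s:.0f} samples/s")
print(f" 5s interval: {at_5s:.0f} samples/s ({at_5s / at_15s:.0f}x)")
print(f"samples kept over 15 days at 15s: {at_15s * 86400 * 15:,.0f}")
```

The ratio is what matters: whatever your series count, going from 15s to 5s multiplies ingest for that job by three.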

Honest Assessment: Is cAdvisor Worth It?

cAdvisor is not the lightest option. The container itself runs ~50 to 100 MB RSS. It requires privileged access. It generates hundreds of metrics per container by default. If you have 5 containers on a low-power mini PC, you might be burning 15% of your resources on observability.

Lighter alternatives worth knowing:

Docker daemon built-in metrics: on Docker 20.10 and later, set "metrics-addr": "0.0.0.0:9323" in /etc/docker/daemon.json. You get basic engine-level metrics, not per-container breakdowns.
ctop: terminal UI, zero persistence, great for “what’s hot right now” diagnostics. Not Prometheus-compatible.
Beszel: lightweight agent-based monitoring with a slick UI. I’ve covered Beszel separately; it’s a great choice if you want something simpler that doesn’t require a full Prometheus stack.
Docker socket proxy + exporter: some setups use a Docker socket proxy to safely expose the Docker API, then use docker_state_exporter or similar to pull container states. Less data than cAdvisor, less overhead.

If your homelab has enough headroom and you want real per-container drill-down, cAdvisor + Prometheus is still the gold standard. If you’re running on an old NUC with 8 GB of RAM shared between 20 containers, weigh the overhead seriously before committing.

Putting It Together

Directory structure before you docker compose up -d:

monitoring/
  docker-compose.yml
  prometheus/
    prometheus.yml

That’s it. No extra setup, no config databases.

mkdir -p monitoring/prometheus
cd monitoring
# drop the compose file and prometheus.yml in place
docker compose up -d

Check everything came up:

docker compose ps

Verify Prometheus is scraping cAdvisor by opening http://your-host:9090/targets. You should see cadvisor (1/1 up) within 30 seconds. If it’s red, check that the monitoring network is resolving correctly and cAdvisor is actually healthy:

docker compose logs cadvisor --tail=20
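The targets page has an API twin at /api/v1/targets, which is handy if you want the same check in a script. The following is a hedged sketch: the function names are invented, SAMPLE_TARGETS is hand-written data containing only the fields the code reads, and fetch_targets is defined but not called because it needs a running Prometheus.

```python
import json
import urllib.request


def unhealthy_targets(payload, job=None):
    """List (job, scrapeUrl, lastError) for active targets whose health is not 'up'."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        target_job = target["labels"].get("job", "")
        if job is not None and target_job != job:
            continue
        if target["health"] != "up":
            problems.append((target_job, target["scrapeUrl"], target.get("lastError", "")))
    return problems


def fetch_targets(base_url, timeout=10):
    """Fetch target status from a reachable Prometheus."""
    url = base_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Hand-written response fragment (illustrative only)
SAMPLE_TARGETS = {
    "status": "success",
    "data": {"activeTargets": [
        {"labels": {"job": "cadvisor", "instance": "cadvisor:8080"},
         "scrapeUrl": "http://cadvisor:8080/metrics",
         "health": "down", "lastError": "connection refused"},
        {"labels": {"job": "node-exporter", "instance": "node-exporter:9100"},
         "scrapeUrl": "http://node-exporter:9100/metrics",
         "health": "up", "lastError": ""},
    ]},
}

for job, url, error in unhealthy_targets(SAMPLE_TARGETS):
    print(f"{job} is not up ({url}): {error}")
```

The lastError field usually tells you whether you are looking at a DNS problem on the monitoring network or a cAdvisor container that never started.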

Once targets are green, hit http://your-host:9090 and run the topk(5, container_memory_rss{name!=""}) query. You’ll immediately see which containers are eating your memory. Spoiler: it’s probably not the one you guessed.

The Bottom Line

Blindly guessing which container is misbehaving is a great way to waste an afternoon. cAdvisor gives you per-container RSS, CPU rate, network drops, and disk I/O, all queryable with PromQL, all visualizable in Grafana in about 20 minutes of setup.

The full stack is four containers (cAdvisor + Node Exporter + Prometheus + Grafana), a Compose file, and a Prometheus config with a few relabel rules to make the labels actually useful. Use Dashboard ID 19792 in Grafana and you’re done.

If you’re running anything more than a handful of containers and you don’t have per-container metrics, you’re flying blind. Your future 2 AM self, staring at a container that’s been silently leaking memory for 72 hours, will appreciate having this in place.
