cAdvisor + Prometheus: Per-Container Metrics Done Right

By SumGuy 10 min read
You Have 30 Containers and One Mystery Memory Hog

You’ve been watching your server slowly choke for three days. free -h says you’re at 94% memory. You do the reasonable thing — you guess. Probably Jellyfin. Maybe it’s that Postgres container you spun up for something you’ve since forgotten. Could be Nextcloud doing its hourly scan of your 2 TB photo library.

You guess wrong. It’s a tiny Uptime Kuma instance that somehow ballooned to 800 MB because of a bug you’d have caught immediately if you had per-container metrics.

This is the exact problem cAdvisor and Prometheus solve. Not “here’s a vague overview of your host” — here’s exactly which container, exactly how much, exactly when it started.

Let’s wire it up properly.


What cAdvisor Actually Is

cAdvisor (Container Advisor) is a Google-maintained exporter that reads Linux cgroups and translates container resource usage into Prometheus metrics. It mounts the Docker socket and cgroup filesystem, walks your running containers, and exposes a /metrics endpoint with everything labeled by container name, image, and Docker labels.

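The key metric families you’ll care about are the ones queried later in this post: container_memory_* (RSS and limits), container_cpu_* (CPU seconds consumed), container_network_* (traffic and drops), and container_fs_* (disk reads and writes).

It also ships a built-in web UI (accessible on port 8080), which is fine for a quick look but not something you’ll use long-term. The /metrics/cadvisor path you may have seen elsewhere belongs to the cAdvisor embedded in the Kubernetes kubelet; the standalone container serves /metrics.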
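To get a feel for what the /metrics endpoint returns, here is a minimal Python sketch that parses a few lines of the Prometheus text format and groups values by container. The sample lines and their values are made up for illustration, and the parser is deliberately simplified (it only handles lines that carry labels), so treat it as a reading aid rather than a full parser.

```python
import re
from collections import defaultdict

# Made-up sample in the Prometheus text format; real cAdvisor output carries more labels.
SAMPLE = '''\
# HELP container_memory_rss Size of RSS in bytes.
# TYPE container_memory_rss gauge
container_memory_rss{id="/docker/abc",image="louislam/uptime-kuma:1",name="uptime-kuma"} 8.38e+08
container_memory_rss{id="/docker/def",image="postgres:16",name="postgres"} 1.2e+08
container_cpu_usage_seconds_total{id="/docker/abc",image="louislam/uptime-kuma:1",name="uptime-kuma"} 412.5
'''

SAMPLE_LINE = re.compile(r'^(?P<metric>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)')
LABEL_PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return {metric: {container_name: value}} for lines that carry a name label."""
    out = defaultdict(dict)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comments
        m = SAMPLE_LINE.match(line)
        if not m:
            continue
        labels = dict(LABEL_PAIR.findall(m.group("labels")))
        if labels.get("name"):
            out[m.group("metric")][labels["name"]] = float(m.group("value"))
    return dict(out)


samples = parse_samples(SAMPLE)
print(samples["container_memory_rss"])
```

Every sample line is a metric name, a set of labels, and a value, and that is all Prometheus stores per scrape.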


The Full Stack: Compose File

Here’s a production-usable Compose stack. This is everything: cAdvisor, Node Exporter, Prometheus, and Grafana.

docker-compose.yml
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    privileged: true
    restart: unless-stopped
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
      - /dev/disk:/dev/disk:ro
    devices:
      - /dev/kmsg
    ports:
      - "8080:8080"
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:v1.8.1
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
      - "--path.rootfs=/rootfs"
      - "--path.sysfs=/host/sys"
      - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
    ports:
      - "9100:9100"
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:v2.52.0
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=15d"
      - "--web.enable-lifecycle"
    ports:
      - "9090:9090"
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:10.4.2
    container_name: grafana
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=changeme
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"
    depends_on:
      - prometheus
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:

networks:
  monitoring:
    driver: bridge

The privileged: true on cAdvisor isn’t great from a security standpoint, but it’s required to read cgroup data on most Linux setups. On newer kernels with cgroup v2, you can get away without privileged by mounting just the cgroups you need — but honestly, this is your homelab and that rabbit hole isn’t worth chasing today.


Prometheus Config with Label Relabeling

The default cAdvisor scrape dumps every metric including some you’ll never look at. More importantly, the default label set doesn’t give you Docker Compose service names — just raw container names like myproject_web_1 or the newer myproject-web-1.

Here’s a Prometheus config that pulls in Docker labels and does some cleanup:

prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: "cadvisor"
    scrape_interval: 15s
    static_configs:
      - targets: ["cadvisor:8080"]
    metric_relabel_configs:
      # Drop noisy per-device filesystem metrics. The action has to be "drop":
      # "keep" would discard every metric that does NOT match the regex.
      # container_fs_writes_bytes_total is left alone for the disk write query.
      - source_labels: [__name__]
        regex: "container_fs_(reads|writes)_merged_total|container_fs_io_time_.*"
        action: drop
      # Copy the compose service name from container_label_com_docker_compose_service
      - source_labels: [container_label_com_docker_compose_service]
        target_label: compose_service
      # Copy the compose project name
      - source_labels: [container_label_com_docker_compose_project]
        target_label: compose_project
      # Drop Kubernetes-managed containers (pause containers, etc.), if any
      - source_labels: [container_label_io_kubernetes_container_name]
        regex: ".+"
        action: drop
      # Drop series with an empty container name (host-level cAdvisor data).
      # Standalone cAdvisor puts the Docker container name in the "name" label.
      - source_labels: [name]
        regex: ""
        action: drop

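The metric_relabel_configs section runs after scraping, which lets you rename, drop, or create labels based on existing ones. The key ones here: the two compose rules copy cAdvisor’s long container_label_com_docker_compose_* labels into short compose_service and compose_project labels, and the final rule drops every series whose container-name label is empty, which removes the host-level data.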
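The difference between keep, drop, and the default replace action is easy to get backwards, so here is a small Python model of how a single rule is evaluated. It is a simplified sketch, not Prometheus’ own code: apply_rule is a hypothetical helper, replace is modeled only for the default regex and replacement, and the label names in the example assume standalone cAdvisor on Docker.

```python
import re


def apply_rule(labels, rule):
    """Apply one simplified relabel rule; return the labels, or None if the series is dropped."""
    # Source label values are joined with ";", and a missing label counts as "".
    value = ";".join(labels.get(name, "") for name in rule["source_labels"])
    # Prometheus regexes are fully anchored, hence fullmatch.
    matched = re.fullmatch(rule.get("regex", "(.*)"), value) is not None
    action = rule.get("action", "replace")
    if action == "keep":
        return labels if matched else None
    if action == "drop":
        return None if matched else labels
    if action == "replace" and matched and value:
        # Default regex "(.*)" with replacement "$1" copies the value as-is.
        return {**labels, rule["target_label"]: value}
    return labels


series = {"__name__": "container_memory_rss", "name": "web",
          "container_label_com_docker_compose_service": "web"}
host = {"__name__": "container_memory_rss"}  # host-level series: no name label

copy_service = {"source_labels": ["container_label_com_docker_compose_service"],
                "target_label": "compose_service"}
drop_unnamed = {"source_labels": ["name"], "regex": "", "action": "drop"}
keep_fs_only = {"source_labels": ["__name__"], "regex": "container_fs_.*", "action": "keep"}

print(apply_rule(series, copy_service))  # gains a compose_service label
print(apply_rule(host, drop_unnamed))    # dropped: the missing label matches ""
print(apply_rule(series, keep_fs_only))  # dropped: keep discards everything that does not match
```

The last call is the one to remember: a keep rule on a narrow regex silently throws away every other metric in the job.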


Real PromQL Queries That Actually Matter

Open Prometheus at http://your-host:9090 and try these.

Top 5 containers by RSS memory:

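topk(5, container_memory_rss{name!=""})

That name!="" filter drops the host-level aggregates cAdvisor also exports. Without it, you’ll see a confusing entry for your whole system at the top. Standalone cAdvisor puts the Docker container name in the name label; the container label only appears on metrics scraped through a Kubernetes kubelet.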

CPU hot loop — containers using more than 20% of a core:

rate(container_cpu_usage_seconds_total{name!=""}[5m]) > 0.2

This gives you per-container CPU rate as a fraction of one core. A value of 1.0 means it’s pegging a full CPU. Multiply by 100 to get percentage. Wrap it in topk(5, ...) to find your worst offenders.
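To see why the rate of a CPU-seconds counter reads as a fraction of a core, here is a minimal Python sketch of the arithmetic. It is a simplification of what rate() does (no extrapolation to the window edges), the function name is hypothetical, and the sample values are made up.

```python
def cpu_cores_used(samples):
    """Average cores used between the first and last (timestamp, cpu_seconds) sample.

    A counter reset (value going down, for example after a container restart)
    is handled by counting the post-reset value as new growth from zero.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


window = [(0, 100.0), (150, 130.0), (300, 160.0)]  # made-up samples over 5 minutes
cores = cpu_cores_used(window)
print(f"{cores:.2f} cores, or {cores * 100:.0f}% of one core")
```

Sixty CPU-seconds consumed over 300 wall-clock seconds is 0.2 of a core, which is exactly the threshold in the query above.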

Network receive drops by container (non-zero only):

rate(container_network_receive_drop_total{name!=""}[5m]) > 0

If any container shows up here consistently, something is saturating the network path — either the container itself is generating too much traffic or the underlying host NIC is the bottleneck.

Memory usage as percentage of a limit (only if you set limits):

container_memory_rss{name!=""}
/ (container_spec_memory_limit_bytes{name!=""} > 0)
* 100

This only returns useful data for containers with a memory limit set. Containers without limits report a limit of 0, and dividing by 0 in PromQL yields +Inf rather than dropping the series, so the > 0 filter on the denominator is what removes them from the results.

Disk write rate per container:

topk(5, rate(container_fs_writes_bytes_total{name!=""}[5m]))

This one surfaces your I/O heavy hitters. Great for catching runaway log writers or databases doing more work than expected.
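These queries don’t have to be run in the web UI. As a rough sketch, the Python below builds an instant-query URL for the Prometheus HTTP API (/api/v1/query) and sorts the result, using only the standard library. The helper names are hypothetical, the base URL is whatever your Prometheus listens on, and the name label assumes standalone cAdvisor on Docker. The canned response is made up so the parsing can be tried offline.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def top_values(payload, label="name"):
    """Turn an instant-vector response into [(label_value, float)], largest first."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    rows = [(r["metric"].get(label, ""), float(r["value"][1]))
            for r in payload["data"]["result"]]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def run_query(base_url, promql):
    """Fetch and decode a query result (needs a reachable Prometheus)."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=10) as resp:
        return top_values(json.load(resp))


# Canned response in the API's vector shape, with made-up values.
canned = {"status": "success", "data": {"resultType": "vector", "result": [
    {"metric": {"name": "postgres"}, "value": [1700000000, "120000000"]},
    {"metric": {"name": "uptime-kuma"}, "value": [1700000000, "838000000"]},
]}}
print(top_values(canned))
```

Against a live stack you would call run_query("http://your-host:9090", 'topk(5, container_memory_rss{name!=""})') instead of using the canned data.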


Grafana Dashboard

Don’t build the dashboard from scratch. Import a prebuilt cAdvisor dashboard from Grafana.com: Dashboard ID 19792 (search Grafana.com for “cAdvisor exporter”). It covers all the main container_* metrics out of the box.

To import it:

  1. Open Grafana → Dashboards → Import
  2. Enter 19792 in the “Import via grafana.com” field
  3. Select your Prometheus data source
  4. Done

The dashboard gives you a per-container memory/CPU/network overview with dropdown filters. If you did the relabeling above, add a variable for compose_service and filter by service name instead of digging through raw container names.


The High-Cardinality Warning You Actually Need to Read

Here’s the honest talk: cAdvisor generates a lot of metrics. For each container, across all label combinations, you can easily end up with 200–300 metric series per container. With 30 containers, that’s 6,000–9,000 active series before Node Exporter adds its own pile.

For a homelab Prometheus with 15-day retention, this is completely fine. A modern server with 16 GB RAM handles hundreds of thousands of active series without breaking a sweat. But there are a few traps:

Ephemeral containers are the real killer. If you run containers that spin up and shut down frequently (CI runners, one-off tasks, Docker build jobs), each unique container name becomes a new label combination. After a week of ephemeral builds, you can have 50,000 dead series that Prometheus is still tracking. They’ll age out at the retention window, but they hurt query performance in the meantime.

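Fix: drop the series from short-lived containers entirely with a metric_relabel_configs rule keyed on something they share, such as an image or a Compose project label. Dropping only the name label doesn’t help much, because the id label is still unique per container. Prometheus retention is a global setting, so a shorter retention window for a single job isn’t an option.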
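The series arithmetic above is worth doing for your own setup before deciding what to drop. A minimal sketch in Python, where the per-container figures come from this post and the CI workload (35 short-lived containers a day) is a hypothetical example:

```python
def active_series(containers, series_per_container):
    """Series Prometheus tracks for containers that are currently running."""
    return containers * series_per_container


def churned_series(ephemeral_runs_per_day, series_per_container, days):
    """Stale series left behind by short-lived containers over a number of days."""
    return ephemeral_runs_per_day * series_per_container * days


low, high = active_series(30, 200), active_series(30, 300)
print(f"steady state: {low:,} to {high:,} series")

# Hypothetical CI box: 35 short-lived containers a day, about 200 series each.
print(f"after a week of builds: {churned_series(35, 200, 7):,} stale series")
```

The steady-state number barely moves, while the churn number grows every day until the retention window catches up with it.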

Drop what you don’t use. The filesystem metrics per device are the noisiest. If you have 5 disk partitions and 30 containers, you get 150 label combinations just for filesystem reads. Add this to your metric_relabel_configs to cut the noise:

- source_labels: [__name__]
  regex: "container_fs_io_current|container_blkio_.*"
  action: drop

Tune the Scrape Interval for Your Use Case

The default 15-second scrape interval is right for general monitoring. You’ll catch most issues within a scrape cycle.

For tight CPU debugging — say you’re trying to catch a container that spikes for 3 seconds and goes quiet — drop to 5 seconds:

- job_name: "cadvisor"
  scrape_interval: 5s

This triples your ingest rate for that job, so don’t leave it there permanently. Set it tight, catch your bug, reset to 15s.
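The tripling follows directly from samples per second being series divided by scrape interval. A quick sketch, assuming the upper estimate of 9,000 series for 30 containers from earlier in this post:

```python
def samples_per_second(series, scrape_interval_s):
    """Ingest rate for one scrape job: every series produces one sample per scrape."""
    return series / scrape_interval_s


series = 9_000  # upper estimate for 30 containers
normal = samples_per_second(series, 15)
debug = samples_per_second(series, 5)
print(f"15s: {normal:.0f}/s, 5s: {debug:.0f}/s, ratio {debug / normal:.1f}x")
```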

For anything less than 5 seconds, you’re fighting cAdvisor’s own collection frequency (it samples every 1–2 seconds internally) and diminishing returns kick in fast.


Honest Assessment: Is cAdvisor Worth It?

cAdvisor is not the lightest option. The container itself runs ~50–100 MB RSS. It requires privileged access. It generates hundreds of metrics per container by default. If you have 5 containers on a low-power mini PC, you might be burning 15% of your resources on observability.

Lighter alternatives worth knowing:

If your homelab has enough headroom and you want real per-container drill-down, cAdvisor + Prometheus is still the gold standard. If you’re running on an old NUC with 8 GB of RAM shared between 20 containers, weigh the overhead seriously before committing.


Putting It Together

Directory structure before you docker compose up -d:

monitoring/
  docker-compose.yml
  prometheus/
    prometheus.yml

That’s it. No extra setup, no config databases.

mkdir -p monitoring/prometheus
cd monitoring
# drop the compose file and prometheus.yml in place
docker compose up -d

Check everything came up:

docker compose ps

Verify Prometheus is scraping cAdvisor by opening http://your-host:9090/targets — you should see cadvisor (1/1 up) within 30 seconds. If it’s red, check that the monitoring network is resolving correctly and cAdvisor is actually healthy:

docker compose logs cadvisor --tail=20
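The targets page has a JSON equivalent at /api/v1/targets, which is handy if you want a scripted health check. The sketch below only shows the parsing side, on a made-up payload. unhealthy_targets is a hypothetical helper, and the field names (activeTargets, health, scrapeUrl, lastError) follow the Prometheus HTTP API.

```python
def unhealthy_targets(payload):
    """Return [(job, scrape_url, last_error)] for every active target that is not up."""
    targets = payload["data"]["activeTargets"]
    return [
        (t["labels"].get("job", ""), t.get("scrapeUrl", ""), t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]


# Made-up payload in the shape returned by GET /api/v1/targets.
canned = {"status": "success", "data": {"activeTargets": [
    {"labels": {"job": "cadvisor"}, "scrapeUrl": "http://cadvisor:8080/metrics",
     "health": "up", "lastError": ""},
    {"labels": {"job": "node-exporter"}, "scrapeUrl": "http://node-exporter:9100/metrics",
     "health": "down", "lastError": "connection refused"},
]}}
print(unhealthy_targets(canned))
```

An empty list means every job is being scraped; anything else tells you which job failed and why.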

Once targets are green, hit http://your-host:9090 and run the topk(5, container_memory_rss{name!=""}) query. You’ll immediately see which containers are eating your memory. Spoiler: it’s probably not the one you guessed.


The Bottom Line

Blind guessing which container is misbehaving is a great way to waste an afternoon. cAdvisor gives you per-container RSS, CPU rate, network drops, and disk I/O — all queryable with PromQL, all visualizable in Grafana in about 20 minutes of setup.

The full stack is four containers (cAdvisor + Node Exporter + Prometheus + Grafana), a Compose file, and a Prometheus config with a few relabel rules to make the labels actually useful. Use Dashboard ID 19792 in Grafana and you’re done.

If you’re running anything more than a handful of containers and you don’t have per-container metrics, you’re flying blind. Your future 2 AM self — staring at a container that’s been silently leaking memory for 72 hours — will appreciate having this in place.

