You Have 30 Containers and One Mystery Memory Hog
You’ve been watching your server slowly choke for three days. free -h says you’re at 94% memory. You do the reasonable thing — you guess. Probably Jellyfin. Maybe it’s that Postgres container you spun up for something you’ve since forgotten. Could be Nextcloud doing its hourly scan of your 2 TB photo library.
You guess wrong. It’s a tiny Uptime Kuma instance that somehow ballooned to 800 MB because of a bug you’d have caught immediately if you had per-container metrics.
This is the exact problem cAdvisor and Prometheus solve. Not “here’s a vague overview of your host” — here’s exactly which container, exactly how much, exactly when it started.
Let’s wire it up properly.
What cAdvisor Actually Is
cAdvisor (Container Advisor) is a Google-maintained exporter that reads Linux cgroups and translates container resource usage into Prometheus metrics. With the Docker socket and cgroup filesystem mounted into its container, it walks your running containers and exposes a /metrics endpoint where every series is labeled with the container's name (in the name label), image, and Docker labels.
The key metric families you’ll care about:
- container_cpu_usage_seconds_total: cumulative CPU time by container
- container_memory_rss: RSS (resident set size), the real memory number, not the inflated container_memory_usage_bytes, which also counts page cache
- container_network_receive_bytes_total / container_network_transmit_bytes_total
- container_network_receive_packets_dropped_total: packet drops, great for spotting network congestion
- container_fs_reads_bytes_total / container_fs_writes_bytes_total
It also ships a built-in web UI on the same port (8080), which is fine for a quick look but not something you'll use long-term.
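To see exactly which labels arrive with each series, here's a minimal Python sketch that parses one line of cAdvisor's /metrics output. The sample line is hypothetical (shortened container ID, made-up value), and the parser is simplified: it assumes label values contain no escaped quotes. The label set itself (id, image, name, plus one container_label_* per Docker label) is what standalone cAdvisor attaches. Note there is no container label: that one is added by the Kubernetes kubelet, so on a plain Docker host the container name lives in name.

```python
import re

# Simplified parser for one sample line of the Prometheus text exposition format.
# Assumption: label values contain no escaped quotes, backslashes, or braces.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split an exposition line into metric name, label dict, and value."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    # Anything after the value (an optional timestamp) is ignored.
    return match.group("name"), labels, float(match.group("value"))


# Hypothetical line shaped like standalone cAdvisor output for a Compose container.
sample = (
    'container_memory_rss{'
    'container_label_com_docker_compose_project="monitoring",'
    'container_label_com_docker_compose_service="grafana",'
    'id="/system.slice/docker-3f2a9c.scope",'
    'image="grafana/grafana:10.4.2",name="grafana"'
    '} 1.2345e+08 1717000000000'
)

name, labels, value = parse_sample(sample)
print(name, labels["name"], labels["container_label_com_docker_compose_service"], value)
# prints: container_memory_rss grafana grafana 123450000.0
```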
The Full Stack: Compose File
Here’s a production-usable Compose stack. This is everything: cAdvisor, Node Exporter, Prometheus, and Grafana.
```yaml
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    privileged: true
    restart: unless-stopped
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
      - /dev/disk:/dev/disk:ro
    devices:
      - /dev/kmsg
    ports:
      - "8080:8080"
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:v1.8.1
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
      - "--path.rootfs=/rootfs"
      - "--path.sysfs=/host/sys"
      # $$ escapes a literal $ in Compose files
      - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
    ports:
      - "9100:9100"
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:v2.52.0
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=15d"
      - "--web.enable-lifecycle"
    ports:
      - "9090:9090"
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:10.4.2
    container_name: grafana
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=changeme  # change this before exposing Grafana
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"
    depends_on:
      - prometheus
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:

networks:
  monitoring:
    driver: bridge
```

The privileged: true on cAdvisor isn't great from a security standpoint, but it's required to read cgroup data on most Linux setups. On newer kernels with cgroup v2, you can get away without privileged by mounting just the cgroups you need, but honestly, this is your homelab and that rabbit hole isn't worth chasing today.
Prometheus Config with Label Relabeling
The default cAdvisor scrape dumps every metric, including some you'll never look at. More importantly, the default label set buries the Docker Compose service name in a long, unwieldy label (container_label_com_docker_compose_service); the name label only gives you raw container names like myproject_web_1 or the newer myproject-web-1.
Here’s a Prometheus config that pulls in Docker labels and does some cleanup:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: "cadvisor"
    scrape_interval: 15s
    static_configs:
      - targets: ["cadvisor:8080"]
    metric_relabel_configs:
      # Drop noisy per-device filesystem metrics that the queries below don't use.
      # This must be action: drop; action: keep would discard every metric that does NOT match.
      - source_labels: [__name__]
        regex: "container_fs_(io_current|io_time_seconds_total|io_time_weighted_seconds_total|reads_merged_total|writes_merged_total|sector_reads_total|sector_writes_total)"
        action: drop

      # Copy the Compose service name into a short compose_service label
      - source_labels: [container_label_com_docker_compose_service]
        target_label: compose_service

      # Copy the Compose project name into compose_project
      - source_labels: [container_label_com_docker_compose_project]
        target_label: compose_project

      # Drop Kubernetes-managed containers (pause containers, etc.);
      # a no-op on a plain Docker host
      - source_labels: [container_label_io_kubernetes_container_name]
        regex: ".+"
        action: drop

      # Drop series with no container name (host-level cgroup data).
      # Standalone cAdvisor stores the container name in "name";
      # a "container" label only exists under Kubernetes.
      - source_labels: [name]
        regex: ""
        action: drop
```

The metric_relabel_configs section is applied to every scraped sample before it's stored, which lets you rename, drop, or create labels based on existing ones. The key ones here:
- container_label_com_docker_compose_service → compose_service: Docker adds this label automatically when you use Compose. Now your PromQL can filter by service name rather than the ugly container name.
- container_label_com_docker_compose_project → compose_project: the project name (the directory name, or the --project-name value). Useful when you have multiple Compose stacks on one host.
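For intuition, here's a minimal Python simulation of the three relabel actions this config uses (replace, keep, drop). It is a sketch, not Prometheus's implementation: it only supports the default ";" separator and the default "$1" replacement, and skips actions like labelmap and hashmod. cadvisor_label_name is a hypothetical helper approximating how cAdvisor turns Docker label keys into label names. The last examples show why keep is the wrong tool for dropping a metric, and that relabel regexes are anchored at both ends.

```python
import re
from typing import Optional

Labels = dict[str, str]


def cadvisor_label_name(docker_label: str) -> str:
    """Approximate how cAdvisor maps a Docker label key to a Prometheus label name."""
    return "container_label_" + re.sub(r"[^a-zA-Z0-9_]", "_", docker_label)


def relabel(labels: Labels, rule: dict) -> Optional[Labels]:
    """Apply one rule; return the new labels, or None if the series is dropped."""
    # Source values are joined with ";" (Prometheus's default separator).
    value = ";".join(labels.get(src, "") for src in rule["source_labels"])
    # Prometheus anchors relabel regexes at both ends, hence fullmatch.
    match = re.fullmatch(rule.get("regex", "(.*)"), value)
    action = rule.get("action", "replace")
    if action == "keep":
        return labels if match else None
    if action == "drop":
        return None if match else labels
    if action == "replace":
        if match is None:
            return labels
        out = dict(labels)
        result = match.expand(r"\1")  # Python's \1 stands in for Prometheus's "$1"
        if result:
            out[rule["target_label"]] = result
        else:
            # An empty result removes the target label, as in Prometheus.
            out.pop(rule["target_label"], None)
        return out
    raise ValueError(f"action not covered by this sketch: {action}")


def apply_rules(labels: Labels, rules: list) -> Optional[Labels]:
    for rule in rules:
        labels = relabel(labels, rule)
        if labels is None:
            return None
    return labels


series = {
    "__name__": "container_memory_rss",
    "name": "grafana",
    cadvisor_label_name("com.docker.compose.service"): "grafana",
}
rules = [
    {"source_labels": ["container_label_com_docker_compose_service"],
     "target_label": "compose_service"},
    {"source_labels": ["name"], "regex": "", "action": "drop"},
]
print(apply_rules(series, rules)["compose_service"])  # grafana

# Root cgroup series: no name label, so the drop rule fires.
print(apply_rules({"__name__": "container_memory_rss", "id": "/"}, rules))  # None

# keep retains ONLY matches, so "keeping" one metric discards all the others.
keep_rule = {"source_labels": ["__name__"], "regex": "container_fs_io_current", "action": "keep"}
print(relabel(series, keep_rule))  # None
```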
Real PromQL Queries That Actually Matter
Open Prometheus at http://your-host:9090 and try these. The PromQL blocks are marked as plain text, since most syntax highlighters don't support PromQL.
Top 5 containers by RSS memory:
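```text
topk(5, container_memory_rss{name!=""})
```

That name!="" filter drops the host-level and system cgroup series cAdvisor also exports, which carry no container name. Without it, you'll see a confusing entry for your whole system at the top. (Filtering on container!="" would return nothing here: standalone cAdvisor has no container label; the Kubernetes kubelet adds that one.)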
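For intuition, here's what that query computes, as a small Python sketch over hypothetical (labels, bytes) pairs from one evaluation. It assumes standalone cAdvisor's labeling, where the container name is in the name label and host-level cgroup series have none.

```python
import heapq


def topk_by_name(series, k=5):
    """Return the k largest (name, value) pairs, skipping series with an empty name.

    series: iterable of (labels_dict, value) pairs, like one instant-vector result.
    A missing label counts as empty, matching PromQL's name!="" semantics.
    """
    named = ((labels.get("name", ""), value) for labels, value in series)
    return heapq.nlargest(k, (pair for pair in named if pair[0] != ""),
                          key=lambda pair: pair[1])


# Hypothetical RSS values in bytes.
sample = [
    ({"id": "/"}, 7.5e9),  # root cgroup: the whole host, no container name
    ({"id": "/docker/a1", "name": "uptime-kuma"}, 8.0e8),
    ({"id": "/docker/b2", "name": "jellyfin"}, 4.1e8),
    ({"id": "/docker/c3", "name": "postgres"}, 2.2e8),
]

for name, rss in topk_by_name(sample, k=2):
    print(f"{name:12} {rss / 2**20:8.1f} MiB")
```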
CPU hot loop — containers using more than 20% of a core:
```text
rate(container_cpu_usage_seconds_total{name!=""}[5m]) > 0.2
```

This gives you per-container CPU rate as a fraction of one core. A value of 1.0 means it's pegging a full CPU. Multiply by 100 to get a percentage. Wrap it in topk(5, ...) to find your worst offenders.
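The rate() step is where most confusion comes from, so here's a simplified Python version of it. It handles counter resets the way Prometheus does (a drop in value is treated as the counter restarting from zero, for example after a container restart) but, unlike real rate(), it does not extrapolate to the edges of the window. The sample values are hypothetical.

```python
def simple_rate(samples):
    """Per-second increase of a counter from (unix_seconds, value) samples, oldest first."""
    if len(samples) < 2:
        return None  # rate() also needs at least two samples in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A lower value means the counter reset, so everything since the reset counts.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# container_cpu_usage_seconds_total scraped every 15 s (hypothetical values).
steady = [(0, 100.0), (15, 103.75), (30, 107.5)]
print(simple_rate(steady))  # 0.25, i.e. a quarter of one core (25%)

# Same container, restarted between the last two scrapes.
restarted = [(0, 100.0), (15, 103.75), (30, 2.0)]
print(simple_rate(restarted))
```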
Network receive drops by container (non-zero only):
```text
rate(container_network_receive_packets_dropped_total{name!=""}[5m]) > 0
```

If any container shows up here consistently, something is saturating the network path: either the container is receiving more traffic than it can drain, or the underlying host NIC is the bottleneck.
Memory usage as percentage of a limit (only if you set limits):
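```text
container_memory_rss{name!=""} / (container_spec_memory_limit_bytes{name!=""} > 0) * 100
```

This only returns useful data for containers with a memory limit set. cAdvisor reports a limit of 0 for containers without one, and in PromQL dividing by zero yields +Inf rather than dropping the series. The > 0 filter on the denominator is what makes unlimited containers disappear from the results, which is the behavior you want.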
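The same calculation as a short Python sketch, assuming an unlimited container reports a limit of 0. PromQL's float division would turn that into +Inf; Python raises ZeroDivisionError instead, so the function returns None to mirror the series being filtered out. The container values are hypothetical.

```python
from typing import Optional

MiB = 2**20


def memory_pct_of_limit(rss_bytes: float, limit_bytes: float) -> Optional[float]:
    """RSS as a percentage of the memory limit, or None if no limit is set.

    Assumes the exporter reports 0 for "no limit"; the guard plays the role of
    filtering the denominator with > 0 in PromQL.
    """
    if limit_bytes <= 0:
        return None
    return rss_bytes / limit_bytes * 100


containers = {
    "postgres": (384 * MiB, 512 * MiB),  # hypothetical: 512 MiB limit
    "jellyfin": (900 * MiB, 0),          # hypothetical: no limit set
}
for name, (rss, limit) in containers.items():
    pct = memory_pct_of_limit(rss, limit)
    print(name, "no limit" if pct is None else f"{pct:.1f}%")
```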
Disk write rate per container:
```text
topk(5, rate(container_fs_writes_bytes_total{name!=""}[5m]))
```

This one surfaces your I/O-heavy hitters. Great for catching runaway log writers or databases doing more work than expected.
Grafana Dashboard
Don't build the dashboard from scratch. Import a community cAdvisor dashboard from Grafana.com instead, such as Dashboard ID 19792 (or search Grafana.com for "cAdvisor"). It covers the main container_* metrics out of the box.
To import it:
- Open Grafana → Dashboards → New → Import
- Enter 19792 in the "Import via grafana.com" field and click Load
- Select your Prometheus data source
- Done
The dashboard gives you a per-container memory/CPU/network overview with dropdown filters. If you did the relabeling above, add a dashboard variable for compose_service (for example, a query variable using label_values(container_memory_rss, compose_service)) and filter by service name instead of digging through raw container names.
The High-Cardinality Warning You Actually Need to Read
Here's the honest talk: cAdvisor generates a lot of metrics. Across all metric families and label combinations, a single container can easily account for 200 to 300 series. With 30 containers, that's 6,000 to 9,000 active series before Node Exporter adds its own pile.
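Rather than trusting a rule of thumb, you can count your own. Here's a minimal Python sketch that tallies series per container from the text cAdvisor's /metrics endpoint returns. It assumes the container name is in the name label (standalone cAdvisor) and uses a simplified regex that ignores escaped quotes; the sample text is hypothetical.

```python
import re
from collections import Counter

# Matches name="..." only as a whole label (preceded by "{" or ","), so labels
# such as container_label_..._name="..." are not mistaken for it.
NAME_RE = re.compile(r'[{,]name="([^"]*)"')


def series_per_container(exposition: str) -> Counter:
    """Count sample lines per container name in Prometheus text exposition format."""
    counts: Counter = Counter()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = NAME_RE.search(line)
        counts[match.group(1) if match else "<no name>"] += 1
    return counts


sample = """\
# HELP container_memory_rss Size of RSS in bytes.
# TYPE container_memory_rss gauge
container_memory_rss{id="/"} 7.5e+09
container_memory_rss{id="/docker/a1",image="louislam/uptime-kuma:1",name="uptime-kuma"} 8e+08
container_cpu_usage_seconds_total{cpu="total",id="/docker/a1",image="louislam/uptime-kuma:1",name="uptime-kuma"} 1234.5
container_memory_rss{id="/docker/b2",image="jellyfin/jellyfin",name="jellyfin"} 4.1e+08
"""

for name, n in series_per_container(sample).most_common():
    print(f"{name:15} {n}")
```

Against a live host, feed it the body of http://your-host:8080/metrics (for example via urllib.request.urlopen); the sum of the counts is roughly the number of active series this job adds.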
For a homelab Prometheus with 15-day retention, a few thousand series is completely fine. A modern server with 16 GB RAM can hold hundreds of thousands of active series without breaking a sweat. But there are a few traps:
Ephemeral containers are the real killer. If you run containers that spin up and shut down frequently (CI runners, one-off tasks, Docker build jobs), each new container gets a fresh name and ID, and therefore a fresh set of series. After a week of ephemeral builds, you can have 50,000 dead series. They leave Prometheus's in-memory head block within a few hours, but they stay in the on-disk blocks until the retention window expires, so queries over long time ranges still have to wade through them.
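Fix: drop series from short-lived containers entirely with a metric_relabel_configs drop rule that matches something they share, such as their image or a Docker label. Stripping only the name label doesn't help, because the id label is still unique per container. Alternatively, shorten --storage.tsdb.retention.time; retention in Prometheus is server-wide, not per job.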
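To see how quickly churn adds up, here's the arithmetic as a tiny Python sketch. The inputs (25 CI runs a day, 280 series per container) are hypothetical; plug in your own. Each run gets a new container name and ID, so each run contributes a full, separate set of series.

```python
def churned_series(runs_per_day: int, days: int, series_per_container: int) -> int:
    """Series created by short-lived containers over a period.

    Every run is a new container (new name and id labels), so none of its
    series are shared with earlier runs.
    """
    return runs_per_day * days * series_per_container


steady_state = 30 * 280                      # 30 long-running containers
one_week_of_ci = churned_series(25, 7, 280)  # hypothetical CI workload
print(steady_state, one_week_of_ci)          # 8400 49000
```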
Drop what you don’t use. The filesystem metrics per device are the noisiest. If you have 5 disk partitions and 30 containers, you get 150 label combinations just for filesystem reads. Add this to your metric_relabel_configs to cut the noise:
```yaml
- source_labels: [__name__]
  regex: "container_fs_io_current|container_blkio_.*"
  action: drop
```

Tune the Scrape Interval for Your Use Case
The default 15-second scrape interval is right for general monitoring. You’ll catch most issues within a scrape cycle.
For tight CPU debugging — say you’re trying to catch a container that spikes for 3 seconds and goes quiet — drop to 5 seconds:
```yaml
- job_name: "cadvisor"
  scrape_interval: 5s
```

This triples your ingest rate for that job, so don't leave it there permanently. Set it tight, catch your bug, reset to 15s.
Below 5 seconds, you're fighting cAdvisor's own collection frequency (its --housekeeping_interval defaults to 1 second, and dynamic housekeeping lets it back off for containers whose stats aren't changing), and diminishing returns kick in fast.
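To put numbers on "triples your ingest rate", here's a Python sketch of the capacity formula from the Prometheus storage documentation (retention seconds × ingested samples per second × bytes per sample). The 2 bytes per sample is the upper end of the 1 to 2 byte average those docs cite, and 9,000 series is the upper end of the 30-container estimate earlier; both are assumptions, not measurements.

```python
SECONDS_PER_DAY = 86_400


def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Each active series produces one sample per scrape."""
    return active_series / scrape_interval_s


def disk_bytes(retention_days: float, active_series: int, scrape_interval_s: float,
               bytes_per_sample: float = 2.0) -> float:
    """retention_time_seconds * ingested_samples_per_second * bytes_per_sample."""
    return (retention_days * SECONDS_PER_DAY
            * samples_per_second(active_series, scrape_interval_s)
            * bytes_per_sample)


for interval in (15, 5):
    gb = disk_bytes(15, 9_000, interval) / 1e9
    print(f"{interval:>2}s scrape: {samples_per_second(9_000, interval):6.0f} samples/s, "
          f"~{gb:.1f} GB for 15 days")
```

Leaving a 5 s interval in place for the whole retention window roughly triples disk use, which is the cost of forgetting to reset it.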
Honest Assessment: Is cAdvisor Worth It?
cAdvisor is not the lightest option. The container itself runs ~50–100 MB RSS. It requires privileged access. It generates hundreds of metrics per container by default. If you have 5 containers on a low-power mini PC, you might be burning 15% of your resources on observability.
Lighter alternatives worth knowing:
- Docker daemon built-in metrics: since Docker 20.10, you can set "metrics-addr": "0.0.0.0:9323" in /etc/docker/daemon.json. You get basic engine-level metrics, not per-container breakdowns.
- ctop: terminal UI, zero persistence, great for "what's hot right now" diagnostics. Not Prometheus-compatible.
- Beszel: lightweight agent-based monitoring with a slick UI. I've covered Beszel separately; it's a great choice if you want something simpler that doesn't require a full Prometheus stack.
- Docker socket proxy + exporter: some setups use a Docker socket proxy to safely expose the Docker API, then use docker_state_exporter or similar to pull container states. Less data than cAdvisor, less overhead.
If your homelab has enough headroom and you want real per-container drill-down, cAdvisor + Prometheus is still the gold standard. If you’re running on an old NUC with 8 GB of RAM shared between 20 containers, weigh the overhead seriously before committing.
Putting It Together
Directory structure before you docker compose up -d:
```text
monitoring/
├── docker-compose.yml
└── prometheus/
    └── prometheus.yml
```

That's it. No extra setup, no config databases.
```shell
mkdir -p monitoring/prometheus
cd monitoring
# drop the compose file and prometheus.yml in place
docker compose up -d
```

Check everything came up:
```shell
docker compose ps
```

Verify Prometheus is scraping cAdvisor by opening http://your-host:9090/targets; you should see cadvisor (1/1 up) within 30 seconds. If it's red, check that the monitoring network is resolving correctly and that cAdvisor is actually healthy:
```shell
docker compose logs --tail=20 cadvisor
```

Once targets are green, hit http://your-host:9090 and run the topk(5, container_memory_rss{name!=""}) query. You'll immediately see which containers are eating your memory. Spoiler: it's probably not the one you guessed.
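If you'd rather script these checks than click through the UI, here's a minimal Python sketch against Prometheus's documented HTTP API: /api/v1/targets for target health and /api/v1/query for instant queries. It uses only the standard library. The base URL is an assumption, and the query filters on name because standalone cAdvisor keeps the container name in that label.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://localhost:9090"  # assumption: replace with your host
TOP_RSS = 'topk(5, container_memory_rss{name!=""})'


def query_url(base: str, promql: str) -> str:
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def down_targets(payload: dict) -> list:
    """(job, lastError) for every active target whose health is not "up"."""
    return [
        (t["labels"].get("job", "?"), t.get("lastError", ""))
        for t in payload["data"]["activeTargets"]
        if t.get("health") != "up"
    ]


def parse_vector(payload: dict) -> list:
    """(container name, value) pairs from an instant-vector query result."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    if payload["data"]["resultType"] != "vector":
        raise ValueError("expected an instant vector")
    # Each sample arrives as [unix_timestamp, "value"], with the value as a string.
    return [(r["metric"].get("name", ""), float(r["value"][1]))
            for r in payload["data"]["result"]]


def report(base: str = PROMETHEUS) -> None:
    """Print unhealthy targets, then the top memory consumers. Needs a live Prometheus."""
    for job, err in down_targets(fetch_json(f"{base}/api/v1/targets")):
        print(f"target down: {job}: {err}")
    for name, rss in parse_vector(fetch_json(query_url(base, TOP_RSS))):
        print(f"{name:30} {rss / 2**20:8.1f} MiB")
```

Once the stack is up, call report("http://your-host:9090") from a Python shell.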
The Bottom Line
Blind guessing which container is misbehaving is a great way to waste an afternoon. cAdvisor gives you per-container RSS, CPU rate, network drops, and disk I/O — all queryable with PromQL, all visualizable in Grafana in about 20 minutes of setup.
The full stack is four containers (cAdvisor + Node Exporter + Prometheus + Grafana), a Compose file, and a Prometheus config with a few relabel rules to make the labels actually useful. Use Dashboard ID 19792 in Grafana and you’re done.
If you’re running anything more than a handful of containers and you don’t have per-container metrics, you’re flying blind. Your future 2 AM self — staring at a container that’s been silently leaking memory for 72 hours — will appreciate having this in place.