You opened a shell on your home server last week to check disk usage and noticed six processes you forgot you were running: promtail, fluent-bit, node_exporter, telegraf, cadvisor, and some Python script you wrote in 2023 that scrapes a temperature sensor. Each one has its own config file, its own service unit, its own memory footprint, and its own way of lying to you when something breaks.
There is a better way. The OpenTelemetry Collector (otelcol) is a single binary that speaks most of the observability protocols you are likely to run into, and it will gladly eat all of that chaos and ship it somewhere useful.
Here is how to replace most of your agent zoo with one well-configured pipeline.
What the Collector Actually Is
The OTel Collector is not a framework or a library. It is a deployable binary with a declarative config that wires together three types of components:
- Receivers: pull or accept data in. OTLP gRPC/HTTP, Prometheus scrape, hostmetrics, filelog, journald, kafka, dockerstats, and dozens of others.
- Processors: transform data in flight. batch for efficiency, attributes to add/drop labels, filter to drop noise, transform for full OTTL expressions, memory_limiter to keep you out of OOM-kill territory.
- Exporters: push data out. OTLP to Tempo or any OTel backend, Prometheus remote write, Loki, Elasticsearch, Kafka, and more.
You connect these in pipelines — one per signal type (metrics, logs, traces). Multiple receivers can feed one pipeline. One pipeline can fan out to multiple exporters.
```
receivers → [optional processors] → exporters
```

That’s it. The rest is just YAML.
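As a minimal sketch of that wiring, here is one metrics pipeline fed by two receivers and fanning out to two exporters. The component choices are illustrative, and `backend.example:4318` is a hypothetical endpoint rather than part of this guide's setup:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:

processors:
  batch:

exporters:
  debug:
    verbosity: basic
  otlphttp:
    endpoint: "http://backend.example:4318"  # hypothetical backend, replace with yours

service:
  pipelines:
    metrics:
      receivers: [otlp, hostmetrics]   # two receivers feed one pipeline
      processors: [batch]
      exporters: [debug, otlphttp]     # the same data goes to both exporters
```

A component only does anything once it is listed in a pipeline under `service`; defining it in its top-level section alone is not enough.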
Agent Mode vs Gateway Mode
Before you write a single line of config, decide how you are deploying:
Agent mode runs otelcol on every host, collecting local data (host metrics, container logs, local app telemetry) and either shipping directly to backends or forwarding to a central gateway.
Gateway mode runs one or a few otelcol instances that receive data from many agents (or apps that speak OTLP) and centralize the fan-out to backends.
For a home lab with one to five servers, agent mode is almost always correct. You run one Collector per host, maybe with a single gateway instance if you want all your Grafana backends talking to one place. This guide covers the agent case — the config you will write lives on the same box as what it monitors.
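If you do add a gateway later, the agent side changes very little. A rough sketch, assuming a hypothetical gateway host named `gateway.lan` that runs its own otelcol with an OTLP gRPC receiver on port 4317, and assuming plaintext traffic is acceptable on your LAN:

```yaml
# Agent side: collect locally, forward everything to the gateway over OTLP.
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:

processors:
  memory_limiter:
    check_interval: 5s
    limit_percentage: 75
    spike_limit_percentage: 20
  batch:

exporters:
  otlp/gateway:
    endpoint: "gateway.lan:4317"   # hypothetical gateway address
    tls:
      insecure: true               # assumption: trusted LAN, no TLS

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [memory_limiter, batch]
      exporters: [otlp/gateway]
```

The gateway would then own the backend-specific exporters, so changing a backend means editing one config instead of one per host.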
The Config That Replaces Your Agent Zoo
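This is a complete config.yaml. It collects host metrics and container logs, ships metrics to Prometheus (remote write) and logs to Loki, and accepts OTLP traces to forward to Tempo. No fluff. The only values you should expect to adapt are the backend hostnames (prometheus, loki, tempo), which assume those services are reachable by name on the same Docker network.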
```yaml
extensions:
  health_check:
    endpoint: "0.0.0.0:13133"
  pprof:
    endpoint: "0.0.0.0:1777"
  zpages:
    endpoint: "0.0.0.0:55679"

receivers:
  # Host metrics: replaces node_exporter for most use cases
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      disk:
      load:
      filesystem:
        exclude_mount_points:
          mount_points: ["/dev", "/proc", "/sys", "/run/lock"]
          match_type: strict
        exclude_fs_types:
          fs_types: ["autofs", "binfmt_misc", "cgroup", "configfs", "debugfs",
                     "devpts", "devtmpfs", "fusectl", "hugetlbfs", "mqueue",
                     "nsfs", "overlay", "proc", "procfs", "pstore",
                     "rpc_pipefs", "securityfs", "selinuxfs", "squashfs",
                     "sysfs", "tracefs"]
          match_type: strict
      memory:
      network:
        exclude:
          interfaces: ["lo"]
          match_type: strict
      paging:
      processes:

  # Container logs: replaces cadvisor + docker log driver hacks
  filelog:
    include:
      - /var/lib/docker/containers/*/*.log
    include_file_path: true
    include_file_name: false
    operators:
      - type: json_parser
        id: parse-docker-json
        on_error: send
      - type: move
        from: attributes.log
        to: body
      - type: regex_parser
        id: extract-container-id
        regex: '^/var/lib/docker/containers/(?P<container_id>[^/]+)/'
        parse_from: attributes["log.file.path"]
      - type: move
        from: attributes.container_id
        to: resource["container.id"]

  # Journald: replaces promtail's journald scrape
  journald:
    directory: /run/log/journal
    units:
      - nginx
      - caddy
      - docker
    priority: info

  # Accept OTLP from apps (your own services that export traces/metrics)
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

  # Scrape existing Prometheus endpoints you haven't migrated yet
  prometheus:
    config:
      scrape_configs:
        - job_name: "otelcol-self"
          scrape_interval: 30s
          static_configs:
            - targets: ["0.0.0.0:8888"]

processors:
  # Always include memory_limiter, first in every pipeline
  memory_limiter:
    check_interval: 5s
    limit_percentage: 75
    spike_limit_percentage: 20

  # Batch before shipping: reduces API calls and cost
  batch:
    send_batch_size: 1000
    timeout: 10s

  # Stamp every piece of data with host identity
  resource:
    attributes:
      - key: service.name
        value: "homelab-host"
        action: upsert
      - key: service.instance.id
        from_attribute: host.name
        action: insert
      - key: host.environment
        value: "homelab"
        action: insert

  # Drop noisy filesystem metrics for tmpfs mounts
  filter/drop-tmpfs:
    metrics:
      exclude:
        match_type: regexp
        resource_attributes:
          - key: system.filesystem.type
            value: "tmpfs|ramfs"

exporters:
  # Prometheus remote write → Grafana / Mimir / Thanos / bare Prometheus
  prometheusremotewrite:
    endpoint: "http://prometheus:9090/api/v1/write"
    tls:
      insecure: true
    resource_to_telemetry_conversion:
      enabled: true

  # Loki for logs
  loki:
    endpoint: "http://loki:3100/loki/api/v1/push"
    tls:
      insecure: true
    default_labels_enabled:
      exporter: false
      job: true
      instance: true
      level: true

  # Tempo for traces
  otlp/tempo:
    endpoint: "http://tempo:4317"
    tls:
      insecure: true

  # Debug exporter: logs to stdout at info level, useful during setup
  debug:
    verbosity: basic

service:
  extensions: [health_check, pprof, zpages]

  pipelines:
    metrics:
      receivers: [hostmetrics, prometheus, otlp]
      processors: [memory_limiter, filter/drop-tmpfs, resource, batch]
      exporters: [prometheusremotewrite]

    logs:
      receivers: [filelog, journald, otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [loki]

    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlp/tempo]

  telemetry:
    logs:
      level: "info"
    metrics:
      address: "0.0.0.0:8888"
```

A few things worth calling out:
memory_limiter is always first. If you put it after batch, the batch can swell past your limit before the limiter sees it. Limiter goes first, always.
resource processor stamps identity on everything. service.name and service.instance.id are OTel semantic conventions. Downstream tools — Tempo, Grafana, Jaeger — use them to correlate signals. Set them once here instead of in every app.
debug exporter is your friend during setup. Add it temporarily to any pipeline and watch your terminal. Remove it when things work.
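For instance, to watch the logs pipeline while you set it up, the change might look like the fragment below. It shows only the parts that differ from the config above, and it assumes you want full record output rather than the one-line summaries that `basic` prints:

```yaml
exporters:
  debug:
    verbosity: detailed   # print each record in full; "basic" prints only summary counts

service:
  pipelines:
    logs:
      receivers: [filelog, journald, otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [loki, debug]   # debug runs alongside the real exporter
```

Because `debug` is just another exporter in the list, removing it later is a one-word change and does not affect what reaches Loki.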
Running It
Docker Compose, because that is where most home lab gear ends up:
```yaml
services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.101.0
    container_name: otelcol
    restart: unless-stopped
    volumes:
      - ./otelcol-config.yaml:/etc/otelcol-contrib/config.yaml:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /run/log/journal:/run/log/journal:ro
      - /proc:/hostproc:ro
      - /sys:/hostsys:ro
    environment:
      - HOST_PROC=/hostproc
      - HOST_SYS=/hostsys
    ports:
      - "4317:4317"    # OTLP gRPC
      - "4318:4318"    # OTLP HTTP
      - "13133:13133"  # Health check
      - "8888:8888"    # Self-metrics
    networks:
      - monitoring
    command: ["--config=/etc/otelcol-contrib/config.yaml"]

networks:
  monitoring:
    external: true
```

The otel/opentelemetry-collector-contrib image includes all the community receivers and exporters. The slimmer otel/opentelemetry-collector image does not have loki, filelog, or journald, so use contrib. One caveat: the journald receiver shells out to the journalctl binary, which the stock contrib image does not include, so to read the journal from inside a container you need a custom image that adds journalctl (or you run the Collector directly on the host).
Start it:
```bash
docker compose up -d otelcol
docker compose logs -f otelcol
```

Hit the health check:

```bash
curl http://localhost:13133/
# {"status":"Server available","upSince":"...","uptime":"..."}
```

Migrating from Promtail
If you have Promtail shipping container logs to Loki, the mental model maps cleanly:
| Promtail | OTel Collector |
|---|---|
| scrape_configs[].static_configs | filelog receiver include glob |
| pipeline_stages | operators array in filelog |
| labels block | resource processor attributes |
| client block | loki exporter |
The main gotcha: Promtail’s docker scrape target auto-discovers container names from the Docker daemon. The OTel filelog receiver reads raw log files — you have to parse the container ID from the file path (that regex_parser in the config above does this) and then use a docker_observer or dockerstats receiver to enrich with container names if you need them.
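For most home lab use cases, container ID plus your own naming convention is enough. If you need more metadata, the resourcedetection processor with the docker detector is the usual starting point. Check the attribute names against the detector's documentation for your Collector version first: the docker detector is documented as reporting host-level attributes (host.name, os.type), so the container.* keys in this snippet may be rejected at startup as unknown: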
```yaml
processors:
  resourcedetection/docker:
    detectors: [docker, system]
    docker:
      resource_attributes:
        container.name:
          enabled: true
        container.image.name:
          enabled: true
```

Migrating from Prometheus Node Exporter
The hostmetrics receiver covers the 90% case. Feature gaps worth knowing:
What hostmetrics gives you that node_exporter also gives you:
- CPU (per-core, idle/user/system/iowait)
- Memory (used/free/cached/buffers)
- Disk I/O (read/write bytes, ops)
- Filesystem usage
- Network (bytes in/out, packets, errors)
- Load averages
- Process count
What node_exporter has that hostmetrics does not (yet):
- Hardware monitoring via lm-sensors (fan speed, temperatures)
- SMART disk health (smartmontools)
- systemd unit states
- EDAC memory error counters
- Specific NFS/NFS4 stats
If you need sensor temps or SMART data, keep node_exporter running alongside the Collector and scrape it with the prometheus receiver. This is not a shameful hybrid — it is a migration strategy. Scrape node_exporter’s endpoint in prometheus.config.scrape_configs and route it through the same pipeline.
```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "node-exporter"
          scrape_interval: 30s
          static_configs:
            - targets: ["node-exporter:9100"]
```

One agent less is still progress.
Memory and Performance
On a home server doing light observability work — one host, a dozen containers, 30s scrape interval — expect the Collector to run comfortably under 100 MB RSS. If you see it climbing, these are your levers:
Batch processor tuning. Larger batches mean fewer API calls to exporters but higher memory peaks. Start with send_batch_size: 1000 and timeout: 10s. If your Loki or Prometheus can’t keep up, drop the batch size.
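A sketch of what a more conservative batch setting could look like. The numbers are illustrative assumptions, not measured recommendations, so tune them against what your own backends tolerate:

```yaml
processors:
  batch:
    send_batch_size: 500        # flush once this many items are queued
    send_batch_max_size: 1000   # hard cap on a single batch; must be >= send_batch_size
    timeout: 5s                 # flush at least this often, even if the batch is small
```

Without `send_batch_max_size`, a burst can produce a batch larger than `send_batch_size`, which is where the memory peaks come from.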
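Memory limiter configuration. limit_percentage: 75 sets the hard limit at 75% of available memory, which inside a container means the container's memory limit. The Collector starts refusing data earlier, at the soft limit (the hard limit minus spike_limit_percentage, so 55% here), and forces garbage collection if usage reaches the hard limit, all well before the OOM killer gets involved. Set your container's memory limit explicitly; 512m is reasonable for agent mode, which puts the soft limit at roughly 280 MB and the hard limit at 384 MB.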
```yaml
services:
  otelcol:
    # ...
    deploy:
      resources:
        limits:
          memory: 512m
```

Filelog backpressure. If you have containers logging at 10k lines/second, filelog will faithfully try to keep up. Filter before export:
```yaml
processors:
  filter/drop-debug-logs:
    logs:
      exclude:
        match_type: regexp
        record_attributes:
          - key: level
            value: "DEBUG|TRACE"
```

When NOT to Use the OTel Collector
The Collector is not a silver bullet, and there are real cases where you should reach for something else:
Ultra-low-power devices. A Raspberry Pi Zero or an ESP32 gateway node, or anything else where the Collector's footprint of up to 100 MB is a large share of the available RAM, is not a good home for the Collector. fluent-bit, with a footprint of a few megabytes, still wins there.
Specialized exporters with no OTel support. If you are shipping to InfluxDB and need line protocol with specific measurement names, telegraf is still the right tool. The OTel InfluxDB exporter exists but is immature.
When your whole stack is already Prometheus-native. If you have Prometheus, Alertmanager, Thanos, and you are happy — don’t fix what isn’t broken. The Collector adds value when you are adding logs and traces to an existing metrics setup, not when you want to replace a working metrics stack.
When you want a visual view of the pipeline. If your team prefers the Grafana Agent Flow UI or Vector’s topology graph, those tools offer comparable pipelines with a friendlier operational model.
The Bottom Line
Six processes that each have their own config syntax, log format, and failure mode are not observability infrastructure — they are a support ticket waiting to happen.
The OpenTelemetry Collector gives you one config file, one service to restart, one health check endpoint, and one place to look when data stops flowing. The migration is not all-or-nothing — drop it alongside your existing agents, start routing one signal type through it, and rip out the old stuff when you trust what you see.
On a typical home server doing normal things, it runs in under 100 MB and doesn’t need babysitting. That’s the bar. Most of your current agents don’t clear it.
Your 2 AM self — the one who’s getting paged because promtail silently stopped and nobody noticed for three hours — will thank you.