OpenTelemetry Collector: One Pipeline to Rule Them All

By SumGuy 10 min read

You opened a shell on your home server last week to check disk usage and noticed six processes you forgot you were running: promtail, fluent-bit, node_exporter, telegraf, cadvisor, and some Python script you wrote in 2023 that scrapes a temperature sensor. Each one has its own config file, its own service unit, its own memory footprint, and its own way of lying to you when something breaks.

There is a better way. The OpenTelemetry Collector (otelcol) is a single binary that speaks most of the observability protocols you are likely to run into, and it will gladly eat all of that chaos and ship it somewhere useful.

Here is how to replace most of your agent zoo with one well-configured pipeline.


What the Collector Actually Is

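The OTel Collector is not a framework or a library. It is a deployable binary with a declarative config that wires together three types of components:

- Receivers get data in: they listen for pushed data (OTLP) or pull it themselves (host metrics, log files, Prometheus scrapes).
- Processors sit in the middle: they batch, filter, and enrich data on its way through.
- Exporters send data out to a backend such as Prometheus, Loki, or Tempo.
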
You connect these in pipelines — one per signal type (metrics, logs, traces). Multiple receivers can feed one pipeline. One pipeline can fan out to multiple exporters.

receivers → [optional processors] → exporters

That’s it. The rest is just YAML.
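
To make that shape concrete, here is a minimal sketch of a complete config with one receiver, one processor, one exporter, and one pipeline. It only prints incoming traces to the Collector's log output; the choice of otlp, batch, and debug is just for illustration.

```yaml
# Minimal skeleton: define components first, then wire them into a pipeline.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"

processors:
  batch:            # defaults are fine for a first test

exporters:
  debug:
    verbosity: basic

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```

A component that is defined but not listed in any pipeline does nothing, so the service block is where the actual wiring happens.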


Agent Mode vs Gateway Mode

Before you write a single line of config, decide how you are deploying:

Agent mode runs otelcol on every host, collecting local data (host metrics, container logs, local app telemetry) and either shipping directly to backends or forwarding to a central gateway.

Gateway mode runs one or a few otelcol instances that receive data from many agents (or apps that speak OTLP) and centralize the fan-out to backends.

For a home lab with one to five servers, agent mode is almost always correct. You run one Collector per host, maybe with a single gateway instance if you want all your Grafana backends talking to one place. This guide covers the agent case — the config you will write lives on the same box as what it monitors.
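
If you do add a gateway later, the agent side changes very little. The fragment below is a sketch, not a full config: it assumes a gateway reachable at the hypothetical hostname otel-gateway running an otlp receiver on port 4317, and it reuses the receiver and processor names from the main config in this post.

```yaml
# Agent-side fragment: forward logs to a central gateway over OTLP/gRPC.
# "otel-gateway" is a hypothetical hostname; substitute your own.
exporters:
  otlp/gateway:
    endpoint: "otel-gateway:4317"
    tls:
      insecure: true   # plain-text gRPC; only sensible on a trusted LAN

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [memory_limiter, batch]
      exporters: [otlp/gateway]
```

The gateway then owns the Loki, Prometheus, and Tempo exporters, and the agents only need to know one address.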


The Config That Replaces Your Agent Zoo

This is a real, working config.yaml. It collects host metrics and container logs, ships metrics to Prometheus (remote write) and logs to Loki, and accepts OTLP traces to forward to Tempo. No fluff, no placeholder values.

otelcol-config.yaml
extensions:
  health_check:
    endpoint: "0.0.0.0:13133"
  pprof:
    endpoint: "0.0.0.0:1777"
  zpages:
    endpoint: "0.0.0.0:55679"

receivers:
  # Host metrics: replaces node_exporter for most use cases
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      disk:
      load:
      filesystem:
        exclude_mount_points:
          mount_points: ["/dev", "/proc", "/sys", "/run/lock"]
          match_type: strict
        exclude_fs_types:
          fs_types: ["autofs", "binfmt_misc", "cgroup", "configfs", "debugfs",
                     "devpts", "devtmpfs", "fusectl", "hugetlbfs", "mqueue",
                     "nsfs", "overlay", "proc", "procfs", "pstore",
                     "rpc_pipefs", "securityfs", "selinuxfs", "squashfs",
                     "sysfs", "tracefs"]
          match_type: strict
      memory:
      network:
        exclude:
          interfaces: ["lo"]
          match_type: strict
      paging:
      processes:

  # Container logs: replaces promtail/fluent-bit tailing of Docker's JSON log files
  filelog:
    include:
      - /var/lib/docker/containers/*/*.log
    include_file_path: true
    include_file_name: false
    operators:
      - type: json_parser
        id: parse-docker-json
        on_error: send
      - type: move
        from: attributes.log
        to: body
      - type: regex_parser
        id: extract-container-id
        regex: '^/var/lib/docker/containers/(?P<container_id>[^/]+)/'
        parse_from: attributes["log.file.path"]
      - type: move
        from: attributes.container_id
        to: resource["container.id"]

  # Journald: replaces promtail's journald scrape
  journald:
    directory: /run/log/journal
    units:
      - nginx
      - caddy
      - docker
    priority: info

  # Accept OTLP from apps (your own services that export traces/metrics)
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

  # Scrape existing Prometheus endpoints you haven't migrated yet
  prometheus:
    config:
      scrape_configs:
        - job_name: "otelcol-self"
          scrape_interval: 30s
          static_configs:
            # The Collector's own metrics (see service.telemetry below)
            - targets: ["localhost:8888"]

processors:
  # Always include memory_limiter, first in every pipeline
  memory_limiter:
    check_interval: 5s
    limit_percentage: 75
    spike_limit_percentage: 20

  # Batch before shipping: reduces API calls and cost
  batch:
    send_batch_size: 1000
    timeout: 10s

  # Stamp every piece of data with host identity
  resource:
    attributes:
      - key: service.name
        value: "homelab-host"
        action: upsert
      - key: service.instance.id
        from_attribute: host.name
        action: insert
      - key: host.environment
        value: "homelab"
        action: insert

  # Drop noisy filesystem metrics for tmpfs mounts
  filter/drop-tmpfs:
    metrics:
      exclude:
        match_type: regexp
        resource_attributes:
          - key: system.filesystem.type
            value: "tmpfs|ramfs"

exporters:
  # Prometheus remote write → Grafana / Mimir / Thanos / bare Prometheus
  prometheusremotewrite:
    endpoint: "http://prometheus:9090/api/v1/write"
    tls:
      insecure: true
    resource_to_telemetry_conversion:
      enabled: true

  # Loki for logs
  loki:
    endpoint: "http://loki:3100/loki/api/v1/push"
    tls:
      insecure: true
    default_labels_enabled:
      exporter: false
      job: true
      instance: true
      level: true

  # Tempo for traces
  otlp/tempo:
    endpoint: "http://tempo:4317"
    tls:
      insecure: true

  # Debug exporter: writes to the Collector's own log output, useful during setup
  debug:
    verbosity: basic

service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    metrics:
      receivers: [hostmetrics, prometheus, otlp]
      processors: [memory_limiter, filter/drop-tmpfs, resource, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [filelog, journald, otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [loki]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlp/tempo]
  telemetry:
    logs:
      level: "info"
    metrics:
      address: "0.0.0.0:8888"

A few things worth calling out:

memory_limiter is always first. If you put it after batch, the batch can swell past your limit before the limiter sees it. Limiter goes first, always.

resource processor stamps identity on everything. service.name and service.instance.id are OTel semantic conventions. Downstream tools — Tempo, Grafana, Jaeger — use them to correlate signals. Set them once here instead of in every app.

debug exporter is your friend during setup. Add it temporarily to any pipeline and watch your terminal. Remove it when things work.
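
As a sketch of what "add it temporarily" looks like, this fragment adds the debug exporter to the logs pipeline from the config above. Everything else stays the same; the exporter is already defined, it just is not wired into any pipeline by default.

```yaml
# Fragment: only the logs pipeline changes.
service:
  pipelines:
    logs:
      receivers: [filelog, journald, otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [loki, debug]   # debug is temporary; remove it once logs show up in Loki
```

With verbosity: basic you get a one-line summary per batch. Switching the exporter to verbosity: detailed prints every record, which is handy for checking operators but very noisy.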


Running It

Docker Compose, because that is where most home lab gear ends up:

docker-compose.yaml
services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.101.0
    container_name: otelcol
    restart: unless-stopped
    volumes:
      - ./otelcol-config.yaml:/etc/otelcol-contrib/config.yaml:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /run/log/journal:/run/log/journal:ro
      - /proc:/hostproc:ro
      - /sys:/hostsys:ro
    environment:
      - HOST_PROC=/hostproc
      - HOST_SYS=/hostsys
    ports:
      - "4317:4317"    # OTLP gRPC
      - "4318:4318"    # OTLP HTTP
      - "13133:13133"  # Health check
      - "8888:8888"    # Self-metrics
    networks:
      - monitoring
    command: ["--config=/etc/otelcol-contrib/config.yaml"]

networks:
  monitoring:
    external: true

The otel/opentelemetry-collector-contrib image includes all the community receivers and exporters. The otel/opentelemetry-collector core image does not include the loki exporter or the filelog and journald receivers, so use contrib.

Start it:

Terminal window
docker compose up -d otelcol
docker compose logs -f otelcol

Hit the health check:

Terminal window
curl http://localhost:13133/
# {"status":"Server available","upSince":"...","uptime":"..."}

Migrating from Promtail

If you have Promtail shipping container logs to Loki, the mental model maps cleanly:

| Promtail | OTel Collector |
| --- | --- |
| scrape_configs[].static_configs | filelog receiver include glob |
| pipeline_stages | operators array in filelog |
| labels block | resource processor attributes |
| client block | loki exporter |

The main gotcha: Promtail's docker scrape target auto-discovers container names from the Docker daemon. The OTel filelog receiver reads raw log files, so all it knows is the file path. You have to parse the container ID from that path (the regex_parser in the config above does this), and the path does not carry the container name. The docker_observer extension and the docker_stats receiver both talk to the Docker daemon, but they do not attach names to records that come in through filelog.

For most home lab use cases, container ID + your own naming convention is enough. The resourcedetection processor with the docker detector is still worth adding for host identity, but as far as I know it only reports host.name and os.type for the Docker host, not per-container names or images, so check the processor's README for your version before relying on it for more:

processors:
  resourcedetection/docker:
    detectors: [docker, system]
    docker:
      resource_attributes:
        host.name:
          enabled: true
        os.type:
          enabled: true
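
The Docker stats receiver mentioned above (docker_stats in the contrib distribution) is the piece that covers per-container metrics, and I believe it tags them with container.name. A minimal sketch follows. It assumes the Docker socket is mounted into the Collector container at /var/run/docker.sock, which the compose file in this post does not do, and that the Collector's user is allowed to read it.

```yaml
# Fragment: per-container CPU/memory/network metrics from the Docker daemon.
# Assumes /var/run/docker.sock is mounted read-only into the Collector container.
receivers:
  docker_stats:
    endpoint: "unix:///var/run/docker.sock"
    collection_interval: 30s

service:
  pipelines:
    metrics:
      receivers: [hostmetrics, prometheus, otlp, docker_stats]
      processors: [memory_limiter, filter/drop-tmpfs, resource, batch]
      exporters: [prometheusremotewrite]
```

This gives you names on the metrics side only. Log records from filelog still carry just the container ID, so join on container.id in your dashboards if you need both.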

Migrating from Prometheus Node Exporter

The hostmetrics receiver covers the 90% case. Feature gaps worth knowing:

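What hostmetrics gives you that node_exporter also gives you: CPU, disk I/O, load average, filesystem usage, memory, network, paging/swap, and process counts, i.e. the scrapers enabled in the config above.

What node_exporter has that hostmetrics does not (yet): hardware sensor temperatures and SMART disk data.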

If you need sensor temps or SMART data, keep node_exporter running alongside the Collector and scrape it with the prometheus receiver. This is not a shameful hybrid — it is a migration strategy. Scrape node_exporter’s endpoint in prometheus.config.scrape_configs and route it through the same pipeline.

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "node-exporter"
          scrape_interval: 30s
          static_configs:
            - targets: ["node-exporter:9100"]

One agent less is still progress.


Memory and Performance

On a home server doing light observability work — one host, a dozen containers, 30s scrape interval — expect the Collector to run comfortably under 100 MB RSS. If you see it climbing, these are your levers:

Batch processor tuning. Larger batches mean fewer API calls to exporters but higher memory peaks. Start with send_batch_size: 1000 and timeout: 10s. If your Loki or Prometheus can’t keep up, drop the batch size.

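Memory limiter configuration. limit_percentage: 75 sets the hard limit at 75% of available memory, which is the container's memory limit when one is set. With spike_limit_percentage: 20, the Collector starts refusing incoming data at the soft limit of 55% (75 minus 20) and forces garbage collection at 75%, so it sheds load before the OOM killer gets involved. Set your container's memory limit explicitly; 512m is reasonable for agent mode.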

docker-compose.yaml (memory limits)
services:
  otelcol:
    # ...
    deploy:
      resources:
        limits:
          memory: 512m
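
To see what those percentages mean against a 512m container, here is the same limiter written with absolute values. This is a sketch of the arithmetic, not a recommendation to switch: 75% of 512 MiB is 384 MiB, and 20% is roughly 102 MiB.

```yaml
# Fragment: memory_limiter with explicit values for a 512 MiB container.
processors:
  memory_limiter:
    check_interval: 5s
    limit_mib: 384         # hard limit: 75% of 512 MiB
    spike_limit_mib: 102   # soft limit = 384 - 102 = 282 MiB
```

The percentage form is usually nicer because it follows the container limit automatically. The absolute form is useful when the Collector runs outside a container and you do not want it sized against the whole host's RAM.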

Filelog backpressure. If you have containers logging at 10k lines/second, filelog will faithfully try to keep up. Filter before export:

processors:
  filter/drop-debug-logs:
    logs:
      exclude:
        match_type: regexp
        record_attributes:
          - key: level
            value: "DEBUG|TRACE"
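
Defining the filter is only half the job; it has to be listed in the pipeline to take effect. A sketch of the logs pipeline from the main config with the filter wired in is below. Note the assumption: the filter matches a record attribute named level, and Docker's JSON log wrapper does not provide one by itself, so this only bites for logs where something upstream (your app's OTLP export or an extra filelog operator) sets that attribute.

```yaml
# Fragment: logs pipeline with the drop filter placed before batching.
service:
  pipelines:
    logs:
      receivers: [filelog, journald, otlp]
      processors: [memory_limiter, filter/drop-debug-logs, resource, batch]
      exporters: [loki]
```

Putting the filter right after memory_limiter means dropped records never reach the batch processor, so they cost as little memory as possible.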

When NOT to Use the OTel Collector

The Collector is not a silver bullet, and there are real cases where you should reach for something else:

Ultra-low-power devices. A Raspberry Pi Zero or an ESP32 gateway node — anything with under 128 MB RAM — is not a good home for the Collector. fluent-bit at 1-2 MB RSS still wins there.

Specialized exporters with no OTel support. If you are shipping to InfluxDB and need line protocol with specific measurement names, telegraf is still the right tool. The OTel InfluxDB exporter exists but is immature.

When your whole stack is already Prometheus-native. If you have Prometheus, Alertmanager, Thanos, and you are happy — don’t fix what isn’t broken. The Collector adds value when you are adding logs and traces to an existing metrics setup, not when you want to replace a working metrics stack.

When you want a UI. If your team prefers the Grafana Agent Flow UI or Vector's topology graph, those tools offer comparable pipelines with a friendlier operational model.


The Bottom Line

Six processes that each have their own config syntax, log format, and failure mode are not observability infrastructure — they are a support ticket waiting to happen.

The OpenTelemetry Collector gives you one config file, one service to restart, one health check endpoint, and one place to look when data stops flowing. The migration is not all-or-nothing — drop it alongside your existing agents, start routing one signal type through it, and rip out the old stuff when you trust what you see.

On a typical home server doing normal things, it runs in under 100 MB and doesn’t need babysitting. That’s the bar. Most of your current agents don’t clear it.

Your 2 AM self — the one who’s getting paged because promtail silently stopped and nobody noticed for three hours — will thank you.

