OpenTelemetry Collector: One Pipeline to Rule Them All

You opened a shell on your home server last week to check disk usage and noticed six processes you forgot you were running: promtail, fluent-bit, node_exporter, telegraf, cadvisor, and some Python script you wrote in 2023 that scrapes a temperature sensor. Each one has its own config file, its own service unit, its own memory footprint, and its own way of lying to you when something breaks.

There is a better way. The OpenTelemetry Collector, otelcol, is a single binary that speaks most of the observability protocols you are likely to run into, and it will gladly eat all of that chaos and ship it somewhere useful.

Here is how to replace most of your agent zoo with one well-configured pipeline.

What the Collector Actually Is

The OTel Collector is not a framework or a library. It is a deployable binary with a declarative config that wires together three types of components:

Receivers: pull or accept data in. OTLP gRPC/HTTP, Prometheus scrape, hostmetrics, filelog, journald, kafka, docker_stats, and dozens of others in the contrib distribution.
Processors: transform data in flight. Batch for efficiency, attributes to add/drop labels, filter to drop noise, transform for full OTTL expressions, memory_limiter to keep you out of OOM-kill territory.
Exporters: push data out. OTLP to Tempo or any OTel backend, Prometheus remote write, Loki, Elasticsearch, Kafka, and more.

You connect these in pipelines, each handling one signal type (metrics, logs, or traces). Multiple receivers can feed one pipeline, and one pipeline can fan out to multiple exporters.

receivers → [optional processors] → exporters

That’s it. The rest is just YAML.
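As a minimal sketch of that wiring, here is about the smallest pipeline that does something visible: one OTLP receiver, the batch processor, and the debug exporter printing to the console. The components are standard ones; the port and verbosity are only illustrative choices.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"

processors:
  batch:

exporters:
  debug:
    verbosity: basic

service:
  pipelines:
    traces:
      receivers: [otlp]      # data comes in here
      processors: [batch]    # runs in the order listed
      exporters: [debug]     # and leaves here
```

Defining a component under receivers, processors, or exporters is not enough on its own; it only runs once a pipeline under service references it.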

Agent Mode vs Gateway Mode

Before you write a single line of config, decide how you are deploying:

Agent mode runs otelcol on every host, collecting local data (host metrics, container logs, local app telemetry) and either shipping directly to backends or forwarding to a central gateway.

Gateway mode runs one or a few otelcol instances that receive data from many agents (or apps that speak OTLP) and centralize the fan-out to backends.

For a home lab with one to five servers, agent mode is almost always correct. You run one Collector per host, maybe with a single gateway instance if you want every agent to send to one place instead of to each backend directly. In the agent case, the config you will write lives on the same box as what it monitors.
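If you do add a gateway, the agent side changes very little. The fragment below is a sketch, assuming a gateway Collector reachable at the hypothetical hostname gateway.lan with an OTLP gRPC receiver on port 4317; the receivers and processors it references are defined elsewhere in the agent's config.

```yaml
# Agent side: forward to the gateway instead of to each backend.
# "gateway.lan" is a hypothetical hostname; substitute your own.
exporters:
  otlp/gateway:
    endpoint: "gateway.lan:4317"
    tls:
      insecure: true   # plaintext gRPC; only reasonable on a trusted LAN

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [memory_limiter, batch]
      exporters: [otlp/gateway]
```

The gateway then needs only an otlp receiver plus the backend exporters, which keeps backend addresses and credentials in one place.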

The Config That Replaces Your Agent Zoo

This is a complete config.yaml, not a fragment. It collects host metrics and container logs, ships metrics to Prometheus (remote write) and logs to Loki, and accepts OTLP traces to forward to Tempo. The backend hostnames (prometheus, loki, tempo) assume those services sit on the same Docker network as the Collector.

extensions:
  health_check:
    endpoint: "0.0.0.0:13133"
  pprof:
    endpoint: "0.0.0.0:1777"
  zpages:
    endpoint: "0.0.0.0:55679"

receivers:
  # Host metrics — replaces node_exporter for most use cases
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      disk:
      load:
      filesystem:
        exclude_mount_points:
          mount_points: ["/dev", "/proc", "/sys", "/run/lock"]
          match_type: strict
        exclude_fs_types:
          fs_types: ["autofs", "binfmt_misc", "cgroup", "configfs", "debugfs",
                     "devpts", "devtmpfs", "fusectl", "hugetlbfs", "mqueue",
                     "nsfs", "overlay", "proc", "procfs", "pstore",
                     "rpc_pipefs", "securityfs", "selinuxfs", "squashfs",
                     "sysfs", "tracefs"]
          match_type: strict
      memory:
      network:
        exclude:
          interfaces: ["lo"]
          match_type: strict
      paging:
      processes:

  # Container logs: replaces promtail/fluent-bit tailing Docker's json-file logs
  filelog:
    include:
      - /var/lib/docker/containers/*/*.log
    include_file_path: true
    include_file_name: false
    operators:
      - type: json_parser
        id: parse-docker-json
        on_error: send
      - type: move
        from: attributes.log
        to: body
      - type: regex_parser
        id: extract-container-id
        regex: '^/var/lib/docker/containers/(?P<container_id>[^/]+)/'
        parse_from: attributes["log.file.path"]
      - type: move
        from: attributes.container_id
        to: resource["container.id"]

  # Journald — replaces promtail's journald scrape
  journald:
    directory: /run/log/journal
    units:
      - nginx
      - caddy
      - docker
    priority: info

  # Accept OTLP from apps (your own services that export traces/metrics)
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

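  # Scrape Prometheus endpoints; here, the Collector's own metrics.
  # Add jobs for anything else you haven't migrated yet.
  prometheus:
    config:
      scrape_configs:
        - job_name: "otelcol-self"
          scrape_interval: 30s
          static_configs:
            # 0.0.0.0 is a listen address, not a scrape target
            - targets: ["localhost:8888"]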

processors:
  # Always include memory_limiter — first in every pipeline
  memory_limiter:
    check_interval: 5s
    limit_percentage: 75
    spike_limit_percentage: 20

  # Batch before shipping — reduces API calls and cost
  batch:
    send_batch_size: 1000
    timeout: 10s

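  # Stamp data with host identity. `insert` only fills in missing values,
  # so apps that send their own service.name over OTLP keep it.
  resource:
    attributes:
      - key: service.name
        value: "homelab-host"
        action: insert
      - key: service.instance.id
        # Does nothing unless something upstream has set host.name
        from_attribute: host.name
        action: insert
      - key: host.environment
        value: "homelab"
        action: insert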

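  # Drop noisy filesystem metrics for tmpfs mounts.
  # hostmetrics reports the filesystem type as the `type` data point
  # attribute, not as a resource attribute.
  filter/drop-tmpfs:
    error_mode: ignore
    metrics:
      datapoint:
        - 'attributes["type"] == "tmpfs" or attributes["type"] == "ramfs"'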

exporters:
  # Prometheus remote write → Mimir / Thanos Receive / bare Prometheus
  # (Prometheus itself must be started with --web.enable-remote-write-receiver)
  prometheusremotewrite:
    endpoint: "http://prometheus:9090/api/v1/write"
    tls:
      insecure: true
    resource_to_telemetry_conversion:
      enabled: true

  # Loki for logs — via Loki's native OTLP endpoint.
  # The old `loki` exporter is deprecated; Loki v3+ ingests OTLP directly,
  # so ship logs with otlphttp instead.
  otlphttp/loki:
    logs_endpoint: "http://loki:3100/otlp/v1/logs"
    tls:
      insecure: true

  # Tempo for traces
  otlp/tempo:
    endpoint: "http://tempo:4317"
    tls:
      insecure: true

  # Debug exporter: writes a summary of each batch to the Collector's own log output, useful during setup
  debug:
    verbosity: basic

service:
  extensions: [health_check, pprof, zpages]

  pipelines:
    metrics:
      receivers: [hostmetrics, prometheus, otlp]
      processors: [memory_limiter, filter/drop-tmpfs, resource, batch]
      exporters: [prometheusremotewrite]

    logs:
      receivers: [filelog, journald, otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlphttp/loki]

    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlp/tempo]

  telemetry:
    logs:
      level: "info"
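    metrics:
      # `address` is deprecated in recent releases; a pull reader exposes the same endpoint
      readers:
        - pull:
            exporter:
              prometheus:
                host: "0.0.0.0"
                port: 8888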

A few things worth calling out:

memory_limiter is always first. If you put it after batch, the batch can swell past your limit before the limiter sees it. Limiter goes first, always.

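resource processor stamps identity on everything. service.name and service.instance.id are OTel semantic conventions that downstream tools such as Tempo, Grafana, and Jaeger use to correlate signals. Setting them here gives host-level data a consistent identity. For service.name, prefer the insert action over upsert if apps send telemetry through the otlp receiver: upsert would overwrite every app’s own name with homelab-host, and all your traces would appear to come from one service.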
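Apps that export OTLP usually set their own identity through the standard OpenTelemetry SDK environment variables. The Compose fragment below is a sketch: myapp and its image are hypothetical, and it assumes the app's SDK supports OTLP over HTTP and that the Collector is reachable as otelcol on the monitoring network.

```yaml
services:
  myapp:                            # hypothetical service
    image: example/myapp:latest     # placeholder image
    environment:
      - OTEL_SERVICE_NAME=myapp     # becomes the service.name resource attribute
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4318
      - OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
    networks:
      - monitoring
```

With this in place, a resource processor entry using action: insert leaves myapp alone and only fills in service.name for data that arrives without one.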

debug exporter is your friend during setup. Add it temporarily to any pipeline and watch your terminal. Remove it when things work.
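For example, to watch what the logs pipeline is producing, the debug exporter can be appended to that pipeline's exporter list. This fragment reuses the component names from the config above and changes only one line.

```yaml
service:
  pipelines:
    logs:
      receivers: [filelog, journald, otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlphttp/loki, debug]   # debug added temporarily
```

With verbosity: basic you get per-batch counts; switching the exporter to verbosity: detailed prints every record with its attributes, which is noisy but useful when checking that parsing operators did what you expected.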

Running It

Docker Compose, because that is where most home lab gear ends up:

services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.130.0
    container_name: otelcol
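    restart: unless-stopped
    # Docker's container log files are root-owned; the image's default
    # non-root user cannot read them, so filelog would see nothing.
    user: "0:0"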
    volumes:
      - ./otelcol-config.yaml:/etc/otelcol-contrib/config.yaml:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /run/log/journal:/run/log/journal:ro
      - /proc:/hostproc:ro
      - /sys:/hostsys:ro
    environment:
      - HOST_PROC=/hostproc
      - HOST_SYS=/hostsys
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "13133:13133" # Health check
      - "8888:8888"   # Self-metrics
    networks:
      - monitoring
    command: ["--config=/etc/otelcol-contrib/config.yaml"]

networks:
  monitoring:
    external: true

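The otel/opentelemetry-collector-contrib image includes the community receivers and exporters. The core otel/opentelemetry-collector image does not include filelog or journald, so use contrib. One caveat: the journald receiver shells out to journalctl, which the stock contrib image does not ship, so the journald section only works from a custom image that adds that binary or from a Collector installed directly on the host.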
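The hostmetrics receiver also has a root_path setting, which its documentation describes for collecting host metrics from inside a container on Linux. The sketch below shows that approach as an alternative to the HOST_PROC and HOST_SYS variables; /hostfs is just a conventional mount point, and it is worth checking the receiver's README for the version you run.

```yaml
# docker-compose.yaml (fragment): mount the host root read-only
services:
  otelcol:
    volumes:
      - /:/hostfs:ro
```

```yaml
# otelcol-config.yaml (fragment): tell hostmetrics where the host root is
receivers:
  hostmetrics:
    root_path: /hostfs
```

If you switch to root_path, drop the HOST_PROC and HOST_SYS environment variables and the /hostproc and /hostsys mounts so the two mechanisms do not disagree. A side benefit is that the filesystem scraper then sees the host's mounts instead of the container's.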

Start it:

docker compose up -d otelcol
docker compose logs -f otelcol

Hit the health check:

curl http://localhost:13133/
# {"status":"Server available","upSince":"...","uptime":"..."}

Migrating from Promtail

If you have Promtail shipping container logs to Loki, the mental model maps cleanly:

Promtail	OTel Collector
`scrape_configs[].static_configs`	`filelog` receiver `include` glob
`pipeline_stages`	`operators` array in filelog
`labels` block	`resource` processor attributes
`client` block	`otlphttp` exporter → Loki’s OTLP endpoint

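The main gotcha: Promtail’s Docker service discovery pulls container names from the Docker daemon. The OTel filelog receiver only reads raw log files, so you have to parse the container ID from the file path (the regex_parser in the config above does this). Neither the docker_observer extension nor the docker_stats receiver attaches names to log records: the former discovers endpoints for receiver_creator, and the latter emits container metrics.

For most home lab use cases, container ID + your own naming convention is enough. If you need container names on the log lines themselves, one option is the json-file driver’s tag log option, which writes a templated tag into each line where json_parser can pick it up. The resourcedetection processor’s docker detector will not do it: it describes the Docker host (host.name, os.type), not individual containers. It is still worth adding ahead of the resource processor, because it supplies the host.name that gets copied into service.instance.id. It needs /var/run/docker.sock mounted into the Collector container:

processors:
  resourcedetection/docker:
    # docker: host.name and os.type as reported by the Docker daemon
    # system: fills in anything the docker detector could not
    detectors: [docker, system]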
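If you want container names on log records, here is a sketch of the json-file tag approach. It assumes Docker writes the rendered tag under attrs.tag in each JSON log line, so inspect one raw line from /var/lib/docker/containers before relying on it. The service name myapp is hypothetical.

```yaml
# docker-compose.yaml (fragment): ask Docker to tag each log line
services:
  myapp:                     # hypothetical service
    logging:
      driver: json-file
      options:
        tag: "{{.Name}}"     # Docker log tag template: the container name
```

```yaml
# filelog receiver (fragment): add after the json_parser operator
    operators:
      - type: move
        from: attributes.attrs.tag
        to: resource["container.name"]
```

Containers started without the tag option have no attrs field, so the move operator will log an error for their lines and, with the default on_error behavior, pass them along unchanged.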

Migrating from Prometheus Node Exporter

The hostmetrics receiver covers the 90% case. Feature gaps worth knowing:

What hostmetrics gives you that node_exporter also gives you:

CPU (per-core, idle/user/system/iowait)
Memory (used/free/cached/buffers)
Disk I/O (read/write bytes, ops)
Filesystem usage
Network (bytes in/out, packets, errors)
Load averages
Process count

What node_exporter has that hostmetrics does not (yet):

Hardware monitoring via lm-sensors (fan speed, temperatures)
SMART disk health (smartmontools)
systemd unit states
EDAC memory error counters
Specific NFS/NFS4 stats

If you need sensor temps or SMART data, keep node_exporter running alongside the Collector and scrape it with the prometheus receiver. This is not a shameful hybrid; it is a migration strategy. Add node_exporter’s endpoint under prometheus.config.scrape_configs and route it through the same metrics pipeline.

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "node-exporter"
          scrape_interval: 30s
          static_configs:
            - targets: ["node-exporter:9100"]

One agent less is still progress.

Memory and Performance

On a home server doing light observability work, one host, a dozen containers, 30s scrape interval, expect the Collector to run comfortably under 100 MB RSS. If you see it climbing, these are your levers:

Batch processor tuning. Larger batches mean fewer API calls to exporters but higher memory peaks. Start with send_batch_size: 1000 and timeout: 10s. If your Loki or Prometheus can’t keep up, drop the batch size.

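Memory limiter configuration. limit_percentage: 75 sets a hard limit at 75% of the memory available to the Collector, which inside a container is the container’s memory limit. With spike_limit_percentage: 20 the soft limit sits at 55%: above it the Collector starts refusing incoming data, and above the hard limit it also forces garbage collection, all before the OOM killer gets involved. Set the container’s memory limit explicitly; 512m is reasonable for agent mode.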

services:
  otelcol:
    # ...
    deploy:
      resources:
        limits:
          memory: 512m
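To make the thresholds concrete, here is the same memory_limiter block annotated with the arithmetic for a 512 MiB container limit. The percentages are the ones from the config above; the MiB figures are just those percentages worked out.

```yaml
processors:
  memory_limiter:
    check_interval: 5s
    limit_percentage: 75        # hard limit: 0.75 * 512 MiB = 384 MiB
    spike_limit_percentage: 20  # spike allowance: 0.20 * 512 MiB = 102.4 MiB
    # soft limit = hard limit - spike allowance = 384 - 102.4 = 281.6 MiB
    # above ~282 MiB: incoming data is refused
    # above 384 MiB: data is refused and garbage collection is forced
```

If the Collector regularly sits near the soft limit, raising the container limit or shrinking batches is usually a better fix than raising the percentages.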

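Filelog backpressure. If you have containers logging at 10k lines/second, filelog will faithfully try to keep up. Filter before export. The example below matches on a level attribute, so it only helps if your parsing has put one on the record, and the processor has to be added to the logs pipeline’s processors list to take effect: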

processors:
  filter/drop-debug-logs:
    logs:
      exclude:
        match_type: regexp
        record_attributes:
          - key: level
            value: "DEBUG|TRACE"
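The filter processor also accepts OTTL conditions, which can match on parsed severity instead of a string attribute. The sketch below assumes your log parsing sets a severity on each record; records with no severity are deliberately left alone. It also shows the pipeline wiring, reusing the component names from the main config.

```yaml
processors:
  filter/drop-debug-logs:
    error_mode: ignore
    logs:
      log_record:
        # Drop records below INFO, but keep records with no parsed severity
        - 'severity_number != SEVERITY_NUMBER_UNSPECIFIED and severity_number < SEVERITY_NUMBER_INFO'

service:
  pipelines:
    logs:
      receivers: [filelog, journald, otlp]
      processors: [memory_limiter, filter/drop-debug-logs, resource, batch]
      exporters: [otlphttp/loki]
```

Placing the filter right after memory_limiter means dropped records never reach the batch processor, so they cost as little memory as possible.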

When NOT to Use the OTel Collector

The Collector is not a silver bullet, and there are real cases where you should reach for something else:

Ultra-low-power devices. A Raspberry Pi Zero or an ESP32 gateway node, anything with under 128 MB RAM, is not a good home for the Collector. fluent-bit at 1-2 MB RSS still wins there.

Specialized exporters with no OTel support. If you are shipping to InfluxDB and need line protocol with specific measurement names, telegraf is still the right tool. The OTel InfluxDB exporter exists but is immature.

When your whole stack is already Prometheus-native. If you have Prometheus, Alertmanager, Thanos, and you are happy, don’t fix what isn’t broken. The Collector adds value when you are adding logs and traces to an existing metrics setup, not when you want to replace a working metrics stack.

When you want a UI. If your team prefers Grafana Alloy’s built-in UI (Alloy is the successor to the now-EOL Grafana Agent) or Vector’s topology graph, those tools offer comparable pipelines with a friendlier operational model.

The Bottom Line

Six processes that each have their own config syntax, log format, and failure mode are not observability infrastructure; they are a support ticket waiting to happen.

The OpenTelemetry Collector gives you one config file, one service to restart, one health check endpoint, and one place to look when data stops flowing. The migration is not all-or-nothing: drop it alongside your existing agents, start routing one signal type through it, and rip out the old stuff when you trust what you see.

On a typical home server doing normal things, it runs in under 100 MB and doesn’t need babysitting. That’s the bar. Most of your current agents don’t clear it.

Your 2 AM self, the one who’s getting paged because promtail silently stopped and nobody noticed for three hours, will thank you.
