Logs Are Evidence, But Only If You Can Afford to Keep Them
Every service produces logs. Logs are how you find out why something broke, when it started breaking, and what it was doing before it broke. This is not optional information.
The problem is scale. A single Docker host running a dozen containers can produce millions of log lines per day. Reading them with docker logs -f and grep works until it doesn’t — until you need to correlate logs from three services to understand an incident, or find an error that happened six hours ago across 40 containers.
Centralized logging is the answer: ship all logs to one place, with timestamps and source labels, and query them together. The question is which log aggregation stack provides that one place, and whether you have the hardware budget to run it.
ELK (Elasticsearch, Logstash, Kibana) is the traditional enterprise answer. It’s extraordinarily capable. It will full-text index every character of every log line, giving you Google-grade search across your entire log corpus. It will also eat your VPS budget for breakfast and ask for a second serving.
Grafana Loki is the modern self-hoster’s answer. It doesn’t index log content — it indexes labels (like “service=nginx, host=web1, level=error”). It stores compressed log chunks. It’s dramatically lighter. The query language is less powerful than Elasticsearch’s, but it covers 90% of what most people actually need from their logs.
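The difference between the two indexing models is easier to see in miniature. Here is a toy Python sketch (illustrative only, not how either system is actually implemented) contrasting a full-text inverted index with Loki's label-then-scan approach:

```python
from collections import defaultdict

# Made-up log data: (label set, log line)
logs = [
    ({"service": "nginx", "level": "error"}, "upstream timed out"),
    ({"service": "nginx", "level": "info"},  "GET /healthz 200"),
    ({"service": "api",   "level": "error"}, "db connection reset"),
]

# Elasticsearch-style: an inverted index over every token of every line.
# Any word is a direct lookup, at the cost of storing the whole index.
inverted = defaultdict(set)
for i, (_, line) in enumerate(logs):
    for token in line.split():
        inverted[token].add(i)
assert inverted["timed"] == {0}

# Loki-style: index only the label sets; the content sits in (compressed)
# chunks per stream. A query first selects streams by label, then scans
# the matching chunks for the text.
streams = defaultdict(list)
for labels, line in logs:
    streams[tuple(sorted(labels.items()))].append(line)

key = (("level", "error"), ("service", "nginx"))
hits = [line for line in streams[key] if "timed" in line]
assert hits == ["upstream timed out"]
```

The label index stays tiny no matter how long the log lines get, which is exactly why Loki's RAM footprint is so much smaller; the price is that the final `"timed" in line` step is a linear scan.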
The ELK Stack: Power at a Price
ELK is shorthand for three components:
- Elasticsearch: the storage and search engine — a distributed, full-text indexed document store
- Logstash: the ingestion pipeline — receives logs, parses them, enriches them, sends them to Elasticsearch
- Kibana: the UI — dashboards, queries, visualizations against Elasticsearch
Beats (Filebeat, Metricbeat, etc.) often replace Logstash for simpler ingestion scenarios — they’re lighter weight shippers that run on each host and send logs directly to Elasticsearch.
What ELK Does Well
Full-text search: Elasticsearch indexes every token in every log line. You can search for any string, anywhere in your logs, instantly. This is genuinely powerful for debugging — kubectl logs | grep but at scale and across time.
Structured query language: Kibana Query Language (KQL) and Elasticsearch Query DSL give you expressive search: level:error AND service:api AND NOT message:"connection reset". Regular expressions, ranges, nested queries — all supported.
Aggregations and analytics: Elasticsearch’s aggregation framework lets you do serious log analytics — histogram of error rates over time, top 10 slowest endpoints by average response time, count of log events per service per hour. This is closer to a database than a log viewer.
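As an illustration, "error counts per service per hour" is a single aggregation request. A sketch in Query DSL, sent to `POST /logs-*/_search`, assuming your documents have `level`, `service`, and `@timestamp` fields (the exact field names depend on your mapping):

```json
{
  "size": 0,
  "query": { "term": { "level": "error" } },
  "aggs": {
    "errors_over_time": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "1h" },
      "aggs": {
        "by_service": {
          "terms": { "field": "service.keyword", "size": 10 }
        }
      }
    }
  }
}
```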
Mature ecosystem: Elastic has been at this since 2010. The product is polished, documentation is extensive, there are official APM agents for most languages, and Kibana’s dashboards are genuinely nice.
The Problems with ELK
Memory hunger: Elasticsearch is a JVM application. The recommended minimum heap size is 4GB, but in practice a single-node cluster with meaningful log volume needs 8-16GB to be usable. The Kibana node needs another 1-2GB. Logstash, if you’re running it, is another 1-2GB. On a homelab or small VPS, this is the entire machine.
Operational complexity: Elasticsearch requires care. Index management (old indices fill up disk), shard configuration (too many shards kills performance), upgrade paths (Elasticsearch upgrades are famously tricky), cluster state management — there’s real operational overhead that doesn’t exist for lighter tools.
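The index-management piece usually means an ILM policy. A minimal sketch, with an illustrative policy name, sent to `PUT _ilm/policy/logs-retention`, that rolls indices over daily and deletes them after a week:

```json
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_primary_shard_size": "10gb" }
        }
      },
      "delete": {
        "min_age": "7d",
        "actions": { "delete": {} }
      }
    }
  }
}
```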
Licensing drift: The Elasticsearch license changed in 2021 (from Apache 2.0 to SSPL), which is why AWS forked it as OpenSearch. The free “Basic” tier covers self-hosting but some features require paid licenses.
Cost at cloud scale: If you’re not self-hosting and use Elastic Cloud, it gets expensive fast once your log volume grows. This is the scenario that drove many teams to Loki.
Running ELK with Docker Compose
```yaml
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
    volumes:
      - es-data:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
    # WARNING: this service alone will consume ~2-4GB RAM minimum

  kibana:
    image: docker.elastic.co/kibana/kibana:8.12.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

  filebeat:
    image: docker.elastic.co/beats/filebeat:8.12.0
    user: root
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro

volumes:
  es-data:
```
With filebeat.yml:
```yaml
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
```
Grafana Loki: Lighter and Cheaper
Loki was built by Grafana Labs with a specific design philosophy: don’t index log content. Only index metadata labels. Store the actual log content compressed in chunks (locally or in object storage like S3/MinIO).
This means:
- Much lower storage costs (compressed chunks vs. indexed terms)
- Much lower RAM requirements (no Elasticsearch inverted index in memory)
- Simpler operations (no shard management, no index rotation)
- Less query power (no full-text search — you stream-filter instead)
The tradeoff is real: if you need to search logs you haven't pre-labeled, Loki is slower than Elasticsearch because it has to scan the raw compressed chunks. For most self-hosted use cases, this is fine. In a compliance scenario where you need to search millions of log lines for an arbitrary string from three years ago, ELK is better.
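A related operational win: retention in Loki is a few lines of configuration rather than ILM policies. A sketch of the compactor-based retention available in Loki 2.x, with illustrative values (check the retention docs for your exact version):

```yaml
# Fragment of the Loki config file (e.g. local-config.yaml)
limits_config:
  retention_period: 744h        # keep ~31 days of logs

compactor:
  working_directory: /loki/compactor
  retention_enabled: true       # compactor enforces retention_period
  retention_delete_delay: 2h    # grace period before chunks are deleted
```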
The Loki Stack
Loki usually runs with two companions:
- Promtail: a lightweight agent that tails log files or Docker container logs and ships them to Loki with labels
- Grafana: the UI — Loki doesn’t have its own dashboard, it integrates with Grafana as a data source
This is elegant if you're already running Grafana for metrics (Prometheus + Grafana). Adding Loki just means adding a second data source to the same Grafana instance.
Running Loki + Promtail + Grafana
```yaml
version: "3"
services:
  loki:
    image: grafana/loki:2.9.0
    restart: always
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - loki-data:/loki

  promtail:
    image: grafana/promtail:2.9.0
    restart: always
    volumes:
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock
      - ./promtail-config.yml:/etc/promtail/config.yml:ro
    command: -config.file=/etc/promtail/config.yml

  grafana:
    image: grafana/grafana:latest
    restart: always
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=changeme
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - loki

volumes:
  loki-data:
  grafana-data:
```
With promtail-config.yml:
```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'logstream'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: 'service'
```
This configuration auto-discovers Docker containers and ships their logs with container, logstream, and service labels. In Grafana, you add Loki as a data source (http://loki:3100) and start querying.
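If you prefer files over clicking, Grafana can also provision the data source at startup. A minimal sketch, assuming the file is mounted into the grafana container under /etc/grafana/provisioning/datasources/:

```yaml
# e.g. ./grafana-datasources.yml mounted at
# /etc/grafana/provisioning/datasources/loki.yml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: true
```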
LogQL Basics
Loki’s query language is LogQL. It has two modes: log queries (return log lines) and metric queries (aggregate log data into metrics).
```logql
# All logs from the nginx container
{container="nginx"}

# Filter to lines containing "error"
{container="nginx"} |= "error"

# Filter using a regex
{service="api"} |~ "status=[45][0-9][0-9]"

# Parse JSON log lines, then filter on a parsed field
{service="api"} | json | status >= 500

# Per-second rate of error lines, computed over 1-minute windows
rate({service="api"} |= "error" [1m])

# Error rate broken down by service
sum by (service) (
  rate({job="docker"} |= "error" [5m])
)
```
The {label="value"} selector is mandatory — Loki requires at least one label selector to narrow down which log streams to query. You can’t just write |= "error" without a label selector. This is a design constraint that enforces good labeling practices but surprises people coming from Elasticsearch where you can search everything by default.
Loki vs ELK: Side by Side
| Feature | Loki | ELK Stack |
|---|---|---|
| RAM requirement | ~256MB-512MB | 8-16GB (Elasticsearch) |
| Full-text search | No (label + content filter) | Yes (indexed) |
| Query language | LogQL | KQL / ES Query DSL |
| Storage efficiency | High (compressed chunks) | Lower (indexed terms) |
| Setup complexity | Low | High |
| Index management | None | Required (ILM policies) |
| UI | Grafana (separate) | Kibana (included) |
| Alerting | Via Grafana | Via Kibana/ElastAlert |
| License | AGPLv3 | SSPL / Elastic License |
| Cost (cloud) | Grafana Cloud free tier | Elastic Cloud (expensive) |
| Best for | Kubernetes/container logs | Enterprise log analytics |
| Docker Compose | 3 small containers | 2-3 heavy containers |
| Scales to | Millions of lines/day easily | Billions with cluster |
Which One Is Right for You
Choose Loki when:
- You’re on a VPS or homelab with limited RAM
- You’re already running Grafana for metrics
- You need container/Kubernetes log aggregation
- Your log query needs are “find errors from the last hour in service X”
- Budget matters
Choose ELK when:
- You need full-text search across arbitrary log content
- You’re doing serious log analytics (aggregations, statistical queries)
- You have the RAM budget (dedicated server or cloud)
- Compliance requires detailed searchable log retention
- You need APM/tracing integration (Elastic’s APM ecosystem is excellent)
For a typical self-hosted homelab or small production environment with a handful of Docker services, Loki is almost always the right answer. The resource savings are real, the setup is genuinely simple, and LogQL covers the actual log query scenarios you’ll encounter 95% of the time.
Save Elasticsearch for when you actually need what only Elasticsearch can provide. “I can full-text search everything” is a capability that costs 8GB of RAM to maintain at idle. Make sure you actually need it before paying that bill.