You’ve got five services throwing logs at you. A user reports a bug. You need to search across all of them, correlate timestamps, filter by request ID. Your laptop has 16 GB of RAM. Elasticsearch wants 8 of those just to breathe.
Welcome to the eternal logging question: ELK or Loki?
Here’s the thing: they solve the same problem (centralized log collection and search) in fundamentally different ways. ELK is the Swiss Army knife with the fancy case. Loki is the focused tool that does one thing better. Knowing which one is which will save you from a painful rebuild six months in.
Why You Need Centralized Logging
In a single-server world, you SSH in and tail -f the logs. Done. But the moment you add a second container, a second service, a second host — logs scatter everywhere.
Your authentication service crashes at 2:47 AM. The error appears in Container A. But the request came from Container B. Container B forwarded it to Container C, which called a database running on Host 2. The actual error is in Container C’s logs, but you’re checking Container A.
Centralized logging glues them together. One place. One search. Everything correlated by timestamp, request ID, service name.
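To make "correlated by timestamp and request ID" concrete, here is a minimal Python sketch of what a centralized system does for you on every search. The service names, the `request_id` field, and the sample records are all made up for illustration; a real log store does this at scale and on disk, not in a dict.

```python
from datetime import datetime

# Hypothetical records from three services; field names are assumptions.
LOGS = {
    "gateway": [
        {"ts": "2024-01-01T02:47:00.120Z", "request_id": "r-42", "msg": "POST /login"},
        {"ts": "2024-01-01T02:47:00.500Z", "request_id": "r-43", "msg": "GET /health"},
    ],
    "auth": [
        {"ts": "2024-01-01T02:47:00.180Z", "request_id": "r-42", "msg": "validating token"},
    ],
    "db-proxy": [
        {"ts": "2024-01-01T02:47:00.310Z", "request_id": "r-42", "msg": "connection timeout"},
    ],
}


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing Z into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def trace(request_id: str, logs_by_service: dict) -> list:
    """Merge every service's records for one request into a single timeline."""
    merged = [
        (parse_ts(rec["ts"]), service, rec["msg"])
        for service, records in logs_by_service.items()
        for rec in records
        if rec["request_id"] == request_id
    ]
    return sorted(merged)  # tuples sort by timestamp first


timeline = trace("r-42", LOGS)
for ts, service, msg in timeline:
    print(ts.isoformat(), service, msg)
```

The point is the shape of the result: one ordered timeline that crosses service boundaries, which is exactly what you cannot get from three separate terminals running tail -f.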
The argument isn’t “do you need it?” — it’s “how much RAM are you willing to part with?”
ELK: The Full-Text Powerhouse
ELK stands for Elasticsearch (storage), Logstash (pipeline), Kibana (UI). It’s been the industry standard since before Docker was cool.
How it works: Logstash ingests logs from anywhere (syslog, files, APIs, message queues). It parses, filters, and mutates them. Elasticsearch indexes everything — every word, every field, every log line becomes searchable content. Kibana lets you query it.
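"Indexes everything" is easier to picture with a toy. The Python sketch below builds a tiny inverted index, the general data structure behind full-text search. It is a deliberate simplification and not how Elasticsearch is implemented internally: the tokenizer, the sample lines, and the function names are assumptions for illustration.

```python
import re
from collections import defaultdict


def build_inverted_index(lines):
    """Map each lowercase token to the set of line numbers that contain it."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for token in re.findall(r"[a-z0-9]+", line.lower()):
            index[token].add(line_no)
    return index


def search(index, *terms):
    """Return the line numbers that contain every term (an AND query)."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []


LINES = [
    "db-01 query timeout after 5200ms",
    "auth login ok user=alice",
    "db-01 query ok 12ms",
    "cache timeout contacting db-01",
]

index = build_inverted_index(LINES)
hits = search(index, "timeout", "db")
print([LINES[i] for i in hits])
```

Lookups are fast because the work happened at ingest time. The cost is that the index holds an entry for every distinct token, which is where the memory and disk appetite comes from.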
What you get: Full-text search. Complex parsing rules. Built-in alerting. Weeks of detailed log retention. You can search “what requests hit the database server that also mention ‘timeout’ and were slower than 5 seconds?” and get an answer in milliseconds.
What it costs: Memory. CPU. Storage. Elasticsearch with a month of logs at any real scale easily needs 16–32 GB RAM, more if you care about speed. A small cluster (three nodes for redundancy) becomes three servers worth of resources. Logstash adds overhead. Kibana adds more. You’re looking at a weekend to set it up right, another weekend to tune it, and ongoing maintenance to prevent it from eating all your disk space.
Real talk: ELK is for teams that run infrastructure seriously (cloud-native companies, platforms, anyone with a dedicated DevOps person and a budget). It’s the right call if you need to search “all logs containing X” across a year of data, or if compliance requires 12 months of retention and full-text auditability.
Loki: The Prometheus-Native Alternative
Loki shipped in 2018 from Grafana Labs and asks a radically different question: What if we only indexed labels, not content?
Instead of indexing every word in every log line, Loki indexes only structured metadata: service name, pod name, environment, hostname. The log content itself is stored as-is in compressed chunks and is never indexed, so there is no Elasticsearch-style content search. You filter by labels, then optionally grep within that filtered stream.
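The label-only idea fits in a few lines of Python. This is a toy model under stated assumptions, not Loki's storage engine: the class name, the in-memory dict, and the sample labels are all hypothetical. What it does show accurately is the two-step query: pick streams by label, then scan their raw lines.

```python
from collections import defaultdict


class TinyLabelStore:
    """Index label sets only; log lines are kept as opaque text."""

    def __init__(self):
        # frozenset of (label, value) pairs -> list of raw lines
        self.streams = defaultdict(list)

    def push(self, labels: dict, line: str) -> None:
        self.streams[frozenset(labels.items())].append(line)

    def query(self, selector: dict, contains: str = "") -> list:
        """Select streams by label equality, then substring-scan their lines."""
        wanted = set(selector.items())
        results = []
        for label_set, lines in self.streams.items():
            if wanted <= label_set:  # every selector pair is present
                results.extend(line for line in lines if contains in line)
        return results


store = TinyLabelStore()
store.push({"service": "auth", "env": "prod"}, "login ok user=alice")
store.push({"service": "auth", "env": "prod"}, "error: token expired")
store.push({"service": "billing", "env": "prod"}, "error: card declined")

auth_errors = store.query({"service": "auth"}, contains="error")
print(auth_errors)
```

Notice that the index size depends on the number of distinct label sets, not on the number of words logged. That is the whole memory story in miniature.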
How it works: Promtail (the log agent) scrapes logs from files or Docker containers, tags them with labels, and ships them to Loki. Loki compresses them into chunks and writes those to object storage such as S3, or to the local filesystem on a single node. Grafana queries Loki using LogQL (a simple query language). You can tail logs live or dig into recent history.
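If you are curious what an agent actually sends, the push endpoint takes JSON: a list of streams, each with a label set and a list of [timestamp, line] pairs, where the timestamp is a string of nanoseconds since the Unix epoch. The Python sketch below builds such a body. The helper names and the one-nanosecond spacing between lines are my own choices for illustration, and the `send` function assumes a Loki instance reachable at the URL you pass in.

```python
import json
import time
import urllib.request


def build_push_payload(labels: dict, lines: list, now_ns=None) -> str:
    """Build a JSON body for POST /loki/api/v1/push containing one stream."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Timestamps are strings of nanoseconds since the Unix epoch.
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return json.dumps({"streams": [{"stream": labels, "values": values}]})


def send(url: str, body: str) -> int:
    """POST the body to Loki and return the HTTP status code (needs a running Loki)."""
    request = urllib.request.Request(
        url, data=body.encode("utf-8"), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status


body = build_push_payload(
    {"service": "auth", "env": "dev"},
    ["login ok user=alice", "error: token expired"],
    now_ns=1_700_000_000_000_000_000,
)
print(body)
# send("http://localhost:3100/loki/api/v1/push", body)  # uncomment with Loki running
```

You will rarely write this by hand, since the agent handles batching and retries, but it helps to know that labels travel with every batch while the lines themselves are just strings.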
What you get: A log system that runs happily on 2–4 GB RAM. Integration with your existing Prometheus and Grafana stack (one unified platform). Lightning-fast queries if you label well. Lower storage costs because Loki compresses aggressively. Simple deployment — Loki is a single binary.
What it doesn’t do: Indexed full-text search of log content. You can grep within a filtered stream (substring or regex matching), but a search for “all logs mentioning ‘timeout’” across 50 services means brute-force scanning every stream you select, which gets slow and expensive unless you narrow by label first. Complex parsing pipelines. Long retention out of the box (retention is something you configure yourself, and small setups usually keep days or weeks rather than a year).
The trade-off is intentional: Loki trades full-text flexibility for efficiency. It’s betting that you’ll design your logs well (structured, labeled, informative) and won’t need to search blindly.
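"Design your logs well" mostly means emitting structured lines with the fields you will want to filter on. Here is one way to do that with Python's standard logging module. It is a sketch, not a recommended library: the `JsonFormatter` class and the `request_id` field are assumptions, and it writes to an in-memory buffer only to stay self-contained, where a real service would log to stdout.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            # Set via `extra=`; None when the caller did not supply one.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


buffer = io.StringIO()  # stand-in for stdout
handler = logging.StreamHandler(buffer)
handler.setFormatter(JsonFormatter())

log = logging.getLogger("auth")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False  # keep the root logger from printing a second copy

log.error("upstream timeout", extra={"request_id": "r-42"})
line = buffer.getvalue().strip()
print(line)
```

With lines like this, a label selector narrows to the right service and a plain substring filter on the request ID finds the rest, no content index required.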
Full Stack: Loki + Promtail + Grafana
Here’s a real working Docker Compose setup:

    version: '3.8'

    services:
      loki:
        image: grafana/loki:latest
        container_name: loki
        ports:
          - "3100:3100"
        volumes:
          - ./loki-config.yml:/etc/loki/local-config.yml
          - loki_data:/loki
        command: -config.file=/etc/loki/local-config.yml
        networks:
          - logging

      promtail:
        image: grafana/promtail:latest
        container_name: promtail
        volumes:
          - /var/lib/docker/containers:/var/lib/docker/containers:ro
          - /var/run/docker.sock:/var/run/docker.sock
          - ./promtail-config.yml:/etc/promtail/config.yml
        command: -config.file=/etc/promtail/config.yml
        networks:
          - logging

      grafana:
        image: grafana/grafana:latest
        container_name: grafana
        ports:
          - "3000:3000"
        environment:
          - GF_SECURITY_ADMIN_PASSWORD=admin
        volumes:
          - grafana_data:/var/lib/grafana
        networks:
          - logging

    volumes:
      loki_data:
      grafana_data:

    networks:
      logging:
        driver: bridge

Loki config (loki-config.yml):

    # Targets Loki 3.x (tsdb index, schema v13)
    auth_enabled: false

    server:
      http_listen_port: 3100

    # Single-node settings: local storage paths and an in-memory ring
    common:
      path_prefix: /loki
      storage:
        filesystem:
          chunks_directory: /loki/chunks
          rules_directory: /loki/rules
      replication_factor: 1
      ring:
        kvstore:
          store: inmemory

    ingester:
      chunk_idle_period: 3m
      max_chunk_age: 1h

    limits_config:
      max_streams_per_user: 10000   # a per-tenant limit, so it lives here
      reject_old_samples: true
      reject_old_samples_max_age: 168h

    schema_config:
      configs:
        - from: 2020-05-15
          store: tsdb
          object_store: filesystem
          schema: v13
          index:
            prefix: index_
            period: 24h

Promtail config (promtail-config.yml), which scrapes Docker container logs:

    server:
      http_listen_port: 9080
      grpc_listen_port: 0

    clients:
      - url: http://loki:3100/loki/api/v1/push

    scrape_configs:
      - job_name: docker
        docker_sd_configs:
          - host: unix:///var/run/docker.sock
        relabel_configs:
          # Docker reports names as "/loki"; the regex drops the leading slash
          - source_labels: ['__meta_docker_container_name']
            regex: '/(.*)'
            target_label: 'container'
          - source_labels: ['__meta_docker_container_image_name']
            target_label: 'image'
          # Populated only for containers that carry a Docker label named "service"
          - source_labels: ['__meta_docker_container_label_service']
            target_label: 'service'

Spin it up: docker compose up -d. Grafana is at http://localhost:3000 (admin/admin). Add Loki as a data source: http://loki:3100.
Querying: LogQL Basics
LogQL is simpler than it looks. Filter by labels, optionally grep.
Find all logs from the auth service:

    {service="auth"}

Same, but only errors:

    {service="auth"} |= "error"

Errors from auth but exclude 404s:

    {service="auth"} |= "error" != "404"

Pattern match (regex):

    {service="auth"} |~ "timeout|connection reset"

That’s 80% of what you’ll do. Loki’s simplicity is the point.
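The same queries work outside Grafana, because Loki exposes them over HTTP at /loki/api/v1/query_range. The Python sketch below assembles a LogQL string and the matching URL. Treat it as an illustration under assumptions: the helper names are mine, the base URL assumes Loki is published on localhost:3100, and the builder does no escaping, so it is only safe for values without quotes or backslashes.

```python
from urllib.parse import urlencode


def logql(labels: dict, contains=(), excludes=(), regex=None) -> str:
    """Assemble a LogQL query: label selector first, then line filters."""
    selector = ",".join(f'{key}="{value}"' for key, value in labels.items())
    query = "{" + selector + "}"
    for text in contains:
        query += f' |= "{text}"'   # line must contain text
    for text in excludes:
        query += f' != "{text}"'   # line must not contain text
    if regex:
        query += f' |~ "{regex}"'  # line must match the regex
    return query


def query_range_url(base: str, query: str, start_ns: int, end_ns: int, limit: int = 100) -> str:
    """Build a query_range URL; start and end are nanosecond Unix timestamps."""
    params = urlencode({"query": query, "start": start_ns, "end": end_ns, "limit": limit})
    return f"{base}/loki/api/v1/query_range?{params}"


errors_not_404 = logql({"service": "auth"}, contains=["error"], excludes=["404"])
url = query_range_url("http://localhost:3100", logql({"service": "auth"}), 0, 1)
print(errors_not_404)
print(url)
```

Fetching that URL with curl or urllib returns JSON, which is handy for scripts and alerts that should not depend on a browser session.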
Grafana Explore: Logs + Metrics
One of Loki’s superpowers is that it lives inside Grafana alongside Prometheus. In Grafana’s Explore tab, you can query logs and metrics side-by-side.
See a CPU spike in Prometheus at 2:15 PM? Jump to Logs, filter by timestamp and service, and watch what was actually happening in your application at that exact moment. No context-switching between two UIs. No timestamp matching. Everything is there.
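Under the hood, "jump to that moment" is just a time window applied to a log query. If you ever script the same thing, the only fiddly part is the unit: Loki's API accepts nanosecond Unix timestamps. A small Python sketch, where the function names and the five-minute padding are arbitrary choices of mine:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ns(moment: datetime) -> int:
    """Convert an aware datetime to nanoseconds since the Unix epoch, using integer math."""
    return (moment - EPOCH) // timedelta(microseconds=1) * 1000


def window_ns(spike: datetime, before=timedelta(minutes=5), after=timedelta(minutes=5)):
    """Return (start, end) in nanoseconds around the moment a metric spiked."""
    return to_ns(spike - before), to_ns(spike + after)


spike = datetime(2024, 1, 1, 14, 15, tzinfo=timezone.utc)
start_ns, end_ns = window_ns(spike)
print(start_ns, end_ns)
```

Those two numbers are what you would pass as start and end to a range query for the service that spiked.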
Try that with ELK and Prometheus. You’ll be copying timestamps and flipping between tabs.
ELK vs Loki: The Real Trade-Off
Choose Loki if:
- You’re already running Prometheus and Grafana
- You log deliberately (structured lines and a small, well-chosen set of labels)
- You want a single pane of glass (metrics + logs in one platform)
- Your team is small and resources are tight
Choose ELK if:
- You need to search log content like Google (full-text, wildcard, fuzzy)
- You do compliance audits or deep forensics
- You have complex parsing pipelines (Logstash transformations)
- You can burn CPU and RAM for the flexibility
Both work. Neither is objectively right. But if your infrastructure fits in a home lab or a small AWS account, Loki will save your wallet and your Saturday night.
The difference between a logging system that costs thirty bucks a month and one that costs three hundred is usually just label discipline and knowing what you’re actually looking for.
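Label discipline has a number attached to it. In Loki, every unique combination of label values is its own stream, so the worst-case stream count is the product of how many values each label can take. A quick Python sketch with made-up label sets shows why a per-request label is a bad idea:

```python
from math import prod


def stream_count(label_values: dict) -> int:
    """Upper bound on streams: the product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


# Hypothetical label sets for a small deployment.
disciplined = {
    "service": ["auth", "billing", "web"],
    "env": ["prod", "staging"],
}

# The same labels plus a request ID promoted to a label.
undisciplined = dict(disciplined, request_id=[f"r-{i}" for i in range(10_000)])

print(stream_count(disciplined))
print(stream_count(undisciplined))
```

Keep high-cardinality values like request IDs inside the log line and find them with a line filter. The label set stays tiny, and so does the bill.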