
Riemann: The Forgotten Event-Stream Monitor for Home Labs

By SumGuy 10 min read

Prometheus Is Not Always the Right Tool

You have Prometheus. It scrapes your exporters every 15 seconds, you have Grafana dashboards, and your alerting rules fire on sustained conditions. Life is good.

Then something weird happens: ten containers get OOM-killed within 60 seconds on the same host. Individually, each one looks like a blip. No single container breached a threshold long enough to trigger an alert. Prometheus never fired. You found out at 2 AM because your disk ran out of inodes, not because anything paged you.

This is not a Prometheus failure. It is a category mismatch. Prometheus is a pull-based, time-series database. Its mental model is: “give me the value of this metric at this point in time.” That model is excellent for CPU utilization, memory pressure, request rates. It is the wrong model for questions like “did ten bad things happen in the same 60-second window on the same host?” — because that is an event-stream problem, not a time-series problem.

Riemann is the tool someone built to answer exactly that kind of question. It is also the tool most people have never heard of.


What Riemann Actually Is

Riemann was written by Kyle Kingsbury (yes, the Jepsen guy), starting around 2012. It is a JVM-based event stream processor whose configuration language is Clojure. Events flow in via TCP, UDP, or WebSocket. You write streams in a Clojure DSL that filter, aggregate, transform, and route those events to outputs: PagerDuty, Slack, InfluxDB, Graphite, email, or whatever you wire up.

The mental model is a pipeline:

[event sources] → [Riemann streams] → [outputs]

An “event” in Riemann is a map with fields: host, service, state, time, description, tags, metric, ttl, and arbitrary custom fields. Your app sends an event when something happens. Riemann receives it, runs it through your stream functions, and takes action.
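If Clojure maps are unfamiliar, the Python dataclass below is a rough mirror of that event shape. It is only a reading aid. Riemann's real wire format is Protocol Buffers. The class name, the `attributes` dict standing in for custom fields, and the sample values are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from time import time as unix_now
from typing import Optional


@dataclass
class Event:
    """Illustrative mirror of a Riemann event; field names follow Riemann's."""
    host: Optional[str] = None
    service: Optional[str] = None
    state: Optional[str] = None                      # e.g. "ok", "warning", "critical"
    time: float = field(default_factory=unix_now)    # Unix seconds
    description: Optional[str] = None
    tags: list[str] = field(default_factory=list)
    metric: Optional[float] = None
    ttl: Optional[float] = None                      # seconds until the event expires
    attributes: dict[str, str] = field(default_factory=dict)  # hypothetical: custom fields


oom = Event(
    host="nas01",
    service="docker.container.oom",
    state="warning",
    metric=1,
    ttl=120,
    tags=["docker", "oom"],
    attributes={"container": "jellyfin"},
)
print(oom.service, oom.metric, oom.attributes["container"])  # docker.container.oom 1 jellyfin
```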

That is the key difference: Riemann reacts to what you push, not to what it polls. If you push an event every time a container exits with OOM, Riemann can count ten of those within a 60-second rolling window and fire an alert. Prometheus cannot do that without stitching together Pushgateway, recording rules, and a lot of patience.
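Here is a minimal Python sketch of a per-host sliding time window, to make "count ten of those within a 60-second rolling window" concrete. It approximates the behavior described above and is not a port of Riemann's code. The `RollingCounter` name and its defaults are hypothetical, and the sketch assumes events arrive roughly in time order.

```python
from collections import defaultdict, deque


class RollingCounter:
    """Count events per host inside a sliding time window (illustrative)."""

    def __init__(self, window_s: float = 60.0, threshold: int = 10):
        self.window_s = window_s
        self.threshold = threshold
        self.windows: dict[str, deque] = defaultdict(deque)

    def observe(self, host: str, ts: float) -> bool:
        """Record one event; return True if this host's window reaches the threshold."""
        window = self.windows[host]
        window.append(ts)
        # Evict timestamps that have slid out of the window.
        while window and window[0] <= ts - self.window_s:
            window.popleft()
        return len(window) >= self.threshold


counter = RollingCounter()
# Ten OOM events on one host, 5 seconds apart: the tenth trips the threshold.
fired = [counter.observe("nas01", t) for t in range(0, 50, 5)]
print(fired[-1])  # True
```

Each host gets its own deque. This per-host keying is the same idea the Riemann stream expresses with `by [:host]`.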


Running Riemann in 2026

Riemann’s Docker image is still maintained (community-driven at this point), and it runs fine on a home lab server. If you would rather build your own image, a Dockerfile like this works:

Dockerfile
FROM debian:bookworm-slim
# bzip2 is needed to unpack the .tar.bz2 release; the slim base image lacks it
RUN apt-get update && apt-get install -y \
    openjdk-17-jre-headless \
    curl \
    bzip2 \
    && rm -rf /var/lib/apt/lists/*
RUN curl -Lo /opt/riemann.tar.bz2 \
    https://github.com/riemann/riemann/releases/download/0.3.10/riemann-0.3.10.tar.bz2 \
    && tar -xjf /opt/riemann.tar.bz2 -C /opt \
    && ln -s /opt/riemann-0.3.10 /opt/riemann \
    && rm /opt/riemann.tar.bz2 \
    && mkdir -p /var/log/riemann
COPY riemann.config /etc/riemann/riemann.config
# 5555 = events (TCP and UDP), 5556 = WebSocket, 5557 = nREPL console
EXPOSE 5555/tcp 5555/udp 5556 5557
CMD ["/opt/riemann/bin/riemann", "/etc/riemann/riemann.config"]

Or, if you prefer Compose:

compose.yaml
services:
  riemann:
    image: riemannio/riemann:0.3.10
    ports:
      - "5555:5555"            # TCP events
      - "5555:5555/udp"        # UDP events
      - "5556:5556"            # WebSocket
      - "127.0.0.1:5557:5557"  # nREPL console (not an HTTP API); keep it off the LAN
    volumes:
      - ./riemann.config:/etc/riemann/riemann.config:ro
    restart: unless-stopped

The Config DSL (and Why Clojure Is Both the Feature and the Bug)

Riemann’s config is a Clojure program. This is simultaneously its best and worst feature.

Best: you get a real programming language. Conditionals, functions, let bindings, map/filter, custom logic. PromQL is a query language bolted onto a time-series DB; Riemann config is code.

Worst: if you have never written a Clojure parenthesis in your life, the config will look like someone’s cat walked across the keyboard.

Here is a minimal config that accepts events and sends high-severity ones to Slack:

riemann.config
; riemann.config: minimal working example
; clj-http and cheshire ship as Riemann dependencies, so they can be required here
(require '[clj-http.client :as http]
         '[cheshire.core :as json])
(logging/init {:file "/var/log/riemann/riemann.log"})
; Listen on all interfaces for TCP, UDP, and WebSocket clients
(let [host "0.0.0.0"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (ws-server {:host host}))
; Check the index for expired events every 5 seconds
(periodically-expire 5)
; Slack output: POST the message to an incoming-webhook URL
(def slack-notify
  (fn [event]
    (let [msg (str "[" (:host event) "] "
                   (:service event) " "
                   (:state event) ": "
                   (:description event))]
      (http/post "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
                 {:body         (json/generate-string {:text msg})
                  :content-type :json}))))
; Main stream
(streams
  ; Drop anything with no host or service
  (where (and host service)
    ; Route critical events to Slack
    (where (= state "critical")
      slack-notify
      ; Log everything else
      (else
        #(info "event" %)))))

Not beautiful, but it works. Now for the interesting part.
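If you are still decoding the parentheses, the intent of that stream reduces to the routing logic below. It is written in Python purely as a translation aid. `route`, `notify`, and `log` are hypothetical names standing in for the stream, the Slack webhook, and Riemann's `info` logging.

```python
from typing import Callable, Optional


def format_message(event: dict) -> str:
    """Build the Slack text: "[host] service state: description"."""
    return (f"[{event.get('host')}] {event.get('service')} "
            f"{event.get('state')}: {event.get('description')}")


def route(event: dict,
          notify: Callable[[str], None],
          log: Callable[[dict], None]) -> Optional[str]:
    """Translation of the minimal stream: filter, then split critical vs. the rest."""
    # Drop events missing a host or a service.
    if not event.get("host") or not event.get("service"):
        return None
    # Critical events go to Slack.
    if event.get("state") == "critical":
        notify(format_message(event))
        return "notified"
    # Everything else is only logged.
    log(event)
    return "logged"


sent, logged = [], []
route({"host": "nas01", "service": "disk", "state": "critical",
       "description": "95% full"}, sent.append, logged.append)
print(sent)  # ['[nas01] disk critical: 95% full']
```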


Real Example: OOM Storm Detection

This is the kind of alert Riemann was built for. You want to fire a single “OOM storm” alert when 10 or more container OOM events hit the same host within 60 seconds — not 10 separate “container died” pages.

First, something has to send an event whenever a container gets OOM-killed. A small script that watches Docker events does the job:

docker-event-watcher.py
import json
import socket
import subprocess
import riemann_client.client as rc
import riemann_client.transport as rt
RIEMANN_HOST = "riemann"
RIEMANN_PORT = 5555
def send_riemann_event(host, service, metric, state, description, tags=None):
    """Send one event to Riemann over TCP using the riemann-client library."""
    transport = rt.TCPTransport(RIEMANN_HOST, RIEMANN_PORT)
    with rc.Client(transport) as client:
        client.event(
            host=host,
            service=service,
            metric=metric,
            state=state,
            description=description,
            tags=tags or [],
            ttl=120,
        )
def watch_docker_events():
    # Stream only OOM events from the Docker daemon, one JSON object per line
    proc = subprocess.Popen(
        ["docker", "events", "--format", "{{json .}}", "--filter", "event=oom"],
        stdout=subprocess.PIPE,
        text=True,
    )
    hostname = socket.gethostname()
    for line in proc.stdout:
        try:
            ev = json.loads(line.strip())
            container = ev.get("Actor", {}).get("Attributes", {}).get("name", "unknown")
            send_riemann_event(
                host=hostname,
                service="docker.container.oom",
                metric=1,
                state="warning",
                description=f"Container OOM: {container}",
                tags=["docker", "oom", container],
            )
        except Exception as e:
            # Keep watching even if one event fails to parse or send
            print(f"Error processing event: {e}")
if __name__ == "__main__":
    watch_docker_events()

Install the client:

Terminal window
pip install riemann-client
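The JSON-to-event step in the watcher can be tested on its own, with no Docker or Riemann server running. This sketch pulls it out into a pure function. `docker_oom_to_event` is a hypothetical helper, and the sample line is a trimmed payload that keeps only the fields the watcher reads.

```python
import json


def docker_oom_to_event(line: str, hostname: str) -> dict:
    """Turn one `docker events --format '{{json .}}'` line into Riemann event fields."""
    ev = json.loads(line.strip())
    container = ev.get("Actor", {}).get("Attributes", {}).get("name", "unknown")
    return {
        "host": hostname,
        "service": "docker.container.oom",
        "metric": 1,
        "state": "warning",
        "description": f"Container OOM: {container}",
        "tags": ["docker", "oom", container],
        "ttl": 120,
    }


sample = '{"Type": "container", "Action": "oom", "Actor": {"Attributes": {"name": "jellyfin"}}}\n'
fields = docker_oom_to_event(sample, "nas01")
print(fields["description"])  # Container OOM: jellyfin
```

With that split, the watcher loop could shrink to `client.event(**docker_oom_to_event(line, hostname))`, because the client's `event()` call takes the fields as keyword arguments.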

Now the Riemann side — the stream that detects the storm:

riemann.config (OOM storm detection)
(streams
  ; Only process OOM events
  (where (= service "docker.container.oom")
    ; Partition per host so one noisy box cannot mask another
    (by [:host]
      ; On each incoming event, emit the whole 60-second window as a vector
      (moving-time-window 60
        ; Collapse the window into one synthetic "storm" event
        (smap (fn [events]
                (let [oom-count (count events)
                      latest    (last events)]
                  {:host        (:host latest)
                   :service     "docker.oom.storm"
                   :metric      oom-count
                   :state       (if (>= oom-count 10) "critical" "ok")
                   :time        (:time latest)
                   :description (str oom-count " container OOM kills in 60s")
                   :tags        ["storm" "oom" "docker"]}))
          ; Only a real storm (10+ OOMs in the window) goes further
          (where (>= metric 10)
            ; Deduplicate: at most one alert per host every 5 minutes
            (throttle 1 300
              slack-notify
              #(info "OOM storm detected:" %))))))))

The by [:host] partitions the stream per host, so a noisy VM does not mask a quieter one. moving-time-window 60 keeps a rolling 60-second buffer of events. throttle 1 300 ensures you get at most one alert per 5 minutes per host — your phone will thank you.
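If `throttle` feels opaque, this Python sketch models what `throttle 1 300` inside `by [:host]` is meant to do. For each key, it lets at most n events through in each dt-second interval and drops the rest. It is a simplified fixed-interval model built on that assumption, using a hypothetical `Throttle` class, and it is not a port of Riemann's implementation.

```python
from collections import defaultdict


class Throttle:
    """Pass at most `n` events per `dt` seconds for each key (illustrative)."""

    def __init__(self, n: int, dt: float):
        self.n = n
        self.dt = dt
        self.interval_start: dict[str, float] = {}
        self.passed: dict[str, int] = defaultdict(int)

    def allow(self, key: str, ts: float) -> bool:
        start = self.interval_start.get(key)
        # Open a fresh interval on the first event, or once dt seconds have elapsed.
        if start is None or ts - start >= self.dt:
            self.interval_start[key] = ts
            self.passed[key] = 0
        if self.passed[key] < self.n:
            self.passed[key] += 1
            return True
        return False


t = Throttle(n=1, dt=300)
print([t.allow("nas01", ts) for ts in (0, 10, 299, 300)])  # [True, False, False, True]
print(t.allow("pi4", 10))  # True: each host has its own budget
```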

This is genuinely hard to replicate in pure Prometheus. You could approximate it with an alerting rule that sums a per-container OOM counter by host over a 60-second range. You still get the cardinality problem (one series per container) and stale metric expiry issues, though. You also cannot deduplicate the alert cleanly without leaning on Alertmanager's grouping and repeat intervals. Riemann does it in about 20 lines of Clojure.


Wiring Outputs

Riemann has built-in outputs and a plugin ecosystem. Common ones:

InfluxDB (time-series storage for dashboards):

(def influx
  (influxdb {:host     "influxdb"
             :port     8086
             :db       "riemann"
             :username "riemann"
             :password "secret"}))
(streams
  (where metric
    influx))

Graphite:

(def graphite-out (graphite {:host "graphite" :port 2003}))
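Port 2003 in the Graphite example is Carbon's plaintext listener, which accepts one `<path> <value> <timestamp>` line per metric. This Python sketch renders an event as one of those lines. The path scheme (reversed host labels followed by the service) is a hypothetical choice for illustration. Riemann's Graphite output builds its own paths and may name things differently.

```python
def graphite_path(host: str, service: str) -> str:
    """Hypothetical naming: reverse the host's dotted labels, then append the service.

    Spaces become underscores because Carbon splits each line on whitespace.
    """
    host_part = ".".join(reversed(host.split(".")))
    return f"{host_part}.{service.replace(' ', '_')}"


def graphite_line(event: dict) -> str:
    """Render one event as a Carbon plaintext line, terminated by a newline."""
    path = graphite_path(event["host"], event["service"])
    return f"{path} {event['metric']} {int(event['time'])}\n"


line = graphite_line({"host": "nas01.lan", "service": "docker.oom.storm",
                      "metric": 12, "time": 1767225600.5})
print(repr(line))  # 'lan.nas01.docker.oom.storm 12 1767225600\n'
```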

PagerDuty (built into Riemann; the adapter returns a map of :trigger, :acknowledge, and :resolve streams):

(def pd (pagerduty {:service-key "your-integration-key"}))
(streams
  (where (= state "critical")
    (:trigger pd)))

Email:

(def email
  (mailer {:host "smtp.example.com"
           :from "riemann@example.com"}))
(streams
  (where (= state "critical")
    (email "oncall@example.com")))

Honest Talk About the Ecosystem

Riemann peaked around 2015-2017. The Clojure ecosystem has not collapsed, but it has not grown the way Go or Rust tooling has. You need to be aware of a few things before you commit:


Alternatives in the Same Niche

If Riemann’s vibe is not for you, here is where the same problem space lives in 2026:

| Tool | Approach | Clojure Required | Event Windows |
|---|---|---|---|
| Riemann | JVM stream processor | Yes | Native |
| Vector + VRL | Rust pipeline + transform language | No | Limited |
| Logstash + Watcher | ELK-native event routing | No | Good (complex config) |
| Prometheus + Pushgateway | Pull-based with push bridge | No | Approximate |
| OpenObserve | Modern stream + alerting | No | Good |
| Benthos / Redpanda Connect | Go-based stream processor | No | Good |

Vector (vector.dev) is the closest modern equivalent in spirit — events, transformations, routing, outputs. The VRL (Vector Remap Language) is more approachable than Clojure for most people and Rust performance is excellent. If you are starting fresh in 2026 and need event-stream processing without the JVM overhead, Vector is probably the move.

Riemann wins when you need complex stateful aggregation (rolling windows, per-host partitioning, deduplication) expressed in a real programming language. Vector’s alerting story is improving but not there yet for arbitrarily complex stream logic.


When NOT to Use Riemann

Save yourself the JVM overhead if:


The Bottom Line

Riemann is more than a decade old and looks it. The dashboard is dated, the ecosystem is quiet, and the Clojure config will earn you some side-eyes in a PR review. None of that makes it wrong.

For a specific class of problem (correlating discrete events across hosts in fast rolling windows, with expressive logic that PromQL cannot cleanly express), Riemann is still one of the most direct tools available. The OOM storm detection example above is about 20 lines of config. The equivalent Prometheus setup is a multi-component Rube Goldberg machine with three potential failure points.

Worth knowing it exists. Worth spinning it up in a container for a weekend to see if it fits your specific pain. Not worth ripping out Prometheus for — they solve different problems and run perfectly well side by side.

Your 2 AM self will appreciate knowing there is more than one hammer in the box.

