Riemann: The Forgotten Event-Stream Monitor for Home Labs

Prometheus Is Not Always the Right Tool

You have Prometheus. It scrapes your exporters every 15 seconds, you have Grafana dashboards, and your alerting rules fire on sustained conditions. Life is good.

Then something weird happens: ten containers on the same host get OOM-killed within 60 seconds. Individually, each one looks like a blip. No single container breached a threshold long enough to trigger an alert, so Prometheus never fired. You found out at 2 AM because your disk ran out of inodes, not because anything paged you.

This is not a Prometheus failure. It is a category mismatch. Prometheus is a pull-based, time-series database. Its mental model is: “give me the value of this metric at this point in time.” That model is excellent for CPU utilization, memory pressure, request rates. It is the wrong model for questions like “did ten bad things happen in the same 60-second window on the same host?”, because that is an event-stream problem, not a time-series problem.

Riemann is the tool someone built to answer exactly that kind of question. It is also the tool most people have never heard of.

What Riemann Actually Is

Riemann was written by Kyle Kingsbury (yes, the Jepsen guy), starting around 2012. It is a JVM-based event stream processor whose configuration is written in Clojure. Events flow in via TCP, UDP, or WebSocket. You write streams in a Clojure DSL that filter, aggregate, transform, and route those events to outputs: PagerDuty, Slack, InfluxDB, Graphite, email, or whatever you wire up.

The mental model is a pipeline:

[event sources] → [Riemann streams] → [outputs]

An “event” in Riemann is a map with fields: host, service, metric, state, time, ttl, tags, and arbitrary custom fields. Your app sends an event when something happens. Riemann receives it, runs it through your stream functions, and takes action.
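To make the event shape concrete, here is a minimal Python sketch of an event as a plain record with a TTL check. The `Event` class and its `is_expired` helper are illustrative names, not part of any Riemann client library; only the field names come from the list above. Treating a missing ttl as "never expires" is a simplification made for this sketch, not a statement about how Riemann handles it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """Illustrative stand-in for a Riemann event (not a real client class)."""
    host: str
    service: str
    time: float                      # Unix timestamp in seconds
    state: Optional[str] = None
    metric: Optional[float] = None
    ttl: Optional[float] = None      # seconds the event stays valid
    tags: list = field(default_factory=list)
    description: Optional[str] = None

    def is_expired(self, now: float) -> bool:
        """True once more than ttl seconds have passed since the event time."""
        if self.ttl is None:
            return False             # simplification: no ttl means no expiry
        return now > self.time + self.ttl


oom = Event(host="docker01", service="docker.container.oom",
            time=1000.0, state="warning", metric=1, ttl=120,
            tags=["docker", "oom"])
print(oom.is_expired(now=1060.0))
print(oom.is_expired(now=1200.0))
```

The first check falls inside the 120-second TTL and the second falls outside it, which is the same idea Riemann relies on when it expires events that have stopped being refreshed.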

That is the key difference: Riemann reacts to what you push, not to what it polls. If you push an event every time a container exits with OOM, Riemann can count ten of those within a 60-second rolling window and fire an alert. Prometheus cannot do that without stitching together Pushgateway, recording rules, and a lot of patience.

Running Riemann in 2026

Riemann’s Docker image is still maintained (community-driven at this point), and it runs fine on a home lab server. To build your own image from the release tarball:

FROM debian:bookworm-slim

# bzip2 is required by `tar -xjf` below; the slim base image does not ship it
RUN apt-get update && apt-get install -y \
    openjdk-17-jre-headless \
    bzip2 \
    curl \
    && rm -rf /var/lib/apt/lists/*

RUN curl -Lo /opt/riemann.tar.bz2 \
    https://github.com/riemann/riemann/releases/download/0.3.12/riemann-0.3.12.tar.bz2 \
    && tar -xjf /opt/riemann.tar.bz2 -C /opt \
    && ln -s /opt/riemann-0.3.12 /opt/riemann \
    && rm /opt/riemann.tar.bz2

COPY riemann.config /etc/riemann/riemann.config
EXPOSE 5555 5555/udp 5556 5557
CMD ["/opt/riemann/bin/riemann", "/etc/riemann/riemann.config"]

Or, if you prefer Compose:

services:
  riemann:
    image: riemannio/riemann:0.3.12
    ports:
      - "5555:5555"     # TCP events
      - "5555:5555/udp" # UDP events
      - "5556:5556"     # WebSocket
      - "5557:5557"     # REPL server (only if your config enables repl-server; do not expose beyond your LAN)
    volumes:
      - ./riemann.config:/etc/riemann/riemann.config:ro
    restart: unless-stopped

The Config DSL (and Why Clojure Is Both the Feature and the Bug)

Riemann’s config is a Clojure program. This is simultaneously its best and worst feature.

Best: you get a real programming language. Conditionals, functions, let bindings, map/filter, custom logic. PromQL is a query language bolted onto a time-series DB; Riemann config is code.

Worst: if you have never written a Clojure parenthesis in your life, the config will look like someone’s cat walked across the keyboard.

Here is a minimal config that accepts events and sends high-severity ones to Slack:

; riemann.config — minimal working example

(logging/init {:file "/var/log/riemann/riemann.log"})

(let [host "0.0.0.0"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (ws-server  {:host host}))

(periodically-expire 5)

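; Slack incoming-webhook output via a plain HTTP POST.
; clj-http and cheshire are already on Riemann's classpath.
(require '[clj-http.client :as http]
         '[cheshire.core :as json])

(defn slack-notify
  [event]
  (let [msg (str "[" (:host event) "] "
                 (:service event) ": "
                 (:state event) " - "
                 (:description event))]
    (http/post "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
               {:content-type :json
                :body (json/generate-string {:text msg})})))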

; Main stream
(streams
  ; Drop anything with no host or service
  (where (and host service)
    ; Route critical events to Slack
    (where (= state "critical")
      slack-notify
      ; Log everything else
      (else
        (fn [event]
          (info "event" event))))))

Not beautiful, but it works. Now for the interesting part.

Real Example: OOM Storm Detection

This is the kind of alert Riemann was built for. You want to fire a single “OOM storm” alert when 10 or more container OOM events hit the same host within 60 seconds, not 10 separate “container died” pages.

First, something has to send the events. A small script on each Docker host that watches Docker events will do:

import json
import socket
import subprocess

from riemann_client.client import Client
from riemann_client.transport import TCPTransport

RIEMANN_HOST = "riemann"
RIEMANN_PORT = 5555


def send_riemann_event(host, service, metric, state, description, tags=None):
    """Send one event to Riemann over TCP using the riemann-client library."""
    # Opens a connection per event: fine for rare OOM events, wasteful for
    # high-volume streams.
    with Client(TCPTransport(RIEMANN_HOST, RIEMANN_PORT)) as client:
        client.event(
            host=host,
            service=service,
            metric_f=float(metric),  # the protobuf field is metric_f, not metric
            state=state,
            description=description,
            tags=tags or [],
            ttl=120,
        )


def watch_docker_events():
    """Follow `docker events` and forward every container OOM to Riemann."""
    proc = subprocess.Popen(
        ["docker", "events", "--format", "{{json .}}", "--filter", "event=oom"],
        stdout=subprocess.PIPE,
        text=True,
    )
    hostname = socket.gethostname()

    for line in proc.stdout:
        try:
            ev = json.loads(line.strip())
            container = ev.get("Actor", {}).get("Attributes", {}).get("name", "unknown")
            send_riemann_event(
                host=hostname,
                service="docker.container.oom",
                metric=1,
                state="warning",
                description=f"Container OOM: {container}",
                tags=["docker", "oom", container],
            )
        except Exception as e:
            # Keep the watcher alive if one event is malformed or Riemann is down
            print(f"Error processing event: {e}")


if __name__ == "__main__":
    watch_docker_events()

Install the client:

pip install riemann-client
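If you want to check the parsing logic without Docker or a running Riemann, the field extraction can be pulled into a pure function and fed a sample line. This is a sketch: `oom_event_fields` is a hypothetical helper name, and the sample JSON is hand-written to resemble what `docker events --format '{{json .}}'` emits (real lines carry more keys).

```python
import json


def oom_event_fields(line: str, hostname: str) -> dict:
    """Turn one `docker events` JSON line into Riemann-style event fields."""
    ev = json.loads(line)
    attrs = ev.get("Actor", {}).get("Attributes", {})
    container = attrs.get("name", "unknown")
    return {
        "host": hostname,
        "service": "docker.container.oom",
        "state": "warning",
        "description": f"Container OOM: {container}",
        "tags": ["docker", "oom", container],
    }


# Hand-written sample line, trimmed to the keys this function reads.
sample = (
    '{"Type": "container", "Action": "oom", '
    '"Actor": {"ID": "abc123", "Attributes": {"name": "redis-cache"}}}'
)
fields = oom_event_fields(sample, hostname="docker01")
print(fields["description"])
print(fields["tags"])
```

Keeping the parsing separate from the network call also makes it easy to see what happens when the container name is missing: the function falls back to "unknown" instead of raising.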

Now the Riemann side, the stream that detects the storm:

(streams
  ; Only process OOM events
  (where (= service "docker.container.oom")

    ; Rolling window: collect OOM events per host over 60 seconds
    (by [:host]
      (moving-time-window 60
        ; The window arrives as a vector of events. Turn it into one
        ; synthetic "storm" event, or nil when the count is below the
        ; threshold (smap does not pass nil on to its children).
        (smap (fn [events]
                (let [oom-count (count events)]
                  (when (>= oom-count 10)
                    {:host        (:host (last events))
                     :service     "docker.oom.storm"
                     :metric      oom-count
                     :state       "critical"
                     :time        (:time (last events))
                     :description (str oom-count " container OOM kills in 60s")
                     :tags        ["storm" "oom" "docker"]})))
          ; Deduplicate: at most one alert per host every 300 seconds
          (throttle 1 300
            (fn [storm-event]
              (info "OOM storm detected:" storm-event))
            slack-notify))))))

The by [:host] partitions the stream per host, so a noisy VM does not mask a quieter one. moving-time-window 60 keeps a rolling 60-second buffer of events. throttle 1 300 lets at most one event through every 300 seconds, so you get at most one alert per 5 minutes per host. Your phone will thank you.
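The throttling behaviour is easy to reason about with a small model. The Python below is a sketch of the idea only: `Throttle` and `should_alert` are hypothetical names, and Riemann's own interval bookkeeping may differ in detail. It keeps one throttle per host, mirroring the per-host partitioning described above.

```python
class Throttle:
    """Pass at most `limit` events per `interval` seconds (illustrative only)."""

    def __init__(self, limit: int, interval: float):
        self.limit = limit
        self.interval = interval
        self.window_start = None
        self.passed = 0

    def allow(self, timestamp: float) -> bool:
        # Start a fresh interval on the first event or once the old one has ended.
        if self.window_start is None or timestamp >= self.window_start + self.interval:
            self.window_start = timestamp
            self.passed = 0
        if self.passed < self.limit:
            self.passed += 1
            return True
        return False


throttles = {}  # host -> Throttle, so each host is limited independently


def should_alert(host: str, timestamp: float) -> bool:
    if host not in throttles:
        throttles[host] = Throttle(limit=1, interval=300)
    return throttles[host].allow(timestamp)


# A storm on docker01 keeps matching, but only the first hit in each
# 300-second interval gets through.
print([should_alert("docker01", t) for t in (0, 10, 299, 300)])
# A different host has its own budget.
print(should_alert("docker02", 10))
```

The design point is that the throttle has to hold state between events. Creating a new one for every event would let every alert through.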

This is genuinely hard to replicate in pure Prometheus. You could approximate it with:

A Pushgateway receiving OOM events
A recording rule summing them over 60s
An alert rule firing when the sum >= 10

But you still get the cardinality problem (one series per container), stale metric expiry issues, and you cannot deduplicate the alert cleanly without Alertmanager silences. Riemann does it in about 20 lines of Clojure.

Wiring Outputs

Riemann has built-in outputs and a plugin ecosystem. Common ones:

InfluxDB (time-series storage for dashboards):

(def influx
  (influxdb {:host "influxdb"
             :port 8086
             :db   "riemann"
             :username "riemann"
             :password "secret"}))

(streams
  (where metric
    influx))

Graphite:

(def graphite-out (graphite {:host "graphite" :port 2003}))

PagerDuty (built into Riemann; the adapter returns a map of :trigger, :acknowledge, and :resolve streams):

(def pd (pagerduty {:service-key "your-integration-key"}))

(streams
  (where (= state "critical")
    (:trigger pd)))

Email:

(def email
  (mailer {:host "smtp.example.com"
           :from "riemann@example.com"}))

(streams
  (where (= state "critical")
    (email "oncall@example.com")))

Honest Talk About the Ecosystem

Riemann peaked around 2015-2017. The Clojure ecosystem has not collapsed, but it has not grown the way Go or Rust tooling has. You need to be aware of a few things before you commit:

riemann-dash (the built-in dashboard) is dated. You will want to send metrics to InfluxDB/Graphite and use Grafana instead.
The plugin ecosystem has some unmaintained gems. Check GitHub last-commit dates before depending on any plugin.
The Docker image is community-maintained, not from a commercial entity. Releases are infrequent: 0.3.12 landed in mid-2025, the prior 0.3.10 was back in 2022.
JVM cold start is 3-5 seconds. On a Pi 4 with 4 GB RAM this is fine; on a Pi Zero it is not.
The Clojure barrier is real. If nobody on your team has touched a Lisp, budget an hour of confusion before anything makes sense.

Alternatives in the Same Niche

If Riemann’s vibe is not for you, here is where the same problem space lives in 2026:

Tool	Approach	Clojure Required	Event Windows
Riemann	JVM stream processor	Yes	Native
Vector + VRL	Rust pipeline + transform language	No	Limited
Logstash + Watcher	ELK-native event routing	No	Good (complex config)
Prometheus + Pushgateway	Pull-based with push bridge	No	Approximate
OpenObserve	Modern stream + alerting	No	Good
Benthos / Redpanda Connect	Go-based stream processor	No	Good

Vector (vector.dev) is the closest modern equivalent in spirit: events, transformations, routing, outputs. VRL (Vector Remap Language) is more approachable than Clojure for most people, and Rust performance is excellent. If you are starting fresh in 2026 and need event-stream processing without the JVM overhead, Vector is probably the move.

Riemann wins when you need complex stateful aggregation (rolling windows, per-host partitioning, deduplication) expressed in a real programming language. Vector’s alerting story is improving but not there yet for arbitrarily complex stream logic.

When NOT to Use Riemann

Save yourself the JVM overhead if:

You have one host. Prometheus + Alertmanager is fine. Riemann is most useful when you are correlating events across multiple hosts.
Your team is Clojure-skeptic. The config is not optional: you will be maintaining Clojure. If that word makes your team’s eyes glaze over, use Vector or just accept Prometheus’s limitations.
You need long-term storage. Riemann is a processor, not a database. You still need InfluxDB/Prometheus/Graphite downstream.
Your events are actually metrics. If you are thinking “I want to alert when CPU > 80% for 5 minutes,” that is a time-series problem. Prometheus wins.

The Bottom Line

Riemann is well over a decade old and looks it. The dashboard is dated, the ecosystem is quiet, and the Clojure config will earn you some side-eyes in a PR review. None of that makes it wrong.

For a specific class of problem (correlating discrete events across hosts in fast rolling windows, with expressive logic that PromQL cannot cleanly express), Riemann is still one of the most direct tools available. The OOM storm detection example above is about 20 lines of config. The equivalent Prometheus setup is a multi-component Rube Goldberg machine with three potential failure points.

Worth knowing it exists. Worth spinning it up in a container for a weekend to see if it fits your specific pain. Not worth ripping out Prometheus for: they solve different problems and run perfectly well side by side.

Your 2 AM self will appreciate knowing there is more than one hammer in the box.
