Your System Is Lying to You — And eBPF Catches It
You’ve got a service that gets slow every Tuesday at 3 PM. Metrics look fine. Logs say nothing useful. You add more print statements and redeploy, then wait. Maybe it’s the database. Maybe it’s DNS. Maybe it’s that one intern’s cronjob.
Here’s the thing: the answers are already there — happening in the kernel, in real time — and you just don’t have a window into them. That’s what eBPF gives you. Not a hypothesis. Not another log level to crank up. Actual visibility into what the kernel is doing, right now, with your process, on your machine.
eBPF is the wildest thing to happen to Linux observability in a decade. If you’ve been hearing the acronym at every KubeCon and infrastructure meetup and nodding along politely, this is the post where you stop nodding and start actually using it.
What eBPF Actually Is (No Kernel Hacking Required)
BPF originally stood for Berkeley Packet Filter — a thing from 1992 that let you filter network packets efficiently. Nobody outside networking cared. Then Linux 3.18 (2014) landed “extended BPF” — eBPF — and the scope exploded.
The short version: eBPF lets you write small programs that run inside the Linux kernel, triggered by events. Syscalls, network packets, function calls, disk I/O, scheduler events — all fair game. You write the program, load it into the kernel, and the kernel runs it in a sandboxed VM with a verifier that checks it won’t crash anything.
That verifier is the key detail. This isn’t a loadable kernel module you compile and pray. The kernel verifier statically analyzes your eBPF bytecode before loading it: no infinite loops, no illegal memory access, no crashing the kernel. If the verifier rejects it, it doesn’t load. If it passes, it runs at near-native speed.
The result: you can instrument literally anything the kernel touches — without recompiling the kernel, without rebooting, without a kernel module, and without breaking production.
This is why Netflix, Google, Meta, and Cloudflare have all built substantial observability and networking infrastructure on eBPF. It’s not hype. It genuinely removes an entire category of “we can’t observe this without significant downtime.”
The Hook Points: Where You Can Attach Programs
eBPF programs attach to “hook points” in the kernel. The main ones you’ll care about as a sysadmin or SRE:
Tracepoints — stable, kernel-defined instrumentation points. Things like syscalls:sys_enter_open, net:netif_receive_skb, sched:sched_process_fork. These are stable across kernel versions and are the safest option.
Kprobes / Kretprobes — attach to any kernel function, on entry or return. Powerful but can break if internal kernel function names change between versions. Use tracepoints when you can, kprobes when you need something tracepoints don’t cover.
Uprobes / Uretprobes — same idea but for user-space functions. Attach to a function in a binary (like a Go runtime function or a Postgres internal) without modifying the binary.
XDP (eXpress Data Path) — hook into the network stack at the earliest possible point, before the kernel even allocates an sk_buff. This is how Cloudflare drops DDoS traffic at line rate.
TC (Traffic Control) — hook into kernel traffic control for packet manipulation.
For the use cases in this article — tracing, profiling, observability — you’ll mostly be living in tracepoints, kprobes, and uprobes.
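If you want to see which of these hook points your own kernel actually exposes, bpftrace -l will list them. Here’s a rough Python sketch that wraps it. The helper names list_probes and count_by_type are made up for illustration, and it assumes bpftrace is on your PATH and that you run it as root.

```python
import subprocess
from collections import Counter
from typing import List


def list_probes(pattern: str) -> List[str]:
    """Return probe names matching `pattern` via `bpftrace -l` (needs root)."""
    out = subprocess.run(
        ["bpftrace", "-l", pattern],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def count_by_type(probes: List[str]) -> Counter:
    """Count probes per hook type, i.e. the text before the first ':'."""
    return Counter(p.split(":", 1)[0] for p in probes)
```

Something like count_by_type(list_probes("tracepoint:syscalls:*")) gives a quick feel for how many syscall tracepoints are available before you reach for a kprobe.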
The Toolchain: Pick Your Abstraction Level
The eBPF ecosystem has layers. From “just run this command” to “I am writing production infrastructure”:
bpftrace — Ad-Hoc One-Liners and Scripts
bpftrace is awk for the kernel. It’s a high-level tracing language that compiles down to eBPF. Single-line commands, quick scripts, “what is happening right now” debugging. This is where you’ll spend 90% of your time as an SRE.
BCC (BPF Compiler Collection) — Python Front-Ends
BCC lets you write eBPF programs in C with a Python (or Lua) front-end that loads and controls them. More powerful than bpftrace, more verbose. The bcc-tools package ships a ton of pre-built tools: tcpconnect, execsnoop, biolatency, funclatency, and many more. Great for “I want a real tool, not a one-liner.”
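To make the “C program with a Python front-end” split concrete, here’s a minimal sketch in the style of BCC’s hello-world example. It assumes the bcc Python module is installed (python3-bpfcc on Debian/Ubuntu, python3-bcc on RHEL-family distros) and that you call main() as root; the names BPF_PROGRAM and trace_execve are just illustrative.

```python
# The kernel side: a tiny C function that BCC compiles to eBPF at load time.
BPF_PROGRAM = r"""
int trace_execve(void *ctx) {
    bpf_trace_printk("execve called\n");
    return 0;
}
"""


def main() -> None:
    # Imported lazily so this file can be read or imported without bcc installed.
    from bcc import BPF

    b = BPF(text=BPF_PROGRAM)
    # get_syscall_fnname resolves the arch-specific kernel symbol for execve.
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_execve")
    print("Tracing execve... Ctrl-C to stop.")
    try:
        b.trace_print()  # streams lines from the kernel trace pipe
    except KeyboardInterrupt:
        pass
```

The Python side only loads, attaches, and prints; everything that runs per event lives in the C string. That’s the same shape the pre-built bcc-tools scripts follow, just with maps and argument parsing layered on top.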
libbpf + CO-RE — Production Programs
For shipping eBPF code in a product, you want CO-RE (Compile Once, Run Everywhere). libbpf handles the low-level loading; CO-RE uses BTF (BPF Type Format) metadata to relocate field offsets at load time so your compiled binary works across different kernel versions. libbpf-go and libbpf-rs wrap this for Go and Rust respectively.
Full Platforms
- Cilium — eBPF-based Kubernetes networking and security. Replaces kube-proxy, enforces NetworkPolicy at wire speed.
- Tetragon — runtime security enforcement with eBPF. Block syscalls, kill processes, enforce network policies based on kernel events.
- Pixie — auto-instrumentation for Kubernetes apps. No code changes, no agents in your container. eBPF intercepts traffic and infers request/response latency, error rates, etc.
- Falco — runtime security observability. Originally kernel module, now eBPF-first. Detects anomalous behavior (shell in container, privilege escalation attempts).
- Coroot — eBPF-based APM and infrastructure observability.
Installing bpftrace
On a modern distro, this is not hard.
Debian/Ubuntu:

    sudo apt update
    sudo apt install -y bpftrace bpfcc-tools linux-headers-$(uname -r)

RHEL/Rocky/AlmaLinux 9:

    sudo dnf install -y bpftrace bcc-tools kernel-devel

Arch:

    sudo pacman -S bpftrace bcc

Check your kernel supports it (you need 4.9+, ideally 5.8+ for full feature support):

    uname -r
    # Should be 5.x or higher for the good stuff
    bpftrace --version

One gotcha: bpftrace needs CAP_BPF or root. In most cases you’ll just run it with sudo. On containers, you need --privileged or a very carefully crafted capability set.
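If you’d rather script those checks than eyeball them, something like the following works as a preflight. It’s a sketch under a few assumptions: the 4.9 and 5.8 thresholds come straight from the text above, the BTF path /sys/kernel/btf/vmlinux is the usual location on BTF-enabled kernels, and the function names are hypothetical.

```python
import os
import platform
import re
import shutil
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a `uname -r` style string."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def support_level(release: str) -> str:
    """Map a kernel release onto the thresholds mentioned in the article."""
    version = parse_kernel_release(release)
    if version < (4, 9):
        return "unsupported"
    if version < (5, 8):
        return "basic"
    return "full"


def preflight() -> dict:
    """Collect the facts you would otherwise check by hand (Linux only)."""
    release = platform.release()
    return {
        "kernel": release,
        "support": support_level(release),
        "bpftrace_on_path": shutil.which("bpftrace") is not None,
        "btf_available": os.path.exists("/sys/kernel/btf/vmlinux"),
        "running_as_root": os.geteuid() == 0,
    }
```

Calling preflight() on the target host and printing the result tells you quickly whether a failed one-liner is a permissions problem or a kernel that is simply too old.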
8 bpftrace One-Liners That Will Change Your Day
These run as-is on most modern Linux systems. All require sudo.
1. Every file opened, by process

    sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'

This hooks every openat syscall and prints the process name and file path. Pipe it through grep to watch a specific service. Find out what config files your app is actually reading (versus what it claims to read).
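grep is fine for a quick look, but if you want to do something with the events, a small wrapper is easy to write. This sketch uses a tab-separated variant of the one-liner so process names containing spaces still parse; parse_open_line and opens_for are hypothetical helper names, and opens_for assumes bpftrace is installed and that you run it as root.

```python
import subprocess
from typing import Iterator, Optional, Tuple

# Same probe as the one-liner, but with a tab between comm and filename.
OPEN_TRACE = (
    "tracepoint:syscalls:sys_enter_openat "
    '{ printf("%s\\t%s\\n", comm, str(args->filename)); }'
)


def parse_open_line(line: str) -> Optional[Tuple[str, str]]:
    """Split one output line into (comm, path); None for non-event lines."""
    parts = line.rstrip("\n").split("\t", 1)
    if len(parts) != 2:
        return None  # e.g. the "Attaching 1 probe..." banner
    return parts[0], parts[1]


def opens_for(comm: str) -> Iterator[str]:
    """Yield paths opened by processes named `comm` until interrupted."""
    proc = subprocess.Popen(
        ["bpftrace", "-e", OPEN_TRACE],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        for line in proc.stdout:
            event = parse_open_line(line)
            if event and event[0] == comm:
                yield event[1]
    finally:
        proc.terminate()
```

For a one-off, filtering inside bpftrace itself with a predicate such as /comm == "nginx"/ is cheaper, since unmatched events never leave the kernel.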
2. TCP connections being made
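
    sudo bpftrace -e 'kprobe:tcp_connect { printf("%s -> %s\n", comm, ntop(((struct sock *)arg0)->__sk_common.skc_daddr)); }'

This reads the IPv4 destination address only. Or use the cleaner version, tcpconnect from bcc-tools. The install location depends on the distro: Debian/Ubuntu add a -bpfcc suffix, while RHEL-family distros typically put the tools under /usr/share/bcc/tools/:

    sudo tcpconnect-bpfcc                  # Debian/Ubuntu
    sudo /usr/share/bcc/tools/tcpconnect   # RHEL family

Instant visibility into what’s connecting where. Great for “is this service actually talking to the right database?”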
3. Syscall latency — slowest calls by process
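
    sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @start[tid] = nsecs; }
    tracepoint:raw_syscalls:sys_exit /@start[tid]/ { @ns[comm] = hist(nsecs - @start[tid]); delete(@start[tid]); }
    END { clear(@start); }'

Run this for 10 seconds, then hit Ctrl-C. You get a histogram of syscall latency, in nanoseconds, per process name. The /@start[tid]/ filter skips syscalls that were already in flight when tracing started, and bpftrace prints the remaining maps on exit, so no explicit print is needed. Expect blocking calls (epoll_wait, futex, and friends) to fill the slow buckets; what you’re hunting for is slow calls in a process that shouldn’t be blocking. If you’ve got a service taking 50ms+ on individual syscalls, this is where you find it.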
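Reading the histogram takes a moment the first time, because hist() groups values into power-of-two buckets and the values here are nanoseconds. The sketch below works out which row a given latency lands in. It assumes the usual bpftrace labeling, where K, M, and G are powers of 1024; hist_bucket and label are names invented for this example.

```python
from typing import Tuple


def hist_bucket(value: int) -> Tuple[int, int]:
    """Return the [low, high) power-of-two bucket that holds `value` (>= 1)."""
    if value < 1:
        raise ValueError("value must be >= 1")
    low = 1 << (value.bit_length() - 1)
    return low, low * 2


def label(n: int) -> str:
    """Format a bucket edge the way histogram rows are labeled (1K = 1024)."""
    for unit, size in (("G", 1 << 30), ("M", 1 << 20), ("K", 1 << 10)):
        if n >= size and n % size == 0:
            return f"{n // size}{unit}"
    return str(n)


def row_for(latency_ns: int) -> str:
    """Human-readable row label for a latency given in nanoseconds."""
    low, high = hist_bucket(latency_ns)
    return f"[{label(low)}, {label(high)})"
```

So a 50ms syscall is 50,000,000 ns, and row_for(50_000_000) gives "[32M, 64M)". Anything in that row or below it in the output is in 50ms-and-up territory.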
4. TCP retransmits — in real time
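
    sudo bpftrace -e 'kprobe:tcp_retransmit_skb { @retransmits[comm] = count(); }'

TCP retransmits are usually invisible until your p99 latency starts hurting. This counts retransmits and prints the totals when you hit Ctrl-C. One caveat: retransmits often fire from timer or softirq context, so comm is whatever happened to be on the CPU at that moment and not necessarily the process that owns the socket. Treat the per-process split as a hint, and use tcpretrans from bcc-tools when you need addresses and ports.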
5. Processes being spawned

    sudo bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%-10s -> %s\n", comm, str(args->filename)); }'

Watch what processes are spawning children. Useful for “what the hell is cron doing” and “is my deploy script actually calling what I think it’s calling.”
6. Disk I/O latency histogram
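
    sudo bpftrace -e 'kprobe:blk_account_io_start { @start[arg0] = nsecs; }
    kprobe:blk_account_io_done /@start[arg0]/ { @io_ms = hist((nsecs - @start[arg0]) / 1000000); delete(@start[arg0]); }
    END { clear(@start); }'

These two kernel functions are kprobe targets, so they can be inlined or renamed on newer kernels. If the probe fails to attach, sudo bpftrace -l 'kprobe:blk_account_io*' shows what your kernel exposes. Or just use the BCC tool:

    sudo biolatency-bpfcc -m                  # Debian/Ubuntu
    sudo /usr/share/bcc/tools/biolatency -m   # RHEL family

That gives you a histogram of block I/O latency in milliseconds (biolatency defaults to microseconds; -m switches the unit). If your database is slow, check this before blaming the query.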
7. OOM kills
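
    sudo bpftrace -e 'kprobe:oom_kill_process { $oc = (struct oom_control *)arg0; printf("OOM: %s (pid %d) triggered kill of %s (pid %d)\n", comm, pid, $oc->chosen->comm, $oc->chosen->pid); }'

Inside this probe, comm and pid belong to the process whose allocation triggered the OOM killer, not the victim. The victim is in the oom_control struct’s chosen field, which is why the one-liner reads it from arg0. This relies on kernel type information (BTF); on kernels without it, put the probe in a script and add #include <linux/oom.h>. OOM kills sometimes don’t make it into your monitoring cleanly. This catches them at the kernel level the moment they happen.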
8. Function call frequency in a specific binary (uprobes)
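
    sudo bpftrace -e 'uprobe:/usr/bin/python3:PyEval_EvalFrameEx { @[comm] = count(); }'

Replace the binary path and function name for your use case. This shows how often a specific function is being called. If the probe fails to attach, the symbol may live in a shared library rather than in the binary itself; sudo bpftrace -l 'uprobe:/path/to/binary:*' lists what is attachable. Useful for “is this hot path actually hot?”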
Writing a Multi-Line bpftrace Script
One-liners are great but sometimes you want to save something reusable. Create a .bt file:

    #!/usr/bin/env bpftrace

    // Track files opened by a process and flag slow opens
    tracepoint:syscalls:sys_enter_openat
    {
        @start[tid] = nsecs;
        @fname[tid] = args->filename;
    }

    tracepoint:syscalls:sys_exit_openat
    /@start[tid]/
    {
        $delta_ms = (nsecs - @start[tid]) / 1000000;
        if ($delta_ms > 1) {
            printf("SLOW OPEN: %-16s %-8d %dms %s\n", comm, pid, $delta_ms, str(@fname[tid]));
        }
        delete(@start[tid]);
        delete(@fname[tid]);
    }

Save it as open_files_slow.bt and run it:

    sudo bpftrace open_files_slow.bt

This prints any file open that took 2ms or longer (the division truncates to whole milliseconds, so the > 1 check skips anything under 2ms). That’s useful for catching NFS mounts that are misbehaving or slow FUSE filesystems.
The BCC Tools Cheat Sheet
If you installed bcc-tools, you already have a toolkit of production-ready tools. Quick rundown of the ones worth knowing:
| Tool | What it does |
|---|---|
| execsnoop | Trace all new processes system-wide |
| tcpconnect | Trace outbound TCP connections |
| tcpaccept | Trace inbound TCP connections accepted |
| tcpretrans | Trace TCP retransmits with addresses |
| biolatency | Block I/O latency histogram |
| funclatency | Latency of any kernel function |
| profile | CPU profiler, outputs flamegraph data |
| filetop | Top file reads/writes by process |
| opensnoop | All file opens with process info |
| runqlat | Scheduler run queue latency |
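Where they land depends on the distro. On Debian/Ubuntu the bpfcc-tools package installs them as /usr/sbin/<toolname>-bpfcc; on RHEL-family distros and Arch they usually live in /usr/share/bcc/tools/<toolname>. They’re Python scripts that load eBPF programs, so there’s no separate build step.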
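Because the install location and naming vary by distro, scripts that call these tools tend to break when moved between hosts. Here’s one way to paper over that, as a sketch. It assumes the common packaging conventions (a -bpfcc suffix on Debian/Ubuntu, /usr/share/bcc/tools on RHEL-family distros); find_bcc_tool and DEFAULT_DIRS are hypothetical names.

```python
import os
import shutil
from typing import Iterable, Optional

# Directories worth checking beyond PATH (assumed packaging conventions).
DEFAULT_DIRS = ("/usr/share/bcc/tools", "/usr/sbin")


def find_bcc_tool(name: str, extra_dirs: Iterable[str] = DEFAULT_DIRS) -> Optional[str]:
    """Return the full path of a BCC tool, trying plain and -bpfcc names."""
    dirs = list(extra_dirs)
    for candidate in (name, f"{name}-bpfcc"):
        found = shutil.which(candidate)
        if found:
            return found
        for d in dirs:
            path = os.path.join(d, candidate)
            if os.path.isfile(path) and os.access(path, os.X_OK):
                return path
    return None
```

With that, find_bcc_tool("biolatency") returns whichever path exists on the current host, or None if the tools aren’t installed.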
eBPF vs. The Old Ways
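vs. strace: strace uses ptrace, which stops the process on every syscall. eBPF programs run inside the kernel and don’t pause your process, so the overhead is usually small. It does grow with how often the probe fires, so tracing every syscall on a busy box is not free. Profiling a high-throughput service with strace is a production incident waiting to happen. With bpftrace, a narrowly scoped probe is something you can run on production without flinching.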
vs. SystemTap: SystemTap typically requires kernel debug symbols and compiles kernel modules at runtime. It’s powerful but fragile, slow to iterate, and a pain to deploy. eBPF programs are checked by the verifier before they load, and CO-RE makes a compiled program portable across kernel versions.
vs. dtrace: If you’re coming from Solaris or macOS, bpftrace is essentially DTrace for Linux. The syntax is inspired by DTrace’s D language. The concepts map directly.
vs. perf: perf is excellent for CPU profiling and works great alongside eBPF. They’re complementary. Use perf for CPU flamegraphs, eBPF for everything else.
The “Maybe Stop Paying for Datadog” Aside
If you’re running Kubernetes and you’ve got a Datadog APM bill that makes you wince, look at Pixie (open source, a CNCF sandbox project) and Coroot (open source APM on eBPF).
Pixie auto-instruments your entire cluster with eBPF — no sidecars, no code changes, no SDK. It captures HTTP/gRPC/DNS/PostgreSQL/Redis traffic and computes golden signals automatically. The free Community Cloud tier covers most small clusters. If you’d rather self-host, the open-source version deploys to your cluster with a Helm chart.
Coroot does similar things with an APM-style UI on top of eBPF metrics — service maps, latency breakdowns, CPU/memory profiling — all from kernel-level data collection.
Neither of these is a full Datadog replacement if you’re heavily invested in its log management or alerting. But if you’re mostly paying for APM traces and infrastructure metrics? Run the numbers. eBPF-based observability is genuinely mature now, and “we hooked into the kernel instead of instrumenting your code” is a more honest architecture than “please add our SDK to every service.”
Where to Go Next
You’ve got bpftrace installed. You’ve run a few one-liners. Here’s the progression:
- Explore the bpftrace reference guide: man bpftrace or the GitHub wiki. The built-in variables (comm, pid, tid, nsecs, retval) and map functions (@map[], hist(), count()) unlock most of what you’ll want to do.
- Browse BCC examples: /usr/share/bcc/examples/ or the BCC GitHub repo has hundreds of example programs covering every subsystem.
- Brendan Gregg’s blog: If eBPF has a patron saint, it’s Brendan Gregg. His site (brendangregg.com) has the Linux performance tools map, flamegraph methodology, and years of BPF/bpftrace writeups. Bookmark it.
- “BPF Performance Tools” (the book): Gregg wrote the book. Literally. It’s comprehensive and stays practical. Worth it if you’re doing this for work.
- Cilium: If you’re running Kubernetes and not using Cilium as your CNI, it’s worth evaluating. eBPF-native networking with built-in observability (Hubble) is genuinely better than the iptables-based alternatives.
The kernel has always known what’s happening. eBPF is finally how you get to ask it.
Your 2 AM self with an unexplained latency spike will appreciate having bpftrace already installed.