eBPF for the Curious: Kernel Tracing Without the PhD

By SumGuy 11 min read

Your System Is Lying to You — And eBPF Catches It

You’ve got a service that gets slow every Tuesday at 3 PM. Metrics look fine. Logs say nothing useful. You add more print statements and redeploy, then wait. Maybe it’s the database. Maybe it’s DNS. Maybe it’s that one intern’s cronjob.

Here’s the thing: the answers are already there — happening in the kernel, in real time — and you just don’t have a window into them. That’s what eBPF gives you. Not a hypothesis. Not another log level to crank up. Actual visibility into what the kernel is doing, right now, with your process, on your machine.

eBPF is the wildest thing to happen to Linux observability in a decade. If you’ve been hearing the acronym at every KubeCon and infrastructure meetup and nodding along politely, this is the post where you stop nodding and start actually using it.


What eBPF Actually Is (No Kernel Hacking Required)

BPF originally stood for Berkeley Packet Filter — a thing from 1992 that let you filter network packets efficiently. Nobody outside networking cared. Then Linux 3.18 (2014) landed “extended BPF” — eBPF — and the scope exploded.

The short version: eBPF lets you write small programs that run inside the Linux kernel, triggered by events. Syscalls, network packets, function calls, disk I/O, scheduler events — all fair game. You write the program, load it into the kernel, and the kernel runs it in a sandboxed VM with a verifier that checks it won’t crash anything.

That verifier is the key detail. This isn’t a loadable kernel module you compile and pray. The kernel verifier statically analyzes your eBPF bytecode before loading it: no infinite loops, no illegal memory access, no crashing the kernel. If the verifier rejects it, it doesn’t load. If it passes, it runs at near-native speed.

The result: you can instrument literally anything the kernel touches — without recompiling the kernel, without rebooting, without a kernel module, and without breaking production.

This is why Netflix, Google, Meta, and Cloudflare have all built substantial observability and networking infrastructure on eBPF. It’s not hype. It genuinely removes an entire category of “we can’t observe this without significant downtime.”


The Hook Points: Where You Can Attach Programs

eBPF programs attach to “hook points” in the kernel. The main ones you’ll care about as a sysadmin or SRE:

Tracepoints — stable, kernel-defined instrumentation points. Things like syscalls:sys_enter_open, net:netif_receive_skb, sched:sched_process_fork. These are stable across kernel versions and are the safest option.

Kprobes / Kretprobes — attach to any kernel function, on entry or return. Powerful but can break if internal kernel function names change between versions. Use tracepoints when you can, kprobes when you need something tracepoints don’t cover.

Uprobes / Uretprobes — same idea but for user-space functions. Attach to a function in a binary (like a Go runtime function or a Postgres internal) without modifying the binary.

XDP (eXpress Data Path) — hook into the network stack at the earliest possible point, before the kernel even allocates an sk_buff. This is how Cloudflare drops DDoS traffic at line rate.

TC (Traffic Control) — hook into kernel traffic control for packet manipulation.

For the use cases in this article — tracing, profiling, observability — you’ll mostly be living in tracepoints, kprobes, and uprobes.
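Before building a script around a tracepoint, it helps to confirm that your kernel actually has it. Here is a minimal sketch, assuming tracefs is mounted at /sys/kernel/tracing (older systems may expose it under /sys/kernel/debug/tracing instead). The tracepoint_exists helper is a hypothetical name, not part of any tool, and reading tracefs usually needs root.

```shell
# Hypothetical helper: succeed if a tracepoint exists on this kernel.
# Assumes the tracefs layout <root>/events/<category>/<name>.
# The third argument overrides the tracefs root (an assumption; adjust it
# if your system mounts tracefs somewhere else).
tracepoint_exists() {
    category=$1
    name=$2
    root=${3:-/sys/kernel/tracing}
    [ -d "$root/events/$category/$name" ]
}

# Usage (run as root):
#   tracepoint_exists syscalls sys_enter_openat && echo "tracepoint available"
```

Once bpftrace is installed, sudo bpftrace -l 'tracepoint:syscalls:*' lists the same information without poking at tracefs directly.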


The Toolchain: Pick Your Abstraction Level

The eBPF ecosystem has layers. From “just run this command” to “I am writing production infrastructure”:

bpftrace — Ad-Hoc One-Liners and Scripts

bpftrace is awk for the kernel. It’s a high-level tracing language that compiles down to eBPF. Single-line commands, quick scripts, “what is happening right now” debugging. This is where you’ll spend 90% of your time as an SRE.

BCC (BPF Compiler Collection) — Python Front-Ends

BCC lets you write eBPF programs in C with a Python (or Lua) front-end that loads and controls them. More powerful than bpftrace, more verbose. The bcc-tools package ships a ton of pre-built tools: tcpconnect, execsnoop, biolatency, funclatency, and many more. Great for “I want a real tool, not a one-liner.”

libbpf + CO-RE — Production Programs

For shipping eBPF code in a product, you want CO-RE (Compile Once, Run Everywhere). libbpf handles the low-level loading; CO-RE uses BTF (BPF Type Format) metadata to relocate field offsets at load time so your compiled binary works across different kernel versions. libbpf-go and libbpf-rs wrap this for Go and Rust respectively.

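Full Platforms

At the top of the stack are complete products built on eBPF, where you never write a probe yourself: Cilium for Kubernetes networking (with Hubble for observability), and Pixie and Coroot for cluster-wide observability. All three come up again later in this post.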

Installing bpftrace

On a modern distro, this is not hard.

Debian/Ubuntu:

Terminal window
sudo apt update
sudo apt install -y bpftrace bpfcc-tools linux-headers-$(uname -r)

RHEL/Rocky/AlmaLinux 9:

Terminal window
sudo dnf install -y bpftrace bcc-tools kernel-devel

Arch:

Terminal window
sudo pacman -S bpftrace bcc bcc-tools

Check your kernel supports it (you need 4.9+, ideally 5.8+ for full feature support):

Terminal window
uname -r
# Should be 5.x or higher for the good stuff
bpftrace --version
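If you want that version check in a provisioning script instead of eyeballing uname -r, something like the following could work. It is a sketch that assumes GNU sort with -V (version sort) is available; kernel_at_least is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if kernel release $1 is at least version $2.
# Assumes GNU sort, whose -V flag orders version strings numerically.
kernel_at_least() {
    have=${1%%-*}   # drop the distro suffix: 5.15.0-91-generic -> 5.15.0
    want=$2
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# Usage:
#   kernel_at_least "$(uname -r)" 4.9 || echo "kernel too old for bpftrace"
#   kernel_at_least "$(uname -r)" 5.8 && echo "newer eBPF features available"
#   [ -r /sys/kernel/btf/vmlinux ] && echo "BTF present (needed for CO-RE)"
```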

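One gotcha: bpftrace needs elevated privileges. In most cases you’ll just run it with sudo. On kernels 5.8 and later the relevant capabilities are CAP_BPF and CAP_PERFMON; before that, everything was folded into CAP_SYS_ADMIN. In containers, you need --privileged or a very carefully crafted capability set.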
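To see what a container actually got, you can decode the CapEff mask from /proc/<pid>/status. The sketch below tests individual bits; the bit numbers (CAP_SYS_ADMIN = 21, CAP_PERFMON = 38, CAP_BPF = 39) come from linux/capability.h, and both function names are made up for illustration.

```shell
# Hypothetical helper: succeed if capability bit $2 is set in hex mask $1.
# The mask is the CapEff value from /proc/<pid>/status, without a 0x prefix.
has_cap() {
    mask=$1
    bit=$2
    [ $(( (0x$mask >> $bit) & 1 )) -eq 1 ]
}

# Print the effective capability mask inherited by the current process.
current_capeff() {
    awk '/^CapEff:/ { print $2 }' /proc/self/status
}

# Usage:
#   has_cap "$(current_capeff)" 39 && echo "CAP_BPF present"
#   has_cap "$(current_capeff)" 21 && echo "CAP_SYS_ADMIN present"
```

A mask with the right bits set is necessary but not always sufficient: seccomp profiles or LSM policy can still block the bpf() syscall.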


8 bpftrace One-Liners That Will Change Your Day

These run as-is on most modern Linux systems. All require sudo.

1. Every file opened, by process

Terminal window
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'

This hooks every openat syscall and prints the process name and file path. Pipe it through grep to watch a specific service. Find out what config files your app is actually reading (versus what it claims to read).
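Grep works, but it filters in user space after every event has already been formatted and printed. A bpftrace filter (the /.../ clause) drops non-matching events inside the kernel instead. The sketch below builds such a program for a given process name; openat_program is a hypothetical helper, and keep in mind that comm is truncated to 15 characters by the kernel.

```shell
# Hypothetical helper: emit a bpftrace program that reports openat calls
# from a single process name. The /comm == "..."/ filter runs in the kernel,
# so other processes never generate output.
openat_program() {
    name=$1
    printf 'tracepoint:syscalls:sys_enter_openat /comm == "%s"/ { printf("%%s %%s\\n", comm, str(args->filename)); }\n' "$name"
}

# Usage (needs root and bpftrace installed):
#   sudo bpftrace -e "$(openat_program nginx)"
```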

2. TCP connections being made

Terminal window
sudo bpftrace -e 'kprobe:tcp_connect { printf("%s -> %s\n", comm, ntop(((struct sock *)arg0)->__sk_common.skc_daddr)); }'

Or the cleaner version using tcpconnect from bcc-tools:

Terminal window
sudo tcpconnect-bpfcc                  # Debian/Ubuntu (bpfcc-tools)
sudo /usr/share/bcc/tools/tcpconnect   # RHEL-family and Arch (bcc-tools)

Instant visibility into what’s connecting where. Great for “is this service actually talking to the right database?“

3. Syscall latency — slowest calls by process

Terminal window
sudo bpftrace -e '
tracepoint:raw_syscalls:sys_enter { @start[tid] = nsecs; }
tracepoint:raw_syscalls:sys_exit /@start[tid]/ { @ns[comm] = hist(nsecs - @start[tid]); delete(@start[tid]); }
END { clear(@start); }'

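Run this for 10 seconds, then hit Ctrl-C. You get a histogram of syscall latency, in nanoseconds, per process. Expect long tails from calls that block by design, such as epoll_wait or futex; what you’re hunting for is a service spending 50ms+ in syscalls that should be fast.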
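Reading the output takes a moment of mental arithmetic, because hist() groups values into power-of-two buckets and the values here are nanoseconds. The sketch below shows which bucket a given value lands in; log2_bucket is a hypothetical helper that mirrors the bucketing idea, not bpftrace’s own code.

```shell
# Hypothetical helper: print the power-of-two range "low high" that a
# value falls into, i.e. low <= value < high.
log2_bucket() {
    value=$1
    if [ "$value" -lt 1 ]; then
        echo "0 1"
        return
    fi
    low=1
    while [ $(( low * 2 )) -le "$value" ]; do
        low=$(( low * 2 ))
    done
    echo "$low $(( low * 2 ))"
}

# Usage: a 50ms syscall is 50000000 ns
#   log2_bucket 50000000
```

For 50000000 this prints 33554432 67108864, which corresponds to the row bpftrace typically labels [32M, 64M). So 50ms+ syscalls show up in that row and the ones below it.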

4. TCP retransmits — in real time

Terminal window
sudo bpftrace -e 'kprobe:tcp_retransmit_skb { @retransmits[comm] = count(); }'

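TCP retransmits are usually invisible until your p99 latency starts hurting. This counts them as they happen. One caveat: retransmits often fire from timer or softirq context, so comm is whatever task happened to be on-CPU, not necessarily the process that owns the socket. Treat the per-process split as a hint, and reach for tcpretrans from bcc-tools when you need the addresses and ports involved.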
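For a quick sanity check against the probe’s counts, the kernel also keeps a system-wide retransmit counter in /proc/net/snmp (the RetransSegs column of the Tcp rows). A minimal sketch that reads it by column name so it does not depend on field order; tcp_retrans_segs is a hypothetical helper.

```shell
# Hypothetical helper: print the system-wide TCP RetransSegs counter.
# /proc/net/snmp has a header row and a value row per protocol; the column
# index is looked up by name in the header, then read from the value row.
tcp_retrans_segs() {
    file=${1:-/proc/net/snmp}
    awk '
        $1 == "Tcp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        $1 == "Tcp:" { print $(col["RetransSegs"]); exit }
    ' "$file"
}

# Usage: sample twice and compare
#   before=$(tcp_retrans_segs); sleep 10; after=$(tcp_retrans_segs)
#   echo "retransmitted segments in 10s: $(( after - before ))"
```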

5. Processes being spawned

Terminal window
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%-10s -> %s\n", comm, str(args->filename)); }'

Watch what processes are spawning children. Useful for “what the hell is cron doing” and “is my deploy script actually calling what I think it’s calling.”

6. Disk I/O latency histogram

Terminal window
sudo bpftrace -e '
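tracepoint:block:block_rq_issue { @start[args->dev, args->sector] = nsecs; }
tracepoint:block:block_rq_complete /@start[args->dev, args->sector]/ { @io_ms = hist((nsecs - @start[args->dev, args->sector]) / 1000000); delete(@start[args->dev, args->sector]); }
END { clear(@start); }'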

Or just use the BCC tool:

Terminal window
sudo biolatency-bpfcc                  # Debian/Ubuntu (bpfcc-tools)
sudo /usr/share/bcc/tools/biolatency   # RHEL-family and Arch (bcc-tools)

The one-liner buckets block I/O latency in milliseconds; biolatency reports microseconds by default (pass -m for milliseconds). If your database is slow, check this before blaming the query.

7. OOM kills

Terminal window
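sudo bpftrace -e 'kprobe:oom_kill_process { $oc = (struct oom_control *)arg0; printf("OOM kill: %s (pid %d), triggered by %s (pid %d)\n", $oc->chosen->comm, $oc->chosen->pid, comm, pid); }'

OOM kills sometimes don’t make it into your monitoring cleanly. This catches them at the kernel level the moment they happen. Inside the probe, comm and pid belong to the task whose allocation triggered the OOM killer, not the victim; the victim is oom_control->chosen, so the one-liner prints both. The struct cast assumes your kernel exposes BTF type information.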

8. Function call frequency in a specific binary (uprobes)

Terminal window
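sudo bpftrace -e 'uprobe:/usr/bin/python3:_PyEval_EvalFrameDefault { @[comm] = count(); }'

Replace the binary path and function name for your use case. This shows how often a specific function is being called. Useful for “is this hot path actually hot?” The example probes _PyEval_EvalFrameDefault because modern CPython runs frames through it, while the older PyEval_EvalFrameEx entry point is rarely called. If the probe fails to attach, the symbol probably lives elsewhere (on some distros python3 is a thin launcher and the interpreter sits in libpython3.x.so); sudo bpftrace -l 'uprobe:/usr/bin/python3:*EvalFrame*' lists what the binary actually exports.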


Writing a Multi-Line bpftrace Script

One-liners are great but sometimes you want to save something reusable. Create a .bt file:

open_files_slow.bt
#!/usr/bin/env bpftrace
// Track files opened by a process and flag slow opens
tracepoint:syscalls:sys_enter_openat
{
    @start[tid] = nsecs;
    @fname[tid] = args->filename;
}
tracepoint:syscalls:sys_exit_openat
/@start[tid]/
{
    // Compare in microseconds: integer division to ms would round a
    // 1.9ms open down to 1 and miss it
    $delta_us = (nsecs - @start[tid]) / 1000;
    if ($delta_us > 1000) {
        printf("SLOW OPEN: %-16s %-8d %dms %s\n",
            comm, pid, $delta_us / 1000, str(@fname[tid]));
    }
    delete(@start[tid]);
    delete(@fname[tid]);
}
Terminal window
sudo bpftrace open_files_slow.bt

This prints any file open that took more than 1ms — useful for catching NFS mounts that are misbehaving or slow FUSE filesystems.


The BCC Tools Cheat Sheet

If you installed bcc-tools, you already have a toolkit of production-ready tools. Quick rundown of the ones worth knowing:

| Tool | What it does |
| --- | --- |
| execsnoop | Trace all new processes system-wide |
| tcpconnect | Trace outbound TCP connections |
| tcpaccept | Trace inbound TCP connections accepted |
| tcpretrans | Trace TCP retransmits with addresses |
| biolatency | Block I/O latency histogram |
| funclatency | Latency of any kernel function |
| profile | CPU profiler, outputs flamegraph data |
| filetop | Top file reads/writes by process |
| opensnoop | All file opens with process info |
| runqlat | Scheduler run queue latency |

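Where they land depends on the package: Debian/Ubuntu’s bpfcc-tools installs them as /usr/sbin/<toolname>-bpfcc, while bcc-tools on RHEL-family distros and Arch puts them in /usr/share/bcc/tools/. They’re Python scripts that load eBPF programs, which BCC compiles on the fly when the tool starts, so you never run a compiler yourself.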
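Package layouts vary between distros, so a script that calls these tools has to find them first. A minimal sketch, assuming the common naming schemes (a -bpfcc suffix on PATH, or a tools directory under /usr/share/bcc/tools); find_bcc_tool is a hypothetical helper and the candidate list may need adjusting for your system.

```shell
# Hypothetical helper: print the path of a BCC tool, trying common layouts.
# $1 is the tool name; $2 optionally overrides the tools directory.
find_bcc_tool() {
    tool=$1
    tools_dir=${2:-/usr/share/bcc/tools}
    # Suffix-style install on PATH, e.g. execsnoop-bpfcc
    if found=$(command -v "$tool-bpfcc" 2>/dev/null); then
        echo "$found"
        return 0
    fi
    # Directory-style install
    if [ -x "$tools_dir/$tool" ]; then
        echo "$tools_dir/$tool"
        return 0
    fi
    # Last resort: a plain name on PATH
    if found=$(command -v "$tool" 2>/dev/null); then
        echo "$found"
        return 0
    fi
    return 1
}

# Usage:
#   sudo "$(find_bcc_tool execsnoop)"
```

The plain-name lookup goes last on purpose, since short names like profile could collide with unrelated commands.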


eBPF vs. The Old Ways

vs. strace: strace uses ptrace, which stops the process on every syscall. eBPF programs run inside the kernel without pausing your process, so the per-event cost is small (though a probe on a very hot path still adds up). Tracing a high-throughput service with strace is a production incident waiting to happen. With a targeted bpftrace probe, you can run on production without flinching.

vs. SystemTap: SystemTap requires kernel debug symbols and compiles kernel modules at runtime. It’s powerful but fragile, slow to iterate, and a pain to deploy. eBPF programs are checked by the verifier before they load, and CO-RE makes one compiled program portable across kernel versions.

vs. dtrace: If you’re coming from Solaris or macOS, bpftrace is essentially DTrace for Linux. The syntax is inspired by DTrace’s D language. The concepts map directly.

vs. perf: perf is excellent for CPU profiling and works great alongside eBPF. They’re complementary. Use perf for CPU flamegraphs, eBPF for everything else.


The “Maybe Stop Paying for Datadog” Aside

If you’re running Kubernetes and you’ve got a Datadog APM bill that makes you wince, look at Pixie (open source, a CNCF sandbox project) and Coroot (open source APM on eBPF).

Pixie auto-instruments your entire cluster with eBPF — no sidecars, no code changes, no SDK. It captures HTTP/gRPC/DNS/PostgreSQL/Redis traffic and computes golden signals automatically. The free Community Cloud tier covers most small clusters. If you’d rather self-host, the open-source version deploys to your cluster with a Helm chart.

Coroot does similar things with an APM-style UI on top of eBPF metrics — service maps, latency breakdowns, CPU/memory profiling — all from kernel-level data collection.

Neither of these is a full Datadog replacement if you’re heavily invested in its log management or alerting. But if you’re mostly paying for APM traces and infrastructure metrics? Run the numbers. eBPF-based observability is genuinely mature now, and “we hooked into the kernel instead of instrumenting your code” is a more honest architecture than “please add our SDK to every service.”


Where to Go Next

You’ve got bpftrace installed. You’ve run a few one-liners. Here’s the progression:

  1. Explore the bpftrace reference guide: man bpftrace or the GitHub wiki. The built-in variables (comm, pid, tid, nsecs, retval) and the maps and map functions (@map[], hist(), count()) unlock most of what you’ll want to do.

  2. Browse BCC examples: /usr/share/bcc/examples/ and the BCC GitHub repo have hundreds of example programs covering every subsystem.

  3. Brendan Gregg’s blog — If eBPF has a patron saint, it’s Brendan Gregg. His site (brendangregg.com) has the Linux performance tools map, flamegraph methodology, and years of BPF/bpftrace writeups. Bookmark it.

  4. “BPF Performance Tools” (the book) — Gregg wrote the book. Literally. It’s comprehensive and stays practical. Worth it if you’re doing this for work.

  5. Cilium — If you’re running Kubernetes and not using Cilium as your CNI, it’s worth evaluating. eBPF-native networking with built-in observability (Hubble) is genuinely better than the iptables-based alternatives.

The kernel has always known what’s happening. eBPF is finally how you get to ask it.


Your 2 AM self with an unexplained latency spike will appreciate having bpftrace already installed.

