eBPF for the Curious: Kernel Tracing Without the PhD

Your System Is Lying to You, And eBPF Catches It

You’ve got a service that gets slow every Tuesday at 3 PM. Metrics look fine. Logs say nothing useful. You add more print statements and redeploy, then wait. Maybe it’s the database. Maybe it’s DNS. Maybe it’s that one intern’s cronjob.

The answers are already there, happening in the kernel in real time, and you just don’t have a window into them. That’s what eBPF gives you. Not a hypothesis. Not another log level to crank up. Actual visibility into what the kernel is doing, right now, with your process, on your machine.

eBPF is the wildest thing to happen to Linux observability in a decade. If you’ve been hearing the acronym at every KubeCon and infrastructure meetup and nodding along politely, this is the post where you stop nodding and start actually using it.

What eBPF Actually Is (No Kernel Hacking Required)

BPF originally stood for Berkeley Packet Filter, a thing from 1992 that let you filter network packets efficiently. Nobody outside networking cared. Then Linux 3.18 (2014) landed “extended BPF”, eBPF, and the scope exploded.

The short version: eBPF lets you write small programs that run inside the Linux kernel, triggered by events. Syscalls, network packets, function calls, disk I/O, scheduler events: all fair game. You write the program, load it into the kernel, and the kernel runs it in a sandboxed VM with a verifier that checks it won’t crash anything.

That verifier is the key detail. This isn’t a loadable kernel module you compile and pray. The kernel verifier statically analyzes your eBPF bytecode before loading it: no infinite loops, no illegal memory access, no crashing the kernel. If the verifier rejects it, it doesn’t load. If it passes, it runs at near-native speed.

The result: you can instrument literally anything the kernel touches, without recompiling the kernel, without rebooting, without a kernel module, and without breaking production.

This is why Netflix, Google, Meta, and Cloudflare have all built substantial observability and networking infrastructure on eBPF. It’s not hype. It genuinely removes an entire category of “we can’t observe this without significant downtime.”

The Hook Points: Where You Can Attach Programs

eBPF programs attach to “hook points” in the kernel. The main ones you’ll care about as a sysadmin or SRE:

Tracepoints are stable, kernel-defined instrumentation points. Things like syscalls:sys_enter_open, net:netif_receive_skb, sched:sched_process_fork. These are stable across kernel versions and are the safest option.

Kprobes / Kretprobes attach to any kernel function, on entry or return. Powerful but can break if internal kernel function names change between versions. Use tracepoints when you can, kprobes when you need something tracepoints don’t cover.

Uprobes / Uretprobes do the same for user-space functions. Attach to a function in a binary (like a Go runtime function or a Postgres internal) without modifying the binary.

XDP (eXpress Data Path) hooks into the network stack at the earliest possible point, before the kernel even allocates an sk_buff. This is how Cloudflare drops DDoS traffic at line rate.

TC (Traffic Control) hooks into kernel traffic control for packet manipulation.

For the use cases in this article (tracing, profiling, observability), you’ll mostly be living in tracepoints, kprobes, and uprobes.

The Toolchain: Pick Your Abstraction Level

The eBPF ecosystem has layers. From “just run this command” to “I am writing production infrastructure”:

bpftrace: Ad-Hoc One-Liners and Scripts

bpftrace is awk for the kernel. It’s a high-level tracing language that compiles down to eBPF. Single-line commands, quick scripts, “what is happening right now” debugging. This is where you’ll spend 90% of your time as an SRE.

BCC (BPF Compiler Collection): Python Front-Ends

BCC lets you write eBPF programs in C with a Python (or Lua) front-end that loads and controls them. More powerful than bpftrace, more verbose. The bcc-tools package ships a ton of pre-built tools: tcpconnect, execsnoop, biolatency, funclatency, and many more. Great for “I want a real tool, not a one-liner.”

libbpf + CO-RE: Production Programs

For shipping eBPF code in a product, you want CO-RE (Compile Once, Run Everywhere). libbpf handles the low-level loading; CO-RE uses BTF (BPF Type Format) metadata to relocate field offsets at load time so your compiled binary works across different kernel versions. libbpfgo and libbpf-rs wrap this for Go and Rust respectively.

Full Platforms

Cilium: eBPF-based Kubernetes networking and security. Replaces kube-proxy, enforces NetworkPolicy at wire speed.
Tetragon: runtime security enforcement with eBPF. Block syscalls, kill processes, enforce network policies based on kernel events.
Pixie: auto-instrumentation for Kubernetes apps. No code changes, no agents in your container. eBPF intercepts traffic and infers request/response latency, error rates, etc.
Falco: runtime security observability. Originally kernel module, now eBPF-first. Detects anomalous behavior (shell in container, privilege escalation attempts).
Coroot: eBPF-based APM and infrastructure observability.

Installing bpftrace

On a modern distro, this is not hard.

Debian/Ubuntu:

sudo apt update
sudo apt install -y bpftrace bpfcc-tools linux-headers-$(uname -r)

RHEL/Rocky/AlmaLinux 9:

sudo dnf install -y bpftrace bcc-tools kernel-devel

Arch:

sudo pacman -S bpftrace bcc bcc-tools python-bcc

Check your kernel supports it (you need 4.9+, ideally 5.8+ for full feature support):

uname -r
# Should be 5.x or higher for the good stuff
bpftrace --version

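One gotcha: bpftrace needs root in practice. CAP_BPF (added in Linux 5.8) covers loading programs, but tracing also needs CAP_PERFMON, and many bpftrace versions refuse to start unless you are root. In most cases you’ll just run it with sudo. In containers, you need --privileged or a very carefully crafted capability set.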
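
Those checks can be rolled into one preflight script. This is a minimal sketch, not something bpftrace ships: kernel_at_least is a hypothetical helper, the BTF file at /sys/kernel/btf/vmlinux only exists on kernels built with CONFIG_DEBUG_INFO_BTF, and the root check assumes you are not relying on a custom capability set.

```shell
# Preflight sketch. kernel_at_least is a hypothetical helper, not a bpftrace feature.

# Succeeds if kernel release $1 (e.g. "5.15.0-91-generic") is at least $2.$3.
kernel_at_least() {
    kal_release="$1"
    kal_major="${kal_release%%.*}"          # text before the first dot
    kal_rest="${kal_release#*.}"            # text after the first dot
    kal_minor="${kal_rest%%[!0-9]*}"        # leading digits of that remainder
    [ "$kal_major" -gt "$2" ] || {
        [ "$kal_major" -eq "$2" ] && [ "${kal_minor:-0}" -ge "$3" ]
    }
}

if kernel_at_least "$(uname -r)" 4 9; then
    echo "kernel $(uname -r): new enough for bpftrace"
else
    echo "kernel $(uname -r): older than 4.9"
fi

# BTF type information lets tools resolve kernel structs without headers.
if [ -r /sys/kernel/btf/vmlinux ]; then
    echo "BTF: available"
else
    echo "BTF: not found, keep the kernel headers installed"
fi

# bpftrace generally expects to run as root.
if [ "$(id -u)" -eq 0 ]; then
    echo "privileges: running as root"
else
    echo "privileges: not root, use sudo"
fi
```

Run it once on a new host before you start debugging; it only reads system state and changes nothing.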

8 bpftrace One-Liners That Will Change Your Day

These run as-is on most modern Linux systems. All require sudo.

1. Every file opened, by process

sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'

This hooks every openat syscall and prints the process name and file path. Pipe it through grep to watch a specific service. Find out what config files your app is actually reading (versus what it claims to read).
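
Piping through grep works, but the filtering can also happen inside the kernel with a bpftrace predicate, so events from other processes are never printed at all. The sketch below assumes a hypothetical helper named opens_program that generates the filtered program for a process name you pass in; nginx in the usage line is only a placeholder. Keep in mind that comm is capped at 15 characters, so longer process names have to be given in their truncated form.

```shell
# opens_program is a hypothetical helper (not a bpftrace feature).
# It prints a bpftrace program that traces openat() for one process name.
opens_program() {
    cat <<EOF
tracepoint:syscalls:sys_enter_openat
/comm == "$1"/
{
    printf("%s %s\n", comm, str(args->filename));
}
EOF
}

# Usage (needs root, so it is left commented out here):
#   sudo bpftrace -e "$(opens_program nginx)"
```

The /comm == "..."/ line is the predicate: the action block only runs when it is true.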

2. TCP connections being made

sudo bpftrace -e 'kprobe:tcp_connect { printf("%s -> %s\n", comm, ntop(((struct sock *)arg0)->__sk_common.skc_daddr)); }'

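That one-liner only decodes IPv4 destinations. The cleaner version is tcpconnect from bcc-tools, which also handles IPv6 and shows ports (the command name depends on your distro):

sudo tcpconnect-bpfcc                   # Debian/Ubuntu
sudo /usr/share/bcc/tools/tcpconnect    # RHEL-family, Arch

Instant visibility into what’s connecting where. Great for “is this service actually talking to the right database?”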

3. Syscall latency: slowest calls by process

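sudo bpftrace -e '
tracepoint:raw_syscalls:sys_enter { @start[tid] = nsecs; }
tracepoint:raw_syscalls:sys_exit /@start[tid]/ { @ns[comm] = hist(nsecs - @start[tid]); delete(@start[tid]); }
END { clear(@start); }'

Run this for 10 seconds, then Ctrl-C. bpftrace prints its maps on exit, so you get a histogram of syscall latency in nanoseconds for each process name. The /@start[tid]/ filter skips syscalls that were already in flight when tracing began, and clear(@start) keeps leftover timestamps out of the output. If you’ve got a service taking 50ms+ on individual syscalls, this is where you find it.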
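
Reading the histogram takes a moment of arithmetic, because hist() groups values into power-of-two buckets and the values here are nanoseconds. As a rough illustration, hist_bucket below is a hypothetical helper that mimics that grouping in plain shell so you can see where a given latency would land. It is not how bpftrace implements hist(), and bpftrace abbreviates the labels (the range printed here shows up as [32M, 64M)).

```shell
# hist_bucket is a hypothetical helper that mimics how hist() groups values:
# it prints the power-of-two range that a non-negative integer falls into.
hist_bucket() {
    hb_value="$1"
    if [ "$hb_value" -lt 1 ]; then
        echo "[0]"
        return 0
    fi
    hb_low=1
    # Double the lower bound until the next doubling would pass the value.
    while [ $((hb_low * 2)) -le "$hb_value" ]; do
        hb_low=$((hb_low * 2))
    done
    echo "[$hb_low, $((hb_low * 2)))"
}

hist_bucket 50000000    # a 50 ms syscall, expressed in nanoseconds
```

So a 50 ms syscall lands in the bucket that starts at 33554432 ns. Anything stacking up in that row or below it is worth a closer look.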

4. TCP retransmits in real time

sudo bpftrace -e 'kprobe:tcp_retransmit_skb { @retransmits[comm] = count(); }'

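TCP retransmits are usually invisible until your p99 latency starts hurting. This counts them while it runs and prints the totals when you hit Ctrl-C. One caveat: retransmits often fire from timer or softirq context, so comm is whatever task happened to be on the CPU, not necessarily the process that owns the socket. For addresses and ports, use tcpretrans from bcc-tools.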

5. Processes being spawned

sudo bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%-10s -> %s\n", comm, str(args->filename)); }'

Watch what processes are spawning children. Useful for “what the hell is cron doing” and “is my deploy script actually calling what I think it’s calling.”

6. Disk I/O latency histogram

sudo bpftrace -e '
tracepoint:block:block_rq_issue    { @start[args->dev, args->sector] = nsecs; }
tracepoint:block:block_rq_complete /@start[args->dev, args->sector]/ {
    @io_ms = hist((nsecs - @start[args->dev, args->sector]) / 1000000);
    delete(@start[args->dev, args->sector]);
}
END { clear(@start); }'

(Older guides use kprobe:blk_account_io_start here, but that function got inlined on kernels 5.17+ and the kprobe no longer attaches. The block: tracepoints are stable across versions, so use those.)

Or just use the BCC tool:

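sudo biolatency-bpfcc -m                   # Debian/Ubuntu
sudo /usr/share/bcc/tools/biolatency -m    # RHEL-family, Arch

With -m, that gives you a histogram of block I/O latency in milliseconds (the default is microseconds). If your database is slow, check this before blaming the query.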

7. OOM kills

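sudo bpftrace -e 'kprobe:oom_kill_process { $oc = (struct oom_control *)arg0; printf("OOM kill: %s (pid %d), triggered by %s (pid %d)\n", $oc->chosen->comm, $oc->chosen->pid, comm, pid); }'

OOM kills sometimes don’t make it into your monitoring cleanly. This catches them at the kernel level the moment they happen. Note that comm and pid are the task whose allocation triggered the OOM killer; the victim is in the oom_control struct passed as the first argument, which bpftrace can only resolve on a kernel with BTF.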

8. Function call frequency in a specific binary (uprobes)

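sudo bpftrace -e 'uprobe:/usr/bin/python3:_PyEval_EvalFrameDefault { @[comm] = count(); }'

Replace the binary path and function name for your use case. This shows how often a specific function is being called. Useful for “is this hot path actually hot?” Two caveats for this example: on current CPython, _PyEval_EvalFrameDefault is the function that runs frames (PyEval_EvalFrameEx is mostly a legacy wrapper), and if your python3 is built against a shared libpython, the symbol lives in that library, so point the uprobe there instead.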
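
A uprobe silently does nothing useful if it points at the wrong file, so it helps to confirm where a symbol lives before attaching. The sketch below assumes two hypothetical helpers, resolve_binary and defines_symbol. The second one relies on nm from binutils and only sees exported dynamic symbols, so a miss does not prove the function is absent.

```shell
# resolve_binary and defines_symbol are hypothetical helpers, not bpftrace features.

# Print the real file behind a command name, following any symlinks.
resolve_binary() {
    readlink -f "$(command -v "$1")"
}

# Succeed if file $1 exports dynamic symbol $2. Needs nm from binutils.
defines_symbol() {
    nm -D --defined-only "$1" 2>/dev/null | awk '{ print $NF }' | grep -qxF -- "$2"
}

# Usage, commented out so the snippet has no side effects:
#   bin="$(resolve_binary python3)"
#   defines_symbol "$bin" PyEval_EvalFrameEx && echo "attach to $bin"
```

If the check fails against the resolved binary, try the shared libraries it loads (ldd lists them) and use that path in the uprobe instead.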

Writing a Multi-Line bpftrace Script

One-liners are great, but sometimes you want something reusable. Create a .bt file, for example open_files_slow.bt:

#!/usr/bin/env bpftrace

// Track files opened by a process and flag slow opens
tracepoint:syscalls:sys_enter_openat
{
    @start[tid] = nsecs;
    @fname[tid] = args->filename;
}

tracepoint:syscalls:sys_exit_openat
/@start[tid]/
{
    $delta_ms = (nsecs - @start[tid]) / 1000000;
    if ($delta_ms >= 1) {
        printf("SLOW OPEN: %-16s %-8d %dms %s\n",
               comm, pid, $delta_ms, str(@fname[tid]));
    }
    delete(@start[tid]);
    delete(@fname[tid]);
}

Run it with:

sudo bpftrace open_files_slow.bt

This prints any file open that took 1ms or longer, useful for catching misbehaving NFS mounts or slow FUSE filesystems.

The BCC Tools Cheat Sheet

If you installed bcc-tools, you already have a toolkit of production-ready tools. Quick rundown of the ones worth knowing:

Tool	What it does
`execsnoop`	Trace all new processes system-wide
`tcpconnect`	Trace outbound TCP connections
`tcpaccept`	Trace inbound TCP connections accepted
`tcpretrans`	Trace TCP retransmits with addresses
`biolatency`	Block I/O latency histogram
`funclatency`	Latency of any kernel function
`profile`	CPU profiler, outputs flamegraph data
`filetop`	Top file reads/writes by process
`opensnoop`	All file opens with process info
`runqlat`	Scheduler run queue latency

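Where they land depends on the distro: Debian and Ubuntu install them in /usr/sbin with a -bpfcc suffix (execsnoop-bpfcc, for example), while RHEL-family distros and Arch put them in /usr/share/bcc/tools/. They’re Python scripts that compile and load their eBPF programs at startup, so there’s no build step for you.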
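
Because names and install locations vary by distro (Debian and Ubuntu add a -bpfcc suffix, others commonly use /usr/share/bcc/tools), a small lookup helper saves some guessing. This is a sketch under those assumptions: find_bcc_tool and the BCC_TOOLS_DIR override are hypothetical, not something bcc provides.

```shell
# find_bcc_tool is a hypothetical helper, not something bcc ships.
# It prints the first match for a tool name, trying the Debian/Ubuntu
# "-bpfcc" name on PATH, then the tools directory, then the bare name on PATH.
find_bcc_tool() {
    fbt_name="$1"
    fbt_dir="${BCC_TOOLS_DIR:-/usr/share/bcc/tools}"
    if command -v "$fbt_name-bpfcc" >/dev/null 2>&1; then
        command -v "$fbt_name-bpfcc"
        return 0
    fi
    if [ -x "$fbt_dir/$fbt_name" ]; then
        echo "$fbt_dir/$fbt_name"
        return 0
    fi
    if command -v "$fbt_name" >/dev/null 2>&1; then
        command -v "$fbt_name"
        return 0
    fi
    echo "no BCC tool named $fbt_name found" >&2
    return 1
}

# Usage, commented out because the tools need root:
#   sudo "$(find_bcc_tool execsnoop)"
```

On Debian-family systems /usr/sbin is often missing from a normal user’s PATH, so run the lookup from a root shell or add that directory to PATH first.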

eBPF vs. The Old Ways

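vs. strace: strace uses ptrace, which stops the process at every syscall entry and exit. eBPF programs run in kernel context at the event itself, so the traced process is never stopped and the per-event cost is small. Profiling a high-throughput service with strace is a production incident waiting to happen. With bpftrace, you can usually run in production without flinching, as long as you remember that cost scales with event rate: tracing every syscall on a busy box is not free.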

vs. SystemTap: SystemTap requires kernel debug symbols and compiles kernel modules at runtime. It’s powerful but fragile, slow to iterate, and a pain to deploy. eBPF programs have to pass the verifier before they run, and with CO-RE a single compiled binary is portable across kernel versions.

vs. DTrace: If you’re coming from Solaris or macOS, bpftrace is essentially DTrace for Linux. The syntax is inspired by DTrace’s D language. The concepts map directly.

vs. perf: perf is excellent for CPU profiling and works great alongside eBPF. They’re complementary. Use perf for CPU flamegraphs, eBPF for everything else.

The “Maybe Stop Paying for Datadog” Aside

If you’re running Kubernetes and you’ve got a Datadog APM bill that makes you wince, look at Pixie (open source, CNCF sandbox) and Coroot (open source APM on eBPF).

Pixie auto-instruments your entire cluster with eBPF, no sidecars, no code changes, no SDK. It captures HTTP/gRPC/DNS/PostgreSQL/Redis traffic and computes golden signals automatically. The free Community Cloud tier covers most small clusters. If you’d rather self-host, the open-source version deploys to your cluster with a Helm chart.

Coroot does similar things with an APM-style UI on top of eBPF metrics, showing service maps, latency breakdowns, CPU/memory profiling, all from kernel-level data collection.

Neither of these is a full Datadog replacement if you’re heavily invested in its log management or alerting. But if you’re mostly paying for APM traces and infrastructure metrics? Run the numbers. eBPF-based observability is genuinely mature now, and “we hooked into the kernel instead of instrumenting your code” is a more honest architecture than “please add our SDK to every service.”

Where to Go Next

You’ve got bpftrace installed. You’ve run a few one-liners. Here’s the progression:

Explore the bpftrace reference guide: man bpftrace or the docs in the bpftrace GitHub repo. The built-in variables (comm, pid, tid, nsecs, retval), maps (@name[key]), and map functions (hist(), count()) unlock most of what you’ll want to do.
Browse BCC examples: /usr/share/bcc/examples/ or the BCC GitHub repo has hundreds of example programs covering every subsystem.
Brendan Gregg’s blog: If eBPF has a patron saint, it’s Brendan Gregg. His site (brendangregg.com) has the Linux performance tools map, flamegraph methodology, and years of BPF/bpftrace writeups. Bookmark it.
“BPF Performance Tools” (the book): Gregg wrote the book. Literally. It’s thorough and stays practical. Worth it if you’re doing this for work.
Cilium: If you’re running Kubernetes and not using Cilium as your CNI, it’s worth evaluating. eBPF-native networking with built-in observability (Hubble) is genuinely better than the iptables-based alternatives.

The kernel has always known what’s happening. eBPF is finally how you get to ask it.

Your 2 AM self with an unexplained latency spike will appreciate having bpftrace already installed.