DevOps | June 22, 2026 | 3 min read

eBPF-based socket monitoring: Tracking latency in Docker containers

eBPF-based socket monitoring lets you track network latency inside Docker containers. Learn how to pinpoint bottlenecks without adding significant overhead.

eBPF, Docker, Linux, Networking, Observability, Performance, DevOps, CI/CD

Last month, a microservice in our staging environment started reporting intermittent "hiccups." The metrics showed a standard 30ms response time, but every few minutes, we’d see a spike to 800ms. Standard tools like netstat and ss were useless; they gave us a snapshot, but not the context of what was happening during those specific, fleeting spikes. That’s when I turned to eBPF to get the granular visibility I needed.

If you’ve read "Docker networking latency: Debugging with eBPF and tcpretrans," you know that standard Linux tools often fail to correlate packet events with specific container processes. In production, you can't afford the overhead of heavy packet captures. eBPF changes the game by allowing us to hook directly into the kernel’s networking stack without modifying the application code.

The Problem with Traditional Observability

We initially tried using standard logging and application-level tracing. It gave us a high-level view, but it didn't tell us if the delay was in the application code, the Docker bridge, or the host's TCP stack. We were flying blind regarding the kernel's behavior.

We considered the approach from "Linux Kernel Tuning: Fixing Socket Exhaustion in Docker Proxies," but after checking our connection counts, we realized our issue wasn't exhaustion; it was latency jitter. We needed to see how long each socket spent in the TCP_ESTABLISHED state versus how long it spent waiting for a buffer.

Getting Started with eBPF-based Socket Monitoring

To track latency, we need to hook into the tcp_rcv_established and tcp_sendmsg kernel functions. By calculating the time delta between these events on a per-socket basis, we can identify exactly where the latency is being introduced.

Here is a simplified logic flow for a BCC (BPF Compiler Collection) script to trace this:

Attach kprobes: Hook into the entry and exit points of socket-related functions.
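Filter by PID: Since Docker containers share the host kernel, we filter events based on the PID, the PID namespace, or the cgroup ID associated with the container. Keep in mind that receive-path functions such as tcp_rcv_established often run in softirq context, where the current PID may belong to an unrelated task, so matching on the socket itself is more reliable there.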
Map data: Store the timestamp of the request start in a BPF map keyed by the socket structure.
Calculate latency: When the response hits the kernel, subtract the start timestamp and output the result.
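The four steps above can be modelled in plain Python before touching the kernel. This is only a user-space sketch of the bookkeeping, not a BPF program: the class name, method names, and the idea of keeping the first timestamp per socket are my own assumptions for illustration. In a real probe the dictionary would be a BPF hash map keyed by the struct sock pointer.

```python
from dataclasses import dataclass, field


@dataclass
class SocketLatencyTracker:
    """User-space model of a BPF hash map keyed by socket (hypothetical)."""

    # socket identifier -> timestamp (ns) of the first send of a request
    start_ns: dict = field(default_factory=dict)

    def on_send(self, sock_id: int, now_ns: int) -> None:
        # Stands in for the tcp_sendmsg probe. Keep the earliest timestamp
        # so a request split across several sends is timed from its start.
        self.start_ns.setdefault(sock_id, now_ns)

    def on_receive(self, sock_id: int, now_ns: int):
        # Stands in for the tcp_rcv_established probe. Removing the entry
        # keeps the map from growing without bound.
        started = self.start_ns.pop(sock_id, None)
        if started is None:
            return None  # response with no recorded request: ignore it
        return now_ns - started


tracker = SocketLatencyTracker()
tracker.on_send(sock_id=0xABC, now_ns=1_000)
latency_ns = tracker.on_receive(sock_id=0xABC, now_ns=30_001_000)
print(f"{latency_ns / 1e6:.1f} ms")
```

Deleting the entry on receive matters as much as the subtraction: BPF maps have a fixed maximum size, so stale sockets have to be evicted somewhere.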

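BCC does not ship a single tool for this exact measurement, but its stock tools get you most of the way. For example, tcpconnlat reports connection setup latency and accepts a PID filter:


Bash
# Trace TCP connect latency for a specific process ID
# (Debian/Ubuntu packages install this tool as tcpconnlat-bpfcc)
/usr/share/bcc/tools/tcpconnlat -p <PID>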

However, running this inside a container requires the host's kernel headers and specific capabilities. You’ll need to run your monitoring container with --privileged or at least provide CAP_BPF and CAP_PERFMON on modern kernels (5.8+).
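Before blaming the tool, it helps to confirm the monitoring container actually received those capabilities. Here is a rough sketch that decodes the CapEff bitmask from /proc/<pid>/status. The function names are mine, and the check is deliberately narrow: it says nothing about seccomp profiles, kernel lockdown, or other restrictions that can still block BPF.

```python
# Capability bit numbers as defined in linux/capability.h.
CAP_SYS_ADMIN = 21
CAP_PERFMON = 38
CAP_BPF = 39


def parse_cap_eff(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def can_load_bpf(cap_eff: int) -> bool:
    """Heuristic: CAP_SYS_ADMIN, or CAP_BPF together with CAP_PERFMON."""

    def has(bit: int) -> bool:
        return bool((cap_eff >> bit) & 1)

    return has(CAP_SYS_ADMIN) or (has(CAP_BPF) and has(CAP_PERFMON))
```

Inside the container, feed it the contents of /proc/self/status, for example `can_load_bpf(parse_cap_eff(open("/proc/self/status").read()))`. A container started with Docker's default capability set should come back False.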

Why Process-Level Tracing Matters

When you're running multiple services on a single host, Linux observability becomes a nightmare of overlapping namespaces. If you rely on host-level tcpdump, you’re going to get overwhelmed by noise from other containers.

By using eBPF, we can tag our data with the cgroup ID, which maps to a container ID in user space. This allows us to correlate network performance with specific deployments. I’ve found that even if you’re not going as far as "Implementing Zero-Trust Network Policies with Cilium and Hubble," just having the ability to see per-container socket latency is worth the setup time.
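One simple way to do that tagging in user space is to read the host's /proc/<pid>/cgroup for the PID an event reports and pull the container ID out of the path. The sketch below assumes the two path layouts I've seen with Docker ("/docker/<id>" and "docker-<id>.scope" under systemd); the function name is hypothetical, and other runtimes use different layouts.

```python
import re

# Docker container IDs are 64 lowercase hex characters. The prefix is
# "docker/" with the cgroupfs driver and "docker-" with the systemd driver.
_CONTAINER_ID = re.compile(r"docker[-/]([0-9a-f]{64})")


def container_id_from_cgroup(cgroup_text: str):
    """Return the Docker container ID found in /proc/<pid>/cgroup text."""
    for line in cgroup_text.splitlines():
        match = _CONTAINER_ID.search(line)
        if match:
            return match.group(1)
    return None  # not a Docker-managed process (or an unknown layout)
```

Read the file from the host side. Inside a container with its own cgroup namespace, /proc/self/cgroup usually shows just the namespace root, so the ID won't be there.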

Lessons Learned and Trade-offs

The biggest "gotcha" with eBPF is the learning curve. I spent about two days just getting the kernel headers to match the running kernel version inside our Alpine-based Docker images. Don't be like me: mount the host's /lib/modules and /usr/src directories (read-only) into your monitoring container.
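A quick sanity check would have saved me most of those two days. BCC looks for headers under /lib/modules/<release>/build, so the sketch below tests whether that path resolves for a given kernel release. The function names and the optional root parameter are my own, for illustration only.

```python
import os


def headers_dir_for(release: str, root: str = "/") -> str:
    """Path where BCC expects the build headers for a kernel release."""
    return os.path.join(root, "lib", "modules", release, "build")


def headers_available(release: str, root: str = "/") -> bool:
    # isdir() follows symlinks. On many distros "build" is a symlink into
    # /usr/src, so this also fails when only /lib/modules was mounted.
    return os.path.isdir(headers_dir_for(release, root))
```

Run it inside the monitoring container as `headers_available(os.uname().release)`. Because containers share the host kernel, os.uname() reports the host's release, which is exactly the version the headers need to match.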

Also, be careful with the volume of events. If you hook into every packet, you’ll burn CPU cycles. Always use BPF maps to aggregate data in-kernel before sending it to user-space. Only push the summary metrics up to your dashboard; don't try to log every single packet's latency unless you want to crash your observability backend.
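To make the aggregation idea concrete, here is a user-space sketch of power-of-two bucketing, similar in spirit to the histograms the BCC tools print. It is not kernel code, and the function names are mine. The point is that a handful of counters can summarize any number of events.

```python
from collections import Counter


def log2_bucket(value_us: int) -> int:
    """Bucket index: 0 holds 0, and bucket b holds 2**(b-1) .. 2**b - 1."""
    return max(0, value_us).bit_length()


def bucket_range(bucket: int):
    """Inclusive (low, high) range of values that land in a bucket."""
    if bucket == 0:
        return (0, 0)
    return (1 << (bucket - 1), (1 << bucket) - 1)


def aggregate(latencies_us) -> Counter:
    """Collapse raw latencies into per-bucket counts."""
    hist = Counter()
    for value in latencies_us:
        hist[log2_bucket(value)] += 1
    return hist


# Two ~30 ms samples and one 800 ms spike, in microseconds.
for bucket, count in sorted(aggregate([30_000, 31_000, 800_000]).items()):
    low, high = bucket_range(bucket)
    print(f"{low:>8} -> {high:<8} us : {count}")
```

With this layout the 800 ms spikes land in their own bucket, far from the 30 ms baseline, so they stay visible even when only the counts leave the kernel.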

Final Thoughts

eBPF has completely changed how I approach network performance debugging. It’s no longer a guessing game of "is it the code or the network?" You can see the kernel's perspective in real-time.

Next time, I want to experiment with kretprobes to track the time spent in tcp_v4_do_rcv more accurately, as I suspect some of our jitter is coming from interrupt coalescing on the host NIC. It’s a rabbit hole, but for production systems, there’s no better way to get the truth.

