eBPF-based socket monitoring lets you track network latency inside Docker containers. Learn how to pinpoint bottlenecks without the overhead of full packet captures.
Last month, a microservice in our staging environment started reporting intermittent "hiccups." The metrics showed a standard 30ms response time, but every few minutes, we’d see a spike to 800ms. Standard tools like netstat and ss were useless; they gave us a snapshot, but not the context of what was happening during those specific, fleeting spikes. That’s when I turned to eBPF to get the granular visibility I needed.
If you’ve read “Docker Networking Latency: Debugging with eBPF and tcpretrans,” you know that standard Linux tools often fail to correlate packet events with specific container processes. On a busy production host, you can't afford the overhead of heavy packet captures. eBPF lets us hook directly into the kernel’s networking stack without modifying the application code.
We initially tried using standard logging and application-level tracing. It gave us a high-level view, but it didn't tell us if the delay was in the application code, the Docker bridge, or the host's TCP stack. We were flying blind regarding the kernel's behavior.
We considered the approach from “Linux Kernel Tuning: Fixing Socket Exhaustion in Docker Proxies,” but after checking our connection counts, we realized our issue wasn't exhaustion; it was latency jitter. We needed to see how long each socket spent in the TCP_ESTABLISHED state versus how long it spent waiting for a buffer.
To track latency, we need to hook into the tcp_rcv_established and tcp_sendmsg kernel functions. By calculating the time delta between these events on a per-socket basis, we can identify exactly where the latency is being introduced.
Here is a simplified logic flow for a BCC (BPF Compiler Collection) script to trace this:
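A minimal sketch of that flow, assuming the service receives a request and then answers on the same socket, so the delta runs from tcp_rcv_established to the next tcp_sendmsg. The map names, probe function names, and the 10-second window are illustrative rather than taken from our production script. Because tcp_rcv_established also fires for pure ACKs, the measured value is really "time since the last inbound segment," which is good enough for spotting spikes but not an exact service time.

```python
"""Sketch: per-socket delay between an inbound segment and the next send."""
import time

# BPF C program that BCC compiles at load time. All names here are illustrative.
BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>
#include <net/sock.h>

BPF_HASH(last_rx_ns, struct sock *, u64);   // socket -> time of last inbound segment
BPF_HISTOGRAM(latency_us);                  // log2 histogram, aggregated in-kernel

int on_rcv(struct pt_regs *ctx, struct sock *sk)
{
    u64 now = bpf_ktime_get_ns();
    last_rx_ns.update(&sk, &now);
    return 0;
}

int on_send(struct pt_regs *ctx, struct sock *sk)
{
    u64 *rx = last_rx_ns.lookup(&sk);
    if (rx == 0)
        return 0;                           // nothing received on this socket yet
    u64 delta_us = (bpf_ktime_get_ns() - *rx) / 1000;
    latency_us.increment(bpf_log2l(delta_us));
    last_rx_ns.delete(&sk);
    return 0;
}
"""


def main(window_s: int = 10) -> None:
    """Attach both kprobes, wait, then print the histogram. Needs root and BCC."""
    # Imported lazily so the file can be read or linted on a machine without BCC.
    from bcc import BPF

    b = BPF(text=BPF_PROGRAM)
    b.attach_kprobe(event="tcp_rcv_established", fn_name="on_rcv")
    b.attach_kprobe(event="tcp_sendmsg", fn_name="on_send")
    time.sleep(window_s)
    b["latency_us"].print_log2_hist("usecs")
```

Calling main() as root prints a log2 histogram of microsecond deltas after the window closes. Sockets that receive data but never send leave an entry in last_rx_ns, so a longer-running version would also want to clean up when the socket closes.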
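Using BCC, starting a trace for a single process is straightforward. Note that tcplatency is not one of the stock BCC tools, so the path below assumes a custom script installed alongside them; the bundled tcpconnlat and tcplife tools accept the same -p <PID> filter.
```bash
# Trace TCP latency for a specific process ID
/usr/share/bcc/tools/tcplatency -p <PID>
```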
However, running this inside a container requires the host's kernel headers and specific capabilities. You’ll need to run your monitoring container with --privileged, or at least grant CAP_BPF and CAP_PERFMON on modern kernels (5.8+); older kernels need CAP_SYS_ADMIN instead.
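Before loading any probes, it can help to confirm the container actually received those capabilities. The sketch below decodes the CapEff bitmask from /proc/<pid>/status; the bit numbers (21, 38, 39) come from the kernel's capability header, while the function names and the "SYS_ADMIN or BPF+PERFMON" rule are my own simplification. Some setups need more than this, so treat a True result as a first check, not a guarantee.

```python
# Capability bit numbers as defined in linux/capability.h.
CAP_SYS_ADMIN = 21
CAP_PERFMON = 38
CAP_BPF = 39


def effective_caps(status_text: str) -> int:
    """Return the CapEff bitmask parsed from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def can_trace(cap_mask: int) -> bool:
    """True if the mask holds CAP_SYS_ADMIN, or both CAP_BPF and CAP_PERFMON."""
    def has(bit: int) -> bool:
        return bool((cap_mask >> bit) & 1)

    return has(CAP_SYS_ADMIN) or (has(CAP_BPF) and has(CAP_PERFMON))
```

Inside the monitoring container, `can_trace(effective_caps(open("/proc/self/status").read()))` gives a quick yes or no before BCC fails with a less helpful permission error.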
When you're running multiple services on a single host, Linux observability becomes a nightmare of overlapping namespaces. If you rely on host-level tcpdump, you’re going to get overwhelmed by noise from other containers.
By using eBPF, we can tag our data with the container ID. This allows us to correlate network performance with specific deployments. I’ve found that even if you’re not implementing zero-trust network policies with Cilium and Hubble, just having the ability to see per-container socket latency is worth the setup time.
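One way to do that tagging in user space, sketched under a few assumptions: the probe reports a host-namespace PID (for example via bpf_get_current_pid_tgid), the script runs where /proc/<pid>/cgroup shows the host's view, and Docker embeds the full 64-character container ID in the cgroup path. The function names are hypothetical.

```python
import re
from typing import Optional

# Docker container IDs are 64 lowercase hex characters.
_CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the first 64-hex-digit ID found in /proc/<pid>/cgroup text, or None."""
    match = _CONTAINER_ID.search(cgroup_text)
    return match.group(0) if match else None


def container_id_for_pid(pid: int) -> Optional[str]:
    """Look up the container ID for a host PID; None for non-container processes."""
    with open(f"/proc/{pid}/cgroup") as fh:
        return container_id_from_cgroup(fh.read())
```

The regex is deliberately loose so it matches both the cgroupfs layout (/docker/<id>) and the systemd layout (docker-<id>.scope). Short-lived processes can exit before the lookup runs, so a real collector would cache results and tolerate a missing /proc entry.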
The biggest "gotcha" with eBPF is the learning curve. I spent about two days just getting the kernel headers to match the running kernel version inside our Alpine-based Docker images. Don't be like me: mount the host's /usr/src and /lib/modules directories (read-only) into your monitoring container.
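A small preflight check would have saved me most of those two days. This sketch looks for headers matching a given kernel release in two common locations; the /usr/src/linux-headers-<release> name is a Debian/Ubuntu convention and may differ on your host, and find_kernel_headers is a hypothetical helper, not part of BCC.

```python
import os
from typing import Optional


def find_kernel_headers(release: str, root: str = "/") -> Optional[str]:
    """Return the first existing header directory for this kernel release, or None."""
    candidates = (
        os.path.join(root, "lib/modules", release, "build"),
        os.path.join(root, "usr/src", f"linux-headers-{release}"),
    )
    for path in candidates:
        # isdir() follows symlinks, so a dangling 'build' link is treated as missing.
        if os.path.isdir(path):
            return path
    return None
```

Run `find_kernel_headers(os.uname().release)` inside the monitoring container at startup. A None result usually means the mounts are missing or the image carries headers for a different kernel than the one the host is running.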
Also, be careful with the volume of events. If you hook into every packet, you’ll burn CPU cycles. Always use BPF maps to aggregate data in-kernel before sending it to user-space. Only push the summary metrics up to your dashboard; don't try to log every single packet's latency unless you want to crash your observability backend.
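To make "only push the summary" concrete, here is a sketch that reduces a log2 histogram to a single percentile figure before it leaves the host. It assumes the histogram arrives as a dict of slot index to count, and that slot k covers values up to 2^k - 1, which matches how BCC's print_log2_hist labels its rows. The function name is my own.

```python
import math
from typing import Mapping


def percentile_upper_bound(buckets: Mapping[int, int], pct: float) -> int:
    """Upper bound of the log2 bucket that contains the pct-th percentile sample."""
    total = sum(buckets.values())
    if total == 0:
        raise ValueError("histogram is empty")
    # Rank of the sample we are looking for, counting from 1.
    rank = max(1, math.ceil(total * pct / 100))
    seen = 0
    for slot in sorted(buckets):
        seen += buckets[slot]
        if seen >= rank:
            return (1 << slot) - 1
    # Only reached if pct exceeds 100; fall back to the largest bucket.
    return (1 << max(buckets)) - 1
```

With BCC, a dict like this can typically be built from a histogram table with `{k.value: v.value for k, v in table.items()}`. The result is coarse (each bucket doubles in width), but a handful of integers per interval is far cheaper to ship than one event per packet.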
eBPF has completely changed how I approach network performance debugging. It’s no longer a guessing game of "is it the code or the network?" You can see the kernel's perspective in real-time.
Next time, I want to experiment with kretprobes to track the time spent in tcp_v4_do_rcv more accurately, as I suspect some of our jitter is coming from interrupt coalescing on the host NIC. It’s a rabbit hole, but for production systems, there’s no better way to get the truth.