DevOps · June 24, 2026 · 4 min read

eBPF Linux Networking: Debugging Docker Latency with tcptracer

eBPF and Linux networking tools like tcptracer are essential for debugging Docker latency. Learn how to pinpoint container network bottlenecks in real time.


Last month, we had a microservice reporting intermittent 300ms spikes in request handling. The container logs showed nothing, and our standard metrics—CPU and memory—looked perfectly healthy. We finally caught the culprit by dropping down to the host level using eBPF to trace the actual socket lifecycle.

When you're dealing with containerized applications, the abstraction layer of Docker networking often hides the messy reality of the Linux kernel. We first tried adding custom logging to the application code, but that just introduced more jitter and didn't show us what was happening in the TCP stack. That’s when we turned to tcptracer, a powerful tool in the BCC (BPF Compiler Collection) suite.

Why eBPF for Linux Networking Observability

Standard tools like netstat or ss provide a snapshot of connections, but they're blind to the short-lived connections that often cause latency spikes. If a connection is established and torn down within a few milliseconds, it will almost certainly fall between two snapshots and you'll never see it.

eBPF (extended Berkeley Packet Filter) changes the game by allowing us to hook into kernel functions triggered by network events. Because it executes within the kernel, it’s incredibly efficient. It doesn't require us to restart containers or inject sidecars, which is exactly what makes it superior to traditional packet capture methods when you're already fighting production performance issues.

If you’ve already explored "eBPF-based socket monitoring: Tracking latency in Docker containers", you know that socket-level data is where the truth lives. tcptracer takes that a step further by tracing the kernel TCP functions behind connect(), accept(), and close(), giving us a clear timeline of when connections are established and how long they persist.

Getting Started with tcptracer

tcptracer ships with BCC, which most modern Linux distributions package as bcc-tools. On Ubuntu 22.04 the package is named bpfcc-tools and each tool is installed with a -bpfcc suffix (tcptracer-bpfcc). You can install it via:


Bash
sudo apt-get install bpfcc-tools

Once installed, running it is straightforward. Note that -p takes a process ID, not a port, so I usually filter by the PID of my application's process to keep the noise down (1245 is a placeholder PID):


Bash
sudo tcptracer-bpfcc -p 1245

This will output every TCP connect, accept, and close event for that process. Here is a typical output format:


TEXT
T  PID    COMM             IP SADDR            DADDR            SPORT  DPORT
C  1245   my-app           4  172.17.0.3       10.0.0.5         48214  443
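
To turn that event stream into connection lifetimes, one option is to run tcptracer with its -t flag, which prepends a TIME(s) column, and pair each connect or accept event with the matching close. The sketch below assumes the column order TIME(s), T, PID, COMM, IP, SADDR, DADDR, SPORT, DPORT and a COMM value without spaces. conn_lifetimes is a hypothetical helper name, not part of BCC.

```bash
# conn_lifetimes: read `tcptracer -t` output on stdin and print
# "<comm> <saddr:sport->daddr:dport> <seconds open>" for each closed connection.
# Assumed columns: TIME(s) T PID COMM IP SADDR DADDR SPORT DPORT.
conn_lifetimes() {
  awk '
    # Skip the banner and header: data rows start with a numeric timestamp.
    $1 !~ /^[0-9]+(\.[0-9]+)?$/ { next }
    {
      key = $6 ":" $8 "->" $7 ":" $9
      if ($2 == "C" || $2 == "A") {
        opened[key] = $1              # connect or accept: remember the start time
      } else if ($2 == "X" && (key in opened)) {
        printf "%s %s %.3f\n", $4, key, $1 - opened[key]
        delete opened[key]
      }
    }
  '
}

# Example (needs root and bpfcc-tools; 1245 is a placeholder PID):
#   sudo tcptracer-bpfcc -t -p 1245 | conn_lifetimes
```

Connections that were already open when tracing started have no recorded start time and are skipped. Output may also arrive in bursts, since the tool's output is likely to be buffered when it writes to a pipe.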

Identifying Latency Spikes

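To find those elusive latency spikes, I look for the delta between connection initiation and handshake completion. tcptracer reports an outbound connection once it reaches ESTABLISHED, so run it with -t and compare its timestamps against the time your application issued the request. If you're seeing a gap, it often points to a congested NAT bridge or a resource-starved host kernel.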
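
tcptracer itself does not print a handshake duration. One way to get that number, assuming the rest of the BCC suite is installed, is the sibling tool tcpconnlat, which reports per-connection connect latency. The sketch below filters its output for slow handshakes. slow_connects is a hypothetical helper, and it assumes the latency in milliseconds is the last column of each row.

```bash
# slow_connects <threshold_ms>: pass through only the rows whose last column
# (assumed to be connect latency in ms) is at or above the threshold.
slow_connects() {
  awk -v min="$1" '
    # Data rows end with a number; the header row ends with a label, so it is dropped.
    $NF ~ /^[0-9]+(\.[0-9]+)?$/ && $NF + 0 >= min + 0 { print }
  '
}

# Example (needs root and bpfcc-tools; 1245 is a placeholder PID):
#   sudo tcpconnlat-bpfcc -p 1245 | slow_connects 100
```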

We once spent about two days chasing a performance issue that turned out to be the Docker bridge's conntrack table hitting its limit. While "Docker networking latency: Debugging with eBPF and tcpretrans" is a common starting point for packet loss, tcptracer helped us see the connection failures before the retransmits even started.
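
A quick way to rule conntrack exhaustion in or out is to compare the kernel's current entry count with its limit. The sketch below assumes the nf_conntrack module is loaded, so the two /proc files exist. conntrack_usage is a hypothetical helper, and its optional file arguments exist only so it can be pointed at test data.

```bash
# conntrack_usage [count_file] [max_file]: print how full the conntrack table is.
conntrack_usage() {
  ct_count_file=${1:-/proc/sys/net/netfilter/nf_conntrack_count}
  ct_max_file=${2:-/proc/sys/net/netfilter/nf_conntrack_max}
  ct_count=$(cat "$ct_count_file") || return 1
  ct_max=$(cat "$ct_max_file") || return 1
  # Integer percentage, rounded down.
  echo "conntrack: $ct_count/$ct_max ($(( ct_count * 100 / ct_max ))%)"
}

# Example, run on the Docker host:
#   conntrack_usage
```

When the table does fill up, the kernel also logs "nf_conntrack: table full, dropping packet", so dmesg is worth checking alongside this number.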

If you aren't sure where to start your investigation, consider these steps:

1. Isolate the PID: Use docker inspect to find the process ID on the host if you need to trace a specific container.
2. Monitor the handshake: Look for connections that stay in SYN_SENT for more than a few milliseconds.
3. Correlate with host load: Compare the timestamps of the latency spikes with iostat or vmstat to see if the host is swapping or hitting disk I/O limits.
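
The steps above can be sketched as a few shell helpers. This is a rough outline under some assumptions: "web" is a placeholder container name, and container_pid and netns_from_link are hypothetical helper names. Note that docker inspect returns the PID of the container's main process, so worker processes with other PIDs will not match tcptracer's -p filter. Filtering by network namespace with -N is one way around that.

```bash
# Host PID of the container's main process.
container_pid() {
  docker inspect --format '{{.State.Pid}}' "$1"
}

# Extract the namespace inode from a link target such as "net:[4026532256]".
netns_from_link() {
  printf '%s' "$1" | tr -cd '0-9'
}

# Example (needs root, Docker, and bpfcc-tools):
#   pid=$(container_pid web)
#   link=$(sudo readlink "/proc/$pid/ns/net")
#   sudo tcptracer-bpfcc -t -N "$(netns_from_link "$link")"
#
# In a second terminal, list sockets still waiting on a SYN-ACK:
#   ss -tn state syn-sent
#
# And watch host load while the spikes happen:
#   vmstat 1
#   iostat -x 1
```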

Common Pitfalls

Don't assume eBPF is a magic wand. If your kernel version is too old (anything pre-4.14 is usually a struggle), you'll run into issues with missing helper functions. We also found that running too many BPF programs simultaneously can increase CPU overhead, though it's usually negligible compared to the overhead of heavy logging frameworks.
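
Before installing anything on a host, it can help to check the kernel version up front. The sketch below compares the major and minor version from uname -r against a minimum. kernel_at_least is a hypothetical helper, and the optional third argument exists only so a release string can be supplied for testing.

```bash
# kernel_at_least <major> <minor> [release]: succeed if the kernel release
# (default: output of uname -r) is at least <major>.<minor>.
kernel_at_least() {
  want_major=$1
  want_minor=$2
  release=${3:-$(uname -r)}
  major=${release%%.*}          # text before the first dot
  rest=${release#*.}            # text after the first dot
  minor=${rest%%[!0-9]*}        # leading digits of the remainder
  [ "$major" -gt "$want_major" ] ||
    { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Example:
#   kernel_at_least 4 15 || echo "kernel too old for a smooth BCC experience"
```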

Also, remember that tcptracer shows you the kernel perspective. If your application code has a bug that causes it to wait on a lock before it even initiates the connect() call, tcptracer won't show that as network latency. It will look like a "fast" connection that just happens to start late. For those cases, you’d need to look into "eBPF-based network traffic inspection for Docker containers" to correlate application-level events with network events.

Frequently Asked Questions

Does running tcptracer impact production performance? The impact is usually minimal. eBPF programs are checked by the kernel verifier before they load, and tcptracer only instruments connection setup and teardown rather than every packet, so its cost scales with connection rate, not traffic volume.

Can I use this on non-Docker systems? Absolutely. tcptracer works on any Linux host, regardless of whether you're using containers, VMs, or bare metal.

What kernel version do I need? I recommend Linux 4.15 or newer. Older kernels lack some of the features that make modern eBPF tools stable and performant.

Final Thoughts

The best way to learn this is to run it in a staging environment and simulate some load. I’m still experimenting with using bpftrace to create custom one-liners that combine tcptracer data with kprobes for even deeper visibility. It’s not always the cleanest debugging process, but when you’re staring at a 300ms latency spike that no one else can explain, having host-level visibility makes all the difference. Next time, I think I'll write a script to automate the correlation between these socket events and the container logs, because manual correlation is getting old.
