DevOps · June 22, 2026 · 4 min read

Linux performance: Resolving Docker I/O Bottlenecks with eBPF

Linux performance drops when Docker I/O bottlenecks hit your disk. Learn to use iostat and eBPF to pinpoint storage latency and fix your container lag.

Tags: Linux, Docker, eBPF, Performance, Troubleshooting, Storage, DevOps, CI/CD

Last month, our primary database container started hanging intermittently during peak traffic. The application logs showed timeouts, but the CPU usage looked perfectly healthy, leaving me to hunt for the classic "invisible" culprit: storage latency.

When you're running production workloads in Docker, it's easy to assume the application code is the problem. But I’ve learned the hard way that disk I/O wait is often the silent killer of container responsiveness. If your host is struggling with disk contention, every container sharing that disk will feel sluggish, even one whose own workload is light.

Getting Started with iostat

The first tool I always reach for is iostat. It’s part of the sysstat package and provides the baseline I need to see if the physical disk is actually screaming for mercy.

Run this command to get a real-time view of your block devices:


Bash
iostat -xz 1

I start with the %util column. If it’s hovering near 100% on a spinning disk, you’ve likely found your bottleneck. On SSDs and NVMe drives, however, %util can be misleading: those devices service many commands in parallel, so a device that looks 100% busy may still have headroom. A better metric to watch is await, the average time in milliseconds that I/O requests take to be serviced, queue time included. Recent sysstat versions report it separately as r_await (reads) and w_await (writes). If it is consistently above 10-20ms, your users are definitely feeling the lag.
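To turn that rule of thumb into something you can pipe iostat through, here is a minimal sketch. The flag_slow_devices name and the 20 ms default are placeholders, not anything standard, and the parsing assumes the header row starts with "Device" and labels its latency columns await, r_await, or w_await, which varies between sysstat versions.

```shell
# flag_slow_devices [LIMIT_MS]: read `iostat -x` text on stdin and print
# every device whose await-style column exceeds LIMIT_MS (default 20).
flag_slow_devices() {
  awk -v limit="${1:-20}" '
    # A blank line ends the device table, so forget the column layout.
    NF == 0 { split("", cols); next }
    # Header row: remember which columns hold latency figures.
    $1 == "Device" || $1 == "Device:" {
      split("", cols)
      for (i = 2; i <= NF; i++)
        if ($i == "await" || $i == "r_await" || $i == "w_await") cols[i] = $i
      next
    }
    # Data row: report each latency column that is above the limit.
    {
      for (i = 2; i <= NF; i++)
        if ((i in cols) && $i + 0 > limit + 0)
          printf "%s %s=%s\n", $1, cols[i], $i
    }
  '
}

# Usage (five one-second samples):
#   iostat -xz 1 5 | flag_slow_devices 20
```

Keep in mind that the first report iostat prints is an average since boot, so a spike only shows up in the later samples.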

Digging Deeper with eBPF

While iostat gives you the host-wide view, it’s blind to which specific container is generating the heavy load. This is where Linux performance tuning reaches a new level of precision. I use biolatency from the BCC (BPF Compiler Collection) tools to see the full distribution of block I/O latency, then biosnoop to tie individual requests to specific processes.

eBPF allows us to trace kernel functions without modifying the application code or restarting the container. It’s incredibly efficient for production debugging. When you run biolatency, it creates a histogram of disk I/O latency:


Bash
# Trace block I/O latency for one 10-second interval, with a separate histogram per disk (-D)
sudo /usr/share/bcc/tools/biolatency -D 10 1

The output shows a distribution of latency buckets. If you see a long tail, with requests taking 100ms or more, you know you have a contention issue. To map this back to Docker, use biosnoop, which prints the PID and command name behind each I/O request. Since Docker containers share the host kernel, the PID you see in biosnoop is the PID on the host. You can map that back to a container with docker inspect (which reports each container's main PID), with docker top, or by reading /proc/<PID>/cgroup, whose paths include the container ID.
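The PID-to-container step can be scripted. This is a sketch under assumptions: the function names are hypothetical, and it relies on Docker putting the full 64-character container ID into the cgroup path, which is the case for the layouts I have seen (/docker/<id> and docker-<id>.scope) but is worth checking on your host.

```shell
# container_id_from_cgroup: read the contents of a /proc/<pid>/cgroup file
# on stdin and print the first 64-character hex string found in it.
container_id_from_cgroup() {
  grep -oE '[0-9a-f]{64}' | head -n 1
}

# pid_to_container PID: print the Docker container name that owns a host PID.
pid_to_container() {
  cid=$(container_id_from_cgroup < "/proc/$1/cgroup") || return 1
  [ -n "$cid" ] || { echo "PID $1 is not in a Docker container" >&2; return 1; }
  docker inspect --format '{{.Name}}' "$cid"
}

# Usage, with a PID taken from biosnoop output:
#   pid_to_container 12345
```

If the function reports that the PID is not in a container, the I/O came from a host process or a kernel thread rather than from Docker.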

Why Your Docker I/O Bottlenecks Persist

We once spent about two days chasing "slow database writes" that turned out to be a noisy neighbor container performing backups on the same physical disk. We first tried increasing the database memory buffer, but that didn't stop the storage latency spikes because the underlying block device was simply saturated.

When you're diagnosing Docker I/O issues, keep these three things in mind:

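The Storage Driver: overlay2 is the default, but its copy-on-write layer can be slow for write-heavy workloads. If you're doing heavy logging or database writes, move that data to a dedicated volume or a bind mount, which bypasses the storage driver and writes straight to the host filesystem (ext4 or xfs, for example).
I/O Weight: You can deprioritize a container with --blkio-weight. The weight is a relative share between 10 and 1000 that only matters when containers compete for the disk, and it depends on the host's I/O scheduler honoring it. If one container is hogging the disk, giving it a lower weight can keep it from starving your critical services. For a hard ceiling, use --device-read-bps or --device-write-bps instead.
Log Rotation: Never underestimate how much I/O a runaway log file can generate. If your stderr is redirected to a file, ensure logrotate is active and aggressive. For Docker's default json-file log driver, cap growth with the max-size and max-file log options.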
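As an illustration of the I/O weight and log rotation points, here is how a noisy backup container might be started. Everything specific is hypothetical: the backup-job name, the backup-image:latest image, the /dev/sda device, and the numbers, which you would tune for your own disk. The flags themselves are standard docker run options, but --blkio-weight only has an effect when the host's cgroup and I/O scheduler setup supports weighted block I/O.

```shell
# Sketch only: backup-job, backup-image:latest and /dev/sda are hypothetical.
# Set DOCKER=echo to print the command instead of running it.
run_backup() {
  set -- --name backup-job
  set -- "$@" --blkio-weight 100                # relative share under contention (10-1000)
  set -- "$@" --device-write-bps /dev/sda:20mb  # hard ceiling on writes to this device
  set -- "$@" --log-opt max-size=10m            # rotate the container log at 10 MB
  set -- "$@" --log-opt max-file=3              # keep at most three log files
  ${DOCKER:-docker} run -d "$@" backup-image:latest
}

# Dry run:
#   DOCKER=echo run_backup
```

The weight only shifts priority while the disk is contended, so the write ceiling is what protects the database when the backup is the only thing saturating the device.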

Beyond Storage: The Larger Performance Picture

It’s worth noting that storage is just one piece of the puzzle. When you're optimizing your stack, remember that networking often has its own set of ghosts. If you've fixed your disk latency but the app is still slow, you might want to look into Docker networking latency: Debugging with eBPF and tcpretrans to ensure you aren't dealing with silent packet loss.

Similarly, if processes stall while the disk metrics look clean, check your entropy levels. I’ve written before about Linux performance: Managing Entropy Issues in Docker Containers because, particularly on older kernels, crypto operations can block your processes while they wait for random numbers.

Summary

Debugging storage latency requires a shift from "guessing" to "tracing." Start with iostat to verify the problem at the host level, then jump into eBPF tools like biolatency and biosnoop to find the specific process responsible.

I’m still experimenting with using bcc tools in automated performance dashboards to trigger alerts before users notice the slowdown. It’s a lot of work to set up, but having the data ready before the on-call alert fires is worth every hour spent in the terminal.
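One possible building block for that kind of alerting is a filter that reduces a biolatency histogram to a single number. This is a rough sketch, not a finished integration: tail_io_count is a made-up name, and it assumes the default biolatency output, where each bucket row reads "low -> high : count" in microseconds.

```shell
# tail_io_count [MIN_USECS]: read biolatency histogram text on stdin and print
# how many I/Os fell into buckets that start at or above MIN_USECS
# (default 100000, i.e. 100 ms).
tail_io_count() {
  awk -v min="${1:-100000}" '
    # Bucket rows look like: "131072 -> 262143 : 3 |***  |"
    $2 == "->" && $4 == ":" && $1 + 0 >= min + 0 { total += $5 }
    END { print total + 0 }
  '
}

# Usage (one 10-second sample):
#   sudo /usr/share/bcc/tools/biolatency 10 1 | tail_io_count 100000
```

Because the buckets are powers of two, a 100 ms threshold effectively counts from the 131072-microsecond bucket upward; the bucket that straddles the threshold is left out.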

Frequently Asked Questions

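Q: Does eBPF add significant overhead to my production container?
A: Generally, no. eBPF programs are verified, JIT-compiled, and run inside the kernel. The overhead is negligible for most workloads, usually less than 1-2% CPU, but it scales with the number of events traced: biolatency aggregates in the kernel and stays cheap, while biosnoop prints a line per I/O and costs more on a busy disk.

Q: Can I use biolatency on any Linux kernel?
A: BCC needs kernel 4.1 or newer, though 5.x is recommended for better visibility. Most modern distributions (Ubuntu 20.04+, Debian 11+) ship a suitable kernel, but you still have to install the tools. On Debian and Ubuntu the bpfcc-tools package installs them with a -bpfcc suffix (biolatency-bpfcc) rather than under /usr/share/bcc/tools.

Q: Is there an easy way to see which container is doing the most I/O?
A: Use docker stats for a quick overview (the BLOCK I/O column). For a more detailed, per-process view, use iotop or the BCC biotop tool.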
