Linux performance drops when Docker I/O bottlenecks hit your disk. Learn to use iostat and eBPF to pinpoint storage latency and fix your container lag.
Last month, our primary database container started hanging intermittently during peak traffic. The application logs showed timeouts, but the CPU usage looked perfectly healthy, leaving me to hunt for the classic "invisible" culprit: storage latency.
When you're running production workloads in Docker, it's easy to assume the application code is the problem. But I’ve learned the hard way that disk I/O wait is often the silent killer of container responsiveness. If your host is struggling with disk contention, every container that touches that disk will feel sluggish, even when its own CPU usage is close to idle.
The first tool I always reach for is iostat. It’s part of the sysstat package and provides the baseline I need to see if the physical disk is actually screaming for mercy.
Run this command to get a real-time view of your block devices:
```bash
iostat -xz 1
```
I look first at the %util column. If it’s hovering near 100% on a spinning disk, you’ve found your bottleneck. On SSDs and NVMe drives, however, %util can be misleading: those devices service many commands in parallel, so a device can report 100% busy while still having headroom. A better metric to watch is await, the average time in milliseconds for an I/O request to be serviced, queueing included (recent sysstat versions split it into r_await for reads and w_await for writes). If your await is consistently above 10-20ms, your users are definitely feeling the lag.
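If you'd rather not eyeball the columns, the check can be scripted. The following is a minimal sketch, not a polished tool: the function name flag_slow_devices and the 20ms default are made up for illustration, and it assumes a sysstat version whose iostat -x header contains r_await and w_await columns.

```shell
# Reads `iostat -x` output on stdin and prints every device whose
# r_await or w_await exceeds a threshold in milliseconds (default: 20).
# Hypothetical helper; assumes r_await / w_await columns are present.
flag_slow_devices() {
  local threshold_ms="${1:-20}"
  awk -v limit="$threshold_ms" '
    # Each report starts with an avg-cpu block: forget the column
    # positions so the CPU numbers are never mistaken for a device row.
    /^avg-cpu/ { r = 0; w = 0; next }
    # Header row: locate the await columns by name, since their
    # position differs between sysstat versions.
    $1 ~ /^Device/ {
      r = 0; w = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "r_await") r = i
        if ($i == "w_await") w = i
      }
      next
    }
    # Device rows: print the ones over the limit.
    r && w && NF >= r && NF >= w && ($r + 0 > limit || $w + 0 > limit) {
      printf "%s r_await=%s w_await=%s\n", $1, $r, $w
    }'
}

# Usage (requires sysstat on the host):
#   iostat -xz 1 | flag_slow_devices 20
```

Keep in mind that the first report iostat prints is an average since boot, so a device flagged only in that first report is not necessarily slow right now.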
While iostat gives you the host-wide view, it can't tell you which container is generating the heavy load. This is where eBPF comes in. I use biolatency from the BCC (BPF Compiler Collection) tools to see how block I/O latency is distributed, and biosnoop to tie individual requests to specific processes.
eBPF allows us to trace kernel functions without modifying the application code or restarting the container. It’s incredibly efficient for production debugging. When you run biolatency, it creates a histogram of disk I/O latency:
```bash
# Trace block I/O latency per disk (-D): one 10-second interval, then exit
sudo /usr/share/bcc/tools/biolatency -D 10 1
```
The output will show you a distribution of latency buckets. If you see a long tail, with requests taking 100ms or more, you know you have a contention issue. To map this back to Docker, use biosnoop, which shows you exactly which PID is issuing each I/O request. Since Docker containers share the host kernel, the PID you see in biosnoop is the PID on the host, not the PID inside the container. You can map it back to a container by comparing it with the container PIDs that docker inspect reports, or by looking the process up with ps -ef.
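Another way to do that mapping is to read the process's cgroup path, which on typical Docker hosts embeds the 64-character container ID. This is a sketch under that assumption; the function names container_id_from_cgroup and container_for_pid are invented for illustration, and the exact cgroup layout depends on your cgroup version and driver.

```shell
# Extracts the first 64-hex-character container ID from cgroup data on stdin.
# Prints nothing when the process is not in a container cgroup.
container_id_from_cgroup() {
  grep -oE '[0-9a-f]{64}' | head -n 1
}

# Resolves a host PID (as reported by biosnoop) to a container ID.
container_for_pid() {
  container_id_from_cgroup < "/proc/$1/cgroup"
}

# Usage on a Docker host, with 12345 standing in for a PID from biosnoop:
#   cid=$(container_for_pid 12345)
#   [ -n "$cid" ] && docker inspect --format '{{.Name}}' "$cid"
```

One caveat: buffered writes are often flushed by kernel threads (kworker, jbd2 and friends), so some PIDs in biosnoop will not resolve to any container even when a container caused the write.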
We once spent about two days chasing "slow database writes" that turned out to be a noisy neighbor container performing backups on the same physical disk. We first tried increasing the database memory buffer, but that didn't stop the storage latency spikes because the underlying block device was simply saturated.
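One way to contain a noisy neighbor like that, without restarting it, is to lower its block I/O weight on the fly with docker update. The wrapper below is only a sketch: set_blkio_weight and the DOCKER dry-run variable are made up for illustration, and backup-job is a placeholder container name. As far as I know, the weight only has an effect when the host uses a proportional-weight I/O scheduler such as BFQ (or CFQ on older kernels), so treat it as something to test rather than a guaranteed fix.

```shell
# Sets the relative block I/O weight of a running container.
# Docker accepts weights from 10 to 1000; lower means less disk priority.
# DOCKER is a hypothetical override so the command can be dry-run.
set_blkio_weight() {
  local container="$1" weight="$2"
  case "$weight" in
    ''|*[!0-9]*)
      echo "weight must be an integer" >&2
      return 2
      ;;
  esac
  if [ "$weight" -lt 10 ] || [ "$weight" -gt 1000 ]; then
    echo "weight must be between 10 and 1000" >&2
    return 2
  fi
  ${DOCKER:-docker} update --blkio-weight "$weight" "$container"
}

# Usage:
#   set_blkio_weight backup-job 100              # deprioritize the backup container
#   DOCKER=echo set_blkio_weight backup-job 100  # dry run: print the arguments only
```

The same value can be set at start time with docker run --blkio-weight, which is the better home for it once you know which containers should yield.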
When you're diagnosing Docker I/O issues, keep these three things in mind:
- [...] ext4 or xfs.
- [...] --blkio-weight. If one container is hogging the disk, setting a lower weight for that container can prevent it from starving your critical services.
- [...] stderr is redirected to a file, ensure logrotate is active and aggressive.

It’s worth noting that storage is just one piece of the puzzle. When you're optimizing your stack, remember that networking often has its own set of ghosts. If you've fixed your disk latency but the app is still slow, you might want to look into Docker networking latency: Debugging with eBPF and tcpretrans to ensure you aren't dealing with silent packet loss.
Similarly, if you're seeing high system CPU usage alongside I/O wait, check your entropy levels. I’ve written before about Linux performance: Managing Entropy Issues in Docker Containers because crypto operations can lock up your processes while they wait for random numbers.
Debugging storage latency requires a shift from "guessing" to "tracing." Start with iostat to verify the problem at the host level, then jump into eBPF tools: biolatency to see how bad the latency is, and biosnoop to find the specific process responsible.
I’m still experimenting with using bcc tools in automated performance dashboards to trigger alerts before users notice the slowdown. It’s a lot of work to set up, but having the data ready before the on-call alert fires is worth every hour spent in the terminal.
Q: Does eBPF add significant overhead to my production container?
A: Generally, no. eBPF programs are JIT-compiled and run inside the kernel. The overhead is negligible for most workloads, usually less than 1-2% CPU.
Q: Can I use biolatency on any Linux kernel?
A: You’ll need a kernel version 4.x or higher, though 5.x is recommended for better visibility. Most modern distributions (Ubuntu 20.04+, Debian 11+) ship kernels that support this out of the box, so you only need to install the BCC tools package.
Q: Is there an easy way to see which container is doing the most I/O?
A: Use docker stats for a quick overview. For a more detailed, per-process view, use iotop or the BCC biotop tool.
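If you want that quick overview sorted by disk traffic, here is a rough sketch. It assumes docker stats prints the BLOCK I/O column as "READ / WRITE" with decimal suffixes (B, kB, MB, GB, TB); the function name rank_block_io is made up for illustration.

```shell
# Reads "name<TAB>READ / WRITE" lines on stdin and prints
# "total_bytes name" for each container, heaviest first.
rank_block_io() {
  awk -F'\t' '
    # Converts a value such as "12.3MB" to bytes (decimal units assumed).
    function to_bytes(s,    n, u) {
      n = s + 0                      # numeric prefix, e.g. 12.3
      u = s; gsub(/[0-9.]/, "", u)   # unit suffix, e.g. MB
      if (u == "kB") return n * 1e3
      if (u == "MB") return n * 1e6
      if (u == "GB") return n * 1e9
      if (u == "TB") return n * 1e12
      return n                       # plain bytes ("B") or unknown unit
    }
    {
      split($2, io, " / ")
      printf "%.0f %s\n", to_bytes(io[1]) + to_bytes(io[2]), $1
    }' | sort -rn
}

# Usage on a Docker host:
#   docker stats --no-stream --format '{{.Name}}\t{{.BlockIO}}' | rank_block_io
```

These numbers are cumulative totals rather than a current rate, so a long-running container can top the list without being the one causing trouble right now; biotop remains the better tool for live activity.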