Linux performance issues often hide in storage. Learn to debug Docker storage latency using iotop and blktrace to isolate I/O stalls before they crash your app.
When my database container started hanging for about 400ms under moderate load, I assumed it was a standard resource constraint. It turned out to be a classic case of I/O wait propagation, where a single noisy neighbor saturated the host's block device queue.
We’ve all been there: the application metrics show a spike, CPU usage is low, but the service is unresponsive. If you’re running production workloads, you’ve likely hit a Docker I/O bottleneck at some point (see "Linux performance: Resolving Docker I/O Bottlenecks with eBPF"). While eBPF is powerful, sometimes you need to start with the basics to verify whether the storage subsystem is what is throttling you.
iotop
Before jumping into complex kernel tracing, I always start with iotop. It’s the easiest way to see which process is actually hammering the disk. Container workloads show up as ordinary host processes, so if a process belonging to one of your containers sits at a high IO percentage, you’ve likely found your culprit.
Run it with the -o (only) flag to filter out idle processes and -P to show processes rather than individual threads:
    sudo iotop -o -P
I once wasted two hours debugging a "slow" disk, only to realize a backup script running in a sidecar container was saturating the write throughput. If iotop doesn't show a clear winner, don't panic. Sometimes the latency isn't coming from a single process, but from the way the host manages Docker storage requests across multiple mounts.
blktrace
If iotop gives you a clean bill of health but your application still reports high I/O latency, you need to look at the block layer. This is where blktrace comes in. It traces each request through the block layer, from the moment it is queued until the device completes it.
First, identify your block device:
    lsblk
Once you know your device (e.g., /dev/nvme0n1), start the trace:
    sudo blktrace -d /dev/nvme0n1 -o - | blkparse -i -
This dumps a massive amount of data, so I usually redirect the output to a file and look for "D" (issue) and "C" (complete) events. The time between a request's "D" event and its matching "C" event is how long the device took to service it. If that gap grows under load, you aren't looking at an application problem anymore: you're looking at a saturated device queue or a hardware bottleneck.
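The D-to-C pairing can be sketched in a few lines of awk. This is a minimal sketch, not a replacement for a proper analysis tool. The function name d2c_latency is made up for illustration, it assumes blkparse's default text layout (device, CPU, sequence, timestamp, PID, action, RWBS, start sector), and it matches a completion to an issue purely by device and start sector, which can mispair requests that are split or requeued.

```shell
# d2c_latency (hypothetical helper): reads default-format blkparse text on
# stdin and prints "sector latency_ms" for every D event that has a later
# C event with the same device and start sector, then a one-line summary.
d2c_latency() {
  awk '
    # Field 1 is the device (major,minor), field 4 the timestamp in
    # seconds, field 6 the action, field 8 the start sector.
    $6 == "D" { issued[$1 ":" $8] = $4 }
    $6 == "C" {
      key = $1 ":" $8
      if (key in issued) {
        ms = ($4 - issued[key]) * 1000
        printf "%s %.3f\n", $8, ms
        if (ms > max) max = ms
        sum += ms
        n++
        delete issued[key]
      }
    }
    END {
      if (n) printf "requests=%d avg_ms=%.3f max_ms=%.3f\n", n, sum / n, max
    }
  '
}
```

Assuming the trace was saved as text, for example with `sudo blktrace -d /dev/nvme0n1 -w 30 -o - | blkparse -i - > trace.txt`, running `d2c_latency < trace.txt | tail -n 1` prints only the summary line. The blktrace package also ships btt, which reports D2C and related timings and is the better choice for anything beyond a quick look.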
We once tried to fix a storage stall by simply increasing the disk throughput limits in the cloud provider’s dashboard. It didn't work. The issue wasn't bandwidth; it was the IOPS limit on the volume.
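A quick back-of-the-envelope calculation shows why raising a throughput limit cannot help in a case like this. For small requests, the bandwidth a volume can actually reach is roughly its IOPS limit multiplied by the request size. The numbers below are hypothetical, not taken from the incident, and iops_ceiling_mib is an illustrative helper name.

```shell
# iops_ceiling_mib IOPS IO_SIZE_KIB
# Prints the approximate throughput ceiling, in MiB/s, that an IOPS limit
# imposes at a given request size: IOPS * size_KiB / 1024.
iops_ceiling_mib() {
  awk -v iops="$1" -v kib="$2" 'BEGIN { printf "%.2f\n", iops * kib / 1024 }'
}

# Hypothetical volume: 3000 IOPS limit, workload issuing 4 KiB writes.
iops_ceiling_mib 3000 4   # prints 11.72
```

At under 12 MiB/s, a workload like this runs into the IOPS limit long before it gets anywhere near a throughput cap many times higher, so raising that cap changes nothing.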
Before you make major infrastructure changes, check your block devices for excessive wait times. If you haven't yet, you might want to read "Docker I/O throttling: Control container performance with Cgroup v2" to set hard limits on problematic containers. It’s often better to throttle a noisy container than to let it choke the entire host.
Q: Is iotop enough to diagnose all storage issues?
A: Not really. iotop shows you which process is requesting I/O, but it can't show you if the kernel or the storage controller is holding that I/O back. That's why you need blktrace for deep system debugging.
Q: Does blktrace add significant overhead to my host?
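A: It can, especially on a busy device, since every request generates several trace events. Use it only when you have a clear hypothesis, keep captures short (the -w option stops the trace after a given number of seconds), and don't run it in production for extended periods unless you can accept the performance impact.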
Q: My latency spikes only happen during backups. What should I do?
A: This is a classic Linux performance bottleneck. Use cgroups to limit the I/O weight of your backup process so it yields to your primary application during peak hours.
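On a cgroup v2 host, lowering a cgroup's I/O weight comes down to writing to its io.weight file. The following is a minimal sketch with several assumptions: set_io_weight is an illustrative helper, the cgroup directory for the backup container has to be looked up on your host, the io controller must be enabled for that cgroup, and proportional weights only take effect when the kernel is using a mechanism that honors them, such as the BFQ scheduler.

```shell
# set_io_weight CGROUP_DIR WEIGHT
# Writes a default I/O weight to CGROUP_DIR/io.weight.
# cgroup v2 accepts weights from 1 to 10000; the default is 100.
set_io_weight() {
  dir=$1
  weight=$2
  case $weight in
    ''|*[!0-9]*) echo "weight must be an integer" >&2; return 1 ;;
  esac
  if [ "$weight" -lt 1 ] || [ "$weight" -gt 10000 ]; then
    echo "weight must be between 1 and 10000" >&2
    return 1
  fi
  # "default N" sets the weight for every device without its own entry.
  printf 'default %s\n' "$weight" > "$dir/io.weight"
}
```

Run as root, a call such as `set_io_weight /sys/fs/cgroup/system.slice/docker-<container-id>.scope 50` would give the backup container half the default weight. That path is typical for the systemd cgroup driver and may differ on your host. Docker exposes a similar knob through `docker run --blkio-weight`, which accepts values from 10 to 1000.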
Debugging storage is rarely a straight line. I’ve found that most latency issues in Docker are caused by misconfigured I/O schedulers or simply exceeding the IOPS capacity of the underlying disk.
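Checking which scheduler each device uses takes one read from sysfs: the kernel lists the available schedulers in /sys/block/<device>/queue/scheduler and marks the active one with square brackets. The helper below is a small sketch. The name active_schedulers and its optional root argument are illustrative, and the argument exists only so the function can be pointed at a test directory.

```shell
# active_schedulers [SYSFS_BLOCK_ROOT]
# Prints "device scheduler" for each block device, taking the active
# scheduler to be the bracketed entry in queue/scheduler.
active_schedulers() {
  root=${1:-/sys/block}
  for f in "$root"/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#"$root"/}
    dev=${dev%%/*}
    active=$(sed -n 's/.*\[\(.*\)\].*/\1/p' "$f")
    printf '%s %s\n' "$dev" "${active:-unknown}"
  done
}
```

Calling `active_schedulers` with no argument reads the real /sys/block and gives you one line per device to compare against what you expected to be configured.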
Next time you see a stall, don't guess. Capture the data with blktrace, correlate it with your application logs, and look for those gaps between issue and completion. I’m still experimenting with newer io_uring monitoring tools to see if they provide better visibility than the legacy block-layer tools, but for now, the basics usually get the job done.