DevOps | June 24, 2026 | 3 min read

Linux Performance: Debugging Docker Storage Latency with iotop and blktrace

Linux performance issues often hide in storage. Learn to debug Docker storage latency using iotop and blktrace to isolate I/O stalls before they crash your app.

Tags: Linux, Docker, Storage, Performance, Debugging, DevOps, CI/CD

When my database container started hanging for about 400ms under moderate load, I assumed it was a standard resource constraint. It turned out to be a classic case of I/O wait propagation, where a single noisy neighbor saturated the host's block device queue.

We’ve all been there: the application metrics show a spike, CPU usage is low, but the service is unresponsive. If you run production workloads, you have probably hit this at some point, and you may have read "Linux performance: Resolving Docker I/O Bottlenecks with eBPF". eBPF is powerful, but sometimes it pays to start with the basics and confirm that the storage subsystem is actually what is throttling you.

The First Line of Defense: `iotop`

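Before jumping into complex kernel tracing, I always start with iotop. It’s the easiest way to see which process is actually hammering the disk. Container processes show up in the host's process list under their own names, so look for a high IO> percentage (the share of time a process spends waiting on I/O) on a specific application process, or on dockerd or containerd if logging or image activity is the problem. That process is your prime suspect.

Run it with the -o (only) flag to hide idle processes and -P to show whole processes instead of individual threads: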


Bash
sudo iotop -o -P
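
An interactive view is easy to miss when the stall comes and goes. Here is a minimal sketch of logging iotop in batch mode instead. The helper name `log_iotop`, the default sample count and interval, and the log path are my own placeholders, not part of iotop.

```shell
# Hypothetical helper: sample iotop non-interactively so a transient stall
# leaves a record behind. Defaults (30 samples, 2 s apart) are illustrative.
log_iotop() {
  local samples="${1:-30}" interval="${2:-2}" out="${3:-/tmp/iotop.log}"
  # -b batch mode, -o only processes doing I/O, -P processes (not threads),
  # -t timestamp each line, -k kilobytes, -n iterations, -d delay in seconds
  sudo iotop -b -o -P -t -k -n "$samples" -d "$interval" >> "$out"
}

# Usage (about one minute of samples):
#   log_iotop 30 2 /tmp/iotop.log
```

Because every line is timestamped, you can line the log up against the application's latency spikes afterwards.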

I once wasted two hours debugging a "slow" disk, only to realize a backup script running in a sidecar container was saturating the write throughput. If iotop doesn't show a clear winner, don't panic. Sometimes the latency isn't coming from a single process, but from the way the host manages Docker storage requests across multiple mounts.

Digging Deeper with `blktrace`

If iotop gives you a clean bill of health but your application still reports high I/O latency, you need to look at the block layer. This is where blktrace comes in. It records what happens to each request inside the block layer: when it is queued, when the I/O scheduler issues it to the device driver, and when the device reports it complete.

First, identify your block device:


Bash
lsblk

Once you know your device (e.g., /dev/nvme0n1), start the trace:


Bash
sudo blktrace -d /dev/nvme0n1 -o - | blkparse -i -

This will dump a massive amount of data. I usually redirect the output to a file and look for "D" (issue) and "C" (complete) events. The time between the "D" and "C" events for the same request is how long the device took to service it. If that gap keeps growing under load, the problem is not in your application: you are looking at a queue depth issue or a hardware bottleneck.
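
Matching "D" and "C" events by eye gets old fast. The sketch below pairs them with awk, assuming blkparse's default text layout (device, CPU, sequence, timestamp, PID, action, RWBS, sector, "+", size). The function name `d2c_latency` and the 10 ms default threshold are my own choices. The blktrace package also ships btt, which is built for this analysis and is the better choice for anything beyond a quick look.

```shell
# Hypothetical helper: print requests whose issue-to-complete (D2C) time
# meets a threshold in milliseconds. Reads blkparse text output on stdin.
d2c_latency() {
  awk -v threshold_ms="${1:-10}" '
    # Remember when each request was issued, keyed by device and start sector.
    $6 == "D" && $9 == "+" { issued[$1 ":" $8] = $4 }
    # On completion, look up the matching issue event and compute the gap.
    $6 == "C" && $9 == "+" {
      key = $1 ":" $8
      if (key in issued) {
        ms = ($4 - issued[key]) * 1000
        if (ms >= threshold_ms)
          printf "%s sector=%s rwbs=%s d2c_ms=%.3f\n", $1, $8, $7, ms
        delete issued[key]
      }
    }
  '
}

# Usage (30-second capture, show requests that took 10 ms or longer):
#   sudo blktrace -d /dev/nvme0n1 -w 30 -o - | blkparse -i - | d2c_latency 10
```

Keying on device and start sector is a simplification: it is good enough to spot slow requests, but it is not a full accounting of every request in the trace.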

Why Initial Attempts Often Fail

We once tried to fix a storage stall by simply increasing the disk throughput limits in the cloud provider’s dashboard. It didn't work. The issue wasn't bandwidth; it was the IOPS limit on the volume.
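
The arithmetic behind that mistake is worth spelling out. This is a small sketch with made-up numbers: the 3000 IOPS limit and 8 KiB request size are assumptions for illustration, not figures from our incident. With small requests, the IOPS cap sets a throughput ceiling far below any bandwidth limit you can buy.

```shell
# Hypothetical helper: the throughput ceiling (KiB/s) an IOPS limit imposes
# for a given request size. Both inputs below are illustrative assumptions.
iops_ceiling_kib() {
  local iops="$1" io_kib="$2"
  echo $(( iops * io_kib ))
}

iops_ceiling_kib 3000 8   # prints 24000 (KiB/s, roughly 23 MiB/s)
```

If the workload tops out around 23 MiB/s because of the IOPS cap, raising a throughput limit that is already well above that changes nothing. Comparing the r/s and w/s columns of iostat -x against the volume's IOPS limit is a quick way to check which ceiling you are actually hitting.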

Before you make major infrastructure changes, check your block devices for excessive wait times; the r_await and w_await columns of iostat -x are a quick way to do that. If you haven't yet, you might want to read "Docker I/O throttling: Control container performance with Cgroup v2" to set hard limits on problematic containers. It’s often better to throttle a noisy container than to let it choke the entire host.
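
As a rough illustration of a hard limit, here is a sketch that uses Docker's per-device throttling flags at container start. The wrapper name `run_throttled`, the image name, and the 20mb / 500 IOPS values are placeholders; pick limits from what you measured, not from this example.

```shell
# Hypothetical wrapper: start a container with write bandwidth and write IOPS
# capped on one block device. Remaining arguments go to docker run unchanged.
run_throttled() {
  local dev="$1" bps="$2" iops="$3"
  shift 3
  docker run --rm \
    --device-write-bps "${dev}:${bps}" \
    --device-write-iops "${dev}:${iops}" \
    "$@"
}

# Usage (image name is a placeholder):
#   run_throttled /dev/nvme0n1 20mb 500 my-backup-image
```

These limits are set when the container is created, so treat this as something to bake into how the noisy job is launched. It is not a switch to flip in the middle of an incident.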

FAQ: Common Storage Latency Questions

Q: Is iotop enough to diagnose all storage issues?
A: Not really. iotop shows you which process is requesting I/O, but it can't show you whether the kernel or the storage controller is holding that I/O back. That's why you need blktrace for deeper debugging.

Q: Does blktrace add significant overhead to my host?
A: It can on a busy device, and the trace output grows quickly. Use it only when you have a clear hypothesis, keep each capture short (the -w option stops the trace after a given number of seconds), and don't leave it running in production unless you are prepared for a measurable performance impact.

Q: My latency spikes only happen during backups. What should I do?
A: This is a classic noisy-neighbor bottleneck. Use cgroups to lower the I/O weight of your backup process so it yields to your primary application during peak hours.
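
For the backup case, one way to lower the weight of an already running container is sketched below. The helper name `deprioritize_io`, the container name, and the weight of 100 are assumptions. Docker accepts weights from 10 to 1000. Whether the weight has any effect depends on the host: as far as I know it only matters under contention, and it generally needs an I/O scheduler that honors weights, such as BFQ.

```shell
# Hypothetical helper: lower the relative block I/O weight of a running
# container so it yields to others when the device is contended.
deprioritize_io() {
  local container="$1" weight="${2:-100}"
  docker update --blkio-weight "$weight" "$container"
}

# Usage (container name is a placeholder):
#   deprioritize_io nightly-backup 100
```

Unlike a hard cap, a weight lets the backup use the whole device when nothing else needs it.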

Final Thoughts

Debugging storage is rarely a straight line. I’ve found that most latency issues in Docker are caused by misconfigured I/O schedulers or simply exceeding the IOPS capacity of the underlying disk.
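
Checking the scheduler takes a few seconds. The kernel lists the available schedulers in sysfs and puts square brackets around the active one. A tiny sketch for pulling that name out follows; the function name `active_scheduler` is mine, and the device name in the usage line is an example.

```shell
# Hypothetical helper: extract the active I/O scheduler from the contents of
# /sys/block/<device>/queue/scheduler, e.g. "none [mq-deadline] kyber bfq".
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage:
#   active_scheduler < /sys/block/nvme0n1/queue/scheduler
```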

Next time you see a stall, don't guess. Capture the data with blktrace, correlate it with your application logs, and look for those gaps between issue and completion. I’m still experimenting with newer io_uring monitoring tools to see if they provide better visibility than the legacy block-layer tools, but for now, the basics usually get the job done.
