Master Docker I/O throttling using Cgroup v2. Learn how to prevent noisy containers from crashing your host disk performance with practical, hands-on steps.
I remember the first time a background log-processing container decided to swallow all available disk bandwidth on a production VPS. The host system became unresponsive, SSH sessions lagged for roughly 4 seconds per keystroke, and our primary web application went down because it couldn't write its own logs. It’s a classic "noisy neighbor" problem, and it’s surprisingly easy to fix once you stop treating Docker as a black box and start looking at the kernel primitives underneath.
If you’ve already spent time managing swap and OOM for Docker VPS, you know that resource contention is the silent killer of uptime. Today, we’re going to look at the next step: controlling disk I/O at the cgroup level.
In the past, we used blkio controllers in Cgroup v1, which were often clunky and relied on fixed throughput limits (like bps or iops). That approach is flawed because disk performance isn't static. A drive might handle 500 MB/s when idle, but crawl when under heavy random read/write pressure.
Cgroup v2 changed the game by introducing latency-based I/O control. Instead of guessing how much bandwidth a container needs, you set a latency target on the workload you want to protect. If the kernel sees that cgroup's average I/O latency exceed its target, it throttles peer cgroups that have a looser target (or none) until latency returns to an acceptable level.
To start, ensure your host is running a modern kernel (5.2+ is recommended for stable Cgroup v2 support) and Docker Engine 20.10 or newer, the first release that supports Cgroup v2. You can verify that you're using Cgroup v2 by checking the mount point:
```bash
mount | grep cgroup
# Should show: cgroup2 on /sys/fs/cgroup type cgroup2
```
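The same check can be scripted, which is handy in provisioning scripts. This is a minimal sketch, assuming GNU `stat` is available; `cgroup_mode` is a hypothetical helper name, not a Docker or kernel tool.

```bash
# Map a filesystem type name (as printed by `stat -fc %T`) to a cgroup mode.
# cgroup_mode is a hypothetical helper for illustration.
cgroup_mode() {
  case "$1" in
    cgroup2fs) echo "v2" ;;            # unified hierarchy
    tmpfs)     echo "v1-or-hybrid" ;;  # v1 mounts a tmpfs holding per-controller dirs
    *)         echo "unknown" ;;
  esac
}

# Usage on a real host:
#   cgroup_mode "$(stat -fc %T /sys/fs/cgroup)"
#   docker info --format '{{.CgroupVersion}}'   # Docker's own view: 1 or 2
```

If the two answers disagree, trust `docker info`, since that is the hierarchy Docker will actually write limits into.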
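When we first tried to limit I/O, we used the standard Docker --device-write-bps flag. It worked, but it was too rigid; it restricted the container even when the disk was idle. On Linux, the --device-* flags are still the way to set a ceiling (the similarly named --io-maxbandwidth and --io-maxiops flags only work on Windows containers), and on a Cgroup v2 host Docker writes them into the kernel's io.max file.
You can apply these limits directly to your docker run command. Let's say you have a data-heavy workload that you want to keep from impacting your host system, and that the disk is /dev/sda:
```bash
docker run -d \
  --name my-heavy-worker \
  --device-read-bps /dev/sda:50mb \
  --device-write-bps /dev/sda:50mb \
  --device-read-iops /dev/sda:1000 \
  --device-write-iops /dev/sda:1000 \
  my-app:latest
```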
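On a Cgroup v2 host, a ceiling like this lands in the container's `io.max` file, keyed by the disk's major:minor number. As a sketch of what that line looks like, the hypothetical helper `io_max_line` below builds one from a bandwidth in MiB/s and an IOPS figure. It assumes 8:0 is the disk and applies the same cap to reads and writes.

```bash
# Build an io.max line: "<maj:min> rbps=<bytes> wbps=<bytes> riops=<n> wiops=<n>"
# io_max_line is a hypothetical helper for illustration.
io_max_line() {
  local dev="$1" mib="$2" iops="$3"
  local bytes=$(( mib * 1024 * 1024 ))   # MiB/s to bytes/s
  printf '%s rbps=%d wbps=%d riops=%d wiops=%d\n' \
    "$dev" "$bytes" "$bytes" "$iops" "$iops"
}

io_max_line 8:0 50 1000
# prints: 8:0 rbps=52428800 wbps=52428800 riops=1000 wiops=1000
```

Reading `io.max` in the container's cgroup directory after the container starts is a quick way to confirm that a line of this shape is in place.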
This sets a hard ceiling. However, if you want to use the more sophisticated latency-based controls provided by the kernel, you need to interface with the cgroup files directly or use a tool that supports io.latency.
Inside the cgroup directory structure (usually /sys/fs/cgroup/system.slice/docker-<container-id>.scope/), you'll find io.latency. You can instruct the kernel to maintain a specific latency target:
```bash
# Set a 10ms latency target for the disk (major:minor 8:0); the value is in microseconds
echo "8:0 target=10000" > "/sys/fs/cgroup/system.slice/docker-<container-id>.scope/io.latency"
```
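If I/O for that container's cgroup averages longer than 10ms, the kernel starts throttling peer cgroups that have a looser latency target or none at all. That makes io.latency a protection knob: set it on the container you want to keep responsive (the web application, in my outage), and the noisy neighbors at the same level of the hierarchy absorb the throttling. This is essentially performance tuning at the hardware-request level.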
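Two details in that command usually need looking up: the full container ID and the disk's major:minor pair. Here is a minimal sketch, assuming the systemd cgroup driver (which produces the system.slice/docker-<container-id>.scope layout shown above). `docker_cgroup_dir` is a hypothetical helper and `my-container` is a placeholder name.

```bash
# Print the cgroup v2 directory for a full container ID (systemd driver layout).
# docker_cgroup_dir is a hypothetical helper for illustration.
docker_cgroup_dir() {
  printf '/sys/fs/cgroup/system.slice/docker-%s.scope\n' "$1"
}

# Usage on a real host (run as root):
#   cid=$(docker inspect --format '{{.Id}}' my-container)   # full 64-character ID
#   dev=$(lsblk -dn -o MAJ:MIN /dev/sda | tr -d ' ')        # e.g. "8:0"
#   grep -qw io /sys/fs/cgroup/system.slice/cgroup.subtree_control \
#     || echo "io controller is not enabled for system.slice"
#   echo "$dev target=10000" > "$(docker_cgroup_dir "$cid")/io.latency"
```

The `cgroup.subtree_control` check matters because the io.latency file only exists when the io controller is enabled for the parent slice.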
I’ve seen many developers try to solve disk contention by moving logs to a different mount or scaling up the VPS. While that helps, it doesn't fix the underlying issue of resource isolation. If a container has a bug—like a runaway recursive file search or an unoptimized database dump—it will find a way to saturate the disk.
If you are already debugging network latency using eBPF, you should treat disk I/O with the same scrutiny. Use iostat -xz 1 on the host to watch the %util column. If you see it hitting 100% while your application performance (like API response times) starts to spike, you have a contention issue.
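iostat tells you the disk is saturated, but not which container is responsible. On Cgroup v2, each container's cgroup exposes an `io.stat` file with cumulative per-device counters. The sketch below pulls the bytes written for one device out of `io.stat` text. `wbytes_for_dev` is a hypothetical helper, and the sample line follows the format in the kernel's cgroup v2 documentation.

```bash
# Extract the cumulative wbytes counter for one device from io.stat content on stdin.
# wbytes_for_dev is a hypothetical helper for illustration.
wbytes_for_dev() {
  awk -v dev="$1" '$1 == dev {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^wbytes=/) { sub(/^wbytes=/, "", $i); print $i }
  }'
}

printf '8:0 rbytes=1459200 wbytes=314773504 rios=192 wios=353 dbytes=0 dios=0\n' \
  | wbytes_for_dev 8:0
# prints: 314773504

# Usage on a real host: sample every container twice, a few seconds apart,
# and compare the deltas to find the heavy writer.
#   for f in /sys/fs/cgroup/system.slice/docker-*.scope/io.stat; do
#     echo "$f $(wbytes_for_dev 8:0 < "$f")"
#   done
```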
I should mention that these settings aren't a silver bullet. If you throttle a container too aggressively, you might introduce "I/O wait" bottlenecks that make the container appear to hang. Always monitor your container's exit codes and internal logs when applying these limits.
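One way to spot over-throttling before a container appears to hang is pressure stall information (PSI). Assuming your kernel has PSI enabled (it was added in 4.20, and some distributions need `psi=1` on the kernel command line), each cgroup has an `io.pressure` file. The sketch below extracts the `some avg10` figure from that file's content. `io_some_avg10` is a hypothetical helper and the sample values are made up.

```bash
# Print the "some avg10" value from io.pressure content on stdin: the share of
# the last 10 seconds in which at least one task in the cgroup was stalled on I/O.
# io_some_avg10 is a hypothetical helper for illustration.
io_some_avg10() {
  awk '$1 == "some" { sub(/^avg10=/, "", $2); print $2 }'
}

printf 'some avg10=12.50 avg60=3.10 avg300=0.80 total=123456\nfull avg10=4.00 avg60=1.00 avg300=0.20 total=65432\n' \
  | io_some_avg10
# prints: 12.50

# Usage on a real host:
#   io_some_avg10 < "/sys/fs/cgroup/system.slice/docker-<container-id>.scope/io.pressure"
```

A value that climbs right after you tighten a limit suggests the limit, rather than the disk, has become the bottleneck for that container.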
I’m still experimenting with how these cgroup v2 settings interact with various storage drivers like overlay2. Sometimes the translation between the container's virtual filesystem and the host's physical disk can mask the latency metrics the kernel is trying to track. If you’re running high-traffic databases inside Docker, you’re usually better off mounting a dedicated volume on its own disk and applying the limits to that block device rather than to the one holding the container root, because cgroup I/O limits are keyed by device (major:minor), not by mount point.
Ultimately, your goal shouldn't be to restrict performance, but to ensure that one rogue container can't bring down your entire production environment. Start with loose limits, monitor your disk wait times, and tighten the screws only when you have the metrics to justify it.