DevOps · June 22, 2026 · 4 min read

Linux performance: Managing Entropy Issues in Docker Containers

Linux performance can tank when Docker containers starve for entropy. Learn why your crypto operations are hanging and how to fix it with haveged.

Linux, Docker, Performance, Entropy, System Administration, DevOps, CI/CD

I spent three days last month chasing a "ghost" in our staging environment. Our Go-based microservices were taking nearly 10 seconds to start up, and once they were running, any request requiring TLS termination would hang for about 800ms before processing. Everything looked fine—CPU usage was low, memory was stable, and there were no obvious network issues like those I’ve documented when debugging Docker networking latency.

It wasn't a resource limit or a bad container image. It was entropy exhaustion.

Why Your Containers Are Throttled

In Linux, /dev/random has traditionally been a blocking device. The kernel collects environmental noise (hard drive seeks, interrupt timings, and other hardware events) into an entropy pool and draws on it whenever a process requests random bits for things like SSH handshakes, TLS connections, or session keys. If the pool runs dry, a read from /dev/random blocks until the kernel has harvested enough noise to generate more bits. Kernels from 5.6 onward only block until the random number generator has been seeded once after boot, so the stalls described here are a problem on older kernels.

On a physical server with lots of spinning disks and active users, this rarely happens. But we aren't running on bare metal anymore. We're running in virtualized environments, often on VPS providers where the "noise" is intentionally dampened to keep the hypervisor efficient.

When you run dozens of containers on a single host, they all pull from the same kernel entropy pool. If your app is doing heavy cryptographic work, you'll hit a wall. You aren't seeing high CPU usage because your processes aren't "working"—they are sitting in an interruptible sleep state, waiting for the kernel to give them a random number.
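One way to see the shared pool for yourself is to read the same counter on the host and from inside a container. The sketch below is only an illustration: read_entropy and ENTROPY_FILE are names made up for this example, and the commented usage assumes a running Docker daemon and the alpine image.

```shell
# Path to the kernel's entropy counter; override ENTROPY_FILE to point elsewhere.
ENTROPY_FILE="${ENTROPY_FILE:-/proc/sys/kernel/random/entropy_avail}"

# read_entropy [COMMAND...]
# Reads the counter with cat, optionally prefixed by COMMAND so the same
# read can be run somewhere else (for example inside a container).
read_entropy() {
  "$@" cat "$ENTROPY_FILE"
}

# Usage, left commented out because it needs a running Docker daemon:
#   read_entropy                          # on the host
#   read_entropy docker run --rm alpine   # inside a throwaway container
```

The counter changes between reads, so expect the two numbers to track each other closely rather than match exactly.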

Diagnosing the Problem

Before you start installing tools, confirm that this is actually your bottleneck. You can check the current entropy available on your host by reading the entropy_avail file:


Bash
cat /proc/sys/kernel/random/entropy_avail

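On kernels before 5.18, where the pool holds up to 4096 bits, a healthy system usually hovers between 2000 and 4000. If you see a number consistently below 200, your applications will experience significant latency during cryptographic operations. Kernels 5.18 and later always report 256, so the reading is not a useful signal there.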
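If you want this check in a script rather than run by hand, a minimal sketch might look like the following. The 200 cutoff is the one mentioned above; the function name entropy_status and the "low"/"ok" labels are made up for this example.

```shell
# entropy_status VALUE
# Prints "low" when VALUE is under 200 (the danger mark used above), else "ok".
entropy_status() {
  if [ "$1" -lt 200 ]; then
    echo "low"
  else
    echo "ok"
  fi
}

entropy_file=/proc/sys/kernel/random/entropy_avail
if [ -r "$entropy_file" ]; then
  # Take a single reading and label it.
  current="$(cat "$entropy_file")"
  echo "entropy_avail=$current status=$(entropy_status "$current")"
else
  echo "cannot read $entropy_file" >&2
fi
```

A single reading can be misleading, so treat a "low" result as a prompt to keep watching rather than as proof.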

I first tried to solve this by tweaking our container resource limits, thinking we had hit a cgroup constraint. That was a dead end. I then checked whether we were leaking file descriptors, but that didn't explain the TLS handshake latency. It wasn't until I ran watch -n 1 cat /proc/sys/kernel/random/entropy_avail while triggering a service restart that I saw the number plummet to nearly zero.
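The watch approach works, but it is easy to miss a short dip. Here is a rough sketch that samples the counter a fixed number of times and reports the lowest value it saw. sample_min and its arguments are hypothetical names for this example, not part of any tool.

```shell
# sample_min FILE COUNT INTERVAL
# Reads FILE COUNT times, INTERVAL seconds apart. Each reading is logged to
# stderr with a timestamp; the lowest reading is printed to stdout at the end.
# The body runs in a subshell (parentheses), so its variables stay local.
sample_min() (
  file="$1"
  count="$2"
  interval="$3"
  min=""
  i=0
  while [ "$i" -lt "$count" ]; do
    value="$(cat "$file")"
    printf '%s %s\n' "$(date +%T)" "$value" >&2
    if [ -z "$min" ] || [ "$value" -lt "$min" ]; then
      min="$value"
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "$min"
)

# Example: sample once a second for 30 seconds while restarting a service
# in another terminal.
#   sample_min /proc/sys/kernel/random/entropy_avail 30 1
```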

Fixing Entropy Starvation with Haveged

The most reliable way to solve this in a virtualized or containerized environment is to use haveged. It implements the HAVEGE algorithm, which harvests timing variations from the CPU's caches, branch predictors, and other internal state to generate a stream of random numbers, feeding the entropy pool and keeping it from running dry.

On Debian or Ubuntu, the installation is straightforward:


Bash
sudo apt update
sudo apt install haveged
sudo systemctl enable --now haveged

Once installed, check the status of the service and verify that entropy_avail has climbed back into a safe range. In my experience, you’ll see it jump to 3000+ almost immediately.
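A small post-install check could look like this. It is a sketch that assumes a systemd host with the service named haveged, as in the install commands above; haveged_state is a name invented for this example.

```shell
# haveged_state
# Prints "active" or "inactive" based on systemctl, or "unknown" when
# systemctl is not installed on this machine.
haveged_state() {
  if ! command -v systemctl >/dev/null 2>&1; then
    echo "unknown"
  elif systemctl is-active --quiet haveged 2>/dev/null; then
    echo "active"
  else
    echo "inactive"
  fi
}

echo "haveged: $(haveged_state)"

# Show the current pool reading next to the service state.
entropy_file=/proc/sys/kernel/random/entropy_avail
if [ -r "$entropy_file" ]; then
  echo "entropy_avail: $(cat "$entropy_file")"
fi
```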

Should You Use Haveged Everywhere?

There is a long-standing debate in the security community about haveged and whether its output is "truly" random enough for high-stakes cryptographic keys. For a standard web application, API, or internal microservice, it is perfectly adequate. If you are handling high-security financial transactions or generating long-lived root CA keys, you might want to look into hardware random number generators (TRNGs) like those found on some dedicated server motherboards.

However, for 99% of the Docker workloads I manage, haveged is the difference between a snappy application and one that feels like it’s running on a dial-up connection.

Lessons Learned

If I had to do this over, I would have integrated this check into our initial server provisioning scripts using Ansible. I spent too much time looking at application logs and network traces when the answer was sitting right in /proc.

One caveat: if you are running in a very strict, high-security environment, always verify with your security team before deploying haveged. While it's a widely used fix for entropy starvation, some compliance frameworks require specific hardware-based entropy sources.

When you're optimizing your stack, don't forget that the kernel is just as much a part of your application's performance as your code. Whether you're working on Docker optimization or hardening your host, keep an eye on those low-level system metrics. They often hold the keys to those "unexplainable" production mysteries.

FAQ

Does haveged increase CPU usage? Negligibly. It gathers CPU timing jitter in short bursts and only tops up the pool when the kernel reports it is running low. You won't notice a spike on your host metrics.

Is this necessary for Rootless Docker? Yes. Even if you are using Rootless Docker, the containers still share the host's kernel and entropy pool. The bottleneck remains the same.

Can I just use rng-tools instead? You can, but haveged is generally easier to configure and more effective on virtualized hardware where entropy sources are limited.
