Mahamudul Hasan Rubel

© 2026 Mahamudul Hasan Rubel. All rights reserved.

DevOps | June 23, 2026 | 5 min read

Linux Performance: Debugging CPU Stalls in Docker with perf

Linux performance issues in Docker can be elusive. Learn how to use perf to profile CPU stalls and solve resource contention in your containers.

Tags: Linux, Docker, Performance, Debugging, perf, Kernel, DevOps, CI/CD

Last month, our primary API service started spiking to 100% CPU usage every time we pushed a new deployment, despite the load remaining flat. We spent about two days chasing phantom memory leaks before realizing the issue wasn't the code—it was the kernel context switching like crazy due to hidden resource contention.

When your application is wrapped in a container, the abstraction often hides the reality of what the CPU is actually doing. You might see high "user" time in top, but that doesn't tell you if the CPU is stalling while waiting for memory or context switching between threads. To fix this, you need to look at the hardware level using perf.
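Before reaching for perf, a cheap sanity check is the kernel's own context-switch counters in /proc. The helper below is a hypothetical sketch, not part of perf: it sums the voluntary and involuntary counts across every thread of a process, because /proc/<pid>/status only reports the main thread. A fast-growing voluntary count means threads keep blocking (for example on locks), while a fast-growing involuntary count means they are being preempted.

```shell
# Hypothetical helper: sum context switches across all threads of a process.
# /proc/<pid>/status covers only the main thread, so read every
# /proc/<pid>/task/<tid>/status file instead.
# Prints "<voluntary> <involuntary>".
ctxt_switches() {
  local proc_dir="$1"   # e.g. /proc/12345
  awk '/^voluntary_ctxt_switches:/    { v += $2 }
       /^nonvoluntary_ctxt_switches:/ { n += $2 }
       END { print v + 0, n + 0 }' "$proc_dir"/task/*/status
}

# Usage on the Docker host ($PID = the container's host PID):
# sample twice and compare, since the delta matters more than the total.
#   ctxt_switches /proc/"$PID"; sleep 10; ctxt_switches /proc/"$PID"
```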

Understanding Linux performance and CPU stalls

When you're dealing with Linux performance inside a container, the first mistake is relying on tools that don't account for the shared nature of the host. If you only look at docker stats, you see the container's perspective. It’s useful, but it won't show you if your process is blocked by a cache miss or a hardware interrupt on the host.

We've previously covered "Linux performance: Resolving Docker I/O Bottlenecks with eBPF", but when it's pure CPU cycles being wasted, perf is the gold standard. It samples what the CPU is executing and shows exactly which functions are consuming the most time, including the kernel code your application triggers through syscalls.

Setting up perf for Docker troubleshooting

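To use perf effectively, the simplest approach is to run it on the host. If you try to run it inside a standard Alpine or Debian-slim container, you'll likely run into permission errors: perf relies on the perf_event_open() syscall to reach the kernel's perf_events subsystem and the CPU's performance monitoring unit (PMU), and Docker's default seccomp profile blocks that syscall.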
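Even on the host, whether perf works without sudo depends on the kernel.perf_event_paranoid sysctl. The helper below is a hypothetical sketch that maps a level to what an unprivileged user may profile, following the levels in the kernel's sysctl documentation; root, or a process with CAP_PERFMON or CAP_SYS_ADMIN, bypasses these checks. Levels above 2 come from distribution patches (Debian uses 3), so the sketch simply reports them as "disabled".

```shell
# Hypothetical helper: describe what an unprivileged user may profile
# at a given kernel.perf_event_paranoid level.
describe_paranoid() {
  local level="$1"
  if [ "$level" -le -1 ]; then
    echo "unrestricted"
  elif [ "$level" -eq 0 ]; then
    echo "cpu-wide"      # own processes plus system-wide (per-CPU) events
  elif [ "$level" -eq 1 ]; then
    echo "user+kernel"   # own processes, user and kernel space
  elif [ "$level" -eq 2 ]; then
    echo "user-only"     # own processes, user space only
  else
    echo "disabled"      # distro-specific levels such as 3
  fi
}

# Report the live setting when the sysctl file is readable
paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
  current=$(cat "$paranoid_file")
  echo "perf_event_paranoid=$current ($(describe_paranoid "$current"))"
fi
```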

The easiest way to get started is to find the PID of your containerized process on the host:

Bash
# Find the host PID of the container's main process (0 if it isn't running)
PID=$(docker inspect --format '{{.State.Pid}}' <container_name>)

# Profile that PID live, including kernel symbols
sudo perf top -p "$PID"
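.State.Pid only gives you the container's init process. If the container runs several processes (a supervisor plus workers, say), a sketch like the one below collects all of their host PIDs and joins them into the comma-separated list that perf's -p option accepts. The helper names and the my-api container are hypothetical, and it assumes docker top forwards the -eo pid options to ps, printing a "PID" header first.

```shell
# Hypothetical helpers: profile every process in a container, not just PID 1.

# Keep numeric lines only (drops the "PID" header) and join them with commas
join_pids() {
  tr -d ' ' | grep -E '^[0-9]+$' | paste -sd, -
}

# Host PIDs of all processes in a container; `docker top` forwards ps options
container_pids() {
  docker top "$1" -eo pid | join_pids
}

# Usage on the Docker host:
#   PIDS=$(container_pids my-api)
#   sudo perf top -p "$PIDS"
```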

If you see [kernel] at the top of your list with high percentages, you're looking at kernel overhead rather than your own code. In our case, the top symbol was _raw_spin_lock: our service's threads were constantly entering the kernel and spinning on the same kernel lock. This is a classic sign of resource contention, where threads are fighting over the same lock-protected resource.

Deep dive: profiling CPU stalls

If perf top doesn't give you enough detail, you need to record the events. perf record is the go-to command for Docker troubleshooting. We usually run this for 10-15 seconds to get a representative sample:

Bash
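# Sample the PID with call graphs (-g) for as long as `sleep 15` runs
sudo perf record -g -p <PID> -- sleep 15

# Open the report (reads perf.data from the current directory)
sudo perf report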

The -g flag is the most important part here—it enables call-graph recording. Without it, you just get a flat list of functions. With it, you can see the stack trace of what called the function that’s stalling.
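perf report is interactive by default. For sharing results or diffing runs, its --stdio mode prints plain text, and a small filter can cut that down to the hot spots. The helper below is a hypothetical sketch that assumes the layout produced by --no-children --sort symbol, where each top-level row starts with a percentage; indented call-chain lines are skipped.

```shell
# Hypothetical helper: keep top-level rows of `perf report --stdio` at or above
# a percentage threshold. Rows look like "    42.13%  [k] _raw_spin_lock";
# call-chain lines ("|--60.00%--foo") don't start with a number and are skipped.
top_symbols() {
  local threshold="$1"
  awk -v t="$threshold" '/^ *[0-9]+\.[0-9]+%/ {
    pct = $1
    sub(/%/, "", pct)
    if (pct + 0 >= t + 0) print
  }'
}

# Usage on the Docker host, after `perf record`:
#   sudo perf report --stdio --no-children --sort symbol | top_symbols 5
```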

We've seen scenarios where high CPU wasn't caused by application logic but by inefficient syscalls, for example issuing many tiny read() or write() calls instead of batching them. If you are also seeing networking lag, look into "Docker networking latency: Debugging with eBPF and tcpretrans" to make sure your CPU isn't just spinning while it waits on the network stack.
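One way to check for syscall overhead is to count syscalls with perf stat and the raw_syscalls:sys_enter tracepoint (this needs root, since tracepoints live in tracefs). The helper below is a hypothetical sketch that turns perf stat's CSV mode (-x,) into a per-second rate; it assumes the event name appears unchanged in the third CSV field, after the counter value and unit.

```shell
# Hypothetical helper: syscalls per second from `perf stat -x,` output on stdin.
# Field 1 is the counter value, field 3 the event name.
syscall_rate() {
  local seconds="$1"
  awk -F, -v s="$seconds" '$3 == "raw_syscalls:sys_enter" { printf "%d\n", ($1 + 0) / s }'
}

# Usage on the Docker host ($PID = the container's host PID). perf stat
# reports on stderr, so route stderr into the pipe and discard stdout:
#   sudo perf stat -x, -e raw_syscalls:sys_enter -p "$PID" -- sleep 10 \
#     2>&1 >/dev/null | syscall_rate 10
```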

Common pitfalls with perf

I've made the mistake of trying to run perf inside a containerized environment without the --privileged flag or the necessary capabilities (CAP_PERFMON on kernel 5.8 and later, or CAP_SYS_ADMIN). It simply won't work. Even once you get it running, if the binary you're profiling is missing its symbols, the output will look like a bunch of hex addresses instead of function names.

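Always ensure your binaries are compiled with frame pointers (e.g., -fno-omit-frame-pointer in GCC or Clang) if you want clean call graphs. Go keeps frame pointers by default on amd64 and arm64, but Rust omits them by default on most Linux targets, so add -C force-frame-pointers=yes to your RUSTFLAGS. Either way, it's worth double-checking your build pipeline. If rebuilding isn't an option, perf record --call-graph dwarf can unwind stacks from debug info instead, at the cost of much larger perf.data files.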
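In practice, keeping frame pointers usually comes down to a few environment variables in the build stage of your image. The sketch below assumes a Make- or autotools-style C/C++ build that honors CFLAGS and CXXFLAGS, and a Cargo build that honors RUSTFLAGS; adapt it to your own pipeline.

```shell
# Sketch: build flags that keep frame pointers (and symbols) for perf call graphs.

# C/C++: keep the frame pointer register and emit debug info
export CFLAGS="-O2 -g -fno-omit-frame-pointer"
export CXXFLAGS="$CFLAGS"

# Rust: rustc omits frame pointers by default on most Linux targets
export RUSTFLAGS="-C force-frame-pointers=yes"

# Then build as usual (e.g. `make` or `cargo build --release`) and record with:
#   sudo perf record -g -p "$PID" -- sleep 15
```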

If you find that your CPU spikes are related to memory access patterns, consider "Linux Performance: Tuning HugePages for High-Traffic Docker Databases": larger pages cut down on TLB misses and the page-table walks they trigger. Sometimes the CPU isn't "stalled" on logic at all; it's just working too hard to translate memory addresses.
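Before changing anything, it helps to see what the host is already doing. Transparent huge pages (THP) and explicitly reserved HugePages are separate mechanisms: the active THP mode is the bracketed word in /sys/kernel/mm/transparent_hugepage/enabled, and reserved HugePages show up as HugePages_Total in /proc/meminfo. The two helpers below are hypothetical sketches that read both; the file paths can be overridden so they can be pointed at test fixtures.

```shell
# Hypothetical helpers: inspect the host's huge page configuration.

# Active THP mode, e.g. "madvise" from "always [madvise] never"
thp_mode() {
  local file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$file"
}

# Number of explicitly reserved HugePages (0 means none are reserved)
hugepages_total() {
  local file="${1:-/proc/meminfo}"
  awk '/^HugePages_Total:/ { print $2 }' "$file"
}
```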

FAQ: Common questions on CPU profiling

Q: Does running perf affect my production performance?
A: Yes, slightly. Sampling adds overhead that grows with the sampling frequency and with call-graph recording (-g). However, for short intervals (10-30 seconds), it's usually negligible compared to the insight you gain, and you can reduce it further by lowering the frequency, for example with -F 99. Just don't leave it running in a record loop indefinitely.

Q: Why don't I see function names in my report?
A: This usually means you're missing the debug symbols for the binary you're profiling. Ensure the binary isn't stripped, or install the -dbg packages for the libraries you're using on the host.

Q: Can I use perf for non-Dockerized processes?
A: Absolutely. perf is a host-level tool. It doesn't care if the process is inside a container or running directly on the host; it tracks the PID as the kernel sees it.

Final thoughts

The biggest lesson I've learned is that containers lie to you. A container may look like it has 4 CPUs, but nothing inside it tells you when its cgroup CPU quota is throttling those CPUs, or that the kernel is struggling to manage the memory mappings for your workload.
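One way to catch the throttling from the host is to read the container's CPU controller statistics. The sketch below is hypothetical: it resolves a process's cgroup v2 directory from the "0::" line of /proc/<pid>/cgroup and reports the share of enforcement periods in which the cgroup hit its quota, using the nr_periods and nr_throttled fields of cpu.stat. nr_periods only advances when a quota (such as docker run --cpus) is set; cgroup v1 exposes the same two fields under a different directory layout. The my-api container is a placeholder.

```shell
# Hypothetical helpers: detect CFS quota throttling of a container from the host.

# cgroup v2 directory of a PID, from the "0::/path" line of /proc/<pid>/cgroup
cgroup_dir() {
  local rel
  rel=$(awk -F'::' '/^0::/ { print $2 }' "/proc/$1/cgroup")
  echo "/sys/fs/cgroup${rel}"
}

# Percentage of enforcement periods in which the cgroup was throttled
throttle_pct() {
  awk '/^nr_periods /   { p = $2 }
       /^nr_throttled / { t = $2 }
       END { if (p + 0 > 0) printf "%.1f\n", 100 * t / p; else print "0.0" }' "$1"
}

# Usage on the Docker host:
#   PID=$(docker inspect --format '{{.State.Pid}}' my-api)
#   throttle_pct "$(cgroup_dir "$PID")/cpu.stat"
```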

Using perf for CPU profiling is the most direct way I know to peel back that layer of abstraction. Next time, I'd start with perf stat to get a high-level view of cache misses and branch mispredictions before diving into the call graphs. It's a steep learning curve, but once you can read a flame graph (built from perf record -g output), you'll never go back to guessing why your service is running hot.
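A perf stat pass can be as simple as counting cycles, instructions, cache misses, and branch misses for the process. A common rule of thumb is to compare instructions with cycles: an IPC well below 1 suggests the CPU spends much of its time stalled, often waiting on memory. The helper below is a hypothetical sketch that computes IPC from perf stat's CSV output; it assumes the generic event names appear unchanged in the third field, which may not hold on hybrid CPUs where perf prefixes events per core type.

```shell
# Hypothetical helper: instructions per cycle from `perf stat -x,` output on stdin.
# Prints "n/a" when cycles weren't counted (e.g. "<not counted>").
ipc_from_csv() {
  awk -F, '$3 == "cycles"       { c = $1 }
           $3 == "instructions" { i = $1 }
           END { if (c + 0 > 0) printf "%.2f\n", i / c; else print "n/a" }'
}

# Usage on the Docker host ($PID = the container's host PID; perf stat
# reports on stderr, so route stderr into the pipe):
#   sudo perf stat -x, -e cycles,instructions,cache-misses,branch-misses \
#     -p "$PID" -- sleep 10 2>&1 >/dev/null | ipc_from_csv
```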

