Mahamudul Hasan Rubel
DevOps | June 24, 2026 | 4 min read

GPU Passthrough with Docker: NVIDIA Container Toolkit Setup Guide

GPU Passthrough with Docker is tricky. Learn to configure the NVIDIA Container Toolkit, manage resource limits, and stabilize your Linux infrastructure.

Tags: Docker, GPU, Linux, NVIDIA, Self-Hosting, Infrastructure, DevOps, CI/CD

Getting GPU acceleration working inside a Docker container isn't as simple as adding a flag to docker run. When I first tried to offload some heavy LLM inference to a dedicated GPU on a rented bare-metal box, I spent about two days wrestling with driver mismatches and permission errors before I finally got a stable pipeline.

If you’re running a self-hosted VPS with a physical GPU, you aren't just managing software; you’re managing the bridge between the host kernel and the container runtime.

Understanding the GPU Passthrough Stack

Before you start, make sure your host is ready. You need the proprietary NVIDIA drivers installed on the host OS; don't try to install them inside the container. The container only needs the user-space libraries that talk to the device nodes the host exposes.

The NVIDIA Container Toolkit is the industry standard for this. It essentially hooks into the Docker engine to inject the necessary runtime libraries and device mappings at container creation time.

Here is the basic verification flow I use on Ubuntu 22.04:

  1. Ensure nvidia-smi returns a clean output on the host.
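  2. Add the toolkit repository and install the package:
    Bash
    # Import NVIDIA's signing key
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
    # Add the repo, pinned to that key
    curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    # Refresh the package index and install the toolkit
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit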
  3. Configure the Docker daemon to use the NVIDIA runtime:
    Bash
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
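
The steps above can be checked in one pass. This is a minimal sketch, assuming the plain-text output of docker info contains a "Runtimes:" line listing the registered runtimes (verify this on your Docker version). The function names has_nvidia_runtime and check_gpu_stack are hypothetical helpers, not part of the NVIDIA tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking the toolkit install.

# Reads `docker info` text on stdin; succeeds if "nvidia" appears
# as a whole word on the "Runtimes:" line.
has_nvidia_runtime() {
  grep -Eq '^[[:space:]]*Runtimes:.*[[:space:]]nvidia([[:space:]]|$)'
}

# Runs the check against the live system. Defined only; call it yourself.
check_gpu_stack() {
  nvidia-smi > /dev/null \
    || { echo "host driver not responding" >&2; return 1; }
  docker info 2>/dev/null | has_nvidia_runtime \
    || { echo "nvidia runtime not registered with Docker" >&2; return 1; }
  echo "host driver and Docker runtime look OK"
}
```

Source the file and run check_gpu_stack after restarting Docker. If it reports a missing runtime, re-run the nvidia-ctk step before touching anything else.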

Managing Container Resource Limits

Once the toolkit is active, you’ll be tempted to just run docker run --gpus all. Don't. If you have multiple containers fighting for the same VRAM, your entire stack will crash with cryptic CUDA_ERROR_OUT_OF_MEMORY messages.

Managing container resource limits is where most self-hosted setups fall apart. Unlike CPU or RAM, Docker has no native "VRAM limit" flag. You have to rely on the application itself to cap its usage or, if you have multiple GPUs, partition the hardware by pinning each container to specific devices with environment variables like CUDA_VISIBLE_DEVICES.
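
Besides CUDA_VISIBLE_DEVICES, the --gpus flag itself accepts a device list, which does the pinning at the runtime level. The sketch below only builds that flag value. gpus_arg is a hypothetical helper name, and the embedded double quotes are there because Docker needs them when the list contains a comma.

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins GPU indices into the value passed to --gpus.
# Example result for indices 0 and 1:  "device=0,1"  (quotes included)
gpus_arg() {
  local IFS=,                  # make "$*" join the arguments with commas
  printf '"device=%s"' "$*"
}
```

A container pinned to the first GPU would then be started with docker run --rm --gpus "$(gpus_arg 0)" followed by the image and command. This isolates containers per device. It does not cap VRAM for containers that share the same device.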

I’ve found that host-level monitoring is essential here. I keep a close eye on memory pressure, as I discussed in my guide on Linux Performance Tuning: Managing Swap and OOM for Docker VPS. If a GPU workload spills out of VRAM into system memory and the host starts swapping, or the OOM killer steps in, performance can drop by roughly 10x.
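
For the VRAM side of that monitoring, nvidia-smi can emit machine-readable rows. The following is a rough sketch, not a full monitoring setup: vram_over is a hypothetical helper that expects rows in the form "index, used, total" (in MiB) on stdin, which is what nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "index, used_mib, total_mib" CSV rows on stdin
# and prints every GPU whose memory use is at or above the given percentage.
vram_over() {
  local threshold="$1"
  awk -F', *' -v t="$threshold" '
    $3 > 0 && ($2 / $3) * 100 >= t {
      printf "GPU %s: %d%% (%s/%s MiB)\n", $1, ($2 / $3) * 100, $2, $3
    }'
}
```

Piping the nvidia-smi query into vram_over 90 from a cron job or a monitoring hook gives an early warning before containers start failing with out-of-memory errors.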

The "Wrong Turn" I Took

My first attempt at this involved mapping the entire /dev/nvidia* tree into a privileged container. It worked for about 20 minutes until a container crash left the device nodes in a locked state, forcing me to hard-reboot the server.

I learned the hard way that you should let the NVIDIA Container Toolkit handle the device injection. Avoid using --privileged mode if you can help it. It’s a massive security hole and completely unnecessary when the toolkit is configured correctly.
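
To find leftovers from experiments like mine, it helps to audit which running containers are still privileged. A small sketch, assuming the input rows come from docker inspect with the format string '{{.Name}} {{.HostConfig.Privileged}}'; list_privileged is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<container-name> <true|false>" rows on stdin
# and prints the names of containers running in privileged mode.
list_privileged() {
  awk '$2 == "true" { print $1 }'
}
```

Feed it with docker ps -q | xargs -r docker inspect --format '{{.Name}} {{.HostConfig.Privileged}}' | list_privileged. Anything it prints is a candidate for switching to --gpus.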

Debugging Common Failures

If your container starts but nvidia-smi fails inside, check these three things:

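  • Version Mismatch: The host driver must be at least as new as the minimum driver required by the CUDA version baked into the container image. An older host driver cannot run a newer CUDA image.
  • Cgroup v2: If stat -fc %T /sys/fs/cgroup prints cgroup2fs, your host uses cgroup v2. Make sure you are running a current nvidia-container-toolkit release, since older releases predate cgroup v2 support.
  • Docker Compose: Request the GPU explicitly in your docker-compose.yml, because a service does not get one by default: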
    YAML
    services:
      inference:
        image: nvidia/cuda:12.0.0-base-ubuntu22.04
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]
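
The version-mismatch check from the list above can be scripted. This is a minimal sketch that relies on GNU sort -V for version ordering. version_ge is a hypothetical helper, and the minimum driver value used in the usage note is an assumption that should be checked against the CUDA release notes for your image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 is greater than or equal to $2.
# Uses GNU sort -V, so dotted versions like 535.104.05 compare correctly.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}
```

On the host, read the driver version with nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1 and compare it, for example version_ge "$host_driver" 525.60.13, where 525.60.13 is my assumed minimum for a CUDA 12.0 image.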

If you're running multiple services, consider using Uptime Kuma Self-Hosted Monitoring: A Simple Guide for VPS Health to watch the heartbeat of your GPU-dependent containers. It won't debug CUDA, but it will tell you exactly when a container has entered a crash loop due to a resource constraint.

FAQ

Q: Can I share one GPU between two containers?
A: Yes, but they will compete for VRAM. Unless you use something that manages sharing of the GPU (like NVIDIA MPS), they will fight, and the one that hits the VRAM ceiling first will crash.

Q: Does GPU passthrough work on virtualized VPS instances?
A: Usually, no. Most cloud providers don't support true GPU passthrough unless you pay for a dedicated GPU instance. If you're using a KVM-based VPS, you need the hypervisor to support PCI passthrough, which is rare in shared hosting.

Q: How do I know if the toolkit is working?
A: Run docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi. If you see your GPU table, it’s configured correctly.

I'm still tinkering with how to gracefully handle GPU failure. Right now, if the driver hangs, I’m stuck with a manual restart. If you have a better way to reset the GPU state without a full host reboot, I’d love to hear it. For now, this setup is stable enough for my internal tools, but it's definitely a "watch it closely" situation.
