DevOps | June 24, 2026 | 4 min read

GPU Passthrough with Docker: NVIDIA Container Toolkit Setup Guide

GPU Passthrough with Docker is tricky. Learn to configure the NVIDIA Container Toolkit, manage resource limits, and stabilize your Linux infrastructure.

Docker, GPU, Linux, NVIDIA, Self-Hosting, Infrastructure, DevOps, CI/CD

Getting GPU acceleration working inside a Docker container isn't as simple as passing a flag to the daemon. When I first tried to offload some heavy LLM inference to a dedicated GPU on a rented bare-metal box, I spent about two days wrestling with driver mismatches and permission errors before I finally got a stable pipeline.

If you’re running a self-hosted VPS with a physical GPU, you aren't just managing software; you’re managing the bridge between the host kernel and the container runtime.

Understanding the GPU Passthrough Stack

Before you start, make sure your host is ready. You need the proprietary NVIDIA drivers installed on the host OS—don't try to install them inside the container. The container just needs the libraries to talk to the device nodes that the host exposes.

The NVIDIA Container Toolkit is the industry standard for this. It essentially hooks into the Docker engine to inject the necessary runtime libraries and device mappings at container creation time.

Here is the basic verification flow I use on Ubuntu 22.04:

Ensure nvidia-smi returns a clean output on the host.

Add the toolkit repository and install the package:


Bash
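# Import NVIDIA's signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
# Add the repo, pinned to that key
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Refresh the package index and install the toolkit
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit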

Configure the Docker daemon to use the NVIDIA runtime:


Bash
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
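
Before moving on, it can help to confirm that the configure step actually registered the runtime. The following is a minimal sketch, assuming a standard (non-rootless) Docker install that reads /etc/docker/daemon.json; has_nvidia_runtime is a hypothetical helper name, not part of the toolkit.

```shell
#!/usr/bin/env bash
# Sketch: check that `nvidia-ctk runtime configure` registered the runtime.
# has_nvidia_runtime is a hypothetical helper; the default path assumes a
# standard (non-rootless) Docker install.

has_nvidia_runtime() {
  local config="${1:-/etc/docker/daemon.json}"
  # The configure step adds an entry keyed "nvidia" under "runtimes".
  [ -r "$config" ] && grep -q '"nvidia"[[:space:]]*:' "$config"
}

if has_nvidia_runtime; then
  echo "nvidia runtime is registered in daemon.json"
else
  echo "nvidia runtime not found; re-run: sudo nvidia-ctk runtime configure --runtime=docker"
fi
```

If the file looks right but containers still can't see the GPU, docker info --format '{{json .Runtimes}}' shows what the running daemon has actually loaded.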

Managing Container Resource Limits

Once the toolkit is active, you’ll be tempted to just run docker run --gpus all. Don't. If you have multiple containers fighting for the same VRAM, your entire stack will crash with cryptic CUDA_ERROR_OUT_OF_MEMORY messages.

Resource limits are where most self-hosted setups fall apart. Unlike CPU or RAM, Docker has no native "VRAM limit" flag. You have to rely on the application to cap its own usage or, if you have multiple GPUs, partition the hardware by exposing specific devices to each container (for example with --gpus device=0 or the CUDA_VISIBLE_DEVICES environment variable).
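
On a multi-GPU host, one way to partition the hardware is to pin each container to a single device instead of using --gpus all. The sketch below only prints the commands so the mapping can be reviewed first; gpu_run_cmd is a hypothetical helper, and the container and image names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: build a `docker run` command that pins one container to one GPU.
# gpu_run_cmd is a hypothetical helper; names and image tags are placeholders.

gpu_run_cmd() {
  local name="$1" gpu_index="$2" image="$3"
  # --gpus device=N exposes only GPU N to the container.
  printf 'docker run -d --name %s --gpus device=%s %s\n' "$name" "$gpu_index" "$image"
}

# Print (not execute) one command per service.
gpu_run_cmd inference-a 0 my-inference-image:latest
gpu_run_cmd inference-b 1 my-inference-image:latest
```

This does not limit VRAM on a single card; it only keeps two containers from landing on the same one.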

I’ve found that host-level monitoring is essential here. I keep a close eye on memory pressure, as I discussed in my guide on Linux Performance Tuning: Managing Swap and OOM for Docker VPS. If a workload spills out of VRAM into system memory (for example through CUDA unified memory), performance will drop by roughly 10x, and if the host then runs out of RAM, the OOM killer takes the container down.
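
A simple way to watch VRAM pressure is to poll nvidia-smi in CSV mode and flag any card above a threshold. This is a rough sketch rather than a monitoring setup: vram_over_threshold is a hypothetical helper, and the 90% threshold is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Sketch: flag GPUs whose VRAM usage crosses a threshold.
# Input lines are "index, used_MiB, total_MiB", as printed by:
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
# vram_over_threshold is a hypothetical helper; 90 is an arbitrary default.

vram_over_threshold() {
  local threshold_pct="${1:-90}"
  awk -F', *' -v limit="$threshold_pct" '
    $3 > 0 {
      pct = int($2 * 100 / $3)
      if (pct >= limit) printf "GPU %s at %d%% VRAM (%s/%s MiB)\n", $1, pct, $2, $3
    }'
}

# Only query the driver when nvidia-smi is present on this host.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits \
    | vram_over_threshold 90
fi
```

Run from cron or a systemd timer, this gives an early warning before a second container pushes the card into CUDA_ERROR_OUT_OF_MEMORY.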

The "Wrong Turn" I Took

My first attempt at this involved mapping the entire /dev/nvidia* tree into a privileged container. It worked for about 20 minutes until a container crash left the device nodes in a locked state, forcing me to hard-reboot the server.

I learned the hard way that you should let the NVIDIA Container Toolkit handle the device injection. Avoid using --privileged mode if you can help it. It’s a massive security hole and completely unnecessary when the toolkit is configured correctly.
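
If you inherited a host or set one up in a hurry, it is worth checking whether anything is still running privileged. The sketch below reads the Privileged flag from docker inspect; list_privileged is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list running containers that were started with --privileged.
# list_privileged is a hypothetical helper that filters "name flag" pairs.

list_privileged() {
  # Keep only the names whose Privileged flag is "true".
  awk '$2 == "true" { print $1 }'
}

# Only talk to Docker when the CLI is installed and containers are running.
if command -v docker >/dev/null 2>&1; then
  ids=$(docker ps -q 2>/dev/null)
  if [ -n "$ids" ]; then
    # $ids is intentionally unquoted so each container ID becomes its own argument.
    docker inspect --format '{{.Name}} {{.HostConfig.Privileged}}' $ids | list_privileged
  fi
fi
```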

Debugging Common Failures

If your container starts but nvidia-smi fails inside, check these three things:

Version Mismatch: The host driver must be at least as new as the minimum driver version required by the CUDA release inside the container image.
Cgroup v2: On distributions that default to cgroup v2 (stat -fc %T /sys/fs/cgroup prints cgroup2fs), make sure you are running a current NVIDIA Container Toolkit release; older releases predate cgroup v2 support.

Docker Compose: Request the GPU explicitly with a device reservation in your docker-compose.yml:


YAML
services:
  inference:
    image: nvidia/cuda:12.0.0-base-ubuntu22.04
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
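
The first two checks in the list above can be scripted on the host. This is a minimal sketch: version_ge and cgroup_version are hypothetical helpers, and the MIN_DRIVER default is an assumption that you should replace with the minimum listed in the CUDA release notes for your image.

```shell
#!/usr/bin/env bash
# Sketch: host-side checks for driver version and cgroup mode.
# MIN_DRIVER is an assumption; look up the real minimum for your image's CUDA version.

MIN_DRIVER="${MIN_DRIVER:-525.60.13}"

# True when version $1 is greater than or equal to version $2 (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Prints "v2" on a unified cgroup hierarchy, "v1" otherwise.
cgroup_version() {
  if [ "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" = "cgroup2fs" ]; then echo v2; else echo v1; fi
}

if command -v nvidia-smi >/dev/null 2>&1; then
  host_driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
  if version_ge "$host_driver" "$MIN_DRIVER"; then
    echo "driver $host_driver is new enough"
  else
    echo "driver $host_driver is older than $MIN_DRIVER"
  fi
fi
echo "cgroup hierarchy: $(cgroup_version)"
```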

If you're running multiple services, consider Uptime Kuma (see Uptime Kuma Self-Hosted Monitoring: A Simple Guide for VPS Health) to watch the heartbeat of your GPU-dependent containers. It won't debug CUDA, but it will tell you exactly when a container has entered a crash loop due to a resource constraint.

FAQ

Q: Can I share one GPU between two containers? A: Yes, but they will compete for VRAM. Unless you use a sharing mechanism such as NVIDIA MPS or MIG, or each application caps its own memory use, they will fight, and the one that hits the VRAM ceiling first will crash.

Q: Does GPU passthrough work on virtualized VPS instances? A: Usually, no. Most cloud providers don't support true GPU passthrough unless you pay for a dedicated GPU instance. If you're using a KVM-based VPS, you need the hypervisor to support PCI passthrough, which is rare in shared hosting.

Q: How do I know if the toolkit is working? A: Run docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi. If you see your GPU table, it’s configured correctly.

I'm still tinkering with how to gracefully handle GPU failure. Right now, if the driver hangs, I’m stuck with a manual restart. If you have a better way to reset the GPU state without a full host reboot, I’d love to hear it. For now, this setup is stable enough for my internal tools, but it's definitely a "watch it closely" situation.
