GPU Passthrough with Docker is tricky. Learn to configure the NVIDIA Container Toolkit, manage resource limits, and stabilize your Linux infrastructure.
Getting GPU acceleration working inside a Docker container isn't as simple as passing a flag to the daemon. When I first tried to offload some heavy LLM inference to a dedicated GPU on a rented bare-metal box, I spent about two days wrestling with driver mismatches and permission errors before I finally got a stable pipeline.
If you’re running a self-hosted VPS with a physical GPU, you aren't just managing software; you’re managing the bridge between the host kernel and the container runtime.
Before you start, make sure your host is ready. You need the proprietary NVIDIA drivers installed on the host OS—don't try to install them inside the container. The container just needs the libraries to talk to the device nodes that the host exposes.
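A quick way to see what the host exposes is to look for the NVIDIA device nodes. The sketch below assumes the usual node names (/dev/nvidiactl plus /dev/nvidia0, /dev/nvidia1, and so on). The function name host_gpu_nodes_present is made up for illustration, and the directory argument only exists so it can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any NVIDIA tooling): returns 0 when the
# given directory contains the control node and at least one per-GPU node.
host_gpu_nodes_present() {
  local dev_dir="${1:-/dev}"
  local node
  # The control node must exist...
  [ -e "$dev_dir/nvidiactl" ] || return 1
  # ...along with at least one per-GPU node such as nvidia0.
  for node in "$dev_dir"/nvidia[0-9]*; do
    [ -e "$node" ] && return 0
  done
  return 1
}

# Usage on the host:
#   host_gpu_nodes_present && echo "device nodes found" || echo "no NVIDIA device nodes"
```

If this fails on a machine where the driver is installed, the kernel module is probably not loaded, and running nvidia-smi on the host is the next thing to try.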
The NVIDIA Container Toolkit is the industry standard for this. It essentially hooks into the Docker engine to inject the necessary runtime libraries and device mappings at container creation time.
Here is the basic verification flow I use on Ubuntu 22.04:
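1. Confirm that nvidia-smi returns a clean output on the host.

2. Install the NVIDIA Container Toolkit from NVIDIA's apt repository:

```bash
# Import NVIDIA's signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Add the repo and update
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
```

3. Register the NVIDIA runtime with Docker and restart the daemon:

```bash
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```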
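To confirm that the configure step actually registered the runtime, a small check along these lines can help. It is only a sketch: it assumes the default config location /etc/docker/daemon.json, and the function name has_nvidia_runtime is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the given Docker daemon config mentions
# the nvidia-container-runtime binary. The default path is an assumption
# based on where nvidia-ctk writes when no other config file is specified.
has_nvidia_runtime() {
  local config="${1:-/etc/docker/daemon.json}"
  [ -r "$config" ] && grep -q 'nvidia-container-runtime' "$config"
}

# Usage on the host:
#   has_nvidia_runtime && echo "runtime registered" || echo "runtime missing"
#   docker info --format '{{json .Runtimes}}'   # should include "nvidia" after the restart
```

A grep is deliberately crude here. It only tells you the entry exists in the file, not that the daemon has picked it up, which is why the docker info line is worth running as well.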
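Once the toolkit is active, you’ll be tempted to just run docker run --gpus all for everything. Don't, at least not on a box that runs more than one GPU workload. If you have multiple containers fighting for the same VRAM, whichever process loses the allocation race dies with a cryptic CUDA_ERROR_OUT_OF_MEMORY message, and it can take the rest of your stack down with it.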
Managing container resource limits is where most self-hosted setups fall apart. Unlike CPU or RAM, Docker has no native VRAM limit flag. You have to rely on the application itself to cap its usage or, if you have multiple GPUs, partition the hardware by pinning each workload to a device with environment variables like CUDA_VISIBLE_DEVICES.
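On a multi-GPU host, the same partitioning can also be done at the container level with Docker's --gpus device=N syntax. The helper below is a minimal sketch of that idea. The function name gpu_pin_flags and the image name my-inference-image are placeholders, not anything from a real setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the docker run flags that pin a container
# to a single GPU index, rejecting anything that is not a plain integer.
gpu_pin_flags() {
  local index="$1"
  case "$index" in
    ''|*[!0-9]*)
      echo "gpu index must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf -- '--gpus device=%s\n' "$index"
}

# Usage (image name is a placeholder); the unquoted $(...) is intentional
# so the output splits into two arguments:
#   docker run --rm $(gpu_pin_flags 0) my-inference-image
#   docker run --rm $(gpu_pin_flags 1) my-inference-image
```

This keeps two services off each other's VRAM entirely, but it does nothing for containers that share one card. There the cap still has to come from the application.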
I’ve found that host-level monitoring is essential here. I keep a close eye on memory pressure, as I discussed in my guide on Linux Performance Tuning: Managing Swap and OOM for Docker VPS. If a workload spills out of VRAM into system memory, throughput drops by roughly 10x almost instantly, and the extra host memory pressure can push the container into the OOM killer.
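For the VRAM side of that monitoring, nvidia-smi can emit machine-readable numbers that are easy to threshold. The sketch below assumes the CSV query output format (index, used MiB, total MiB, separated by a comma and a space). The function name gpus_over_threshold and the 90 percent default are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "index, used, total" lines (memory in MiB) on
# stdin and prints the index of every GPU whose memory usage is above the
# given percentage (default 90).
gpus_over_threshold() {
  local threshold="${1:-90}"
  awk -F', *' -v t="$threshold" '
    $3 > 0 && ($2 * 100 / $3) > (t + 0) { print $1 }
  '
}

# Usage on the host:
#   nvidia-smi --query-gpu=index,memory.used,memory.total \
#              --format=csv,noheader,nounits | gpus_over_threshold 90
```

Run from cron or a systemd timer, a non-empty result is a reasonable trigger for an alert before a container actually hits the VRAM ceiling.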
My first attempt at this involved mapping the entire /dev/nvidia* tree into a privileged container. It worked for about 20 minutes until a container crash left the device nodes in a locked state, forcing me to hard-reboot the server.
I learned the hard way that you should let the NVIDIA Container Toolkit handle the device injection. Avoid using --privileged mode if you can help it. It’s a massive security hole and completely unnecessary when the toolkit is configured correctly.
If your container starts but nvidia-smi fails inside, check these three things:
docker-compose.yml:
```yaml
services:
  inference:
    image: nvidia/cuda:12.0.0-base-ubuntu22.04
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
If you're running multiple services, consider using Uptime Kuma Self-Hosted Monitoring: A Simple Guide for VPS Health to watch the heartbeat of your GPU-dependent containers. It won't debug CUDA, but it will tell you exactly when a container has entered a crash loop due to a resource constraint.
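If you want a crude crash-loop signal without any extra tooling, Docker's own restart counter is enough for a first pass. This is a sketch under assumptions: the container name inference matches the Compose service above only if you name it that way, the limit of 5 is arbitrary, and restart_count_exceeded is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a restart count has reached the limit
# (default 5). The count is passed in so the check stays independent of Docker.
restart_count_exceeded() {
  local count="$1"
  local limit="${2:-5}"
  [ "$count" -ge "$limit" ]
}

# Usage on the host (container name is an assumption):
#   count="$(docker inspect --format '{{.RestartCount}}' inference)"
#   restart_count_exceeded "$count" 5 && echo "inference looks stuck in a crash loop"
```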
Q: Can I share one GPU between two containers?
A: Yes, but they will compete for VRAM. Unless you put a sharing mechanism such as NVIDIA MPS in front of them, or each application caps its own usage, they will fight, and the one that hits the VRAM ceiling first will crash.
Q: Does GPU passthrough work on virtualized VPS instances?
A: Usually, no. Most cloud providers don't support true GPU passthrough unless you pay for a dedicated GPU instance. If you're using a KVM-based VPS, you need the hypervisor to support PCI passthrough, which is rare in shared hosting.
Q: How do I know if the toolkit is working?
A: Run docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi. If you see your GPU table, it’s configured correctly.
I'm still tinkering with how to gracefully handle GPU failure. Right now, if the driver hangs, I’m stuck with a manual restart. If you have a better way to reset the GPU state without a full host reboot, I’d love to hear it. For now, this setup is stable enough for my internal tools, but it's definitely a "watch it closely" situation.