Master Kubernetes Admission Controllers with Kubebuilder. Learn how to build custom Validating Admission Webhooks to enforce cluster-wide policy and security.

During our Q3 infrastructure audit, we discovered that roughly 35% of the pods in our production cluster were missing mandatory resource limits. This wasn't just a "best practice" issue; it was causing noisy-neighbor problems that occasionally crashed our observability stack (described in Kubernetes Observability: Implementing Distributed Tracing with Tempo) during peak hours. We needed a way to reject non-compliant deployments at the gate, specifically by using Kubernetes Admission Controllers to enforce these constraints before they ever hit etcd.
When you need to intercept requests to the Kubernetes API, you have two primary options: mutating or validating admission webhooks. For this task, I chose a Validating Admission Webhook because I didn't want to change the user's YAML; I wanted to force them to fix it.
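To make the mutating-versus-validating distinction concrete, here is a minimal, library-free sketch of the two kinds of reply a webhook can send back. The struct and function names (`admissionResponse`, `deny`, `mutate`) are hypothetical stand-ins that mirror only a few fields of the `admission.k8s.io/v1` response; a real project would use the types that Kubebuilder and controller-runtime provide.

```go
package main

import (
	"encoding/json"
)

// admissionResponse mirrors a small subset of the admission.k8s.io/v1
// AdmissionResponse fields. The Go type itself is a hand-rolled stand-in.
type admissionResponse struct {
	UID       string  `json:"uid"`
	Allowed   bool    `json:"allowed"`
	Status    *status `json:"status,omitempty"`
	PatchType string  `json:"patchType,omitempty"`
	Patch     []byte  `json:"patch,omitempty"` // encoding/json emits []byte as base64
}

type status struct {
	Message string `json:"message"`
}

// patchOp is a single RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// deny is the validating style of answer: the object is left untouched and
// the caller only gets a verdict plus a human-readable reason.
func deny(uid, reason string) admissionResponse {
	return admissionResponse{UID: uid, Allowed: false, Status: &status{Message: reason}}
}

// mutate is the mutating style of answer: the object is admitted, but a
// JSON Patch is attached that rewrites it on the way in.
func mutate(uid string, ops []patchOp) (admissionResponse, error) {
	raw, err := json.Marshal(ops)
	if err != nil {
		return admissionResponse{}, err
	}
	return admissionResponse{UID: uid, Allowed: true, PatchType: "JSONPatch", Patch: raw}, nil
}
```

With `deny`, the user sees the reason and has to fix the manifest themselves, which is the behavior chosen here. With `mutate`, the limit would be filled in silently.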
I started by using kubebuilder (v3.10.0), which scaffolds the necessary boilerplate to talk to the Kubernetes API. The process is straightforward, but the devil is in the TLS configuration and the controller's handshake with the API server.
First, initialize your project:
```bash
kubebuilder init --domain mahamudul.com --repo github.com/mhrubel/admission-controller
kubebuilder create webhook --group core --version v1 --kind Pod --programmatic-validation
```
This generates a webhook.go file where you implement the ValidateCreate and ValidateUpdate methods. Here is how I enforced the resource limits:
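```go
// ValidateCreate rejects any Pod that has a container without a CPU limit.
func (r *PodValidator) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
	// Guard the type assertion so an unexpected object yields an error, not a panic.
	pod, ok := obj.(*corev1.Pod)
	if !ok {
		return nil, fmt.Errorf("expected a Pod but got %T", obj)
	}
	for _, container := range pod.Spec.Containers {
		// Cpu() returns a zero quantity when no CPU limit is set.
		if container.Resources.Limits.Cpu().IsZero() {
			return nil, fmt.Errorf("container %s missing CPU limit", container.Name)
		}
	}
	return nil, nil
}
```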
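For context, the ValidateCreate method above sits on top of an AdmissionReview exchange: the API server sends a JSON envelope containing the Pod, and the webhook replies with the same `uid` and a verdict. The following is a rough, library-free sketch of that exchange, not what Kubebuilder generates. The types (`admissionReview`, `podLite`, and so on) and the `review` function are hypothetical stand-ins declaring only the fields needed here, and the limit check is simplified to "is the `cpu` key present", without parsing quantities.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hand-rolled stand-ins for the admission.k8s.io/v1 AdmissionReview
// envelope; only the fields this sketch needs are declared.
type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Request    *admissionRequest  `json:"request,omitempty"`
	Response   *admissionResponse `json:"response,omitempty"`
}

type admissionRequest struct {
	UID    string          `json:"uid"`
	Object json.RawMessage `json:"object"`
}

type admissionResponse struct {
	UID     string        `json:"uid"`
	Allowed bool          `json:"allowed"`
	Status  *reviewStatus `json:"status,omitempty"`
}

type reviewStatus struct {
	Message string `json:"message"`
}

// podLite declares just enough of a Pod to reach the container limits.
type podLite struct {
	Spec struct {
		Containers []struct {
			Name      string `json:"name"`
			Resources struct {
				Limits map[string]string `json:"limits"`
			} `json:"resources"`
		} `json:"containers"`
	} `json:"spec"`
}

// review decodes an AdmissionReview request and returns the encoded reply.
// Simplification: a limit counts as present if the "cpu" key is non-empty;
// quantities are not parsed, so a literal "0" would slip through here.
func review(body []byte) ([]byte, error) {
	var in admissionReview
	if err := json.Unmarshal(body, &in); err != nil {
		return nil, fmt.Errorf("decode AdmissionReview: %w", err)
	}
	if in.Request == nil {
		return nil, fmt.Errorf("AdmissionReview has no request")
	}
	var pod podLite
	if err := json.Unmarshal(in.Request.Object, &pod); err != nil {
		return nil, fmt.Errorf("decode pod: %w", err)
	}

	// The reply must echo the request UID so the API server can match it.
	resp := &admissionResponse{UID: in.Request.UID, Allowed: true}
	for _, c := range pod.Spec.Containers {
		if c.Resources.Limits["cpu"] == "" {
			resp.Allowed = false
			resp.Status = &reviewStatus{Message: fmt.Sprintf("container %s missing CPU limit", c.Name)}
			break
		}
	}
	return json.Marshal(admissionReview{APIVersion: in.APIVersion, Kind: in.Kind, Response: resp})
}
```

Wiring `review` into an HTTPS handler, decoding into the real Pod type, and handling API versions are exactly the chores the scaffolding takes off your hands.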
My first approach was to bundle the TLS certificates inside the container image. It was a disaster. Every time the pod restarted or moved to a new node, the IP-based certs became invalid, causing the API server to hang for exactly 30 seconds before timing out the request. This made our CI/CD pipelines brittle and slow.
I learned the hard way that you should use cert-manager to manage the CA injection. Once I shifted to using cert-manager for the webhook's certificate lifecycle, the latency dropped from a timeout-inducing 30s to roughly 120ms per validation check.
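One way to see why certificates tied to a pod IP break: for a service-backed webhook, the API server dials the service DNS name (`<service.name>.<service.namespace>.svc`), and the serving certificate has to be valid for that name. The sketch below uses only Go's standard library to generate throwaway self-signed certificates and check which hosts they cover. The helper names (`selfSigned`, `covers`) are hypothetical, and none of this reflects how cert-manager issues certificates; it only illustrates the hostname check.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// selfSigned returns a throwaway certificate carrying the given DNS and IP
// subject alternative names (SANs).
func selfSigned(dnsNames []string, ips []string) (*x509.Certificate, error) {
	var parsed []net.IP
	for _, s := range ips {
		ip := net.ParseIP(s)
		if ip == nil {
			return nil, fmt.Errorf("invalid IP %q", s)
		}
		parsed = append(parsed, ip)
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "webhook-sketch"},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
		IPAddresses:  parsed,
	}
	// Template and parent are the same, so the certificate signs itself.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// covers reports whether the certificate is valid for the host being dialed.
func covers(cert *x509.Certificate, host string) bool {
	return cert.VerifyHostname(host) == nil
}
```

A certificate issued for a hypothetical `pod-validator.webhook-system.svc` keeps matching no matter which node or IP the pod lands on, while one issued for `10.0.3.17` stops matching as soon as the pod is rescheduled to another address.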
While I focused on resource limits, you can extend this to enforce image provenance. If you're interested in hardening your supply chain, you might want to look into Kubernetes Security Auditing: Automating Trivy with Admission Controllers to ensure that only scanned images get through your gate.
Integrating these tools is easier once you have the base Kubebuilder scaffolding established. Just remember that the API server must be able to reach your webhook service; if you’re running in a private VPC, ensure your MutatingWebhookConfiguration or ValidatingWebhookConfiguration has the correct service.namespace and service.name defined.
- Set failurePolicy: Fail for strict enforcement, but realize that if your webhook pod dies, you've effectively locked yourself out of creating new resources. Keep your deployment highly available.
- Use a namespaceSelector to ignore the kube-system namespace. If you don't, you might accidentally block essential system components during a cluster upgrade, which is a headache you don't want.

Q: Why use Kubebuilder instead of writing the server from scratch?
A: Kubebuilder handles the complex AdmissionReview JSON marshaling and provides a robust testing framework (envtest) that saves hours of debugging.
Q: Can I use this for cross-cluster policy enforcement?
A: No. Admission controllers are local to the cluster where they are registered. For multi-cluster, look into OPA Gatekeeper or Kyverno.
Q: How do I debug the webhook if it keeps failing?
A: Check the API server logs. If the webhook is unreachable, the API server will explicitly log the connection error, usually pointing to a misconfigured CA bundle or a service mismatch.
Developing these webhooks is a massive step toward "Policy as Code," but it adds a new layer of operational complexity. I'm still not entirely convinced that I've optimized the envtest suite—it currently takes longer to run than I'd like, and I suspect there's a way to mock the API server more efficiently. If you're starting out, keep your validation logic simple. It's better to have a webhook that checks one thing reliably than a complex one that fails silently under load.