Kubernetes | Infrastructure | Networking | June 19, 2026 | 3 min read

Implementing Kubernetes NodeLocal DNSCache for Lower DNS Latency

Learn how to implement Kubernetes NodeLocal DNSCache to slash DNS latency, reduce CoreDNS load, and improve overall cluster performance in production.

Tags: Kubernetes, DevOps, DNS, Networking, Performance, CoreDNS

During a routine performance review of our high-traffic microservices cluster, we noticed an alarming trend: 14% of our external API requests were failing with 504 Gateway Timeout errors. After digging through the traces, we realized the bottleneck wasn't the application code or the database; it was the DNS resolution time. Requests were waiting for up to 800ms just to resolve internal service names, a direct side effect of the default CoreDNS architecture where every pod hits a centralized service IP.

Why Kubernetes NodeLocal DNSCache matters

In a default cluster, every DNS query from a pod has to travel through the network stack to reach the CoreDNS service. This introduces significant network overhead and contention, especially as your pod density grows. By implementing Kubernetes NodeLocal DNSCache, you move the DNS cache directly onto the worker node. This changes the lookup path from a network-traversed service call to a local loopback request, effectively cutting down DNS latency from double-digit milliseconds to under 2ms in most of our p99 measurements.

If you’re managing your nodes with tools like Karpenter, you’ll find that adding a DaemonSet for local caching is a trivial but high-impact configuration. It offloads the central CoreDNS pods, which often become the bottleneck during traffic spikes.

Our failed attempt at optimization

Before settling on NodeLocal DNSCache, we tried scaling CoreDNS horizontally by increasing the replica count from 2 to 10. That was a mistake. We saw a temporary improvement, but the increased pod-to-pod communication overhead just shifted the congestion from the DNS pods to the kube-proxy iptables rules. Our latency jitter actually worsened, jumping from a stable 15ms to an unpredictable range of 5ms to 120ms.

We then pivoted to NodeLocal DNSCache. The implementation involves running a cache agent as a DaemonSet on each node. It listens on a link-local IP on the node itself (usually 169.254.20.10), and we configure our pods to use this IP as their nameserver.
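
The nameserver setting can also be expressed per pod, which is handy for trying the cache on a single workload once the agent is running on its node. What follows is a minimal sketch, not something taken from our rollout: the pod name and the busybox image are assumptions chosen for illustration, and the lookup uses a fully qualified name because no search domains are configured here.

YAML
apiVersion: v1
kind: Pod
metadata:
  name: dns-canary            # hypothetical name
spec:
  restartPolicy: Never
  dnsPolicy: "None"           # ignore the cluster default and use dnsConfig below
  dnsConfig:
    nameservers:
      - 169.254.20.10         # the node-local cache address
  containers:
  - name: lookup
    image: busybox:1.36       # assumed image; any image with nslookup works
    command: ["nslookup", "kubernetes.default.svc.cluster.local"]

If the pod completes and its log shows an answer coming from 169.254.20.10, the agent on that node is serving queries.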

Here is the basic configuration we applied to the NodeLocalDNS manifest:


YAML
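# Excerpt: only the fields relevant to the cache agent are shown.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-local-dns
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: node-local-dns
  template:
    metadata:
      labels:
        k8s-app: node-local-dns
    spec:
      hostNetwork: true      # the agent binds the link-local IP on the node itself
      dnsPolicy: Default     # the agent resolves via the node's resolv.conf
      containers:
      - name: node-cache
        image: registry.k8s.io/dns/k8s-dns-node-cache:1.22.20
        args:
          - -localip
          - 169.254.20.10
          - -conf
          - /etc/coredns/Corefile
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]   # needed to set up the local interface and iptables rules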

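Once we applied this, we had to update our kubelet configuration to point to this local IP. If you don't update the clusterDNS field in the kubelet configuration file (or the equivalent --cluster-dns flag), your pods will keep querying the old, congested service IP. Pods that are already running keep the resolv.conf they were created with, so they have to be restarted before they pick up the new nameserver.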
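
For reference, the kubelet side of the change is small. This is a sketch of the relevant fragment of a KubeletConfiguration file, assuming the default cluster.local cluster domain; a real file carries many more fields, and how it reaches the node depends on your provisioning tooling.

YAML
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
  - 169.254.20.10              # written into each new pod's resolv.conf as its nameserver
clusterDomain: cluster.local   # assumption: the default cluster domain

The kubelet has to be restarted to read the new value, and only pods created afterwards receive it.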

Measuring cluster performance gains

The results were immediate. We saw a 38% reduction in total DNS-related CPU usage across the cluster. More importantly, the intermittent 504 errors vanished. While we still have to manage node lifecycles with Kubernetes Cluster API, the DaemonSet brings the cache agent back on every node after a reboot or replacement. The cache itself starts cold, but the DNS resolution layer stays stable.

One thing to watch out for: if you have complex rewrite rules in your primary CoreDNS config, ensure they are mirrored in the local cache config. We spent half a day debugging why some internal service lookups were failing post-migration because we forgot to propagate a custom search domain setting to the local cache.
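
To make the mirroring concrete, here is a hedged sketch of a local cache config that sends a custom zone to CoreDNS so the rules defined there still apply. The zone name corp.example and the address 10.96.0.10 are hypothetical, and the ConfigMap key assumes the file is mounted at the /etc/coredns/Corefile path used in the DaemonSet args above. As far as we can tell from the upstream sample manifest, names outside the cluster domain are otherwise forwarded to the node's upstream resolvers and never reach CoreDNS, which is how custom rules get bypassed.

YAML
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-local-dns
  namespace: kube-system
data:
  Corefile: |
    # Hypothetical zone that the primary CoreDNS handles with custom rules.
    corp.example:53 {
        errors
        cache 30
        bind 169.254.20.10
        # Assumed ClusterIP of a Service in front of the CoreDNS pods
        # (the upstream sample manifest creates one named kube-dns-upstream).
        forward . 10.96.0.10 {
            force_tcp
        }
    }

The server blocks for the cluster domain and the catch-all zone are omitted here; this block is only the extra route for the custom zone.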

FAQ

Q: Does NodeLocal DNSCache replace CoreDNS? A: No, it acts as a local cache. It still forwards non-cached requests to the upstream CoreDNS pods.

Q: Will this increase memory usage on my nodes? A: Yes, each node will now run an extra pod. In our experience, it consumes about 50MB of RAM per node, which is a negligible trade-off for the latency improvements.

Q: Is it difficult to roll back if things break? A: Not really. Revert the clusterDNS configuration in your kubelet, delete the DaemonSet, and restart the affected pods so they pick up the original nameserver again.

Looking back, I wish we had implemented this sooner instead of chasing replica counts on CoreDNS. It's a classic case of infrastructure architecture outperforming brute-force scaling. I'm still curious if we could squeeze more performance by switching to a different backend for the cache, but for now, this setup is solid.
