Master Kubernetes node provisioning with Karpenter and Bottlerocket. Learn to optimize your cloud native infrastructure for speed, cost, and security.

During a recent sprint, our EKS cluster hit a wall. We were running a large batch processing job that needed 400 pods, but our standard Cluster Autoscaler was lagging behind by roughly 8 minutes. By the time the nodes joined the cluster, the job had already timed out, costing us around $1,200 in wasted compute and a missed SLA. The traditional autoscaler was simply too slow at reconciling node groups, so we moved to Karpenter to handle our dynamic workloads (see Kubernetes Autoscaling: Karpenter vs Cluster Autoscaler Guide).
The move to Karpenter wasn't just about speed; it was about granular control. Karpenter doesn't rely on pre-defined node groups. Instead, it observes the aggregate resource requests of unschedulable pods and calls the EC2 Fleet API directly to launch capacity that fits them. To secure the underlying OS, we paired it with Bottlerocket, an AWS-provided, Linux-based OS purpose-built for hosting containers.
We first attempted to use standard Amazon Linux 2 AMIs with Karpenter. That broke down because our security team required strict CIS benchmarks, and managing those configurations across thousands of ephemeral nodes became a configuration drift nightmare. Switching to Bottlerocket simplified this: its root filesystem is read-only and it has no traditional package manager, so we enforce security at the Kubernetes layer instead (see Kubernetes Security: Implementing Zero-Trust with Kyverno and Policies).
To get started, you need to define an EC2NodeClass and a NodePool. Here is how we configured our initial setup:
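```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: Bottlerocket
  role: "KarpenterNodeRole"
  # Subnets are discovered by tag.
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  # v1beta1 also requires security group selectors; reusing the same
  # discovery tag here is an assumption, not part of the original setup.
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        # Allow both Spot and On-Demand capacity.
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot", "on-demand"]
```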
Once we applied these manifests, the difference was immediate. Pods that previously waited nearly 10 minutes for capacity were now scheduling in under 90 seconds.
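Because Karpenter sizes nodes from the resource requests of pending pods, the workload side matters as much as the NodePool. A minimal sketch of a batch Job like ours might look like the following; the name, image, and request values are hypothetical placeholders, and only the 400-pod figure comes from our setup.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor            # hypothetical name
spec:
  parallelism: 400                 # run all 400 pods at once
  completions: 400
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: registry.example.com/batch-worker:latest   # placeholder image
          resources:
            # Karpenter sums these requests across unschedulable pods
            # to decide which instance types to launch.
            requests:
              cpu: "500m"          # illustrative value
              memory: 1Gi          # illustrative value
```

Pods without requests give Karpenter nothing to size against, so explicit requests are worth setting on anything it is expected to scale for.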
Bottlerocket has been a big win for our cloud native infrastructure. Because it's stripped down, the attack surface is significantly smaller. When we need to perform updates, we don't patch in place; we roll out a new node version and drain the old ones. If you're looking for further isolation, you can pair this with a sandboxed runtime (see Kubernetes Security: Hardening Runtimes with gVisor and Kata) so that a compromised container has a much harder time reaching the host.
We encountered one significant hurdle during implementation: our logging agent (Fluent Bit) required specific kernel parameters that Bottlerocket didn't set by default. We had to supply custom user data to inject these settings during the node bootstrap phase; Bottlerocket takes user data as TOML settings rather than as a shell script. It wasn't the clean "plug-and-play" experience I expected, but it forced us to be more deliberate about our node configurations.
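As a rough sketch of what that looks like, the EC2NodeClass accepts a userData field, and for Bottlerocket its contents are TOML settings. The sysctl keys and values below are hypothetical examples (inotify limits are a common need for log tailers); they are not the exact parameters we set, and the security group selector is assumed to reuse the discovery tag.

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: Bottlerocket
  role: "KarpenterNodeRole"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  securityGroupSelectorTerms:      # assumed to share the discovery tag
    - tags:
        karpenter.sh/discovery: "my-cluster"
  # Bottlerocket user data is TOML, merged into the node's settings at boot.
  userData: |
    [settings.kernel.sysctl]
    # Hypothetical values; substitute whatever your agent requires.
    "fs.inotify.max_user_instances" = "8192"
    "fs.inotify.max_user_watches" = "524288"
```

Keeping these settings in the EC2NodeClass means every node Karpenter launches picks them up, with no per-node drift.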

Q: Is Karpenter compatible with standard EKS managed node groups?
A: Yes, you can run both simultaneously. We use managed node groups for our core cluster services (the EKS control plane itself is managed by AWS and does not run on our nodes), and Karpenter for our bursty, ephemeral workloads.
Q: Does Bottlerocket require special management tools?
A: Not strictly. It works with standard Kubernetes APIs, but for host-level tasks you should go through the Bottlerocket API if you really need to drill down into the node.
Q: What happens if Karpenter fails to provision a node?
A: Karpenter reports provisioning errors in the controller pod's logs. We've also set up Prometheus alerts to notify us if the karpenter_provisioner_scheduling_duration_seconds metric exceeds our internal threshold, which usually happens when we hit AWS account service quotas.
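An alert along those lines could be sketched as a Prometheus rule file like the one below. This assumes the metric is exported as a histogram (hence the _bucket suffix); the alert name, the 60-second threshold, and the 10-minute window are hypothetical stand-ins for our internal values.

```yaml
groups:
  - name: karpenter
    rules:
      - alert: KarpenterSchedulingSlow      # hypothetical alert name
        # p95 of the scheduling duration over the last 5 minutes.
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (
              rate(karpenter_provisioner_scheduling_duration_seconds_bucket[5m])
            )
          ) > 60
        for: 10m                            # illustrative hold period
        labels:
          severity: warning
        annotations:
          summary: "Karpenter p95 scheduling duration is above threshold"
```

Pairing this with a log-based alert on the controller pod helps separate slow scheduling from outright launch failures such as quota errors.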

I'm still not entirely convinced that our current TTL (time-to-live) settings for nodes are optimal. We currently terminate underutilized nodes after 30 minutes, but I suspect we're incurring unnecessary churn during periods of low traffic. Next time, I'd like to experiment with a more aggressive consolidation policy, but for now we're focusing on stability. If you're managing complex StatefulSets, remember that Karpenter doesn't magically solve data persistence issues. You'll still need a robust backup solution (see Kubernetes Backup Strategies: Implementing Velero and MinIO) to handle your volume snapshots before the nodes disappear.
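For reference, these knobs live in the NodePool's disruption block. The sketch below is one way to express a 30-minute TTL in the v1beta1 API, where consolidateAfter applies to empty nodes; mapping our "underutilized after 30 minutes" behavior onto WhenEmpty is an assumption, and the expireAfter value is illustrative.

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot", "on-demand"]
  disruption:
    # Conservative: remove a node only once it has been empty for 30 minutes.
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30m
    # More aggressive alternative to experiment with (in v1beta1, drop
    # consolidateAfter when using it):
    # consolidationPolicy: WhenUnderutilized
    # Recycle nodes after a fixed lifetime regardless of utilization.
    expireAfter: 720h              # illustrative value
```

The trade-off is the one described above: WhenUnderutilized packs workloads tighter and lowers cost, at the price of more pod evictions.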
Autoscaling Kubernetes with Karpenter and Bottlerocket has fundamentally changed how we view our cloud bill. We no longer over-provision for peak capacity; we provision for reality. It's a tighter loop, a smaller footprint, and frankly, a lot less headache during on-call rotations.