Master KEDA for Kubernetes event-driven autoscaling using Prometheus metrics. Learn to scale beyond CPU limits to handle real-world traffic spikes efficiently.

Last Tuesday, our primary API service hit a wall. We were relying on the standard Kubernetes Horizontal Pod Autoscaler (HPA) using CPU utilization as our primary metric, but the incoming traffic pattern was bursty—by the time the CPU spiked and the HPA triggered a scale-out, the latency had already jumped from 40ms to over 2 seconds. We were losing requests while the pods were still initializing.
That was the moment I decided to stop relying on resource-based metrics for our mission-critical services. I moved us to KEDA (Kubernetes Event-Driven Autoscaling) to trigger scaling based on actual request depth in our queue, rather than waiting for the CPU to catch up.
The standard Kubernetes Horizontal Pod Autoscaler is fine for steady-state workloads, but it’s inherently reactive. It tracks what’s happening inside the pod. If your service is I/O bound or waiting on external dependencies, CPU usage stays low even while your request queue explodes.
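When you implement KEDA, you decouple your scaling logic from the pods themselves. KEDA sits on top of the HPA and acts as an external metrics adapter. It polls your metrics backend (in our case, Prometheus), exposes the result through the Kubernetes external metrics API, and the HPA then calculates how many replicas you need from those real-time business metrics. KEDA itself only handles the step between zero and one replica.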
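To make "sits on top of the HPA" concrete, here is a simplified sketch of the kind of `autoscaling/v2` HorizontalPodAutoscaler that KEDA creates and reconciles for a ScaledObject. You never apply this yourself. The object name follows KEDA's `keda-hpa-<ScaledObject name>` convention; the `order-processor` Deployment and the external metric name are hypothetical, and the exact fields KEDA emits vary by version.

```yaml
# Simplified sketch of an HPA managed by KEDA (do not apply by hand).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keda-hpa-order-processor-scaler   # keda-hpa-<ScaledObject name>
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor                  # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External                       # value served by KEDA's metrics API server
      external:
        metric:
          name: s0-prometheus              # hypothetical; KEDA generates the real name
        target:
          type: AverageValue               # per-replica target
          averageValue: "50"
```

The HPA reads the metric value from KEDA and does the replica math itself, which is why KEDA complements the HPA rather than replacing it.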
I’ve found that using Event-Driven Autoscaling saves us roughly 20% in cloud compute costs because we can scale down to zero during off-peak hours, something the standard HPA can't do out of the box (it requires enabling the HPAScaleToZero feature gate). If you're looking to optimize your infrastructure further, I previously wrote about Kubernetes Autoscaling: Karpenter vs Cluster Autoscaler Guide to help you manage the underlying node lifecycle.
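Scale-to-zero is configured per ScaledObject. The fragment below is a minimal sketch, assuming KEDA 2.8 or later (for the Prometheus scaler's `activationThreshold`) and the same kind of Prometheus trigger used for our order-processing service; the values are illustrative, not tuned.

```yaml
# Hypothetical ScaledObject fragment for a workload that may scale to zero.
spec:
  minReplicaCount: 0          # allow KEDA to scale the workload to zero replicas
  cooldownPeriod: 300         # seconds without activity before scaling to zero (KEDA default)
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server.monitoring.svc.cluster.local
        query: sum(rate(orders_in_queue[2m]))
        threshold: '50'
        activationThreshold: '0'   # wake from zero once the query result exceeds this value
```

Below one replica the HPA is not involved: KEDA deactivates the workload after `cooldownPeriod` and reactivates it when the query rises above `activationThreshold`, then hands control back to the HPA.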
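To get started, install KEDA in your cluster. You only need the Kubernetes Metrics Server if you also plan to use KEDA's CPU or memory triggers; for Prometheus-based triggers, KEDA serves the metrics to the HPA through its own metrics API server.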
Here is the ScaledObject manifest I use to scale our order-processing service based on a Prometheus query that tracks pending messages:
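```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: order-processor          # Deployment in the same namespace
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server.monitoring.svc.cluster.local
        metricName: pending_orders
        threshold: '50'            # per-replica target for the query result
        # rate() expects a counter; if orders_in_queue is a queue-depth gauge,
        # use sum(orders_in_queue) instead.
        query: sum(rate(orders_in_queue[2m]))
```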
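In this example, KEDA checks the query every 30 seconds (its default polling interval). The threshold of 50 is a per-replica target: the HPA sizes the deployment so that each pod handles roughly 50 orders per second, adding pods as the rate climbs, up to the maxReplicaCount of 10. This reacts to load as it arrives instead of waiting for the CPU to hit 80%.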
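Because the threshold is an average-per-replica target, the HPA's replica calculation reduces to dividing the query result by the threshold, rounding up, and clamping to the configured bounds. A worked example with a hypothetical query result of 120 orders per second:

```latex
\text{desiredReplicas} = \min\!\left(\text{maxReplicaCount},\ \max\!\left(\text{minReplicaCount},\ \left\lceil \frac{\text{queryValue}}{\text{threshold}} \right\rceil\right)\right)

\left\lceil \frac{120}{50} \right\rceil = \lceil 2.4 \rceil = 3 \quad\Rightarrow\quad 3 \text{ replicas, which lies within } [1, 10]
```

In practice the HPA also applies a default 10% tolerance, so small deviations around the target do not trigger a change.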

You might ask, "Why not just use the Prometheus Adapter?" While prometheus-adapter works, managing its rule configuration through ConfigMaps becomes a nightmare at scale. KEDA provides a declarative, CRD-based approach where each ScaledObject lives next to the workload it scales, which fits perfectly into our GitOps workflow.
I paired this implementation with OpenTelemetry in Kubernetes: End‑to‑End Tracing, Metrics & Logging to ensure that when we scale, we can still trace requests across the new pods effectively. Without that observability, you’re just scaling blind.
One trade-off I accepted: complexity. You now have an extra layer (KEDA) that can fail, so you need to monitor the KEDA operator itself. If it goes down, your autoscaling freezes. I mitigate this by setting strict resource requests on the KEDA operator pods and running multiple operator replicas for high availability (KEDA uses leader election, so one replica is active while the others stand by).
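For the monitoring half of that mitigation, a Prometheus alerting rule is a cheap safety net. This is a minimal sketch in the standard Prometheus rule-file format, assuming you already scrape the KEDA operator's metrics endpoint under a job named `keda-operator` (a hypothetical name; match it to your scrape config).

```yaml
# Prometheus rule file (sketch). Job name "keda-operator" is an assumption.
groups:
  - name: keda-health
    rules:
      - alert: KedaOperatorDown
        # Fires when no keda-operator target has reported up == 1 for 5 minutes.
        expr: absent(up{job="keda-operator"} == 1)
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "KEDA operator is down; event-driven autoscaling is frozen"
```

Using `absent(... == 1)` also catches the case where the target disappears from service discovery entirely, not just when it reports `up == 0`.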
Moving to Event-Driven Autoscaling isn't just about "better" scaling—it's about aligning your infrastructure with your actual business load. We reduced our P99 latency by 65% by scaling based on queue depth rather than CPU.
If you're already using Prometheus to track your service health, KEDA is the logical next step. Just remember: keep your scaling thresholds conservative at first to avoid "flapping," where the HPA scales up and down too aggressively. Start with a 5-minute scale-down stabilization window and shorten it as you gain confidence in your metrics. Note that KEDA's own cooldownPeriod only governs the final step down to zero replicas.
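KEDA passes standard HPA scaling behavior through `advanced.horizontalPodAutoscalerConfig.behavior` on the ScaledObject. A sketch of a conservative scale-down policy; the 50%-per-minute limit is an illustrative assumption, not a recommendation from our setup.

```yaml
# ScaledObject fragment: damp scale-down to reduce flapping.
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # use the highest recommendation from the last 5 minutes
          policies:
            - type: Percent
              value: 50                     # remove at most half of the current pods...
              periodSeconds: 60             # ...per 60-second period
```

Scale-up is left at the HPA defaults here, so bursts are still absorbed quickly while scale-down waits out short dips.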

Q: Does KEDA replace the Kubernetes Horizontal Pod Autoscaler? A: No. KEDA acts as a bridge. It feeds external metrics into the HPA, which then performs the actual pod scaling; KEDA only handles scaling between zero and one replica itself.
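Q: Can KEDA scale to zero? A: Yes. This is one of its best features. If you set minReplicaCount to 0 and your Prometheus query stays at or below the activation threshold (0 by default) for the cooldownPeriod (300 seconds by default), KEDA scales your deployment to 0 replicas, which is perfect for dev environments or asynchronous background workers.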
Q: Do I need the Kubernetes Metrics Server if I use KEDA? A: Only for CPU or memory triggers. KEDA serves Prometheus and other event-source metrics to the HPA through its own metrics API server; the Metrics Server supplies the CPU and memory figures that resource-based scaling depends on.