Master Kubernetes Canary Deployments using Flagger and Istio. Learn how to automate traffic shifting, run health checks, and achieve safer progressive delivery.
I’ve spent countless hours dealing with the "deploy and pray" method. You push a change, hold your breath, and watch the error rates climb. It’s stressful, and frankly, it’s unnecessary. If you’re running on Kubernetes, you have the tools to make this painless.
In this post, I’ll show you how to implement Kubernetes Canary Deployments using Flagger and Istio Service Mesh. We’ll move away from manual releases and toward automated progressive delivery.
Istio handles the heavy lifting of traffic routing at the network level, but managing those weight shifts manually is a recipe for disaster. That’s where Flagger comes in. It acts as an operator that watches your deployments, automates the canary analysis, and shifts traffic based on real-time metrics from Prometheus.
By combining these, you get automated traffic shifting, health checks driven by live metrics, and safer progressive delivery.
Before we touch the code, understand the flow. You define a "Canary" custom resource that points at your Deployment, and Flagger creates a primary copy of that Deployment to serve live traffic. When you update your deployment image, Flagger detects the change and starts shifting traffic from your primary service to the canary version via Istio’s VirtualService.
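To make the traffic shifting concrete, here is a rough sketch of what the Istio VirtualService might look like partway through a rollout. It assumes a workload named backend-service in a prod namespace and a 90/10 split; Flagger generates and owns this object, so the exact fields and API version can differ by release, and you would not write or edit it by hand.

```yaml
# Illustrative only: Flagger generates and updates this object itself.
# Names, weights, and apiVersion are assumptions for the sketch.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: backend-service
  namespace: prod
spec:
  gateways:
    - public-gateway.istio-system.svc.cluster.local
  hosts:
    - api.example.com
  http:
    - route:
        # Stable version keeps most of the traffic.
        - destination:
            host: backend-service-primary
          weight: 90
        # New version receives a small, growing share.
        - destination:
            host: backend-service-canary
          weight: 10
```

On each successful analysis step Flagger raises the canary weight and lowers the primary weight by the same amount.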
First, ensure you have Istio (1.18+) and Flagger (1.30+) installed in your cluster. I’m assuming you have Prometheus running, as it’s the source of truth for your health checks.
Define your Canary resource like this:
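```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: backend-service
  namespace: prod
spec:
  # The Deployment that Flagger watches for changes
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-service
  service:
    port: 80
    gateways:
      - public-gateway.istio-system.svc.cluster.local
    hosts:
      - api.example.com
  analysis:
    interval: 1m    # how often the checks run
    threshold: 5    # failed checks allowed before rollback
    maxWeight: 50   # highest canary traffic percentage during analysis
    stepWeight: 10  # percentage added per successful check
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99   # percent of requests that must succeed
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500  # latency ceiling in milliseconds
        interval: 1m
```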
Once you apply this, the workflow changes. You stop managing the rollout by hand. Instead, you update the image tag in your Deployment manifest and apply it as usual, and Flagger takes the wheel.
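As a minimal sketch of the only thing you touch for a release, here is a Deployment that the Canary's targetRef could point at. The registry path, tag, labels, and container name are hypothetical, and the container port is assumed to be 80 so that it lines up with service.port in the Canary.

```yaml
# Hypothetical Deployment matching the Canary's targetRef.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-service
  namespace: prod
spec:
  selector:
    matchLabels:
      app: backend-service
  template:
    metadata:
      labels:
        app: backend-service
    spec:
      containers:
        - name: backend
          # Changing this tag is what triggers a new canary analysis.
          image: registry.example.com/backend-service:1.4.1
          ports:
            - containerPort: 80
```

Bumping the image tag and applying the manifest is the whole release step; the traffic weights are never set by hand.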
When you change the image, Flagger:
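- Keeps the stable version running as backend-service-primary and brings up the new version behind backend-service-canary.
- Updates the VirtualService to route the first 10% of traffic (one stepWeight) to the canary.

I’ve learned a few hard lessons implementing this in production.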
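Don't ignore the feedback loop. If your analysis window is too short, you’ll promote buggy code. If it’s too long, your deployments will take forever. I usually stick to a 1-minute interval for 5-10 iterations. With a stepWeight of 10 and a maxWeight of 50, that works out to five steps, so a clean rollout needs at least five minutes of analysis before promotion. It’s the "Goldilocks" zone for most microservices.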
Watch your Prometheus queries. Flagger uses specific Prometheus queries for success rates. Ensure your pods have the Istio sidecar injected, because the sidecar (not your application code) exports the standard Istio metrics such as istio_requests_total. If those metrics aren't reaching Prometheus, or the canary receives no traffic, Flagger has nothing to evaluate and the analysis stalls.
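If the built-in checks don't fit, Flagger also accepts custom queries through a MetricTemplate. The sketch below computes a 5xx error percentage from istio_requests_total; the template name, namespace, Prometheus address, and the 1% ceiling are assumptions for illustration, not values from this setup.

```yaml
# Sketch of a custom metric; name, namespace, and address are assumptions.
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: prod
spec:
  provider:
    type: prometheus
    address: http://prometheus.istio-system:9090
  # Flagger fills in namespace, target, and interval at analysis time.
  query: |
    100 - sum(
      rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace="{{ namespace }}",
          destination_workload="{{ target }}",
          response_code!~"5.*"
        }[{{ interval }}]
      )
    )
    /
    sum(
      rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace="{{ namespace }}",
          destination_workload="{{ target }}"
        }[{{ interval }}]
      )
    ) * 100
# Referenced from the Canary under spec.analysis.metrics, for example:
#   - name: error-rate
#     templateRef:
#       name: error-rate
#       namespace: prod
#     thresholdRange:
#       max: 1
#     interval: 1m
```

Running the same query by hand in the Prometheus UI, with the placeholders filled in, is a quick way to confirm the metrics exist before trusting a rollout to them.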
Use Webhooks for smoke tests. You can add webhooks to the analysis section to run automated integration tests during the canary phase. It’s the best way to catch logic errors that metrics alone might miss.
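As a hedged sketch of what that can look like, the fragment below adds two webhooks. It assumes Flagger's load tester is deployed and reachable at flagger-loadtester.test, and that the service exposes a /healthz endpoint; both are assumptions, so adjust them to your cluster.

```yaml
# Fragment: goes under spec.analysis in the Canary resource.
webhooks:
  # Runs once before any traffic is shifted; a failure aborts the rollout.
  - name: smoke-test
    type: pre-rollout
    url: http://flagger-loadtester.test/
    timeout: 30s
    metadata:
      type: bash
      cmd: "curl -sf http://backend-service-canary.prod/healthz"
  # Generates traffic during analysis so the metric checks have data.
  - name: load-test
    type: rollout
    url: http://flagger-loadtester.test/
    timeout: 5s
    metadata:
      cmd: "hey -z 1m -q 10 -c 2 http://backend-service-canary.prod/"
```

The load-test hook also helps on low-traffic services, where the canary might otherwise see too few requests to produce meaningful metrics.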
Implementing Kubernetes Canary Deployments isn't just about the technology; it's about shifting your mindset. You stop fearing the release because you've automated the safety net. With Flagger and Istio, you can sleep better knowing the system is watching your error rates for you.
Start small. Apply this to a non-critical service first. Once you see the traffic shifting in the Istio dashboard, you'll never want to go back to manual updates again.