Learn how to automate canary deployments using Flagger and Istio. Follow a step‑by‑step guide with real‑world examples, CI/CD integration, and progressive delivery best practices.
If you’ve ever rolled a new version to production and then spent the night watching logs, you know why canary deployments matter. In this post I’ll show you how to let Flagger and Istio do the heavy lifting, turning a risky push into a painless, automated rollout.
TL;DR: Install Istio 1.18 and Flagger 1.33, configure a `Canary` custom resource, and let the controller shift traffic based on Prometheus metrics.
| Feature | Flagger | Istio |
|---|---|---|
| Automated traffic shifting | ✅ | ✅ (via VirtualService) |
| Metric‑driven analysis | ✅ (Prometheus, Datadog, etc.) | ✅ (Telemetry) |
| Rollback on failure | ✅ | ✅ |
| Multi‑cluster support | ✅ | ✅ |
Together they give you progressive delivery without writing custom scripts. Flagger watches the Deployment that your Helm chart (or Kustomize overlay) updates, queries Prometheus for the SLO you define, and tells Istio how much traffic each version should receive. If the canary misbehaves, Flagger rolls it back automatically.
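To make that control loop more concrete, here is a rough shell model of the decision taken on each analysis interval. It is a simplified sketch, not Flagger's actual code: the function name `next_step` and its argument order are hypothetical, and the step and maximum weights stand in for Flagger's `stepWeight` and `maxWeight` settings.

```bash
#!/usr/bin/env bash
# Simplified model of one analysis round (NOT Flagger's real implementation).
# Args: current canary weight, step weight, max weight,
#       check result ("pass" or "fail"), failed checks so far, failure threshold.
# Prints: "<action> <new_weight> <failed_checks>"
next_step() {
  local weight=$1 step=$2 max=$3 result=$4 failed=$5 threshold=$6

  if [[ "$result" == "fail" ]]; then
    failed=$((failed + 1))
    if (( failed >= threshold )); then
      echo "rollback 0 $failed"     # too many failed checks: all traffic back to primary
    else
      echo "hold $weight $failed"   # keep the current weight and retry next interval
    fi
    return 0
  fi

  if (( weight >= max )); then
    echo "promote 0 $failed"        # analysis finished: the new version becomes primary
  else
    local next=$((weight + step))
    if (( next > max )); then next=$max; fi
    echo "advance $next $failed"    # metrics look healthy: send more traffic to the canary
  fi
}

next_step 10 5 50 pass 0 5   # prints "advance 15 0"
next_step 20 5 50 fail 4 5   # prints "rollback 0 5"
```

The point of the model is that a single bad sample does not trigger a rollback; only reaching the failure threshold does.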
| Item | Version |
|---|---|
| Kubernetes cluster | 1.27+ |
| kubectl | 1.27+ |
| Istio | 1.18.0 (or later) |
| Flagger | 1.33.0 (or later) |
| Helm | 3.12.0 (optional, for chart install) |
| Prometheus | 2.44 (installed from the addon manifests in Istio's `samples/addons`) |
You need cluster admin rights to install Istio and Flagger. I run everything on a GKE standard cluster (3‑node, n1‑standard‑4) but any conformant K8s works.
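Using the `istioctl` CLI:

```bash
# Download the matching version and put istioctl on the PATH
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.18.0 sh -
cd istio-1.18.0
export PATH="$PWD/bin:$PATH"

# Install the default profile (istiod and the ingress gateway)
istioctl install --set profile=default -y

# Prometheus is not part of the default profile; install the bundled addon
kubectl apply -f samples/addons/prometheus.yaml

# Verify
kubectl get pods -n istio-system
```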
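The default profile installs istiod and the ingress gateway, along with automatic Envoy sidecar injection. Prometheus and Grafana come from the addon manifests under `samples/addons` in the Istio release. If you already have Istio, just make sure the `istio-ingressgateway` and `istiod` pods are running.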
Flagger runs as a single deployment that watches your canary resources. Install it with Helm:
```bash
helm repo add flagger https://flagger.app
helm repo update

# --version pins the chart release; the chart has no "version" value to --set
helm upgrade -i flagger flagger/flagger \
  --namespace=istio-system \
  --version 1.33.0 \
  --set meshProvider=istio \
  --set metricsServer=http://prometheus.istio-system:9090
```
Note: The `meshProvider=istio` flag tells Flagger to generate Istio `VirtualService` objects instead of NGINX or Contour configs.
Check the pods:
```bash
kubectl get pods -n istio-system -l app=flagger
```
You should see one flagger pod in Running state.
Let’s use a simple sockshop demo that ships a frontend service. Deploy it with Helm:
```bash
helm repo add flagger-demo https://flagger.app
helm upgrade -i sockshop flagger-demo/sockshop \
  --namespace=default \
  --set image.tag=v0.1.0
```
Istio automatically injects sidecars because the namespace has the istio-injection=enabled label (the demo chart sets it). Verify:
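```bash
kubectl get pods -l app=frontend -o jsonpath='{.items[*].spec.containers[*].name}'
```

You’ll see `istio-proxy` listed next to the application container. (The `istio-injection=enabled` label itself lives on the namespace, not on the pods.)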
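To turn the sidecar check into something a script can assert on, one option is to look for the `istio-proxy` container name in each pod. The helper below is a minimal sketch; `has_sidecar` is a hypothetical name, and it only inspects a space-separated list of container names that you pass in.

```bash
#!/usr/bin/env bash
# has_sidecar "NAME NAME ...": succeeds if the Envoy sidecar container
# (named istio-proxy) appears in the given list of container names.
has_sidecar() {
  local name
  for name in $1; do
    if [[ "$name" == "istio-proxy" ]]; then
      return 0
    fi
  done
  return 1
}

# Usage against a live cluster (assumes the app=frontend label from the demo):
#   names=$(kubectl get pods -l app=frontend \
#     -o jsonpath='{.items[0].spec.containers[*].name}')
#   has_sidecar "$names" && echo "sidecar injected" || echo "no sidecar"
```

If the helper reports no sidecar, label the namespace and restart the pods so the injector gets a chance to run.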
The heart of the automation lives in a Canary custom resource. Here’s a minimal example that rolls out a new frontend image version and validates the 99th‑percentile latency stays under 250 ms.
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: frontend
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  service:
    port: 80
    targetPort: 8080
  analysis:
    interval: 30s    # run the checks every 30 seconds
    threshold: 5     # roll back after 5 failed checks
    maxWeight: 50    # shift traffic up to 50% ...
    stepWeight: 5    # ... in 5% steps (ten analysis rounds)
    metrics:
      - name: request-success-rate
        threshold: 99
        query: |
          sum(rate(istio_requests_total{destination_workload="frontend",response_code=~"2.*"}[1m]))
          /
          sum(rate(istio_requests_total{destination_workload="frontend"}[1m]))
          * 100
      - name: request-duration-p99
        threshold: 250
        # Istio reports request duration in milliseconds
        query: |
          histogram_quantile(0.99,
            sum(rate(istio_request_duration_milliseconds_bucket{destination_workload="frontend"}[1m])) by (le))
```
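The success-rate query is plain arithmetic: successful requests divided by all requests, times 100. The sketch below reproduces that calculation in shell so you can sanity-check a threshold against numbers you expect in production. The function names `success_rate` and `meets_min` are hypothetical, and the request counts are made-up inputs, not measurements.

```bash
#!/usr/bin/env bash
# success_rate OK TOTAL: prints the percentage of successful requests,
# or "NaN" when there was no traffic (PromQL also yields NaN for 0/0).
success_rate() {
  awk -v ok="$1" -v total="$2" 'BEGIN {
    if (total == 0) { print "NaN"; exit }
    printf "%.2f\n", ok / total * 100
  }'
}

# meets_min VALUE MIN: succeeds only for a numeric VALUE that is >= MIN.
meets_min() {
  if [[ -z "$1" || "$1" == "NaN" ]]; then
    return 1
  fi
  awk -v v="$1" -v min="$2" 'BEGIN { exit !(v >= min) }'
}

rate=$(success_rate 995 1000)
echo "$rate"                          # prints 99.50
meets_min "$rate" 99 && echo "check passed"
```

Note the no-traffic case: with zero requests the ratio is undefined, so a canary that receives no load cannot pass this check. Keep some traffic flowing during analysis.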
Save this as frontend-canary.yaml and apply:
Bashkubectl apply -f frontend-canary.yaml
What happens next?

- Flagger generates `frontend-primary` (the stable version) and `frontend-canary`.
- Until a new revision appears, all traffic goes to `frontend-primary`.

Update the image tag to simulate a new version:
```bash
helm upgrade sockshop flagger-demo/sockshop \
  --namespace=default \
  --set image.tag=v0.2.0
```
Flagger detects the change (it watches the frontend Deployment), spins up the canary pods, and starts the traffic shift automatically. You can watch the progress:
```bash
kubectl get canary frontend -n default -w
```
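You’ll see columns such as STATUS and WEIGHT. While the analysis runs, STATUS reads Progressing and WEIGHT shows the share of traffic going to the canary. When the rollout finishes, STATUS reads Succeeded and WEIGHT drops back to 0, because the new version has been promoted to primary and takes all the traffic from there.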
Istio ships a pre‑configured Grafana dashboard at http://<gateway>/grafana. Import the Flagger Canary dashboard (ID 12407) to see a live view of the canary analysis.
These visual cues help you convince stakeholders that the canary is behaving.
In a real‑world setup you’ll want your CI system (GitHub Actions, GitLab CI, Jenkins) to push the new image and then wait for Flagger to finish. Here’s a concise GitHub Actions snippet:
```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up kubectl
        uses: azure/setup-kubectl@v3
        with:
          version: v1.27.0  # use a full release tag
      - name: Deploy image
        run: |
          helm upgrade sockshop ./chart \
            --namespace=default \
            --set image.tag=${{ github.sha }}
      - name: Wait for canary
        run: |
          # Poll the Canary phase until Flagger reports a terminal state
          while true; do
            STATUS=$(kubectl get canary frontend -n default -o jsonpath='{.status.phase}')
            if [[ "$STATUS" == "Succeeded" ]]; then break; fi
            if [[ "$STATUS" == "Failed" ]]; then exit 1; fi
            sleep 15
          done
```
The loop polls the Canary resource until it reaches Succeeded or Failed. If it fails, the job exits with a non‑zero code, automatically failing the pipeline.
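One thing that loop lacks is an upper bound: if the Canary never reaches a terminal phase, the job runs until the CI system kills it. A bounded variant could look like the sketch below. The names `wait_for_canary` and `canary_phase` are hypothetical, and the attempt count and delay are arbitrary values you would tune to your analysis interval.

```bash
#!/usr/bin/env bash
# wait_for_canary GET_PHASE MAX_ATTEMPTS DELAY_SECONDS
# GET_PHASE is the name of a command or function that prints the current phase.
# Returns 0 on Succeeded, 1 on Failed, 2 if no terminal phase was seen in time.
wait_for_canary() {
  local get_phase=$1 max_attempts=$2 delay=$3 phase attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    phase=$("$get_phase")
    case "$phase" in
      Succeeded) return 0 ;;
      Failed)    return 1 ;;
    esac
    sleep "$delay"
  done
  return 2
}

# Phase lookup for the Canary used in this post
canary_phase() {
  kubectl get canary frontend -n default -o jsonpath='{.status.phase}'
}

# In the workflow step: 80 attempts x 15 s gives roughly a 20-minute ceiling
#   wait_for_canary canary_phase 80 15
```

Passing the phase lookup in as a function name keeps the polling logic testable without a cluster, and the distinct exit code for a timeout lets the pipeline tell a stuck rollout apart from a failed one.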
| Issue | What happened | Fix |
|---|---|---|
| High latency spikes during canary rollout | Prometheus query returned NaN because the destination_workload label changed after a Helm chart rename. | Pin the workload label with istio.io/workloadGroup annotation. |
| Rollback never triggered | Metric threshold was too lenient (threshold: 0 by mistake). | Double‑check the Canary manifest; Flagger logs a warning if a metric is mis‑configured. |
| Istio sidecar injection missing | Namespace lacked istio-injection=enabled. | Label the namespace before deploying the app: kubectl label namespace default istio-injection=enabled. |
| Prometheus scrape timeout | Large cluster caused query latency > 5 s, Flagger timed out. | Increase metricsServer.timeout in Flagger Helm values (--set metricsServer.timeout=10s). |
These quirks cost me a few hours, but documenting them saved the next team a lot of headaches.
```bash
istioctl install -y && \
helm upgrade -i flagger flagger/flagger \
  --namespace=istio-system \
  --set meshProvider=istio \
  --set metricsServer=http://prometheus.istio-system:9090 && \
kubectl apply -f frontend-canary.yaml && \
helm upgrade sockshop flagger-demo/sockshop --set image.tag=v0.2.0
```
Run those commands, watch the Canary resource, and you’ve got automated canary deployments with Flagger and Istio.
MeshGateway for cross‑region progressive delivery.

Progressive delivery isn’t a silver bullet, but with Flagger + Istio you get a battle‑tested, production‑ready automation layer for free. Give it a spin, break a few things intentionally, and you’ll quickly see the safety net in action.