MHRubel
© 2026 Mahamudul Hasan Rubel. All rights reserved.

Software Engineering · Technology · June 18, 2026 · 6 min read

Automating Canary Deployments with Flagger & Istio on Kubernetes

Learn how to automate canary deployments using Flagger and Istio. Follow a step‑by‑step guide with real‑world examples, CI/CD integration, and progressive delivery best practices.

Tags: Flagger, Istio, Canary Deployments, Kubernetes, CI/CD, Progressive Delivery, Prometheus, DevOps, LinuxServer

If you’ve ever rolled a new version to production and then spent the night watching logs, you know why canary deployments matter. In this post I’ll show you how to let Flagger and Istio do the heavy lifting, turning a risky push into a painless, automated rollout.

TL;DR – Install Istio 1.18, Flagger 1.33, configure a Canary CRD, and let the controller shift traffic based on Prometheus metrics.


Why Flagger + Istio?

| Feature | Flagger | Istio |
| --- | --- | --- |
| Automated traffic shifting | ✅ | ✅ (via VirtualService) |
| Metric-driven analysis | ✅ (Prometheus, Datadog, etc.) | ✅ (Telemetry) |
| Rollback on failure | ✅ | ✅ |
| Multi-cluster support | ✅ | ✅ |

Together they give you progressive delivery without writing custom scripts. Flagger watches the Deployment that your Helm chart (or Kustomize overlay) manages, queries Prometheus for the metrics you define, and tells Istio how much traffic each version should receive. If the canary misbehaves, Flagger rolls it back automatically.


Prerequisites

| Item | Version |
| --- | --- |
| Kubernetes cluster | 1.27+ |
| kubectl | 1.27+ |
| Istio | 1.18.0 (or later) |
| Flagger | 1.33.0 (or later) |
| Helm | 3.12.0 (optional, for chart install) |
| Prometheus | 2.44 (available as an Istio sample addon) |

You need cluster admin rights to install Istio and Flagger. I run everything on a GKE standard cluster (3‑node, n1‑standard‑4) but any conformant K8s works.
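
A quick preflight check can catch a version mismatch before anything is installed. The sketch below is only an illustration: version_ge is a hypothetical helper name, it assumes GNU sort with the -V option, and the literal versions stand in for whatever your tools report.

```shell
#!/usr/bin/env bash
# version_ge ACTUAL MINIMUM: succeeds when ACTUAL >= MINIMUM.
# Assumes GNU sort -V (version sort); a leading "v" is stripped first.
version_ge() {
  local actual=${1#v} minimum=${2#v}
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}

# Literal values for illustration; in practice pass in the output of
# e.g. `helm version --template '{{.Version}}'`.
version_ge v3.12.0 3.12.0 && echo "helm ok"
version_ge 1.26.4 1.27 || echo "kubernetes too old"
```

Comparing through sort -V avoids hand-written parsing of the major, minor, and patch fields.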


Step 1 – Install Istio with the istioctl CLI

Bash
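# Download matching version
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.18.0 sh -
cd istio-1.18.0
# Put the bundled istioctl on the PATH for this shell
export PATH="$PWD/bin:$PATH"

# Install the default profile (istiod and the ingress gateway)
istioctl install --set profile=default -y

# Prometheus and Grafana are sample addons, not part of the profile
kubectl apply -f samples/addons/prometheus.yaml
kubectl apply -f samples/addons/grafana.yaml

# Verify
kubectl get pods -n istio-system

The default profile installs istiod and the ingress gateway; Prometheus and Grafana come from the sample addons bundled in the release archive. If you already have Istio, just make sure the istiod and istio-ingressgateway pods are running and that Prometheus is reachable at prometheus.istio-system:9090.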


Step 2 – Deploy Flagger

Flagger runs as a single deployment that watches your canary resources. Install it with Helm:

Bash
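helm repo add flagger https://flagger.app
helm repo update

# Install the Flagger CRDs (Canary, MetricTemplate, AlertProvider)
kubectl apply -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml

# --version pins the chart release
helm upgrade -i flagger flagger/flagger \
  --version 1.33.0 \
  --namespace=istio-system \
  --set meshProvider=istio \
  --set metricsServer=http://prometheus.istio-system:9090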

Note: The meshProvider=istio flag tells Flagger to generate Istio VirtualService objects instead of NGINX or Contour configs.

Check the pods:

Bash
kubectl get pods -n istio-system -l app.kubernetes.io/name=flagger

You should see one flagger pod in Running state.
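
Before moving on it can help to confirm that every control-plane deployment is fully ready, not just that a pod exists. A minimal sketch, assuming the standard column layout of kubectl get deployments; not_ready is a hypothetical helper name, and it reads text on stdin so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# not_ready: reads `kubectl get deployments --no-headers` output on stdin
# (columns: NAME READY UP-TO-DATE AVAILABLE AGE) and prints every
# deployment whose READY column is not N/N.
not_ready() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 " " $2 }'
}

# Usage against a live cluster (not executed here):
#   kubectl get deployments -n istio-system --no-headers | not_ready
```

An empty result means istiod, the ingress gateway, Prometheus, and Flagger all report the replica count they asked for.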


Step 3 – Create a Sample Application

Let’s use a simple sockshop demo that ships a frontend service. Sidecar injection is driven by a namespace label, so label the namespace first and then deploy the chart with Helm:

Bash
kubectl label namespace default istio-injection=enabled --overwrite

helm repo add flagger-demo https://flagger.app
helm upgrade -i sockshop flagger-demo/sockshop \
  --namespace=default \
  --set image.tag=v0.1.0

Istio injects the Envoy sidecar into every pod created in a namespace labelled istio-injection=enabled. Verify that the frontend pods list an istio-proxy container:

Bash
kubectl get pods -l app=frontend -o=jsonpath='{.items[*].spec.containers[*].name}'

You’ll see istio-proxy next to the application container.
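
With more than a couple of pods, scanning container names by eye gets tedious. The sketch below filters for pods that lack the sidecar; missing_sidecar is a hypothetical helper name, and it assumes each input line holds a pod name, a tab, and the space-separated container names.

```shell
#!/usr/bin/env bash
# missing_sidecar: reads "POD<TAB>container names" lines on stdin and
# prints the pods that do not list an istio-proxy container.
missing_sidecar() {
  awk -F'\t' '{
    n = split($2, names, " "); found = 0
    for (i = 1; i <= n; i++) if (names[i] == "istio-proxy") found = 1
    if (!found) print $1
  }'
}

# Usage against a live cluster (not executed here):
#   kubectl get pods -n default -l app=frontend \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}' \
#     | missing_sidecar
```

Any pod it prints was created before the namespace was labelled and needs a restart to pick up the sidecar.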


Step 4 – Define a Canary CRD

The heart of the automation lives in a Canary custom resource. Here’s a minimal example that rolls out a new frontend image version and validates the 99th‑percentile latency stays under 250 ms.

YAML
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: frontend
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  service:
    port: 80
    targetPort: 8080
  analysis:
    # Run the checks every 30 seconds
    interval: 30s
    # Roll back after 5 failed checks
    threshold: 5
    # Shift 10% more traffic per passing check, up to 100%
    stepWeight: 10
    maxWeight: 100
    metrics:
      - name: request-success-rate
        # Percentage of 2xx responses; must stay at or above 99
        thresholdRange:
          min: 99
        query: |
          sum(rate(istio_requests_total{destination_workload="frontend",response_code=~"2.*"}[1m]))
          /
          sum(rate(istio_requests_total{destination_workload="frontend"}[1m])) * 100
      - name: request-duration-p99
        # P99 latency in milliseconds; must stay at or below 250
        thresholdRange:
          max: 250
        query: |
          histogram_quantile(0.99,
            sum(rate(istio_request_duration_milliseconds_bucket{destination_workload="frontend"}[1m]))
            by (le))

Save this as frontend-canary.yaml and apply:

Bash
kubectl apply -f frontend-canary.yaml
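
Recent Flagger releases treat inline queries as deprecated in favour of MetricTemplate resources, so it may be worth moving the latency query into one. The sketch below only writes the manifest to disk. The template name latency-p99 and the file name are hypothetical, and the Prometheus address assumes the addon service in istio-system.

```shell
#!/usr/bin/env bash
# Write a MetricTemplate holding the P99 latency query.
# "latency-p99" and the file name are hypothetical names for this sketch.
# Flagger fills in namespace, target, and interval when it runs the query.
cat > latency-p99-template.yaml <<'EOF'
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: latency-p99
  namespace: default
spec:
  provider:
    type: prometheus
    address: http://prometheus.istio-system:9090
  query: |
    histogram_quantile(0.99,
      sum(rate(istio_request_duration_milliseconds_bucket{
        destination_workload_namespace="{{ namespace }}",
        destination_workload="{{ target }}"
      }[{{ interval }}])) by (le))
EOF

# Apply it with: kubectl apply -f latency-p99-template.yaml
# Then reference it from the Canary's analysis.metrics list:
#   - name: latency-p99
#     templateRef:
#       name: latency-p99
#       namespace: default
#     thresholdRange:
#       max: 250
#     interval: 1m
```

Because the workload name is a template variable, the same template can serve every Canary in the namespace.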

What happens next?

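  1. Flagger creates a frontend-primary Deployment (the stable version), the frontend, frontend-primary and frontend-canary Services, and an Istio VirtualService. The original frontend Deployment becomes the canary target and is scaled to zero until the next release.
  2. The VirtualService starts with 100 % of traffic routed to frontend-primary.
  3. When a new revision appears, Flagger queries Prometheus for the two metrics above at every analysis interval (30 s).
  4. Each time both metrics pass, Flagger shifts another 10 % of traffic to the canary; after 10 passing checks (≈5 min) it promotes the new version to frontend-primary.
  5. Failed checks pause the shift. Once they add up to the threshold (5 here), Flagger routes 100 % of traffic back to the primary and marks the rollout as Failed.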
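
The arithmetic behind those steps can be sketched in a few lines of shell. This is a best-case illustration, not Flagger's own logic: it assumes every check passes, takes the step size and ceiling as stepWeight 10 and maxWeight 100, and clamps the last step at the ceiling. canary_schedule is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# canary_schedule STEP MAX INTERVAL_SECONDS
# Prints one line per traffic step: canary weight, primary weight, and
# the elapsed time if every analysis check passes.
canary_schedule() {
  local step=$1 max=$2 interval=$3 weight=0 elapsed=0
  while (( weight < max )); do
    weight=$(( weight + step ))
    if (( weight > max )); then weight=$max; fi   # sketch assumption: clamp
    elapsed=$(( elapsed + interval ))
    printf 'canary=%d%% primary=%d%% after=%ds\n' \
      "$weight" "$(( 100 - weight ))" "$elapsed"
  done
}

canary_schedule 10 100 30 | tail -n 2
# canary=90% primary=10% after=270s
# canary=100% primary=0% after=300s
```

Changing the arguments shows how a smaller step or a longer interval stretches the rollout, which is useful when sizing CI timeouts.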

Step 5 – Trigger a New Release

Update the image tag to simulate a new version:

Bash
helm upgrade sockshop flagger-demo/sockshop \
  --namespace=default \
  --set image.tag=v0.2.0

Flagger detects the change (it watches the frontend Deployment), spins up the canary pods, and starts the traffic shift automatically. You can watch the progress:

Bash
kubectl get canary frontend -n default -w

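The output has STATUS and WEIGHT columns. During the rollout STATUS reads Progressing and WEIGHT climbs with each step; when the rollout finishes, STATUS reads Succeeded and WEIGHT drops back to 0 because the primary now runs the new version.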


Step 6 – Observe Metrics in Grafana

Istio ships Grafana as a sample addon (samples/addons/grafana.yaml); once it is installed, istioctl dashboard grafana opens it through a local port-forward. Import the Flagger Canary dashboard (ID 12407) to see a live view of:

  • Traffic split percentages
  • Request success rate
  • P99 latency

These visual cues help you convince stakeholders that the canary is behaving.


Step 7 – Hook Into Your CI/CD Pipeline

In a real‑world setup you’ll want your CI system (GitHub Actions, GitLab CI, Jenkins) to push the new image and then wait for Flagger to finish. Here’s a concise GitHub Actions snippet:

YAML
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up kubectl
        uses: azure/setup-kubectl@v3
        with:
          # Use a full release tag
          version: v1.27.0
      # Cluster credentials (kubeconfig) must be configured before this step
      - name: Deploy image
        run: |
          helm upgrade sockshop ./chart \
            --namespace=default \
            --set image.tag=${{ github.sha }}
      - name: Wait for canary
        # Fail the job instead of polling forever if the rollout stalls
        timeout-minutes: 15
        run: |
          while true; do
            STATUS=$(kubectl get canary frontend -n default -o jsonpath='{.status.phase}')
            if [[ "$STATUS" == "Succeeded" ]]; then break; fi
            if [[ "$STATUS" == "Failed" ]]; then exit 1; fi
            sleep 15
          done

The loop polls the Canary resource until it reaches Succeeded or Failed. If it fails, the job exits with a non‑zero code, automatically failing the pipeline.
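
One caveat with a plain polling loop: right after helm upgrade the Canary can still report the Succeeded or Failed phase of the previous release. The sketch below waits until it has seen a non-terminal phase before trusting a terminal one, and gives up after a timeout. get_phase and wait_for_canary are hypothetical helper names, and the approach assumes the rollout stays in a phase such as Progressing long enough for at least one poll to observe it.

```shell
#!/usr/bin/env bash
# get_phase NAME NAMESPACE: prints the canary's current phase.
get_phase() {
  kubectl get canary "$1" -n "$2" -o jsonpath='{.status.phase}'
}

# wait_for_canary NAME NAMESPACE TIMEOUT_SECONDS [POLL_SECONDS]
# Returns 0 on Succeeded, 1 on Failed, 2 on timeout.
wait_for_canary() {
  local name=$1 ns=$2 timeout=$3 poll=${4:-15}
  local waited=0 started=0 phase
  while (( waited <= timeout )); do
    phase=$(get_phase "$name" "$ns")
    case "$phase" in
      Succeeded|Failed)
        # Ignore a terminal phase left over from the previous release.
        if (( started )); then
          echo "canary $name finished: $phase"
          if [ "$phase" = "Succeeded" ]; then return 0; fi
          return 1
        fi ;;
      "") ;;            # status not readable yet, retry
      *) started=1 ;;   # Progressing, Promoting, Finalising, ...
    esac
    sleep "$poll"
    waited=$(( waited + poll ))
  done
  echo "timed out after ${timeout}s waiting for canary $name"
  return 2
}

# Usage in a pipeline step (not executed here):
#   wait_for_canary frontend default 900
```

Flagger's documentation also describes kubectl wait canary/<name> --for=condition=promoted with a --timeout flag, which may be simpler when a single blocking command is enough.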


Real‑World Lessons Learned

| Issue | What happened | Fix |
| --- | --- | --- |
| High latency spikes during canary rollout | Prometheus query returned NaN because the destination_workload label changed after a Helm chart rename. | Pin the workload label with istio.io/workloadGroup annotation. |
| Rollback never triggered | Metric threshold was too lenient (threshold: 0 by mistake). | Double-check the Canary manifest; Flagger logs a warning if a metric is mis-configured. |
| Istio sidecar injection missing | Namespace lacked istio-injection=enabled. | Label the namespace before deploying the app: kubectl label namespace default istio-injection=enabled. |
| Prometheus scrape timeout | Large cluster caused query latency > 5 s, Flagger timed out. | Increase metricsServer.timeout in Flagger Helm values (--set metricsServer.timeout=10s). |

These quirks cost me a few hours, but documenting them saved the next team a lot of headaches.
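
The NaN case in particular is easy to test for by hand before a release. A minimal sketch, assuming you run the canary's query yourself and pass the resulting number to a check; in_range is a hypothetical helper name, and the curl and jq usage in the comment assumes a local port-forward to Prometheus.

```shell
#!/usr/bin/env bash
# in_range VALUE MIN MAX: succeeds only when VALUE is a finite number
# between MIN and MAX inclusive. Empty results, NaN, and Inf fail.
in_range() {
  local value=$1 min=$2 max=$3
  case "$value" in
    ""|NaN|nan|+Inf|-Inf|Inf) return 1 ;;
  esac
  awk -v v="$value" -v lo="$min" -v hi="$max" \
    'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# Getting VALUE from a live Prometheus (not executed here):
#   kubectl -n istio-system port-forward svc/prometheus 9090:9090 &
#   value=$(curl -s http://localhost:9090/api/v1/query \
#     --data-urlencode 'query=<the canary query>' \
#     | jq -r '.data.result[0].value[1] // ""')
#   in_range "$value" 0 250 || echo "latency check would fail"
```

An empty or NaN value usually means the label selector no longer matches any series, which is worth ruling out before blaming the application.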


TL;DR – One‑Liner Recap

Bash
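# Run from the istio-1.18.0 directory so the addon path resolves
istioctl install -y && kubectl apply -f samples/addons/prometheus.yaml && \
helm upgrade -i flagger flagger/flagger --namespace=istio-system \
  --set meshProvider=istio --set metricsServer=http://prometheus.istio-system:9090 && \
kubectl apply -f frontend-canary.yaml && \
helm upgrade sockshop flagger-demo/sockshop --set image.tag=v0.2.0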

Run those commands, watch the Canary resource, and you’ve got automated canary deployments with Flagger and Istio.


Next Steps

  1. Add more metrics – e.g., error rate, CPU usage.
  2. Enable A/B testing – add HTTP header or cookie match conditions to the analysis so only matching requests reach the new version.
  3. Scale to multi-cluster – Flagger can drive an Istio multi-cluster mesh, a starting point for cross-region progressive delivery.

Progressive delivery isn’t a silver bullet, but with Flagger + Istio you get a battle‑tested, production‑ready automation layer for free. Give it a spin, break a few things intentionally, and you’ll quickly see the safety net in action.
