MHRubel

© 2026 Mahamudul Hasan Rubel. All rights reserved.

Technology, Software Engineering | June 19, 2026 | 3 min read

Kubernetes Disaster Recovery: Velero and Restic Implementation Guide

Master Kubernetes disaster recovery by using Velero and Restic. Learn how to back up persistent volumes and perform cross-cluster restoration like a pro.

Kubernetes, DevOps, Velero, Restic, Disaster Recovery, Cloud Native, Infrastructure as Code, Linux, Server

Why Your Kubernetes Backup Strategy Probably Fails

I’ve seen too many teams treat their Kubernetes clusters like cattle until the day a namespace gets nuked or a regional outage hits. If you're relying solely on snapshots from your cloud provider, you're missing half the picture. You need an application-aware solution that understands your YAML manifests, your persistent volumes (PVs), and your secret configurations.

That’s where a real Kubernetes disaster recovery plan comes in. In this guide, I’ll show you how to use Velero combined with Restic to back up a namespace, including the data in its persistent volumes, and restore it on a second cluster.

The Stack: Velero and Restic

Velero is the gold standard for cluster-level backups. By default, it uses cloud-native snapshots (like AWS EBS snapshots). However, if your volumes aren't natively supported or you need file-level backups for cross-cluster portability, you need Restic.

  • Velero (v1.12+): Manages the backup lifecycle, API resource collection, and scheduling.
  • Restic: Handles the actual data transfer by backing up files within the PVs, making it cloud-agnostic and perfect for moving data between different storage backends.

Prerequisites

Before we dive into the code, ensure you have:

  1. Two Kubernetes clusters (Source and Destination).
  2. An object storage bucket that both clusters can reach (AWS S3 or an S3-compatible store such as MinIO; GCP GCS works through Velero's separate GCP plugin).
  3. velero CLI installed locally.
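Before going further, I like to run a quick preflight. This is a minimal sketch, assuming both clusters are registered as kubectl contexts; the context names in the usage note are hypothetical.

```shell
# Preflight sketch: verify local tooling before touching either cluster.

# Succeeds when the named command is on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Prints every required tool that is missing, one per line.
missing_tools() {
    for tool in "$@"; do
        require_cmd "$tool" || printf '%s\n' "$tool"
    done
}

# Succeeds when kubectl knows a context with exactly the given name.
check_context() {
    kubectl config get-contexts -o name 2>/dev/null | grep -qx "$1"
}
```

Running missing_tools velero kubectl should print nothing when both are installed, and check_context source-cluster (a made-up name) should succeed for each cluster you plan to use.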

Step 1: Installing Velero with Restic

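You need to enable file-system backup during the initial installation. Run this command on your source cluster:

Bash
velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.8.0 \
    --bucket my-backups-bucket \
    --secret-file ./credentials-velero \
    --backup-location-config region=us-east-1 \
    --snapshot-location-config region=us-east-1 \
    --use-node-agent \
    --uploader-type restic

The --use-node-agent flag is critical. It replaced --use-restic in Velero v1.10 and deploys the node-agent DaemonSet, which reads pod volumes from each node's filesystem and pushes the data to your object storage. Setting --uploader-type restic selects Restic explicitly instead of relying on the release's default uploader. The --secret-file path is an example; point it at the credentials file for your bucket.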
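It's worth confirming the install actually rolled out before moving on. The sketch below assumes Velero lives in the default velero namespace and that kubectl points at the source cluster. The DaemonSet is named node-agent on v1.10 and later and restic on older releases, so the helper checks for both.

```shell
# Prints the name of the file-system backup DaemonSet, if one exists.
find_fs_backup_daemonset() {
    for name in node-agent restic; do
        if kubectl -n velero get daemonset "$name" >/dev/null 2>&1; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# Blocks until the Velero server and the DaemonSet finish rolling out.
wait_for_velero() {
    ds=$(find_fs_backup_daemonset) || return 1
    kubectl -n velero rollout status deployment/velero --timeout=120s &&
        kubectl -n velero rollout status "daemonset/$ds" --timeout=120s
}
```

If wait_for_velero returns non-zero, the DaemonSet is usually missing, which means file-system backup was not enabled at install time.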

Step 2: Annotating Your Workloads

Velero doesn't back up every persistent volume with Restic by default, which keeps your bucket from bloating. You opt in per pod by naming the volumes to include. The value is the volume name from the pod spec, not the PVC name. For a deployment named database-app using a volume called data-vol, annotate the pod template:

YAML
spec:
  template:
    metadata:
      annotations:
        backup.velero.io/backup-volumes: data-vol

Applying this rolls out new pods that carry the annotation. From then on, every backup that includes those pods also takes a Restic backup of that specific volume.
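If you'd rather not edit the manifest by hand, the same annotation can be applied with kubectl patch. This is a sketch, not the only way to do it; the function names are mine, and I'm assuming database-app runs in the production namespace.

```shell
# Builds a merge patch that sets the Velero volume annotation on a pod template.
# $1 is a comma-separated list of volume names from the pod spec.
velero_annotation_patch() {
    printf '{"spec":{"template":{"metadata":{"annotations":{"backup.velero.io/backup-volumes":"%s"}}}}}' "$1"
}

# Applies the patch to a Deployment. Changing the pod template
# triggers a rolling restart of its pods.
# $1 = namespace, $2 = deployment name, $3 = volume list
annotate_deployment() {
    kubectl -n "$1" patch deployment "$2" --type merge -p "$(velero_annotation_patch "$3")"
}
```

For the example above, the call would be annotate_deployment production database-app data-vol.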

Step 3: Triggering the Backup

Now, let's create a backup that includes your resources and the persistent data.

Bash
velero backup create prod-backup --include-namespaces production --snapshot-volumes=true

Wait for the backup to complete, then inspect it with velero backup describe prod-backup --details. The phase should read Completed, and the details list each pod volume backup. If you see PartiallyFailed, check the logs using velero backup logs prod-backup.
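In a pipeline you don't want to eyeball the phase. Here is a rough polling sketch that reads the phase from the Backup object itself, assuming Velero runs in the velero namespace. The function names and the exit codes are my own convention.

```shell
# Reads the phase of a Velero Backup object straight from the cluster.
backup_phase() {
    kubectl -n velero get backups.velero.io "$1" -o jsonpath='{.status.phase}'
}

# Polls until the backup reaches a terminal phase or the timeout expires.
# $1 = backup name, $2 = timeout in seconds (default 600)
# Returns 0 for Completed, 1 for a failed phase, 2 on timeout.
wait_for_backup() {
    name=$1
    remaining=${2:-600}
    while [ "$remaining" -gt 0 ]; do
        phase=$(backup_phase "$name")
        case "$phase" in
            Completed) return 0 ;;
            PartiallyFailed|Failed|FailedValidation)
                echo "backup $name ended in phase $phase" >&2
                return 1 ;;
        esac
        sleep 5
        remaining=$((remaining - 5))
    done
    echo "timed out waiting for backup $name" >&2
    return 2
}
```

For interactive use, adding --wait to velero backup create is simpler. The loop earns its place when the backup was started elsewhere, for example by a schedule.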

Step 4: Cross-Cluster Restoration

This is the moment of truth. To move this data to your destination cluster:

  1. Install Velero on the destination cluster using the exact same bucket and credentials.

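  2. Confirm the backup metadata has synced:

    Bash
    velero backup get

    Velero syncs backup objects from the bucket on its own, once a minute by default. If the backup still isn't listed, run velero backup-location get to confirm the location is Available and check that your S3 bucket permissions are correct. Note that velero backup download only fetches a backup archive to your machine; it does not sync anything.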

  3. Restore the data:

    Bash
    velero restore create --from-backup prod-backup

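Velero will recreate the PVCs and pods. Each pod with a Restic backup gets an init container that holds the application back until the files from your bucket are restored into the new volume. The destination cluster needs a StorageClass with the name the PVCs request, otherwise they stay Pending.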
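Steps 2 and 3 above can be wrapped in one guard so a restore never starts against a backup that hasn't synced yet. This is a sketch under the assumption that the current kubectl context is the destination cluster; the restore naming scheme is mine.

```shell
# Restores a backup once it is visible on this cluster.
# $1 = backup name
restore_backup() {
    backup=$1
    # velero backup get is expected to fail while the backup is unknown here.
    if ! velero backup get "$backup" >/dev/null 2>&1; then
        echo "backup $backup has not synced to this cluster yet" >&2
        return 1
    fi
    # Timestamped name so repeated restores of one backup don't collide.
    restore="$backup-$(date +%Y%m%d%H%M%S)"
    velero restore create "$restore" --from-backup "$backup" --wait &&
        velero restore describe "$restore"
}
```

Calling restore_backup prod-backup blocks until the restore finishes and then prints its description, which is where warnings and errors show up.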

Lessons from the Trenches

I’ve learned a few things the hard way while managing Kubernetes disaster recovery for production environments:

  • Restic performance: Restic is CPU and memory intensive. If you’re backing up terabytes of data, the pods of the backup DaemonSet can get throttled or OOM-killed. Monitor their resource requests and limits.
  • Namespace mismatches: If you're restoring to a different namespace, use the --namespace-mappings flag. It’s a lifesaver when you need to spin up a "staging" environment from a "production" backup.
  • Test your restores: A backup is just a file you haven't proven you can use yet. I run a weekly automated restore test into an isolated "test" namespace. It catches configuration drift before a real disaster strikes.
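The last two lessons combine nicely into a restore drill. Here's a minimal sketch of what such a job might run, assuming a scratch namespace called test; the function name and the restore naming are hypothetical, and the scheduling (cron, CI, whatever you use) is left out.

```shell
# Restore drill sketch: restore one namespace from a backup into a scratch namespace.
# $1 = backup name, $2 = source namespace, $3 = target namespace
restore_drill() {
    backup=$1 source_ns=$2 target_ns=$3
    velero restore create "drill-$backup-$(date +%Y%m%d)" \
        --from-backup "$backup" \
        --include-namespaces "$source_ns" \
        --namespace-mappings "$source_ns:$target_ns" \
        --wait
}
```

A weekly job could call restore_drill prod-backup production test, run a smoke test against the restored workload, and then delete the test namespace.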

Final Thoughts

Implementing Velero backup strategies isn't just about insurance; it's about operational confidence. When you know you can migrate your entire stateful stack to a new cluster in under 30 minutes, you stop fearing the "delete" key.

Start small, annotate your volumes, and keep your Restic repositories healthy. Your future self will thank you when the outage hits at 3 AM.
