Master Kubernetes disaster recovery by using Velero and Restic. Learn how to back up persistent volumes and perform cross-cluster restoration like a pro.
I’ve seen too many teams treat their Kubernetes clusters like cattle until the day a namespace gets nuked or a regional outage hits. If you're relying solely on snapshots from your cloud provider, you're missing half the picture. You need an application-aware solution that understands your YAML manifests, your persistent volumes (PVs), and your secret configurations.
That’s where Kubernetes disaster recovery comes in. In this guide, I’ll show you how to use Velero combined with Restic to move data between clusters.
Velero is the gold standard for cluster-level backups. By default, it uses cloud-native snapshots (like AWS EBS snapshots). However, if your volumes aren't natively supported or you need file-level backups for cross-cluster portability, you need Restic.
Before we dive into the code, ensure you have:
velero CLI installed locally.

You need to enable the Restic integration during the initial installation. Run this command on your source cluster:
    velero install \
      --provider aws \
      --plugins velero/velero-plugin-for-aws:v1.8.0 \
      --bucket my-backups-bucket \
      --backup-location-config region=us-east-1 \
      --snapshot-location-config region=us-east-1 \
      --use-restic
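The --use-restic flag is critical. It deploys the Restic daemonset, which runs a pod on every node, reads the data from your pods' mounted volumes, and pushes it to your object storage. Note that Velero 1.10 and later deprecate this flag in favor of --use-node-agent with --uploader-type=restic, so check which release you are installing. The install also needs credentials for the bucket, passed with --secret-file (or --no-secret if the cluster authenticates some other way, such as an IAM role).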
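Before moving on, I like to confirm the daemonset actually came up on every node. Here's a minimal sketch of that check, assuming Velero lives in the default velero namespace and the DaemonSet is named restic (newer releases name it node-agent). The helper name daemonset_ready is my own, not part of Velero.

```bash
# Succeeds only when every scheduled pod of the DaemonSet reports ready.
# $1 = DaemonSet name (assumed "restic"; "node-agent" on newer Velero releases)
daemonset_ready() {
  local desired ready
  # jsonpath prints "<desired> <ready>", e.g. "3 3"
  read -r desired ready <<< "$(kubectl -n velero get daemonset "$1" \
    -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}')"
  [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "$desired" = "$ready" ]
}
```

Call it as daemonset_ready restic && echo "restic is ready". A non-zero exit means at least one node has no ready pod yet, or the DaemonSet wasn't found.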
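To avoid bloating your storage, Velero doesn't back up every persistent volume with Restic automatically; by default the integration is opt-in. You must tell it which pod volumes to include. For a deployment named database-app using a volume called data-vol, annotate the pod template. The value is the volume's name as it appears in the pod spec, not the name of the PVC: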
    metadata:
      annotations:
        backup.velero.io/backup-volumes: data-vol
Once you apply this, Velero sees the annotation and knows to trigger a Restic backup for that specific volume.
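If you'd rather not edit the manifest by hand, the same annotation can be applied with kubectl patch. This is a sketch, assuming database-app lives in the production namespace. The helper names velero_volume_patch and annotate_deployment are hypothetical, made up for this example.

```bash
# Build the JSON merge patch that sets the Velero annotation on a pod template.
# $1 = comma-separated volume names, e.g. "data-vol"
velero_volume_patch() {
  printf '{"spec":{"template":{"metadata":{"annotations":{"backup.velero.io/backup-volumes":"%s"}}}}}' "$1"
}

# Apply the patch to a Deployment. Changing the pod template triggers a rollout.
# $1 = namespace (assumed), $2 = deployment name, $3 = volume names
annotate_deployment() {
  kubectl -n "$1" patch deployment "$2" --type merge -p "$(velero_volume_patch "$3")"
}
```

For the example in this guide that would be annotate_deployment production database-app data-vol. Patching the template rather than a running pod means the annotation survives restarts.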
Now, let's create a backup that includes your resources and the persistent data.
    velero backup create prod-backup --include-namespaces production --snapshot-volumes=true
Wait for the backup to complete by running velero backup describe prod-backup. Ensure the phase says Completed. If you see PartiallyFailed, check the logs using velero backup logs prod-backup.
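In a script or CI job you probably don't want to eyeball the describe output. Below is a rough sketch of a polling loop that reads the phase straight from the Backup custom resource. It assumes Velero is installed in the velero namespace; wait_for_backup and its default timings are my own choices, not anything Velero ships.

```bash
# Poll a Velero Backup until it reaches a terminal phase.
# $1 = backup name, $2 = max attempts (default 60), $3 = seconds between polls (default 10)
# Returns 0 on Completed, 1 on a failed phase, 2 on timeout.
wait_for_backup() {
  local name="$1" attempts="${2:-60}" interval="${3:-10}" phase="" i
  for ((i = 0; i < attempts; i++)); do
    phase="$(kubectl -n velero get backups.velero.io "$name" \
      -o jsonpath='{.status.phase}' 2>/dev/null)"
    case "$phase" in
      Completed)
        echo "backup $name: Completed"; return 0 ;;
      PartiallyFailed|Failed|FailedValidation)
        echo "backup $name: $phase" >&2; return 1 ;;
    esac
    sleep "$interval"
  done
  echo "backup $name: timed out (last phase: ${phase:-unknown})" >&2
  return 2
}
```

A call like wait_for_backup prod-backup || velero backup logs prod-backup pulls the logs automatically whenever the backup doesn't finish cleanly.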
This is the moment of truth. To move this data to your destination cluster:
Install Velero on the destination cluster using the exact same bucket and credentials.
Sync the backup metadata:
    velero backup get
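If the backup isn't listed, give Velero a minute to sync from the bucket, then check that velero backup-location get reports the location as Available and that your S3 bucket permissions are correct. (velero backup download only fetches a backup's archive to your machine; it does not register the backup with the cluster.)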
Restore the data:
    velero restore create --from-backup prod-backup
Velero will recreate the PVCs, wait for them to bind, and then Restic will pull the files from your bucket into the new volumes.
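To see whether the volumes really came back, I check the restore status with velero restore get and then look for claims that haven't bound. Here is a small sketch of the second check, assuming the restored namespace is production; unbound_pvcs is a hypothetical helper name.

```bash
# Print the names of PVCs in a namespace that are not in the Bound phase.
# $1 = namespace
unbound_pvcs() {
  kubectl -n "$1" get pvc --no-headers \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase |
    awk '$2 != "Bound" { print $1 }'
}
```

Empty output from unbound_pvcs production means every claim has a volume behind it. Anything it prints is worth a kubectl describe pvc before you trust the restore.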
I’ve learned a few things the hard way while managing Kubernetes disaster recovery for production environments:
--namespace-mappings flag. It’s a lifesaver when you need to spin up a "staging" environment from a "production" backup.

Implementing Velero backup strategies isn't just about insurance; it's about operational confidence. When you know you can migrate your entire stateful stack to a new cluster in under 30 minutes, you stop fearing the "delete" key.
Start small, annotate your volumes, and keep your Restic repositories healthy. Your future self will thank you when the outage hits at 3 AM.