Master Infrastructure as Code drift detection with Terraform and Driftctl. Learn how to automate remediation and keep your cloud environment synchronized.
If you’ve spent any time in production, you know the scenario. You define your infrastructure with Terraform, run terraform apply, and everything is perfect. Six months later, a developer manually tweaks a security group or an S3 bucket policy via the AWS Console because "it was an emergency."
Suddenly, your code is a lie. Your state file doesn't match reality. This is Infrastructure as Code drift, and if you aren't actively monitoring it, you're just waiting for a disaster during your next deployment.
Terraform is great at managing state, but it isn't a continuous monitoring tool. terraform plan does refresh the resources it tracks before comparing them with your configuration, but it only runs when you run it, and it knows nothing about resources that were never added to the state file. terraform refresh (now superseded by terraform apply -refresh-only) helps, but it doesn't alert you when someone makes an unauthorized change in the middle of the night.
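For resources that are already tracked in state, Terraform itself can report drift on demand. The following is a minimal sketch, assuming an initialized working directory and a branch with no unapplied code changes; the helper names describe_plan_exit and check_managed_drift are hypothetical. It relies on the documented exit codes of terraform plan -detailed-exitcode (0 for an empty diff, 1 for an error, 2 for a non-empty diff).

```bash
#!/usr/bin/env bash
# Translate the exit codes of `terraform plan -detailed-exitcode`
# into a readable status. describe_plan_exit is a hypothetical helper.
describe_plan_exit() {
  case "$1" in
    0) echo "in-sync" ;;  # plan succeeded, empty diff
    2) echo "drift" ;;    # plan succeeded, non-empty diff
    *) echo "error" ;;    # 1 (or anything else): the plan itself failed
  esac
}

# Run a plan against the live environment and print the status.
# A non-empty diff only means drift if the code has no pending changes;
# otherwise it may simply be work that has not been applied yet.
check_managed_drift() {
  local rc=0
  terraform plan -detailed-exitcode -input=false >/dev/null || rc=$?
  describe_plan_exit "$rc"
}
```

Calling check_managed_drift prints in-sync, drift, or error. This only covers resources Terraform already knows about, so resources created by hand remain invisible to it.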
That’s where Driftctl comes in. It’s an open-source tool that scans your cloud provider, compares the actual resources against your Terraform state, and tells you exactly what has drifted.
I’ve been using Driftctl (v0.38.0) to keep my AWS environments clean. It’s fast, and the output is readable. Here is how I set it up in a CI/CD pipeline to ensure drift never goes unnoticed.
First, grab the binary for your environment. On macOS or Linux, it’s a simple:
curl -L https://driftctl.com/install | sh
To detect drift, you need to point Driftctl at your Terraform state file. If you’re using remote state (which you should be), download it or point to the S3 bucket:
driftctl scan --from tfstate://terraform.tfstate
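When the state lives in S3, driftctl can read it directly through the tfstate+s3:// scheme instead of a local copy. A minimal sketch, assuming AWS credentials are already available in the environment; the bucket name, object key, report filename, and the helpers state_uri and scan_remote_state are all placeholders chosen for illustration.

```bash
#!/usr/bin/env bash
# Build a driftctl state URI for a state file stored in S3.
# Arguments: bucket name, object key.
state_uri() {
  printf 'tfstate+s3://%s/%s\n' "$1" "$2"
}

# Scan against the remote state and write the report as JSON
# to drift-report.json (a placeholder filename).
scan_remote_state() {
  local uri
  uri="$(state_uri "$1" "$2")"
  driftctl scan --from "$uri" --output "json://drift-report.json"
}
```

For example, scan_remote_state my-terraform-states prod/terraform.tfstate would scan against s3://my-terraform-states/prod/terraform.tfstate. The JSON report is easier to archive or post-process in CI than the console output.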
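The scan summary reports how many resources are managed by Terraform, how many exist in your account but not in your state (unmanaged, typically created by hand), and how many are in your state but missing from the cloud provider. Attribute-level changes to managed resources are only reported in the experimental deep mode (--deep).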
Detection is only half the battle. If you want to automate the remediation, you have a few options. I prefer a "Notify and Reconcile" approach.
I add a step in my GitHub Actions pipeline that runs a scan on every push. If drift is detected, the build fails and sends an alert to our Slack channel.
jobs:
  drift-detection:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Scan for Drift
        run: |
          driftctl scan --from tfstate://prod.tfstate --quiet

The exit code is what matters here. driftctl scan exits with 1 when it finds drift (and 2 when the scan itself fails), so the step fails without any extra flag, making the drift impossible to ignore.
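The alerting half of this step could look like the sketch below. It assumes driftctl's exit codes are 0 for in sync, 1 for drift, and 2 for a scan error, and that the CI system provides a Slack incoming webhook URL in a variable named SLACK_WEBHOOK_URL (a hypothetical name). The helpers drift_status, slack_payload, and scan_and_notify are illustrative rather than part of driftctl.

```bash
#!/usr/bin/env bash
# Map a driftctl scan exit code to a status string.
# Assumed codes: 0 = in sync, 1 = drift detected, anything else = scan error.
drift_status() {
  case "$1" in
    0) echo "in-sync" ;;
    1) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Build the JSON body for a Slack incoming webhook.
# The status is one of three fixed words, so no JSON escaping is needed.
slack_payload() {
  printf '{"text":"driftctl scan result: %s"}\n' "$1"
}

# Run the scan, notify Slack on anything other than "in-sync",
# and return the original exit code so the CI step still fails.
# Argument: the state URI, e.g. tfstate://prod.tfstate
scan_and_notify() {
  local rc=0 result
  driftctl scan --from "$1" --quiet || rc=$?
  result="$(drift_status "$rc")"
  if [ "$result" != "in-sync" ]; then
    # A failed notification should not hide the scan result.
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data "$(slack_payload "$result")" "$SLACK_WEBHOOK_URL" || true
  fi
  return "$rc"
}
```

In the workflow, the run step would source this script and call scan_and_notify tfstate://prod.tfstate. Returning the scan's own exit code keeps the build red on drift even if the Slack call fails.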
Once alerted, you have two choices:
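1. Update your Terraform code to match the manual change, if the change turns out to be legitimate.
2. Run terraform apply to overwrite the manual changes and force the infrastructure back to the desired state.

After running this in production for over a year, I’ve learned a few things: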
- Use a .driftignore file. Not every resource needs to be managed by Terraform. Driftctl allows you to filter out legacy resources that you don't intend to import.
- Don't fully automate a terraform apply triggered by a drift scan. If someone manually deleted a database, an automated apply might try to recreate it, leading to data loss. Always alert, then verify, then apply.

Infrastructure as Code is only as good as your ability to enforce it. By integrating Driftctl into your workflow, you move from "hoping" your environment matches your code to "knowing" it does.
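As an illustration of the .driftignore tip, here is a minimal sketch that writes such a file. Each line follows the resource_type.resource_id form, with * as a wildcard. The resource names are placeholders, and write_driftignore is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Write a .driftignore file into the current directory.
# Every resource name below is a placeholder, not a real resource.
write_driftignore() {
  {
    # One specific bucket that is intentionally left unmanaged
    echo 'aws_s3_bucket.legacy-uploads'
    # Every IAM user, for example because another team manages them
    echo 'aws_iam_user.*'
  } > .driftignore
}
```

Run write_driftignore from the directory where you launch driftctl scan, then commit the file so the ignore rules are reviewed like any other code change.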
Stop letting manual changes accumulate. Set up your scan, catch the drift early, and keep your production environment predictable. It's the only way to sleep soundly on a Friday afternoon.