Software Engineering · Technology · June 19, 2026 · 3 min read

Terraform Infrastructure as Code Drift Detection and Remediation

Master Infrastructure as Code drift detection with Terraform and Driftctl. Learn how to automate remediation and keep your cloud environment synchronized.

Tags: Terraform, Infrastructure as Code, Drift Detection, DevOps, Cloud Engineering, Driftctl, AWS, Automation, LinuxServer

The Silent Killer: Infrastructure Drift

If you’ve spent any time in production, you know the scenario. You define your infrastructure with Terraform, run terraform apply, and everything is perfect. Six months later, a developer manually tweaks a security group or an S3 bucket policy via the AWS Console because "it was an emergency."

Suddenly, your code is a lie. Your state file doesn't match reality. This is Infrastructure as Code drift, and if you aren't actively monitoring it, you're just waiting for a disaster during your next deployment.

Why Terraform Isn't Enough

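Terraform is great at managing state, but it isn't a continuous monitoring tool. By default, terraform plan refreshes the resources recorded in your state against the live cloud API, so it does surface manual edits to those resources. However, it only does so when someone runs it, and it knows nothing about resources created outside Terraform. The same goes for terraform apply -refresh-only (the replacement for the deprecated terraform refresh): it's useful, but nothing alerts you when someone makes an unauthorized change in the middle of the night.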
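For resources Terraform already manages, you can still get an on-demand check from Terraform itself. The sketch below wraps terraform plan -detailed-exitcode, which exits 0 when the plan is empty, 1 on error and 2 when there are changes. It assumes Terraform 0.14+ (for -chdir) and that the code is already applied, so any planned change points to drift. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: an on-demand drift check using Terraform alone.
# Assumes terraform (0.14+ for -chdir) is on PATH and the code is already
# applied, so any planned change means live resources no longer match.
# classify_plan_exit and check_terraform_drift are hypothetical helper names.

# Translate a `terraform plan -detailed-exitcode` status into a label:
# 0 = no changes, 2 = changes present, anything else = an error.
classify_plan_exit() {
  case "$1" in
    0) echo "in-sync" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Plan in a working directory (default: .), print the label, and
# return Terraform's exit status so a CI step can fail on it.
check_terraform_drift() {
  local dir="${1:-.}"
  local status=0
  terraform -chdir="$dir" plan -detailed-exitcode -input=false -no-color \
    >/dev/null || status=$?
  classify_plan_exit "$status"
  return "$status"
}
```

Running check_terraform_drift on a schedule would catch edits to managed resources. It still can't see anything that was never in the state.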

That’s where Driftctl comes in. It’s an open-source tool that scans your cloud provider, compares the actual resources against your Terraform state, and tells you exactly what has drifted.

Setting Up Driftctl for Automated Drift Detection

I’ve been using Driftctl (v0.38.0) to keep my AWS environments clean. It’s fast, and the output is readable. Here is how I set it up in a CI/CD pipeline to ensure drift never goes unnoticed.

1. Installation

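First, install the binary for your platform. On macOS, Homebrew is the quickest route (brew install driftctl). On Linux (amd64), download the release binary from GitHub:

Bash
curl -L https://github.com/snyk/driftctl/releases/latest/download/driftctl_linux_amd64 -o driftctl
chmod +x driftctl && sudo mv driftctl /usr/local/bin/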
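To pick the right release binary automatically on mixed Linux and macOS machines, the sketch below maps uname output to an asset name. It assumes the driftctl_<os>_<arch> asset naming used on the snyk/driftctl GitHub releases page and only covers amd64 and arm64. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: download the driftctl release binary that matches this machine.
# Assumes release assets are named driftctl_<os>_<arch> on GitHub;
# driftctl_asset and install_driftctl are hypothetical helper names.

# Map OS/arch names (as printed by uname -s / uname -m) to an asset name.
# Arguments are optional so the mapping can be tested on any machine.
driftctl_asset() {
  local os arch
  os=$(printf '%s' "${1:-$(uname -s)}" | tr '[:upper:]' '[:lower:]')
  case "${2:-$(uname -m)}" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: ${2:-$(uname -m)}" >&2; return 1 ;;
  esac
  case "$os" in
    linux|darwin) echo "driftctl_${os}_${arch}" ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
}

# Download the latest release into a directory on PATH (default /usr/local/bin).
install_driftctl() {
  local asset dest="${1:-/usr/local/bin}"
  asset=$(driftctl_asset) || return 1
  curl -fsSL -o driftctl \
    "https://github.com/snyk/driftctl/releases/latest/download/${asset}" || return 1
  chmod +x driftctl
  sudo mv driftctl "${dest}/driftctl"
}
```

On an Apple Silicon Mac, driftctl_asset prints driftctl_darwin_arm64. install_driftctl then downloads that file and moves it onto your PATH. Because it uses sudo, expect a password prompt.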

2. Running a Scan

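To detect drift, point Driftctl at your Terraform state. The example below reads a local state file. If you're using remote state (which you should be), either download the file first or have Driftctl read it straight from the S3 bucket by using a tfstate+s3:// source instead of tfstate://: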


Bash
driftctl scan --from tfstate://terraform.tfstate

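The output sorts resources into three buckets:

- Managed: in your Terraform state and present in the cloud account.
- Unmanaged: created outside Terraform, for example by hand in the console, so they aren't in your code.
- Missing: still recorded in your state but deleted from the cloud account.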
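For remote state, the sketch below builds a tfstate+s3:// source and keeps a machine-readable report via --output json://drift-report.json. It then maps the scan's exit status using Driftctl's documented exit codes: 0 in sync, 1 drift detected, 2 unexpected error. The helper names and report file name are hypothetical. The sketch assumes driftctl is on PATH and that AWS credentials with read access are available.

```shell
#!/usr/bin/env bash
# Sketch: scan against Terraform state stored in S3.
# Assumes driftctl is on PATH and AWS read-only credentials are configured.
# s3_state_source, classify_driftctl_exit and scan_s3_state are hypothetical.

# Build a driftctl --from value for an S3-hosted state file.
s3_state_source() {
  local bucket="$1" key="${2#/}"   # drop a leading slash from the key
  printf 'tfstate+s3://%s/%s\n' "$bucket" "$key"
}

# Map driftctl exit codes: 0 in sync, 1 drift detected, 2 unexpected error.
classify_driftctl_exit() {
  case "$1" in
    0) echo "in-sync" ;;
    1) echo "drifted" ;;
    *) echo "error" ;;
  esac
}

# Run the scan, write a JSON report, and return driftctl's exit status.
scan_s3_state() {
  local status=0
  driftctl scan --from "$(s3_state_source "$1" "$2")" \
    --output json://drift-report.json || status=$?
  classify_driftctl_exit "$status"
  return "$status"
}
```

For example, scan_s3_state my-bucket prod/terraform.tfstate would scan against that object. Both arguments are placeholders for your own bucket and key.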

Automating Remediation

Detection is only half the battle. If you want to automate the remediation, you have a few options. I prefer a "Notify and Reconcile" approach.

The CI/CD Pipeline Integration

I add a step in my GitHub Actions pipeline that runs a scan on every push. If drift is detected, the build fails and sends an alert to our Slack channel.


YAML
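on: push
jobs:
  drift-detection:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4  # read-only credentials for the scan
        with:  # secret names are placeholders
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1  # adjust to your region
      - name: Scan for Drift
        run: |
          curl -sSL https://github.com/snyk/driftctl/releases/latest/download/driftctl_linux_amd64 -o driftctl
          chmod +x driftctl
          ./driftctl scan --from tfstate://prod.tfstate --quiet  # exits 1 on drift, failing the job

No extra flag is needed to make the build fail. driftctl scan exits with status 1 when it finds drift (and 2 on an unexpected error), so the step fails, and the whole pipeline with it, whenever drift appears. That makes drift impossible to ignore.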
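The Slack alert can be a short script that runs in a workflow step guarded by if: failure(). This is a minimal sketch. It assumes a Slack incoming webhook, which accepts a JSON body with a text field, and that the webhook URL is exposed to the job as a hypothetical SLACK_WEBHOOK_URL environment variable.

```shell
#!/usr/bin/env bash
# Sketch: post a drift alert to a Slack incoming webhook.
# SLACK_WEBHOOK_URL is a hypothetical variable holding the webhook URL.

# Make arbitrary text safe inside a JSON string: flatten newlines/tabs to
# spaces, then escape backslashes and double quotes.
json_escape() {
  printf '%s' "$1" | tr '\n\t\r' '   ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the minimal payload incoming webhooks accept: {"text":"..."}.
slack_payload() {
  printf '{"text":"%s"}' "$(json_escape "$1")"
}

# Send the alert; returns non-zero if the URL is missing or Slack rejects it.
notify_slack() {
  if [ -z "${SLACK_WEBHOOK_URL:-}" ]; then
    echo "SLACK_WEBHOOK_URL is not set" >&2
    return 1
  fi
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "$(slack_payload "$1")" "$SLACK_WEBHOOK_URL"
}
```

In the workflow, a step with if: failure() could call notify_slack "Drift detected in ${GITHUB_REPOSITORY}" to name the affected repository.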

Remediation Strategies

Once alerted, you have two choices:

Reconcile the Code: If the manual change was actually a good idea, update your Terraform files to match the new configuration. This keeps your Infrastructure as Code as the single source of truth.
Reconcile the Infrastructure: If the change was unauthorized, run terraform plan to confirm exactly what will be reverted. Then run terraform apply to overwrite the manual changes and force the infrastructure back to the desired state.
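Reconciling the infrastructure is safest when the plan you review is exactly the plan you apply. This is a minimal sketch that assumes Terraform 0.14+ on PATH and a human at the terminal. It saves the plan to a file, requires an explicit yes, and then applies that saved file. The helper names and the drift.tfplan file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: review-then-apply remediation for unauthorized drift.
# confirm and revert_drift are hypothetical helpers; drift.tfplan is an
# arbitrary plan file name. Assumes terraform (0.14+ for -chdir) on PATH.

# Prompt on stderr, read one line from stdin, succeed only on exactly "yes".
confirm() {
  local answer=""
  printf '%s' "${1:-Apply this plan? Type yes to continue: }" >&2
  read -r answer || true
  [ "$answer" = "yes" ]
}

# Save a plan (terraform prints it), then apply that exact saved plan
# only after confirmation, so nothing unreviewed can slip in.
revert_drift() {
  local dir="${1:-.}" plan_file="drift.tfplan"
  terraform -chdir="$dir" plan -input=false -out="$plan_file" || return 1
  if confirm; then
    terraform -chdir="$dir" apply -input=false "$plan_file"
  else
    echo "Aborted; nothing was applied." >&2
    return 1
  fi
}
```

For the other path, reconciling the code, the equivalent check is that terraform plan reports no changes once you've updated the configuration. If the drift includes resources created by hand, import them with terraform import first.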

Hard-Won Lessons in DevOps Automation

After running this in production for over a year, I’ve learned a few things:

Ignore the noise: Use a .driftignore file. Not every resource needs to be managed by Terraform. Driftctl allows you to filter out legacy resources that you don't intend to import.
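Don't auto-remediate blindly: I strongly advise against running an automated terraform apply triggered by a drift scan. If someone manually deleted a database, an automated apply will simply create a new, empty one. That hides the incident instead of prompting a restore from backup. Any replacement in the plan could also destroy resources that still hold data. Always alert, then verify, then apply.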
Permissions matter: Driftctl can only compare resources it is allowed to read. Give its IAM role read-only permissions with no write actions at all. Make them broad enough to cover every service Terraform manages in that account.
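The first and last lessons can be sketched as a one-time setup script. The .driftignore format it assumes is one resource_type.resource_id entry per line, with * as a wildcard. Here the file ignores one legacy bucket and all IAM users. The script also assumes the AWS CLI and an existing IAM role for the scanner. Attaching the AWS-managed ReadOnlyAccess policy is one broad but read-only option, which you may want to narrow. The resource IDs, role name and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: one-time setup for quieter, properly scoped drift scans.
# The resource IDs and role name used here are hypothetical placeholders.

# Write a .driftignore for resources deliberately left out of Terraform.
# Assumed format: one resource_type.resource_id per line, * as a wildcard.
write_driftignore() {
  local dir="${1:-.}"
  cat > "${dir}/.driftignore" <<'EOF'
aws_s3_bucket.legacy-reports-bucket
aws_iam_user.*
EOF
}

# Attach the AWS-managed ReadOnlyAccess policy to the scanner's IAM role.
# Requires the AWS CLI and credentials allowed to call iam:AttachRolePolicy.
grant_scanner_read_only() {
  local role="$1"
  if [ -z "$role" ]; then
    echo "usage: grant_scanner_read_only ROLE_NAME" >&2
    return 1
  fi
  aws iam attach-role-policy \
    --role-name "$role" \
    --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess
}
```

Run write_driftignore in the directory you launch driftctl from, and grant_scanner_read_only with your scanner role's name.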

Conclusion

Infrastructure as Code is only as good as your ability to enforce it. By integrating Driftctl into your workflow, you move from "hoping" your environment matches your code to "knowing" it does.

Stop letting manual changes accumulate. Set up your scan, catch the drift early, and keep your production environment predictable. It's the only way to sleep soundly after a Friday afternoon deployment.
