Drift Detection using Terraform
Tools used:
Terraform
GitHub Actions
Slack webhook
Repo link: Link
Project Architecture

Open the image in a new tab to see it clearly.
Workflow
We have a GitHub repo with a dev branch.
The infrastructure team pushes code to the GitHub repo via a pull request.
We have two triggers: 1) a manual trigger and 2) a cron trigger (a cron expression), so the workflow runs either on a schedule or on demand.
Next, the workflow checks out the code and determines the environment from the branch.
Then we run Terraform against the infrastructure.
Drift detection: terraform plan compares the code in GitHub against the real infrastructure to find any drift.
At the decision gateway, if drift is found, we re-apply the configuration to the environment.
If drift is detected, the auto-fix is applied based on the logic above, followed by a Slack message about the update.
The GitHub issue opened for the drift is then closed.
If no drift occurred, we provide a report.
We created two backend configuration files, one for the dev environment and one for prod.
dev branch:
bucket = "day30-drift-detection-amals-dev"
key = "dev/terraform.tfstate"
region = "us-east-1"
use_lockfile = true
encrypt = true
prod (main branch):
bucket = "day30-drift-detection-amals-prod"
key = "prod/terraform.tfstate"
region = "us-east-1"
use_lockfile = true
encrypt = true
Both buckets must already exist before we trigger the GitHub Actions workflow.
GitHub Actions Workflow
1. Trigger Strategy
The workflow is triggered by:
Pull Requests to the main or dev branches (runs the Plan job).
Pushes (merges) to the main or dev branches (runs both Plan and Apply jobs).
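The trigger strategy above could be expressed roughly as follows. This is a minimal sketch based only on the description here, not a copy of the repo's workflow file.

```yaml
# Sketch of the CI/CD triggers described above (assumed, not taken from the repo)
on:
  pull_request:
    branches: [main, dev]   # PRs targeting main or dev run the Plan job
  push:
    branches: [main, dev]   # pushes/merges to main or dev run Plan and Apply
```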
2. Environment Management
It dynamically switches between environments based on the branch name:
main Branch: Deploys to the prod (Production) environment.
dev Branch: Deploys to the dev (Development) environment.
It uses environment-specific backend configurations: backend-prod.hcl and backend-dev.hcl.
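One way the branch-to-environment mapping might be implemented is sketched below. The step body is an assumption for illustration; only the ENVIRONMENT variable name and the backend file naming come from this article.

```yaml
# Hypothetical step: map the branch to an environment name
- name: Determine Environment
  run: |
    # On pull requests GITHUB_BASE_REF holds the target branch;
    # on pushes it is empty, so fall back to the pushed branch name.
    BRANCH="${GITHUB_BASE_REF:-$GITHUB_REF_NAME}"
    if [ "$BRANCH" = "main" ]; then
      echo "ENVIRONMENT=prod" >> "$GITHUB_ENV"
    else
      echo "ENVIRONMENT=dev" >> "$GITHUB_ENV"
    fi

# Later steps can then pick the matching backend file
- name: Terraform Init
  run: terraform init -reconfigure -backend-config="backend-${{ env.ENVIRONMENT }}.hcl"
```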
3. Job Workflow
The pipeline is split into two main stages:
Stage A: Terraform Plan
Validation: Runs terraform fmt and terraform validate to ensure code quality.
Visibility: If triggered by a Pull Request, it automatically comments the Terraform Plan directly onto the PR. This allows team members to review infrastructure changes before they are merged.
Artifacts: It saves the execution plan (tfplan) as a GitHub artifact to ensure that the exact same plan is used in the Apply stage.
Stage B: Terraform Apply
Strict Condition: This job only runs on a push/merge to the main or dev branches. It will not run on pull requests.
Execution: It downloads the plan artifact from the first stage and runs terraform apply.
Summary: After completion, it posts a summary of the deployed infrastructure and Terraform outputs to the GitHub Actions "Summary" page.
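The hand-off of the plan between the two stages might look something like this fragment. The job names, the artifact name, and the fixed backend-dev.hcl file are assumptions for illustration; the AWS credentials step and the `on:` block are left out for brevity.

```yaml
jobs:
  plan:                       # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.10.3
      # (AWS credentials step omitted in this sketch)
      - run: terraform init -backend-config="backend-dev.hcl"
      - run: terraform plan -no-color -out=tfplan   # save the plan to a file
      - uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: tfplan

  apply:                      # hypothetical job name
    needs: plan
    if: github.event_name == 'push'   # never runs on pull requests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.10.3
      # (AWS credentials step omitted in this sketch)
      - run: terraform init -backend-config="backend-dev.hcl"
      - uses: actions/download-artifact@v4
        with:
          name: tfplan
      - run: terraform apply tfplan   # applies exactly the saved plan
```

Applying a saved plan file means Terraform executes the changes that were reviewed, rather than computing a fresh plan at apply time.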
4. Security & Configuration
AWS Integration: Uses aws-actions/configure-aws-credentials with secrets (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY).
Terraform Version: Hardcoded to 1.10.3 for consistency across the team.
Permissions: Specifically requests pull-requests: write and issues: write permissions to allow the bot to comment on PRs.
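As a rough sketch, the permissions and credentials setup described above could be written like this. The contents: read line and the us-east-1 region (borrowed from the backend files) are assumptions.

```yaml
# Workflow-level permissions for the GITHUB_TOKEN
permissions:
  contents: read          # assumed: needed to check out the repo
  pull-requests: write    # lets the workflow comment on PRs
  issues: write           # lets the workflow create and update issues

# Inside a job's steps:
# - name: Configure AWS Credentials
#   uses: aws-actions/configure-aws-credentials@v2
#   with:
#     aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
#     aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
#     aws-region: us-east-1   # assumed to match the backend region
```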
Before moving forward, make sure you have:
Checked that the bucket creation rules and setup are correct.
Added the AWS secrets to GitHub.
Added the Slack webhook to GitHub. Create one at api.slack.com/apps.
Created dev and prod Environments in GitHub, with a manual approver on prod.
Enabled "Read and Write" workflow permissions in GitHub.
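To show how the GitHub Environments from this checklist come into play, here is a small sketch of a job that selects its environment from the branch. The job name is hypothetical; with a required reviewer configured on prod, GitHub pauses the job until someone approves it.

```yaml
jobs:
  terraform-apply:            # hypothetical job name
    runs-on: ubuntu-latest
    # main maps to the prod Environment (manual approval), anything else to dev
    environment: ${{ github.ref_name == 'main' && 'prod' || 'dev' }}
    steps:
      - run: echo "Running against the selected environment"
```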
The drift detection workflow is defined in the GitHub Actions files in the repo. Check out the file named terraform.yml to see how the workflow manages drift detection.
As mentioned, the drift detection workflow has two triggers: a cron schedule and a manual trigger.
on:
  schedule:
    - cron: "*/1 * * * *" # Requests a run every minute, for demo purposes only (not suitable for production); GitHub Actions runs scheduled workflows at most once every 5 minutes
  workflow_dispatch: # Allows manual triggering
Here is an overview of the original drift detection code. You can check out the drift_detection.yml file to find the full logic.
jobs:
  # previous steps
  ...
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4
      - name: Determine Environment
        id: env-vars
        run: |
          ... # execution steps
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          ... # credentials setup
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.10.3
      - name: Terraform Init
        run: terraform init -reconfigure -backend-config="backend-${{ env.ENVIRONMENT }}.hcl"
      - name: Terraform Plan (Drift Detection)
        id: plan
        run: |
          set +e
          terraform plan -detailed-exitcode -no-color > plan_output.txt 2>&1
          EXIT_CODE=$?
          echo "exitcode=$EXIT_CODE" >> $GITHUB_OUTPUT
          cat plan_output.txt
          exit 0
      - name: Analyze Drift
        if: steps.plan.outputs.exitcode == '2'
        uses: actions/github-script@v6
        env:
          PLAN_OUTPUT: plan_output.txt
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            .... # GitHub issue creation logic
      - name: Auto-Fix Drift
        if: steps.plan.outputs.exitcode == '2'
        id: apply
        run: |
          echo "Applying Terraform changes to fix drift..."
          terraform apply -auto-approve -no-color > apply_output.txt 2>&1
        continue-on-error: true
      - name: Notify Slack - Drift Detected & Fixed
        if: steps.plan.outputs.exitcode == '2' && steps.apply.outcome == 'success'
        run: |
          .... # Slack notification
      - name: Notify Slack - Auto-Fix Failed
        if: steps.plan.outputs.exitcode == '2' && steps.apply.outcome == 'failure'
        run: |
          .... # Slack notification
      - name: Update Issue on Success
        if: steps.plan.outputs.exitcode == '2' && steps.apply.outcome == 'success'
        uses: actions/github-script@v6
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            .... # Issue update logic
      - name: No Drift
        if: steps.plan.outputs.exitcode == '0'
        run: echo "No drift detected."
      - name: Terraform Plan Failure
        if: steps.plan.outputs.exitcode == '1'
        run: exit 1
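The Slack notification bodies are elided in the overview above. Purely as an illustration of what such a step could contain, here is a sketch that posts to a Slack incoming webhook with curl. The secret name SLACK_WEBHOOK_URL and the message text are assumptions, not the repo's actual values.

```yaml
- name: Notify Slack - Drift Detected & Fixed
  if: steps.plan.outputs.exitcode == '2' && steps.apply.outcome == 'success'
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}   # hypothetical secret name
  run: |
    # Incoming webhooks accept a JSON payload with a "text" field
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data "{\"text\": \"Drift detected and auto-fixed in ${ENVIRONMENT}\"}" \
      "$SLACK_WEBHOOK_URL"
```

The step conditions rely on terraform plan -detailed-exitcode, which returns 0 when there are no changes, 1 on error, and 2 when the plan contains changes (that is, drift).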
Testing the Drift
I changed the load balancer tag and pushed the code to the dev branch.


The change showed up in our workflow run:

Applying Drift
I changed the ManagedBy tag on the application load balancer directly in AWS. The GitHub Actions workflow runs terraform plan on its cron schedule to compare the real infrastructure with the code. If anything has changed, the drift shows up in the plan.


Before drift:

After detection:

It reverted the ManagedBy tag to the value we set in GitHub.
But the Slack message didn't work as I expected. Something seemed to be interfering with the run. My guess is that when GitHub Actions runs overlap, one run holds the lock on the state file, so the other process can't modify it. So I added a concurrency setting to our CI/CD workflow.
It worked.
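The concurrency setting mentioned above might look like this. The group name is an assumption; the idea is that runs sharing a group are queued one at a time, so two runs never compete for the Terraform state lock.

```yaml
# Workflow-level concurrency: one run per branch at a time
concurrency:
  group: drift-detection-${{ github.ref }}   # hypothetical group name
  cancel-in-progress: false                  # let the running job finish instead of cancelling it
```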

Now, I will merge the dev branch into the main branch.


Video Reference:
Arigato!