Imagine opening a pull request and seeing your entire infrastructure change laid out in the
comments — every resource to be created, modified, or destroyed — before anyone clicks merge. A
teammate approves the PR, it merges to main, and the infrastructure deploys itself. No one SSHed
into a jump box. No one ran `terraform apply` from their laptop at 4:55 PM on a Friday.
That’s the dream, and GitHub Actions makes it achievable without a dedicated platform team or an expensive CI/CD product. I’ve been running this pattern across Terraform and Ansible workloads for a while now, and the results speak for themselves: fewer misconfigurations, faster reviews, and engineers who actually trust the deployment process.
Let’s build it.
The Core Pattern: Plan in the PR, Apply on Merge
The fundamental workflow for Terraform CI/CD in GitHub Actions follows a two-phase pattern:
- On pull request — run `terraform plan` and post the output as a PR comment
- On merge to main — run `terraform apply` against the same code
This gives reviewers visibility into exactly what will change before they approve, and it ensures that only reviewed, approved changes reach your infrastructure. No surprises.
For Ansible, the pattern is similar in spirit but different in mechanics:
- On pull request — lint, syntax-check, and run Molecule tests
- On merge to main — execute the playbook against target environments
| Tool | PR Phase | Merge Phase |
|---|---|---|
| Terraform | plan + PR comment | apply |
| Ansible | ansible-lint + molecule test | ansible-playbook against target |
OIDC Authentication: No More Stored Secrets
Before we write any workflows, let’s talk authentication. If you’re still storing cloud credentials as long-lived GitHub secrets, stop. GitHub Actions supports OpenID Connect (OIDC) federation, which means your workflow gets a short-lived token from GitHub’s OIDC provider, presents it to your cloud (Azure, AWS, GCP), and receives temporary credentials in return.
No secrets to rotate. No credentials to leak. Nothing stored in your repository.
For Azure, the setup looks like this:
- Create or use an existing App Registration (Service Principal) in Entra ID
- Add a Federated Identity Credential that trusts tokens from your GitHub repo
- Scope the trust to specific branches or environments (e.g., only `main`, only the `production` environment)
- Set `id-token: write` in your workflow permissions
The Terraform azurerm provider picks this up automatically when `ARM_USE_OIDC=true` is set. Your workflow authenticates without a single long-lived credential stored in GitHub. (The client, tenant, and subscription IDs kept as secrets are identifiers, not passwords.)
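For reference, step 2 can be done from the Azure CLI. A sketch, assuming your App Registration's client ID is in `$APP_ID` and the repository is `github-org/my-infra` (both placeholders — substitute your own):

```shell
# Create a federated credential that trusts OIDC tokens issued by GitHub
# for workflow runs on this repo's main branch.
az ad app federated-credential create \
  --id "$APP_ID" \
  --parameters '{
    "name": "github-main-branch",
    "issuer": "https://token.actions.githubusercontent.com",
    "subject": "repo:github-org/my-infra:ref:refs/heads/main",
    "audiences": ["api://AzureADTokenExchange"]
  }'
```

To scope trust to a GitHub environment instead of a branch, the subject changes to `repo:github-org/my-infra:environment:production`.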
Complete Terraform CI/CD Workflow
Here’s a complete, production-ready workflow. It plans on PRs, posts the output as a comment, and applies on merge — with concurrency guards and OIDC auth baked in.
```yaml
name: "Terraform CI/CD"

on:
  pull_request:
    branches: [main]
    paths: ["infra/terraform/**"]
  push:
    branches: [main]
    paths: ["infra/terraform/**"]

permissions:
  contents: read
  pull-requests: write
  id-token: write # Required for OIDC

concurrency:
  group: terraform-${{ github.ref }}
  cancel-in-progress: false # Never cancel an in-progress apply

env:
  TF_DIR: infra/terraform
  ARM_USE_OIDC: true
  ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
  ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
  ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}

jobs:
  plan:
    name: "Terraform Plan"
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.9.x"
      - name: Azure Login (OIDC)
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Terraform Init
        working-directory: ${{ env.TF_DIR }}
        run: terraform init -input=false
      - name: Terraform Validate
        working-directory: ${{ env.TF_DIR }}
        run: terraform validate
      - name: Terraform Plan
        id: plan
        working-directory: ${{ env.TF_DIR }}
        run: terraform plan -no-color -input=false -out=tfplan
        continue-on-error: true
      - name: Post Plan to PR
        uses: borchero/terraform-plan-comment@v2
        with:
          token: ${{ github.token }}
          planfile: ${{ env.TF_DIR }}/tfplan
          working-directory: ${{ env.TF_DIR }}
      - name: Fail on Plan Error
        if: steps.plan.outcome == 'failure'
        run: exit 1

  apply:
    name: "Terraform Apply"
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production # Requires approval
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.9.x"
      - name: Azure Login (OIDC)
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Terraform Init
        working-directory: ${{ env.TF_DIR }}
        run: terraform init -input=false
      - name: Terraform Apply
        working-directory: ${{ env.TF_DIR }}
        run: terraform apply -auto-approve -input=false -lock-timeout=5m
```
A few things worth calling out:
- The `concurrency` group prevents two applies from running simultaneously. The `cancel-in-progress: false` setting ensures a running apply is never killed mid-execution — that's how you end up with half-deployed infrastructure and a corrupted state file.
- `-lock-timeout=5m` on the apply step tells Terraform to wait up to five minutes for the state lock instead of failing immediately. This is your second layer of protection if concurrency controls somehow overlap.
- `environment: production` on the apply job ties it to a GitHub environment, which you can configure with required reviewers, wait timers, and branch restrictions.
Complete Ansible CI/CD Workflow
Ansible pipelines need a different approach. There’s no plan equivalent, so we lean harder on
linting, syntax validation, and Molecule testing before anything touches a real host.
```yaml
name: "Ansible CI/CD"

on:
  pull_request:
    branches: [main]
    paths: ["infra/ansible/**"]
  push:
    branches: [main]
    paths: ["infra/ansible/**"]

concurrency:
  group: ansible-${{ github.ref }}
  cancel-in-progress: false

jobs:
  lint-and-test:
    name: "Lint & Test"
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - name: Install Dependencies
        run: |
          pip install ansible ansible-lint molecule molecule-docker yamllint
      - name: YAML Lint
        run: yamllint -s infra/ansible/
      - name: Ansible Lint
        run: ansible-lint infra/ansible/
      - name: Ansible Syntax Check
        run: |
          ansible-playbook infra/ansible/site.yml --syntax-check
      - name: Molecule Test
        working-directory: infra/ansible/roles/webserver
        run: molecule test
        env:
          MOLECULE_DISTRO: ubuntu2204

  deploy:
    name: "Deploy to ${{ matrix.environment }}"
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: ${{ matrix.environment }}
    strategy:
      max-parallel: 1
      matrix:
        environment: [staging, production]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - name: Install Ansible
        run: pip install ansible
      - name: Write Vault Password
        run: echo "${{ secrets.ANSIBLE_VAULT_PASSWORD }}" > .vault_pass
        shell: bash
      - name: Write SSH Key
        run: |
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > deploy_key
          chmod 600 deploy_key
      - name: Run Playbook
        run: |
          ansible-playbook infra/ansible/site.yml \
            -i infra/ansible/inventory/${{ matrix.environment }}.yml \
            --vault-password-file .vault_pass \
            --private-key deploy_key \
            -e "target_env=${{ matrix.environment }}"
      - name: Cleanup Secrets
        if: always()
        run: rm -f .vault_pass deploy_key
```
Key design decisions:
- `max-parallel: 1` with the matrix strategy deploys to staging first, then production. This gives you sequential promotion through environments without duplicating the entire job.
- Vault password handling writes the password to a temporary file, uses it, and cleans up in an `if: always()` step so it's removed even if the playbook fails.
- Environment-specific secrets live in GitHub environments. Staging and production each have their own SSH keys, vault passwords, and inventory files.
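For context, a per-environment inventory file might look like this. A sketch only — the group, host names, and addresses are hypothetical:

```yaml
# infra/ansible/inventory/staging.yml (illustrative)
all:
  children:
    webservers:
      hosts:
        staging-web-01:
          ansible_host: 10.20.0.11
        staging-web-02:
          ansible_host: 10.20.0.12
      vars:
        ansible_user: deploy
```

Because the deploy job interpolates `${{ matrix.environment }}` into the inventory path, adding an environment is just a new file plus a new matrix entry.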
Environment Protection Rules: Your Safety Net
GitHub environments are the backbone of safe IaC deployments. Here’s how I configure them:
| Environment | Protection Rules | Purpose |
|---|---|---|
| `dev` | None — auto-deploy | Fast feedback loop |
| `staging` | Branch restriction (`main` only) | Only tested code reaches staging |
| `production` | Required reviewers + 10-min wait timer | Human approval + cool-off period |
To set this up, go to Settings > Environments in your repo. Create each environment and
configure its protection rules. The production environment should require at least one reviewer
from your infrastructure team, and the wait timer gives you a window to cancel if someone spots an
issue after approval.
Each environment gets its own secrets too. Your staging Azure subscription ID is different from production, your Ansible inventory points to different hosts, and your vault passwords can be rotated independently. GitHub Actions only exposes environment secrets to jobs that reference that environment and pass all protection rules.
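Protection rules can also be scripted. A sketch using `gh api` against GitHub's "create or update an environment" REST endpoint — the repo path and team ID are placeholders:

```shell
# Configure the production environment: 10-minute wait timer, a required
# reviewer team, and deployments restricted to protected branches.
gh api --method PUT "repos/github-org/my-infra/environments/production" \
  --input - <<'EOF'
{
  "wait_timer": 10,
  "reviewers": [{ "type": "Team", "id": 123456 }],
  "deployment_branch_policy": {
    "protected_branches": true,
    "custom_branch_policies": false
  }
}
EOF
```

This is handy when you manage many repositories and want identical protection rules everywhere.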
Multi-Environment Promotion: Dev to Staging to Prod
For organizations running multiple environments, here’s the promotion pattern I recommend:
```yaml
# Simplified multi-environment Terraform workflow
jobs:
  plan-dev:
    if: github.event_name == 'pull_request'
    uses: ./.github/workflows/terraform-reusable.yml
    with:
      environment: dev
      tf_var_file: environments/dev.tfvars

  apply-dev:
    if: github.event_name == 'push'
    needs: []
    uses: ./.github/workflows/terraform-reusable.yml
    with:
      environment: dev
      tf_var_file: environments/dev.tfvars
      apply: true

  apply-staging:
    if: github.event_name == 'push'
    needs: [apply-dev]
    uses: ./.github/workflows/terraform-reusable.yml
    with:
      environment: staging
      tf_var_file: environments/staging.tfvars
      apply: true

  apply-production:
    if: github.event_name == 'push'
    needs: [apply-staging]
    uses: ./.github/workflows/terraform-reusable.yml
    with:
      environment: production
      tf_var_file: environments/production.tfvars
      apply: true
```
The needs chain creates a linear promotion: dev must succeed before staging starts, and staging
must succeed before production. Combined with environment protection rules, production won’t even
begin until someone manually approves it.
Use reusable workflows (`uses: ./.github/workflows/...`) to avoid duplicating your Terraform
init/plan/apply logic across every environment. One workflow, parameterized by environment.
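A minimal sketch of what `terraform-reusable.yml` could look like, assuming the inputs used above (`environment`, `tf_var_file`, `apply`) and the `infra/terraform` directory from earlier:

```yaml
# .github/workflows/terraform-reusable.yml (sketch)
name: "Terraform Reusable"

on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
      tf_var_file:
        required: true
        type: string
      apply:
        required: false
        type: boolean
        default: false

permissions:
  contents: read
  id-token: write

jobs:
  terraform:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Init & Plan
        working-directory: infra/terraform
        run: |
          terraform init -input=false
          terraform plan -input=false -var-file="${{ inputs.tf_var_file }}" -out=tfplan
      - name: Apply
        if: inputs.apply
        working-directory: infra/terraform
        run: terraform apply -input=false tfplan
```

Callers typically also pass `secrets: inherit` so the OIDC login and backend credentials flow through to the reusable workflow.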
Hands-On Lab: Combined Terraform + Ansible Pipeline
Let’s build a realistic pipeline where Terraform provisions infrastructure and Ansible configures it. This is the pattern I use in production: Terraform creates the VMs, and Ansible installs the software.
Step 1: Repository Structure
```
my-infra/
├── .github/workflows/
│   └── infra-cicd.yml
├── terraform/
│   ├── main.tf
│   ├── outputs.tf
│   └── dev.tfvars
└── ansible/
    ├── site.yml
    ├── inventory/
    │   └── dev.yml
    └── roles/
        └── webserver/
```
Step 2: The Combined Workflow
```yaml
name: "Infrastructure CI/CD"

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: read
  pull-requests: write
  id-token: write

concurrency:
  group: infra-${{ github.ref }}
  cancel-in-progress: false

jobs:
  terraform-plan:
    name: "Terraform Plan"
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Init & Plan
        working-directory: terraform
        run: |
          terraform init -input=false
          terraform plan -no-color -var-file=dev.tfvars -out=tfplan

  ansible-lint:
    name: "Ansible Lint & Syntax"
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install ansible ansible-lint yamllint
      - run: yamllint ansible/
      - run: ansible-lint ansible/
      - run: ansible-playbook ansible/site.yml --syntax-check

  terraform-apply:
    name: "Provision Infrastructure"
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: dev
    outputs:
      host_ip: ${{ steps.output.outputs.host_ip }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false
      - name: Apply
        working-directory: terraform
        run: |
          terraform init -input=false
          terraform apply -auto-approve -var-file=dev.tfvars
      - name: Get Outputs
        id: output
        working-directory: terraform
        run: echo "host_ip=$(terraform output -raw vm_public_ip)" >> "$GITHUB_OUTPUT"

  ansible-configure:
    name: "Configure Infrastructure"
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    needs: [terraform-apply]
    runs-on: ubuntu-latest
    environment: dev
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install ansible
      - name: Write Dynamic Inventory
        run: |
          cat > ansible/inventory/dynamic.yml <<EOF
          all:
            hosts:
              web:
                ansible_host: ${{ needs.terraform-apply.outputs.host_ip }}
                ansible_user: azureuser
          EOF
      - name: Write SSH Key
        run: |
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > deploy_key
          chmod 600 deploy_key
      - name: Run Playbook
        run: |
          ansible-playbook ansible/site.yml \
            -i ansible/inventory/dynamic.yml \
            --private-key deploy_key
      - name: Cleanup
        if: always()
        run: rm -f deploy_key
```
This is the real power of combining Terraform and Ansible in CI/CD: Terraform provisions the VM, passes its IP address as a job output, and Ansible picks it up to configure the host. One merge, two tools, fully automated.
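For this handoff to work, the Terraform configuration has to expose the IP as an output. A sketch of `terraform/outputs.tf`, assuming an `azurerm_public_ip` resource named `web` (the resource name is an assumption, not from the lab code):

```hcl
# terraform/outputs.tf (sketch)
output "vm_public_ip" {
  description = "Public IP of the provisioned VM, consumed by the Ansible job"
  value       = azurerm_public_ip.web.ip_address
}
```

Note the `terraform_wrapper: false` setting in the apply job matters here: the setup-terraform wrapper adds extra output that would corrupt `terraform output -raw`.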
Troubleshooting Guide
| Problem | Cause | Fix |
|---|---|---|
| `Error acquiring the state lock` | Concurrent apply or crashed run left a stale lock | Add a `concurrency` group to prevent parallel runs. Use `-lock-timeout=5m`. As a last resort, `terraform force-unlock LOCK_ID` |
| Plan comment not appearing on PR | Missing pull-requests: write permission | Add permissions: pull-requests: write at the workflow level |
| OIDC login fails with “No matching federated credential” | Subject claim mismatch | Verify the federated credential entity type matches (branch, environment, or PR) and the repo/org are correct |
| Ansible vault decryption fails | Vault password secret is empty or has trailing newline | Re-save the secret in GitHub without trailing whitespace. Use echo -n when creating it |
| Molecule tests pass locally but fail in CI | Missing Docker socket or different Python version | Use molecule-docker driver and pin your Python version in setup-python |
| Apply job runs but skips environment approval | Job doesn’t reference the environment key | Add environment: production to the job definition |
| Terraform plan shows changes but apply says “no changes” | Code changed between plan and apply | Enable branch protection: require branches to be up-to-date before merging |
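The vault-password gotcha is worth seeing concretely: `echo` appends a trailing newline, `printf '%s'` does not, and that single extra byte is enough to make decryption fail. The file names here are arbitrary:

```shell
# Write the same password two ways and compare sizes.
printf '%s' 'hunter2' > vault_a   # no trailing newline: 7 bytes
echo 'hunter2' > vault_b          # trailing newline: 8 bytes
wc -c vault_a vault_b
```

The same logic applies when pasting the secret into GitHub — any trailing whitespace becomes part of the password.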
State Locking: Your Double Safety Net
Terraform state locking deserves special attention in CI/CD. When you run `terraform apply`,
Terraform acquires a lock on the state file to prevent concurrent modifications. In a pipeline,
this matters a lot because merge commits can trigger overlapping runs.
The defense-in-depth approach:
- GitHub Actions concurrency groups — prevent two workflow runs from executing the same Terraform directory simultaneously
- Terraform state locking — the backend (Azure Storage, S3, etc.) provides an additional lock at the state file level
- `-lock-timeout=5m` — if a lock is held, wait instead of failing immediately
If you hit a stale lock from a crashed pipeline run, check that the run actually failed (don't
unlock a state that's actively being applied). Then use `terraform force-unlock <LOCK_ID>` — but
only after confirming no apply is in progress.
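In practice that recovery looks something like this. The lock ID is the UUID printed in the `Error acquiring the state lock` message:

```shell
# Only run this after confirming the run that held the lock is dead.
cd infra/terraform
terraform force-unlock <LOCK_ID>   # prompts for confirmation; -force skips the prompt
```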
What’s Next
Next post: Dependabot and Supply Chain Security. We’ll dig into automated dependency updates, how to configure Dependabot for Terraform providers and Ansible collections, and what supply chain attacks actually look like in the infrastructure world.
Happy automating!