Version Control for Infrastructure Code

Early in my career, our team managed infrastructure the way a lot of shops still do: a shared folder on a file server, a naming convention that nobody followed consistently, and a prayer that two people weren’t editing the same firewall rule at the same time. One Friday afternoon, a colleague overwrote my subnet changes with an older copy of the config. We didn’t discover it until Monday morning when a deployment failed and took a production service down with it. There was no history, no audit trail, and no way to figure out what the “right” version was without comparing files line by line.

That was the moment I stopped treating infrastructure files like documents and started treating them like code. If your Terraform modules, Ansible playbooks, and CI/CD pipelines aren’t in version control with proper workflows around them, you’re one bad overwrite away from the same kind of Monday morning I had.

This post covers how to apply real Git workflows to your infrastructure code: repo structures, branching strategies, commit conventions, CI guards, secrets management, and state file hygiene. Whether you’re working with Terraform, Ansible, or both, these patterns help keep your infrastructure safe, traceable, and collaborative.

Why Version Control Matters More for Infrastructure

Application bugs are bad. Infrastructure bugs can be catastrophic. A misconfigured security group can expose an entire database to the internet. A deleted subnet can take down every service running in it. Version control gives you three things that are non-negotiable for infrastructure work:

Traceability — Who changed what, when, and why. When an auditor asks how a firewall rule got added, you point to a commit, not a Slack thread.
Collaboration — Every change goes through review before it touches real resources. No more “I’ll just push this quick fix” that breaks staging.
Rollback — When something goes wrong (and it will), you can restore a known-good state instead of scrambling to recreate it from memory.

If you’re already using Git for application code, the good news is that the same tool works perfectly for infrastructure. The workflows just need some adjustments.

Repo Structure: Monorepo vs Polyrepo

Before you write a single line of Terraform, decide how you’re going to organize your repositories. There are two common approaches, and each has tradeoffs.

Aspect	Monorepo	Polyrepo
Layout	All IaC (Terraform, Ansible, Helm) in one repo	Separate repos per tool or team
Dependency tracking	Easier — everything is co-located	Harder — cross-repo coordination needed
CI complexity	Higher — need path-based triggers	Lower — each repo has focused pipelines
Access control	Repo-level only (CODEOWNERS can require per-path reviewers, but it does not restrict who can read the repo)	Fine-grained per repo
Best for	Small-to-medium teams, tightly coupled infra	Large orgs, multiple teams, distinct ownership

For most teams I’ve worked with, a monorepo with clear directory structure is the sweet spot. You get a single source of truth without the overhead of coordinating across a dozen repositories.

Here’s a directory layout I’ve used successfully on multiple projects:

infra/
  terraform/
    modules/
      networking/
      compute/
      storage/
    envs/
      dev/
        main.tf
        variables.tf
        backend.tf
      staging/
        main.tf
        variables.tf
        backend.tf
      prod/
        main.tf
        variables.tf
        backend.tf
  ansible/
    roles/
      webserver/
      database/
    inventories/
      dev/
      prod/
    playbooks/
      deploy.yml
      patch.yml
  .github/
    workflows/
      terraform-ci.yml
      ansible-lint.yml
  .gitignore
  README.md

The key is separating modules (reusable building blocks) from environments (where those modules get deployed). Each environment directory has its own backend configuration and variables, so a plan or apply run from dev/ only touches dev state, which makes it much harder to apply dev changes to prod by accident.

Branching Strategies for Infrastructure

I covered branching strategies in depth in a previous post, but infrastructure code has some specific considerations worth calling out.

Trunk-based development is my recommendation for most infrastructure teams. Keep main as your source of truth, use short-lived feature branches for changes, and merge back quickly. Here’s why it works well for IaC:

Infrastructure changes tend to be small and focused (add a resource, modify a rule, update a tag).
Long-lived branches create drift between what’s in Git and what’s actually deployed.
Short branches mean fewer merge conflicts on shared files like variables.tf.

Environment branches (separate dev, staging, prod branches) are tempting but create maintenance headaches. You end up cherry-picking changes between branches and losing track of which environment has which version. Instead, use the directory-based approach above — one branch, separate environment folders.

A typical workflow looks like this:

# Start from an up-to-date main
git checkout main
git pull origin main

# Create a feature branch
git checkout -b feature/add-cdn-distribution

# Make your changes
vim infra/terraform/envs/prod/main.tf

# Commit with a conventional message (-a stages tracked files only)
git commit -am "feat(cdn): add CloudFront distribution for static assets"

# Push, set the upstream, and open a PR
git push -u origin feature/add-cdn-distribution

The PR triggers CI checks (more on that below), a teammate reviews the plan output, and you merge to main. Clean and predictable.

Conventional Commits for Infrastructure

Consistent commit messages are more than a nice-to-have for infrastructure code. They make git log actually useful when you’re troubleshooting at 2 AM, and they enable automated changelogs and release notes.

I use the Conventional Commits specification adapted for infrastructure work:

Prefix	When to use it	Example
`feat`	New resource or capability	`feat(network): add bastion host for SSH access`
`fix`	Bug fix or misconfiguration	`fix(sg): correct ingress rule for port 443`
`refactor`	Restructuring without behavior change	`refactor(modules): split monolithic network module`
`chore`	Maintenance, dependency updates	`chore(providers): bump azurerm to 4.x`
`ci`	Pipeline or workflow changes	`ci: add tflint step to PR workflow`
`docs`	Documentation updates	`docs: add runbook for DR failover`

The scope in parentheses maps to the infrastructure component (or, if your team prefers, the environment) being changed. This makes filtering history straightforward:

# Show all networking changes
git log --oneline --grep="network"

# Show all production fixes
git log --oneline --grep="fix(prod"
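
Filtering only works if the messages are consistent, so it can help to check them before they land. The following is a minimal sketch of such a check, assuming the six prefixes from the table above and an optional lowercase scope; the function name is_conventional is hypothetical, not part of Git or the Conventional Commits tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if a commit subject line follows the
# "type(scope): summary" shape from the table above, non-zero otherwise.
# The allowed types mirror that table; adjust the list to your own rules.
is_conventional() {
  local subject="$1"
  local pattern='^(feat|fix|refactor|chore|ci|docs)(\([a-z0-9_-]+\))?: .+'
  printf '%s\n' "$subject" | grep -Eq "$pattern"
}

# Possible use inside .git/hooks/commit-msg, where $1 is the message file:
#   is_conventional "$(head -n 1 "$1")" || { echo "bad commit subject" >&2; exit 1; }
```

A client-side hook like this is only a convenience, since hooks are not shared through the repo by default. The same function could run in CI against the PR title if you squash on merge.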

Use tags to mark deployments so you always know what’s running in each environment:

git tag -a prod-v2.4.0 -m "Prod deploy: CDN + WAF rules"
git push origin prod-v2.4.0
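
To answer "what is running in prod right now?" from those tags, you can pick the highest version for an environment. This is a rough sketch that assumes every deploy tag follows the env-vX.Y.Z shape used above; latest_deploy_tag is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads tag names on stdin (one per line) and prints
# the highest version for the given environment. Assumes tags are named
# "<env>-vMAJOR.MINOR.PATCH"; anything else is ignored.
latest_deploy_tag() {
  local env="$1"
  sed -n "s/^${env}-v\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)\$/\1/p" |
    sort -t. -k1,1n -k2,2n -k3,3n |
    tail -n 1 |
    sed "s/^/${env}-v/"
}

# Possible use:
#   git tag --list | latest_deploy_tag prod
```

The numeric sort on each dot-separated field is what keeps 2.10.0 above 2.9.1, which a plain alphabetical sort would get wrong.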

Pull Request Workflows with Plan Output

Every infrastructure change should go through a pull request, even on a team of one. The PR is your audit trail and your safety net.

What makes infra PRs different from application PRs is the plan output. A code review can tell you whether the Terraform looks right, but only terraform plan tells you what will actually happen. Always attach it.

You can capture plan output locally and paste it into the PR by hand:

# Generate and capture the plan
terraform plan -out=tfplan -no-color
terraform show -no-color tfplan > plan_output.txt

Or better yet, automate it in your CI pipeline (see the next section). The reviewer should see exactly which resources will be created, modified, or destroyed before approving.
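
Destroys are the part of a plan most worth a second look. As a sketch, the summary line Terraform prints at the end of a plan ("Plan: N to add, N to change, N to destroy.") can be parsed to flag them; plan_destroy_count is a hypothetical helper, and it assumes plain-text output produced with -no-color.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads plain-text plan output on stdin and prints the
# number of resources Terraform intends to destroy. Prints 0 when there is
# no "Plan: ..." summary line (for example, when there are no changes).
plan_destroy_count() {
  local n
  n="$(sed -n 's/^Plan: .* \([0-9][0-9]*\) to destroy\..*/\1/p' | tail -n 1)"
  printf '%s\n' "${n:-0}"
}

# Possible use with the files captured above:
#   count="$(plan_destroy_count < plan_output.txt)"
#   [ "$count" -eq 0 ] || echo "WARNING: plan destroys $count resource(s)" >&2
```

This is a convenience for reviewers, not a replacement for reading the plan. A resource that is replaced counts as both an add and a destroy.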

PR checklist for infrastructure changes:

terraform fmt -check (for Terraform) or ansible-lint (for Ansible) passes cleanly
terraform validate succeeds
terraform plan output is attached and reviewed
At least one peer has approved
CI checks are green
No secrets in the diff
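
The last checklist item can be partly automated. Below is a minimal sketch that scans the added lines of a diff for a few credential-shaped patterns. The patterns are illustrative assumptions and far from exhaustive, and find_secrets_in_diff is a hypothetical name; a dedicated secret scanner will catch much more.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a unified diff on stdin and prints added lines
# that look like credentials. Exit status is 0 when something is found.
# The three patterns (AWS access key ID, PEM private key header, quoted
# password/secret/token assignment) are examples only.
find_secrets_in_diff() {
  grep -E '^\+' |
    grep -Ev '^\+\+\+ ' |
    grep -E -e 'AKIA[0-9A-Z]{16}' \
            -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
            -e '(password|secret|token)[[:space:]]*[=:][[:space:]]*"[^"]+"'
}

# Possible use as a pre-commit check:
#   if git diff --cached | find_secrets_in_diff; then
#     echo "possible secret in staged changes" >&2; exit 1
#   fi
```

Only lines starting with "+" are checked, so removing a secret from a file does not trip the check, although the secret is still in history and still needs rotating.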

CI Guards: Linting, Validation, and Plan

Automated CI checks are your first line of defense. They catch problems before a human reviewer even looks at the PR. Here’s a GitHub Actions workflow I use for Terraform projects:

name: Terraform CI
on:
  pull_request:
    paths:
      - "infra/terraform/**"

permissions:
  contents: read
  pull-requests: write

jobs:
  terraform-checks:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra/terraform/envs/dev
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.9.x

      - name: Terraform Format Check
        run: terraform fmt -check -recursive -diff

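      # Plan needs an initialized backend, so init cannot use -backend=false.
      # Assumes cloud credentials are supplied to the job (for example via
      # repository secrets or OIDC); that wiring is not shown here.
      - name: Terraform Init
        run: terraform init -input=false

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -input=false
        continue-on-error: true

      - name: Comment Plan on PR
        uses: actions/github-script@v7
        env:
          # Pass the plan through the environment so backticks or ${...}
          # in the output cannot break the script below.
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            const output = [
              "#### Terraform Plan",
              "```",
              process.env.PLAN,
              "```",
              `*Triggered by @${context.actor}*`
            ].join("\n");
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            });

      # continue-on-error let the comment post; now fail the job if plan failed.
      - name: Fail if Plan Failed
        if: steps.plan.outcome == 'failure'
        run: exit 1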

For Ansible, add a separate workflow:

name: Ansible Lint
on:
  pull_request:
    paths:
      - "infra/ansible/**"

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install ansible-lint
        run: pip install ansible-lint

      - name: Run ansible-lint
        run: ansible-lint infra/ansible/playbooks/

The paths filter is important — you don’t want Terraform CI running when someone updates an Ansible playbook, and vice versa.
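
The paths filter decides whether the workflow runs, but the workflow above still plans only envs/dev. If you want CI to work out which environments a PR touched, one possible approach is to derive them from the changed file paths. This sketch assumes the infra/terraform/envs/ layout shown earlier; changed_envs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads changed file paths on stdin (one per line) and
# prints each environment directory name under infra/terraform/envs/ that
# was touched, de-duplicated and sorted. Other paths are ignored.
changed_envs() {
  sed -n 's#^infra/terraform/envs/\([^/]*\)/.*#\1#p' | sort -u
}

# Possible use in CI (assumes the PR targets main and origin/main is fetched):
#   git diff --name-only origin/main...HEAD | changed_envs
```

Note that a change under modules/ matches no environment here, even though it can affect every environment that uses the module. A real pipeline would probably treat module changes as "plan everything".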

Handling Secrets in Git

This is the one area where infrastructure code absolutely requires extra care. Never commit secrets to Git. Not even to a private repo. Not even “just temporarily.” Git history is forever, and rotating leaked credentials is painful.

Here’s a layered approach:

Layer 1: .gitignore

Your first defense is preventing secrets from being staged in the first place. Here’s a combined .gitignore for a Terraform + Ansible project:

# Terraform
.terraform/
*.tfstate
*.tfstate.*
*.tfvars
*.tfvars.json
crash.log
crash.*.log
override.tf
override.tf.json
*_override.tf
*_override.tf.json
.terraform.tfstate.lock.info
.terraformrc
terraform.rc
*tfplan*

# Ansible
*.retry
vault_password_file
*.vault.yml
!ansible.cfg

# General secrets
.env
*.pem
*.key
credentials.json

Layer 2: Encryption Tools

When you need secrets in the repo (like Ansible vault files or environment-specific configs), encrypt them:

ansible-vault — Built into Ansible, encrypts entire files or individual variables with a password.
SOPS (Secrets OPerationS) — Works with AWS KMS, Azure Key Vault, GCP KMS, or PGP. Encrypts values but leaves keys readable, so you can still see the structure of your config.
git-crypt — Transparent encryption in Git. Files are encrypted on commit and decrypted on checkout.

# Ansible Vault: encrypt a file
ansible-vault encrypt group_vars/all/secrets.yml

# Ansible Vault: encrypt a single variable
ansible-vault encrypt_string 'my-db-password' --name 'db_password'

# SOPS: encrypt a file using AWS KMS
sops --encrypt --kms "arn:aws:kms:us-east-1:123:key/abc" secrets.yaml > secrets.enc.yaml

Layer 3: External Secret Managers

For production workloads, reference secrets from a vault rather than storing them in the repo at all:

HashiCorp Vault with the Terraform Vault provider
AWS Secrets Manager or Azure Key Vault via data sources
Environment variables injected by your CI/CD platform
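
For the environment-variable route, Terraform reads any variable named TF_VAR_<name> as the value of the input variable <name>. A minimal sketch follows, assuming your CI platform injects a secret as DB_PASSWORD and your config declares a variable called db_password; both names, and the function load_tf_secrets, are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map CI-injected secrets onto Terraform input variables.
# DB_PASSWORD and db_password are illustrative names, not from a real config.
load_tf_secrets() {
  # ${VAR:?message} aborts the (non-interactive) shell with the message
  # when VAR is unset or empty, so a missing secret fails fast.
  export TF_VAR_db_password="${DB_PASSWORD:?DB_PASSWORD is not set}"
}

# Possible use in a CI step:
#   load_tf_secrets && terraform plan -input=false
```

Keep in mind that a value passed this way still ends up in the state file if a resource stores it, which is one more reason to keep state out of Git.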

Terraform State: Keep It Out of Git

This deserves its own section because it’s the single most common mistake I see teams make. Do not commit terraform.tfstate to your repository. Ever.

Here’s why:

State files contain secrets in plain text. Database passwords, API keys, private IPs — all stored unencrypted in the state file.
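No locking. Git has no built-in way to stop two engineers from running terraform apply at the same time, and concurrent applies against separate copies of the state can leave it corrupted or out of sync with what is actually deployed.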
Merge conflicts are catastrophic. A bad merge on a state file can cause Terraform to destroy and recreate resources.
State changes on every apply. You’d be committing after every single operation, cluttering your history.

Use a remote backend instead:

# AWS S3 backend with DynamoDB locking
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "envs/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

# Azure Storage backend
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "stterraformstate"
    container_name       = "tfstate"
    key                  = "prod.terraform.tfstate"
  }
}

Remote backends give you automatic locking (via DynamoDB or Azure blob leases), encryption at rest, and a single source of truth that doesn’t depend on anyone remembering to commit.
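
Once state lives in a remote backend, it is worth guarding against it creeping back into the repo. One way to do that, sketched here under the assumption that state files keep Terraform’s default .tfstate naming, is to scan the list of tracked files; tracked_state_files is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads tracked file paths on stdin and prints any whose
# file name matches *.tfstate or *.tfstate.* (state files and backups).
# Exit status is 0 when at least one such file is found.
tracked_state_files() {
  grep -E '(^|/)[^/]*\.tfstate(\.[^/]*)?$'
}

# Possible use in CI:
#   if git ls-files | tracked_state_files; then
#     echo "Terraform state is tracked in Git" >&2; exit 1
#   fi
```

This complements the .gitignore entries rather than replacing them, because .gitignore has no effect on a file that is already tracked.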

Hands-On Lab

Let’s put this all together. You’ll create a Git repository with Terraform and Ansible code, a proper .gitignore, and a CI workflow.

Step 1: Create the repo and directory structure

mkdir infra-vc-lab && cd infra-vc-lab
git init

mkdir -p infra/terraform/envs/dev
mkdir -p infra/terraform/modules/storage
mkdir -p infra/ansible/playbooks
mkdir -p .github/workflows

Step 2: Add a .gitignore

cat > .gitignore << 'EOF'
# Terraform
.terraform/
*.tfstate
*.tfstate.*
*.tfvars
*.tfvars.json
crash.log
crash.*.log
override.tf
override.tf.json
*_override.tf
*_override.tf.json
.terraform.tfstate.lock.info
.terraformrc
terraform.rc
*tfplan*

# Ansible
*.retry
vault_password_file

# Secrets
.env
*.pem
*.key
EOF

Step 3: Add a basic Terraform config

cat > infra/terraform/envs/dev/main.tf << 'EOF'
terraform {
  required_version = ">= 1.9.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.region
}

variable "region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

resource "aws_s3_bucket" "logs" {
  bucket = "my-infra-lab-logs-bucket"
  tags = {
    Environment = "dev"
    ManagedBy   = "terraform"
  }
}
EOF

Step 4: Add a basic Ansible playbook

cat > infra/ansible/playbooks/ping.yml << 'EOF'
---
- name: Verify connectivity
  hosts: all
  gather_facts: false
  tasks:
    - name: Ping all hosts
      ansible.builtin.ping:
EOF

Step 5: Add a GitHub Actions workflow

cat > .github/workflows/terraform-ci.yml << 'EOF'
name: Terraform CI
on:
  pull_request:
    paths:
      - "infra/terraform/**"
jobs:
  validate:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra/terraform/envs/dev
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check -recursive
      - run: terraform init -backend=false
      - run: terraform validate
EOF

Step 6: Commit and push

git add .
git commit -m "feat: scaffold infra repo with terraform, ansible, and CI"
git branch -M main

# If you have a GitHub remote:
# git remote add origin https://github.com/youruser/infra-vc-lab.git
# git push -u origin main

Step 7: Test the workflow

Create a feature branch, make a change, and open a PR:

git checkout -b feature/add-bucket-versioning

# Add versioning to the S3 bucket in main.tf, then:
git add .
git commit -m "feat(storage): enable versioning on logs bucket"
git push origin feature/add-bucket-versioning

Open a PR on GitHub and watch the CI workflow run. If formatting or validation fails, fix it locally and push again.
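
If you want a concrete change for Step 7, one option is to append a versioning resource to the dev config before running git add. This is a sketch that assumes the aws_s3_bucket.logs resource from Step 3 and an AWS provider version (4.x or later) where versioning is its own resource.

```shell
# Append a versioning resource to the lab's dev config.
# mkdir is a no-op if you followed Step 1.
mkdir -p infra/terraform/envs/dev
cat >> infra/terraform/envs/dev/main.tf << 'EOF'

resource "aws_s3_bucket_versioning" "logs" {
  bucket = aws_s3_bucket.logs.id
  versioning_configuration {
    status = "Enabled"
  }
}
EOF
```

Running terraform fmt -recursive afterwards keeps the format check in CI happy.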

Troubleshooting Guide

Problem	Cause	Fix
`terraform fmt -check` fails in CI	Local formatting doesn’t match	Run `terraform fmt -recursive` before committing
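State file accidentally committed	Missing `.gitignore` entry	Add `*.tfstate` and `*.tfstate.*` to `.gitignore`, then run `git rm --cached path/to/terraform.tfstate` (this untracks the file but leaves it on disk) and commit; treat any secrets in the committed state as leaked and rotate them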
Secrets in git history	Committed credentials at some point	Rotate the secret immediately, use `git filter-repo` to remove from history
CI runs on unrelated changes	No path filter in workflow	Add `paths:` filter to your `on.pull_request` trigger
Merge conflicts in `*.tf` files	Long-lived branches with shared files	Keep branches short-lived, rebase from `main` frequently
`terraform plan` differs between local and CI	Different provider or Terraform versions	Pin versions in `required_providers` and use `setup-terraform` with a specific version
Ansible playbook fails lint	Deprecated module syntax or formatting	Run `ansible-lint` locally before pushing, fix warnings

Q: Should I squash commits when merging infra PRs? Yes, in most cases. A clean single commit per change makes git log and git bisect much more useful. The exception is if your compliance team requires granular commit history for audit purposes.

Q: Can I force push to main or prod? No. Set up branch protection rules to prevent this entirely. If you need to fix a bad commit, use a revert commit through a PR. Force pushing rewrites history and can break your teammates’ local repos.
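
The revert-through-a-PR flow can be sketched as follows. This assumes the bad commit is already on main and that your team is happy with a revert/<short-sha> branch name; both the naming scheme and the revert_on_branch helper are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: undo one commit by creating a new revert commit on a
# fresh branch cut from main. History on main itself is left untouched.
revert_on_branch() {
  local sha="$1"
  local short
  short="$(git rev-parse --short "$sha")" || return 1
  git checkout -q -b "revert/${short}" main &&
    git revert --no-edit "$sha"
}

# Possible use:
#   revert_on_branch <bad-commit-sha> && git push -u origin HEAD
# Then open a PR from the new branch as usual.
```

Because the revert is just another commit, it goes through the same review and CI checks as any other change, including a plan that shows what reverting will do to real resources.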

What’s Next

Next post: Branch Protection Rules and PR Workflows — we’ll configure GitHub branch protection to enforce reviews, require status checks, and lock down your main branch so nothing gets deployed without proper approvals.

Happy automating!