Early in my career, our team managed infrastructure the way a lot of shops still do: a shared folder on a file server, a naming convention that nobody followed consistently, and a prayer that two people weren't editing the same firewall rule at the same time. One Friday afternoon, a colleague overwrote my subnet changes with an older copy of the config. We didn't discover it until Monday morning when a deployment failed and took a production service down with it. There was no history, no audit trail, and no way to figure out what the "right" version was without comparing files line by line.
That was the moment I stopped treating infrastructure files like documents and started treating them like code. If your Terraform modules, Ansible playbooks, and CI/CD pipelines aren't in version control with proper workflows around them, you're one bad overwrite away from the same kind of Monday morning I had.
This post covers everything you need to apply real Git workflows to your infrastructure code: repo structures, branching strategies, commit conventions, CI guards, secrets management, and state file hygiene. Whether you're working with Terraform, Ansible, or both, these patterns will keep your infrastructure safe, traceable, and collaborative.
Why Version Control Matters More for Infrastructure
Application bugs are bad. Infrastructure bugs can be catastrophic. A misconfigured security group can expose an entire database to the internet. A deleted subnet can take down every service running in it. Version control gives you three things that are non-negotiable for infrastructure work:
- Traceability: Who changed what, when, and why. When an auditor asks how a firewall rule got added, you point to a commit, not a Slack thread.
- Collaboration: Every change goes through review before it touches real resources. No more "I'll just push this quick fix" that breaks staging.
- Rollback: When something goes wrong (and it will), you can restore a known-good state instead of scrambling to recreate it from memory.
If you're already using Git for application code, the good news is that the same tool works perfectly for infrastructure. The workflows just need some adjustments.
Repo Structure: Monorepo vs Polyrepo
Before you write a single line of Terraform, decide how you're going to organize your repositories. There are two common approaches, and each has tradeoffs.
| Aspect | Monorepo | Polyrepo |
|---|---|---|
| Layout | All IaC (Terraform, Ansible, Helm) in one repo | Separate repos per tool or team |
| Dependency tracking | Easier: everything is co-located | Harder: cross-repo coordination needed |
| CI complexity | Higher: need path-based triggers | Lower: each repo has focused pipelines |
| Access control | Repo-level only (CODEOWNERS can add per-path review requirements, but not access restrictions) | Fine-grained per repo |
| Best for | Small-to-medium teams, tightly coupled infra | Large orgs, multiple teams, distinct ownership |
For most teams I've worked with, a monorepo with clear directory structure is the sweet spot. You get a single source of truth without the overhead of coordinating across a dozen repositories.
Here's a directory layout I've used successfully on multiple projects:
infra/
  terraform/
    modules/
      networking/
      compute/
      storage/
    envs/
      dev/
        main.tf
        variables.tf
        backend.tf
      staging/
        main.tf
        variables.tf
        backend.tf
      prod/
        main.tf
        variables.tf
        backend.tf
  ansible/
    roles/
      webserver/
      database/
    inventories/
      dev/
      prod/
    playbooks/
      deploy.yml
      patch.yml
.github/
  workflows/
    terraform-ci.yml
    ansible-lint.yml
.gitignore
README.md
The key is separating modules (reusable building blocks) from environments (where those modules get deployed). Each environment directory has its own backend configuration and variables, so you never accidentally apply dev changes to prod.
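One practical payoff of this layout is that a changed path tells you which environment a PR touches. The sketch below is a minimal illustration, not part of the original workflow: `affected_envs` is a hypothetical helper name, and it assumes the `infra/terraform/envs/<env>/` layout shown above.

```shell
# Hypothetical helper: list the environments touched by a set of changed paths.
# Assumes the infra/terraform/envs/<env>/ layout described in this post.
affected_envs() {
  # Read one path per line on stdin; print each matching environment name once.
  sed -n 's|^infra/terraform/envs/\([^/]*\)/.*|\1|p' | sort -u
}

# Example with a hard-coded list of paths. In a real repo the input could
# come from something like: git diff --name-only origin/main...HEAD
printf '%s\n' \
  'infra/terraform/envs/dev/main.tf' \
  'infra/terraform/envs/prod/variables.tf' \
  'infra/terraform/envs/prod/main.tf' \
  'infra/ansible/playbooks/deploy.yml' | affected_envs
```

Paths under `modules/` match no environment here, so a real pipeline would probably want to treat a module change as affecting every environment.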
Branching Strategies for Infrastructure
I covered branching strategies in depth in a previous post, but infrastructure code has some specific considerations worth calling out.
Trunk-based development is my recommendation for most infrastructure teams. Keep main as your source of truth, use short-lived feature branches for changes, and merge back quickly. Here's why it works well for IaC:
- Infrastructure changes tend to be small and focused (add a resource, modify a rule, update a tag).
- Long-lived branches create drift between what's in Git and what's actually deployed.
- Short branches mean fewer merge conflicts on shared files like variables.tf.
Environment branches (separate dev, staging, prod branches) are tempting but create maintenance headaches. You end up cherry-picking changes between branches and losing track of which environment has which version. Instead, use the directory-based approach above: one branch, separate environment folders.
A typical workflow looks like this:
# Create a feature branch from main
git checkout -b feature/add-cdn-distribution
# Make your changes
vim infra/terraform/envs/prod/main.tf
# Commit with a conventional message
git commit -am "feat(cdn): add CloudFront distribution for static assets"
# Push and open a PR
git push origin feature/add-cdn-distribution
The PR triggers CI checks (more on that below), a teammate reviews the plan output, and you merge to main. Clean and predictable.
Conventional Commits for Infrastructure
Consistent commit messages are more than a nice-to-have for infrastructure code. They make git log actually useful when you're troubleshooting at 2 AM, and they enable automated changelogs and release notes.
I use the Conventional Commits specification adapted for infrastructure work:
| Prefix | When to use it | Example |
|---|---|---|
| feat | New resource or capability | feat(network): add bastion host for SSH access |
| fix | Bug fix or misconfiguration | fix(sg): correct ingress rule for port 443 |
| refactor | Restructuring without behavior change | refactor(modules): split monolithic network module |
| chore | Maintenance, dependency updates | chore(providers): bump azurerm to 4.x |
| ci | Pipeline or workflow changes | ci: add tflint step to PR workflow |
| docs | Documentation updates | docs: add runbook for DR failover |
The scope in parentheses maps to the infrastructure component being changed. This makes filtering history trivial:
# Show all networking changes
git log --oneline --grep="network"
# Show all production fixes
git log --oneline --grep="fix(prod"
Use tags to mark deployments so you always know what's running in each environment:
git tag -a prod-v2.4.0 -m "Prod deploy: CDN + WAF rules"
git push origin prod-v2.4.0
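A convention only helps if it is enforced. As a rough sketch, the check below accepts the prefixes from the table above and rejects anything else. The function name `is_conventional` and the exact pattern are my own assumptions for illustration, not something the Conventional Commits specification mandates.

```shell
# Hypothetical check for the commit prefixes listed in the table above.
# Accepts "type: subject" or "type(scope): subject" on the first line.
is_conventional() {
  printf '%s\n' "$1" | head -n 1 |
    grep -Eq '^(feat|fix|refactor|chore|ci|docs)(\([a-z0-9_-]+\))?: .+'
}

# Try it against two valid messages and one that should be rejected.
for msg in \
  'feat(cdn): add CloudFront distribution for static assets' \
  'ci: add tflint step to PR workflow' \
  'updated some stuff'
do
  if is_conventional "$msg"; then
    echo "ok:     $msg"
  else
    echo "reject: $msg"
  fi
done
```

Git passes the path of the message file as the first argument to a commit-msg hook, so one way to wire this in would be to call `is_conventional "$(cat "$1")"` from `.git/hooks/commit-msg` and exit non-zero on failure.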
Pull Request Workflows with Plan Output
Every infrastructure change should go through a pull request, even on a team of one. The PR is your audit trail and your safety net.
What makes infra PRs different from application PRs is the plan output. A code review can tell you whether the Terraform looks right, but only terraform plan tells you what will actually happen. Always attach it.
You can capture plan output and add it as a PR comment automatically:
# Generate and capture the plan
terraform plan -out=tfplan -no-color
terraform show -no-color tfplan > plan_output.txt
Or better yet, automate it in your CI pipeline (see the next section). The reviewer should see exactly which resources will be created, modified, or destroyed before approving.
PR checklist for infrastructure changes:
- terraform fmt or ansible-lint passes cleanly
- terraform validate succeeds
- terraform plan output is attached and reviewed
- At least one peer has approved
- CI checks are green
- No secrets in the diff
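Reviewers tend to skim long plans, so it can help to surface destroys explicitly. The sketch below reads saved plan text and extracts the destroy count. It assumes the summary line has the usual "Plan: N to add, N to change, N to destroy." shape produced with -no-color; `plan_destroy_count` is a hypothetical helper name, and the sample text stands in for plan_output.txt.

```shell
# Hypothetical helper: pull the destroy count out of saved plan text.
# Assumes a summary line like "Plan: 1 to add, 0 to change, 2 to destroy."
plan_destroy_count() {
  sed -n 's/^Plan:.* \([0-9][0-9]*\) to destroy\..*/\1/p' | head -n 1
}

# Sample text standing in for the contents of plan_output.txt.
sample='Terraform will perform the following actions:

Plan: 1 to add, 0 to change, 2 to destroy.'

count=$(printf '%s\n' "$sample" | plan_destroy_count)
# An empty count (no summary line, e.g. "No changes.") is treated as zero.
if [ "${count:-0}" -gt 0 ]; then
  echo "WARNING: plan destroys $count resource(s); review carefully"
else
  echo "No resources destroyed"
fi
```

In practice the input would be `plan_destroy_count < plan_output.txt`. Text matching like this is fragile across Terraform versions, so treat it as a reviewer aid rather than a gate.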
CI Guards: Linting, Validation, and Plan
Automated CI checks are your first line of defense. They catch problems before a human reviewer even looks at the PR. Here's a GitHub Actions workflow I use for Terraform projects:
name: Terraform CI
on:
  pull_request:
    paths:
      - "infra/terraform/**"
permissions:
  contents: read
  pull-requests: write
jobs:
  terraform-checks:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra/terraform/envs/dev
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.9.x
      - name: Terraform Format Check
        run: terraform fmt -check -recursive -diff
      # -backend=false is enough for validate. A real plan also needs an
      # initialized backend and provider credentials, which this job does
      # not configure, so expect the plan step to fail until you add them.
      - name: Terraform Init
        run: terraform init -backend=false
      - name: Terraform Validate
        run: terraform validate
      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -input=false
        # Lets the comment step run even when the plan fails.
        continue-on-error: true
      - name: Comment Plan on PR
        uses: actions/github-script@v7
        with:
          script: |
            const output = `#### Terraform Plan
            \`\`\`
            ${{ steps.plan.outputs.stdout }}
            \`\`\`
            *Triggered by @${{ github.actor }}*`;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            });
For Ansible, add a separate workflow:
name: Ansible Lint
on:
  pull_request:
    paths:
      - "infra/ansible/**"
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install ansible-lint
        run: pip install ansible-lint
      - name: Run ansible-lint
        run: ansible-lint infra/ansible/playbooks/
The paths filter is important: you don't want Terraform CI running when someone updates an Ansible playbook, and vice versa.
Handling Secrets in Git
This is the one area where infrastructure code absolutely requires extra care. Never commit secrets to Git. Not even to a private repo. Not even "just temporarily." Git history is forever, and rotating leaked credentials is painful.
Here's a layered approach:
Layer 1: .gitignore
Your first defense is preventing secrets from being staged in the first place. Here's a combined .gitignore for a Terraform + Ansible project:
# Terraform
.terraform/
*.tfstate
*.tfstate.*
*.tfvars
*.tfvars.json
crash.log
crash.*.log
override.tf
override.tf.json
*_override.tf
*_override.tf.json
.terraform.tfstate.lock.info
.terraformrc
terraform.rc
*tfplan*
# Ansible
*.retry
vault_password_file
*.vault.yml
!ansible.cfg
# General secrets
.env
*.pem
*.key
credentials.json
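Ignore rules only cover files you anticipated. A cheap second check is to scan what is about to be committed for strings that look like credentials. The sketch below is illustrative only: `find_secrets` is a hypothetical name, and the two patterns (an AWS-style access key ID and a PEM private key header) are assumptions, nowhere near a complete rule set.

```shell
# Hypothetical pre-commit style scan: print lines that look like credentials.
# The two patterns are illustrative assumptions, not a complete rule set.
find_secrets() {
  grep -nE 'AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----'
}

# Sample diff text. AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS docs.
sample='region = "us-east-1"
access_key = "AKIAIOSFODNN7EXAMPLE"
bucket = "logs"'

# grep exits 0 when at least one line matches, so "success" means trouble here.
if printf '%s\n' "$sample" | find_secrets; then
  echo "Possible secret found; aborting"
else
  echo "No obvious secrets"
fi
```

Against a real repo the input would be something like `git diff --cached | find_secrets`. A purpose-built secret scanner will catch far more than two regular expressions, so think of this as a floor, not a ceiling.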
Layer 2: Encryption Tools
When you need secrets in the repo (like Ansible vault files or environment-specific configs), encrypt them:
- ansible-vault: Built into Ansible, encrypts entire files or individual variables with a password.
- SOPS (Secrets OPerationS): Works with AWS KMS, Azure Key Vault, GCP KMS, or PGP. Encrypts values but leaves keys readable, so you can still see the structure of your config.
- git-crypt: Transparent encryption in Git. Files are encrypted on commit and decrypted on checkout.
# Ansible Vault: encrypt a file
ansible-vault encrypt group_vars/all/secrets.yml
# Ansible Vault: encrypt a single variable
ansible-vault encrypt_string 'my-db-password' --name 'db_password'
# SOPS: encrypt a file using AWS KMS
sops --encrypt --kms "arn:aws:kms:us-east-1:123:key/abc" secrets.yaml > secrets.enc.yaml
Layer 3: External Secret Managers
For production workloads, reference secrets from a vault rather than storing them in the repo at all:
- HashiCorp Vault with the Terraform Vault provider
- AWS Secrets Manager or Azure Key Vault via data sources
- Environment variables injected by your CI/CD platform
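For the environment-variable route, Terraform reads any variable named `TF_VAR_<name>` from the environment as the value of the input variable `<name>`. A small guard can fail the pipeline early when an expected value was not injected. This is a sketch under assumptions: `require_tf_vars` is a hypothetical helper and `db_password` is an invented variable name used only for illustration.

```shell
# Hypothetical guard: fail early if expected TF_VAR_* variables are not set.
# Terraform reads TF_VAR_<name> from the environment as input variable <name>.
require_tf_vars() {
  missing=0
  for name in "$@"; do
    # printenv exits non-zero when the variable is not in the environment.
    if ! printenv "TF_VAR_$name" >/dev/null; then
      echo "missing: TF_VAR_$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# db_password is an assumed variable name; in CI the platform would export it.
export TF_VAR_db_password='injected-by-ci'
require_tf_vars db_password && echo "all required variables present"
```

Running the guard before `terraform plan` tends to give a clearer failure than an interactive prompt or a missing-variable error deep in the log.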
Terraform State: Keep It Out of Git
This deserves its own section because it's the single most common mistake I see teams make. Do not commit terraform.tfstate to your repository. Ever.
Here's why:
- State files contain secrets in plain text. Database passwords, API keys, private IPs: all stored unencrypted in the state file.
- No locking. Git doesn't support file locking. Two engineers running terraform apply at the same time will corrupt the state.
- Merge conflicts are catastrophic. A bad merge on a state file can cause Terraform to destroy and recreate resources.
- State changes on every apply. You'd be committing after every single operation, cluttering your history.
Use a remote backend instead:
# AWS S3 backend with DynamoDB locking
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "envs/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
# Azure Storage backend
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "stterraformstate"
    container_name       = "tfstate"
    key                  = "prod.terraform.tfstate"
  }
}
Remote backends give you automatic locking (via DynamoDB or Azure blob leases), encryption at rest, and a single source of truth that doesn't depend on anyone remembering to commit.
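If you are migrating an existing repo, it is worth confirming that no state file is already tracked. The sketch below filters a list of paths for state files and their backups. `find_state_files` is a hypothetical helper, and the sample list is made up for illustration.

```shell
# Hypothetical guard: report Terraform state files in a list of tracked paths.
find_state_files() {
  # Matches terraform.tfstate, prod.tfstate, and backups like *.tfstate.backup.
  grep -E '\.tfstate(\.[^/]*)?$'
}

# Sample list. In a real repo the input could come from: git ls-files
tracked='infra/terraform/envs/dev/main.tf
infra/terraform/envs/dev/terraform.tfstate
infra/terraform/envs/dev/terraform.tfstate.backup
README.md'

# grep exits 0 when it finds a match, which here means a state file is tracked.
if printf '%s\n' "$tracked" | find_state_files; then
  echo "State files are tracked; untrack them with git rm --cached"
else
  echo "No state files tracked"
fi
```

Untracking the file does not remove it from history, so any secrets it contained should still be rotated.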
Hands-On Lab
Let's put this all together. You'll create a Git repository with Terraform and Ansible code, a proper .gitignore, and a CI workflow.
Step 1: Create the repo and directory structure
mkdir infra-vc-lab && cd infra-vc-lab
git init
mkdir -p infra/terraform/envs/dev
mkdir -p infra/terraform/modules/storage
mkdir -p infra/ansible/playbooks
mkdir -p .github/workflows
Step 2: Add a .gitignore
cat > .gitignore << 'EOF'
# Terraform
.terraform/
*.tfstate
*.tfstate.*
*.tfvars
*.tfvars.json
crash.log
crash.*.log
override.tf
override.tf.json
*_override.tf
*_override.tf.json
.terraform.tfstate.lock.info
.terraformrc
terraform.rc
*tfplan*
# Ansible
*.retry
vault_password_file
# Secrets
.env
*.pem
*.key
EOF
Step 3: Add a basic Terraform config
cat > infra/terraform/envs/dev/main.tf << 'EOF'
terraform {
  required_version = ">= 1.9.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.region
}

variable "region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

resource "aws_s3_bucket" "logs" {
  bucket = "my-infra-lab-logs-bucket"

  tags = {
    Environment = "dev"
    ManagedBy   = "terraform"
  }
}
EOF
Step 4: Add a basic Ansible playbook
cat > infra/ansible/playbooks/ping.yml << 'EOF'
---
- name: Verify connectivity
  hosts: all
  gather_facts: false
  tasks:
    - name: Ping all hosts
      ansible.builtin.ping:
EOF
Step 5: Add a GitHub Actions workflow
cat > .github/workflows/terraform-ci.yml << 'EOF'
name: Terraform CI
on:
  pull_request:
    paths:
      - "infra/terraform/**"
jobs:
  validate:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra/terraform/envs/dev
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check -recursive
      - run: terraform init -backend=false
      - run: terraform validate
EOF
Step 6: Commit and push
git add .
git commit -m "feat: scaffold infra repo with terraform, ansible, and CI"
git branch -M main
# If you have a GitHub remote:
# git remote add origin https://github.com/youruser/infra-vc-lab.git
# git push -u origin main
Step 7: Test the workflow
Create a feature branch, make a change, and open a PR:
git checkout -b feature/add-bucket-versioning
# Add versioning to the S3 bucket in main.tf, then:
git add .
git commit -m "feat(storage): enable versioning on logs bucket"
git push origin feature/add-bucket-versioning
Open a PR on GitHub and watch the CI workflow run. If formatting or validation fails, fix it locally and push again.
Troubleshooting Guide
| Problem | Cause | Fix |
|---|---|---|
| terraform fmt -check fails in CI | Local formatting doesn't match | Run terraform fmt -recursive before committing |
| State file accidentally committed | Missing .gitignore entry | Add *.tfstate to .gitignore, then git rm --cached '*.tfstate' (quoted so Git, not the shell, expands the pattern across directories) |
| Secrets in git history | Committed credentials at some point | Rotate the secret immediately, use git filter-repo to remove from history |
| CI runs on unrelated changes | No path filter in workflow | Add paths: filter to your on.pull_request trigger |
| Merge conflicts in *.tf files | Long-lived branches with shared files | Keep branches short-lived, rebase from main frequently |
| terraform plan differs between local and CI | Different provider or Terraform versions | Pin versions in required_providers and use setup-terraform with a specific version |
| Ansible playbook fails lint | Deprecated module syntax or formatting | Run ansible-lint locally before pushing, fix warnings |
Q: Should I squash commits when merging infra PRs?
Yes, in most cases. A clean single commit per change makes git log and git bisect much more useful. The exception is if your compliance team requires granular commit history for audit purposes.
Q: Can I force push to main or prod?
No. Set up branch protection rules to prevent this entirely. If you need to fix a bad commit, use a revert commit through a PR. Force pushing rewrites history and can break your teammates' local repos.
What's Next
Next post: Branch Protection Rules and PR Workflows. We'll configure GitHub branch protection to enforce reviews, require status checks, and lock down your main branch so nothing gets deployed without proper approvals.
Happy automating!