Module 4: Infrastructure as Code

Manage infrastructure declaratively with Terraform, automate configuration with Ansible, and embrace GitOps

🎯 What is Infrastructure as Code?

Infrastructure as Code (IaC) is the practice of managing and provisioning infrastructure through machine-readable definition files, rather than manual configuration or interactive tools.

Traditional vs IaC Approach

āŒ Manual Infrastructure

  • • Click through web consoles
  • • Inconsistent environments
  • • No version control
  • • Hard to replicate
  • • Slow and error-prone
  • • No audit trail

āœ… Infrastructure as Code

  • • Define in code files
  • • Consistent and repeatable
  • • Version controlled in Git
  • • Easy to replicate
  • • Fast and reliable
  • • Complete audit trail

Benefits of IaC

Speed: Provision in minutes, not days
Consistency: Same config every time
Accountability: Track who changed what
Reusability: Modules and templates
Testing: Test infrastructure changes
Disaster Recovery: Rebuild quickly

šŸ—ļø Terraform Fundamentals

Terraform is an open-source IaC tool that lets you define infrastructure for multiple cloud providers using a declarative configuration language (HCL - HashiCorp Configuration Language).

Core Concepts

Providers

Plugins that interact with cloud platforms (AWS, Azure, GCP, etc.)

Resources

Infrastructure components (VMs, networks, databases, etc.)

State

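Terraform's record of which real infrastructure objects correspond to the resources in your configuration (stored in terraform.tfstate by default, or in a remote backend)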

Modules

Reusable Terraform configurations

Example: AWS Infrastructure

# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "prod/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = var.aws_region
}

# VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  
  tags = {
    Name        = "${var.project_name}-vpc"
    Environment = var.environment
  }
}

# Subnet
resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "${var.aws_region}a"
  map_public_ip_on_launch = true
  
  tags = {
    Name = "${var.project_name}-public-subnet"
  }
}

# Security Group
resource "aws_security_group" "web" {
  name        = "${var.project_name}-web-sg"
  description = "Allow HTTP and HTTPS traffic"
  vpc_id      = aws_vpc.main.id
  
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

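# Latest Ubuntu AMI published by Canonical
# (22.04 is an assumed release; the original does not say which one it uses)
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical's AWS account ID

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

# EC2 Instance
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.instance_type
  subnet_id     = aws_subnet.public.id

  vpc_security_group_ids = [aws_security_group.web.id]

  # Resolve the script relative to this module, not the current working directory
  user_data = file("${path.module}/user-data.sh")

  tags = {
    Name = "${var.project_name}-web-server"
  }
}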

# Outputs
output "instance_public_ip" {
  value = aws_instance.web.public_ip
}
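
The subnet above is public in name only until the VPC has an internet gateway and a default route pointing at it. The following is a minimal sketch of the missing pieces, assuming the aws_vpc.main and aws_subnet.public resources from main.tf; the new resource names are illustrative and not part of the original example.

```hcl
# Internet gateway: gives the VPC a path to the public internet
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-igw"
  }
}

# Route table that sends all non-local traffic through the gateway
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

# Attach the route table to the public subnet
resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}
```

Without these resources the instance still receives a public IP address, but nothing outside the VPC can reach it on ports 80 or 443.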

Variables File

# variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "project_name" {
  description = "Project name"
  type        = string
}

variable "environment" {
  description = "Environment (dev, staging, prod)"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"
}

# terraform.tfvars
project_name  = "myapp"
environment   = "prod"
instance_type = "t3.small"

Terraform Workflow

  1. terraform init: Initialize the working directory and download providers
  2. terraform plan: Preview changes before applying
  3. terraform apply: Apply changes to create or update infrastructure
  4. terraform destroy: Destroy all managed infrastructure

📦 Terraform Modules

Modules are containers for multiple resources that are used together. They enable code reuse and organization of complex infrastructure.

Module Structure

modules/
└── vpc/
    ├── main.tf       # Resources
    ├── variables.tf  # Input variables
    ├── outputs.tf    # Output values
    └── README.md     # Documentation

# Using the module
module "vpc" {
  source = "./modules/vpc"

  vpc_cidr     = "10.0.0.0/16"
  project_name = "myapp"
  environment  = "prod"
}

# Access module outputs
resource "aws_instance" "web" {
  # ami, instance_type, and other required arguments omitted for brevity
  subnet_id = module.vpc.public_subnet_id
}
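
For module.vpc.public_subnet_id to resolve, the module has to declare matching input variables and an output of that name. Here is a minimal sketch of what modules/vpc might contain, assuming the three inputs passed in the call above. The internal resource names and the subnet sizing are hypothetical, not taken from the original module.

```hcl
# modules/vpc/variables.tf
variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
}

variable "project_name" {
  description = "Project name used in resource tags"
  type        = string
}

variable "environment" {
  description = "Environment (dev, staging, prod)"
  type        = string
}

# modules/vpc/main.tf
resource "aws_vpc" "this" {
  cidr_block = var.vpc_cidr

  tags = {
    Name        = "${var.project_name}-vpc"
    Environment = var.environment
  }
}

resource "aws_subnet" "public" {
  vpc_id = aws_vpc.this.id

  # Hypothetical sizing: take the first /24 of a /16 VPC range
  cidr_block = cidrsubnet(var.vpc_cidr, 8, 0)
}

# modules/vpc/outputs.tf
output "public_subnet_id" {
  description = "ID of the public subnet"
  value       = aws_subnet.public.id
}
```

Only values declared as outputs are visible to the caller; everything else in the module stays private, which is what makes modules safe to refactor internally.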

Remote State Management

Store state remotely so the whole team works from the same state file, and enable state locking so that two runs cannot modify it at the same time.

# S3 backend with DynamoDB locking
# (Terraform 1.10+ can instead lock natively in S3 with use_lockfile = true)
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

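# Alternative: Terraform Cloud (renamed HCP Terraform in 2024)
# A configuration uses either a backend block or a cloud block, not both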
terraform {
  cloud {
    organization = "my-org"
    workspaces {
      name = "prod-infrastructure"
    }
  }
}
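
The S3 backend does not create its own bucket or lock table; both must exist before terraform init runs. The sketch below shows one way to provision them, assuming the bucket and table names from the backend block above and a separate bootstrap configuration that keeps its own state locally. The resource labels are illustrative.

```hcl
# Bootstrap configuration (hypothetical): run once, before the main
# configuration is initialized against the S3 backend.

resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-terraform-state"
}

# Versioning keeps earlier state files recoverable after a bad apply
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table: the S3 backend requires a string partition key named LockID
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Keeping these resources out of the configuration they serve avoids a circular dependency, where the state bucket would be tracked in a state file stored in that same bucket.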

āš™ļø Ansible for Configuration Management

Ansible automates software provisioning, configuration management, and application deployment. It's agentless (uses SSH) and uses YAML for playbooks.

Ansible vs Terraform

Terraform

  • Infrastructure provisioning
  • Declarative
  • Immutable infrastructure
  • Cloud resources

Ansible

  • Configuration management
  • Procedural
  • Mutable infrastructure
  • Software installation/config

Ansible Playbook Example

# playbook.yml
---
- name: Configure web servers
  hosts: webservers
  become: yes
  
  vars:
    app_name: myapp
    app_port: 3000
  
  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600
    
    - name: Install Node.js
      apt:
        name:
          - nodejs
          - npm
        state: present
    
    - name: Create app directory
      file:
        path: /opt/{{ app_name }}
        state: directory
        owner: www-data
        group: www-data
    
    - name: Copy application files
      copy:
        src: ./app/
        dest: /opt/{{ app_name }}/
        owner: www-data
        group: www-data
    
    - name: Install dependencies
      npm:
        path: /opt/{{ app_name }}
        state: present
    
    - name: Create systemd service
      template:
        src: templates/app.service.j2
        dest: /etc/systemd/system/{{ app_name }}.service
      notify: Restart app
    
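    - name: Enable and start service
      systemd:
        # Quoted: a YAML value cannot begin with an unquoted {{
        name: "{{ app_name }}"
        enabled: yes
        state: started
        daemon_reload: yes  # pick up the unit file written by the template task
  
  handlers:
    - name: Restart app
      systemd:
        name: "{{ app_name }}"
        state: restarted
        daemon_reload: yes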

Inventory File

# inventory.ini
[webservers]
web1.example.com ansible_host=10.0.1.10
web2.example.com ansible_host=10.0.1.11

[databases]
db1.example.com ansible_host=10.0.2.10

[all:vars]
ansible_user=ubuntu
ansible_ssh_private_key_file=~/.ssh/id_rsa

# Run playbook
ansible-playbook -i inventory.ini playbook.yml

🔄 GitOps with ArgoCD

GitOps uses Git as the single source of truth for declarative infrastructure and applications. Changes are made via Git commits, and automated tools sync the desired state to clusters.

GitOps Principles

Declarative: Entire system described declaratively
Versioned: Desired state stored in Git
Automated: Changes automatically applied
Continuously Reconciled: Software agents ensure correctness

ArgoCD Application

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  
  source:
    repoURL: https://github.com/myorg/myapp
    targetRevision: main
    path: k8s/overlays/production
  
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

💡 GitOps Workflow:

  1. Developer commits K8s manifests to Git
  2. Pull request reviewed and merged
  3. ArgoCD detects changes in Git
  4. ArgoCD syncs changes to cluster
  5. Application updated automatically

✨ IaC Best Practices

Code Organization

  • ✓ Use modules for reusability
  • ✓ Separate environments (dev/staging/prod)
  • ✓ Keep configurations DRY
  • ✓ Use meaningful naming conventions

Version Control

  • ✓ Store all IaC in Git
  • ✓ Use pull requests for changes
  • ✓ Tag releases
  • ✓ Document changes in commits

Security

  • ✓ Never commit secrets
  • ✓ Use secret management tools
  • ✓ Scan for security issues
  • ✓ Implement least privilege
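
One way to keep a secret out of Git is to declare it as a variable with no default and supply the value at run time. This is a minimal sketch; db_password is a hypothetical variable that does not appear in the earlier example.

```hcl
# Hypothetical secret input, shown only to illustrate the pattern
variable "db_password" {
  description = "Database password, supplied at run time"
  type        = string
  sensitive   = true # redacted in plan and apply output

  # No default on purpose: provide the value through the
  # TF_VAR_db_password environment variable or a secret manager,
  # never through a committed .tfvars file.
}
```

Marking a variable as sensitive only hides it from CLI output. The value is still written to the state file in plain text, so the state backend needs encryption and tight access control as well.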

Testing

  • ✓ Validate syntax before apply
  • ✓ Test in non-prod first
  • ✓ Use policy as code (OPA)
  • ✓ Automated compliance checks
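
Infrastructure tests can live next to the configuration. The sketch below uses the native test framework available in Terraform 1.6 and later, and assumes the aws_vpc.main resource and the environment variable validation from the earlier example. The file name and run labels are illustrative, and running it still needs an initialized working directory and provider credentials.

```hcl
# tests/main.tftest.hcl (hypothetical file name)

# Values applied to every run block unless overridden
variables {
  project_name = "myapp"
  environment  = "dev"
}

# Plan only: nothing is created, the planned values are inspected
run "vpc_uses_expected_cidr" {
  command = plan

  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR block should be 10.0.0.0/16."
  }
}

# The validation rule on var.environment should reject unknown values
run "rejects_unknown_environment" {
  command = plan

  variables {
    environment = "qa"
  }

  expect_failures = [var.environment]
}
```

Running terraform test executes each run block in order and reports failed assertions, which makes it a natural step to add to a pull request pipeline alongside terraform validate.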

šŸ“ Module Summary

You've learned to manage infrastructure as code:

Tools Mastered:

  • āœ“ Terraform for provisioning
  • āœ“ Ansible for configuration
  • āœ“ GitOps with ArgoCD
  • āœ“ State management

Key Concepts:

  • āœ“ Declarative infrastructure
  • āœ“ Modules and reusability
  • āœ“ Version control everything
  • āœ“ Automated deployments

šŸŽÆ Next Steps

Now that you can manage infrastructure as code, let's learn how to monitor and observe your systems with Prometheus, Grafana, and the ELK stack.
