Automation and DevOps on Alibaba Cloud: CI/CD and Infrastructure as Code

The article introduces automating cloud infrastructure and application deployments on Alibaba Cloud using DevOps practices.

Note: You've learned to build, scale, and secure systems manually. Now imagine doing it automatically, consistently, and reliably. This blog is about removing the human element—and the human error—from deployments.

Introduction

I once watched a senior engineer deploy code to production on a Friday afternoon.

Everything went fine until someone accidentally typed a command in the wrong window. They were supposed to update a configuration file. Instead, they dropped an entire table from the production database.

Friday night just became Saturday morning.

If that change had gone through an automated pipeline, where every change is written as code and reviewed before it runs, the stray command would most likely never have reached production. This is the promise of DevOps and CI/CD: repeatability, safety, and speed.

Part 1: The DevOps Philosophy

DevOps isn't a tool; it's a mindset. Three principles define it:

Principle 1: Infrastructure as Code (IaC)

Instead of clicking buttons in the console, you write code that describes your infrastructure.

Manual (Bad):

  1. Log into console
  2. Click "Create VPC"
  3. Fill in form (name, CIDR block, etc.)
  4. Click security groups
  5. Add inbound rules (remember to allow SSH? Did you?)
  6. Click create, wait 2 minutes
  7. Create instances
  8. ...repeat this 20 times for each environment

One mistake, and production looks different from staging.

Infrastructure as Code (Good):

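resource "alicloud_vpc" "prod" {
  vpc_name    = "prod-vpc"
  cidr_block  = "10.0.0.0/16"
  description = "Production VPC"
}

resource "alicloud_security_group" "web" {
  name   = "web-sg"
  vpc_id = alicloud_vpc.prod.id
}

# In the alicloud provider, rules are separate resources attached to the group
resource "alicloud_security_group_rule" "http" {
  type              = "ingress"
  ip_protocol       = "tcp"
  port_range        = "80/80"
  cidr_ip           = "0.0.0.0/0"
  security_group_id = alicloud_security_group.web.id
}

resource "alicloud_security_group_rule" "https" {
  type              = "ingress"
  ip_protocol       = "tcp"
  port_range        = "443/443"
  cidr_ip           = "0.0.0.0/0"
  security_group_id = alicloud_security_group.web.id
}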

Now you can:

● Review changes before they happen

● Version control your infrastructure (same as code)

● Recreate environments identically

● Automate testing of infrastructure

Principle 2: Continuous Integration

Every time you commit code:

  1. Automated tests run
  2. Code is checked for quality/security
  3. If it passes, code is automatically built and packaged
  4. Everyone knows the code is deployable
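
Under the hood, these four steps are just commands run in a fixed order, with the pipeline stopping at the first failure. Here is a minimal shell sketch of that idea. The helper name run_checks and the example commands are illustrative assumptions, not part of any CI service:

```shell
# run_checks: run each command string in order and stop at the first failure.
# Hypothetical helper for illustration; a CI service does the same per step.
run_checks() {
  local step
  for step in "$@"; do
    echo "running: $step"
    if ! sh -c "$step"; then
      echo "failed: $step" >&2
      return 1
    fi
  done
  echo "all checks passed: code is deployable"
}

# Example (placeholder commands, substitute your project's own):
#   run_checks "npm test" "npm run lint" "npm run build"
```

Because the function returns non-zero as soon as one step fails, later steps (such as packaging) never run on broken code.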

Principle 3: Continuous Deployment

Code that passes CI automatically deploys to production. No manual steps. No "oops, I forgot to run a migration."

Part 2: Setting Up CI/CD

Tool Choice: Terraform for IaC

Terraform is one of the most widely used Infrastructure as Code tools. It works with Alibaba Cloud through the alicloud provider (and with 500+ other providers).

Installation:

# macOS (HashiCorp's Homebrew tap)
brew tap hashicorp/tap
brew install hashicorp/tap/terraform

# Or download from terraform.io

Quick test:

terraform version
# Terraform v1.0.0 or higher
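
If you script your setup, you may want to check the version automatically rather than by eye. The sketch below compares two MAJOR.MINOR.PATCH strings. version_at_least is a hypothetical helper, and it assumes plain three-part versions with no "v" prefix or pre-release suffix:

```shell
# version_at_least CURRENT MINIMUM
# Succeeds when CURRENT >= MINIMUM. Both must be plain MAJOR.MINOR.PATCH
# strings; anything else is outside what this sketch handles.
version_at_least() {
  local old_ifs="$IFS"
  IFS=.
  # Split both versions on dots: $1..$3 = current, $4..$6 = minimum.
  set -- $1 $2
  IFS="$old_ifs"
  [ "$1" -gt "$4" ] && return 0
  [ "$1" -lt "$4" ] && return 1
  [ "$2" -gt "$5" ] && return 0
  [ "$2" -lt "$5" ] && return 1
  [ "$3" -ge "$6" ]
}

# Example:
#   version_at_least 1.5.0 1.0.0 && echo "Terraform is new enough"
```

To feed it the installed version, something like `terraform version | head -n 1 | sed 's/^Terraform v//'` should pull the number out of the first line of output.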

Your First Terraform Configuration

Step 1: Create a directory and files

my-infra/
├── main.tf (main configuration)
├── variables.tf (input variables)
├── outputs.tf (what to display after creation)
└── terraform.tfvars (values for variables)

Step 2: Write main.tf

terraform {
  required_providers {
    alicloud = {
      source  = "aliyun/alicloud"
      version = "~> 1.0"
    }
  }
}

provider "alicloud" {
  region     = var.region
  access_key = var.access_key
  secret_key = var.secret_key
}

# Create VPC
resource "alicloud_vpc" "main" {
  vpc_name   = "${var.environment}-vpc"
  cidr_block = "10.0.0.0/16"
}

# Create subnet (called a vSwitch on Alibaba Cloud)
resource "alicloud_vswitch" "main" {
  vpc_id       = alicloud_vpc.main.id
  cidr_block   = "10.0.1.0/24"
  zone_id      = "${var.region}a"
  vswitch_name = "${var.environment}-subnet"
}

# Create security group
resource "alicloud_security_group" "web" {
  name        = "${var.environment}-web-sg"
  vpc_id      = alicloud_vpc.main.id
  description = "Security group for web servers"
}

# Inbound HTTP
resource "alicloud_security_group_rule" "allow_http" {
  type              = "ingress"
  ip_protocol       = "tcp"
  port_range        = "80/80"
  cidr_ip           = "0.0.0.0/0"
  security_group_id = alicloud_security_group.web.id
}

# Inbound HTTPS
resource "alicloud_security_group_rule" "allow_https" {
  type              = "ingress"
  ip_protocol       = "tcp"
  port_range        = "443/443"
  cidr_ip           = "0.0.0.0/0"
  security_group_id = alicloud_security_group.web.id
}

# Create ECS instance
resource "alicloud_instance" "web" {
  count                      = var.instance_count
  availability_zone          = "${var.region}a"
  image_id                   = var.image_id
  instance_type              = var.instance_type
  security_groups            = [alicloud_security_group.web.id]
  vswitch_id                 = alicloud_vswitch.main.id
  internet_max_bandwidth_out = 5
  host_name                  = "${var.environment}-web-${count.index}"
  instance_name              = "${var.environment}-web-${count.index}"

  tags = {
    Environment = var.environment
    Terraform   = "true"
  }
}

Step 3: Define variables (variables.tf)

variable "region" {
  description = "Alibaba Cloud region"
  type        = string
  default     = "ap-southeast-1"
}

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "instance_count" {
  description = "Number of instances"
  type        = number
  default     = 2
}

variable "instance_type" {
  description = "ECS instance type"
  type        = string
  default     = "ecs.t6.medium"
}

variable "image_id" {
  description = "Image ID (Ubuntu 20.04 LTS)"
  type        = string
  default     = "ubuntu_20_04_x64_20G_alibase_20240101.vhd"
}

variable "access_key" {
  type      = string
  sensitive = true
}

variable "secret_key" {
  type      = string
  sensitive = true
}

Step 4: Set values (terraform.tfvars)

# Keep secrets out of this file and out of Git. Supply them as environment variables instead:
# export TF_VAR_access_key="<your-access-key-id>"
# export TF_VAR_secret_key="<your-access-key-secret>"

region         = "ap-southeast-1"
environment    = "dev"
instance_count = 2

Better approach (for security):

export TF_VAR_access_key="<your-access-key-id>"
export TF_VAR_secret_key="<your-access-key-secret>"
export TF_VAR_region="ap-southeast-1"
export TF_VAR_environment="dev"

# Now run Terraform without secrets in files
terraform apply
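
A forgotten export is easy to miss until Terraform stops and asks for the value. One way to fail early is a small pre-flight check. This is a sketch only: require_env is a hypothetical helper, and the variable names simply follow the TF_VAR_* exports used here:

```shell
# require_env NAME...
# Fails, naming each culprit, if any listed variable is unset or empty.
require_env() {
  local name value missing=0
  for name in "$@"; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   require_env TF_VAR_access_key TF_VAR_secret_key TF_VAR_environment \
#     && terraform apply
```

The function reports every missing name in one pass, so you fix them all at once instead of one per run.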

Step 5: Deploy with Terraform

# Initialize Terraform (downloads provider)
terraform init

# Review what will be created
terraform plan
# Plan: 7 to add, 0 to change, 0 to destroy. (VPC, subnet, SG, 2 rules, 2 instances)

# Apply the configuration (creates actual resources)
terraform apply
# Type 'yes' to confirm

# Check what was created
terraform show
terraform state list
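
When you run the plan from a script, the -detailed-exitcode flag lets the script tell "nothing to do" apart from "changes waiting for review". A minimal sketch of interpreting that exit status follows; plan_status_message is a hypothetical helper name:

```shell
# plan_status_message CODE
# Translates the exit status of `terraform plan -detailed-exitcode`:
#   0 = succeeded, no changes; 1 = error; 2 = succeeded, changes present.
plan_status_message() {
  case "$1" in
    0) echo "no changes: infrastructure matches the configuration" ;;
    2) echo "changes pending: review the plan before applying" ;;
    *) echo "plan failed" ;;
  esac
}

# Example:
#   terraform plan -detailed-exitcode -out=tfplan; plan_status_message "$?"
```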

Outputs (outputs.tf)

After resources are created, show useful information:

output "vpc_id" {
  description = "VPC ID"
  value       = alicloud_vpc.main.id
}

output "instance_ips" {
  description = "Public IPs of instances"
  value       = [for instance in alicloud_instance.web : instance.public_ip]
}

output "security_group_id" {
  description = "Security group ID"
  value       = alicloud_security_group.web.id
}

After terraform apply, you can see:

Outputs:

instance_ips = [
  "47.89.155.123",
  "47.89.155.124"
]
security_group_id = "sg-wz93dsad1234"
vpc_id = "vpc-wz12asda5678"

Part 3: CI/CD Pipeline with GitHub Actions

Now let's automate the deployment. We'll use GitHub Actions (Alibaba Cloud also has equivalent services).

Setup: Store Infrastructure in Git

my-project/
├── .github/workflows/
│   └── deploy.yml (CI/CD configuration)
├── terraform/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
└── src/ (application code)

The CI/CD Workflow (.github/workflows/deploy.yml)

name: Deploy Infrastructure

on:
  push:
    branches:
      - main
    paths:
      - 'terraform/**'
  pull_request:
    branches:
      - main
    paths:
      - 'terraform/**'

jobs:
  terraform:
    runs-on: ubuntu-latest

    # The provider block reads var.access_key and var.secret_key,
    # so the secrets are passed as TF_VAR_* to every step.
    env:
      TF_VAR_access_key: ${{ secrets.ALIBABACLOUD_ACCESS_KEY }}
      TF_VAR_secret_key: ${{ secrets.ALIBABACLOUD_SECRET_KEY }}
      TF_VAR_environment: ${{ github.ref == 'refs/heads/main' && 'prod' || 'dev' }}

    steps:
      # Step 1: Checkout code
      - uses: actions/checkout@v4

      # Step 2: Set up Terraform
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.5.0

      # Step 3: Initialize (downloads the provider). Configure a remote
      # backend so state persists between workflow runs.
      - name: Terraform Init
        run: terraform -chdir=terraform init -input=false

      # Step 4: Format check
      - name: Terraform Format
        run: terraform -chdir=terraform fmt -check

      # Step 5: Validate syntax
      - name: Terraform Validate
        run: terraform -chdir=terraform validate

      # Step 6: Plan changes (for PR review)
      - name: Terraform Plan
        run: terraform -chdir=terraform plan -input=false -out=tfplan

      # Step 7: Apply the saved plan (only on push to main)
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform -chdir=terraform apply -auto-approve tfplan

How This Works

On Pull Request:

  1. Developer creates PR with infrastructure changes
  2. GitHub Actions runs terraform plan
  3. The plan output appears in the workflow run log (posting it as a PR comment needs an extra step)
  4. Team reviews the plan
  5. Team members approve/request changes
  6. If approved, developer merges PR

On Merge to Main:

  1. GitHub Actions runs terraform init, validate, plan
  2. GitHub Actions runs terraform apply
  3. Infrastructure is updated automatically
  4. No manual clicking required
  5. Complete audit trail in Git and GitHub
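
The split between the two flows comes down to one condition: apply only when the event is a push and the ref is main. Here is that decision as a small shell sketch. pipeline_action is a hypothetical helper; inside GitHub Actions the two inputs are available as the GITHUB_EVENT_NAME and GITHUB_REF environment variables:

```shell
# pipeline_action EVENT REF
# Mirrors the workflow's condition: apply only for a push to main,
# otherwise stop after the plan.
pipeline_action() {
  local event="$1" ref="$2"
  if [ "$event" = "push" ] && [ "$ref" = "refs/heads/main" ]; then
    echo "apply"
  else
    echo "plan"
  fi
}

# Example:
#   pipeline_action "$GITHUB_EVENT_NAME" "$GITHUB_REF"
```

A pull request never matches, because its ref looks like refs/pull/12/merge rather than refs/heads/main.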

Part 4: Application Deployment Pipeline

Now for your actual application code. Same principles apply.

Docker: Containerize Your Application

Instead of deploying raw code, you deploy containers.

Dockerfile:

FROM node:18-alpine

WORKDIR /app

COPY package*.json ./
RUN npm install --omit=dev

COPY . .

EXPOSE 3000

CMD ["node", "server.js"]

Build and Push to Registry

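name: Build and Deploy Application

on:
  push:
    branches:
      - main
    paths:
      - 'src/**'

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      # Build image
      - name: Build Docker image
        run: docker build -t my-app:${{ github.sha }} .

      # Push to Alibaba Cloud Container Registry
      # (registry host shown for illustration; ACR endpoints are region-specific,
      # so copy yours from the ACR console)
      - name: Push to ACR
        run: |
          docker tag my-app:${{ github.sha }} registry.alibabacloud.com/my-org/my-app:${{ github.sha }}
          # --password-stdin avoids passing the password as a command-line argument
          echo "${{ secrets.ACR_PASSWORD }}" | docker login -u "${{ secrets.ACR_USERNAME }}" --password-stdin registry.alibabacloud.com
          docker push registry.alibabacloud.com/my-org/my-app:${{ github.sha }}

  deploy:
    needs: build
    runs-on: ubuntu-latest

    steps:
      - name: Deploy to ECS
        env:
          DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}
        run: |
          # ssh -i expects a key file, so write the secret to disk first
          umask 077
          printf '%s\n' "$DEPLOY_KEY" > deploy_key
          # SSH into server and pull new image
          ssh -i deploy_key -o StrictHostKeyChecking=accept-new ubuntu@${{ secrets.SERVER_IP }} << 'EOF'
            docker pull registry.alibabacloud.com/my-org/my-app:${{ github.sha }}
            docker-compose down
            # Assumes docker-compose.yml takes the image tag from IMAGE_TAG
            IMAGE_TAG=${{ github.sha }} docker-compose up -d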
          EOF
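
The workflow repeats the same registry/namespace/name:tag string in several places, which is where typos creep in. One option is to build it in a single spot. The sketch below does that and fails early if the tag is empty. image_ref is a hypothetical helper, and the registry host is the same illustrative value used in the workflow:

```shell
# image_ref REGISTRY NAMESPACE NAME TAG
# Prints REGISTRY/NAMESPACE/NAME:TAG, refusing an empty tag so a missing
# commit SHA fails here with a clear message instead of later in docker.
image_ref() {
  local registry="$1" namespace="$2" name="$3" tag="$4"
  if [ -z "$tag" ]; then
    echo "image_ref: tag must not be empty" >&2
    return 1
  fi
  echo "$registry/$namespace/$name:$tag"
}

# Example:
#   ref=$(image_ref registry.alibabacloud.com my-org my-app "$GITHUB_SHA")
#   docker push "$ref"
```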

Part 5: Testing and Quality Gates

Not everything should deploy automatically. You need quality gates.

jobs:
  test:
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
      
      # Unit tests
      - name: Run unit tests
        run: npm test
      
      # Code coverage
      - name: Check coverage
        run: npm run coverage
        
      # Linting
      - name: Lint code
        run: npm run lint
      
      # Security scanning
      - name: Security audit
        run: npm audit --audit-level=moderate
      
      # Integration tests
      - name: Run integration tests
        run: npm run test:integration

  deploy:
    needs: test  # Only deploy if tests pass
    if: success()
    ...

Now deployment only happens if:

  1. All tests pass
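  2. Code coverage is above threshold (provided the coverage script is configured to fail below it)
  3. npm audit reports no vulnerabilities of moderate severity or higher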
  4. Linting passes
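
The coverage gate only works if something compares the measured number with a threshold and exits non-zero when it falls short. Many coverage tools can do this on their own. If yours cannot, a comparison like the sketch below may be enough. coverage_ok is a hypothetical helper, and how you extract the percentage from your tool's report is left as an assumption:

```shell
# coverage_ok ACTUAL THRESHOLD
# Succeeds when ACTUAL >= THRESHOLD. awk does the comparison because
# POSIX shell arithmetic is integer-only and coverage is often fractional.
coverage_ok() {
  awk -v actual="$1" -v threshold="$2" \
    'BEGIN { exit !(actual + 0 >= threshold + 0) }'
}

# Example:
#   coverage_ok 82.5 80 || { echo "coverage below 80%"; exit 1; }
```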

Part 6: Monitoring Deployments

After deployment, monitor for issues.

  post-deploy-validation:
    needs: deploy
    runs-on: ubuntu-latest

    steps:
      - name: Health check
        run: |
          # Poll for up to 5 minutes (30 attempts, 10 seconds apart)
          for i in {1..30}; do
            response=$(curl -s -o /dev/null -w "%{http_code}" https://api.example.com/health)
            if [ "$response" = "200" ]; then
              echo "✓ Application is healthy"
              exit 0
            fi
            echo "Waiting... attempt $i"
            sleep 10
          done
          echo "✗ Application failed to become healthy"
          exit 1

      - name: Alert on failure
        if: failure()
        run: |
          curl -X POST -H 'Content-type: application/json' ${{ secrets.SLACK_WEBHOOK }} \
            -d '{"text":"Deployment failed! Application is not healthy."}'
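
The polling loop in the health check is a general pattern: try a command, wait, try again, give up after a fixed number of attempts. Pulled out as a reusable function, it might look like the sketch below. retry is a hypothetical helper, not a built-in:

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARG...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted, sleeping
# DELAY_SECONDS between tries.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i of $attempts failed" >&2
    # No point sleeping after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example: 30 tries, 10 seconds apart; curl -f exits non-zero on HTTP errors.
#   retry 30 10 curl -fsS -o /dev/null https://api.example.com/health
```

Keeping the retry logic separate from the probe means the same function can wait on a database, a queue, or anything else that reports readiness through an exit status.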

Part 7: Infrastructure Patterns Using Terraform

Dev Environment

# Select the workspace before planning: terraform workspace select dev
# (required arguments such as image_id and vswitch_id are omitted for brevity)

resource "alicloud_instance" "web" {
  count         = 1              # Single instance for dev
  instance_type = "ecs.t6.small" # Smaller, cheaper
  tags = {
    Environment = "dev"
  }
}

Production Environment

# Select the workspace before planning: terraform workspace select prod

resource "alicloud_instance" "web" {
  count         = 3               # Multiple instances for HA
  instance_type = "ecs.c6.xlarge" # Larger, better spec
  tags = {
    Environment = "prod"
  }
}

resource "alicloud_instance" "database" {
  count         = 2                # Primary + replica
  instance_type = "ecs.r6.2xlarge" # Memory-optimized
  tags = {
    Environment = "prod"
  }
}

Using the same code, Terraform can deploy different configurations for different environments.
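
In practice, "same code, different configuration" often means one variable file per environment, chosen at apply time. The sketch below picks the file and rejects anything other than the three environment names allowed by the environment variable's validation rule. tfvars_for is a hypothetical helper, and the envs/ directory layout is an assumption made for this example:

```shell
# tfvars_for ENVIRONMENT
# Prints the variable file for an environment, accepting only
# dev, staging, or prod.
tfvars_for() {
  case "$1" in
    dev|staging|prod) echo "envs/$1.tfvars" ;;
    *)
      echo "unknown environment: $1" >&2
      return 1
      ;;
  esac
}

# Example:
#   terraform workspace select prod
#   terraform apply -var-file="$(tfvars_for prod)"
```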

Part 8: Real-World Checklist

Infrastructure as Code:

● All infrastructure defined in Terraform

● No manual console changes

● .terraform directory in .gitignore

● terraform.tfvars never committed (use env vars)

● Backend configured for state file (prevents loss)

CI/CD Pipeline:

● Automatic tests on every commit

● Quality gates before deployment

● Manual approval for production

● Deployment rollback plan documented

● Deployment notifications (Slack, email)

Security:

● Secrets stored in GitHub Secrets (not in code)

● RAM user for CI/CD with minimal permissions

● HTTPS enforced for all connections

● Audit logs for all deployments

Monitoring:

● Health checks post-deployment

● Alerts for deployment failures

● Rollback triggers for errors

● Deployment metrics tracked
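
To make the rollback items concrete, here is a minimal sketch of a deploy step that falls back to the previous version when the health check fails. Everything in it is illustrative: deploy_tag and is_healthy are placeholder hooks you would replace with your real deploy and health-check commands.

```shell
# Placeholder hooks: replace the bodies with real commands.
deploy_tag() { echo "deploying $1"; }
is_healthy() { true; }

# deploy_with_rollback NEW_TAG PREVIOUS_TAG
# Deploys NEW_TAG, then redeploys PREVIOUS_TAG if the health check fails.
# Returns non-zero whenever NEW_TAG did not end up healthy.
deploy_with_rollback() {
  local new_tag="$1" previous_tag="$2"
  deploy_tag "$new_tag" || return 1
  if is_healthy; then
    echo "deployed $new_tag"
    return 0
  fi
  echo "health check failed; rolling back to $previous_tag" >&2
  deploy_tag "$previous_tag"
  return 1
}

# Example:
#   deploy_with_rollback "$GITHUB_SHA" "$LAST_GOOD_SHA"
```

The function still returns a failure after a successful rollback, so the pipeline is marked red and the alert step fires even though production is back on the old version.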

Wrapping Up

You now understand:

● Infrastructure as Code: Version control your infrastructure

● Terraform basics: Write code, deploy infrastructure automatically

● CI/CD pipeline: Automated testing and deployment

● Docker: Containerize applications

● Quality gates: Automatic safeguards before production

● Monitoring: Validate that deployments work

Resources

Terraform Documentation

Alibaba Cloud Terraform Provider

GitHub Actions Documentation

Docker Getting Started

Alibaba Cloud Container Registry

Your Turn: Describe your current deployment process. How many manual steps? How long does it take? Comment below—I'll help you automate it.

Next post: Real-World Projects and Optimization — Let's build a complete application end-to-end.


Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.

Farah Abdou
