Note: You've got the basics down. Now it's time to think like an architect. In this post, we'll design systems that don't just work; they keep working when 10,000 users hit them simultaneously. This is where cloud infrastructure becomes an art.
A few years ago, a small e-commerce startup I knew launched their site on a single server. For the first few months, everything worked beautifully. Then Black Friday arrived.
6:00 AM: Website crashes under traffic.
6:15 AM: They frantically spin up more servers, but they're not configured to work together, so it doesn't help.
6:30 AM: Customers still can't buy. An estimated $50,000 in sales is gone.
This didn't have to happen. They needed scalability, a term that sounds fancy but really just means "your system can grow to meet demand instead of collapsing under it."
In this blog, we'll design the kind of architecture that would have saved them.
Vertical Scaling (Scale Up)
● Give one server more CPU, RAM, or storage
● Easier initially, but has a ceiling (there's only so big a single server can be)
● When it stops working, you have one huge problem
Horizontal Scaling (Scale Out)
● Add more servers
● More complex initially, but practically unlimited
● If one server fails, others keep serving
The rule: Always design for horizontal scaling from day one. Add a second load-balanced server even if you don't need it yet. You're learning the pattern.
For any critical component, you should have at least N+1 instances. If you need 3 instances to handle load, run 4.
Why? When one fails, you still have N instances handling traffic while it recovers.
Let me show you a proven architecture that handles scaling gracefully:
CDN (Content Delivery Network)
● Caches static files (images, CSS, JavaScript) at edge locations
● Users download from the closest edge server, not your origin
● Often 10x faster for users far from your data center
● Can cut traffic to your load balancer by 90% or more on static-heavy sites
Load Balancer (SLB)
● Single entry point for all users
● Health-checks web servers (removes unhealthy ones automatically)
● Distributes traffic evenly
Key principle: Stateless design
Web servers should not store session state locally. Why?
Imagine a user logs in to Server A, which keeps the session in its own memory. The load balancer then routes the user's next request to Server B, which has never heard of that session, and the user is suddenly logged out.
Solution: Store sessions in Redis (cache), not on the server. A minimal sketch follows the list below.
This design means:
● Any server can handle any request
● Servers are interchangeable
● You can add/remove servers without breaking anything
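Here is a minimal sketch of Redis-backed sessions, assuming an ioredis-style client like the one used in the caching example later in this post. The helper names are hypothetical:

const crypto = require('crypto');

// Any web server can call these helpers, because the session lives
// in Redis rather than on one machine.
async function createSession(redis, userId) {
  const token = crypto.randomBytes(32).toString('hex');
  // Session expires after 1 hour
  await redis.setex(`session:${token}`, 3600, JSON.stringify({ userId }));
  return token; // returned to the client, e.g. as a cookie
}

async function getSession(redis, token) {
  const raw = await redis.get(`session:${token}`);
  return raw ? JSON.parse(raw) : null; // null: expired or never existed
}

Because every server resolves sessions the same way, scaling in or out never logs anyone out.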
Database (RDS)
● Stores all persistent data
● Configured with replication (primary + standbys)
● Automated backups
Cache (Redis)
● Stores sessions, frequently accessed data
● Much faster than database (in-memory)
● Reduces database load significantly
Message Queue (Optional but recommended)
● Decouples synchronous from asynchronous work
● Prevents user-facing requests from slowing down due to slow tasks (see the sketch below)
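As an illustration, here is a minimal sketch that uses a Redis list as a lightweight queue. In production you would more likely use a managed queue service, but the decoupling idea is the same; the job type and the generateReport function are hypothetical:

// Producer: runs inside the user-facing request and finishes instantly.
async function requestReportExport(redis, userId) {
  await redis.lpush('jobs:report-export', JSON.stringify({ userId }));
  return { status: 'queued' }; // the user gets an immediate response
}

// Consumer: a separate worker process drains the queue at its own pace.
async function runWorker(redis) {
  while (true) {
    // BRPOP blocks until a job arrives, so an idle worker is cheap.
    const [, payload] = await redis.brpop('jobs:report-export', 0);
    const job = JSON.parse(payload);
    await generateReport(job.userId); // the slow task, off the request path
  }
}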
Storage (OSS)
● User files, backups, logs
● Not tightly coupled to application layer
Auto-scaling is the mechanism that makes horizontal scaling automatic. Here's how to set it up:
Launch Template
This is a blueprint for creating new instances. It specifies:
● Instance type (e.g., a burstable ecs.t6 type)
● Image (e.g., Ubuntu 20.04 with your app pre-installed)
● Security group
● Key pair
● User data (script that runs on instance startup)
Why a blueprint? When scaling, Alibaba Cloud needs to know "What kind of instance should I spin up?" The launch template answers that.
With the template in place, define two scaling rules.
Scale-out (add servers):
Trigger: Average CPU > 70% for 2 consecutive periods (2 minutes)
Action: Add 2 instances
Cooldown: 3 minutes (wait 3 min before trying to scale again)
Scale-in (remove servers):
Trigger: Average CPU < 30% for 2 consecutive periods (2 minutes)
Action: Remove 1 instance
Cooldown: 5 minutes (be conservative when removing capacity)
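To make the two rules concrete, here they are as plain data. The field names are illustrative only; map them onto whatever provisioning tool you use (console, CLI, or Terraform):

// Illustrative only: the shape of the two policies described above.
const scalingPolicies = {
  scaleOut: {
    metric: 'CPUUtilization',
    comparison: '>',
    threshold: 70,          // percent
    evaluationPeriods: 2,   // 2 consecutive 1-minute periods
    adjustment: +2,         // add 2 instances
    cooldownSeconds: 180,
  },
  scaleIn: {
    metric: 'CPUUtilization',
    comparison: '<',
    threshold: 30,
    evaluationPeriods: 2,
    adjustment: -1,         // remove 1 instance
    cooldownSeconds: 300,   // longer cooldown: be conservative removing capacity
  },
};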
The load balancer's health check should detect when instances are unhealthy and remove them from rotation. Combined with auto-scaling, this gives you self-healing capacity: failed instances are replaced automatically.
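On each instance, the health-check target can be a tiny endpoint. Here is a minimal sketch using Express (an assumed framework; the path and port are placeholder choices):

const express = require('express'); // assumed web framework
const app = express();

// The load balancer probes this path on every instance; any non-2xx
// response (or a timeout) takes that instance out of rotation.
app.get('/healthz', (req, res) => {
  // A deeper check could also verify the database and cache connections,
  // so a broken dependency also fails the probe.
  res.status(200).send('ok');
});

app.listen(8080);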
Scaling databases is different from scaling web servers. You can't just add more database servers and expect requests to distribute evenly.
The standard first step is read/write splitting: send every write to the primary and spread reads across read replicas.
Implementation on Alibaba Cloud: add one or more read-only instances to your ApsaraDB RDS database, then split traffic in the application.
Example code:
const writeConnection = createConnection(primary);
const readConnections = [replica1, replica2, replica3];

async function getUser(id) {
  // Reads can go to any replica; pick one at random to spread load.
  // (Replicas lag the primary slightly, so read-your-own-writes flows
  // may still need the primary.)
  const readConn = readConnections[Math.floor(Math.random() * readConnections.length)];
  return readConn.query('SELECT * FROM users WHERE id = ?', [id]);
}

async function updateUser(id, data) {
  // All writes go to the primary so replication stays one-directional.
  return writeConnection.query('UPDATE users SET ... WHERE id = ?', [id]);
}
When read replicas aren't enough, you need sharding. Divide your data across multiple independent databases based on a key.
Example: User data sharded by user_id
Database 1: users with IDs 1-1,000,000
Database 2: users with IDs 1,000,001-2,000,000
Database 3: users with IDs 2,000,001-3,000,000
To find user 1,500,000, the application looks at the ID, sees that it falls in the 1,000,001-2,000,000 range, and routes the query to Database 2 (a minimal routing sketch follows below).
Pros: Each database is smaller and faster
Cons: Queries that span multiple shards become complex
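Here is a minimal sketch of that range-based routing, assuming one connection object per shard (db1, db2, db3 are hypothetical names):

const SHARD_SIZE = 1000000;
const shards = [db1, db2, db3]; // hypothetical: one connection per shard

function shardFor(userId) {
  // IDs 1..1,000,000 -> shards[0], 1,000,001..2,000,000 -> shards[1], etc.
  return shards[Math.floor((userId - 1) / SHARD_SIZE)];
}

async function getUserSharded(userId) {
  // Single-user lookups stay fast: exactly one shard is queried.
  return shardFor(userId).query('SELECT * FROM users WHERE id = ?', [userId]);
}

A query like "all users who signed up this week", by contrast, would have to fan out to every shard and merge the results, which is the complexity the cons line above refers to.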
A well-designed cache can absorb the vast majority of read traffic, dramatically reducing database load. Here's the cache-aside pattern:
Code example:
async function getUserProfile(userId) {
  // Try cache first
  const cached = await redis.get(`user:${userId}`);
  if (cached) return JSON.parse(cached);

  // Cache miss: read from the database
  const user = await database.query('SELECT * FROM users WHERE id = ?', [userId]);

  // Store in cache for 1 hour so subsequent reads skip the database
  await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
  return user;
}
The hard part isn't caching; it's invalidation: deciding when cached data is stale.
Strategies:
| Strategy | When to Use | Trade-off |
|---|---|---|
| TTL (Time To Live) | Most situations | Stale data for up to TTL duration |
| Write-through | Important data (user profile) | Code is more complex; cache and DB always in sync |
| Event-based | Real-time updates needed | Requires message queue or pub/sub system |
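For instance, a minimal write-through sketch, reusing the Redis client and users table from the cache-aside example above (the name field is just an illustration):

async function updateUserProfile(userId, newName) {
  // 1. Write to the database first: it remains the source of truth.
  await database.query('UPDATE users SET name = ? WHERE id = ?', [newName, userId]);

  // 2. Refresh the cache in the same operation, so readers never see
  //    the old value for the remainder of the TTL.
  const user = await database.query('SELECT * FROM users WHERE id = ?', [userId]);
  await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
  return user;
}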
A scalable system is useless if you don't know when it's failing. Set up monitoring:
| Metric | Alert Threshold | Why It Matters |
|---|---|---|
| CPU Usage | > 80% for 5 min | Indicates scaling pressure |
| Memory | > 85% | Memory leaks, misconfiguration |
| Disk | > 90% | Logs filling up, storage issue |
| Database Connections | > 80% of max | Connection pool nearly exhausted |
| Request Latency | p95 > 500ms | User experience degrading |
| Error Rate | > 1% | Something is broken |
| Auto Scaling Activity | Frequent scaling | Under-provisioned or misconfigured scaling policies |
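One note on the latency row: p95 is the value that 95% of requests fall under, which catches slow outliers that an average hides. A minimal sketch of computing it over a window of samples:

// Illustrative only: p95 over a window of request durations (ms).
function p95(samplesMs) {
  if (samplesMs.length === 0) return 0;
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95));
  return sorted[index];
}

console.log(p95([120, 90, 300, 85, 95, 110, 640])); // one slow request dominates p95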
Here's what happens under heavy load with your properly scaled architecture:
Timeline:
6:00 AM: Traffic increases
6:02 AM: Auto-scaling detects CPU > 70%
6:05 AM: New instances come online
6:06 AM: Load balancer routes traffic to new instances
System maintains 70% CPU utilization
Every user still gets fast responses
No downtime
Midnight: Traffic drops
12:05 AM: Auto-scaling detects CPU < 30%
12:10 AM: One instance removed (never dropping below the configured minimum)
Cost optimization active
Still maintaining availability
Before deploying, verify:
Compute:
● Running multiple instances (minimum 2)
● Auto-scaling group configured with min/desired/max
● Launch template tested and finalized
● Scaling policies set conservatively (don't over-scale)
● Instances are stateless (sessions in cache, not local)
Data Layer:
● Database configured with replication
● Read replicas in use for read-heavy workloads
● Cache (Redis) for sessions and hot data
● Backup strategy automated
Monitoring:
● Metrics collected for all components
● Alerts configured for key thresholds
● Dashboards show system health at a glance
● Runbook written for common failures
Testing:
● Load tested to validate scaling behavior
● Tested failure scenarios (instance crashes, database unavailable)
● Tested database failover
● Tested scale-in and scale-out
Your Turn: In the comments, describe your expected user load. I'll help you size an auto-scaling group for it.
Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.