Note: You've got the basics down. Now it's time to think like an architect. In this post, we'll design systems that don't just work; they keep working when 10,000 users hit them simultaneously. This is where cloud infrastructure becomes an art.
A few years ago, a small e-commerce startup I knew launched their site on a single server. For the first few months, everything worked beautifully. Then Black Friday arrived.
6:00 AM: Website crashes under traffic.
6:15 AM: They frantically spin up more servers, but they're not configured to work together, so it doesn't help.
6:30 AM: Customers still can't buy. An estimated $50,000 in sales is gone.
This didn't have to happen. They needed scalability, a term that sounds fancy but really just means "your system can grow to meet demand instead of collapsing under it."
In this blog, we'll design the kind of architecture that would have saved them.
Vertical Scaling (Scale Up)
● Give one server more CPU, RAM, or storage
● Easier initially, but has a ceiling (there's only so big a single server can be)
● When it stops working, you have one huge problem
Horizontal Scaling (Scale Out)
● Add more servers
● More complex initially, but practically unlimited
● If one server fails, others keep serving
The rule: Always design for horizontal scaling from day one. Add a second load-balanced server even if you don't need it yet. You're learning the pattern.
For any critical component, you should have at least N+1 instances. If you need 3 instances to handle load, run 4.
Why? When one fails, you still have N instances handling traffic while it recovers.
Let me show you a proven architecture that handles scaling gracefully:
CDN (Content Delivery Network)
● Caches static files (images, CSS, JavaScript) at edge locations
● Users download from the closest edge server, not your origin
● Often 10x faster for users far from your data center
● Can cut traffic to your load balancer by 90% or more on static-heavy sites
Load Balancer (SLB)
● Single entry point for all users
● Health-checks web servers (removes unhealthy ones automatically)
● Distributes traffic evenly
Key principle: Stateless design
Web servers should not store session state locally. Why?
Imagine a user logs in to Server A, which keeps the session in its own memory. The load balancer then routes the user's next request to Server B, which has never heard of that session, and the user is suddenly logged out.
Solution: Store sessions in Redis (cache), not on the server. A minimal sketch follows the list below.
This design means:
● Any server can handle any request
● Servers are interchangeable
● You can add/remove servers without breaking anything
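Here is a minimal sketch of Redis-backed sessions, assuming an ioredis-style client like the one used in the caching example later in this post. The helper names are hypothetical:

const crypto = require('crypto');

// Any web server can call these helpers, because the session lives
// in Redis rather than on one machine.
async function createSession(redis, userId) {
  const token = crypto.randomBytes(32).toString('hex');
  // Session expires after 1 hour
  await redis.setex(`session:${token}`, 3600, JSON.stringify({ userId }));
  return token; // returned to the client, e.g. as a cookie
}

async function getSession(redis, token) {
  const raw = await redis.get(`session:${token}`);
  return raw ? JSON.parse(raw) : null; // null: expired or never existed
}

Because every server resolves sessions the same way, scaling in or out never logs anyone out.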
Database (RDS)
● Stores all persistent data
● Configured with replication (primary + standbys)
● Automated backups
Cache (Redis)
● Stores sessions, frequently accessed data
● Much faster than database (in-memory)
● Reduces database load significantly
Message Queue (Optional but recommended)
● Decouples synchronous from asynchronous work
● Prevents user-facing requests from slowing down due to slow tasks (see the sketch below)
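As an illustration, here is a minimal sketch that uses a Redis list as a lightweight queue. In production you would more likely use a managed queue service, but the decoupling idea is the same; the job type and the generateReport function are hypothetical:

// Producer: runs inside the user-facing request and finishes instantly.
async function requestReportExport(redis, userId) {
  await redis.lpush('jobs:report-export', JSON.stringify({ userId }));
  return { status: 'queued' }; // the user gets an immediate response
}

// Consumer: a separate worker process drains the queue at its own pace.
async function runWorker(redis) {
  while (true) {
    // BRPOP blocks until a job arrives, so an idle worker is cheap.
    const [, payload] = await redis.brpop('jobs:report-export', 0);
    const job = JSON.parse(payload);
    await generateReport(job.userId); // the slow task, off the request path
  }
}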
Storage (OSS)
● User files, backups, logs
● Not tightly coupled to application layer
Auto-scaling is the mechanism that makes horizontal scaling automatic. Here's how to set it up:
Launch Template
This is a blueprint for creating new instances. It specifies:
● Instance type (e.g., a burstable ecs.t6 type)
● Image (e.g., Ubuntu 20.04 with your app pre-installed)
● Security group
● Key pair
● User data (script that runs on instance startup)
Why a blueprint? When scaling, Alibaba Cloud needs to know "What kind of instance should I spin up?" The launch template answers that.
With the template in place, define two scaling rules.
Scale-out (add servers):
Trigger: Average CPU > 70% for 2 consecutive periods (2 minutes)
Action: Add 2 instances
Cooldown: 3 minutes (wait 3 min before trying to scale again)
Scale-in (remove servers):
Trigger: Average CPU < 30% for 2 consecutive periods (2 minutes)
Action: Remove 1 instance
Cooldown: 5 minutes (be conservative when removing capacity)
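To make the two rules concrete, here they are as plain data. The field names are illustrative only; map them onto whatever provisioning tool you use (console, CLI, or Terraform):

// Illustrative only: the shape of the two policies described above.
const scalingPolicies = {
  scaleOut: {
    metric: 'CPUUtilization',
    comparison: '>',
    threshold: 70,          // percent
    evaluationPeriods: 2,   // 2 consecutive 1-minute periods
    adjustment: +2,         // add 2 instances
    cooldownSeconds: 180,
  },
  scaleIn: {
    metric: 'CPUUtilization',
    comparison: '<',
    threshold: 30,
    evaluationPeriods: 2,
    adjustment: -1,         // remove 1 instance
    cooldownSeconds: 300,   // longer cooldown: be conservative removing capacity
  },
};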
The load balancer's health check should detect when instances are unhealthy and remove them from rotation. Combined with auto-scaling, this gives you self-healing capacity: failed instances are replaced automatically.
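On each instance, the health-check target can be a tiny endpoint. Here is a minimal sketch using Express (an assumed framework; the path and port are placeholder choices):

const express = require('express'); // assumed web framework
const app = express();

// The load balancer probes this path on every instance; any non-2xx
// response (or a timeout) takes that instance out of rotation.
app.get('/healthz', (req, res) => {
  // A deeper check could also verify the database and cache connections,
  // so a broken dependency also fails the probe.
  res.status(200).send('ok');
});

app.listen(8080);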
Scaling databases is different from scaling web servers. You can't just add more database servers and expect requests to distribute evenly.
The standard first step is read/write splitting: send every write to the primary and spread reads across read replicas.
Implementation on Alibaba Cloud: add one or more read-only instances to your ApsaraDB RDS database, then split traffic in the application.
Example code:
const writeConnection = createConnection(primary);
const readConnections = [replica1, replica2, replica3];

async function getUser(id) {
  // Reads can go to any replica; pick one at random to spread load.
  // (Replicas lag the primary slightly, so read-your-own-writes flows
  // may still need the primary.)
  const readConn = readConnections[Math.floor(Math.random() * readConnections.length)];
  return readConn.query('SELECT * FROM users WHERE id = ?', [id]);
}

async function updateUser(id, data) {
  // All writes go to the primary so replication stays one-directional.
  return writeConnection.query('UPDATE users SET ... WHERE id = ?', [id]);
}
When read replicas aren't enough, you need sharding. Divide your data across multiple independent databases based on a key.
Example: User data sharded by user_id
Database 1: users with IDs 1-1,000,000
Database 2: users with IDs 1,000,001-2,000,000
Database 3: users with IDs 2,000,001-3,000,000
To find user 1,500,000, the application looks at the ID, sees that it falls in the 1,000,001-2,000,000 range, and routes the query to Database 2 (a minimal routing sketch follows below).
Pros: Each database is smaller and faster
Cons: Queries that span multiple shards become complex
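Here is a minimal sketch of that range-based routing, assuming one connection object per shard (db1, db2, db3 are hypothetical names):

const SHARD_SIZE = 1000000;
const shards = [db1, db2, db3]; // hypothetical: one connection per shard

function shardFor(userId) {
  // IDs 1..1,000,000 -> shards[0], 1,000,001..2,000,000 -> shards[1], etc.
  return shards[Math.floor((userId - 1) / SHARD_SIZE)];
}

async function getUserSharded(userId) {
  // Single-user lookups stay fast: exactly one shard is queried.
  return shardFor(userId).query('SELECT * FROM users WHERE id = ?', [userId]);
}

A query like "all users who signed up this week", by contrast, would have to fan out to every shard and merge the results, which is the complexity the cons line above refers to.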
A well-designed cache can absorb the vast majority of read traffic, dramatically reducing database load. Here's the cache-aside pattern:
Code example:
async function getUserProfile(userId) {
  // Try cache first
  const cached = await redis.get(`user:${userId}`);
  if (cached) return JSON.parse(cached);

  // Cache miss: read from the database
  const user = await database.query('SELECT * FROM users WHERE id = ?', [userId]);

  // Store in cache for 1 hour so subsequent reads skip the database
  await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
  return user;
}
The hard part isn't caching; it's invalidation: deciding when cached data is stale.
Strategies:
| Strategy | When to Use | Trade-off |
|---|---|---|
| TTL (Time To Live) | Most situations | Stale data for up to TTL duration |
| Write-through | Important data (user profile) | Code is more complex; cache and DB always in sync |
| Event-based | Real-time updates needed | Requires message queue or pub/sub system |
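For instance, a minimal write-through sketch, reusing the Redis client and users table from the cache-aside example above (the name field is just an illustration):

async function updateUserProfile(userId, newName) {
  // 1. Write to the database first: it remains the source of truth.
  await database.query('UPDATE users SET name = ? WHERE id = ?', [newName, userId]);

  // 2. Refresh the cache in the same operation, so readers never see
  //    the old value for the remainder of the TTL.
  const user = await database.query('SELECT * FROM users WHERE id = ?', [userId]);
  await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
  return user;
}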
A scalable system is useless if you don't know when it's failing. Set up monitoring:
| Metric | Alert Threshold | Why It Matters |
|---|---|---|
| CPU Usage | > 80% for 5 min | Indicates scaling pressure |
| Memory | > 85% | Memory leaks, misconfiguration |
| Disk | > 90% | Logs filling up, storage issue |
| Database Connections | > 80% of max | Connection pool nearly exhausted |
| Request Latency | p95 > 500ms | User experience degrading |
| Error Rate | > 1% | Something is broken |
| Auto Scaling Activity | Frequent scaling | Under-provisioned or misconfigured scaling policies |
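One note on the latency row: p95 is the value that 95% of requests fall under, which catches slow outliers that an average hides. A minimal sketch of computing it over a window of samples:

// Illustrative only: p95 over a window of request durations (ms).
function p95(samplesMs) {
  if (samplesMs.length === 0) return 0;
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95));
  return sorted[index];
}

console.log(p95([120, 90, 300, 85, 95, 110, 640])); // one slow request dominates p95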
Here's what happens under heavy load with your properly scaled architecture:
Timeline:
6:00 AM: Traffic increases
6:02 AM: Auto-scaling detects CPU > 70%
6:05 AM: New instances come online
6:06 AM: Load balancer routes traffic to new instances
System maintains 70% CPU utilization
Every user still gets fast responses
No downtime
Midnight: Traffic drops
12:05 AM: Auto-scaling detects CPU < 30%
12:10 AM: One instance removed (never dropping below the configured minimum)
Cost optimization active
Still maintaining availability
Before deploying, verify:
Compute:
● Running multiple instances (minimum 2)
● Auto-scaling group configured with min/desired/max
● Launch template tested and finalized
● Scaling policies set conservatively (don't over-scale)
● Instances are stateless (sessions in cache, not local)
Data Layer:
● Database configured with replication
● Read replicas in use for read-heavy workloads
● Cache (Redis) for sessions and hot data
● Backup strategy automated
Monitoring:
● Metrics collected for all components
● Alerts configured for key thresholds
● Dashboards show system health at a glance
● Runbook written for common failures
Testing:
● Load tested to validate scaling behavior
● Tested failure scenarios (instance crashes, database unavailable)
● Tested database failover
● Tested scale-in and scale-out
Your Turn: In the comments, describe your expected user load. I'll help you size an auto-scaling group for it.
Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.