Auto Scaling Explained: Dynamic & Stable Workload Architectures - Auto Scaling - Alibaba Cloud

Auto Scaling automatically adjusts the number of Elastic Compute Service (ECS) instances and elastic container instances in response to changing workloads, reducing the need for manual intervention. This page describes four common scenarios to help you identify the right scaling approach for your situation.

Handle unpredictable traffic spikes

The problem: A news website sees page views surge when breaking news hits and drop as the story fades. Traffic spikes are impossible to anticipate, so manually adjusting instance counts is both impractical and error-prone.

The solution: Create event-triggered tasks to monitor metrics such as CPU utilization. When a monitored metric crosses the threshold you set, Auto Scaling adds or removes instances automatically.

Two common configurations:

Simple scaling rules (two tasks): If CPU utilization reaches 70% or above, add three instances to the scaling group. If it drops below 30%, remove three instances.
Target tracking scaling rule (one task): Set a target CPU utilization of 50%. Auto Scaling adjusts instance counts to maintain that target.
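
The two configurations above can be sketched as plain decision functions. This is a minimal illustration, not the Auto Scaling API: the function names simple_rule_delta and target_tracking_capacity are hypothetical, and the proportional formula used for target tracking is an assumption (load spread evenly across instances), since this page does not describe the actual algorithm.

```python
import math


def simple_rule_delta(cpu_percent: float,
                      scale_out_at: float = 70.0,
                      scale_in_below: float = 30.0,
                      step: int = 3) -> int:
    """Return the instance-count change for the two simple scaling rules."""
    if cpu_percent >= scale_out_at:
        return step       # task 1: CPU at or above 70% adds three instances
    if cpu_percent < scale_in_below:
        return -step      # task 2: CPU below 30% removes three instances
    return 0              # between the thresholds: no change


def target_tracking_capacity(current_instances: int,
                             cpu_percent: float,
                             target_percent: float = 50.0) -> int:
    """Estimate the instance count that brings average CPU back to the target.

    Assumption: total load is current_instances * cpu_percent and is
    spread evenly, so the needed count is total load / target, rounded up.
    """
    needed = math.ceil(current_instances * cpu_percent / target_percent)
    return max(1, needed)
```

For example, under these assumptions four instances averaging 75% CPU would be resized to six to approach the 50% target, while the simple rules would add a fixed three instances regardless of how far above 70% the metric is.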

Handle predictable traffic patterns

The problem: A game company consistently sees demand surge between 18:00 and 22:00 every day. Manually scaling up before peak hours and scaling down afterward is repetitive and easy to forget.

The solution: Create scheduled tasks to add and release instances at fixed times each day.

Example configuration:

At 17:55, a scheduled task triggers a simple scaling rule that adds three instances, giving them time to warm up before peak hours start at 18:00.
At 22:05, a second scheduled task triggers a simple scaling rule that removes those three instances once off-peak hours begin.

This keeps capacity ready when traffic arrives and releases it when no longer needed, with no idle instances running overnight.
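
The daily schedule in this example can be modeled as a list of trigger times and instance deltas. The sketch below is only a simulation of that schedule: SCHEDULE and extra_instances_at are hypothetical names, and it assumes every task fires exactly once per day at its configured time.

```python
from datetime import time

# Hypothetical schedule mirroring the example: (trigger time, instance delta).
SCHEDULE = [
    (time(17, 55), +3),  # add three instances before the 18:00 peak
    (time(22, 5), -3),   # remove them once off-peak hours begin
]


def extra_instances_at(now: time, schedule=SCHEDULE) -> int:
    """Net number of scheduled extra instances running at `now` on a given day."""
    # Sum the deltas of every task whose trigger time has already passed today.
    return sum(delta for trigger, delta in schedule if trigger <= now)
```

Calling extra_instances_at(time(20, 0)) returns 3, and any time before 17:55 or from 22:05 onward returns 0, which matches the claim that no extra instances run overnight.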

Maintain high availability with stable workloads

The problem: A telecommunications company runs stable workloads with no obvious traffic variation. However, if an instance fails unexpectedly, the team may not catch it in time, causing a service interruption.

The solution: Enable health checks in your scaling group. Auto Scaling monitors instance health continuously. When it detects an unhealthy instance, it automatically creates a replacement so your service keeps running.

Also set a minimum number of instances in your scaling group. Auto Scaling ensures the group never drops below that count, providing a baseline guarantee of availability.
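
The combined effect of health checks and a minimum instance count can be pictured as a reconcile step. The sketch below is a simplified model, not the service's implementation: reconcile and the replacement-N naming are hypothetical, and it assumes every unhealthy instance is replaced one-for-one.

```python
from itertools import count


def reconcile(instances: dict[str, bool], min_size: int) -> dict[str, bool]:
    """Drop unhealthy instances, then launch replacements.

    `instances` maps an instance name to its health status (True = healthy).
    The result keeps the original group size, or `min_size` if that is larger.
    """
    healthy = {name: True for name, ok in instances.items() if ok}
    desired = max(len(instances), min_size)

    ids = count(1)
    while len(healthy) < desired:
        name = f"replacement-{next(ids)}"  # hypothetical naming scheme
        if name not in healthy:
            healthy[name] = True
    return healthy
```

With a group of three instances where one has failed, the function returns three healthy instances: the two survivors plus one replacement.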

Handle mixed and complex traffic patterns

The problem: A company has stable daily traffic but occasionally experiences demand spikes. It already has subscription instances and only needs to add capacity when load increases, not replace its existing fleet.

The solution: Manually add your subscription instances to a scaling group, then configure event-triggered tasks to monitor CPU utilization and scale accordingly. Auto Scaling keeps your subscription instances running and adds or removes additional instances only when needed.
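
The key behavior here is that scale-in only touches the instances Auto Scaling created, never the subscription instances you added manually. A minimal sketch of that rule follows. The Instance class and scale_in function are hypothetical, and removing the most recently added instances first is an assumption made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    subscription: bool  # True for manually added subscription instances


def scale_in(group: list[Instance], remove_count: int) -> list[Instance]:
    """Remove up to `remove_count` auto-created instances, newest first.

    Subscription instances are always kept, even if fewer than
    `remove_count` auto-created instances are available to remove.
    """
    kept = []
    remaining = remove_count
    for inst in reversed(group):  # walk from the newest instance backwards
        if remaining > 0 and not inst.subscription:
            remaining -= 1        # release this auto-created instance
            continue
        kept.append(inst)
    kept.reverse()                # restore the original ordering
    return kept
```

If a group holds two subscription instances and one auto-created instance, a request to remove three instances releases only the auto-created one and leaves the subscription fleet intact.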

For more complex requirements, combine several features in the same scaling group:

Scheduled tasks for predictable peak periods
Event-triggered tasks for unexpected spikes
Health checks for continuous availability

Using these features together lets you cover a wide range of traffic conditions while minimizing costs and manual effort.