Use the (ALB) QPS per Backend Server metric to enable automatic scaling of elastic container instances - Auto Scaling

Auto Scaling provides the (ALB) QPS per Backend Server system metric as a trigger condition for event-triggered tasks of the system monitoring type. This enables automatic scaling of ECS instances and elastic container instances. This topic describes how to use the (ALB) QPS per Backend Server system metric to enable automatic scaling of elastic container instances.

Concepts

Application Load Balancer (ALB) is an Alibaba Cloud service that runs at the application layer and supports various protocols, such as HTTP, HTTPS, and Quick UDP Internet Connections (QUIC). ALB provides high elasticity and can process a large amount of network traffic at the application layer. For more information, see What is ALB?
Queries per second (QPS) measures the number of HTTP or HTTPS requests that ALB processes per second. This metric is specific to Layer 7 listeners. (ALB) QPS per Backend Server is a metric that is used to evaluate the service performance of a server. The value of this metric is calculated by using the following formula: QPS per backend server = Total number of client requests that ALB receives per second / Total number of ECS instances or elastic container instances in the ALB backend server groups.
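The formula above can be illustrated with a minimal Python sketch. The function name and the sample values are hypothetical and are used only to show the arithmetic. Auto Scaling computes the metric for you.

```python
def qps_per_backend_server(total_qps: float, backend_count: int) -> float:
    """Divide the total QPS received by ALB by the number of backend servers.

    Hypothetical helper for illustration; not part of any Alibaba Cloud SDK.
    """
    if backend_count <= 0:
        raise ValueError("backend_count must be a positive integer")
    return total_qps / backend_count


# Example: 500 requests per second spread over 5 elastic container instances.
print(qps_per_backend_server(500, 5))  # 100.0
```

With the same total QPS, adding backend servers lowers the per-server value, which is why the metric is suitable as a scaling trigger.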

Scenarios

If you use ALB instances to receive client requests in a centralized manner, ALB forwards the requests to the backend servers for processing. The backend servers can be ECS instances or elastic container instances. When the number of client requests surges, your business system must scale out backend servers quickly to keep your business running smoothly and stably.

In this case, you can create event-triggered tasks that use the (ALB) QPS per Backend Server metric as the alert trigger condition to enable automatic scaling of backend servers. This solution improves the availability of your business application and provides the following benefits:

When the value of the (ALB) QPS per Backend Server metric is greater than the specified threshold, Auto Scaling triggers a scale-out event to increase the number of backend servers and reduce the workload on each backend server. This improves system response time and stability.
When the value of the (ALB) QPS per Backend Server metric is less than the specified threshold, Auto Scaling triggers a scale-in event to decrease the number of backend servers and improve cost efficiency.
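The two rules above amount to a simple threshold comparison. The following Python sketch is an illustration only: the function name and the threshold values are assumptions, and the real evaluation is performed by the event-triggered tasks that you configure later in this topic, where you can also choose a different comparison operator such as >=.

```python
def scaling_action(qps_per_server: float,
                   scale_out_threshold: float,
                   scale_in_threshold: float) -> str:
    """Return the scaling activity that the two rules above would trigger.

    Hypothetical helper; thresholds are supplied by the caller.
    """
    if qps_per_server > scale_out_threshold:
        return "scale-out"   # add backend servers to lower per-server load
    if qps_per_server < scale_in_threshold:
        return "scale-in"    # remove backend servers to save cost
    return "none"            # load is inside the accepted band


# Assumed thresholds: scale out above 100, scale in below 50.
for value in (250.0, 75.0, 20.0):
    print(value, scaling_action(value, 100.0, 50.0))
```

Keeping a gap between the two thresholds leaves a band in which no scaling activity is triggered, which helps avoid repeated scale-out and scale-in events for small fluctuations.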

Prerequisites

If you use a Resource Access Management (RAM) user, the permissions to view and manage ALB resources are granted to the RAM user. For more information, see Grant permissions to a RAM user.
At least one virtual private cloud (VPC) and one vSwitch are created. For more information, see Resource Orchestration Service (ROS) console.

Step 1: Configure ALB

Create an ALB instance.

For more information, see Create an ALB instance.

The following table describes the parameters that are required to create an ALB instance.

Parameter	Description	Example
Instance Name	Enter a name for the ALB instance.	alb-qps-instance
VPC	Select a VPC for the ALB instance.	vpc-test****-001
Zone	Select zones and vSwitches for the ALB instance. Note ALB supports cross-zone deployment. If two or more zones are available in the specified region, select at least two zones to ensure high availability.	Zones: Hangzhou Zone G and Hangzhou Zone H vSwitches: vsw-test003 and vsw-test002

Create a server group for the ALB instance.

For more information, see Create and manage server groups.

The following table describes the parameters that are required to create a server group.

Parameter	Description	Example
Server Group Type	Specify the type of the server group that you want to create. The value Server specifies that the server group contains elastic container instances.	Server
Server Group Name	Enter a name for the server group.	alb-qps-servergroup
VPC	Select a VPC from the VPC drop-down list. You can add only servers that reside in the VPC to the server group. Important In this step, select the VPC that you specified when you created the ALB instance.	vpc-test****-001

Configure a listener for the ALB instance.
1. In the left-side navigation pane, choose ALB > Instances.
2. On the Instances page, find the ALB instance whose name is alb-qps-instance and click Create Listener in the Actions column.
3. In the Configure Listener step of the Configure Server Load Balancer wizard, set the Listener Port parameter to 80, retain the default settings of other parameters, and then click Next.
  Note
  You can specify another port number for the Listener Port parameter based on your business requirements. In this example, port number 80 is used.
4. In the Select Server Group step of the Configure Server Load Balancer wizard, select Server Type below the Server Group field, select the server group named alb-qps-servergroup that you created earlier in Step 1, and then click Next.
5. In the Configuration Review step of the Configure Server Load Balancer wizard, confirm the parameter settings and click Submit. In the message that appears, click OK.
After you complete the configurations, click the Instance Details tab to obtain the elastic IP addresses (EIPs) of the ALB instance.

Step 2: Create a scaling group for the ALB server group

The steps to create a scaling group of the ECS type are different from the steps to create a scaling group of the Elastic Container Instance type. The parameters that are displayed in the console when you create a scaling group take precedence. In this step, a scaling group of the Elastic Container Instance type is created.

Create a scaling group.

For more information, see Create scaling groups. The following table describes the parameters that are required to create a scaling group. You can configure the parameters that are not described in the following table based on your business requirements.

Parameter	Description	Example
Scaling Group Name	Enter a name for the scaling group.	alb-qps-scalinggroup
Type	Specify the type of instances that are managed by the scaling group to provide computing power.	ECI
Instance Configuration Source	Auto Scaling creates instances based on the value of the Instance Configuration Source parameter.	Create from Scratch
Minimum Number of Instances	If the total number of elastic container instances is less than the lower limit, Auto Scaling automatically creates elastic container instances in the scaling group until the total number of elastic container instances reaches the lower limit.	1
Maximum Number of Instances	If the total number of elastic container instances exceeds the upper limit, Auto Scaling automatically removes elastic container instances from the scaling group until the total number of elastic container instances no longer exceeds the upper limit.	5
VPC	Select the VPC in which the ALB instance resides.	vpc-test****-001
vSwitch	Select the vSwitches that are used by the ALB instance.	In this example, vsw-test003 and vsw-test002 are selected.
Associate ALB and NLB Server Groups	Select the ALB server group that is created in Step 1. Then, enter port 80.	Server group: sgp-****/alb-qps-servergroup Port: 80

Create and enable a scaling configuration.

For more information, see Create a scaling configuration of the Elastic Container Instance type. The following table describes the parameters that are required to create a scaling configuration. You can configure the parameters that are not described in the following table based on your business requirements.

Parameter	Description	Example
Container Group Configurations	Select the specifications for the container group. The specifications include the number of vCPUs and memory size.	vCPU: 2 vCPUs Memory: 4 GiB
Container Configurations	Select a container image and image tag.	Container image: registry-vpc.cn-hangzhou.aliyuncs.com/eci_open/nginx Image tag: latest

Enable the scaling group.
For more information, see Enable or disable scaling groups.
Note
In this example, the Minimum Number of Instances parameter is set to 1. In this case, Auto Scaling automatically triggers a scale-out event to create one elastic container instance after you enable the scaling group.
Check the status of the elastic container instance and the container running on the instance and make sure that the container runs as expected.
Access the EIP of the ALB instance that is created in Step 1 to check whether nginx can be accessed as expected.
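The access check can also be scripted. The following Python sketch uses only the standard library and assumes that the default nginx page answers with HTTP status 200 on port 80. The function name is hypothetical, and the address 203.0.113.10 is a placeholder from a documentation-only IP range that you must replace with the EIP of your ALB instance.

```python
import urllib.request


def nginx_is_reachable(url, opener=urllib.request.urlopen, timeout=5.0):
    """Return True if an HTTP GET on url answers with status 200.

    Hypothetical helper. The opener parameter exists only so that the
    check can be exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


# Usage (replace the placeholder address with the EIP of your ALB instance):
# print(nginx_is_reachable("http://203.0.113.10:80/"))
```

A result of False does not identify the cause. Check the listener port, the server group association, and the status of the container.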

Step 3: Create event-triggered tasks based on the (ALB) QPS per Backend Server metric

Log on to the Auto Scaling console.
Create a scaling rule.
In this example, two scaling rules are created. One scale-out rule named Add1 is used to add one elastic container instance. One scale-in rule named Reduce1 is used to remove one elastic container instance from the scaling group. For more information, see Manage scaling rules.
Create event-triggered tasks.
1. On the Scaling Groups page, find the scaling group whose name is alb-qps-scalinggroup and click Details in the Actions column.
2. Choose Scaling Rules and Event-triggered Tasks > Event-triggered Tasks > Event-triggered Tasks (System) and click Create Event-triggered Task.
  In this example, two event-triggered tasks are created. One event-triggered task named Alarm1 is used to trigger Add1. The other event-triggered task named Alarm2 is used to trigger Reduce1. For more information, see Manage event-triggered tasks.
  - When you create Alarm1, select the (ALB) QPS per Backend Server metric and specify the following alert trigger condition: Average(Average) >= 100 Count/s.
    Note
    QPS per backend server = Total QPS/Total number of elastic container instances
  - When you create Alarm2, select the (ALB) QPS per Backend Server metric and specify the following alert trigger condition: Average(Average) < 50 Count/s.
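To see how the two conditions interact, the following Python sketch lists the instance counts at which neither Alarm1 (>= 100) nor Alarm2 (< 50) would fire for a given total QPS. It is an illustration under stated assumptions: the function name is hypothetical, the limits 1 and 5 are taken from the scaling group example in this topic, and total QPS is treated as constant.

```python
def quiet_instance_counts(total_qps, scale_out_at=100.0, scale_in_below=50.0,
                          min_instances=1, max_instances=5):
    """List instance counts for which neither alert condition is met.

    Alarm1 fires when QPS per backend server >= scale_out_at.
    Alarm2 fires when QPS per backend server <  scale_in_below.
    Hypothetical helper for illustration only.
    """
    return [
        n for n in range(min_instances, max_instances + 1)
        if scale_in_below <= total_qps / n < scale_out_at
    ]


print(quiet_instance_counts(300))  # [4, 5]
print(quiet_instance_counts(120))  # [2]
print(quiet_instance_counts(500))  # []
```

The empty result for 500 QPS shows that, with a maximum of five instances, the per-server value stays at 100 or higher, so the scale-out condition remains met even after the scaling group reaches its upper limit.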

Check the monitoring effect

You can use a stress testing tool, such as Apache JMeter, ApacheBench, or wrk, to perform a stress test on the EIP of the ALB instance that is created in Step 1. When you set up a stress test, you can simulate a scenario in which the QPS value reaches 500. During the test, you can observe the following trends on the Monitoring tab of the Auto Scaling console:

When the value of the (ALB) QPS per Backend Server metric exceeds the alert threshold, the event-triggered task for scale-out events is triggered to add elastic container instances. Each time a new elastic container instance is added, the workload on each instance is reduced, which decreases the QPS per backend server.
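The expected trend can be approximated with a small simulation. The following Python sketch assumes a constant total QPS of 500, the Add1 rule (one instance per scale-out event), the Alarm1 condition (>= 100), and the scaling group limits of 1 and 5 from this topic. It ignores cooldown time and alert evaluation periods, so it shows the order of events rather than their timing. The function name is hypothetical.

```python
def simulate_scale_out(total_qps, start=1, maximum=5, threshold=100.0, step=1):
    """Return (instance_count, qps_per_server) pairs for successive scale-outs.

    Stops when the per-server value drops below the threshold or when the
    scaling group reaches its maximum number of instances.
    """
    history = []
    count = start
    while True:
        per_server = total_qps / count
        history.append((count, per_server))
        if per_server < threshold or count >= maximum:
            break
        count = min(count + step, maximum)
    return history


for count, per_server in simulate_scale_out(500):
    print(f"{count} instance(s): {per_server:.1f} QPS per backend server")
```

Under these assumptions, the scaling group grows from one to five instances and the per-server value falls from 500 to 100, at which point the maximum number of instances prevents further scale-out.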