
Auto Scaling: Use (ALB) QPS per Backend Server to trigger scaling activities

Last Updated: May 18, 2023

Auto Scaling allows you to use the (ALB) QPS per Backend Server metric when you create an event-triggered task of the system monitoring type. This topic describes how to use the (ALB) QPS per Backend Server metric to trigger scaling activities.

Prerequisites

  • An Alibaba Cloud account is created. If you do not have one, sign up for an Alibaba Cloud account.

    Note

    If you want to log on to the Auto Scaling console as a RAM user, you must grant the RAM user the permissions on Application Load Balancer (ALB) instances. For more information, see Grant permissions to a RAM user.

  • ALB is activated. Make sure that your Alibaba Cloud account has the AliyunServiceRoleForAlb service-linked role. For more information, see ALB service-linked roles.

  • At least one virtual private cloud (VPC) and one vSwitch are created. For more information, see the Create a VPC and a vSwitch section in the "Create a VPC with an IPv4 CIDR block" topic.

Scenarios

If your application is sensitive to the access traffic to your servers, you can use the (ALB) QPS per Backend Server metric to trigger scaling activities. For example, you use ALB to receive client requests and distribute the requests to backend servers such as Elastic Compute Service (ECS) instances and elastic container instances for processing. When the number of client requests increases, more backend servers are required to process the client requests.

In this case, you can create an event-triggered task for your scaling group and specify (ALB) QPS per Backend Server as the metric when you create the event-triggered task. This task enables Auto Scaling to trigger scaling activities in your scaling group, which improves the service availability of your application. This best practice provides the following benefits:

  • When the QPS value of each ALB server group is greater than the specified threshold, Auto Scaling triggers scale-outs. This reduces the traffic load on each ALB server group and improves system stability.

  • When the QPS value of each ALB server group is less than the specified threshold, Auto Scaling triggers scale-ins. This helps reduce costs on servers.
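The two directions of this best practice amount to a simple threshold check. The following Python sketch is illustrative only: the 100 and 50 counts/s thresholds match the event-triggered tasks created later in this topic, but the function itself is not an Auto Scaling API.

```python
# Illustrative threshold logic behind the scale-out and scale-in tasks.
# The thresholds match the example event-triggered tasks in this topic;
# decide_scaling is a sketch, not an Auto Scaling API.

SCALE_OUT_THRESHOLD = 100  # Average > 100 counts/s triggers the scale-out rule
SCALE_IN_THRESHOLD = 50    # Average < 50 counts/s triggers the scale-in rule

def decide_scaling(qps_per_backend_server: float) -> str:
    """Return the scaling action that the event-triggered tasks would request."""
    if qps_per_backend_server > SCALE_OUT_THRESHOLD:
        return "scale-out"  # add instances to reduce the per-server load
    if qps_per_backend_server < SCALE_IN_THRESHOLD:
        return "scale-in"   # remove instances to reduce costs
    return "no-op"          # load is within the acceptable band
```

Between the two thresholds, neither task fires and the scaling group keeps its current capacity.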

Concepts

  • ALB is an Alibaba Cloud service that runs at the application layer and supports protocols such as HTTP, HTTPS, and Quick UDP Internet Connections (QUIC). ALB provides high elasticity and can process a large amount of network traffic at the application layer. For more information, see the What is ALB? topic.

  • QPS measures the number of HTTP or HTTPS queries or requests that ALB can process per second. This metric is specific to Layer-7 listeners. (ALB) QPS per Backend Server is a metric that is used to evaluate the service performance of a server. The value of this metric is calculated by using the following formula: (ALB) QPS per backend server = Total number of client requests that are received by ALB per second / Total number of ECS instances or elastic container instances in the ALB backend server groups.

  • Performance Testing (PTS) is an easy-to-use SaaS-based stress testing platform that has powerful distributed stress testing capabilities.
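The formula from the QPS concept above can be written as a small helper. This is an illustrative sketch; Auto Scaling and CloudMonitor compute the metric on the server side.

```python
# The (ALB) QPS per Backend Server formula as a helper function.
# Illustrative only; the metric is computed by the cloud service.

def qps_per_backend_server(total_qps: float, backend_count: int) -> float:
    """(ALB) QPS per backend server = total client requests received by ALB
    per second / total number of ECS instances or elastic container
    instances in the ALB backend server groups."""
    if backend_count <= 0:
        raise ValueError("the server group must contain at least one backend server")
    return total_qps / backend_count
```

For example, 500 total QPS spread across 5 backend instances yields 100 QPS per backend server.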

Step 1: Configure ALB

  1. Create an ALB instance.

    For more information, see the Create an ALB instance topic.

    The following table describes the parameters that are required to create an ALB instance.

    | Parameter | Description | Example |
    | --- | --- | --- |
    | Instance Name | The name of the ALB instance. | alb-qps-instance |
    | VPC | The VPC in which you want to create the ALB instance. | vpc-test****-001 |
    | Zone | The zone and the vSwitch to which the ALB instance belongs. Note: ALB supports cross-zone deployment. If two or more zones are available in the specified region, select at least two zones to ensure high availability. | Zones: Hangzhou Zone G and Hangzhou Zone H. vSwitches: vsw-test003 and vsw-test002 |

  2. Create an ALB server group.

    For more information, see the Create and manage a server group topic.

    The following table describes the parameters that are required to create an ALB server group.

    | Parameter | Description | Example |
    | --- | --- | --- |
    | Server Group Name | The name of the server group. | alb-qps-servergroup |
    | VPC | The VPC in which you want to create the ALB server group. You can add only servers that reside in this VPC to the ALB server group. Note: In this step, select the VPC that was specified in Step 1. | vpc-test****-001 |

  3. Configure a listener.

    1. In the left-side navigation pane of the Server Load Balancer (SLB) console, choose ALB > Instances.

    2. On the Instances page, find the ALB instance named alb-qps-instance and click Create Listener in the Actions column.

    3. In the Configure Listener step, set the Listener Port parameter to 80, retain the default settings of other parameters, and then click Next.

      Note

      You can specify another port number as the value of the Listener Port parameter based on your business requirements. In this example, port number 80 is used.

    4. In the Select Server Group step, select the server type below the Server Group field, select the server group named alb-qps-servergroup that was created earlier in this step, and then click Next. (Figure: Backend servers)

    5. In the Configuration Review step, confirm the parameter settings and click Submit. In the message that appears, click OK.

  4. After you complete the configuration, click the name of the ALB instance to go to the Instance Details tab. On this tab, you can obtain the elastic IP address (EIP) of the ALB instance. (Figure: IP address of the ALB instance)

Step 2: Create a scaling group for the ALB server group

The steps to create an ECS scaling group are different from the steps to create an Elastic Container Instance scaling group. The parameters that are displayed in the console when you create a scaling group take precedence. In this example, an Elastic Container Instance scaling group is created.

  1. Create a scaling group.

    For information about how to create a scaling group, see the Create scaling groups section in the "Manage scaling groups" topic. The following table describes the parameters that are required to create a scaling group. You can configure the parameters that are not described in the table based on your business requirements.

    | Parameter | Description | Example |
    | --- | --- | --- |
    | Scaling Group Name | The name of the scaling group. | alb-qps-scalinggroup |
    | Type | The type of instances that are managed by the scaling group to provide computing power. | ECI |
    | Instance Configuration Source | The template that is used by Auto Scaling to create elastic container instances. | Create from Scratch |
    | Minimum Number of Instances | The minimum number of elastic container instances that must be contained in the scaling group. When the total number of elastic container instances falls below this limit, Auto Scaling automatically creates elastic container instances in the scaling group until the total number reaches the value of this parameter. | 1 |
    | VPC | The VPC in which you want to create the scaling group. The VPC of the scaling group must be the same as the VPC of the ALB instance. | vpc-test****-001 |
    | vSwitch | The vSwitches that you want to associate with the scaling group. The vSwitches of the scaling group must be the same as the vSwitches of the ALB instance. | vsw-test003 and vsw-test002 |
    | Associate ALB Server Group | The ALB server group that you want to associate with the scaling group. In this example, the ALB server group created in Step 1 is used. | Server group: sgp-****/alb-qps-servergroup. Port number: 80 |

  2. Create and enable a scaling configuration.

    For more information, see the Create scaling configurations for scaling groups that contain elastic container instances topic. The following table describes the parameters that are required to create a scaling configuration. You can configure the parameters that are not described in the table based on your business requirements.

    | Parameter | Description | Example |
    | --- | --- | --- |
    | Container Group Configurations | The number of vCPUs and the memory size that you want to allocate to the container group. | vCPU: 2 vCPUs. Memory: 4 GiB |
    | Container Configurations | The image and the image version of the container. Note: In this example, an nginx image is used. | Container image: registry-vpc.cn-hangzhou.aliyuncs.com/eci_open/nginx. Image version: latest |

  3. Enable the scaling group.

    For information about how to enable a scaling group, see the Enable or disable scaling groups section in the "Manage scaling groups" topic.

    Note

    After you enable the scaling group, Auto Scaling automatically creates an elastic container instance because the Minimum Number of Instances parameter of the scaling group is set to 1.

  4. Access the EIP of the ALB instance obtained in Step 1 to check whether nginx can be accessed as expected. (Figure: Access the EIP)
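The check in the last step above can be scripted. The following Python sketch is illustrative: the EIP value used in the usage example is a placeholder that you must replace with the EIP obtained in Step 1, and port 80 matches the listener configured earlier.

```python
# Illustrative reachability check for the nginx service behind the ALB
# instance. Replace the placeholder EIP with the one obtained in Step 1.
from urllib.request import urlopen

def build_url(eip: str, port: int = 80) -> str:
    """Build the check URL from the ALB EIP and the listener port."""
    return f"http://{eip}:{port}/"

def nginx_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL responds with HTTP 200 (nginx welcome page)."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, timeouts, and connection errors
        return False
```

For example, `nginx_reachable(build_url("203.0.113.10"))` (placeholder EIP) returns True once the elastic container instance created by the scaling group is serving traffic.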

Step 3: Create event-triggered tasks based on the (ALB) QPS per Backend Server metric

  1. Log on to the Auto Scaling console.

  2. Create scaling rules.

    In this example, two scaling rules are created: a scale-out rule named Add1 that adds one elastic container instance, and a scale-in rule named Reduce1 that removes one elastic container instance from the scaling group. For information about how to create a scaling rule, see the Create a scaling rule section in the "Manage scaling rules" topic.

  3. Create event-triggered tasks.

    1. On the Scaling Groups page, find the scaling group named alb-qps-scalinggroup and click Details in the Actions column.

    2. Click the Scaling Rules and Event-triggered Tasks tab. On the Scheduled and Event-triggered Tasks tab, click the Event-triggered Tasks (System) tab, and then click Create Event-triggered Task.

      In this example, two event-triggered tasks are created. One event-triggered task named Alarm1 is used to trigger the scale-out rule named Add1. The other event-triggered task named Alarm2 is used to trigger the scale-in rule named Reduce1. For information about how to create an event-triggered task, see the Create event-triggered tasks section in the "Manage event-triggered tasks" topic.

      • When you create the Alarm1 event-triggered task, specify (ALB) QPS per Backend Server as the metric and Average > 100 counts/s as the alert condition.

        Note

        QPS per backend server = Total QPS/Total number of elastic container instances

      • When you create the Alarm2 event-triggered task, specify (ALB) QPS per Backend Server as the metric and Average < 50 counts/s as the alert condition.
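Conceptually, an event-triggered task fires when the per-period average of the metric breaches its threshold for several consecutive periods. The following Python sketch is an illustration only: the three-period default and the function name are assumptions, not documented CloudMonitor behavior.

```python
# Hedged sketch of alert-condition evaluation: the per-period average of
# (ALB) QPS per Backend Server must exceed the threshold for a number of
# consecutive periods. The default of 3 periods is an assumption.

def alarm_fires(period_averages, threshold=100.0, periods=3):
    """True if the last `periods` period averages all exceed `threshold`."""
    recent = period_averages[-periods:]
    return len(recent) == periods and all(avg > threshold for avg in recent)
```

With the Alarm1 condition (Average > 100 counts/s), a run of high averages fires the task, while a single dip below the threshold resets the evaluation.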

Check the monitoring effect

  1. Log on to the PTS console.

  2. Create a PTS scenario for the EIP of the ALB instance.

    1. In the left-side navigation pane of the PTS console, choose Performance Test > Create Scenario and click Stress Test.

    2. On the Scenario Settings tab, click Add PTS Node.

      Note

      If you click Add PTS Node, an HTTP PTS node is added by default.

    3. Enter the name of the API operation that you want to stress test, and then enter the stress test URL.

      In this example, nginx_api is used as the name of the API operation, and the EIP of the ALB instance obtained in Step 1 is used as the stress test URL.

    4. Click the Stress Settings tab. Configure parameters of the PTS node as prompted.

      The following table describes the parameters that you must configure on the Stress Settings tab. For parameters that are not described in the following table, retain the default settings.

      | Parameter | Description | Example |
      | --- | --- | --- |
      | Stress Testing Mode | Valid values: Concurrent Mode, in which the system determines the service performance based on the number of users who can initiate requests to the service at the same time, and RPS Mode, in which the system determines the service performance based on the number of received requests per second. | RPS Mode |
      | Stress Test Period | The duration of the stress test. Unit: minutes. | 10 Minutes |
      | RPS Upper Limit | In requests per second (RPS) mode, the system tests the service throughput of an API operation. Therefore, you must specify a maximum number of RPS and an initial number of RPS for each API operation. | 500 |
      | Initial RPS | The initial number of RPS for each API operation. | 500 |

  3. In the lower part of the Stress Settings tab, click Debug Scenario.

    If no error is reported, click Close.

  4. In the lower part of the Stress Settings tab, click Save and Test. In the dialog box that appears, set the Trigger Cycle parameter to Trigger Now, and then click OK to start the test.

    If the stress test is successful, you can view the stress test report and sampling logs.

  5. View the monitoring details of the (ALB) QPS per Backend Server metric.

    For information about how to view the monitoring details of a metric, see the View instance metrics topic. (Figure: Monitoring details)

    The preceding figure shows an example. From 17:25:00 to 17:35:00, a stress test runs at 500 QPS. At 17:30:00, 17:32:00, 17:34:00, and 17:36:00, Auto Scaling adds one elastic container instance in each scale-out. Each time a scale-out is completed, the QPS per backend server of the scaling group decreases. This best practice shows that (ALB) QPS per Backend Server is an effective metric for triggering scaling activities in scaling groups in real time.
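The scale-out sequence in the figure follows directly from the formula in the Concepts section: at a steady 500 QPS, a scale-out threshold of 100 QPS per backend server, and a step size of one instance per scale-out (scaling rule Add1), four scale-outs are required. The following sketch reproduces that arithmetic; it is illustrative and ignores alarm evaluation delays.

```python
# Illustrative reproduction of the arithmetic behind the monitoring
# figure: steady 500 QPS, scale-out threshold of 100 QPS per backend
# server, one instance added per scale-out (scaling rule Add1).

def scale_outs_needed(total_qps=500.0, threshold=100.0, instances=1, step=1):
    """Count scale-outs until QPS per backend server no longer exceeds
    the threshold; return (scale_outs, final_instance_count)."""
    scale_outs = 0
    while total_qps / instances > threshold:
        instances += step  # Auto Scaling executes the Add1 scaling rule
        scale_outs += 1
    return scale_outs, instances
```

Starting from the single instance created when the scaling group was enabled, four scale-outs bring the group to five instances, at which point 500 / 5 = 100 counts/s no longer exceeds the threshold. This matches the four scale-out timestamps in the figure.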