Advanced Horizontal Pod Autoscaler (AHPA) performs predictive scaling based on the historical values of metrics such as requests per second (RPS), concurrency, CPU, and memory, so it can scale resources out before the predicted load arrives. You can also define cron expressions to set the minimum and maximum numbers of pods for specific time periods. AHPA then keeps the number of pods within these limits during those periods.
Prerequisites
Knative is deployed in the ACS cluster. For more information, see Deploy Knative.
AHPA is deployed in the ACS cluster. For more information, see Deploy AHPA.
Step 1: Use AHPA to configure metrics for auto scaling
Use the following YAML content to create an AHPA template (AdvancedHorizontalPodAutoscalerTemplate) named ahpa-demo in the ACS cluster.
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: AdvancedHorizontalPodAutoscalerTemplate
metadata:
  name: ahpa-demo
spec:
  metrics:
  - type: Resource
    resource:
      name: rps
      target:
        type: Utilization
        averageUtilization: 10 # The RPS threshold is 10.
  maxReplicas: 50 # The maximum number of pods is 50.
  minReplicas: 0 # The minimum number of pods is 0.
  prediction:
    quantile: 95 # The confidence level of the prediction is 95%.
    scaleUpForward: 180 # The forward prediction window is 180 seconds.
  # From 00:00:00 on June 1, 2023 to 00:00:00 on June 1, 2123, the number of pods is limited by the following bounds.
  instanceBounds:
  - startTime: "2023-06-01 00:00:00"
    endTime: "2123-06-01 00:00:00"
    bounds:
    # From 00:00 to 06:59, the number of pods ranges from 0 to 50.
    - cron: '* 0-6 ? * *'
      maxReplicas: 50
      minReplicas: 0
    # From 07:00 to 09:59, the number of pods ranges from 5 to 50.
    - cron: '* 7-9 ? * *'
      maxReplicas: 50
      minReplicas: 5
    # From 10:00 to 16:59, the number of pods ranges from 10 to 50.
    - cron: '* 10-16 ? * *'
      maxReplicas: 50
      minReplicas: 10
    # From 17:00 to 23:59, the number of pods ranges from 2 to 50.
    - cron: '* 17-23 ? * *'
      maxReplicas: 50
      minReplicas: 2
Parameter | Required | Description |
metrics | Yes | The metrics used for auto scaling. The RPS, concurrency, CPU, and memory metrics are supported. |
maxReplicas | Yes | The maximum number of pods. |
minReplicas | Yes | The minimum number of pods that must be kept. |
instanceBounds | No | The time period, from startTime to endTime, during which the number of pods is limited by the maximum and minimum numbers of pods defined in bounds. |
bounds | No | The maximum and minimum numbers of pods for each time range, specified by a cron expression, within the time period. |
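To make the interaction between the RPS target and the scheduled bounds concrete, the following Python sketch computes a pod count for a given time. It is a sketch under stated assumptions, not AHPA's algorithm: it replaces AHPA's predictive model, which this topic does not describe, with a simple Kubernetes HPA-style ratio rule, ceil(total RPS / per-pod target). It also assumes that endTime is exclusive, that the top-level minReplicas and maxReplicas apply outside the instanceBounds window, and it ignores time zones. The hour ranges are copied from the Hours field of each cron expression in the ahpa-demo template. The names BOUNDS, active_limits, and recommend are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Bound:
    first_hour: int    # start of the Hours range in the cron expression, inclusive
    last_hour: int     # end of the Hours range, inclusive (the whole last hour is covered)
    min_replicas: int
    max_replicas: int


# Values copied from the ahpa-demo template.
WINDOW_START = datetime(2023, 6, 1)   # instanceBounds startTime
WINDOW_END = datetime(2123, 6, 1)     # instanceBounds endTime (assumed exclusive)
TOP_MIN, TOP_MAX = 0, 50              # top-level minReplicas / maxReplicas
BOUNDS = [
    Bound(0, 6, 0, 50),     # '* 0-6 ? * *'
    Bound(7, 9, 5, 50),     # '* 7-9 ? * *'
    Bound(10, 16, 10, 50),  # '* 10-16 ? * *'
    Bound(17, 23, 2, 50),   # '* 17-23 ? * *'
]


def active_limits(now: datetime) -> tuple[int, int]:
    """Return the (min, max) pod limits in effect at `now`."""
    if WINDOW_START <= now < WINDOW_END:
        for b in BOUNDS:
            if b.first_hour <= now.hour <= b.last_hour:
                return b.min_replicas, b.max_replicas
    # Assumption: outside instanceBounds, only the top-level limits apply.
    return TOP_MIN, TOP_MAX


def recommend(total_rps: float, now: datetime, rps_per_pod: float = 10) -> int:
    """Assumed ratio rule ceil(total / per-pod target), clamped to the active limits.

    This stands in for AHPA's predictive model, which is not reproduced here.
    """
    raw = math.ceil(total_rps / rps_per_pod)
    lo, hi = active_limits(now)
    return max(lo, min(hi, raw))


if __name__ == "__main__":
    print(recommend(12, datetime(2024, 1, 1, 8, 30)))   # 7-9 bound raises 2 to 5
    print(recommend(0, datetime(2024, 1, 1, 3, 0)))     # 0-6 bound allows 0
    print(recommend(800, datetime(2024, 1, 1, 12, 0)))  # 80 capped at 50
```

For example, at 08:30 with 12 RPS in total, the ratio rule alone yields 2 pods, but the 07:00 to 09:59 bound raises the count to its minimum of 5. At 03:00 with no traffic, the 00:00 to 06:59 bound allows the Service to scale to zero.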
Fields used in cron expressions
The following table describes the fields of a cron expression. For more information, see Cron expressions.
Field | Special character | Required | Description |
Minutes | * / , - | Yes | Valid values: 0 to 59. |
Hours | * / , - | Yes | Valid values: 0 to 23. |
Day of month | * / , - ? | Yes | Valid values: 1 to 31. |
Month | * / , - | Yes | Valid values: 1 to 12 or JAN to DEC. The values JAN to DEC are not case-sensitive. |
Day of week | * / , - ? | No | Valid values: 0 to 6 or SUN to SAT. |
Special characters used in cron expressions:
An asterisk (*) indicates any value. For example, * in the Minutes field indicates every minute, and * in the Hours field indicates every hour.
A forward slash (/) specifies a step size. For example, /5 indicates an interval of five time units.
A comma (,) separates the values of a list. For example, 1,3,5 indicates the values 1, 3, and 5.
A hyphen (-) specifies a range of values. For example, 1-5 indicates the values 1 to 5.
A question mark (?) can be used only in the Day of month and Day of week fields. It indicates that no specific value is set for the field.
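To check which values an expression matches before you put it in a template, the numeric forms of these special characters can be expanded with a short Python sketch. It is a hypothetical helper, not part of AHPA: it handles *, /, commas, hyphens, and ?, but not the month names (JAN to DEC) or weekday names (SUN to SAT), and it assumes that ? matches every value of its field because it sets no specific value. The names FIELDS, expand_field, and expand are illustrative.

```python
# (name, lowest value, highest value, whether '?' is allowed) for each field, in order.
FIELDS = [
    ("minutes", 0, 59, False),
    ("hours", 0, 23, False),
    ("day_of_month", 1, 31, True),
    ("month", 1, 12, False),
    ("day_of_week", 0, 6, True),
]


def expand_field(expr: str, lo: int, hi: int, allow_question: bool = False) -> list[int]:
    """Return the sorted values in [lo, hi] matched by one numeric cron field."""
    if expr == "?":
        if not allow_question:
            raise ValueError("'?' is only valid in Day of month and Day of week")
        # Assumption: '?' sets no specific value, so the field does not restrict anything.
        return list(range(lo, hi + 1))
    values: set[int] = set()
    for part in expr.split(","):             # ',' separates list items
        base, sep, step_text = part.partition("/")
        step = int(step_text) if sep else 1  # '/n' sets the step size
        if step <= 0:
            raise ValueError(f"step must be positive: {part!r}")
        if base == "*" or (base == "" and sep):
            start, end = lo, hi              # '*' or a bare '/n' covers the full range
        elif "-" in base:
            first, last = base.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = int(base)
            end = hi if sep else start       # 'a/n' runs from a to the top of the range
        if not lo <= start <= end <= hi:
            raise ValueError(f"{part!r} is outside {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def expand(cron: str) -> dict[str, list[int]]:
    """Expand a 4- or 5-field expression; Day of week may be omitted."""
    parts = cron.split()
    if len(parts) not in (4, 5):
        raise ValueError(f"expected 4 or 5 fields, got {len(parts)}")
    return {
        name: expand_field(part, lo, hi, allow_q)
        for part, (name, lo, hi, allow_q) in zip(parts, FIELDS)
    }


if __name__ == "__main__":
    bound = expand("* 7-9 ? * *")
    print(bound["hours"])          # [7, 8, 9]
    print(len(bound["minutes"]))   # 60
```

Applied to '* 7-9 ? * *' from the ahpa-demo template, the sketch reports every minute of hours 7, 8, and 9, which is why that bound covers 07:00 to 09:59.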
Step 2: Create a Knative Service and enable AHPA for the Service
Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its ID. In the left-side navigation pane of the cluster details page, navigate to the Knative page.
On the Services tab of the Knative page, set Namespace to default and click Create from Template. Copy the following YAML content to the editor and click Create to create a Service named helloworld-go-demo.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go-demo
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: ahpa.autoscaling.knative.dev # Specify the AHPA plug-in.
        autoscaling.knative.dev.alibabacloud/ahpa-template: "ahpa-demo" # If you modify the AHPA template parameter, the corresponding revision is also updated.
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
        env:
        - name: TARGET
          value: "Knative"
After the Service is created, record the gateway address and domain name of the Service, which will be used in Step 3: Access the Service.
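The ahpa-template annotation carries the name of the template created in Step 1 (ahpa-demo), so its value is expected to match that template's metadata.name. If you generate manifests programmatically rather than pasting YAML into the console, a minimal Python sketch like the following keeps the template name in one place. The helper build_service is hypothetical; it builds the same Service as the YAML in Step 2 and prints it as JSON, which kubectl apply -f accepts as well as YAML.

```python
import json

AHPA_CLASS = "ahpa.autoscaling.knative.dev"
TEMPLATE_KEY = "autoscaling.knative.dev.alibabacloud/ahpa-template"


def build_service(name: str, template_name: str, image: str, target: str = "Knative") -> dict:
    """Return a Knative Service manifest equivalent to the YAML in Step 2 (hypothetical helper)."""
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        "autoscaling.knative.dev/class": AHPA_CLASS,  # use the AHPA plug-in
                        TEMPLATE_KEY: template_name,                  # AHPA template to apply
                    }
                },
                "spec": {
                    "containers": [
                        {"image": image, "env": [{"name": "TARGET", "value": target}]}
                    ]
                },
            }
        },
    }


if __name__ == "__main__":
    svc = build_service(
        "helloworld-go-demo",
        "ahpa-demo",
        "registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56",
    )
    print(json.dumps(svc, indent=2))
```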
Step 3: Access the Service
Run the following command to access the Service:
# helloworld-go-demo.default.example.com is the default domain name of the Service.
# alb-i5lagvip6fga******.cn-shenzhen.alb.aliyuncs.com is the gateway address of the Service.
curl -H "Host: helloworld-go-demo.default.example.com" http://alb-i5lagvip6fga******.cn-shenzhen.alb.aliyuncs.com
Expected results:
Hello Knative!
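For scripted checks, the curl command can be mirrored with Python's standard urllib.request module. This is a minimal sketch: the function names build_request and access_service and the script name access.py are hypothetical, and you pass in the gateway address that you recorded in Step 2. As with curl, the connection goes to the gateway address while the Service's domain name is sent in the Host header.

```python
import sys
import urllib.request


def build_request(gateway: str, host: str) -> urllib.request.Request:
    """Build a GET request that targets the gateway but carries the Service domain in Host."""
    # Equivalent to: curl -H "Host: <host>" http://<gateway>
    return urllib.request.Request(f"http://{gateway}/", headers={"Host": host})


def access_service(gateway: str, host: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(gateway, host), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python access.py <gateway-address> helloworld-go-demo.default.example.com
    print(access_service(sys.argv[1], sys.argv[2]))
```

If the Service is reachable, the printed body should match the expected result of the curl command, Hello Knative!.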
Step 4 (Optional): Verify scheduled auto scaling
On the Knative monitoring dashboard, you can view how the number of pods of the Knative Service changes over time. For more information, see View the Knative monitoring dashboard.
When a Knative application is scaled to zero pods, Managed Service for Prometheus cannot collect metrics such as the request concurrency and the number of requests per second per pod. These metrics appear in the console only after you send requests to the application so that its pods are running again.
When the application has one or more pods, you can view these metrics in the console directly, without sending requests to the application.
References
You can also configure auto scaling based on the number of concurrent requests per pod or on RPS. For more information, see Enable auto scaling to withstand traffic fluctuations.