Container Service for Kubernetes (ACK) provides the Advanced Horizontal Pod Autoscaler (AHPA) component, which supports predictive scaling. Predictive scaling can prefetch resources for the scaling activities of applications that have periodic traffic patterns, which allows your applications to scale out quickly. This topic describes how to deploy and use AHPA in your cluster.
Prerequisites
- An ACK managed cluster or a serverless Kubernetes (ASK) cluster is created. For more information, see Create an ACK managed cluster or Create an ASK cluster.
- Application Real-Time Monitoring Service (ARMS) Prometheus is enabled, and Prometheus has collected at least the last seven days of statistics for your application, including details about the CPU and memory resources that the application uses (see the sample query after this list). For more information about how to enable ARMS Prometheus, see Enable ARMS Prometheus.
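To confirm that this history exists, you can run a range query against the Prometheus instance. The following is a minimal sketch: the endpoint placeholder must be replaced with the HTTP API address of your instance, the namespace label assumes that the application runs in the default namespace, and the query uses the standard cAdvisor CPU metric.
# Query the last seven days of CPU usage for the default namespace.
END=$(date +%s); START=$((END - 7*24*3600))
curl -G "<PROMETHEUS_ENDPOINT>/api/v1/query_range" \
  --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total{namespace="default"}[5m]))' \
  --data-urlencode "start=${START}" --data-urlencode "end=${END}" --data-urlencode "step=3600"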
Step 1: Install Application Intelligence Controller
- Log on to the ACK console and click Clusters in the left-side navigation pane.
- On the Clusters page, click the name of the cluster that you want to manage. In the left-side navigation pane, choose Operations > Add-ons.
- On the Add-ons page, click the Others tab. Find Application Intelligence Controller and click Install.
- In the Install Application Intelligence Controller message, click OK.
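After the installation is complete, you can check that the controller workload is running. The namespace and the name filter in the following command are assumptions; adjust them if the component is installed under a different name in your cluster.
# List controller workloads in kube-system and filter for the Application Intelligence Controller (name pattern is an assumption).
kubectl get deployments -n kube-system | grep -i intelligence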
Step 2: Add Prometheus Service as a data source
- Log on to the ARMS console.
- In the left-side navigation pane, click Prometheus Monitoring.
- In the upper-left corner of the Prometheus Monitoring page, select the region where the Prometheus instance resides. Find the Prometheus instance and click Settings in the Actions column.
- On the Settings page, click the Settings tab and copy the internal endpoint of the Prometheus instance in the HTTP API Address section.
- Specify the internal endpoint of the Prometheus instance in the cluster configurations so that AHPA can query the application metrics, as shown in the sketch below.
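The exact configuration mechanism depends on the component version. The following is a minimal sketch that assumes the controller reads the endpoint from a ConfigMap named application-intelligence in the kube-system namespace with a prometheusUrl key; if your installation documents a different configuration, use that instead.
apiVersion: v1
kind: ConfigMap
metadata:
  # The ConfigMap name and namespace below are assumptions; follow the configuration documented for your component version.
  name: application-intelligence
  namespace: kube-system
data:
  # Paste the internal endpoint that you copied from the HTTP API Address section.
  prometheusUrl: "<internal endpoint of the Prometheus instance>"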
Step 3: Deploy a test service
Deploy a test service that consists of a Deployment named fib-deployment and a Service named fib-svc, together with an application named fib-loader that sends requests to the test service to simulate traffic fluctuations. Then, deploy a Horizontal Pod Autoscaler (HPA) to scale the test service. This way, you can compare the HPA scaling results with the AHPA prediction results.
apiVersion: apps/v1
kind: Deployment
metadata:
name: fib-deployment
namespace: default
annotations:
k8s.aliyun.com/eci-use-specs: "1-2Gi"
spec:
replicas: 1
selector:
matchLabels:
app: fib-deployment
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: fib-deployment
spec:
containers:
- image: registry.cn-huhehaote.aliyuncs.com/kubeway/knative-sample-fib-server:20200820-171837
imagePullPolicy: IfNotPresent
name: user-container
ports:
- containerPort: 8080
name: user-port
protocol: TCP
resources:
limits:
cpu: "1"
memory: 2000Mi
requests:
cpu: "1"
memory: 2000Mi
---
apiVersion: v1
kind: Service
metadata:
name: fib-svc
namespace: default
spec:
ports:
- name: http
port: 80
protocol: TCP
targetPort: 8080
selector:
app: fib-deployment
sessionAffinity: None
type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: fib-loader
namespace: default
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: fib-loader
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: fib-loader
spec:
containers:
- args:
- -c
- |
/ko-app/fib-loader --service-url="http://fib-svc.${NAMESPACE}?size=35&interval=0" --save-path=/tmp/fib-loader-chart.html
command:
- sh
env:
- name: NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
image: registry.cn-huhehaote.aliyuncs.com/kubeway/knative-sample-fib-loader:20201126-110434
imagePullPolicy: IfNotPresent
name: loader
ports:
- containerPort: 8090
name: chart
protocol: TCP
resources:
limits:
cpu: "8"
memory: 16000Mi
requests:
cpu: "2"
memory: 4000Mi
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: fib-hpa
namespace: default
spec:
maxReplicas: 50
minReplicas: 1
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: fib-deployment
targetCPUUtilizationPercentage: 50
---
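The preceding manifests can be saved to a single file and applied with kubectl before you continue. The file name fib-demo.yaml is only an example.
# Apply the test Deployment, Service, load generator, and HPA (the file name is an example).
kubectl apply -f fib-demo.yaml
# Confirm that the test workloads and the HPA are running.
kubectl get pods -n default -l app=fib-deployment
kubectl get hpa fib-hpa -n default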
Step 4: Deploy AHPA
To deploy AHPA and configure the AHPA policy, perform the following steps:
- Create a file named ahpa-demo.yaml and copy the following content to the file:
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: AdvancedHorizontalPodAutoscaler
metadata:
  name: ahpa-demo
spec:
  scaleStrategy: observer
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 40
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fib-deployment
  maxReplicas: 100
  minReplicas: 2
  stabilizationWindowSeconds: 300
  prediction:
    quantile: 95
    scaleUpForward: 180
  instanceBounds:
  - startTime: "2021-12-16 00:00:00"
    endTime: "2031-12-16 00:00:00"
    bounds:
    - cron: "* 0-8 ? * MON-FRI"
      maxReplicas: 15
      minReplicas: 4
    - cron: "* 9-15 ? * MON-FRI"
      maxReplicas: 15
      minReplicas: 10
    - cron: "* 16-23 ? * MON-FRI"
      maxReplicas: 20
      minReplicas: 15
The following table describes some of the parameters that are specified in the preceding code block.
| Parameter | Required | Description |
| --- | --- | --- |
| scaleTargetRef | Yes | The Deployment for which you want to configure predictive scaling. |
| metrics | Yes | The metrics based on which the AHPA policy is implemented. The following metrics are supported: CPU, GPU, memory, queries per second (QPS), and response time (RT). |
| metrics.resource.averageUtilization | Yes | The scaling threshold. If you specify averageUtilization: 40, the scaling threshold of CPU utilization is 40%. |
| scaleStrategy | No | The scaling mode of AHPA. Valid values: auto and observer. Default value: observer. auto: AHPA automatically performs scaling activities. observer: AHPA observes the resource usage but does not perform scaling activities. You can use the observer mode to check whether AHPA works as expected. |
| maxReplicas | Yes | The maximum number of replicated pods that can be provisioned for the application. |
| minReplicas | Yes | The minimum number of replicated pods that must run for the application. |
| stabilizationWindowSeconds | No | The cooldown period of scale-in activities. Default value: 300. |
| prediction.quantile | Yes | A quantile that indicates the probability that the actual metric value does not reach the scaling threshold. A larger value indicates a higher probability. Valid values: 0 to 100. Default value: 99. We recommend that you set this parameter to a value from 90 to 99. |
| prediction.scaleUpForward | Yes | The duration of a cold start, which is the time period from the point in time when a pod is created to the point in time when the pod enters the Ready state. |
| instanceBounds | No | The duration of a scaling activity. startTime: the start time of the scaling activity. endTime: the end time of the scaling activity. |
| instanceBounds.bounds.cron | No | Creates a scheduled scaling job. For example, the cron expression "* 0-8 ? * MON-FRI" specifies the time period from 00:00:00 to 08:00:00 on Monday to Friday of each month. |

The following table describes the fields that are contained in a cron expression. For more information, see Cron expressions.

| Field | Required | Valid values | Valid special characters |
| --- | --- | --- | --- |
| Minutes | Yes | 0 to 59 | * / , - |
| Hours | Yes | 0 to 23 | * / , - |
| Day of Month | Yes | 1 to 31 | * / , - ? |
| Month | Yes | 1 to 12 or JAN to DEC | * / , - |
| Day of Week | No | 0 to 6 or SUN to SAT | * / , - ? |

Note:
- The Month and Day of Week fields are not case-sensitive. For example, you can specify SUN, Sun, or sun.
- The default value of the Day of Week field is *.
- The following list describes the special characters:
  - *: specifies an arbitrary value.
  - /: specifies an increment.
  - ,: separates a list of values.
  - -: specifies a range.
  - ?: specifies a placeholder.
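As an illustration of how these fields combine, the following hypothetical bounds entry (not part of the sample policy above) keeps between 8 and 30 replicas during the 18:00 to 23:00 hours on Saturdays and Sundays:
# Fields in order: minutes, hours, day of month, month, day of week.
- cron: "* 18-23 ? * SAT,SUN"
  maxReplicas: 30
  minReplicas: 8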
- Run the following command to apply the AHPA policy:
kubectl apply -f ahpa-demo.yaml
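After you apply the policy, you can confirm that the resource was created and inspect its status. The resource kind and API group are taken from the manifest in Step 4; the plural form used in the following commands is an assumption based on the kind name.
# List the AHPA resource by its fully qualified name (plural form is an assumption).
kubectl get advancedhorizontalpodautoscalers.autoscaling.alibabacloud.com -n default
# Inspect the status of the sample policy.
kubectl describe advancedhorizontalpodautoscalers.autoscaling.alibabacloud.com ahpa-demo -n default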
Step 5: View the prediction results
In this section, an AHPA policy that uses the observer scaling mode is used as an example to check whether AHPA works as expected.
- Run the following command to obtain the observer.html file. The file contains the AHPA prediction results that are compared with the HPA scaling results.
kubectl get --raw '/apis/metrics.alibabacloud.com/v1beta1/namespaces/default/predictionsobserver/fib-deployment'|jq -r '.content' |base64 -d > observer.html
- Open the observer.html file and check the details. The following figures show the AHPA prediction results that are compared with the HPA scaling results based on CPU usage.
- Predict CPU Observer: The actual CPU usage based on HPA is represented by a blue line. The CPU usage predicted by AHPA is represented by a green line. The predicted CPU usage is higher than the actual CPU usage.
- Predict POD Observer: The actual number of pods that are provisioned by HPA is represented by a blue line. The number of pods predicted by AHPA is represented by a green line. The predicted number of pods is smaller than the actual number of pods. You can set the scaling mode to auto and configure other settings based on the predicted number of pods. This way, AHPA can save pod resources.
The results show that AHPA can use predictive scaling to handle fluctuating workloads as expected. After you confirm the prediction results, you can set the scaling mode to auto, which allows AHPA to automatically scale pods.
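To switch from observation to automatic scaling, change scaleStrategy from observer to auto in the ahpa-demo.yaml file and re-apply the file, or patch the resource in place. The following patch command is a sketch; the fully qualified resource name is an assumption based on the kind and group in the manifest.
# Switch the AHPA policy to auto mode (resource plural form is an assumption).
kubectl patch advancedhorizontalpodautoscalers.autoscaling.alibabacloud.com ahpa-demo -n default \
  --type merge -p '{"spec":{"scaleStrategy":"auto"}}'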