
Container Service for Kubernetes: Deploy AHPA

Last Updated: Dec 25, 2025

Container Service for Kubernetes supports Advanced Horizontal Pod Autoscaler (AHPA). AHPA learns from and analyzes historical data to predict future resource needs and dynamically adjusts the number of pod replicas. This ensures that resources are scaled out and prefetched before demand peaks, which improves system response speed and stability. When a low-traffic period is predicted, AHPA also scales in resources at the appropriate time to save costs.
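As a rough intuition for the prediction step, the following is a simplified Python sketch, not AHPA's actual algorithm: take a high quantile of historical usage as the predicted peak, then size replicas so that the predicted load stays under a target utilization. The sample values, per-pod request, and utilization target below are illustrative assumptions.

```python
import math

# Hypothetical historical CPU usage samples for a service (total cores in use).
samples = [2.1, 2.4, 2.2, 3.0, 2.8, 4.5, 4.8, 5.3, 4.9, 3.1, 2.5, 2.3]

def nearest_rank_quantile(values, q):
    """Smallest sample such that at least a fraction q of samples are <= it."""
    ordered = sorted(values)
    rank = min(len(ordered), math.ceil(q * len(ordered)))
    return ordered[rank - 1]

# Predict the peak conservatively: the 0.95 quantile of past usage.
predicted_cpu = nearest_rank_quantile(samples, 0.95)

# Size replicas so the predicted load runs at the target utilization
# (assume each pod requests 1 CPU core and a 40% utilization target).
per_pod_request = 1.0
target_utilization = 0.40
replicas = math.ceil(predicted_cpu / (per_pod_request * target_utilization))

print(predicted_cpu, replicas)  # 5.3 14
```

A higher quantile yields a higher predicted peak and therefore more replicas held ready before traffic arrives; the real controller additionally scales in when a trough is predicted.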

Prerequisites

  • An ACK cluster is created.

  • Managed Service for Prometheus is enabled for the cluster. AHPA reads historical metric data from the cluster's Prometheus instance.

Step 1: Install the AHPA Controller

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left navigation pane, click Add-ons.

  3. On the Add-ons page, find the AHPA Controller component, click Install on its card, and follow the on-screen prompts to complete the installation.

Step 2: Configure the Prometheus data source

  1. Log on to the ARMS console.

  2. In the left navigation pane, choose Managed Service for Prometheus > Instances.

  3. At the top of the Instances page, select the region of the Prometheus instance, and then click the name of the target instance. The instance name is the same as the ACK cluster name.

  4. On the Settings page, in the HTTP API URL (Grafana Read URL) section, record the values for the following configuration items.

    • (Optional) If a token is enabled, record the access token.

    • Record the Internal Network endpoint (Prometheus URL).

  5. Set the Prometheus query endpoint in the ACK cluster.

    1. Create a file named application-intelligence.yaml with the following content.

      • prometheusUrl: The endpoint of Managed Service for Prometheus.

      • token: The access token for Prometheus.

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: application-intelligence
        namespace: kube-system
      data:
        prometheusUrl: "http://cn-hangzhou-intranet.arms.aliyuncs.com:9443/api/v1/prometheus/da9d7dece901db4c9fc7f5b9c40****/158120454317****/cc6df477a982145d986e3f79c985a****/cn-hangzhou"
        token: "eyJhxxxxx"
      Note

      To view the AHPA dashboard in Managed Service for Prometheus, you also need to configure the following fields in this ConfigMap:

      • prometheus_writer_url: The private endpoint for Remote Write.

      • prometheus_writer_ak: The AccessKey ID of your Alibaba Cloud account.

      • prometheus_writer_sk: The AccessKey secret of your Alibaba Cloud account.

      For more information, see Enable the Prometheus dashboard for AHPA.

    2. Run the following command to deploy the `application-intelligence` configuration.

      kubectl apply -f application-intelligence.yaml

Step 3: Deploy a test service

The test service consists of fib-deployment, fib-svc, and fib-loader. The fib-loader component sends requests to fib-svc to simulate traffic peaks and troughs. A Horizontal Pod Autoscaler (HPA) resource is also deployed so that you can compare its scaling behavior with that of AHPA.

Create a file named demo.yaml with the following content.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fib-deployment
  namespace: default
  annotations:
    k8s.aliyun.com/eci-use-specs: "1-2Gi"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fib-deployment
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: fib-deployment
    spec:
      containers:
      - image: registry.cn-huhehaote.aliyuncs.com/kubeway/knative-sample-fib-server:20200820-171837
        imagePullPolicy: IfNotPresent
        name: user-container
        ports:
        - containerPort: 8080
          name: user-port
          protocol: TCP
        resources:
          limits:
            cpu: "1"
            memory: 2000Mi
          requests:
            cpu: "1"
            memory: 2000Mi
---
apiVersion: v1
kind: Service
metadata:
  name: fib-svc
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: fib-deployment
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fib-loader
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: fib-loader
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: fib-loader
    spec:
      containers:
      - args:
        - -c
        - |
          /ko-app/fib-loader --service-url="http://fib-svc.${NAMESPACE}?size=35&interval=0" --save-path=/tmp/fib-loader-chart.html
        command:
        - sh
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: registry.cn-huhehaote.aliyuncs.com/kubeway/knative-sample-fib-loader:20201126-110434
        imagePullPolicy: IfNotPresent
        name: loader
        ports:
        - containerPort: 8090
          name: chart
          protocol: TCP
        resources:
          limits:
            cpu: "8"
            memory: 16000Mi
          requests:
            cpu: "2"
            memory: 4000Mi
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: fib-hpa
  namespace: default
spec:
  maxReplicas: 50
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fib-deployment
  targetCPUUtilizationPercentage: 50
---

Run the following command to deploy the test service.

kubectl apply -f demo.yaml

Step 4: Deploy AHPA

Configure a scaling policy by submitting an `AdvancedHorizontalPodAutoscaler` resource.

  1. Create a file named ahpa-demo.yaml with the following content.

    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: AdvancedHorizontalPodAutoscaler
    metadata:
      name: ahpa-demo
    spec:
      scaleStrategy: observer
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 40
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: fib-deployment 
      maxReplicas: 100
      minReplicas: 2
      stabilizationWindowSeconds: 300
      prediction:
        quantile: 0.95
        scaleUpForward: 180
      instanceBounds:
      - startTime: "2021-12-16 00:00:00"
        endTime: "2031-12-16 00:00:00"
        bounds:
        - cron: "* 0-8 ? * MON-FRI"
          maxReplicas: 15
          minReplicas: 4
        - cron: "* 9-15 ? * MON-FRI"
          maxReplicas: 15
          minReplicas: 10
        - cron: "* 16-23 ? * MON-FRI"
          maxReplicas: 20
          minReplicas: 15

    The following describes some of the parameters:

    • scaleTargetRef (required): Specifies the target Deployment to scale.

    • metrics (required): Configures the scaling metrics. Supported metrics include CPU, GPU, Memory, QPS, and RT.

    • target (required): The target threshold. For example, averageUtilization: 40 indicates that the target CPU utilization is 40%.

    • scaleStrategy (optional): Sets the scaling mode. The default value is observer.

      • auto: AHPA performs scaling operations automatically.

      • observer: AHPA only observes and does not perform scaling actions. You can use this mode to check whether AHPA works as expected.

      • proactive: Only proactive (predictive) scaling takes effect.

      • reactive: Only reactive scaling takes effect.

    • maxReplicas (required): The maximum number of replicas for scale-out.

    • minReplicas (required): The minimum number of replicas for scale-in.

    • stabilizationWindowSeconds (optional): The cooldown period for scale-in. The default value is 300 seconds.

    • prediction.quantile (required): The prediction quantile: the probability that the actual metric value falls below the predicted value. A higher quantile produces a more conservative (higher) prediction. The value must be between 0 and 1 and supports two decimal places. The default value is 0.99. A value from 0.90 to 0.99 is recommended.

    • prediction.scaleUpForward (required): The time, in seconds, that a pod needs to become Ready (the cold start time).

    • instanceBounds (optional): The upper and lower bounds for the number of instances during a scaling period.

      • startTime: The start time of the period.

      • endTime: The end time of the period.

    • instanceBounds.bounds.cron (optional): Configures a scheduled task. The cron expression represents a set of times, specified with five space-separated fields. For example, - cron: "* 0-8 ? * MON-FRI" indicates a window from 00:00 to 08:59 every Monday through Friday.

    The following table describes the fields in a cron expression. For more information, see Scheduled tasks.

    • Minutes (required). Allowed values: 0 to 59. Allowed special characters: * / , -

    • Hours (required). Allowed values: 0 to 23. Allowed special characters: * / , -

    • Day of Month (required). Allowed values: 1 to 31. Allowed special characters: * / , - ?

    • Month (required). Allowed values: 1 to 12 or JAN to DEC. Allowed special characters: * / , -

    • Day of Week (optional). Allowed values: 0 to 6 or SUN to SAT. Allowed special characters: * / , - ?

    Note
    • The values for the Month and Day of Week fields are not case-sensitive. For example, SUN, Sun, and sun have the same effect.

    • If the Day of Week field is not configured, the default value is *.

    • Special characters:

      • *: Indicates all possible values.

      • /: Specifies an increment for a numeric value.

      • ,: Lists enumerated values.

      • -: Indicates a range.

      • ?: Indicates that no specific value is specified.

  2. Run the following command to create the AHPA scaling policy.

    kubectl apply -f ahpa-demo.yaml
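To make the scheduled bounds concrete, the following minimal Python sketch (illustrative only, not how AHPA parses cron expressions) checks whether a timestamp falls in the window described by "* 0-8 ? * MON-FRI":

```python
from datetime import datetime

# Cron numbers days of week as SUN=0 ... SAT=6.
CRON_DOW = {"SUN": 0, "MON": 1, "TUE": 2, "WED": 3, "THU": 4, "FRI": 5, "SAT": 6}

def in_window(ts, hours=(0, 8), days=("MON", "FRI")):
    """Return True if ts falls in the window meant by "* 0-8 ? * MON-FRI":
    any minute, hours 00-08, any day of month, any month, Monday-Friday."""
    lo, hi = CRON_DOW[days[0]], CRON_DOW[days[1]]
    # Python's weekday() numbers Monday=0 ... Sunday=6; convert to cron numbering.
    cron_dow = (ts.weekday() + 1) % 7
    return hours[0] <= ts.hour <= hours[1] and lo <= cron_dow <= hi

print(in_window(datetime(2024, 1, 1, 7, 30)))  # Monday 07:30 -> True
print(in_window(datetime(2024, 1, 6, 7, 30)))  # Saturday 07:30 -> False
print(in_window(datetime(2024, 1, 1, 9, 0)))   # Monday 09:00 -> False
```

During each matched window, AHPA clamps the predicted replica count to the minReplicas and maxReplicas configured for that bound.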

Step 5: View the prediction results

To view the AHPA elastic prediction results, enable the Prometheus dashboard for AHPA.

Note

Because prediction requires seven days of historical data, you can view the prediction results only after the sample deployment has run for at least seven days. If you have an existing online application, you can directly specify it in the AHPA configuration.

This topic uses the observer mode as an example. In observer mode, you can compare AHPA's predicted scaling behavior with the actual scaling performed by the HPA policy. The comparison shows how many resources the application actually requires and helps you verify whether the AHPA predictions meet expectations.


  • Actual and predicted CPU usage: The green curve represents the actual CPU usage with HPA. The yellow curve represents the CPU usage predicted by AHPA.

    • The yellow curve is above the green curve, which indicates that the predicted CPU capacity is sufficient.

    • The yellow curve leads the green curve, which indicates that the required resources are prepared in advance.

  • Pod trend: The green curve represents the actual number of pods scaled by HPA. The yellow curve represents the number of pods predicted by AHPA.

    • The yellow curve is below the green curve, which indicates that AHPA predicts that fewer pods are required.

    • The yellow curve is smoother than the green curve, which indicates that scaling with AHPA causes fewer fluctuations and improves service stability.

In this example, the prediction trend meets expectations. After observing AHPA for a period of time, if the results still meet your expectations, you can set the scaling mode to auto to let AHPA perform scaling automatically.
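Switching modes only requires changing the scaleStrategy field in the policy from Step 4 and re-applying the manifest with kubectl apply. A minimal fragment (all other fields stay as configured in Step 4):

```yaml
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: AdvancedHorizontalPodAutoscaler
metadata:
  name: ahpa-demo
spec:
  # Changed from observer: AHPA now performs scale-out and scale-in itself.
  scaleStrategy: auto
  # Keep the metrics, scaleTargetRef, replica limits, prediction, and
  # instanceBounds settings from Step 4 unchanged.
```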
