Container Service for Kubernetes (ACK) clusters provide the Advanced Horizontal Pod Autoscaler (AHPA) component that supports predictive scaling. This topic describes how to use AHPA to configure predictive scaling.

Prerequisites

  • AHPA is in invitational preview. To use AHPA, submit a ticket to apply to be added to the whitelist.
  • An ACK managed cluster is created. For more information, see Create an ACK managed cluster.
  • Application Real-Time Monitoring Service (ARMS) Prometheus is enabled, and at least seven days of application statistics are collected by ARMS Prometheus. The statistics include details about the CPU and memory resources that are used by an application. For more information about how to enable ARMS Prometheus, see Enable ARMS Prometheus.

Background information

The pods of an application are traditionally managed in one of the following ways: manually specifying the number of pods, using Horizontal Pod Autoscaler (HPA), or using CronHPA. The following table describes the disadvantages of each method.

Method Disadvantage
Manually specify the number of pods Resources are wasted during off-peak hours. Idle resources are still billed.
Use HPA
  • Scaling activities are performed after a delay. Scale-out activities are triggered only after the resource usage exceeds the threshold, and scale-in activities are triggered only after the resource usage drops below the threshold.
  • New pods require a period of time to enter the Ready state. This increases the response time and may even cause request timeouts.
Use CronHPA
  • The precision of scheduled pod scaling is low. If a schedule is not properly specified, resources may be wasted.
  • You must modify the scaling policy to adapt to fluctuating workloads.
ACK clusters provide the AHPA component that supports predictive scaling. You can use AHPA to increase resource utilization and improve the efficiency of resource management. AHPA can analyze historical data and predict the number of pods that are required per minute within the next 24 hours. If you use CronHPA, you must manually create 1,440 (24 hours × 60 minutes) schedules instead. The following figure shows the difference between traditional horizontal pod scaling and predictive horizontal pod scaling.
  • Traditional horizontal pod scaling: Scale-out activities are triggered after the amount of workloads increases. The system cannot provision pods at the earliest opportunity to handle fluctuating workloads due to the scaling delay.
  • Predictive horizontal pod scaling: AHPA learns the pattern of workload fluctuations based on the historical values of specific metrics and the amount of time that a pod spent before the pod entered the Ready state. This way, AHPA can provision pods that are ready to be scheduled before a traffic spike occurs. This ensures that resources are allocated at the earliest opportunity.
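The scaling delay of traditional horizontal pod scaling follows from its reactive formula: a new replica count is computed only after utilization has already moved. A minimal Python sketch of the standard HPA calculation (function and variable names are illustrative, not part of any API):

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float) -> int:
    """Standard HPA formula: scale so that average utilization
    approaches the target. It reacts only after load has changed."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# Load spikes from 40% to 80% with 4 pods and a 40% target:
# HPA doubles the replica count, but only after the spike is observed.
print(desired_replicas(4, 80.0, 40.0))  # -> 8
```

Predictive scaling aims to produce this new replica count before the spike, based on the learned workload pattern, rather than after it.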

Step 1: Install Application Intelligence Controller

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, click Clusters.
  3. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
  4. In the left-side navigation pane of the details page, choose Operations > Add-ons.
  5. On the Add-ons page, click the Others tab, find Application Intelligence Controller, and then click Install.
  6. In the Install Application Intelligence Controller message, click OK.

Step 2: Add ARMS Prometheus as a data source

  1. Log on to the ARMS console.
  2. In the left-side navigation pane, choose Prometheus Monitoring > Prometheus Instances.
  3. In the upper-left corner of the Prometheus Monitoring page, select the region in which your Prometheus instances are deployed. Then, click the name of a Prometheus instance whose Instance Type is Prometheus for Container Service. The details page of the Prometheus instance appears.
  4. In the left-side navigation pane of the instance details page, click Settings and copy the public endpoint in the HTTP API Address section.
  5. In the HTTP API Address section, click Generate Token to generate a token. The token is used to pass the authentication when you access the Prometheus instance.
  6. Create a ConfigMap named application-intelligence.yaml based on the following content and specify the public endpoint of the Prometheus instance:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: application-intelligence
      namespace: kube-system
    data:
      armsUrl: "****/158120454317****/cc6df477a982145d986e3f79c985a****/cn-hangzhou"
      token: "****"
    Descriptions of parameters:
    • armsUrl: Specify the public endpoint of the Prometheus instance that you copied in Step 4.
    • token: Specify the token that was generated in Step 5.
  7. Run the following command to deploy the ConfigMap:
    kubectl apply -f application-intelligence.yaml

Step 3: Configure AHPA

  1. Create an AHPA policy with the following content:
    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: AdvancedHorizontalPodAutoscaler
    metadata:
      name: ahpa-demo
    spec:
      scaleStrategy: observer
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 40
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: php-apache
      maxReplicas: 100
      minReplicas: 2
      prediction:
        quantile: 95
        scaleUpForward: 180
      instanceBounds:
      - startTime: "2021-12-16 00:00:00"
        endTime: "2022-12-16 24:00:00"
        bounds:
        - cron: "* 0-8 ? * MON-FRI"
          maxReplicas: 15
          minReplicas: 4
        - cron: "* 8-15 ? * MON-FRI"
          maxReplicas: 15
          minReplicas: 10
        - cron: "* 15-24 ? * MON-FRI"
          maxReplicas: 20
          minReplicas: 15

    The following table describes some parameters that are specified in the preceding code block.

    Parameter Description
    scaleTargetRef Required. The Deployment for which you want to configure predictive scaling.
    metrics Required. The metrics based on which predictive scaling is performed. You can specify CPU and memory metrics.
    averageUtilization Required. The threshold that is used to trigger scaling activities. For example, a value of 40 indicates that the CPU or memory usage threshold is set to 40%.
    scaleStrategy Optional. The scaling mode. Valid values: auto and observer. Default value: observer.
    • auto: AHPA automatically performs scaling activities.
    • observer: AHPA observes the resource usage but does not perform scaling activities. You can use the observer mode to check whether AHPA works as expected.
    maxReplicas Required. The maximum number of pods that can be provisioned for the application.
    minReplicas Required. The minimum number of pods that must be provisioned for the application.
    instanceBounds Optional. The duration of a scaling activity.
    • startTime: the start time of a scaling activity.
    • endTime: the end time of a scaling activity.
    cron Optional. This parameter is used to create a scheduled scaling job. The cron expression contains five fields that are separated by space characters. These fields specify a schedule. For example, - cron: "* 0-8 ? * MON-FRI" specifies every minute from 00:00 to 08:59 on Monday to Friday, every month.
    The following table describes the fields that are contained in a cron expression. For more information, see Cron expression.
    Field Required Valid values Valid special characters
    Minutes Yes 0-59 * / , -
    Hours Yes 0-23 * / , -
    Day of Month Yes 1-31 * / , - ?
    Month Yes 1-12 or JAN-DEC * / , -
    Day of Week No 0-6 or SUN-SAT * / , - ?
    • The Month and Day of Week fields are not case-sensitive. For example, you can specify SUN, Sun, or sun.
    • The default value of the Day of Week field is *.
    • Descriptions of special characters:
      • *: specifies an arbitrary value.
      • /: specifies an increment.
      • ,: separates a list of values.
      • -: specifies a range.
      • ?: specifies a placeholder.
  2. Run the following command to apply the AHPA policy:
    kubectl apply -f ahpa-demo.yaml
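The instanceBounds entries in the policy act as per-window clamps on the predicted replica count: at any moment, the matching cron window's minReplicas and maxReplicas bound what AHPA may provision. A minimal Python sketch of that clamping logic (illustrative only; the AHPA controller implements this internally; window values follow the sample policy, with the smaller number of each pair taken as minReplicas, and half-open hour ranges used to keep the windows disjoint):

```python
from datetime import datetime

# Hour windows from the sample policy, Monday-Friday:
# (start_hour, end_hour, min_replicas, max_replicas)
BOUNDS = [
    (0, 8, 4, 15),
    (8, 15, 10, 15),
    (15, 24, 15, 20),
]

def clamp_replicas(predicted: int, now: datetime) -> int:
    """Clamp a predicted replica count to the window matching `now`."""
    if now.weekday() >= 5:          # Saturday/Sunday: no window applies
        return predicted
    for start, end, lo, hi in BOUNDS:
        if start <= now.hour < end:
            return max(lo, min(hi, predicted))
    return predicted

# Wednesday 09:30: a prediction of 30 pods is capped at the window's max of 15.
print(clamp_replicas(30, datetime(2021, 12, 15, 9, 30)))  # -> 15
```

Conversely, a low prediction during the evening window is raised to that window's minReplicas of 15, so the schedule also guarantees a capacity floor.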

Verify the AHPA policy

In this section, an AHPA policy that uses the observer scaling mode is used as an example to check whether AHPA works as expected.

  1. Run the following command to obtain the observer.html file. The file contains the AHPA prediction results that are compared with the HPA scaling results.
    kubectl get --raw '/apis/' | jq -r '.content' | base64 -d > observer.html
  2. Open the observer.html file and check the details. The following figures show the AHPA prediction results that are compared with the HPA scaling results based on the CPU usage.
    • Predict CPU Observer: The actual CPU usage based on HPA is represented by a blue line. The CPU usage predicted by AHPA is represented by a green line. The predicted CPU usage is higher than the actual CPU usage.
    • Predict POD Observer: The actual number of pods provisioned by HPA is represented by a blue line. The number of pods predicted by AHPA is represented by a green line. The predicted number of pods is less than the actual number. If you set the scaling mode to auto and configure the other settings based on the predicted number of pods, AHPA can save pod resources.

    The results show that AHPA can use predictive scaling to handle fluctuating workloads as expected. After you confirm the prediction results, you can set the scaling mode to auto, which allows AHPA to automatically scale pods.
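The observer comparison can also be reasoned about numerically: summing, minute by minute, how far the actual HPA replica count exceeds the AHPA prediction gives a rough estimate of the pod-minutes that auto mode could save. A hypothetical sketch (the sample data is invented for illustration and does not come from a real observer report):

```python
def pod_minutes_saved(actual: list[int], predicted: list[int]) -> int:
    """Pod-minutes HPA used beyond what AHPA predicted, sampled per minute."""
    return sum(max(a - p, 0) for a, p in zip(actual, predicted))

# Invented per-minute samples: HPA reacts late and over-provisions.
hpa_pods  = [10, 10, 12, 15, 15, 12]
ahpa_pred = [10, 11, 11, 12, 12, 11]
print(pod_minutes_saved(hpa_pods, ahpa_pred))  # -> 8
```

A consistently positive difference, as in the observer figures, indicates that switching to auto mode would reduce the number of provisioned pods without falling below the predicted demand.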