
Container Service for Kubernetes: Deploy AHPA

Last Updated: Dec 25, 2025

Container Service for Kubernetes supports Advanced Horizontal Pod Autoscaler (AHPA). AHPA learns from and analyzes historical data to predict future resource needs and dynamically adjusts the number of pod replicas. This ensures that resources are scaled out and prefetched before demand peaks, which improves system response speed and stability. When a low-traffic period is predicted, AHPA also scales in resources at the appropriate time to save costs.
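As a rough intuition for the prediction step, the following is a simplified Python sketch, not AHPA's actual algorithm: take a high quantile of historical usage as the predicted peak, then size replicas so that the predicted load stays under a target utilization. The sample values, per-pod request, and utilization target below are illustrative assumptions.

```python
import math

# Hypothetical historical CPU usage samples for a service (total cores in use).
samples = [2.1, 2.4, 2.2, 3.0, 2.8, 4.5, 4.8, 5.3, 4.9, 3.1, 2.5, 2.3]

def nearest_rank_quantile(values, q):
    """Smallest sample such that at least a fraction q of samples are <= it."""
    ordered = sorted(values)
    rank = min(len(ordered), math.ceil(q * len(ordered)))
    return ordered[rank - 1]

# Predict the peak conservatively: the 0.95 quantile of past usage.
predicted_cpu = nearest_rank_quantile(samples, 0.95)

# Size replicas so the predicted load runs at the target utilization
# (assume each pod requests 1 CPU core and a 40% utilization target).
per_pod_request = 1.0
target_utilization = 0.40
replicas = math.ceil(predicted_cpu / (per_pod_request * target_utilization))

print(predicted_cpu, replicas)  # 5.3 14
```

A higher quantile yields a higher predicted peak and therefore more replicas held ready before traffic arrives; the real controller additionally scales in when a trough is predicted.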

Prerequisites

  • An ACK cluster is created.

  • Managed Service for Prometheus is enabled for the cluster. AHPA reads historical metric data from the cluster's Prometheus instance.

Step 1: Install the AHPA Controller

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left navigation pane, click Add-ons.

  3. On the Add-ons page, find the AHPA Controller component, click Install on its card, and follow the on-screen prompts to complete the installation.

Step 2: Configure the Prometheus data source

  1. Log on to the ARMS console.

  2. In the left navigation pane, choose Managed Service for Prometheus > Instances.

  3. At the top of the Instances page, select the region of the Prometheus instance, and then click the name of the target instance. The instance name is the same as the ACK cluster name.

  4. On the Settings page, in the HTTP API URL (Grafana Read URL) section, record the values for the following configuration items.

    • (Optional) If a token is enabled, record the access token.

    • Record the Internal Network endpoint (Prometheus URL).

  5. Set the Prometheus query endpoint in the ACK cluster.

    1. Create a file named application-intelligence.yaml with the following content.

      • prometheusUrl: The endpoint of Managed Service for Prometheus.

      • token: The access token for Prometheus.

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: application-intelligence
        namespace: kube-system
      data:
        prometheusUrl: "http://cn-hangzhou-intranet.arms.aliyuncs.com:9443/api/v1/prometheus/da9d7dece901db4c9fc7f5b9c40****/158120454317****/cc6df477a982145d986e3f79c985a****/cn-hangzhou"
        token: "eyJhxxxxx"
      Note

      To view the AHPA dashboard in Managed Service for Prometheus, you also need to configure the following fields in this ConfigMap:

      • prometheus_writer_url: The private endpoint for Remote Write.

      • prometheus_writer_ak: The AccessKey ID of your Alibaba Cloud account.

      • prometheus_writer_sk: The AccessKey secret of your Alibaba Cloud account.

      For more information, see Enable the Prometheus dashboard for AHPA.

    2. Run the following command to deploy the `application-intelligence` configuration.

      kubectl apply -f application-intelligence.yaml

Step 3: Deploy a test service

The test service consists of fib-deployment, fib-svc, and fib-loader. The fib-loader component sends requests to fib-svc to simulate traffic peaks and troughs. A Horizontal Pod Autoscaler (HPA) resource is also deployed so that you can compare its scaling behavior with that of AHPA.

Create a file named demo.yaml with the following content.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fib-deployment
  namespace: default
  annotations:
    k8s.aliyun.com/eci-use-specs: "1-2Gi"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fib-deployment
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: fib-deployment
    spec:
      containers:
      - image: registry.cn-huhehaote.aliyuncs.com/kubeway/knative-sample-fib-server:20200820-171837
        imagePullPolicy: IfNotPresent
        name: user-container
        ports:
        - containerPort: 8080
          name: user-port
          protocol: TCP
        resources:
          limits:
            cpu: "1"
            memory: 2000Mi
          requests:
            cpu: "1"
            memory: 2000Mi
---
apiVersion: v1
kind: Service
metadata:
  name: fib-svc
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: fib-deployment
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fib-loader
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: fib-loader
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: fib-loader
    spec:
      containers:
      - args:
        - -c
        - |
          /ko-app/fib-loader --service-url="http://fib-svc.${NAMESPACE}?size=35&interval=0" --save-path=/tmp/fib-loader-chart.html
        command:
        - sh
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: registry.cn-huhehaote.aliyuncs.com/kubeway/knative-sample-fib-loader:20201126-110434
        imagePullPolicy: IfNotPresent
        name: loader
        ports:
        - containerPort: 8090
          name: chart
          protocol: TCP
        resources:
          limits:
            cpu: "8"
            memory: 16000Mi
          requests:
            cpu: "2"
            memory: 4000Mi
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: fib-hpa
  namespace: default
spec:
  maxReplicas: 50
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fib-deployment
  targetCPUUtilizationPercentage: 50
---

Run the following command to deploy the test service.

kubectl apply -f demo.yaml

Step 4: Deploy AHPA

Configure a scaling policy by submitting an `AdvancedHorizontalPodAutoscaler` resource.

  1. Create a file named ahpa-demo.yaml with the following content.

    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: AdvancedHorizontalPodAutoscaler
    metadata:
      name: ahpa-demo
    spec:
      scaleStrategy: observer
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 40
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: fib-deployment 
      maxReplicas: 100
      minReplicas: 2
      stabilizationWindowSeconds: 300
      prediction:
        quantile: 0.95
        scaleUpForward: 180
      instanceBounds:
      - startTime: "2021-12-16 00:00:00"
        endTime: "2031-12-16 00:00:00"
        bounds:
        - cron: "* 0-8 ? * MON-FRI"
          maxReplicas: 15
          minReplicas: 4
        - cron: "* 9-15 ? * MON-FRI"
          maxReplicas: 15
          minReplicas: 10
        - cron: "* 16-23 ? * MON-FRI"
          maxReplicas: 20
          minReplicas: 15

    The following describes some of the parameters:

    • scaleTargetRef (required): Specifies the target Deployment to scale.

    • metrics (required): Configures the scaling metrics. Supported metrics include CPU, GPU, Memory, QPS, and RT.

    • target (required): The target threshold. For example, averageUtilization: 40 indicates that the target CPU utilization is 40%.

    • scaleStrategy (optional): Sets the scaling mode. The default value is observer.

      • auto: AHPA performs scaling operations automatically.

      • observer: AHPA only observes and does not perform scaling actions. You can use this mode to check whether AHPA works as expected.

      • proactive: Only proactive (predictive) scaling takes effect.

      • reactive: Only reactive scaling takes effect.

    • maxReplicas (required): The maximum number of replicas for scale-out.

    • minReplicas (required): The minimum number of replicas for scale-in.

    • stabilizationWindowSeconds (optional): The cooldown period for scale-in. The default value is 300 seconds.

    • prediction.quantile (required): The prediction quantile: the probability that the actual metric value falls below the predicted value. A higher quantile produces a more conservative (higher) prediction. The value must be between 0 and 1 and supports two decimal places. The default value is 0.99. A value from 0.90 to 0.99 is recommended.

    • prediction.scaleUpForward (required): The time, in seconds, that a pod needs to become Ready (the cold start time).

    • instanceBounds (optional): The upper and lower bounds for the number of instances during a scaling period.

      • startTime: The start time of the period.

      • endTime: The end time of the period.

    • instanceBounds.bounds.cron (optional): Configures a scheduled task. The cron expression represents a set of times, specified with five space-separated fields. For example, - cron: "* 0-8 ? * MON-FRI" indicates a window from 00:00 to 08:59 every Monday through Friday.

    The following table describes the fields in a cron expression. For more information, see Scheduled tasks.

    • Minutes (required). Allowed values: 0 to 59. Allowed special characters: * / , -

    • Hours (required). Allowed values: 0 to 23. Allowed special characters: * / , -

    • Day of Month (required). Allowed values: 1 to 31. Allowed special characters: * / , - ?

    • Month (required). Allowed values: 1 to 12 or JAN to DEC. Allowed special characters: * / , -

    • Day of Week (optional). Allowed values: 0 to 6 or SUN to SAT. Allowed special characters: * / , - ?

    Note
    • The values for the Month and Day of Week fields are not case-sensitive. For example, SUN, Sun, and sun have the same effect.

    • If the Day of Week field is not configured, the default value is *.

    • Special characters:

      • *: Indicates all possible values.

      • /: Specifies an increment for a numeric value.

      • ,: Lists enumerated values.

      • -: Indicates a range.

      • ?: Indicates that no specific value is specified.

  2. Run the following command to create the AHPA scaling policy.

    kubectl apply -f ahpa-demo.yaml
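To make the scheduled bounds concrete, the following minimal Python sketch (illustrative only, not how AHPA parses cron expressions) checks whether a timestamp falls in the window described by "* 0-8 ? * MON-FRI":

```python
from datetime import datetime

# Cron numbers days of week as SUN=0 ... SAT=6.
CRON_DOW = {"SUN": 0, "MON": 1, "TUE": 2, "WED": 3, "THU": 4, "FRI": 5, "SAT": 6}

def in_window(ts, hours=(0, 8), days=("MON", "FRI")):
    """Return True if ts falls in the window meant by "* 0-8 ? * MON-FRI":
    any minute, hours 00-08, any day of month, any month, Monday-Friday."""
    lo, hi = CRON_DOW[days[0]], CRON_DOW[days[1]]
    # Python's weekday() numbers Monday=0 ... Sunday=6; convert to cron numbering.
    cron_dow = (ts.weekday() + 1) % 7
    return hours[0] <= ts.hour <= hours[1] and lo <= cron_dow <= hi

print(in_window(datetime(2024, 1, 1, 7, 30)))  # Monday 07:30 -> True
print(in_window(datetime(2024, 1, 6, 7, 30)))  # Saturday 07:30 -> False
print(in_window(datetime(2024, 1, 1, 9, 0)))   # Monday 09:00 -> False
```

During each matched window, AHPA clamps the predicted replica count to the minReplicas and maxReplicas configured for that bound.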

Step 5: View the prediction results

To view the AHPA elastic prediction results, enable the Prometheus dashboard for AHPA.

Note

Because prediction requires seven days of historical data, you can view the prediction results only after the sample deployment has run for at least seven days. If you have an existing online application, you can directly specify it in the AHPA configuration.

This topic uses the observer mode as an example. In observer mode, you can compare AHPA's predicted scaling behavior with the actual scaling performed by the HPA policy. The comparison shows how many resources the application actually requires and helps you verify whether the AHPA predictions meet expectations.


  • Actual and predicted CPU usage: The green curve represents the actual CPU usage with HPA. The yellow curve represents the CPU usage predicted by AHPA.

    • The yellow curve is above the green curve, which indicates that the predicted CPU capacity is sufficient.

    • The yellow curve leads the green curve, which indicates that the required resources are prepared in advance.

  • Pod trend: The green curve represents the actual number of pods scaled by HPA. The yellow curve represents the number of pods predicted by AHPA.

    • The yellow curve is below the green curve, which indicates that AHPA predicts that fewer pods are required.

    • The yellow curve is smoother than the green curve, which indicates that scaling with AHPA causes fewer fluctuations and improves service stability.

In this example, the prediction trend meets expectations. After observing AHPA for a period of time, if the results still meet your expectations, you can set the scaling mode to auto to let AHPA perform scaling automatically.
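Switching modes only requires changing the scaleStrategy field in the policy from Step 4 and re-applying the manifest with kubectl apply. A minimal fragment (all other fields stay as configured in Step 4):

```yaml
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: AdvancedHorizontalPodAutoscaler
metadata:
  name: ahpa-demo
spec:
  # Changed from observer: AHPA now performs scale-out and scale-in itself.
  scaleStrategy: auto
  # Keep the metrics, scaleTargetRef, replica limits, prediction, and
  # instanceBounds settings from Step 4 unchanged.
```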
