For workloads such as web services or message processing, scaling on CPU or memory alone often wastes resources or adds latency, because real load correlates with business signals such as queries per second (QPS) and queue depth. Configure AdvancedHorizontalPodAutoscaler (AHPA) to autoscale on Managed Service for Prometheus metrics matching real traffic.
How it works
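The sample application exposes a custom metric, a ServiceMonitor lets Managed Service for Prometheus scrape it, the metrics adapter makes the metric available through the Kubernetes metrics API, and an AHPA rule uses that metric to calculate the desired number of replicas.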
Before you begin
You have deployed the AHPA add-on and configured the Prometheus data source. For more information, see Deploy AHPA.
You have enabled Managed Service for Prometheus.
Step 1: Deploy the application and ServiceMonitor
First, deploy a sample application that can expose custom metrics, then configure a ServiceMonitor so that Prometheus can scrape the application's metrics endpoint.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, go to the Deployments page.
On the Deployments page, click Create from YAML, and follow the on-screen instructions to deploy the YAML. It creates the sample-app application, a Service to provide in-cluster access, and a ServiceMonitor for metrics collection.
Deployment
The container exposes the custom metric requests_per_second at the /metrics path on port 8080. The metric indicates the number of requests per second.
Service
This creates a stable in-cluster endpoint for the Deployment.
ServiceMonitor
After this resource is created, metric scraping begins. ServiceMonitor is enabled by default. To verify its status, see Enable features.
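To make the metrics endpoint concrete, here is a minimal sketch of what a container like sample-app might serve at /metrics on port 8080. It is not the sample application's actual code: the MetricsHandler class, the serve helper, and the current_value attribute are hypothetical names for illustration. Only the metric name, path, and port come from this guide.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

METRIC_NAME = "requests_per_second"  # metric name used throughout this guide


def render_metrics(value) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    return (
        f"# HELP {METRIC_NAME} Number of requests per second.\n"
        f"# TYPE {METRIC_NAME} gauge\n"
        f"{METRIC_NAME} {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    # Hypothetical: a real application would update this from its request counter.
    current_value = 0

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(self.current_value).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Serve /metrics on the given port until interrupted (not called here)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. Prometheus would then read a line such as requests_per_second 16 on each scrape.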
Step 2: Deploy the metrics adapter
The metrics adapter acts as a bridge between AHPA and Prometheus. After deploying the add-on, configure it to connect to your Managed Service for Prometheus instance.
On the Clusters page, click the name of your cluster. In the left navigation pane, click .
Click Create, then search for and deploy ack-alibaba-cloud-metrics-adapter.
Chart Version: Use the latest version.
Parameter configuration: In the Chart YAML, in the Parameter section, configure prometheus.url and prometheus.prometheusHeader, and then click OK.
prometheus.url: The HTTP API address of Managed Service for Prometheus (the Grafana read endpoint). For more information, see How to obtain the Prometheus data request URL.
prometheus.prometheusHeader:
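Before saving the adapter parameters, it can help to check that the address in prometheus.url accepts queries. The sketch below only builds the request URL. It assumes the endpoint follows the standard Prometheus HTTP API, where instant queries go to /api/v1/query. The base URL shown is a placeholder, not a real instance address, and any authentication header your instance requires is not shown.

```python
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a Prometheus-compatible HTTP API."""
    # Assumption: the endpoint exposes the standard /api/v1/query path.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Placeholder base URL; substitute the value you set in prometheus.url.
url = build_instant_query_url(
    "https://prometheus.example.com/base/", "sum(requests_per_second)"
)
```

Opening the resulting URL with curl or a browser should return a JSON document if the address is correct and the metric is being scraped.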
Step 3: Configure custom metrics
On the Clusters page, click the name of your cluster. In the left navigation pane, click .
Locate ack-alibaba-cloud-metrics-adapter and click Actions in the Actions column.
Replace the corresponding parameters in the template with the following YAML content, and then click Update.
In the example, replace requests_per_second with the actual metric for requests per second in Prometheus.
    ......
    prometheus:
      adapter:
        rules:
          custom:
          - metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>})
            name:
              as: requests_per_second
            resources:
              overrides:
                namespace:
                  resource: namespace
            seriesQuery: requests_per_second # Set the metric name. Make sure this name matches the metric in Managed Service for Prometheus.
    ......
Use the external metrics API (external.metrics.k8s.io) to confirm that the metric is available.
    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/requests_per_second"
Expected output:
{"kind":"ExternalMetricValueList","apiVersion":"external.metrics.k8s.io/v1beta1","metadata":{},"items":[{"metricName":"requests_per_second","metricLabels":{},"timestamp":"2025-10-15T07:57:00Z","value":"1"}]}
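If you want to check this response in a script rather than by eye, the sketch below parses the expected output shown above and extracts the numeric value. The helper names are illustrative. The quantity parser is deliberately simplified: it handles only plain numbers and the Kubernetes milli suffix m (for example, 1500m means 1.5), not the full quantity syntax.

```python
import json
from typing import List

# The expected output from the kubectl command above, copied verbatim.
SAMPLE = (
    '{"kind":"ExternalMetricValueList","apiVersion":"external.metrics.k8s.io/v1beta1",'
    '"metadata":{},"items":[{"metricName":"requests_per_second","metricLabels":{},'
    '"timestamp":"2025-10-15T07:57:00Z","value":"1"}]}'
)


def parse_quantity(quantity: str) -> float:
    """Parse a Kubernetes quantity (simplified: plain numbers and the 'm' suffix only)."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


def metric_values(payload: str, name: str) -> List[float]:
    """Return the values of all items in the list whose metricName matches."""
    doc = json.loads(payload)
    return [
        parse_quantity(item["value"])
        for item in doc.get("items", [])
        if item.get("metricName") == name
    ]


values = metric_values(SAMPLE, "requests_per_second")
```

An empty list here would suggest that the adapter rule does not match any series, which is worth checking before creating the AHPA rule.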
Step 4: Create an AHPA rule and verify scaling
Next, create an AHPA rule to automate scaling and run a stress test to verify its behavior.
Create an AHPA resource.
Use the following YAML to create an AHPA resource. AHPA scales out when the average requests_per_second value per pod exceeds 10, and scales in when it falls below 10.
Configure external.metric by specifying the metric name and matchLabels. The metric name must match the one specified in Step 3: Configure custom metrics. In this example, the custom metric is set to requests_per_second.
Set the target threshold. For example, set AverageValue to 10. This means that a scale-out begins if the average number of requests per second per pod exceeds 10.
After the stress test, check the status of the AHPA object.
    kubectl get ahpa
Expected output:
    NAME                  STRATEGY   PERIODICITY   REFERENCE               METRIC                TARGETS   DESIREDPODS   REPLICAS   MINPODS   MAXPODS   AGE
    customer-deployment   observer                 Deployment/sample-app   requests_per_second   16/10     2             1          1         50        102s
TARGETS: The current metric value is 16, and the target value is 10.
DESIREDPODS: AHPA calculates the desired number of replicas as 2, based on Current Value (16) / Target Value (10) = 1.6, rounded up to 2.
REPLICAS: Displays the actual number of replicas of sample-app.
Because the STRATEGY of this AHPA is observer, it only performs calculations and observations and does not execute scaling operations. Therefore, even though DESIREDPODS is 2, REPLICAS remains 1.
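The arithmetic behind DESIREDPODS can be sketched as follows. This assumes the usual Horizontal Pod Autoscaler style calculation (divide the metric by the per-pod target, round up, then clamp to the MINPODS and MAXPODS bounds). AHPA's own algorithm may factor in more than this, so treat the function names and logic as an illustration rather than AHPA's implementation.

```python
import math


def desired_replicas(metric_value: float, target_average: float,
                     min_pods: int, max_pods: int) -> int:
    """Round the ratio up and clamp it to the configured bounds."""
    raw = math.ceil(metric_value / target_average)
    return max(min_pods, min(max_pods, raw))


def applied_replicas(strategy: str, current: int, desired: int) -> int:
    """In observer mode the desired count is reported but not applied."""
    # Assumption: any strategy other than "observer" applies the desired count.
    return current if strategy == "observer" else desired


desired = desired_replicas(16, 10, min_pods=1, max_pods=50)
actual = applied_replicas("observer", current=1, desired=desired)
```

With the values from the output above, desired is 2 and actual stays at 1, which matches the DESIREDPODS and REPLICAS columns.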
Run the kubectl get deployment sample-app command to check the real-time changes in the pod replica count.
Production considerations
Aspect | Description |
Metric selection | Use smoothed metrics that reflect the actual workload rather than instantaneous values. This prevents scaling fluctuation caused by traffic spikes. |
Scaling policy configuration | |
Monitoring and alerting | Set up alerts for the AHPA operational status to promptly identify potential issues, such as capacity bottlenecks, improperly configured policies, or abnormal upstream traffic. |
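To illustrate the metric selection advice, the sketch below compares replica counts computed from raw samples with counts computed from a moving average of the same samples. The sample data, the window size of 3, and the helper names are made up for illustration. In practice the smoothing would more likely happen in the Prometheus query itself than in application code.

```python
import math
from collections import deque
from typing import List


def moving_average(samples: List[float], window: int) -> List[float]:
    """Average each sample with up to window-1 preceding samples."""
    buf = deque(maxlen=window)
    out = []
    for sample in samples:
        buf.append(sample)
        out.append(sum(buf) / len(buf))
    return out


def replicas_for(values: List[float], target: float) -> List[int]:
    """Replica count per sample: ratio to target, rounded up, at least 1."""
    return [max(1, math.ceil(v / target)) for v in values]


raw = [8, 9, 35, 8, 9, 8]            # one short traffic spike
smoothed = moving_average(raw, 3)

raw_replicas = replicas_for(raw, 10)            # [1, 1, 4, 1, 1, 1]
smoothed_replicas = replicas_for(smoothed, 10)  # [1, 1, 2, 2, 2, 1]
```

The raw series jumps from 1 to 4 replicas and back within one sample, while the smoothed series rises to 2 and returns to 1 more gradually. The cost is a slower reaction to sustained load increases, so the window length is a trade-off.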
