Deploying multiple instances improves application stability but can leave resources idle and raise cluster costs. Manual scaling is labor-intensive and tends to lag behind changes in traffic. You can use metrics from an Nginx Ingress to drive the Horizontal Pod Autoscaler (HPA), which dynamically adjusts the number of pod replicas based on workload. This approach keeps applications stable and responsive while improving resource utilization and reducing costs. This topic describes how to use Nginx Ingress traffic metrics to autoscale multiple applications.
An Ingress forwards external requests to a Service within the cluster, and the Service then routes the requests to a pod. In a production environment, you can configure automatic scaling based on request volume. The Nginx Ingress Controller exposes this volume through the nginx_ingress_controller_requests metric, which you can use as a metric source for HPA. The Nginx Ingress Controller in ACK clusters is an enhanced version of the community edition and offers a more streamlined user experience.
Prerequisites
Before you can autoscale applications based on Nginx Ingress traffic, you must configure the ack-alibaba-cloud-metrics-adapter component to expose Alibaba Cloud Prometheus metrics for the HPA.
- The Alibaba Cloud Prometheus monitoring component is deployed. For more information, see Use Alibaba Cloud Prometheus for monitoring.
- The ack-alibaba-cloud-metrics-adapter component is deployed and its prometheus.url field is configured.
- The Apache Benchmark stress test tool is installed.
In this tutorial, you will create two Deployments and their corresponding Services. You will then configure an Ingress with different access paths to route external traffic. Finally, you will configure an HPA for the applications based on the nginx_ingress_controller_requests metric and use the HPA's selector.matchLabels.service field to filter the metric. This enables pods to scale automatically based on traffic.
Step 1: Create applications and services
Use the following YAML manifests to create the application Deployments and their corresponding Services.
- Create a file named nginx1.yaml and copy the following content into it.
  Run the following command to create the test-app application and its corresponding Service.
  kubectl apply -f nginx1.yaml
- Create a file named nginx2.yaml and copy the following content into it.
  Run the following command to create the sample-app application and its corresponding Service.
  kubectl apply -f nginx2.yaml
Step 2: Create an Ingress
- Create a file named ingress.yaml and copy the following content into it.
  Run the following command to deploy the Ingress resource.
  kubectl apply -f ingress.yaml
- Run the following command to get the Ingress resource.
  kubectl get ingress -o wide
  Expected output:
  NAME           CLASS   HOSTS              ADDRESS       PORTS   AGE
  test-ingress   nginx   test.example.com   10.XX.XX.10   80      55s
After a successful deployment, you can access the host address by using the / and /home paths. Based on the preceding configuration, the Nginx Ingress Controller routes traffic for / to sample-app and traffic for /home to test-app. You can query the nginx_ingress_controller_requests metric in Alibaba Cloud Prometheus to obtain request information for each application.
Step 3: Convert Prometheus metrics for HPA
Configure metrics adapter
- Log on to the ACK console. In the left navigation pane, click Clusters.
- On the Clusters page, click the name of your cluster. In the left navigation pane, go to the Helm page.
- On the Helm page, click ack-alibaba-cloud-metrics-adapter. In the Resource section, click adapter-config, and then click Edit YAML in the upper-right corner of the page.
- Replace the existing rules in the configuration with the following content. Then, click OK at the bottom of the page.
  For more information, see Horizontal pod autoscaling based on Alibaba Cloud Prometheus metrics.
rules:
- metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m]))
  name:
    as: ${1}_per_second
    matches: ^(.*)_requests
  resources:
    namespaced: false
  seriesQuery: nginx_ingress_controller_requests

In the ConfigMap, the rules sit under the data.config.yaml key:

apiVersion: v1
data:
  config.yaml: >
    rules:

    - metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m]))
      name:
        as: ${1}_per_second
        matches: ^(.*)_requests
      resources:
        namespaced: false
      seriesQuery: nginx_ingress_controller_requests
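To make the rule easier to read, the following Python sketch mimics, in simplified form, what it asks the adapter to do: rename the series with the matches/as pair, and fill the metricsQuery template with a label matcher. The helper names and the service="sample-app" matcher are illustrative assumptions, not part of the adapter; the adapter itself is not written in Python and its template handling is more general than a string replace.

```python
import re

# Values copied from the adapter rule.
SERIES_QUERY = "nginx_ingress_controller_requests"
MATCHES = r"^(.*)_requests"
AS = r"\g<1>_per_second"  # Python spelling of the rule's ${1}_per_second
METRICS_QUERY = "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m]))"


def exposed_metric_name(series: str) -> str:
    """Rename a series the way the name.matches / name.as pair describes."""
    return re.sub(MATCHES, AS, series)


def render_query(series: str, labels: dict[str, str]) -> str:
    """Fill the metricsQuery template (hypothetical helper, simplified)."""
    matchers = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return (
        METRICS_QUERY
        .replace("<<.Series>>", series)
        .replace("<<.LabelMatchers>>", matchers)
    )


# The label value below is an assumed example of what an HPA selector could pass.
print(exposed_metric_name(SERIES_QUERY))
print(render_query(SERIES_QUERY, {"service": "sample-app"}))
```

The first line printed is the metric name that the HPA refers to, and the second is the PromQL query that would be evaluated for a selector carrying that one label.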
View the metric output
Run the following command to view the metric output.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/*/nginx_ingress_controller_per_second" | jq .
The query result is as follows:
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "nginx_ingress_controller_per_second",
      "metricLabels": {},
      "timestamp": "2025-07-25T07:56:04Z",
      "value": "0"
    }
  ]
}
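If you want to read this response in a script, the values are Kubernetes quantity strings, so a value such as 22096m means 22.096 requests per second. The following is a minimal Python sketch, assuming the JSON has been saved from the kubectl command above; parse_quantity and metric_values are hypothetical helpers, and the parser handles only plain numbers and the milli suffix (m), not the full quantity grammar.

```python
import json


def parse_quantity(quantity: str) -> float:
    """Convert a quantity string to a float (plain numbers and 'm' suffix only)."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # milli-units, e.g. "22096m" is 22.096
    return float(quantity)


def metric_values(raw_json: str) -> list[float]:
    """Return the numeric value of every item in an ExternalMetricValueList."""
    document = json.loads(raw_json)
    return [parse_quantity(item["value"]) for item in document["items"]]


# Sample response copied from the query result shown in this step.
SAMPLE = """
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "nginx_ingress_controller_per_second",
      "metricLabels": {},
      "timestamp": "2025-07-25T07:56:04Z",
      "value": "0"
    }
  ]
}
"""

print(metric_values(SAMPLE))
```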
Step 4: Create HPAs
- Create a file named hpa.yaml and copy the following content into it.
  Run the following command to deploy the HPAs for the sample-app and test-app applications.
  kubectl apply -f hpa.yaml
- Run the following command to check the HPA deployment status.
  kubectl get hpa
  Expected output:
  NAME         REFERENCE               TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
  sample-hpa   Deployment/sample-app   0/30 (avg)   1         10        1          74s
  test-hpa     Deployment/test-app     0/30 (avg)   1         10        1          59m
Step 5: Verify the results
After the HPAs are deployed, use the Apache Benchmark tool to run a stress test and verify that the application pods scale out as request volume increases.
- Run the following command to perform a stress test on the /home path on the host.
  ab -c 50 -n 5000 test.example.com/home
- Run the following command to check the HPA status.
  kubectl get hpa
  Expected output:
  NAME         REFERENCE               TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
  sample-hpa   Deployment/sample-app   0/30 (avg)        1         10        1          22m
  test-hpa     Deployment/test-app     22096m/30 (avg)   1         10        3          80m
- Run the following command to stress test the root path of the host.
  ab -c 50 -n 5000 test.example.com/
- Run the following command to check the HPA status.
  kubectl get hpa
  Expected output:
  NAME         REFERENCE               TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
  sample-hpa   Deployment/sample-app   27778m/30 (avg)   1         10        2          38m
  test-hpa     Deployment/test-app     0/30 (avg)        1         10        1          96m
The results show that the applications successfully scaled out when the request volume exceeded the threshold.
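The replica counts in these outputs can be checked by hand. The sketch below is a simplified Python model of how an HPA sizes a workload against an average-value target, assuming the target of 30 and the 1 to 10 replica bounds shown in the TARGETS, MINPODS, and MAXPODS columns, and the Kubernetes default tolerance of 0.1. The function name is hypothetical, and the real controller also applies stabilization windows and scaling policies that are not modeled here.

```python
import math


def desired_replicas(
    avg_per_pod: float,
    current_replicas: int,
    target_avg: float = 30.0,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Estimate the replica count for an average-value metric target."""
    # kubectl shows the per-pod average, so recover the total request rate first.
    total = avg_per_pod * current_replicas
    usage_ratio = total / (target_avg * current_replicas)
    if abs(1.0 - usage_ratio) <= tolerance:
        desired = current_replicas  # close enough to the target: no change
    else:
        desired = math.ceil(total / target_avg)
    # Clamp to the bounds configured on the HPA.
    return max(min_replicas, min(max_replicas, desired))


# test-hpa after the /home test: 22096m average across 3 replicas.
print(desired_replicas(22.096, 3))
# sample-hpa after the / test: 27778m average across 2 replicas.
print(desired_replicas(27.778, 2))
# No traffic: the count falls back to the minimum.
print(desired_replicas(0.0, 1))
```

With these inputs the model returns 3, 2, and 1, which matches the REPLICAS column: about 66 requests per second in total needs three pods to stay under 30 per pod, and about 56 needs two.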
Related documents
- Multi-zone balancing is a common deployment method for data-intensive services in high-availability scenarios. When the workload increases, applications that use a multi-zone balanced scheduling policy must automatically scale out instances across multiple zones to meet cluster scheduling demands. For more information, see Implement rapid and simultaneous elastic scaling across multiple zones.
- You can build custom operating system images to simplify elastic scaling in complex scenarios. For more information, see Elastic optimization with custom images.