You can deploy an application in multiple pods to improve application stability. However, this method increases costs and wastes resources during off-peak hours. You can also manually scale the pods of your application. However, this method increases your O&M workload, and pods cannot be scaled in real time. To resolve these issues, you can configure horizontal pod autoscaling for multiple applications based on the metrics of the NGINX Ingress controller, so that the pods of the applications are automatically scaled based on loads. This improves the stability and resilience of applications, optimizes resource usage, and reduces costs. This topic describes how to configure horizontal pod autoscaling for multiple applications based on NGINX Ingress controller metrics.
Prerequisites
To configure horizontal pod autoscaling for multiple applications based on the metrics of the NGINX Ingress controller, you must convert Managed Service for Prometheus metrics to metrics that are supported by Horizontal Pod Autoscaler (HPA) and deploy the required components.
The Managed Service for Prometheus component is installed. For more information, see Enable Managed Service for Prometheus.
alibaba-cloud-metrics-adapter is installed. For more information, see Horizontal pod scaling based on Managed Service for Prometheus metrics.
The stress testing tool Apache Benchmark is installed. For more information, see Apache Benchmark.
Feature description
An Ingress is a Kubernetes API object that forwards external requests to Services in a Kubernetes cluster. The Services route the requests to the backend pods. To automatically scale the pods of an application based on the number of requests in the production environment, you can use the http_requests_total metric to collect the number of requests, or implement horizontal pod autoscaling based on the metrics of the NGINX Ingress controller.
The NGINX Ingress controller is deployed in a Container Service for Kubernetes (ACK) cluster to control the Ingresses in the cluster. The NGINX Ingress controller provides high-performance and custom traffic management. The NGINX Ingress controller provided by ACK is developed based on the open source version and provides enhanced capabilities to offer a simplified user experience.
In this topic, two Deployments are created, together with a Service for each Deployment. An Ingress is created to route external requests to the two Deployments based on different path matching rules. In the following example, HPA is configured to automatically scale pods based on the nginx_ingress_controller_requests metric, which indicates the traffic load, and uses the selector.matchLabels.service field to filter the metric values by backend Service.
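For reference, the scaling signal for each application corresponds to a PromQL query like the following sketch. The 2-minute rate window matches the metricsQuery rule that Step 3 configures, and the service label value is the filter that the HPA selector applies:
sum(rate(nginx_ingress_controller_requests{service="test-app"}[2m]))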
Step 1: Create applications and Services
Use the following YAML template to create Deployments and Services.
Create a file named nginx1.yaml and copy the following content to the file:
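The full manifest is not reproduced here. The following is a minimal sketch of nginx1.yaml. The test-app names match the rest of this topic; the nginx image, the app: test-app labels, and port 80 are assumptions:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
  labels:
    app: test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.25   # assumed image; any HTTP server image works
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: test-app
spec:
  selector:
    app: test-app
  ports:
  - port: 80
    targetPort: 80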
Run the following command to create an application named test-app and a Service that is used to expose the application:
kubectl apply -f nginx1.yaml
Create a file named nginx2.yaml and copy the following content to the file:
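nginx2.yaml mirrors nginx1.yaml under the same assumptions, with the name changed to sample-app:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  labels:
    app: sample-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.25   # assumed image
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: sample-app
spec:
  selector:
    app: sample-app
  ports:
  - port: 80
    targetPort: 80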
Run the following command to create an application named sample-app and a Service that is used to expose the application:
kubectl apply -f nginx2.yaml
Step 2: Create an Ingress
Create a file named ingress.yaml and copy the following content to the file:
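The full manifest is not reproduced here. The following sketch matches the Ingress shown in the query output below (name test-ingress, class nginx, host test.example.com) and the routing verified in Step 5: /home is routed to test-app and other paths to sample-app. The pathType values and port 80 are assumptions:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: test.example.com
    http:
      paths:
      - path: /home
        pathType: Prefix
        backend:
          service:
            name: test-app
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: sample-app
            port:
              number: 80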
Run the following command to deploy an Ingress:
kubectl apply -f ingress.yaml
Run the following command to query the Ingress:
kubectl get ingress -o wide
Expected output:
NAME           CLASS   HOSTS              ADDRESS       PORTS   AGE
test-ingress   nginx   test.example.com   10.10.10.10   80      55s
After you deploy the preceding resources, you can send requests to the / and /home paths of the specified host. The NGINX Ingress controller automatically routes the requests to the sample-app and test-app applications based on the URL paths of the requests. You can obtain information about the requests to each application from the nginx_ingress_controller_requests metric in Managed Service for Prometheus.
Step 3: Convert Managed Service for Prometheus metrics to metrics supported by HPA
Modify the adapter.config file of the ack-alibaba-cloud-metrics-adapter component.
Note: Before you modify the adapter.config file, make sure that the ack-alibaba-cloud-metrics-adapter component is installed in your cluster and the prometheus.url parameter is configured.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Applications > Helm.
Click ack-alibaba-cloud-metrics-adapter. In the Resource section, click adapter-config. In the upper-right corner of the adapter-config page, click Edit YAML. Replace the code in the Value field with the following content, and then click OK in the lower part of the page.
For more information about the fields in the following code block, see Implement horizontal auto scaling based on Prometheus metrics.
rules:
- metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m]))
  name:
    as: ${1}_per_second
    matches: ^(.*)_requests
  resources:
    namespaced: false
  seriesQuery: nginx_ingress_controller_requests
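In this rule, seriesQuery selects the nginx_ingress_controller_requests series, and metricsQuery converts the cumulative request counter into a per-second rate averaged over a 2-minute window. The name.matches pattern ^(.*)_requests captures nginx_ingress_controller, so name.as exposes the converted metric through the external metrics API as nginx_ingress_controller_per_second.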
Run the following command to query a metric:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/*/nginx_ingress_controller_per_second" | jq .
Expected output:
{ "kind": "ExternalMetricValueList", "apiVersion": "external.metrics.k8s.io/v1beta1", "metadata": {}, "items": [ { "metricName": "nginx_ingress_controller_per_second", "metricLabels": {}, "timestamp": "2025-02-28T11:34:56Z", "value": "0" } ] }
Step 4: Deploy HPA
Create a file named hpa.yaml and copy the following content to the file:
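The full manifest is not reproduced here. The following sketch matches the HPA outputs shown in this topic: two HPA objects named sample-hpa and test-hpa, minReplicas 1, maxReplicas 10, an average target of 30 for the converted nginx_ingress_controller_per_second metric, and the selector.matchLabels.service filter described earlier. Treat it as an illustration rather than the exact manifest:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: nginx_ingress_controller_per_second
        selector:
          matchLabels:
            service: test-app    # filters the metric to requests for test-app
      target:
        type: AverageValue
        averageValue: "30"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: nginx_ingress_controller_per_second
        selector:
          matchLabels:
            service: sample-app    # filters the metric to requests for sample-app
      target:
        type: AverageValue
        averageValue: "30"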
Run the following command to deploy HPA for the sample-app and test-app applications separately:
kubectl apply -f hpa.yaml
Run the following command to query the deployment progress of HPA:
kubectl get hpa
Expected output:
NAME         REFERENCE               TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
sample-hpa   Deployment/sample-app   0/30 (avg)   1         10        1          74s
test-hpa     Deployment/test-app     0/30 (avg)   1         10        1          59m
Step 5: Verify the result
After you configure HPA, perform stress tests to check whether the pods of the applications are automatically scaled out when the number of requests increases.
Run the following command to perform stress tests on the /home URL path of the host:
ab -c 50 -n 5000 test.example.com/home
Run the following command to query the HPA information:
kubectl get hpa
Expected output:
NAME         REFERENCE               TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
sample-hpa   Deployment/sample-app   0/30 (avg)        1         10        1          22m
test-hpa     Deployment/test-app     22096m/30 (avg)   1         10        3          80m
Run the following command to perform stress tests on the root path of the host:
ab -c 50 -n 5000 test.example.com/
Run the following command to query the HPA information:
kubectl get hpa
Expected output:
NAME         REFERENCE               TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
sample-hpa   Deployment/sample-app   27778m/30 (avg)   1         10        2          38m
test-hpa     Deployment/test-app     0/30 (avg)        1         10        1          96m
The output shows that the pods of the applications are automatically scaled out when the number of requests exceeds the scaling threshold. Values such as 22096m in the TARGETS column are Kubernetes milli-quantities: 22096m is approximately 22.1 requests per second per pod, averaged across the current replicas.
References
Multi-zone load balancing is a deployment solution commonly used in high availability (HA) scenarios for data services. If an application that is deployed across zones does not have sufficient resources to handle heavy workloads, you may want ACK to create a specific number of nodes in each zone of the application. For more information, see Configure auto scaling for cross-zone deployment.
For more information about how to create custom images to accelerate horizontal pod autoscaling in complex scenarios, see Create custom images.