By Xingji and Yusheng Guo
This test exercises the Prometheus service discovery feature under ECS scaling scenarios: normal scaling, alternating scale-out and scale-in, discovery with and without tag filters, and multiple tag filters configured at the same time.
The Alibaba Cloud ECS service discovery feature of Prometheus supports discovering instances with or without tag filters and tracks dynamic changes in the number of ECS instances. The tests confirm that these functions work as expected.
This was a stress test of the Prometheus service discovery mechanism on Alibaba Cloud ECS. It verified correct behavior for both scaling and ECS tag filtering. In terms of resource consumption, in an extreme scenario simulating a cluster of about 1,000 ECS instances with tag filtering enabled and after alternating scaling operations, Prometheus itself consumed about 0.2 cores (vCPUs) and 1.4 GiB of memory.
1. Deploy and Configure Prometheus
Use the Prometheus binary executable built for the Linux x86_64 platform:
https://github.com/AliyunContainerService/prometheus/releases/tag/v2.55.0-aliyun-ecs-sd
Create an ECS instance to serve as the running environment for Prometheus and open port 9091 in the security group.
ECS instance specification: 4 cores (vCPUs) and 16 GiB of memory. OS: Alibaba Cloud Linux 3.2104 LTS 64-bit.
Download the program to the local machine and transfer it to the remote ECS instance with the scp command.
Create and edit the configuration file prometheus.yml.
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 30s
scrape_configs:
  - job_name: _aliyun-prom/ecs-sd
    honor_timestamps: true
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    ecs_sd_configs:
      - port: 9101
        refresh_interval: 30s
        region_id: cn-qingdao # Set the region for obtaining ECS instances.
        access_key: <access_key> # The AccessKey ID of the Alibaba Cloud account.
        access_key_secret: <access_key_secret> # The AccessKey secret of the Alibaba Cloud account.
        # tag_filters:
        #   - key: 'testK'
        #     values: ['*', 'testV*']
        limit: -1 # The maximum number of instances obtained from the API is limited to 100 by default; when less than zero, all instances are retrieved.
Then run the Prometheus program, which will by default use the configuration file named prometheus.yml in the current directory.
./prometheus
ECS scaling can be performed and managed in two ways: ACK and ESS.
2. Manage and Scale a Group of ECS Nodes by Using an ACK Kubernetes Cluster
a) Create an ACK Kubernetes cluster and use a node pool to scale and manage a group of ECS nodes.
More information about creating an ACK cluster.
b) Deploy the node-exporter DaemonSet with hostNetwork: true as the test exporter on each ECS instance.
You can apply the node-exporter manifests from the kube-prometheus community project (ServiceAccount, ClusterRole, ClusterRoleBinding, and DaemonSet):
https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/nodeExporter-serviceAccount.yaml
https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/nodeExporter-clusterRole.yaml
https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/nodeExporter-clusterRoleBinding.yaml
https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/nodeExporter-daemonset.yaml
During our load testing, we utilized the already deployed node-exporter in the Alibaba Cloud ACK cluster and opened port 9101.
3. Change the listening address of the node-exporter DaemonSet from 127.0.0.1:9101 to 0.0.0.0:9101 to allow LAN access.
4. Control the number of ECS instances by scaling the node pools.
Note: Use top -p <pid> to observe CPU and memory usage.
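Alongside top, the number of discovered targets can be checked at each scaling step through the Prometheus HTTP API. The following is a minimal sketch, not part of the original test procedure: it reads the /api/v1/targets endpoint and counts active targets per scrape pool. The function names and the base URL passed by the caller are illustrative assumptions.

```python
import json
from collections import Counter
from urllib.request import urlopen


def count_active_targets(payload: dict) -> Counter:
    """Count active targets per scrape pool in a /api/v1/targets response body."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return Counter(t.get("scrapePool", "") for t in targets)


def fetch_targets(base_url: str) -> dict:
    """Fetch the active target list from a running Prometheus server.

    base_url is an assumption supplied by the caller,
    for example "http://<prometheus-host>:<port>".
    """
    with urlopen(f"{base_url}/api/v1/targets?state=active", timeout=10) as resp:
        return json.load(resp)
```

Calling count_active_targets(fetch_targets(base_url)) after each scale-out or scale-in should give a count for the _aliyun-prom/ecs-sd job that follows the ECS instance count, once the 30s refresh interval has elapsed.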
ECS instance count: 5 -> 55 -> 155 -> 55 -> 5
● ECS 5
● ECS 55
● ECS 155
● ECS 55
● ECS 5
CPU and memory changes
ECS count | 5 | 55 | 155 | 55 | 5 |
---|---|---|---|---|---|
%CPU | 0.0 | 0.0 | 1.0 | 0.3 | 0.0 |
%MEM | 0.8 | 1.2 | 2.1 | 1.8 | 2.0 |
ECS instance count: 5 -> 55 -> 45 -> 145 -> 105
● ECS 5
● ECS 55
● ECS 45
● ECS 145
● ECS 105
CPU and memory changes
ECS count | 5 | 55 | 45 | 145 | 105 |
---|---|---|---|---|---|
%CPU | 0.0 | 0.0 | 0.3 | 0.7 | 0.7 |
%MEM | 2.0 | 2.2 | 2.1 | 2.7 | 2.8 |
ECS instance count: 105 -> 605 -> 1105 -> 605 -> 1
● ECS 105
● ECS 605
● ECS 998
● ECS 605
● ECS 1
CPU and memory changes
ECS count | 105 | 605 | 998 | 605 | 1 |
---|---|---|---|---|---|
%CPU | 0.7 | 3.7 | 6.0 | 3.3 | 0.0 |
%MEM | 2.8 | 6.5 | 10.8 | 10.2 | 11.2 |
Set tag filtering
tag_filters:
  - key: 'testK'
    values: ['testV', '*']
Add tags to ECS instances
More information about ECS tags.
ECS instance count: ECS (testK: testV) 5 -> ECS (testK: testV) 5 + ECS 5 -> ECS (testK: testV) 5 + ECS 5 + ECS (testK: abc) 5
● ECS (testK: testV) 5
● ECS (testK: testV) 5 + ECS 5
ECS instances without tags cannot be discovered.
● ECS (testK: testV) 5 + ECS 5 + ECS (testK: abc) 5
ECS instances with tag value abc match the wildcard * and are discovered.
ECS instance count: ECS (testK:testV) 5 + ECS 5 -> ECS (testK:testV) 55 + ECS 5 -> ECS (testK:testV) 155 + ECS 5 -> ECS (testK:testV) 55 + ECS 5 -> ECS (testK:testV) 5 + ECS 5
● ECS (testK:testV) 5 + ECS 5
● ECS (testK:testV) 55 + ECS 5
● ECS (testK:testV) 155 + ECS 5
● ECS (testK:testV) 55 + ECS 5
● ECS (testK:testV) 5 + ECS 5
CPU and memory changes
ECS count | 5+5 | 55+5 | 155+5 | 55+5 | 5+5 |
---|---|---|---|---|---|
%CPU | 0.0 | 0.0 | 1.3 | 0.3 | 0.0 |
%MEM | 0.7 | 1.1 | 1.8 | 1.6 | 1.8 |
ECS instance count: ECS (testK:testV) 5 + ECS 5 -> ECS (testK:testV) 55 + ECS 5 -> ECS (testK:testV) 45 + ECS 5 -> ECS (testK:testV) 145 + ECS 5 -> ECS (testK:testV) 105 + ECS 5
● ECS (testK:testV) 5 + ECS 5
● ECS (testK:testV) 55 + ECS 5
● ECS (testK:testV) 45 + ECS 5
● ECS (testK:testV) 145 + ECS 5
● ECS (testK:testV) 105 + ECS 5
CPU and memory changes
ECS count | 5+5 | 55+5 | 45+5 | 145+5 | 105+5 |
---|---|---|---|---|---|
%CPU | 0.0 | 0.7 | 0.3 | 1.0 | 0.7 |
%MEM | 1.8 | 2.0 | 1.8 | 2.4 | 2.9 |
ECS instance count: ECS (testK:testV) 105 + ECS 5 -> ECS (testK:testV) 605 + ECS 5 -> ECS (testK:testV) 994 + ECS 5 -> ECS (testK:testV) 605 + ECS 5 -> ECS (testK:testV) 105 + ECS 5
● ECS (testK:testV) 105 + ECS 5
● ECS (testK:testV) 605 + ECS 5
● ECS (testK:testV) 994 + ECS 5
● ECS (testK:testV) 605 + ECS 5
● ECS (testK:testV) 105 + ECS 5
CPU and memory changes
ECS count | 105+5 | 605+5 | 994+5 | 605+5 | 105+5 |
---|---|---|---|---|---|
%CPU | 0.7 | 2.7 | 5.0 | 3.3 | 0.7 |
%MEM | 1.3 | 4.8 | 8.5 | 7.5 | 6.8 |
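The headline figures of about 0.2 cores and 1.4 GiB appear to correspond to the 994+5 column above. The sketch below shows that arithmetic under two assumptions that the test notes do not state explicitly: %CPU is read as a share of all 4 vCPUs of the host, and %MEM as a share of its 16 GiB of memory. The function name is illustrative.

```python
def top_to_absolute(cpu_pct: float, mem_pct: float,
                    total_cores: int = 4, total_mem_gib: float = 16.0):
    """Convert top percentages into cores and GiB.

    Assumes both percentages are relative to the whole host
    (4 vCPUs and 16 GiB in this test environment).
    """
    cores = cpu_pct / 100.0 * total_cores
    mem_gib = mem_pct / 100.0 * total_mem_gib
    return cores, mem_gib


# Values from the 994+5 column: %CPU 5.0, %MEM 8.5
cores, mem_gib = top_to_absolute(5.0, 8.5)
print(f"{cores:.2f} cores, {mem_gib:.2f} GiB")
```

If top was instead reporting %CPU relative to a single core, the CPU figure would be four times smaller, so the conversion should be treated as an interpretation of the table rather than a measured value.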
Set multiple tag filters
tag_filters:
  - key: 'testK1'
    values: ['testV1', '*']
  - key: 'testK2'
    values: ['testV2', '*']
ECS instance count: ECS (testK1:testV1, testK2:testV2) 5 -> ECS (testK1:testV1, testK2:testV2) 5 + ECS (testK1:testV1) 5 -> ECS (testK1:testV1, testK2:testV2) 55 + ECS (testK1:testV1) 5 -> ECS (testK1:testV1, testK2:testV2) 5 + ECS (testK1:testV1) 5
● ECS (testK1:testV1, testK2:testV2) 5
● ECS (testK1:testV1, testK2:testV2) 5 + ECS (testK1:testV1) 5
ECS instances with only the testK1 tag cannot be discovered.
● ECS (testK1:testV1, testK2:testV2) 55 + ECS (testK1:testV1) 5
● ECS (testK1:testV1, testK2:testV2) 5 + ECS (testK1:testV1) 5
For the same key, values are OR-related; and between different keys, the relationship is AND.
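The observed matching rule can be modeled in a few lines. This is only a sketch of the behavior seen in the tests above, not the actual service discovery implementation: the function name is hypothetical, and Python's fnmatchcase stands in for whatever wildcard handling the real code uses.

```python
from fnmatch import fnmatchcase


def matches(instance_tags: dict, tag_filters: list) -> bool:
    """Return True if an instance passes every tag filter.

    Different keys are combined with AND: each filter key must be present.
    Values under one key are combined with OR: any pattern may match.
    """
    for tag_filter in tag_filters:
        value = instance_tags.get(tag_filter["key"])
        if value is None:
            return False  # instance lacks this key, so it is filtered out
        if not any(fnmatchcase(value, pattern) for pattern in tag_filter["values"]):
            return False
    return True


filters = [
    {"key": "testK1", "values": ["testV1", "*"]},
    {"key": "testK2", "values": ["testV2", "*"]},
]
print(matches({"testK1": "testV1", "testK2": "testV2"}, filters))  # both keys present
print(matches({"testK1": "testV1"}, filters))  # testK2 missing
```

With an empty filter list the function accepts every instance, which mirrors the untagged discovery scenario tested earlier.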
Collect performance data using promtool when the ECS instance count is 100.