Using Prometheus to achieve observability in performance stress testing

What is observability in performance stress testing

Observability includes three dimensions: Metrics, Traces, and Logs. It helps us quickly identify and locate problems in complex distributed systems, and is an essential tool for operating them.

In performance stress testing, observability is even more important. Besides helping to locate performance issues, the Metrics dimension directly determines whether a stress test passes, and therefore has a decisive impact on whether the system can go live. The details are as follows:

• Metrics: monitoring metrics

  • System performance indicators, including request success rate, system throughput, and response time

  • Resource performance indicators, which measure the usage of software and hardware resources and, together with the system performance indicators, show how heavily system resources are utilized

• Logs

  • Stress engine logs, used to check whether the stress testing engine is healthy and whether the stress test script ran into any errors

  • Sampling logs, which record API request and response details and help troubleshoot failed requests during the stress test by exposing the complete error information in the response

• Traces: distributed tracing, used in the performance diagnosis stage. By tracing a request's call chain through the system, it identifies the failing system and the error stack behind a failing API, and quickly pinpoints performance problems

This article explains how to use Prometheus to achieve observability of stress testing performance metrics.

Core indicators for stress test monitoring

System performance indicators

The three most important indicators for stress test monitoring are request success rate, service throughput (TPS), and request response time (RT). If any of these three indicators shows an inflection point as load increases, the system can be considered to have reached a performance bottleneck.
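As a minimal sketch of how the first two indicators can be derived from raw samples, the snippet below groups finished requests into one-second buckets and computes TPS and success rate for each bucket. The `(finish_time_seconds, success)` tuple layout and the function name `per_second_stats` are assumptions made for illustration, not the format of any particular stress testing engine.

```python
from collections import defaultdict


def per_second_stats(samples):
    """Aggregate (finish_time_seconds, success) tuples into per-second stats."""
    buckets = defaultdict(lambda: [0, 0])  # second -> [total, succeeded]
    for finish_time, success in samples:
        bucket = buckets[int(finish_time)]
        bucket[0] += 1
        if success:
            bucket[1] += 1
    return {
        second: {"tps": total, "success_rate": succeeded / total}
        for second, (total, succeeded) in sorted(buckets.items())
    }


# Hypothetical samples: three requests finish in second 0, two in second 1.
samples = [(0.1, True), (0.4, True), (0.9, False), (1.2, True), (1.7, True)]
stats = per_second_stats(samples)
for second, values in stats.items():
    print(second, values["tps"], round(values["success_rate"], 3))
```

Plotting these per-second values against the applied load is one way to spot the point where throughput stops growing or the success rate starts to drop.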

Response time deserves a special note. Judging this indicator by its average is very misleading, because a system's response times are not evenly distributed and often have a long tail: some users' requests take a particularly long time even though the overall average meets expectations. Those users' experience is degraded, so such a test should not be judged as passing. For this reason, the 99th, 95th, and 90th percentile values are commonly used to determine whether the system's response time meets the standard.
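The long-tail effect can be illustrated with a small worked example. The data below is hypothetical (95 requests at 50 ms and 5 requests at 3000 ms), and the percentile uses the simple nearest-rank definition; monitoring systems may use other interpolation methods.

```python
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical data: most requests are fast, a few are very slow.
response_times_ms = [50] * 95 + [3000] * 5

avg = mean(response_times_ms)
p90 = percentile(response_times_ms, 90)
p95 = percentile(response_times_ms, 95)
p99 = percentile(response_times_ms, 99)

print(f"avg={avg} ms, p90={p90} ms, p95={p95} ms, p99={p99} ms")
```

Here the average is 197.5 ms, which might pass a 200 ms target, while the 99th percentile is 3000 ms and reveals that a share of users waits far longer.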

In addition, to observe in more detail where the response time is spent, indicators such as Connect Time and Idle Time can be added.

Resource performance indicators

During a stress test, it is also important to monitor system hardware, middleware, and database resources, including but not limited to:

• CPU usage rate

• Memory usage rate

• Disk throughput

• Network throughput

• Number of database connections

• Cache hit rate

... ...

Please refer to the article 'Testing Indicators' for details.

Performance indicators of the load generator

In the stress testing chain, the performance of the load generator itself is easily overlooked. To ensure that the load generator is not the bottleneck of the whole chain, pay attention to the following indicators:

• Memory usage of the stress testing process

• CPU usage of the load generator, and the Load1 and Load5 load indicators

• For JVM-based stress testing engines, the number and duration of garbage collections

Why use Prometheus for stress test monitoring

Open-source stress testing tools such as JMeter provide simple built-in monitoring of system performance metrics such as request success rate, system throughput, and response time. For large-scale distributed stress testing, however, this native monitoring has the following shortcomings:

1. The monitoring indicators are not comprehensive. They generally cover only basic system performance indicators, which are enough to decide whether a stress test has passed. If the test fails and the problem must be located, for example by analyzing the 99th percentile connection time of an API, native monitoring cannot provide the data.

2. Timely aggregation cannot be guaranteed

3. Large-scale distributed aggregation of monitoring data is not supported

4. Monitoring indicators cannot be reviewed historically along a timeline

In summary, for large-scale distributed stress testing, relying on the native monitoring of open-source stress testing tools is not recommended.

Below is a comparison of two open-source monitoring solutions:

Option 1: Zabbix

Zabbix is an early open-source distributed monitoring system that stores its data in a relational database such as MySQL or PostgreSQL.

For system performance monitoring, the load generator must provide monitoring indicators at one-second granularity. With highly concurrent indicators written every second, the relational database becomes the bottleneck of the monitoring system.

For resource performance monitoring, Zabbix has comprehensive indicators for physical and virtual machines, but its support for monitoring containers and elastic computing is insufficient.

Option 2: Prometheus

Prometheus stores its data in a time-series database. Compared with a traditional relational database, read and write performance is greatly improved, so it performs well when load generators report large volumes of second-level monitoring data.

For resource performance monitoring, Prometheus is better suited to cloud resources, and its coverage of Kubernetes and containers is especially comprehensive. Users who already work with cloud-native technology will find it easier to get started.

In summary, compared with Zabbix, Prometheus is better suited to collecting and aggregating highly concurrent monitoring indicators during stress testing, is better suited to monitoring cloud resources, and is easy to extend.

Of course, mature cloud products are also a good choice. The stress testing tool PTS [2] and the observability tool ARMS [3] complement each other well: PTS provides the system performance indicators of the stress test, while ARMS provides resource monitoring and overall observability. Together they offer a one-stop solution for stress test observability.

How to use Prometheus to achieve stress test monitoring

Modifying open-source JMeter

Prometheus uses a pull-based data model, so the stress testing engine must expose an HTTP service from which Prometheus can scrape the stress testing indicators.

JMeter provides a plugin mechanism that can be used to add Prometheus monitoring. The custom plugin extends JMeter's BackendListener so that each stress testing indicator, such as the number of successful requests, the number of failed requests, and the request response time, is updated when a sampler finishes executing. The indicators are kept in memory and exposed through the HTTP service when Prometheus pulls data. The overall structure is as follows:

The custom JMeter plugin requires the following changes:

1. Add a metric registry

2. Implement a Prometheus metric updater

3. Implement a custom JMeter BackendListener that calls the Prometheus updater after each sampler execution

4. Implement an HTTP server, and add authentication logic if security requires it
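The four parts above can be sketched in a few dozen lines. The snippet below is only an illustration of the structure in Python using the standard library; a real JMeter plugin would be written in Java against JMeter's BackendListener API. The metric names (`stress_requests_total`, `stress_response_time_ms_sum`), the function `on_sample_finished`, and port 9270 are all hypothetical, and authentication is left out.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricRegistry:
    """1. Registry: in-memory counters keyed by metric name and labels."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}

    def inc(self, name, labels, amount=1.0):
        key = (name, tuple(sorted(labels.items())))
        with self._lock:
            self._counters[key] = self._counters.get(key, 0.0) + amount

    def render(self):
        """Serialize counters in the Prometheus text exposition format.

        Label values are not escaped here; a real exporter must escape them.
        """
        with self._lock:
            items = sorted(self._counters.items())
        lines = []
        for (name, labels), value in items:
            label_text = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{name}{{{label_text}}} {value}")
        return "\n".join(lines) + "\n"


REGISTRY = MetricRegistry()


def on_sample_finished(api, success, elapsed_ms):
    """2 and 3. Updater, to be called by the listener after each sampler run."""
    status = "success" if success else "failure"
    REGISTRY.inc("stress_requests_total", {"api": api, "status": status})
    REGISTRY.inc("stress_response_time_ms_sum", {"api": api}, elapsed_ms)


class MetricsHandler(BaseHTTPRequestHandler):
    """4. HTTP endpoint that Prometheus scrapes (no authentication here)."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = REGISTRY.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9270):
    """Block forever, serving /metrics on the given (hypothetical) port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Keeping only monotonically increasing counters in memory leaves rate and average calculations to Prometheus, which is the usual division of labor in a pull model.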

PTS stress testing tool

Performance Testing Service (PTS) is a SaaS performance testing tool from Alibaba Cloud. It supports its own self-developed stress testing engine as well as open-source JMeter. PTS exposes its stress testing indicators to Prometheus directly, so there is no need to develop a custom plugin or modify the engine. Only three steps in the console are required.

The specific steps are as follows:

1. In the advanced settings of the PTS stress test, turn on the [Prometheus] switch

2. After the stress test starts, click [Monitor Export] to copy the Prometheus configuration with one click

3. Paste this configuration into your self-hosted Prometheus and hot reload it so that it takes effect
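One way to perform the hot reload in step 3 is Prometheus's HTTP reload endpoint, `POST /-/reload`, which is only available when Prometheus is started with the `--web.enable-lifecycle` flag. The sketch below assumes that flag is set and that the copied configuration has already been pasted into the Prometheus configuration file; the address `http://localhost:9090` and the helper names are assumptions for illustration.

```python
import urllib.request


def build_reload_request(prometheus_base_url):
    """Build the POST request for Prometheus's /-/reload endpoint."""
    url = prometheus_base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_prometheus(prometheus_base_url, timeout=5):
    """Ask Prometheus to reload its configuration and return the HTTP status."""
    request = build_reload_request(prometheus_base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Assumed local address of a self-hosted Prometheus.
request = build_reload_request("http://localhost:9090/")
print(request.get_method(), request.full_url)
```

Sending a SIGHUP signal to the Prometheus process is an alternative that does not require the lifecycle flag.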

For details, see "How to output PTS pressure measurement indicator data to Prometheus" [4].

Quickly build a Grafana monitoring dashboard

PTS provides an official Grafana template [5] that can be imported with one click and then flexibly edited and extended to meet your customized monitoring needs.

The dashboard shows the global request success rate, system throughput (TPS), the 99th, 95th, and 90th percentile response times, and the number of failed requests aggregated by error status code.

In the API distribution column, it is possible to visually compare the monitoring indicators of various APIs and quickly locate performance weaknesses in APIs.

In the API Details column, you can view detailed indicators of individual APIs to accurately locate performance bottlenecks.

In addition, the dashboard provides JVM garbage collection indicators for the load generator, which help determine whether the load generator is the performance bottleneck in the stress testing chain.

The import steps are as follows:

Step 1

In the menu bar, click Import under Dashboard:

Step 2

Fill in the ID of PTS Dashboard: 15981

Select your existing Prometheus data source, which in this example is named Prometheus, then click Import.

Step 3

After importing, select the stress testing task to monitor from the PTS stress testing task selector in the upper left corner to see its monitoring dashboard.

This task name corresponds to the jobname in the PTS console's monitoring export Prometheus configuration.


This article has covered:

1. What observability means in performance stress testing

2. Why to use Prometheus to monitor stress testing performance indicators

3. How to use open-source JMeter and cloud-based PTS to achieve Prometheus-based stress test monitoring
