Elastic Algorithm Service (EAS) provides stress testing methods for large language model (LLM) services and general services. You can easily create stress testing tasks and perform stress testing with a few clicks. This helps you fully understand the performance of EAS services. This topic describes how to create and manage stress testing tasks.
Step 1: Go to the Create Stress Testing Task page
Log on to the PAI console. On the page that appears, select the desired region in the top navigation bar. Then, select the workspace that you want to manage, and click Enter Elastic Algorithm Service (EAS).
On the Benchmark Task tab, click Create Stress Testing Task.
Step 2: Create a stress testing task
LLM service stress testing
If your EAS service is an LLM service, we recommend that you select the LLM Service check box. This way, you will receive a stress testing report designed for LLM scenarios.
You can create LLM service stress testing tasks only in the console. The EASCMD client does not support such tasks.
LLM service stress testing only supports two OpenAI APIs: /v1/completions and /v1/chat/completions. Therefore, only EAS services deployed by using inference engines that are compatible with the two OpenAI APIs, such as vLLM, SGLang, LMDeploy, and BladeLLM, can use the LLM service stress testing feature.
On the Create Stress Testing Task page, configure the following parameters. After the parameters are configured, click Confirm.
Common service stress testing
Common service stress testing supports the following three modes:
auto mode: Automatic load increase mode. In this mode, eas-benchmark automatically creates agent workers for stress testing, configures the concurrency, and uses the optimal algorithm to determine the maximum capacity of the service.
scan mode: Periodic mode. In this mode, eas-benchmark dynamically increases the service load based on parameters such as minQPS, maxQPS, adjustInterval, and qpsGrowthDelta, until the actual value of the maxRT, maxQPS, or faultTolerate parameter reaches the specified one.
manual mode: Manual mode. In this mode, you can manually adjust the number of agents and the concurrency of each agent during stress testing.
The console only supports auto mode, while the EASCMD client supports all three stress testing modes: auto, scan, and manual. Usage:
Use the console
The stress testing console limits the timeout for stress testing requests to 20 seconds. If you see a 512 return code in the stress testing report, it is likely due to request timeout. Currently, the EAS stress testing console does not support custom timeout configuration.
On the Create Stress Testing Task page, configure the following parameters. After the parameters are configured, click Confirm.

Parameter | Description | |
Basic Information | Service | Select a service on which you want to perform stress testing. |
Stress Testing URL | The URL to call the service. | |
Stress Testing Configurations | Data Source | Valid values: Single Data Entry, Data Address, OSS Object, and Upload Local File. For information about how to construct stress testing data and supported file types, see Appendix 1: Stress testing data. Note
|
Split File Data By Line | If Data Source is set to Data Address, OSS Object, or Upload Local File, this switch is available. If you turn on this switch, the uploaded stress testing file is separated by lines and each line is used as a stress testing request. If you turn off this switch, the entire file content is used as a single stress testing request. | |
Maximum Duration (s) | The stress testing duration, in seconds. Default value: 300. | |
Maximum QPS | The maximum QPS for stress testing. Default value: 10000. | |
Maximum Response Time (ms) | The maximum response time of the stress testing, in milliseconds. If the response time exceeds this threshold, the system automatically adjusts QPS until the real-time response time meets your business requirements. | |
HTTP Header | The request header, in key-value pairs. Examples:
| |
Use the EASCMD client
Run the bench create command to create a stress testing task (Download the EASCMD client and complete identity authentication). After the task is created, you can view the real-time monitoring data through the returned URL. In this example, the 64-bit Windows version of the EASCMD client is used. Sample command:
eascmdwin64.exe bench create [bench_desc_json]The bench_desc_json parameter indicates the JSON configuration file for the stress testing task. Examples:
For JSON parameter descriptions, see Appendix 3: JSON configuration parameters for stress testing.
Sample output:
[RequestId]: DE240637-4976-59AF-A28C-BAA55C0A****
[OK] Task [benchmark-xgb-test-b514] is creating
[OK] [Agnet: 0/1]: Succeed to start benchmark master
[OK] [Agnet: 0/1]: Succeed to start benchmark master
[OK] [Agnet: 1/1]: Benchmark task is Running
[OK] Benchmark task is Running
[OK] Click the link http://127.0.0.1:18222/eas-benchmark/statsview to observe realtime visualization details, you can turn it off with CTRL+C.
Turning off will not interrupt the benchmark test task, and you can reopen it by the visualize command:
eascmd -c [config_file] bench visualize benchmark-xgb-test-b514Step 3: View details of the stress testing task
View real-time monitoring data
When the Status of the stress testing task changes to Running, click the task name to view real-time monitoring data.
View stress testing report
When the Status of the stress testing task changes to Completed, click the task name to view the stress testing report.
The stress testing report includes Basic Information, Stress Testing Configurations, Test Result, and Test Monitoring sections. The following content describes the details of the Test Monitoring section:
Metrics only available for LLM services
TTFT (Time To First Token)
The latency of the first output token. This metric displays the time from a request is sent to the time when the first token generated by the service is received.

TPOT (Time per Output Token)
The latency of an output token. This metric displays the time interval between two adjacent tokens generated by the service.

TPS (Token Per Second)
The number of tokens transmitted per second.

Metrics available for all services
QPS Distribution
The distribution of the number of requests received by the service per second.

Response Time Distribution
The distribution of the number of responses returned by the service within the specified time range.

Data Transfer Distribution
The distribution of the requests sent from the client to the service and responses returned by the service to the client within the specified time range.

Response Time Distribution in Range
The proportion of response times returned by the service within different ranges. Unit: milliseconds.

Response Time Distribution
The end-to-end latency of requests at different percentiles. Unit: milliseconds.

Status Code Distribution
The distribution of status codes returned by the service.

Step 4: Manage the stress testing task
Use the console
You can view the created stress testing tasks and enable, clone, copy report, and delete tasks on the Stress Testing Tasks tab.

Use the EASCMD client
View the stress testing tasks
Run the
bench listcommand to view the stress testing tasks that you created. In this example, the 64-bit Windows version of the EASCMD client is used. Sample command:eascmdwin64.exe bench lsSample output:
[RequestId]: 7F953F8E-8897-5785-808A-CA648302**** +-------------------------+--------------------------+-------------+----------------+---------+---------------------+ | TASKNAME | TASKID | REGION | AVAILABLEAGENT | STATUS | CREATETIME | +-------------------------+--------------------------+-------------+----------------+---------+---------------------+ | benchmark-***-test-**** | eas-b-ql470xog6qeh25**** | cn-shanghai | 0 | Stopped | 2022-06-17 17:58:01 | | benchmark-***-test-**** | eas-b-bdnzvwq0z0h3xq**** | cn-shanghai | 2 | Running | 2022-06-20 12:18:54 | +-------------------------+--------------------------+-------------+----------------+---------+---------------------+View the details of a stress testing task
Run the
bench desccommand to view the details of a specific stress testing task. In this example, the 64-bit Windows version of the EASCMD client is used. Sample command:eascmdwin64.exe bench desc [benchmark_task_name]Replace [benchmark_task_name] with the name of the stress testing task that you want to manage.
Sample output:
+----------------+------------------------------------------------------------------------------+ | TaskName | benchmark-***-test-b514 | | TaskId | eas-b-bdnzvwq0z0h3xq**** | | ServiceName | xgb_test | | Region | cn-shanghai | | DesiredAgent | 2 | | AvailableAgent | 2 | | Status | Running | | Message | Benchmark task is running | | CreateTime | 2021-10-20 12:38:35 | | UpdateTime | 2021-10-20 12:38:45 | | Config | { | | | "base": { | | | "agentCount": 2, | | | "concurrency": 40, | | | "duration": 1200, | | | "requestCount": | | | 922337203685477****, | | | }, | | | ... | | | } | +----------------+------------------------------------------------------------------------------+Enable real-time visualization for a stress testing task
Run the
bench visualizecommand to enable real-time visualization for a stress testing task. This command launches a web server on your local machine at 127.0.0.1, which provides a web page that displays the real-time monitoring data of the stress testing task. In this example, the 64-bit Windows version of the EASCMD client is used. Sample command:eascmdwin64.exe bench visualize [benchmark_task_name]Replace [benchmark_task_name] with the name of the stress testing task that you want to manage.
Sample output:
[OK] Click the link http://127.0.0.1:18734/eas-benchmark/statsview to observe realtime visualization details, you can turn it off with CTRL+C. Turning off will not interrupt the benchmark test task, and you can reopen it by the visualize command: eascmd -c [config_file] bench visualize benchmark-xgb-test-b514To view the real-time monitoring data, visit
http://127.0.0.1:18734/eas-benchmark/statsviewin a browser.Obtain the stress testing report
When the status of a stress testing task changes to Stopped, the task is completed. The stress testing report is saved to OSS, and you can run the
bench reportcommand to obtain the report. In this example, the 64-bit Windows version of the EASCMD client is used. Sample command:eascmdwin64.exe bench report [benchmark_task_name]Replace [benchmark_task_name] with the name of the stress testing task that you want to manage.
Sample output:
[OK] Benchmark task benchmark-demo-test-c7eb report url: http://eas-benchmark.oss-cn-chengdu.aliyuncs.com/summary/benchmark-demo-test-c7eb-10004.htmlCopy the link after url: and paste it in a browser to view the stress testing report. The following figure shows a sample report.

Dynamically change the number of agents and concurrency
If you use the manual mode for stress testing, you can use the
bench updatecommand to dynamically change the number of agents and concurrency. In this example, the 64-bit Windows version of the EASCMD client is used. Sample command:eascmdwin64.exe bench update [benchmark_task_name] -Doptional.concurrency=<attr_value> -Doptional.agentCount=<attr_value>Replace <attr_value> with a specific value that you want to configure. Example:
eascmdwin64.exe bench update benchmark-demo-b99c -Doptional.concurrency=2 -Doptional.agentCount=1Sample output:
[RequestId]: 9920C672-4D41-5CC4-8EC0-C690F76EB2BA [OK] Running [TaskName: benchmark-demo-b99c, DesiredAgent:1, AvailableAgent: 1, Message: Benchmark task is Updating] [OK] Benchmark task benchmark-demo-b99c was updated successfullyStop a stress testing task
Run the
bench stopcommand to stop a stress testing task that is running. In this example, the 64-bit Windows version of the EASCMD client is used. Sample command:eascmdwin64.exe bench stop [benchmark_task_name]Replace [benchmark_task_name] with the name of the stress testing task that you want to manage.
Sample output:
Are you sure to stop the benchmark task [benchmark-***-test-b514] in [cn-shanghai]? [Y/n] [OK] Task [benchmark-***-test-b514] is stopping [OK] [Agnet: 0/1]: Benchmark task is Running [OK] [Agnet: 0/1]: Benchmark task is Stopped [OK] Benchmark task is stoppedIf you ran the command that enables real-time visualization in a terminal window before you stop the task, the system generates the stress testing report in the terminal. You can also run the
bench reportcommand to obtain the detailed HTML report with charts and text.Start a stress testing task
Run the
bench startcommand to start a stopped stress testing task. In this example, the 64-bit Windows version of the EASCMD client is used. Sample command:eascmdwin64.exe bench start [benchmark_task_name]NoteCompared with the
bench createcommand, this command restarts a stress testing task based on the most recently updated configuration of the task.Replace [benchmark_task_name] with the name of the stress testing task that you want to manage.
Sample output:
Are you sure to start the benchmark task [benchmark-***-test-b514] in [cn-shanghai]? [Y/n] [OK] Task [benchmark-***-test-b514] is starting [OK] [Agnet: 0/1]: Succeed to start benchmark master [OK] [Agnet: 1/1]: Benchmark task is Running [OK] Benchmark task is Running [OK] Click the link http://127.0.0.1:18947/eas-benchmark/statsview to observe realtime visualization details, you can turn it off with CTRL+C. Turning off will not interrupt the benchmark test task, and you can reopen it by the visualize command: eascmd -c [config_file] bench visualize benchmark-xgb-test-b514Delete a stress testing task
After a stress testing task is completed, the stress testing controller retains records of the task based on the final state of the task. The following table describes the retention periods.
Final state
Retention period
Stopped
48 hours
CreateFailed, UpdateFailed, Terminated, or Error
10 minutes
When the retention period ends, the system automatically deletes the task.
You can also run the
bench deletecommand to manually delete a stress testing task. Format:eascmdwin64.exe bench delete [benchmark_task_name]Replace [benchmark_task_name] with the name of the stress testing task that you want to manage.
Sample output:
Are you sure to delete the benchmark task [benchmark-***-test-b514] in [cn-shanghai]? [Y/n] [OK] Benchmark task benchmark-***-test-b514 is Deleting [OK] Benchmark task was deleted successfully
Appendix 1: Stress testing data
Format
The format of service request data varies based on the model and processor of your service:
For unstructured request data, such as audio, images, text, directly upload the files that contain the request data.
For structured request data, such as data in the TFRequest format, construct request data using the EAS SDK (see Warm up model services), and upload the generated binary data as a file.
File type
Use a supported file type based on your business requirements, such as .txt, .jpg, .bin, and .zip.
Appendix 2: Stress testing configuration examples
When you use the EASCMD client to create stress testing tasks for common services, you can use auto, scan, or manual mode.
You can configure the mode parameter in the optional section of the configuration file to specify a stress testing mode. Configuration examples:
Auto mode
In auto mode, you need to only specify the service name and stress testing data in the configuration file. Retain the default settings of other parameters. Sample configuration:
{
"service": {
"serviceName": "demo"
},
"data": {
"path": "https://examplebucket.oss-cn-chengdu.aliyuncs.com/data/warmup.tf.bin"
},
"optional": {
"maxQPS": 1000,
"duration": 300
}
}Scan mode
{
"service": {
"serviceName": "demo"
},
"data": {
"content": "aGVsbG8K"
},
"optional": {
"mode": "scan",
"maxQPS": 1000,
"minQPS": 500,
"qpsGrowthDelta": 100,
"adjustInterval": 30
}
}Manual mode
{
"service": {
"serviceName": "demo"
},
"data": {
"content": "aGVsbG8K"
},
"optional": {
"mode": "manual",
"agentCount": 1,
"concurrency": 5
}
}Appendix 3: JSON configuration parameters for stress testing
Item | Parameter | Required | Description |
service | serviceName | Yes | The name of the service on which you want to perform stress testing. |
data | content | No | A single entry of stress testing data in the Base64-encoded string format. You can specify the path parameter to configure multiple requests. For information about how to construct stress testing data and supported file types, see Appendix 1: Stress testing data. |
path | No | The path where the stress testing data is stored. You can specify multiple HTTP or OSS paths and separate them with commas (,). You can also package multiple stress testing files into a ZIP file and then specify the path of the ZIP file. Note You do not need to use base64 to encode the files that store the stress testing data. | |
multiLine | No | Specifies whether to separate the stress testing data by line. Valid values: true and false. Default value: false. If you set the parameter to true, the downloaded data is processed line by line. | |
http | headers | No | The HTTP request headers. The value of this parameter is of the LIST type. Example: |
timeout | No | The HTTP request latency, in milliseconds. Default value: 20000. | |
optional | mode | No | The stress testing mode. Valid values:
|
duration | No | The stress testing duration in seconds. Default value: 600. Maximum value: 1200. | |
agentCount | No | The number of stress testing agents when you set the mode parameter to manual. A larger number of agents results in higher loads. Default value: 1. | |
concurrency | No | The concurrency of each agent when you set the mode parameter to manual. Default value: 2. A larger value indicates a higher load. If you want to increase the loads of the service, increase the concurrency value first. If the loads stop increasing with the concurrency, try to add more agent workers. | |
adjustInterval | No | The interval at which the load is automatically increased when you set the mode parameter to scan. Unit: seconds. Default value: 60. | |
minQPS | No | The initial QPS value when you set the mode parameter to scan (automatic load increase mode). Default value: 100. | |
maxQPS | No | The maximum QPS value when you set the mode parameter to scan or auto. | |
maxRT | No | The maximum RT value (TP99) when you set the mode parameter to scan or auto. If the actual RT exceeds the threshold, the QPS is automatically adjusted until the RT drops below the threshold. | |
qpsGrowthDelta | No | The increment of QPS for each adjustment when you set the mode parameter to scan (automatic load increase mode). Default value: 50. | |
faultTolerate | No | The maximum tolerance rate for request errors, which occur if the returned status code is not 200. A value of 0.01 specifies that the error handling mechanism is triggered when over 1% of requests encounter an error. Default value: 0.001. | |
faultAction | No | The operation the stress testing controller performs when you set the mode parameter to scan or auto and the request error rate exceeds the threshold specified by the faultTolerate parameter. Valid values:
|
References
You can create and manage stress testing tasks by calling APIs. For more information, see Benchmark Task.
After the performance of the service meets your business requirements, you can call the service to run inference tasks. For more information, see Overview.
