TPC-DS performance testing

MaxCompute is designed for high-performance queries over terabytes, petabytes, or even exabytes of data. This topic describes how to run the TPC-DS big data benchmark by using the public datasets and the test tool that MaxCompute provides, so that you can verify the performance of MaxCompute.

Preparations

Prepare an environment.
- Before you perform a TPC-DS test, activate MaxCompute and create a project. For more information, see Create a project.
- Activate MaxCompute Query Acceleration (MCQA) for a subscription MaxCompute project. For more information, see MaxCompute Query Acceleration.
Prepare a test tool.
MaxCompute provides a TPC-DS automated performance test tool to help you quickly complete a TPC-DS test and automatically generate test results.
Important
The test tool runs only on Linux with Java Development Kit (JDK) 1.7 or later installed.
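The requirement above can be checked before you download the tool. The following is a minimal sketch and is not part of the test tool: the function name `jdk_version_ok` is hypothetical, and the script assumes that `java -version` prints a quoted version string such as "1.8.0_292" or "11.0.2".

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given JDK version string is 1.7 or later.
jdk_version_ok() {
    # Leading number, for example "1" in "1.8.0_292" or "11" in "11.0.2".
    major=$(printf '%s' "$1" | sed -n 's/^\([0-9][0-9]*\).*/\1/p')
    # Second number, for example "8" in "1.8.0_292".
    minor=$(printf '%s' "$1" | sed -n 's/^[0-9][0-9]*\.\([0-9][0-9]*\).*/\1/p')
    [ -n "$major" ] || return 1
    if [ "$major" -eq 1 ]; then
        [ "${minor:-0}" -ge 7 ]      # legacy scheme: 1.7, 1.8
    else
        [ "$major" -gt 1 ]           # newer scheme: 9, 11, 17, ...
    fi
}

# The tool is documented for Linux only.
if [ "$(uname -s)" != "Linux" ]; then
    echo "Warning: this host is not running Linux."
fi

# `java -version` writes to stderr, so merge it into stdout before parsing.
if command -v java >/dev/null 2>&1; then
    ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    if jdk_version_ok "$ver"; then
        echo "JDK $ver: OK"
    else
        echo "JDK $ver: older than 1.7 or not recognized"
    fi
else
    echo "java was not found on PATH"
fi
```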
Download the mc_tpcds_benchmark test tool package, and then run the following command on the Linux server to decompress it:
```
unzip mc_tpcds_benchmark.zip
```
The decompressed package has the following directory structure:
```
.
|_t1c7039e3-2a1d-451b-bfda-d14c49016243-tpc-ds-tool.zip
|_config
|_init_tools.sh
|_load_table.sh
|_logs
|_odps_clt
|_patches
|_pt.sh
|_queries_1
|_queries_1.quality
|_queries_10
|_queries_100
|_queries_1000
|_queries_10000
|_queries_100000
|_querygen.sh
|_results
|_run_stream.sh
|_run_stream.sh.offline
|_sqls
|_start_session_only.sh
|_start_session.sql
|_start_session.sql_tmp
|_tools_file
|_tt.sh
|_v2.10.1rc3
```

Obtain a test dataset.

MaxCompute provides public datasets. You do not need to prepare test data. All test data is stored in the public project BIGDATA_PUBLIC_DATASET of MaxCompute. For more information, see Overview.

TPC-DS test datasets are divided into 10 GB, 100 GB, 1 TB, and 10 TB datasets based on the data size. The following table describes the datasets.

Type	Description	Dataset name	Schema name
TPC-DS	TPC-DS is a decision support benchmark that models several generally applicable aspects of a decision support system, including queries and data maintenance. TPC-DS enables emerging technologies, such as big data systems, to perform benchmark tests.	TPC-DS 10-GB performance test dataset	tpcds_10g
		TPC-DS 100-GB performance test dataset	tpcds_100g
		TPC-DS 1-TB performance test dataset	tpcds_1t
		TPC-DS 10-TB performance test dataset	tpcds_10t

Procedure

Modify the configuration file of the test tool

Go to the mc_tpcds_benchmark directory of the decompressed package of the test tool and modify the config file. The following table describes the configuration items that you need to modify.

Configuration item	Description	Value
ODPS_CLT_CMD	The absolute path of the executable file of the MaxCompute client. The client that is provided in the package is odps_clt in the working directory. You can modify the related configuration. For more information, see Install and configure the MaxCompute client.	Example: /xxxxx/mc_tpcds_benchmark/odps_clt/bin/odpscmd.
PROJECT	The MaxCompute project that is used for the test.	Example: tpcds_test.
SF	The data size of the TPC-DS test. Unit: GB. 1 indicates 1 GB. 1000 indicates 1 TB. You can change the value based on your test requirements.	Default value: 1000
SQL_FLAGS	The built-in flag parameters of MaxCompute. You do not need to modify these parameters.	`set odps.sql.session.result.cache.enable=false`: Disables the result cache for a MaxCompute project in MCQA mode so that each query is executed independently. `set odps.sql.allow.cartesian=true`: Allows Cartesian products in SQL queries. `set odps.sql.session.query.timeout=600`: Specifies the timeout period of a Fuxi job for a MaxCompute project in MCQA mode.
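
Because SF is expressed in GB, the four public datasets correspond to SF values of 10, 100, 1000, and 10000. The sketch below maps an SF value to the schema name listed in the dataset table. `sf_to_schema` is a hypothetical helper, not a function of the test tool, and how the tool itself resolves SF to a dataset is not described in this topic.

```shell
#!/bin/sh
# Hypothetical helper: print the public dataset schema that matches an SF value (in GB).
sf_to_schema() {
    case "$1" in
        10)    echo "tpcds_10g" ;;
        100)   echo "tpcds_100g" ;;
        1000)  echo "tpcds_1t" ;;
        10000) echo "tpcds_10t" ;;
        *)
            # No public dataset of this size is listed in this topic.
            echo "No public TPC-DS dataset is listed for SF=$1" >&2
            return 1
            ;;
    esac
}

# Example: the default SF of 1000 corresponds to the 1-TB dataset.
sf_to_schema 1000
```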

Start the test

Run the following command in the mc_tpcds_benchmark directory to start the TPC-DS test:

nohup sh pt.sh > pt.log 2>&1 &

The command runs the test in the background and redirects its output to the pt.log file in the mc_tpcds_benchmark directory. You can run the following command to follow the logs of the job while the test runs:

tail -f pt.log
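
Because the test runs in the background, it can be useful to record its process ID so that you can check later whether it is still running. The following is a minimal sketch under that assumption. The helper names `start_bg` and `is_running` are hypothetical and are not part of the test tool.

```shell
#!/bin/sh
# Hypothetical helper: run a command under nohup in the background,
# redirect its output to a log file, and print the process ID.
# $1 = log file, remaining arguments = command to run.
start_bg() {
    log="$1"
    shift
    nohup "$@" > "$log" 2>&1 &
    echo $!
}

# Hypothetical helper: succeeds while the process with the given ID exists.
is_running() {
    kill -0 "$1" 2>/dev/null
}

# Usage in the mc_tpcds_benchmark directory (not executed here):
#   pid=$(start_bg pt.log sh pt.sh)
#   if is_running "$pid"; then echo "test still running"; fi
```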

View the execution information about MaxCompute jobs

You can view the execution information about a job on the Jobs page in the MaxCompute console. For more information, see Manage jobs.

View test results

If the execution is successful, a test result file named console_test_result.csv is generated in the mc_tpcds_benchmark directory. You can view test results in the file, including the total test duration, the execution time of each query, and the related LogView information.
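
The layout of console_test_result.csv is not described in this topic, so the following is only a sketch for summarizing a numeric column of a CSV file. It assumes a header row and comma-separated fields, and the column index is a hypothetical parameter that you need to adjust after you inspect the real file. If the file already contains a total row, exclude it before you sum.

```shell
#!/bin/sh
# Hypothetical helper: sum one numeric column of a CSV file, skipping the first (header) row.
# $1 = CSV file, $2 = 1-based column index (an assumption: check the real layout first).
sum_csv_column() {
    awk -F',' -v col="$2" '
        NR > 1 && $col ~ /^[0-9.]+$/ { total += $col }   # ignore non-numeric cells
        END { printf "%.2f\n", total }
    ' "$1"
}

# Usage (not executed here), assuming execution times are in column 2:
#   sum_csv_column console_test_result.csv 2
```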