This topic describes how to use the TPC-H benchmark to run a performance test for Online Analytical Processing (OLAP) query scenarios.
TPC-H
The following description is from TPC Benchmark H (TPC-H):
"The TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions."
For details, see the TPCH Specification.
This TPC-H implementation is based on the TPC-H benchmark but is not fully compliant with its official requirements. Therefore, these results are not comparable with officially published TPC-H benchmark results.
Dataset
The TPC-H benchmark was developed by the Transaction Processing Performance Council (TPC) to simulate decision support applications. It is widely used in academia and industry to evaluate the performance of decision support technologies.
TPC-H models a data warehouse for a sales system based on a real-world production environment. It contains eight tables, and the data volume can be set from 1 GB to 3 TB. The benchmark includes 22 queries, and the primary metric is the response time of each query, measured from submission to result return. The test results provide a comprehensive overview of the system's query processing capability. For more information, see the TPC-H benchmark.
Procedure
This test uses the 22 queries from the TPC-H benchmark. The test results are directly affected by the data volume. The TPC-H data generator uses a scale factor (SF) to control the generated data volume, where 1 SF corresponds to 1 GB of data.
The specified data volume refers only to the raw data and does not include the space required for indexes. Therefore, you must provision additional storage space when you prepare the environment.
-
Prepare the environment.
Prepare the required resources for the OLAP query scenario.
NoteFor consistent results, use a new instance for each test. Avoid using instances that have been scaled up or down.
-
Create an ECS instance. For more information, see Create an ECS instance.
The ECS instance for this test is configured as follows:
-
Instance type: ecs.g6e.4xlarge
-
Operating system: CentOS 7.9
-
Data disk: Enterprise SSD (ESSD). The disk capacity depends on the test data volume.
-
-
Create a StarRocks instance. For more information, see Create an instance.
For this test, the backend (BE) is configured with 8 compute units (CUs), providing 8 CPU cores and 32 GB of memory. You can select a configuration based on your business requirements.
NoteCreate the StarRocks instance and the ECS instance in the same region and Virtual Private Cloud (VPC).
-
-
Configure the toolkit parameters.
-
Log on to the ECS instance. For more information, see Connect to an ECS instance.
-
Run the following commands to download and decompress the starrocks-benchmark toolkit.
wget https://emr-olap.oss-cn-beijing.aliyuncs.com/packages/starrocks-benchmark-for-serverless.tar.gz tar xzvf starrocks-benchmark-for-serverless.tar.gz -
Run the following command to change to the starrocks-benchmark-for-serverless directory.
cd starrocks-benchmark-for-serverless -
Run the
vim group_vars/allcommand to configure the benchmark test parameters.# mysql client config login_host: fe-c-8764bab92bc6****-internal.starrocks.aliyuncs.com login_port: 9030 login_user: admin login_password: xxxx # oss config bucket: "" endpoint: "" access_key_id: "" access_key_secret: "" # benchmark config scale_factor: 1 work_dir_root: /mnt/disk1/starrocks-benchmark/workdirs dataset_generate_root_path: /mnt/disk1/starrocks-benchmark/datasetsThe following table describes the parameters.
Parameter
Description
Notes
login_host
The internal endpoint of the frontend (FE) of the StarRocks instance.
On the Instance Details tab of your StarRocks instance, find the endpoint in the FE Details section under Internal Endpoint.
ImportantAvoid using a public endpoint.
StarRocks client connection parameters.
login_port
The query port of the FE of the StarRocks instance. The default value is 9030.
On the Instance Details tab of your StarRocks instance, find the port in the FE Details section under Query Port.
login_user
The initial user of the StarRocks instance.
login_password
The password for the initial user of the StarRocks instance.
bucket
The name of the OSS bucket.
Optional. OSS configuration. If you configure these parameters, the toolkit generates and stores the test dataset in OSS.
endpoint
The endpoint for accessing OSS.
access_key_id
The AccessKey ID of your Alibaba Cloud account.
access_key_secret
The AccessKey Secret of your Alibaba Cloud account.
scale_factor
The scale factor for the dataset, which controls the data volume. The default value is 1. The unit is GB.
Benchmark configuration.
work_dir_root
The root working directory. It stores TPC-H table creation SQL, executed SQL, and other artifacts. The default value is /mnt/disk1/starrocks-benchmark/workdirs.
dataset_generate_root_path
The path for storing the generated test dataset. The default value is /mnt/disk1/starrocks-benchmark/datasets.
If you configure OSS, the specified bucket is mounted to this path.
-
-
Run the following command to perform an automated, end-to-end TPC-H test.
bin/run_tpch.shThis command automates the entire TPC-H test process, which includes creating the database and tables, generating and loading the dataset, and running the 22 SQL queries.
Other operations:
-
To load only the dataset, run the following command.
bin/run_tpch.sh reload -
To run only the TPC-H query test, run the following command.
bin/run_tpch.sh query
-
-
View the results.
-
Test result summary
When the
bin/run_tpch.shcommand finishes, it prints the test results, which look similar to the following:TASK [tpc_h : debug] **************************************************************************************************************************************** ok: [10.1.**.**] => { "command_output.stdout_lines": [ "[info] 2022-03-01 09:51:23.295 | Run sql queries started.", "[info] 2022-03-01 09:51:23.330 | Run q10.sql started.", "[info] 2022-03-01 09:51:23.913 | Run q10.sql finished. Time taken: 0:00:00, .557 seconds", "[info] 2022-03-01 09:51:23.923 | Run q11.sql started.", "[info] 2022-03-01 09:51:24.026 | Run q11.sql finished. Time taken: 0:00:00, .100 seconds", "[info] 2022-03-01 09:51:24.038 | Run q12.sql started.", "[info] 2022-03-01 09:51:24.192 | Run q12.sql finished. Time taken: 0:00:00, .151 seconds", "[info] 2022-03-01 09:51:24.204 | Run q13.sql started.", "[info] 2022-03-01 09:51:24.553 | Run q13.sql finished. Time taken: 0:00:00, .347 seconds", "[info] 2022-03-01 09:51:24.563 | Run q14.sql started.", "[info] 2022-03-01 09:51:24.665 | Run q14.sql finished. Time taken: 0:00:00, .098 seconds", "[info] 2022-03-01 09:51:24.675 | Run q15.sql started.", "[info] 2022-03-01 09:51:24.852 | Run q15.sql finished. Time taken: 0:00:00, .175 seconds", "[info] 2022-03-01 09:51:24.864 | Run q16.sql started.", "[info] 2022-03-01 09:51:25.008 | Run q16.sql finished. Time taken: 0:00:00, .142 seconds", "[info] 2022-03-01 09:51:25.018 | Run q17.sql started.", "[info] 2022-03-01 09:51:25.269 | Run q17.sql finished. Time taken: 0:00:00, .248 seconds", "[info] 2022-03-01 09:51:25.280 | Run q18.sql started.", "[info] 2022-03-01 09:51:25.800 | Run q18.sql finished. Time taken: 0:00:00, .518 seconds", "[info] 2022-03-01 09:51:25.810 | Run q19.sql started.", "[info] 2022-03-01 09:51:25.943 | Run q19.sql finished. Time taken: 0:00:00, .130 seconds", "[info] 2022-03-01 09:51:25.953 | Run q1.sql started.", "[info] 2022-03-01 09:51:26.295 | Run q1.sql finished. Time taken: 0:00:00, .339 seconds", "[info] 2022-03-01 09:51:26.305 | Run q20.sql started.", "[info] 2022-03-01 09:51:26.708 | Run q20.sql finished. Time taken: 0:00:00, .400 seconds", "[info] 2022-03-01 09:51:26.720 | Run q21.sql started.", "[info] 2022-03-01 09:51:27.323 | Run q21.sql finished. Time taken: 0:00:00, .600 seconds", "[info] 2022-03-01 09:51:27.334 | Run q22.sql started.", "[info] 2022-03-01 09:51:27.403 | Run q22.sql finished. Time taken: 0:00:00, .065 seconds", "[info] 2022-03-01 09:51:27.415 | Run q2.sql started.", "[info] 2022-03-01 09:51:27.632 | Run q2.sql finished. Time taken: 0:00:00, .213 seconds", "[info] 2022-03-01 09:51:27.648 | Run q3.sql started.", "[info] 2022-03-01 09:51:27.917 | Run q3.sql finished. Time taken: 0:00:00, .262 seconds", "[info] 2022-03-01 09:51:27.936 | Run q4.sql started.", "[info] 2022-03-01 09:51:28.150 | Run q4.sql finished. Time taken: 0:00:00, .210 seconds", "[info] 2022-03-01 09:51:28.172 | Run q5.sql started.", "[info] 2022-03-01 09:51:28.954 | Run q5.sql finished. Time taken: 0:00:00, .778 seconds", "[info] 2022-03-01 09:51:28.976 | Run q6.sql started.", "[info] 2022-03-01 09:51:29.080 | Run q6.sql finished. Time taken: 0:00:00, .103 seconds", "[info] 2022-03-01 09:51:29.096 | Run q7.sql started.", "[info] 2022-03-01 09:51:29.445 | Run q7.sql finished. Time taken: 0:00:00, .346 seconds", "[info] 2022-03-01 09:51:29.460 | Run q8.sql started.", "[info] 2022-03-01 09:51:32.692 | Run q8.sql finished. Time taken: 0:00:03, 3.229 seconds", "[info] 2022-03-01 09:51:32.703 | Run q9.sql started.", "[info] 2022-03-01 09:51:33.318 | Run q9.sql finished. Time taken: 0:00:00, .611 seconds", "[info] 2022-03-01 09:51:33.324 | Run sql queries finished. Time taken: 0:00:10, 10.026 seconds" ] } TASK [tpc_h : debug] ******************************************************************************************************************************************************************************** ok: [10.1.0.91] => { "work_dir": "/mnt/disk1/starrocks-benchmark/workdirs/tpc_h/sf1" } -
Detailed test results
After you successfully run the
bin/run_tpch.shcommand, the system builds the entire working directory for the TPC-H test and outputs the path to the <work_dir> directory. You can then switch to this directory to view related information, such as query statements, table creation statements, and execution logs.TASK [tpc_h : debug] **************************************************** ok: [10.1.0.91] => { "work_dir": /mnt/disk1/starrocks-benchmark/workdirs/tpc_h/sf1 }NoteIn this example, <work_dir> is /mnt/disk1/starrocks-benchmark/workdirs/tpc_h/sf1.
Run the
cd <work_dir>/logscommand to change to the logs subdirectory and view the test results and detailed SQL output.The <work_dir> directory has the following structure:
<work_dir>/ ├── config # Configurations for the run.sh and run_mysql.sh scripts. ├── logs # Logs from the most recent TPC-H run. │ ├── *.sql.err │ ├── *.sql.out │ └── run.log ├── queries # The 22 TPC-H SQL queries. │ ├── ddl │ │ └── create_tables.sql # TPC-H table creation SQL. │ └── *.sql ├── run_mysql.sh ├── run.sh # Runs all TPC-H queries. └── tpch_tools # The dbgen toolkit.
-