This topic describes how to perform a Transaction Processing Performance Council (TPC) Benchmark H (TPC-H) performance test in Hologres.
Background information
About TPC-H
TPC-H is a decision support benchmark that consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.
The preceding description is quoted from TPC-H. For more information about TPC-H, see TPC-H specifications.
The TPC-H performance tests described in this topic are implemented based on the TPC-H benchmark test but cannot meet all requirements of the TPC-H benchmark test. Therefore, the test results described in this topic cannot be compared with the published results of the TPC-H benchmark test.
About datasets
TPC-H is a test dataset that is developed by the TPC to simulate decision support systems. TPC-H is used in academia and industries to evaluate the performance of decision support systems.
TPC-H models data in production environments to simulate the data warehouse of a sales system. The data warehouse consists of eight tables. The size of each table ranges from 1 GB to 3 TB. The TPC-H benchmark test involves 22 query statements. The test focuses on the response time of each query, which is the amount of time that is consumed from submitting a query to receiving the returned result. The test result can comprehensively reflect the capability of the system to process queries. For more information, see TPC BENCHMARK H Standard Specification.
Usage notes
To reduce the variables that may affect test results, we recommend that you use new Hologres instances each time you perform the test that is described in this topic, instead of upgrading or downgrading the specifications of existing instances to meet the test requirements.
Procedure
In this example, the 22 query statements of the TPC-H benchmark test are used. The amount of test data affects the test results. The data generation tool of TPC-H allows you to change the scale factor (SF) to adjust the amount of data to be generated. One SF indicates 1 GB of data.
Make preparations.
Prepare the basic environment that is required for the tests on online analytical processing (OLAP) queries.
Purchase a Hologres instance. For more information, see Purchase a Hologres instance.
In this example, queries are performed on the TPC-H dataset that contains 1 TB of data, and a Hologres instance with
96 cores and 384 GB of memoryis purchased. You can select computing resource specifications based on your business requirements.Create an Elastic Compute Service (ECS) instance. For more information, see Creation methods.
In this example, an ECS instance that meets the following requirements is created:
Instance type:
ecs.g6e.4xlarge.Operating system:
Alibaba Cloud Linux 3.2104 LTS 64-bit.Data disk: Enhanced SSD (ESSD). Determine the capacity based on the amount of data to be tested.
The data amount of a dataset refers to the amount of raw data. When you prepare the test environment, you must consider factors such as the space to be occupied by indexes.
Configure the toolkit parameters.
Log on to the ECS instance. For more information, see Connection method overview.
Run the following commands on the ECS instance to install the PostgreSQL client:
yum update -y yum install postgresql-server -y yum install postgresql-contrib -yRun the following commands on the ECS instance to download and decompress the benchmark test package for Hologres:
wget https://public-document-oss.oss-cn-shanghai.aliyuncs.com/tpch-tools/hologres_benchmark_for_tpch.tar tar xzvf hologres_benchmark_for_tpch.tarRun the following command on the ECS instance to go to the benchmark directory:
cd hologres_benchmark_for_tpch/benchmark
Perform the performance test.
Procedure
The performance test involves the following steps:
Create a database named
hologres_tpchin Hologres.Generate data.
Create a table and import the data to the table.
Execute the 22 SQL statements for data queries.
Syntax
Run the following command on the ECS instance to enable an automatic TPC-H test:
python tpch_1t_dbgen.py --psql "PGUSER=<access_key_id> PGPASSWORD=<access_key_secret> psql -p 80 -h <vpc_endpoint>" --init_and_run --scale_factor <scale_factor>The execution time may be long. We recommend that you run the command at the background. Sample command:
nohup python tpch_1t_dbgen.py --psql "PGUSER=<access_key_id> PGPASSWORD=<access_key_secret> psql -p 80 -h <vpc_endpoint>" --init_and_run --scale_factor <scale_factor>Parameters
Parameter
Description
access_key_id
The AccessKey ID of the Alibaba Cloud account that is used to connect to the Hologres instance. For more information about how to obtain an AccessKey ID, see Create an Alibaba Cloud account.
access_key_secret
The AccessKey secret of the Alibaba Cloud account that is used to connect to the Hologres instance. For more information about how to obtain an AccessKey secret, see Create an Alibaba Cloud account.
vpc_endpoint
The endpoint of the virtual private cloud (VPC) in which your Hologres instance resides.
You can view the endpoint of the Hologres instance in the Network Information section of the instance details page in the Hologres console.
scale_factor
The SF of the dataset, which controls the amount of data to be generated. Unit: GB.
In this example, this parameter is set to
1024because a dataset with 1 TB of data is used.
Test result
After the performance test is successful, the test result is saved in the tpch_test_log file in the hologres_benchmark_for_tpch directory. The test result includes information such as the query statements, execution plan, and execution time. Sample result:
./
|-- analyze.sql.result
|-- benchmark_result # The execution plan and result of the most recent TPC-H test.
| |-- query10.sql.result
| |-- query10.sql.resultdata
| |-- query11.sql.result
| |-- query11.sql.resultdata
| |-- query12.sql.result
| |-- query12.sql.resultdata
| |-- query13.sql.result
| |-- query13.sql.resultdata
| |-- query14.sql.result
| |-- query14.sql.resultdata
| |-- query15.sql.result
| |-- query15.sql.resultdata
| |-- query16.sql.result
| |-- query16.sql.resultdata
| |-- query17.sql.result
| |-- query17.sql.resultdata
| |-- query18.sql.result
| |-- query18.sql.resultdata
| |-- query19.sql.result
| |-- query19.sql.resultdata
| |-- query1.sql.result
| |-- query1.sql.resultdata
| |-- query20.sql.result
| |-- query20.sql.resultdata
| |-- query21.sql.result
| |-- query21.sql.resultdata
| |-- query22.sql.result
| |-- query22.sql.resultdata
| |-- query2.sql.result
| |-- query2.sql.resultdata
| |-- query3.sql.result
| |-- query3.sql.resultdata
| |-- query4.sql.result
| |-- query4.sql.resultdata
| |-- query5.sql.result
| |-- query5.sql.resultdata
| |-- query6.sql.result
| |-- query6.sql.resultdata
| |-- query7.sql.result
| |-- query7.sql.resultdata
| |-- query8.sql.result
| |-- query8.sql.resultdata
| |-- query9.sql.result
| `-- query9.sql.resultdata
|-- copy_dbgen.sql.result
|-- init_holo.sql.result
|-- init.sql.result
|-- online_config.result
|-- reset_config.result
|-- tmp # The 22 SQL statements that are executed in the TPC-H test.
| |-- query10.sql
| |-- query11.sql
| |-- query12.sql
| |-- query13.sql
| |-- query14.sql
| |-- query15.sql
| |-- query17.sql
| |-- query18.sql
| |-- query19.sql
| |-- query1.sql
| |-- query21.sql
| |-- query22.sql
| |-- query2.sql
| |-- query3.sql
| |-- query4.sql
| |-- query5.sql
| |-- query6.sql
| |-- query7.sql
| |-- query8.sql
| `-- query9.sql
|-- tmp_normalize_file.sql
`-- total.result # The total execution time of the 22 SQL statements in the TPC-H test.