All Products
Search
Document Center

PolarDB:Columnar index TPC-H test (1 TB)

Last Updated:May 29, 2024

Background information

TPC-H is a benchmark that is widely used to evaluate the analytic query capabilities of databases. It is developed and released by Transaction Processing Performance Council (TPC). In the TPC-H benchmark, 22 complex SQL queries are performed on eight tables. Most queries contain joins on several tables, subqueries, and GROUP BY clauses.

Note

In this example, a test based on the TPC-H benchmark is implemented, but it does not meet all the requirements of a TPC-H benchmark test. Therefore, the test results are incomparable with the published results of the TPC-H benchmark test.

Test design

Data volume

1 TB (Scalar Factor=1000), consisting of the following parts:

  • LINEITEM: approximately 6 billion lines

  • ORDERS: 1.5 billion lines

  • PART_SUPP: 0.8 billion lines

Instance specifications for the test

  • Node: 16c128g

  • Number of nodes: 3, 4, and 6

  • Dataset size: 1,000 GB

ECS instance type for the test

ecs.g7.4xlarge (with 16 vCPUs and 64 GB RAM)

Test procedure

Step 1: Prepare the ECS instance

Prepare an Elastic Compute Service (ECS) instance that meets the preceding requirements. The ECS instance will be used in later steps of importing data and applying loads.

Note
  • Deploy the ECS instance in a virtual private cloud (VPC) and record the VPC name and ID. Other required resources will also be deployed in this VPC later.

  • We recommend that you run CentOS and enable Internet access on this ECS instance.

Step 2: Prepare a PolarDB-X instance for the test

Create a PolarDB-X instance. For more information, see Create a PolarDB-X instance.

Note
  • You must deploy the PolarDB-X instance in the same VPC as the ECS instance.

  • You must add the internal IP address of the ECS instance to the whitelist of the PolarDB-X instance.

Step 3: Adjust the PolarDB-X instance

Note

Specific parameters of the PolarDB-X instance must be adjusted to provide optimal performance in the test.

  1. Change the value of the XPROTO_MAX_DN_CONCURRENT parameter to 4000. For more information, see Set parameters.

  2. Connect to the PolarDB-X instance by using the CLI and run the following SQL statements in the same session to disable logging, CPU sampling, and statistics collection:

    set global RECORD_SQL = false;
    set global MPP_METRIC_LEVEL = 0;
    set global ENABLE_CPU_PROFILE = false;
    set global ENABLE_BACKGROUND_STATISTIC_COLLECTION=false;
    set global ENABLE_STATISTIC_FEEDBACK=false;

Step 4: Install Benchmark Boot

  1. Open the 4121 port in the security group of the ECS instance. For more information, see Add a security group rule.

  2. Log on to the ECS instance and run the bash -c "$(curl -fsSL https://benchmark-boot.oss-cn-hangzhou.aliyuncs.com/setup.sh)" command to install Benchmark Boot. For more information, see Download and install Benchmark Boot.

  3. Visit http://{Public IP address of the ECS instance}:4121 in the browser. If the Benchmark Boot homepage appears, the installation is successful.

Step 5: Prepare data

  1. On the Benchmark Boot homepage, choose Stress Test > Database Connection > Configure Database Connection. On the page that appears, configure the host IP address, port, username, password, TPC-H library, and library creation mode (AUTO) to create a PolarDB-X connection. Then, click Submit.

  2. Choose Stress Test > TPCH > Import TPC-H, set the size of the input dataset to 1,000 GB and the number of physical shards to 16, and then click Submit to import 1,000 GB of test data to the primary PolarDB-X instance.

    Note

    After the data is imported, Benchmark Boot automatically executes analyze table to collect statistics.

  3. After the data is imported, execute the following SQL statements on the primary PolarDB-X instance to create a Clustered Columnar Index (CCI). For more information, see Create and use a CCI.

    create clustered columnar index `nation_col_index` on nation(`n_nationkey`) partition by hash(`n_nationkey`) partitions 1;
    
    create clustered columnar index `region_col_index` on region(`r_regionkey`) partition by hash(`r_regionkey`) partitions 1;
    
    create clustered columnar index `customer_col_index` on customer(`c_custkey`) partition by hash(`c_custkey`) partitions 96;
    
    create clustered columnar index `part_col_index` on part(`p_size`) partition by hash(`p_partkey`) partitions 96;
    
    create clustered columnar index `partsupp_col_index` on partsupp(`ps_partkey`) partition by hash(`ps_partkey`) partitions 96;
    
    create clustered columnar index `supplier_col_index` on supplier(`s_suppkey`) partition by hash(`s_suppkey`) partitions 96;
    
    create clustered columnar index `orders_col_index` on orders(`o_orderdate`,`o_orderkey`) partition by hash(`o_orderkey`) partitions 96;
    
    create clustered columnar index `lineitem_col_index` on lineitem(`l_shipdate`,`l_orderkey`) partition by hash(`l_orderkey`) partitions 96;
    Note
    • You can execute show columnar index to view the status of the CCI. For more information, see SHOW COLUMNAR INDEX.

    • It takes about 45 minutes to create a CCI.

Step 6: Add a read-only column store instance

Add a read-only column store instance to the primary PolarDB-X instance. For more information, see Add a column store read-only instance.

Note

You must also add the internal IP address of the ECS instance to the whitelist of the read-only PolarDB-X column store instance.

Step 7: Adjust the read-only column store instance

Note

For optimal performance to be achieved in the test, you must also adjust specific parameters of the read-only PolarDB-X column store instance.

Connect to the read-only PolarDB-X column store instance by using the CLI and run the following SQL statements in the same session to set parameters:

set global SCHEDULE_BY_PARTITION=true;

set global BLOCK_CACHE_MEMORY_SIZE_FACTOR=0.3;
set global OSS_FS_USE_BYTES_CACHE=true;
set global OSS_FS_MEMORY_RATIO_OF_BYTES_CACHE=0.3;
set global ONLY_CACHE_PRIMARY_KEY_IN_BLOCK_CACHE=true;

set global MPP_TASK_LOCAL_MAX_BUFFER_SIZE=8000000000;

set global GLOBAL_RF_ROWS_UPPER_BOUND=50000000;

Step 8: Run the TPC-H test

  1. Choose Stress Test > Database Connection > Configure Database Connection. On the page that appears, configure the TPC-H library name and library creation mode (Auto Detect) to create a connection string for the read-only column store instance. Then, click Submit.

  2. Choose Stress Test > TPCH > Run TPCH, select All Queries, specify a task description, and click Submit to run the 22 query statements for the TPC-H test.

  3. Choose Test Result > TPCH to view the test results.

    Note

    A warm-up is required if the TPC-H test is run for the first time. Stable results are produced after several rounds of test running.

Test results

The following table lists the test results. Unit: seconds.

Query

3*16C128 GB

4*16C128 GB

6*16C128 GB

Q1

28.15

24.81

14

Q2

11.55

9.02

5.12

Q3

20.98

16.54

8.73

Q4

16.26

14.59

7.06

Q5

28.25

21.28

14.01

Q6

7.76

6.84

1.95

Q7

25.81

17.83

11.14

Q8

23.97

21.38

10.37

Q9

71.63

57.86

36.24

Q10

35.98

27.91

16.85

Q11

6.61

4.65

2.54

Q12

8.46

7.1

3.37

Q13

33.15

23.87

16.8

Q14

9.32

8.1

3.31

Q15

7.44

6.2

3.11

Q16

10.18

6.07

4.08

Q17

18.2

11.97

7.46

Q18

47.26

38.65

20.15

Q19

24.86

21.81

13.48

Q20

11.16

8.59

5.35

Q21

50.24

44.85

22.76

Q22

11.54

8.53

6.05

Total time

508.76

408.45

233.93