This topic describes the design, process, and results of the TPC-H test for PolarDB-X.

Background information

TPC-H is a benchmark that is widely used to evaluate the analytical query capabilities of databases. It was developed and released by the Transaction Processing Performance Council (TPC). In the TPC-H benchmark, 22 complex SQL queries are performed on 8 tables. Most of the queries contain joins over multiple tables, subqueries, and GROUP BY clauses.

Note The TPC-H performance tests described in this topic are implemented based on the TPC-H benchmark but do not meet all of its requirements. Therefore, the test results described in this topic are not comparable with officially published TPC-H results.

Test conditions

  • Amount of test data
    The test results in this topic are obtained based on 100 GB of data, which indicates that the scale factor is 100. The following list describes the amount of data in each major table:
    • The LINEITEM table contains 600 million rows of data.
    • The ORDERS table contains 150 million rows of data.
    • The PARTSUPP table contains 80 million rows of data.
  • Instance specifications for the TPC-H test
    Node specification | Number of nodes | Size of data set
    -------------------|-----------------|-----------------
    4c32g              | 6               | 100 GB
    8c64g              | 6               | 100 GB
    16c128g            | 6               | 100 GB
  • ECS instance specification for stress tests

    ecs.g7.4xlarge (16 vCPUs, 64 GB of memory, and a disk larger than 200 GB)

Procedure

  1. Create an ECS instance for stress tests

    Create an Elastic Compute Service (ECS) instance that is used to prepare data and perform stress tests. The disk size of the ECS instance must be larger than 200 GB, because the CSV data sets generated by the tools are stored on the disk of the ECS instance.

    Note The ECS instance must be deployed in a virtual private cloud (VPC). Record the name and the ID of the VPC for future use. You must deploy all database instances that are described in subsequent steps in this VPC.
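Before you generate data, you can confirm that the disk meets the 200 GB requirement. The sketch below checks the file system that holds the current directory; adjust the path to wherever the data sets will be written (the mount point is an assumption, not part of the original procedure):

```shell
# Print the available space, in GB, of the file system that holds
# the current directory (run this from the directory that will hold
# the generated CSV files).
avail_kb=$(df -k --output=avail . | tail -n 1 | tr -d ' ')
echo "available: $(( avail_kb / 1024 / 1024 )) GB"
```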
  2. Create a PolarDB-X instance for the stress test
    1. Create a PolarDB-X instance. For more information about how to create a PolarDB-X instance, see Create PolarDB-X instances.
    2. Create a database to be tested in the instance. In this example, a database named tpch_100g is created. For more information about how to create a database, see Create a database.
    3. Execute the following statements to create the tables in the tpch_100g database:
      CREATE TABLE `customer` (
        `c_custkey` int(11) NOT NULL,
        `c_name` varchar(25) NOT NULL,
        `c_address` varchar(40) NOT NULL,
        `c_nationkey` int(11) NOT NULL,
        `c_phone` varchar(15) NOT NULL,
        `c_acctbal` decimal(15,2) NOT NULL,
        `c_mktsegment` varchar(10) NOT NULL,
        `c_comment` varchar(117) NOT NULL,
        PRIMARY KEY (`c_custkey`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 dbpartition by hash(`c_custkey`) tbpartition by hash(`c_custkey`) tbpartitions 4;
      CREATE TABLE `lineitem` (
        `l_orderkey` bigint(20) NOT NULL,
        `l_partkey` int(11) NOT NULL,
        `l_suppkey` int(11) NOT NULL,
        `l_linenumber` bigint(20) NOT NULL,
        `l_quantity` decimal(15,2) NOT NULL,
        `l_extendedprice` decimal(15,2) NOT NULL,
        `l_discount` decimal(15,2) NOT NULL,
        `l_tax` decimal(15,2) NOT NULL,
        `l_returnflag` varchar(1) NOT NULL,
        `l_linestatus` varchar(1) NOT NULL,
        `l_shipdate` date NOT NULL,
        `l_commitdate` date NOT NULL,
        `l_receiptdate` date NOT NULL,
        `l_shipinstruct` varchar(25) NOT NULL,
        `l_shipmode` varchar(10) NOT NULL,
        `l_comment` varchar(44) NOT NULL,
        KEY `IDX_LINEITEM_SUPPKEY` (`l_suppkey`),
        KEY `IDX_LINEITEM_PARTKEY` (`l_partkey`),
        PRIMARY KEY (`l_orderkey`,`l_linenumber`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 dbpartition by RIGHT_SHIFT(`l_orderkey`,6) tbpartition by RIGHT_SHIFT(`l_orderkey`,6) tbpartitions 4;
      CREATE TABLE `orders` (
        `o_orderkey` bigint(20) NOT NULL,
        `o_custkey` int(11) NOT NULL,
        `o_orderstatus` varchar(1) NOT NULL,
        `o_totalprice` decimal(15,2) NOT NULL,
        `o_orderdate` date NOT NULL,
        `o_orderpriority` varchar(15) NOT NULL,
        `o_clerk` varchar(15) NOT NULL,
        `o_shippriority` bigint(20) NOT NULL,
        `o_comment` varchar(79) NOT NULL,
        PRIMARY KEY (`o_orderkey`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 dbpartition by RIGHT_SHIFT(`o_orderkey`,6) tbpartition by RIGHT_SHIFT(`o_orderkey`,6) tbpartitions 4;
      CREATE TABLE `part` (
        `p_partkey` int(11) NOT NULL,
        `p_name` varchar(55) NOT NULL,
        `p_mfgr` varchar(25) NOT NULL,
        `p_brand` varchar(10) NOT NULL,
        `p_type` varchar(25) NOT NULL,
        `p_size` int(11) NOT NULL,
        `p_container` varchar(10) NOT NULL,
        `p_retailprice` decimal(15,2) NOT NULL,
        `p_comment` varchar(23) NOT NULL,
        PRIMARY KEY (`p_partkey`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 dbpartition by hash(`p_partkey`) tbpartition by hash(`p_partkey`) tbpartitions 4;
      CREATE TABLE `partsupp` (
        `ps_partkey` int(11) NOT NULL,
        `ps_suppkey` int(11) NOT NULL,
        `ps_availqty` int(11) NOT NULL,
        `ps_supplycost` decimal(15,2) NOT NULL,
        `ps_comment` varchar(199) NOT NULL,
        KEY `IDX_PARTSUPP_SUPPKEY` (`ps_suppkey`),
        PRIMARY KEY (`ps_partkey`,`ps_suppkey`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 dbpartition by hash(`ps_partkey`) tbpartition by hash(`ps_partkey`) tbpartitions 4;
      CREATE TABLE `supplier` (
        `s_suppkey` int(11) NOT NULL,
        `s_name` varchar(25) NOT NULL,
        `s_address` varchar(40) NOT NULL,
        `s_nationkey` int(11) NOT NULL,
        `s_phone` varchar(15) NOT NULL,
        `s_acctbal` decimal(15,2) NOT NULL,
        `s_comment` varchar(101) NOT NULL,
        PRIMARY KEY (`s_suppkey`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 dbpartition by hash(`s_suppkey`) tbpartition by hash(`s_suppkey`) tbpartitions 4;
      CREATE TABLE `nation` (
        `n_nationkey` int(11) NOT NULL,
        `n_name` varchar(25) NOT NULL,
        `n_regionkey` int(11) NOT NULL,
        `n_comment` varchar(152) DEFAULT NULL,
        PRIMARY KEY (`n_nationkey`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 broadcast;
      CREATE TABLE `region` (
        `r_regionkey` int(11) NOT NULL,
        `r_name` varchar(25) NOT NULL,
        `r_comment` varchar(152) DEFAULT NULL,
        PRIMARY KEY (`r_regionkey`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 broadcast;
      Note Make sure that the ECS instance and the PolarDB-X cluster are in the same VPC.
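A note on the partitioning choices above: `orders` and `lineitem` use the same function, RIGHT_SHIFT on the order key, so an order and all of its line items always land in the same partition and the frequent `l_orderkey = o_orderkey` joins can run locally. Shifting right by 6 bits also maps every 64 consecutive order-key values to the same hash input, as this small illustration shows:

```shell
# RIGHT_SHIFT(key, 6) drops the low 6 bits of the key, so 64
# consecutive order keys (64..127, 128..191, ...) share one hash input.
for k in 100 127 128; do
  echo "orderkey=$k -> $(( k >> 6 ))"
done
```

Keys 100 and 127 map to the same value (1), while 128 starts the next group (2).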
  3. Adjust instance parameters
    Note To achieve optimal performance in stress tests, adjust some parameters on the compute nodes of the PolarDB-X instance.
    1. Change the value of the XPROTO_MAX_DN_CONCURRENT parameter to 4000. For more information, see Parameter settings.
    2. Connect to the PolarDB-X instance by using the command-line client. Then, execute the following SQL statements in the same session to disable logging and CPU statistics sampling:
      set ENABLE_SET_GLOBAL = true;
      set global RECORD_SQL = false;
      set global MPP_METRIC_LEVEL = 0;
      set global ENABLE_CPU_PROFILE = false;
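Because all four statements must run in the same session, a non-interactive alternative to the command-line client is to pass them as a single `-e` string (the mysql connection values below are the same placeholders used elsewhere in this topic):

```shell
# Keep all four statements in one string so they reach the server
# through a single mysql session.
SQL='set ENABLE_SET_GLOBAL = true;
set global RECORD_SQL = false;
set global MPP_METRIC_LEVEL = 0;
set global ENABLE_CPU_PROFILE = false;'
printf '%s\n' "$SQL"
# Actual run (fill in the placeholders):
# mysql -h{HOST} -P{PORT} -u{USER} -p{PASSWORD} -e "$SQL"
```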
  4. Prepare data for the stress test
    1. Download the tpchData.tar.gz script package and run the following command to decompress it:
      tar xzvf tpchData.tar.gz

      Run the following commands on the ECS instance to modify the configuration file named params.conf and add the connection information for the PolarDB-X instance:

      cd tpchData/
      vi params.conf
      #!/bin/bash
      
      ### remote generating directory
      export remoteGenDir=./
      
      ### target path
      export targetPath=../tpch/tpchRaw
      
      ### cores per worker, default value is 1
      export coresPerWorker=`cat /proc/cpuinfo| grep "processor"| wc -l`
      
      ### threads per worker, default value is 1
      export threadsPerWorker=`cat /proc/cpuinfo| grep "processor"| wc -l`
      #export threadsPerWorker=1
      
      export hint=""
      
      
      export insertMysql="mysql -h{HOST} -P{PORT} -u{USER} -p{PASSWORD} -Ac --local-infile tpch_100g -e"
      You must specify the following parameters in the configuration file:
      • {HOST}: the host of the PolarDB-X instance.
      • {PORT}: the port number used to access the PolarDB-X instance.
      • {USER}: the username used to log on to the PolarDB-X instance.
      • {PASSWORD}: the password used to log on to the PolarDB-X instance.

      To generate data more efficiently, you can set the threadsPerWorker parameter to a larger value in the script.
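The placeholders can also be filled in non-interactively with sed instead of vi. The sketch below operates on a one-line copy so the real params.conf is untouched; the endpoint, user, and password values are purely illustrative:

```shell
# Write a one-line stand-in for the insertMysql setting, then
# substitute the placeholders; point sed at tpchData/params.conf
# for the real run. All connection values below are examples only.
printf '%s\n' 'export insertMysql="mysql -h{HOST} -P{PORT} -u{USER} -p{PASSWORD} -Ac --local-infile tpch_100g -e"' > /tmp/params.sample
HOST=pxc-example.polarx.rds.aliyuncs.com   # hypothetical endpoint
PORT=3306
USER=tpch_user
PASSWORD=secret
sed -i "s/{HOST}/$HOST/; s/{PORT}/$PORT/; s/{USER}/$USER/; s/{PASSWORD}/$PASSWORD/" /tmp/params.sample
cat /tmp/params.sample
```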

    2. Run the following commands to generate 100 GB of data by using multiple processes:
      cd datagen
      sh generateTPCH.sh 100
      You can view the generated data in the tpch/tpchRaw/SF100/ directory:
      ls ../tpch/tpchRaw/SF100/
      customer  lineitem  nation  orders  part  partsupp  region  supplier
    3. Run the following commands to import the data to the PolarDB-X instance:
      cd ../loadTpch
      sh loadTpch.sh 100
    4. Verify data integrity
      Connect to the PolarDB-X instance by using the command-line client and check whether the number of rows in each table meets your expectations.
      MySQL [tpch_100g]> select (select count(*) from customer) as customer_cnt,
       (select count(*)  from lineitem) as lineitem_cnt,
       (select count(*)  from nation) as nation_cnt,
       (select count(*)  from orders) as order_cnt,
       (select count(*) from part) as part_cnt,
       (select count(*) from partsupp) as partsupp_cnt,
       (select count(*) from region) as region_cnt,
       (select count(*) from supplier) as supplier_cnt;
      +--------------+--------------+------------+-----------+----------+--------------+------------+--------------+
      | customer_cnt | lineitem_cnt | nation_cnt | order_cnt | part_cnt | partsupp_cnt | region_cnt | supplier_cnt |
      +--------------+--------------+------------+-----------+----------+--------------+------------+--------------+
      |     15000000 |    600037902 |         25 | 150000000 | 20000000 |     80000000 |          5 |      1000000 |
      +--------------+--------------+------------+-----------+----------+--------------+------------+--------------+
    5. Collect statistics
      Connect to the PolarDB-X instance by using the command-line client and run the analyze table command to collect statistics for each table.
      analyze table customer;
      analyze table lineitem;
      analyze table nation;
      analyze table orders;
      analyze table part;
      analyze table partsupp;
      analyze table region;
      analyze table supplier;
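The eight statements above can also be generated in a loop and piped to the client in one pass (the mysql connection flags are the same placeholders used elsewhere in this topic):

```shell
# Emit one ANALYZE TABLE statement per TPC-H table; uncomment the
# pipe to run them against the instance.
for t in customer lineitem nation orders part partsupp region supplier; do
  echo "analyze table $t;"
done   # | mysql -h{HOST} -P{PORT} -u{USER} -p{PASSWORD} tpch_100g
```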
  5. Perform the stress test
    1. Download the test script package tpch-queries.tar.gz and run the following command to decompress it:
      tar xzvf tpch-queries.tar.gz
    2. Run the following commands to execute the queries and time the test:
      cd tpch-queries
      'time' -f "%e" sh all_query.sh {HOST} {USER} {PASSWORD} {DB} {PORT}
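In the command above, `time` is quoted so that the shell invokes the external GNU time binary rather than the builtin keyword, because only the external binary supports `-f`; the `%e` format prints the elapsed wall-clock time in seconds. The driver script itself ships in the archive; the sketch below only illustrates what a per-query driver like all_query.sh typically does (the file names and layout are assumptions, not the archive's actual contents):

```shell
# Create 22 dummy query files, then run them in order, the way a
# driver script would; the real script pipes each query file to
# mysql with the HOST/USER/PASSWORD/DB/PORT arguments.
mkdir -p queries
for i in $(seq 1 22); do echo "select $i;" > "queries/q$i.sql"; done
for i in $(seq 1 22); do
  # real run: mysql -h"$HOST" -P"$PORT" -u"$USER" -p"$PASSWORD" "$DB" < "queries/q$i.sql"
  echo "running q$i"
done
```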

Test results

The following figures show the test results.

Figure 1: The duration of each query for different specifications. (chart omitted)
Figure 2: Total query duration for different specifications. (chart omitted)
Query          | 4c32g × 6 (seconds) | 8c64g × 6 (seconds) | 16c128g × 6 (seconds)
---------------|---------------------|---------------------|----------------------
Q1             | 95.88               | 47.5                | 26.06
Q2             | 3.12                | 1.71                | 1.29
Q3             | 43.28               | 10.48               | 5.63
Q4             | 5.54                | 2.86                | 1.61
Q5             | 17.28               | 7.49                | 4.09
Q6             | 16.8                | 8.77                | 4.62
Q7             | 60.64               | 18.56               | 10.09
Q8             | 39.9                | 8.63                | 4.87
Q9             | 102.24              | 41.89               | 21.75
Q10            | 24.51               | 5.65                | 3.59
Q11            | 14.89               | 3.5                 | 2.21
Q12            | 26.51               | 10.31               | 5.56
Q13            | 9.6                 | 4.78                | 2.54
Q14            | 18.84               | 9.84                | 5.14
Q15            | 34.21               | 17.53               | 9.12
Q16            | 3.79                | 1.62                | 1.2
Q17            | 3.73                | 2.06                | 1.69
Q18            | 43.68               | 13.39               | 6.82
Q19            | 9.28                | 2.43                | 1.77
Q20            | 35.8                | 11.05               | 5.86
Q21            | 93.81               | 44.07               | 23.05
Q22            | 5.6                 | 3.24                | 1.94
Total duration | 694.04              | 273.86              | 148.29
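The totals scale well with node size: doubling per-node resources from 4c32g to 8c64g cuts the total duration by about 2.5x, and quadrupling to 16c128g cuts it by about 4.7x. A one-line check of the latter ratio, using the totals from the table:

```shell
# Ratio of the 4c32g total to the 16c128g total from the table above.
awk 'BEGIN { printf "4c32g vs 16c128g speedup: %.2fx\n", 694.04 / 148.29 }'
# -> 4c32g vs 16c128g speedup: 4.68x
```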