This topic describes the TPC-H test design, test process, and test results for DRDS.

TPC-H overview

TPC-H is a widely used decision support benchmark. It is developed and released by the Transaction Processing Performance Council (TPC) to evaluate the query capabilities of databases. In the TPC-H benchmark, 22 complex SQL queries are performed on 8 tables. Most queries contain joins on several tables, subqueries, and GROUP BY clauses.

Note This implementation of TPC-H is derived from the TPC-H Benchmark and is not comparable to published TPC-H Benchmark results, as this implementation does not comply with all the requirements of the TPC-H Benchmark.

Test design

  • Test environment for DRDS Enterprise Edition: One 32-core 128 GB Distributed Relational Database Service (DRDS) Enterprise Edition instance that has two 16-core 64 GB nodes, and four 8-core 32 GB dedicated ApsaraDB RDS for MySQL 5.7 instances
  • Test environment for DRDS Standard Edition: One 16-core 64 GB DRDS Standard Edition instance that has two 8-core 32 GB nodes, and four 4-core 32 GB dedicated ApsaraDB RDS for MySQL 5.7 instances

The following test results are based on queries performed on 50 GB of data, which corresponds to a scale factor of 50. Table sizes scale linearly with the scale factor; for example, the ORDERS table contains 1.5 million rows for each unit of scale factor. The main tables are the LINEITEM table, ORDERS table, and PARTSUPP table. The LINEITEM table has approximately 300 million rows, the ORDERS table has 75 million rows, and the PARTSUPP table has 40 million rows.

In the following example, Q18 joins four tables that contain tens of millions to hundreds of millions of rows, including a subquery that is executed as a semi-join. In addition, Q18 aggregates data based on the columns specified in the GROUP BY clause. A DRDS instance takes approximately 11 seconds to complete this query.

select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from CUSTOMER, ORDERS, LINEITEM
where o_orderkey in (
        select l_orderkey
        from LINEITEM
        group by l_orderkey
        having sum(l_quantity) > 314
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate
limit 100;
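
If you want to check how DRDS executes such a query, for example which operations are pushed down to the ApsaraDB RDS for MySQL instances and which are executed on the DRDS nodes, you can prepend EXPLAIN to the statement. The following statement is only a minimal sketch against the orders table that is defined later in this topic; the exact plan output depends on the kernel version of your instance.

EXPLAIN
select o_orderdate, sum(o_totalprice)
from ORDERS
group by o_orderdate
limit 10;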

Test process

Note In the following test process, the LoadData feature is used to import TPC-H data to your DRDS instance. If you want to use this feature, the kernel version of your DRDS instance must be 5.4.7-16000638 or later.
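You can check the kernel version from any MySQL client that is connected to your DRDS instance. The following query is only a sketch; the exact format of the returned string varies by release, but it typically includes the kernel version string.

SELECT VERSION();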
  1. Create an Elastic Compute Service (ECS) instance to serve as the stress test client. You perform all subsequent operations, such as preparing data and running stress tests, on this ECS instance.
    Note
    • We recommend that you select a virtual private cloud (VPC) when you create the ECS instance. If you use the classic network type, some specifications of ApsaraDB RDS for MySQL instances may be unavailable. We recommend that you record the ID and name of the VPC. All the instances described in subsequent steps must be deployed in this VPC, and all the corresponding data must be stored in this VPC.
    • We recommend that you use the latest Debian or CentOS image. This ensures that all the required dependency libraries are available during compilation.
  2. Create a DRDS instance and the corresponding ApsaraDB RDS for MySQL instances. Make sure that these instances are deployed in the same VPC as the ECS instance that you created in the preceding step.
  3. Create databases and tables on the DRDS instance. In this step, you must specify a sharding method. We recommend that you use the table schemas shown in the following code:
    CREATE TABLE `customer` (
      `c_custkey` int(11) NOT NULL,
      `c_name` varchar(25) NOT NULL,
      `c_address` varchar(40) NOT NULL,
      `c_nationkey` int(11) NOT NULL,
      `c_phone` varchar(15) NOT NULL,
      `c_acctbal` decimal(15,2) NOT NULL,
      `c_mktsegment` varchar(10) NOT NULL,
      `c_comment` varchar(117) NOT NULL,
      PRIMARY KEY (`c_custkey`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 dbpartition by hash(`c_custkey`) tbpartition by hash(`c_custkey`) tbpartitions 4;
    CREATE TABLE `lineitem` (
      `l_orderkey` bigint(20) NOT NULL,
      `l_partkey` int(11) NOT NULL,
      `l_suppkey` int(11) NOT NULL,
      `l_linenumber` bigint(20) NOT NULL,
      `l_quantity` decimal(15,2) NOT NULL,
      `l_extendedprice` decimal(15,2) NOT NULL,
      `l_discount` decimal(15,2) NOT NULL,
      `l_tax` decimal(15,2) NOT NULL,
      `l_returnflag` varchar(1) NOT NULL,
      `l_linestatus` varchar(1) NOT NULL,
      `l_shipdate` date NOT NULL,
      `l_commitdate` date NOT NULL,
      `l_receiptdate` date NOT NULL,
      `l_shipinstruct` varchar(25) NOT NULL,
      `l_shipmode` varchar(10) NOT NULL,
      `l_comment` varchar(44) NOT NULL,
      KEY `IDX_LINEITEM_PARTKEY` (`l_partkey`),
      PRIMARY KEY (`l_orderkey`,`l_linenumber`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 dbpartition by RIGHT_SHIFT(`l_orderkey`,6) tbpartition by RIGHT_SHIFT(`l_orderkey`,6) tbpartitions 4;
    CREATE TABLE `orders` (
      `o_orderkey` bigint(20) NOT NULL,
      `o_custkey` int(11) NOT NULL,
      `o_orderstatus` varchar(1) NOT NULL,
      `o_totalprice` decimal(15,2) NOT NULL,
      `o_orderdate` date NOT NULL,
      `o_orderpriority` varchar(15) NOT NULL,
      `o_clerk` varchar(15) NOT NULL,
      `o_shippriority` bigint(20) NOT NULL,
      `o_comment` varchar(79) NOT NULL,
      PRIMARY KEY (`o_orderkey`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 dbpartition by RIGHT_SHIFT(`o_orderkey`,6) tbpartition by RIGHT_SHIFT(`o_orderkey`,6) tbpartitions 4;
    CREATE TABLE `part` (
      `p_partkey` int(11) NOT NULL,
      `p_name` varchar(55) NOT NULL,
      `p_mfgr` varchar(25) NOT NULL,
      `p_brand` varchar(10) NOT NULL,
      `p_type` varchar(25) NOT NULL,
      `p_size` int(11) NOT NULL,
      `p_container` varchar(10) NOT NULL,
      `p_retailprice` decimal(15,2) NOT NULL,
      `p_comment` varchar(23) NOT NULL,
      PRIMARY KEY (`p_partkey`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 dbpartition by hash(`p_partkey`) tbpartition by hash(`p_partkey`) tbpartitions 4;
    CREATE TABLE `partsupp` (
      `ps_partkey` int(11) NOT NULL,
      `ps_suppkey` int(11) NOT NULL,
      `ps_availqty` int(11) NOT NULL,
      `ps_supplycost` decimal(15,2) NOT NULL,
      `ps_comment` varchar(199) NOT NULL,
      KEY `IDX_PARTSUPP_SUPPKEY` (`ps_suppkey`),
      PRIMARY KEY (`ps_partkey`,`ps_suppkey`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 dbpartition by hash(`ps_partkey`) tbpartition by hash(`ps_partkey`) tbpartitions 4;
    CREATE TABLE `supplier` (
      `s_suppkey` int(11) NOT NULL,
      `s_name` varchar(25) NOT NULL,
      `s_address` varchar(40) NOT NULL,
      `s_nationkey` int(11) NOT NULL,
      `s_phone` varchar(15) NOT NULL,
      `s_acctbal` decimal(15,2) NOT NULL,
      `s_comment` varchar(101) NOT NULL,
      PRIMARY KEY (`s_suppkey`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 dbpartition by hash(`s_suppkey`) tbpartition by hash(`s_suppkey`) tbpartitions 4;
    CREATE TABLE `nation` (
      `n_nationkey` int(11) NOT NULL,
      `n_name` varchar(25) NOT NULL,
      `n_regionkey` int(11) NOT NULL,
      `n_comment` varchar(152) DEFAULT NULL,
      PRIMARY KEY (`n_nationkey`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 broadcast;
    CREATE TABLE `region` (
      `r_regionkey` int(11) NOT NULL,
      `r_name` varchar(25) NOT NULL,
      `r_comment` varchar(152) DEFAULT NULL,
      PRIMARY KEY (`r_regionkey`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 broadcast;
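    After you create the tables, you can optionally check the shard layout of a logical table. The following statement is a sketch that assumes your DRDS version supports the SHOW TOPOLOGY statement, which lists the physical database shard and physical table behind each partition of the logical table:
    SHOW TOPOLOGY FROM lineitem;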
  4. Import data into the DRDS instance. Download the tpchData.zip script package to the ECS instance.
    # Extract the script from the tpchData.zip file.
    unzip tpchData.zip
    cd tpchData
    # Edit the conf file.
    vim param.conf

    In the last line of param.conf, replace the placeholder connection information with the actual connection information of your DRDS instance.

    #!/bin/bash
    
    ### remote generating directory
    export remoteGenDir=./
    
    ### target path
    export targetPath=../tpch/tpchRaw
    export sourcePath=../tpch/tpchRaw
    
    ### cores per worker, default value is 1
    export coresPerWorker=1
    
    ### threads per worker, default value is 1
    export threadsPerWorker=1
    
    export hint=""
    ### If you need to test data of a different size, change the database name. For example, if you want to test 100 GB of data, change the name to tpch_100g.
    export insertMysql="mysql -hxxxxxxxxxx.drds.aliyuncs.com -P3306 -uxxx -pxxxxxx -Ac --local-infile tpch_50g -e"

    Generate 50 GB of data and load the data into the DRDS instance.

    cd workloads
    # Copy the workload file and, if necessary, edit the number of files into which the data of each table is generated.
    cp tpch.workload.1.lst tpch.workload.50.lst
    # Go to the datagen directory.
    cd datagen
    # Generate the data. This step needs to be performed only once. If you need the data again in this test, you can reuse the generated data.
    sh generateTPCH.sh 50
    # Load the data into the DRDS instance.
    cd ../loadTpch
    sh loadTpch.sh 50

    Verify that the data is imported correctly.

    MySQL [tpch_50g]> select (select count(*) from customer) as customer_cnt,
        ->        (select count(*)  from lineitem) as lineitem_cnt,
        ->        (select count(*)  from nation) as nation_cnt,
        ->        (select count(*)  from orders) as order_cnt,
        ->        (select count(*) from part) as part_cnt,
        ->        (select count(*) from partsupp) as partsupp_cnt,
        ->        (select count(*) from region) as region_cnt,
        ->        (select count(*) from supplier) as supplier_cnt;
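    If the data is loaded correctly, the returned counts should be close to the TPC-H cardinalities for a scale factor of 50: approximately 7.5 million rows in customer, approximately 300 million rows in lineitem, 25 rows in nation, 75 million rows in orders, 10 million rows in part, 40 million rows in partsupp, 5 rows in region, and 500,000 rows in supplier.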
  5. Execute the ANALYZE statement to collect statistics before you run the TPC-H test.
    analyze table customer;
    analyze table lineitem;
    analyze table nation;
    analyze table orders;
    analyze table part;
    analyze table partsupp;
    analyze table region;
    analyze table supplier;
  6. Run the TPC-H test.
    cd tpchData
    cd runTpch
    sh runTpch.sh
    # After the test is run, go to the result_fixed_mget directory and check the result of each SQL query.
    cd result_fixed_mget

Test results

Note DRDS Starter Edition instances do not support parallel queries. We recommend that you do not use these instances to perform analytical queries.
Query    Enterprise Edition (unit: seconds)    Standard Edition (unit: seconds)
Q01 55.82 111.84
Q02 6.12 11.54
Q03 15.99 30
Q04 17.71 36.56
Q05 10.89 23.01
Q06 8.06 16.76
Q07 17.09 34.80
Q08 13.44 26.09
Q09 53.81 101.51
Q10 8.73 19.67
Q11 18.25 19.74
Q12 8.80 18.60
Q13 14.15 31.33
Q14 17.49 42.43
Q15 20.62 42.79
Q16 2.13 4.15
Q17 1.93 4.07
Q18 11.01 22.82
Q19 12.97 27.61
Q20 27.77 49.25
Q21 38.84 68.08
Q22 5.27 11.29
Total 386.77 754.65