This topic describes how to use the TPC-H decision support benchmark to test the online analytical processing (OLAP) performance of an ApsaraDB PolarDB MySQL-compatible edition 8.0 cluster of Cluster Edition. You can follow the instructions in this topic to test the performance of your database.

Parallel query

ApsaraDB PolarDB MySQL-compatible edition 8.0 Cluster Edition provides a parallel query framework. By default, parallel query is disabled. After you enable the feature, parallel query is automatically triggered when the volume of queried data reaches a specified threshold. This shortens the time that is required to execute queries.

Note You can specify the loose_max_parallel_degree parameter to enable parallel query. For more information, see Specify cluster parameters.

The following items describe the loose_max_parallel_degree parameter:

  • The minimum value is 0. This value specifies that parallel query is disabled.
  • The maximum value is 1024.
  • We recommend that you set the loose_max_parallel_degree parameter to 16. A sketch that shows how to check the value from a client session follows this list.
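
The parameter is typically modified in the console, but you can check the value that is in effect from any client session. The following is a minimal sketch, based on the assumption that the console parameter loose_max_parallel_degree surfaces at runtime as the max_parallel_degree variable (the loose_ prefix is a console convention):

    -- Check the effective degree of parallelism from a client session.
    -- Assumption: loose_max_parallel_degree maps to the runtime variable
    -- max_parallel_degree.
    SHOW VARIABLES LIKE 'max_parallel_degree';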

At the storage layer of an ApsaraDB PolarDB MySQL-compatible edition 8.0 cluster of Cluster Edition, data shards are distributed to different threads. Multiple threads perform computations in parallel and return their results to the leader thread. The leader thread then merges the results and returns the final result to the user. This improves query efficiency.

Parallel query is implemented based on the parallel processing capability of multi-core CPUs. A cluster whose specification is 8 cores and 32 GB memory is used in this example. The following figure shows how parallel query works.

Figure: How parallel query works
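
To verify that a given statement actually runs in parallel, inspect its execution plan. The following is a minimal sketch, assuming that parallel plans are visible in the EXPLAIN output (the exact annotation, such as a gather or parallel scan step, varies by PolarDB version; lineitem is one of the TPC-H tables that are created later in this topic):

    -- If parallel query kicks in, the plan contains a parallel/gather step;
    -- otherwise, the statement runs serially.
    EXPLAIN SELECT COUNT(*) FROM lineitem;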

The following sections describe how to test the performance of a PolarDB cluster and compare the results that are returned when the loose_max_parallel_degree parameter is set to 16 and when it is set to 0.

Test environment

  • The Elastic Computing Service (ECS) instance and the PolarDB cluster that you want to test must be deployed in the same zone of the same region. In this example, Hangzhou Zone I in the China (Hangzhou) region is used.
  • The network type is virtual private cloud (VPC).
    Note Make sure that the ECS instance and the PolarDB cluster are in the same VPC.
  • The following list describes the information about the PolarDB cluster that you want to test:
    • The node specification code is polar.mysql.x8.4xlarge, which provides 32 cores and 256 GB of memory.
    • The database engine is MySQL 8.0.
    • The service edition of the cluster is Cluster Edition.
    • The cluster consists of a primary node and a read-only node.
    • The primary endpoint is used as the connection string. For more information about how to view the primary endpoint of a PolarDB cluster, see Apply for a cluster endpoint or a primary endpoint.
  • The following list describes the information about the ECS instance that is used in the test:
    • The instance type is ecs.c5.4xlarge.
    • An ultra disk of 1,000 GB is attached to the instance.
    • The image of the instance is 64-bit CentOS 7.0.

Test tools

TPC-H is a commonly used benchmark that is developed and released by the Transaction Processing Performance Council (TPC) to evaluate the analytical query capabilities of databases. The TPC-H benchmark contains eight tables and 22 complex SQL statements. Most of the queries contain JOIN clauses on several tables, subqueries, and GROUP BY clauses.
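
For reference, the following is Q1, the pricing summary report query, rendered with its default substitution parameter of 90 days. It illustrates the aggregation-heavy style of the workload:

    select
        l_returnflag,
        l_linestatus,
        sum(l_quantity) as sum_qty,
        sum(l_extendedprice) as sum_base_price,
        sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
        avg(l_quantity) as avg_qty,
        avg(l_extendedprice) as avg_price,
        avg(l_discount) as avg_disc,
        count(*) as count_order
    from
        lineitem
    where
        l_shipdate <= date '1998-12-01' - interval '90' day
    group by
        l_returnflag,
        l_linestatus
    order by
        l_returnflag,
        l_linestatus;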

Install the TPC-H tools

  1. Install the TPC-H tools on the ECS instance.
    Note
    • TPC-H_Tools_v2.18.0 is used in this topic.
    • You must register on the TPC website before you can download the TPC-H tools.
  2. Open the dbgen directory.
    cd dbgen
  3. Create the makefile file from the makefile.suite template.
    cp makefile.suite makefile
  4. Modify parameters, such as CC, DATABASE, MACHINE, and WORKLOAD, in the makefile file.
    1. Open the makefile file.
      vim makefile
    2. Modify the CC, DATABASE, MACHINE, and WORKLOAD parameters.
        ################
        ## CHANGE NAME OF ANSI COMPILER HERE
        ################
        CC      = gcc
        # Current values for DATABASE are: INFORMIX, DB2, ORACLE,
        #                                  SQLSERVER, SYBASE, TDAT (Teradata)
        # Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
        #                                  SGI, SUN, U2200, VMS, LINUX, WIN32
        # Current values for WORKLOAD are:  TPCH
        DATABASE= MYSQL
        MACHINE = LINUX
        WORKLOAD = TPCH
    3. Press ESC and enter :wq to save the file and exit.
  5. Modify the tpcd.h file and add new macros.
    1. Open the tpcd.h file.
      vim tpcd.h
    2. Add the following macros:
      #ifdef MYSQL
      #define GEN_QUERY_PLAN ""
      #define START_TRAN "START TRANSACTION"
      #define END_TRAN "COMMIT"
      #define SET_OUTPUT ""
      #define SET_ROWCOUNT "limit %d;\n"
      #define SET_DBASE "use %s;\n"
      #endif
    3. Press ESC and enter :wq to save the file and exit.
  6. Compile the file.
    make

    After the file is compiled, the following executable files are generated in the directory:

    • dbgen: the tool that is used to generate data. If you use the InfiniDB test script, you must use this tool to generate data for TPC-H.
    • qgen: the tool that is used to generate SQL statements. This tool generates initial statements for the test. The queries generated by using different seeds are different from each other. To ensure the repeatability of the test results, use the 22 queries that are provided in the attachment.
  7. Use dbgen to generate test data.
    ./dbgen -s 100

    The -s parameter specifies the scale factor of the generated data. In this example, -s 100 generates approximately 100 GB of test data.

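    dbgen writes one flat file per TPC-H table, eight .tbl files in total, into the current directory. The following is a quick sanity check that you can run before you load the data (at a scale factor of 100, lineitem.tbl is the largest file at roughly 70 GB):

      # List the generated flat files and their sizes.
      ls -lh *.tbl
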
  8. Use qgen to generate a query.
    Note To ensure the repeatability of test results, you can skip this step and download and use the 22 queries that are provided in the attachment.
    1. Copy qgen and dists.dss into the queries directory.
      cp qgen queries
      cp dists.dss queries
    2. Use the following script to generate the 22 queries:
      #!/usr/bin/bash
      # Generate the 22 TPC-H queries at scale factor 100.
      # Run this script from the queries directory so that qgen can find
      # the query templates.
      for i in {1..22}
      do
        ./qgen -d $i -s 100 > db"$i".sql
      done
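
The following is a usage sketch for the script above. The generate_queries.sh file name is an assumption; qgen locates its query templates through the DSS_QUERY environment variable, which you can point at the directory that contains the templates:

    # Run from the queries directory; point DSS_QUERY at the template
    # directory if qgen cannot find the templates on its own.
    cd queries
    export DSS_QUERY=.
    bash generate_queries.sh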

Test procedure

  1. Specify the loose_max_parallel_degree parameter to enable or disable parallel query.
    Note In this example, the loose_max_parallel_degree parameter is set to 0 and 16. When the parameter is set to 0, parallel query is disabled. When the parameter is set to 16, parallel query is enabled. You can compare the test results. For more information, see Specify cluster parameters.
  2. Connect to the PolarDB database from the ECS instance. For more information, see Connect to a cluster.
  3. Create a database.
    create database tpch100g;
  4. Create the tables.
    use tpch100g;
    source ./dss.ddl
    Note The dss.ddl file is in the dbgen directory of the TPC-H tools package.
  5. Load data.
    1. Create a file named load.ddl that contains the following statements:
      load data local INFILE 'customer.tbl' INTO TABLE customer FIELDS TERMINATED BY '|';
      load data local INFILE 'region.tbl' INTO TABLE region FIELDS TERMINATED BY '|';
      load data local INFILE 'nation.tbl' INTO TABLE nation FIELDS TERMINATED BY '|';
      load data local INFILE 'supplier.tbl' INTO TABLE supplier FIELDS TERMINATED BY '|';
      load data local INFILE 'part.tbl' INTO TABLE part FIELDS TERMINATED BY '|';
      load data local INFILE 'partsupp.tbl' INTO TABLE partsupp FIELDS TERMINATED BY '|';
      load data local INFILE 'orders.tbl' INTO TABLE orders FIELDS TERMINATED BY '|';
      load data local INFILE 'lineitem.tbl' INTO TABLE lineitem FIELDS TERMINATED BY '|';
    2. Load data.
      source ./load.ddl
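
    On MySQL 8.0, LOAD DATA LOCAL INFILE is disabled by default on both the server side and the client side, so the load can fail with a "loading local data is disabled" error. The following is a minimal sketch of the server-side switch (local_infile is a standard MySQL variable; on PolarDB, the setting may be managed in the console):

      -- Check and enable loading from local files on the server side.
      SHOW GLOBAL VARIABLES LIKE 'local_infile';
      SET GLOBAL local_infile = 1;

    On the client side, connect with the --local-infile=1 option of the mysql client.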
  6. Create primary keys and foreign keys.
    source ./dss.ri

    The tpch100g database is used in the example. Replace the content in the dss.ri file in the TPC-H tools package with the following content:

    use tpch100g;
    -- ALTER TABLE REGION DROP PRIMARY KEY;
    -- ALTER TABLE NATION DROP PRIMARY KEY;
    -- ALTER TABLE PART DROP PRIMARY KEY;
    -- ALTER TABLE SUPPLIER DROP PRIMARY KEY;
    -- ALTER TABLE PARTSUPP DROP PRIMARY KEY;
    -- ALTER TABLE ORDERS DROP PRIMARY KEY;
    -- ALTER TABLE LINEITEM DROP PRIMARY KEY;
    -- ALTER TABLE CUSTOMER DROP PRIMARY KEY;
    -- For table REGION
    ALTER TABLE REGION
    ADD PRIMARY KEY (R_REGIONKEY);
    -- For table NATION
    ALTER TABLE NATION
    ADD PRIMARY KEY (N_NATIONKEY);
    ALTER TABLE NATION
    ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references REGION(R_REGIONKEY);
    COMMIT WORK;
    -- For table PART
    ALTER TABLE PART
    ADD PRIMARY KEY (P_PARTKEY);
    COMMIT WORK;
    -- For table SUPPLIER
    ALTER TABLE SUPPLIER
    ADD PRIMARY KEY (S_SUPPKEY);
    ALTER TABLE SUPPLIER
    ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references NATION(N_NATIONKEY);
    COMMIT WORK;
    -- For table PARTSUPP
    ALTER TABLE PARTSUPP
    ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY);
    COMMIT WORK;
    -- For table CUSTOMER
    ALTER TABLE CUSTOMER
    ADD PRIMARY KEY (C_CUSTKEY);
    ALTER TABLE CUSTOMER
    ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references NATION(N_NATIONKEY);
    COMMIT WORK;
    -- For table LINEITEM
    ALTER TABLE LINEITEM
    ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER);
    COMMIT WORK;
    -- For table ORDERS
    ALTER TABLE ORDERS
    ADD PRIMARY KEY (O_ORDERKEY);
    COMMIT WORK;
    -- For table PARTSUPP
    ALTER TABLE PARTSUPP
    ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references SUPPLIER(S_SUPPKEY);
    COMMIT WORK;
    ALTER TABLE PARTSUPP
    ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references PART(P_PARTKEY);
    COMMIT WORK;
    -- For table ORDERS
    ALTER TABLE ORDERS
    ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references CUSTOMER(C_CUSTKEY);
    COMMIT WORK;
    -- For table LINEITEM
    ALTER TABLE LINEITEM
    ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY)  references ORDERS(O_ORDERKEY);
    COMMIT WORK;
    ALTER TABLE LINEITEM
    ADD FOREIGN KEY LINEITEM_FK2 (L_PARTKEY,L_SUPPKEY) references 
            PARTSUPP(PS_PARTKEY,PS_SUPPKEY);
    COMMIT WORK;
  7. Create indexes.
    #!/usr/bin/bash
    host=$1
    port=$2
    user=$3
    password=$4
    db=$5
    sqls=("create index i_s_nationkey on supplier (s_nationkey);"
    "create index i_ps_partkey on partsupp (ps_partkey);"
    "create index i_ps_suppkey on partsupp (ps_suppkey);"
    "create index i_c_nationkey on customer (c_nationkey);"
    "create index i_o_custkey on orders (o_custkey);"
    "create index i_o_orderdate on orders (o_orderdate);"
    "create index i_l_orderkey on lineitem (l_orderkey);"
    "create index i_l_partkey on lineitem (l_partkey);"
    "create index i_l_suppkey on lineitem (l_suppkey);"
    "create index i_l_partkey_suppkey on lineitem (l_partkey, l_suppkey);"
    "create index i_l_shipdate on lineitem (l_shipdate);"
    "create index i_l_commitdate on lineitem (l_commitdate);"
    "create index i_l_receiptdate on lineitem (l_receiptdate);"
    "create index i_n_regionkey on nation (n_regionkey);"
    "analyze table supplier"
    "analyze table part"
    "analyze table partsupp"
    "analyze table customer"
    "analyze table orders"
    "analyze table lineitem"
    "analyze table nation"
    "analyze table region")
    for sql in "${sqls[@]}"
    do
        mysql -h$host -P$port -u$user -p$password -D$db  -e "$sql"
    done
    Note To measure the performance improvement that is delivered by parallel query, run the following script to preload the indexes into the InnoDB buffer pool.
    #!/bin/bash
    host=$1
    port=$2
    user=$3
    password=$4
    dbname=$5
    MYSQL="mysql -h$host -P$port -u$user -p$password -D$dbname"
    if [ -z ${dbname} ]; then
        echo "dbname not defined."
        exit 1
    fi
    table_indexes=(
            "supplier PRIMARY"
            "supplier i_s_nationkey"
            "part PRIMARY"
            "partsupp PRIMARY"
            "partsupp i_ps_partkey"
            "partsupp i_ps_suppkey"
            "customer PRIMARY"
            "customer i_c_nationkey"
            "orders PRIMARY"
            "orders i_o_custkey"
            "orders i_o_orderdate"
            "lineitem PRIMARY"
            "lineitem i_l_orderkey"
            "lineitem i_l_partkey"
            "lineitem i_l_suppkey"
            "lineitem i_l_partkey_suppkey"
            "lineitem i_l_shipdate"
            "lineitem i_l_commitdate"
            "lineitem i_l_receiptdate"
            "nation i_n_regionkey"
            "nation PRIMARY"
            "region PRIMARY"
    )
    for table_index in "${table_indexes[@]}"
    do
        ti=($table_index)
        table=${ti[0]}
        index=${ti[1]}
        SQL="select count(*) from ${table} force index(${index})"
        echo "$MYSQL -e '$SQL'"
        $MYSQL -e "$SQL"
    done
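
    Both of the preceding scripts take the connection information as positional arguments. The following is a usage sketch; the create_index.sh and preload_index.sh file names and the connection values are assumptions:

      # Arguments: host port user password database
      bash create_index.sh pc-xxxx.mysql.polardb.rds.aliyuncs.com 3306 testuser 'Passw0rd' tpch100g
      bash preload_index.sh pc-xxxx.mysql.polardb.rds.aliyuncs.com 3306 testuser 'Passw0rd' tpch100g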
  8. Execute queries.
    #!/usr/bin/env bash
    # Run the 22 TPC-H queries (Q1.sql to Q22.sql) and record per-query timings.
    host=$1
    port=$2
    user=$3
    password=$4
    database=$5
    resfile=$6
    echo "start test run at "`date "+%Y-%m-%d %H:%M:%S"`|tee -a ${resfile}.out
    for (( i=1; i<=22; i++ ))
    do
        queryfile="Q"${i}".sql"
        start_time=`date "+%s.%N"`
        echo "run query ${i}"|tee -a ${resfile}.out
        mysql -h${host} -P${port} -u${user} -p${password} $database -e "source $queryfile;" |tee -a ${resfile}.out
        end_time=`date "+%s.%N"`
        # Compute the elapsed time with millisecond precision.
        start_s=${start_time%.*}
        start_nanos=${start_time#*.}
        end_s=${end_time%.*}
        end_nanos=${end_time#*.}
        if [ "$end_nanos" -lt "$start_nanos" ]; then
            end_s=$(( 10#$end_s - 1 ))
            end_nanos=$(( 10#$end_nanos + 10 ** 9 ))
        fi
        time=$(( 10#$end_s - 10#$start_s )).`printf "%03d" $(( (10#$end_nanos - 10#$start_nanos)/10**6 ))`
        echo ${queryfile}" cost "${time}" second, start at "`date -d @$start_time "+%Y-%m-%d %H:%M:%S"`", stop at "`date -d @$end_time "+%Y-%m-%d %H:%M:%S"` >> ${resfile}.time
    done
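
    The script expects the 22 query files to be named Q1.sql to Q22.sql in the current directory and takes one more argument than the preceding scripts: a prefix for the result files. The following is a usage sketch; the run_queries.sh file name and the connection values are assumptions:

      # Arguments: host port user password database result-file-prefix
      # Produces dop16.out (query output) and dop16.time (per-query timings).
      bash run_queries.sh pc-xxxx.mysql.polardb.rds.aliyuncs.com 3306 testuser 'Passw0rd' tpch100g dop16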

Result

The following figures show the comparison between the results that are returned when the loose_max_parallel_degree parameter is set to 16 and the results that are returned when the parameter is set to 0.

Figure: Time required to execute each query
Figure: Performance improvement
Note In the preceding figures, the letter Q is short for the word query. For example, Q1 represents the first query.

The following table lists the test results.

Query | Time consumed (seconds), DOP = 16 | Time consumed (seconds), DOP = 0 | Performance improvement (DOP 0/DOP 16)
Q1 | 80.18 | 1290.6 | 16
Q2 | 1.44 | 11.8 | 8
Q3 | 25.05 | 244.92 | 10
Q4 | 6.91 | 59.61 | 9
Q5 | 24.44 | 231.18 | 9
Q6 | 14.51 | 217.42 | 15
Q7 | 51.97 | 410.59 | 8
Q8 | 5.61 | 57.52 | 10
Q9 | 37.84 | 415.11 | 11
Q10 | 38.72 | 139.73 | 4
Q11 | 11.75 | 30.67 | 3
Q12 | 15.89 | 245.19 | 15
Q13 | 104.12 | 718.2 | 7
Q14 | 8.31 | 66.66 | 8
Q15 | 32.5 | 123.79 | 4
Q16 | 26.9 | 37.54 | 1
Q17 | 16.2 | 54.34 | 3
Q18 | 66.77 | 240.28 | 4
Q19 | 1.58 | 18.62 | 12
Q20 | 45.88 | 46.91 | 1
Q21 | 53.99 | 253.27 | 5
Q22 | 2.07 | 17.08 | 8