This topic describes how to test the online analytical processing (OLAP) performance of an ApsaraDB PolarDB MySQL cluster 8.0 by using TPC-H.

About parallel query

ApsaraDB PolarDB MySQL 8.0 provides a parallel query framework. By default, parallel query is disabled. After you enable it, the framework automatically parallelizes a query when the amount of scanned data reaches the specified threshold. This reduces the time required to run the query.

Note You can enable parallel query by setting the loose_max_parallel_degree parameter. For more information, see Set cluster parameters.

Follow these rules to set the loose_max_parallel_degree parameter.

  • The minimum value is 0, which disables parallel query.
  • The maximum value is 1024.
  • We recommend that you set the parameter value to 16.

ApsaraDB PolarDB MySQL 8.0 distributes data to different threads at the storage layer. Multiple threads perform the computation in parallel and return their intermediate results to the leader thread. The leader thread then merges the results and returns the final result to the user. This significantly improves query speed.
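The leader/worker flow described above can be illustrated with a toy shell sketch: several workers each aggregate a slice of the data concurrently, and a leader merges the partial results. All names and the workload here are illustrative, not part of PolarDB.

```shell
#!/bin/bash
# Toy analogue of parallel query: four "worker" subshells each sum a
# slice of 1..100 concurrently; the "leader" waits, then merges.
partial_sum() {  # args: start end -> prints the sum of the range
    local s=0 i
    for (( i=$1; i<=$2; i++ )); do s=$(( s + i )); done
    echo "$s"
}
tmp=$(mktemp -d)
n=0
for slice in "1 25" "26 50" "51 75" "76 100"; do
    ( partial_sum $slice > "$tmp/$n" ) &   # worker computes its slice
    n=$(( n + 1 ))
done
wait                                       # leader waits for all workers
total=0
for f in "$tmp"/*; do total=$(( total + $(cat "$f") )); done
rm -rf "$tmp"
echo "total: $total"   # total: 5050
```

In the real framework the split happens at the storage layer and the workers are threads rather than processes, but the split-compute-merge shape is the same.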

Parallel query is implemented based on the parallel processing capability of multi-core CPUs. The following figure shows how parallel query works on a cluster that has an 8-core CPU and 32-GB memory.

Parallel query

The following sections describe how to test the performance of an ApsaraDB PolarDB MySQL cluster with the loose_max_parallel_degree parameter set to 16 and to 0, respectively. The test results are also provided.

Test environment

  • An Elastic Compute Service (ECS) instance and an ApsaraDB PolarDB MySQL cluster are used for testing, and they must be deployed in the same region and zone. In this test, they are both deployed in Zone I of the China (Hangzhou) region.
  • The VPC network type is selected for both the ECS instance and the ApsaraDB PolarDB MySQL cluster.
    Note Make sure that the ECS instance and the ApsaraDB PolarDB MySQL cluster are connected to the same VPC network.
  • The ApsaraDB PolarDB MySQL cluster for testing is as follows:
    • The node specification is polar.mysql.x8.4xlarge (32-core 256 GB).
    • The database engine is MySQL 8.0.
    • The cluster contains two nodes, one primary node and one read-only node.
    • Use the primary endpoint to connect to the cluster. For information about how to query the primary endpoint, see View or apply for an endpoint.
  • The ECS instance for testing is as follows:
    • The instance type is ecs.c5.4xlarge.
    • An ultra disk of 1,000 GiB is attached to the instance.
    • The image used by the instance is 64-bit CentOS 7.0.

Test tool

TPC-H is a standard benchmark developed and released by the Transaction Processing Performance Council (TPC) to evaluate the query capabilities of databases. The TPC-H benchmark contains 8 tables and defines 22 complex SQL queries. Most of the queries contain JOIN operations on several tables, subqueries, and GROUP BY clauses.

Install TPC-H

  1. Install TPC-H on the ECS instance.
    Note
    • The TPC-H version used in this topic is v2.18.0. Download TPC-H v2.18.0.
    • Before you download TPC-H, make sure that you have performed real-name registration.
  2. Open the dbgen directory.
    cd dbgen
  3. Copy the makefile file.
    cp makefile.suite makefile
  4. Modify parameters in the makefile file, including the CC, DATABASE, MACHINE, and WORKLOAD parameters.
    1. Open the makefile file.
      vim makefile
    2. Modify the definitions of the CC, DATABASE, MACHINE, and WORKLOAD parameters.
        ################
        ## CHANGE NAME OF ANSI COMPILER HERE
        ################
        CC      = gcc
        # Current values for DATABASE are: INFORMIX, DB2, ORACLE,
        #                                  SQLSERVER, SYBASE, TDAT (Teradata)
        # Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
        #                                  SGI, SUN, U2200, VMS, LINUX, WIN32
        # Current values for WORKLOAD are:  TPCH
        DATABASE= MYSQL
        MACHINE = LINUX
        WORKLOAD = TPCH
    3. Press Esc and then enter :wq to save the file and exit.
  5. Modify the tpcd.h file and add new macro definitions.
    1. Open the tpcd.h file.
      vim tpcd.h
    2. Add the following macro definitions.
      #ifdef MYSQL
      #define GEN_QUERY_PLAN ""
      #define START_TRAN "START TRANSACTION"
      #define END_TRAN "COMMIT"
      #define SET_OUTPUT ""
      #define SET_ROWCOUNT "limit %d;\n"
      #define SET_DBASE "use %s;\n"
      #endif
    3. Press Esc and then enter :wq to save the file and exit.
  6. Run the make command to compile.
    make

    After the file is compiled, two executable files are generated in this directory:

    • dbgen: the data generation tool. You use this tool to generate the TPC-H test data.
    • qgen: the SQL query generation tool. Queries generated with different seeds differ. To ensure repeatability, use the 22 queries provided in the attachment.
  7. Use TPC-H to generate test data.
    ./dbgen -s 100

    The -s parameter specifies the scale factor, which controls the amount of data generated. For example, -s 100 generates approximately 100 GB of data.
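    Per the TPC-H specification, table cardinalities grow roughly linearly with the scale factor, so the expected row counts at -s 100 can be sanity-checked with simple arithmetic (base cardinalities per the specification; lineitem is approximate):

```shell
#!/bin/bash
# Expected TPC-H row counts at scale factor SF; base cardinalities are
# per the TPC-H specification (lineitem averages ~6,000,000 rows per SF).
SF=100
supplier=$(( 10000   * SF ))
customer=$(( 150000  * SF ))
part=$((     200000  * SF ))
partsupp=$(( 800000  * SF ))
orders=$((   1500000 * SF ))
lineitem=$(( 6000000 * SF ))   # approximate average
printf "supplier: %d\ncustomer: %d\npart: %d\npartsupp: %d\norders: %d\nlineitem: ~%d\n" \
       "$supplier" "$customer" "$part" "$partsupp" "$orders" "$lineitem"
```

    These counts are useful for verifying that data generation and loading completed without truncation.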

  8. Use TPC-H to generate the queries.
    Note To ensure the repeatability of test results, you can skip this step and use the 22 queries provided in the attachment.
    1. Copy qgen and dists.dss into the queries directory.
      cp qgen queries
      cp dists.dss queries
    2. Use the following script to generate the queries.
      #! /usr/bin/bash
      for i in {1..22}
      do  
        ./qgen -d $i -s 100 > Q"$i".sql
      done

Test procedure

  1. Enable parallel query on the ApsaraDB PolarDB MySQL cluster.
    1. Log on to the ApsaraDB for PolarDB console.
    2. In the upper-left corner of the console, select the region where the cluster is deployed.
    3. Click the ID of the cluster.
    4. In the left-side navigation pane, choose Settings and Management > Parameters.
    5. Enter loose_max_parallel_degree in the search box and click the search icon.
    6. Set the current value to 16.
      Note To clearly understand how parallel query improves the performance of the cluster, perform a comparative test, in which this parameter is set to 16 and then 0.
    7. After the parameter is modified, click Apply Changes in the upper-left corner of the page.
    8. In the Save Changes dialog box that appears, click OK.
  2. Connect to an ApsaraDB PolarDB MySQL database on the ECS instance. For more information, see Connect to a database cluster.
  3. Create a database.
    create database tpch100g;
  4. Create the tables.
    source ./dss.ddl
    Note dss.ddl is under the dbgen directory in TPC-H.
  5. Load data.
    1. Create a load.ddl file that contains the following statements.
      load data local INFILE 'customer.tbl' INTO TABLE customer FIELDS TERMINATED BY '|';
      load data local INFILE 'region.tbl' INTO TABLE region FIELDS TERMINATED BY '|';
      load data local INFILE 'nation.tbl' INTO TABLE nation FIELDS TERMINATED BY '|';
      load data local INFILE 'supplier.tbl' INTO TABLE supplier FIELDS TERMINATED BY '|';
      load data local INFILE 'part.tbl' INTO TABLE part FIELDS TERMINATED BY '|';
      load data local INFILE 'partsupp.tbl' INTO TABLE partsupp FIELDS TERMINATED BY '|';
      load data local INFILE 'orders.tbl' INTO TABLE orders FIELDS TERMINATED BY '|';
      load data local INFILE 'lineitem.tbl' INTO TABLE lineitem FIELDS TERMINATED BY '|';
    2. Load data.
      source ./load.ddl
  6. Create primary keys and foreign keys.
    source ./dss.ri

    This example uses the tpch100g database that was previously created. Replace the content in the dss.ri file in TPC-H with the following content.

    use tpch100g;
    -- ALTER TABLE REGION DROP PRIMARY KEY;
    -- ALTER TABLE NATION DROP PRIMARY KEY;
    -- ALTER TABLE PART DROP PRIMARY KEY;
    -- ALTER TABLE SUPPLIER DROP PRIMARY KEY;
    -- ALTER TABLE PARTSUPP DROP PRIMARY KEY;
    -- ALTER TABLE ORDERS DROP PRIMARY KEY;
    -- ALTER TABLE LINEITEM DROP PRIMARY KEY;
    -- ALTER TABLE CUSTOMER DROP PRIMARY KEY;
    -- For table REGION
    ALTER TABLE REGION
    ADD PRIMARY KEY (R_REGIONKEY);
    -- For table NATION
    ALTER TABLE NATION
    ADD PRIMARY KEY (N_NATIONKEY);
    ALTER TABLE NATION
    ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references REGION(R_REGIONKEY);
    COMMIT WORK;
    -- For table PART
    ALTER TABLE PART
    ADD PRIMARY KEY (P_PARTKEY);
    COMMIT WORK;
    -- For table SUPPLIER
    ALTER TABLE SUPPLIER
    ADD PRIMARY KEY (S_SUPPKEY);
    ALTER TABLE SUPPLIER
    ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references NATION(N_NATIONKEY);
    COMMIT WORK;
    -- For table PARTSUPP
    ALTER TABLE PARTSUPP
    ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY);
    COMMIT WORK;
    -- For table CUSTOMER
    ALTER TABLE CUSTOMER
    ADD PRIMARY KEY (C_CUSTKEY);
    ALTER TABLE CUSTOMER
    ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references NATION(N_NATIONKEY);
    COMMIT WORK;
    -- For table LINEITEM
    ALTER TABLE LINEITEM
    ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER);
    COMMIT WORK;
    -- For table ORDERS
    ALTER TABLE ORDERS
    ADD PRIMARY KEY (O_ORDERKEY);
    COMMIT WORK;
    -- For table PARTSUPP
    ALTER TABLE PARTSUPP
    ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references SUPPLIER(S_SUPPKEY);
    COMMIT WORK;
    ALTER TABLE PARTSUPP
    ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references PART(P_PARTKEY);
    COMMIT WORK;
    -- For table ORDERS
    ALTER TABLE ORDERS
    ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references CUSTOMER(C_CUSTKEY);
    COMMIT WORK;
    -- For table LINEITEM
    ALTER TABLE LINEITEM
    ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY)  references ORDERS(O_ORDERKEY);
    COMMIT WORK;
    ALTER TABLE LINEITEM
    ADD FOREIGN KEY LINEITEM_FK2 (L_PARTKEY,L_SUPPKEY) references 
            PARTSUPP(PS_PARTKEY,PS_SUPPKEY);
    COMMIT WORK;
  7. Create indexes and collect table statistics.
    #! /usr/bin/bash
    host=$1
    port=$2
    user=$3
    password=$4
    db=$5
    sqls=("create index i_s_nationkey on supplier (s_nationkey);"
    "create index i_ps_partkey on partsupp (ps_partkey);"
    "create index i_ps_suppkey on partsupp (ps_suppkey);"
    "create index i_c_nationkey on customer (c_nationkey);"
    "create index i_o_custkey on orders (o_custkey);"
    "create index i_o_orderdate on orders (o_orderdate);"
    "create index i_l_orderkey on lineitem (l_orderkey);"
    "create index i_l_partkey on lineitem (l_partkey);"
    "create index i_l_suppkey on lineitem (l_suppkey);"
    "create index i_l_partkey_suppkey on lineitem (l_partkey, l_suppkey);"
    "create index i_l_shipdate on lineitem (l_shipdate);"
    "create index i_l_commitdate on lineitem (l_commitdate);"
    "create index i_l_receiptdate on lineitem (l_receiptdate);"
    "create index i_n_regionkey on nation (n_regionkey);"
    "analyze table supplier"
    "analyze table part"
    "analyze table partsupp"
    "analyze table customer"
    "analyze table orders"
    "analyze table lineitem"
    "analyze table nation"
    "analyze table region")
    for sql in "${sqls[@]}"
    do
        mysql -h$host -P$port -u$user -p$password -D$db  -e "$sql"
    done
    Note To more accurately measure the performance improvement brought by parallel query, you can use the following script to preload the index data used by the queries into the InnoDB buffer pool.
    #! /bin/bash
    host=$1
    port=$2
    user=$3
    password=$4
    dbname=$5
    MYSQL="mysql -h$host -P$port -u$user -p$password -D$dbname"
    if [ -z "${dbname}" ]; then
        echo "dbname not defined."
        exit 1
    fi
    table_indexes=(
            "supplier PRIMARY"
            "supplier i_s_nationkey"
            "part PRIMARY"
            "partsupp PRIMARY"
            "partsupp i_ps_partkey"
            "partsupp i_ps_suppkey"
            "customer PRIMARY"
            "customer i_c_nationkey"
            "orders PRIMARY"
            "orders i_o_custkey"
            "orders i_o_orderdate"
            "lineitem PRIMARY"
            "lineitem i_l_orderkey"
            "lineitem i_l_partkey"
            "lineitem i_l_suppkey"
            "lineitem i_l_partkey_suppkey"
            "lineitem i_l_shipdate"
            "lineitem i_l_commitdate"
            "lineitem i_l_receiptdate"
            "nation i_n_regionkey"
            "nation PRIMARY"
            "region PRIMARY"
    )
    for table_index in "${table_indexes[@]}"
    do
        ti=($table_index)
        table=${ti[0]}
        index=${ti[1]}
        SQL="select count(*) from ${table} force index(${index})"
        echo "$MYSQL -e '$SQL'"
        $MYSQL -e "$SQL"
    done
  8. Run the queries.
    #! /usr/bin/env bash
    host=$1
    port=$2
    user=$3
    password=$4
    database=$5
    resfile=$6
    echo "start test run at "`date "+%Y-%m-%d %H:%M:%S"`|tee -a ${resfile}.out
    for (( i=1; i<=22; i=i+1 ))
    do
        queryfile="Q"${i}".sql"
        start_time=`date "+%s.%N"`
        echo "run query ${i}"|tee -a ${resfile}.out
        mysql -h${host} -P${port} -u${user} -p${password} $database -e "source $queryfile;" |tee -a ${resfile}.out
        end_time=`date "+%s.%N"`
        start_s=${start_time%.*}
        start_nanos=${start_time#*.}
        end_s=${end_time%.*}
        end_nanos=${end_time#*.}
        if [ "$end_nanos" -lt "$start_nanos" ]; then
            end_s=$(( 10#$end_s - 1 ))
            end_nanos=$(( 10#$end_nanos + 10**9 ))
        fi
        time=$(( 10#$end_s - 10#$start_s )).`printf "%03d\n" $(( (10#$end_nanos - 10#$start_nanos)/10**6 ))`
        echo ${queryfile}" cost "${time}" seconds, start at "`date -d @$start_s "+%Y-%m-%d %H:%M:%S"`", stop at "`date -d @$end_s "+%Y-%m-%d %H:%M:%S"` >> ${resfile}.time
    done
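    The elapsed-time arithmetic used in the script above can be exercised in isolation; a minimal sketch with fixed timestamps (the borrow step handles the case where the end nanoseconds are smaller than the start nanoseconds):

```shell
#!/bin/bash
# Compute elapsed seconds.milliseconds between two fractional timestamps,
# using the same second/nanosecond borrow logic as the test script.
start_time="10.900000000"
end_time="12.100000000"
start_s=${start_time%.*}; start_nanos=${start_time#*.}
end_s=${end_time%.*};     end_nanos=${end_time#*.}
if [ "$end_nanos" -lt "$start_nanos" ]; then
    end_s=$(( 10#$end_s - 1 ))               # borrow one second
    end_nanos=$(( 10#$end_nanos + 10**9 ))
fi
elapsed=$(( 10#$end_s - 10#$start_s )).$(printf "%03d" $(( (10#$end_nanos - 10#$start_nanos) / 10**6 )))
echo "$elapsed"   # 1.200
```

    The `10#` prefix forces base-10 interpretation so that nanosecond fields with leading zeros are not parsed as octal.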

Result

The following figures illustrate how query performance is improved when parallel query is enabled.

Comparison
Improvement made by parallel query
Note In the preceding figures, the letter Q is short for query. For example, Q1 represents the first query.

The test results are as follows.

Query   Time with DOP = 16 (seconds)   Time with DOP = 0 (seconds)   Performance increase (DOP 0/DOP 16)
Q1      80.18                          1290.6                        16
Q2      1.44                           11.8                          8
Q3      25.05                          244.92                        10
Q4      6.91                           59.61                         9
Q5      24.44                          231.18                        9
Q6      14.51                          217.42                        15
Q7      51.97                          410.59                        8
Q8      5.61                           57.52                         10
Q9      37.84                          415.11                        11
Q10     38.72                          139.73                        4
Q11     11.75                          30.67                         3
Q12     15.89                          245.19                        15
Q13     104.12                         718.2                         7
Q14     8.31                           66.66                         8
Q15     32.5                           123.79                        4
Q16     26.9                           37.54                         1
Q17     16.2                           54.34                         3
Q18     66.77                          240.28                        4
Q19     1.58                           18.62                         12
Q20     45.88                          46.91                         1
Q21     53.99                          253.27                        5
Q22     2.07                           17.08                         8
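The last column can be recomputed from the two timing columns; a quick consistency check on a few rows (values taken from the table above, rounded to the nearest integer):

```shell
#!/bin/bash
# Recompute the speedup (time at DOP 0 divided by time at DOP 16).
speedup() { awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.0f\n", slow/fast }'; }
q1=$(speedup 80.18 1290.6)   # Q1
q2=$(speedup 1.44 11.8)      # Q2
q16=$(speedup 26.9 37.54)    # Q16
echo "Q1: ${q1}x, Q2: ${q2}x, Q16: ${q16}x"   # Q1: 16x, Q2: 8x, Q16: 1x
```

The gains vary widely across queries because only some operators in each plan benefit from parallel scanning; queries such as Q16 and Q20 are dominated by steps that run serially either way.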