This topic describes how to use TPC-H to test the online analytical processing (OLAP)
performance of a PolarDB for MySQL 8.0 cluster. You can follow the instructions in this topic to perform tests to understand
the performance of database systems.
Introduction to parallel query
The parallel query framework is introduced in PolarDB for MySQL 8.0 clusters. Parallel query is disabled by default. After you enable it, the framework is automatically activated when the amount of data to be queried reaches a specified threshold. This reduces the time required to run the query.
Note You can specify the
loose_max_parallel_degree parameter to enable parallel query. For more information about how to specify cluster
parameters, see
Specify cluster parameters.
The following items describe the valid values of the loose_max_parallel_degree parameter:
- The minimum value is 0, which disables parallel query.
- The maximum value is 1024.
- We recommend that you set the parameter to 16.
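After you change the value in the console, you can verify the effective setting from a client session. The following is a minimal sketch with placeholder connection values; note that the loose_ prefix is a console convention, so the runtime variable typically appears without it:
# Verify the effective parallel query setting from the ECS instance.
# <host> and <user> are placeholders for your cluster endpoint and account.
mysql -h<host> -P3306 -u<user> -p -e "SHOW VARIABLES LIKE '%max_parallel_degree%';"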
At the storage layer of a PolarDB for MySQL 8.0 cluster, data shards are distributed to different threads. These threads perform computations in parallel and return their partial results to the leader thread. The leader thread then merges the results and returns the final result to the user. This improves query efficiency.
Parallel query is implemented based on the parallel processing capability of multi-core CPUs. The following figure shows how parallel query works, using a cluster with 8 cores and 32 GB of memory as an example.

The following sections describe the method and results of testing workload performance on a PolarDB cluster with the parallel query parameter set to 16 and to 0.
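Before you run the full benchmark, you can confirm that a statement is actually parallelized by inspecting its execution plan. The following sketch assumes the tpch100g test database created later in this topic; in PolarDB, the plan of a parallelized query typically contains a parallel (gather) operator, although the exact EXPLAIN wording varies by version.
# Check whether the optimizer chooses a parallel plan for a large scan.
# <host> and <user> are placeholders.
mysql -h<host> -P3306 -u<user> -p -Dtpch100g -e "EXPLAIN SELECT COUNT(*) FROM lineitem;"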
Test environment
- The Elastic Compute Service (ECS) instance and the PolarDB cluster to be tested must be deployed in the same zone of the same region. In this example, Zone I of the China (Hangzhou) region is used.
- The network type is virtual private cloud (VPC).
Note Make sure that the ECS instance and the PolarDB cluster are in the same VPC.
- The following PolarDB cluster is used for testing:
- The node type is polar.mysql.x8.4xlarge (32 cores and 256 GB).
- The database version is MySQL 8.0.
- Two nodes (one primary node and one read-only node) are available.
- The primary endpoint is used as the connection string. For more information about how to query the primary endpoint of a PolarDB cluster, see View or apply for an endpoint.
- The following ECS instance is used for testing:
- The instance type is ecs.c5.4xlarge.
- An ultra disk of 1,000 GB is attached to the instance.
- The image used by the instance is 64-bit CentOS 7.0.
Test tool
TPC-H is a benchmark that is widely used in the industry. It is formulated and released by the Transaction Processing Performance Council (TPC) to evaluate the analytical query capabilities of databases. The TPC-H benchmark consists of eight tables and 22 complex Structured Query Language (SQL) queries. Most of the queries contain joins on multiple tables, subqueries, and GROUP BY aggregations.
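To illustrate the shape of the workload, the following sketch runs TPC-H Q1, the pricing summary report, which scans and aggregates most of the lineitem table and therefore benefits strongly from parallel query. The SQL is the standard Q1 template with its default substitution parameter (90 days); the connection values are placeholders.
# Run TPC-H Q1 against the test database. <host> and <user> are placeholders.
mysql -h<host> -P3306 -u<user> -p -Dtpch100g <<'EOF'
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from lineitem
where l_shipdate <= date '1998-12-01' - interval '90' day
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
EOF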
Install TPC-H
- Install TPC-H on the ECS instance.
Note
- The TPC-H version used in this topic is v2.18.0.
- Before you download TPC-H, make sure that you have registered on the TPC website.
- Go to the dbgen directory.
- Copy the makefile.suite file to create the makefile.
cp makefile.suite makefile
- Modify parameters, such as CC, DATABASE, MACHINE, and WORKLOAD, in the
makefile
file.
- Open the
makefile
file.
- Modify the definitions of the CC, DATABASE, MACHINE, and WORKLOAD parameters.
################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, ORACLE,
# SQLSERVER, SYBASE, TDAT (Teradata)
# Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
# SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE= MYSQL
MACHINE = LINUX
WORKLOAD = TPCH
- Press Esc and then enter :wq to save the file and exit.
- Modify the
tpcd.h
file and add new macro definitions.
- Open the
tpcd.h
file.
- Add the following macro definitions:
#ifdef MYSQL
#define GEN_QUERY_PLAN ""
#define START_TRAN "START TRANSACTION"
#define END_TRAN "COMMIT"
#define SET_OUTPUT ""
#define SET_ROWCOUNT "limit %d;\n"
#define SET_DBASE "use %s;\n"
#endif
- Press Esc and then enter :wq to save the file and exit.
- Compile the file.
make
After the file is compiled, the following two executable files are generated in this
directory:
dbgen: the tool that generates data. If you use the official test script of InfiniDB, you must use this tool to generate the TPC-H table data.
qgen: the tool that generates SQL queries. Queries generated with different seeds differ. To ensure repeatability, use the 22 queries that are provided in the attachment.
- Use TPC-H to generate test data.
./dbgen -s 100
The -s option of dbgen specifies the scale factor. -s 100 generates about 100 GB of test data. To validate the toolchain at a smaller scale first, see the smoke-test sketch after this procedure.
- Use TPC-H to generate a query.
Note To ensure the repeatability of test results, you can skip the following query generation steps and instead use the 22 queries provided in the attachment. For more information, see Attachment.
- Copy qgen and dists.dss to the queries directory.
cp qgen queries
cp dists.dss queries
- Use the following script to generate a query:
#! /usr/bin/bash
for i in {1..22}
do
./qgen -d $i -s 100 > db"$i".sql
done
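Before you generate the full 100 GB data set, you can dry-run the toolchain at a small scale factor. The following is a sketch; dbgen writes one .tbl file per TPC-H table into the current directory.
# Smoke test: generate a 1 GB data set and inspect the output.
cd dbgen
./dbgen -s 1 -f    # -f overwrites any existing .tbl files
ls -lh *.tbl       # expect eight files, one per TPC-H table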
Test method
- Set the loose_max_parallel_degree parameter to enable or disable parallel query.
Note This example compares the performance of the cluster with parallel query enabled and disabled. Set the parameter to 16 and run the test, then set it to 0 and run the test again for comparison. For more information about how to set the parameter, see Specify cluster parameters.
- On the ECS instance, connect to the PolarDB database. For more information about this operation, see Connect to a PolarDB for MySQL cluster.
- Create a database. In this example, a database named tpch100g is created, as sketched below.
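TPC-H does not mandate a database name. The dss.ri script shown later in this topic assumes a database named tpch100g, so this minimal sketch uses that name; <host> and <user> are placeholders.
# Create the test database on the PolarDB cluster.
mysql -h<host> -P3306 -u<user> -p -e "CREATE DATABASE tpch100g;"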
- Create a table.
source ./dss.ddl
Note The dss.ddl file is in the dbgen directory of TPC-H.
- Load data.
- Use the following script to create load.ddl:
load data local INFILE 'customer.tbl' INTO TABLE customer FIELDS TERMINATED BY '|';
load data local INFILE 'region.tbl' INTO TABLE region FIELDS TERMINATED BY '|';
load data local INFILE 'nation.tbl' INTO TABLE nation FIELDS TERMINATED BY '|';
load data local INFILE 'supplier.tbl' INTO TABLE supplier FIELDS TERMINATED BY '|';
load data local INFILE 'part.tbl' INTO TABLE part FIELDS TERMINATED BY '|';
load data local INFILE 'partsupp.tbl' INTO TABLE partsupp FIELDS TERMINATED BY '|';
load data local INFILE 'orders.tbl' INTO TABLE orders FIELDS TERMINATED BY '|';
load data local INFILE 'lineitem.tbl' INTO TABLE lineitem FIELDS TERMINATED BY '|';
- Run the load.ddl script to load the data.
source ./load.ddl
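LOAD DATA LOCAL INFILE requires local-infile support on both the server and the client. If the load fails with an error such as "The used command is not allowed with this MySQL version", you can enable it as sketched below, assuming an account with sufficient privileges; <host> and <user> are placeholders.
# Enable LOCAL INFILE on the server, then rerun the load with a client that allows it.
mysql -h<host> -P3306 -u<user> -p -e "SET GLOBAL local_infile = ON;"
mysql -h<host> -P3306 -u<user> -p --local-infile=1 -Dtpch100g -e "source ./load.ddl;"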
- Create primary keys and foreign keys.
source ./dss.ri
Replace the content of the dss.ri file of TPC-H with the following code. In this example, the previously created tpch100g database is used.
use TPCH100G;
-- ALTER TABLE REGION DROP PRIMARY KEY;
-- ALTER TABLE NATION DROP PRIMARY KEY;
-- ALTER TABLE PART DROP PRIMARY KEY;
-- ALTER TABLE SUPPLIER DROP PRIMARY KEY;
-- ALTER TABLE PARTSUPP DROP PRIMARY KEY;
-- ALTER TABLE ORDERS DROP PRIMARY KEY;
-- ALTER TABLE LINEITEM DROP PRIMARY KEY;
-- ALTER TABLE CUSTOMER DROP PRIMARY KEY;
-- For table REGION
ALTER TABLE REGION
ADD PRIMARY KEY (R_REGIONKEY);
-- For table NATION
ALTER TABLE NATION
ADD PRIMARY KEY (N_NATIONKEY);
ALTER TABLE NATION
ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references REGION(R_REGIONKEY);
COMMIT WORK;
-- For table PART
ALTER TABLE PART
ADD PRIMARY KEY (P_PARTKEY);
COMMIT WORK;
-- For table SUPPLIER
ALTER TABLE SUPPLIER
ADD PRIMARY KEY (S_SUPPKEY);
ALTER TABLE SUPPLIER
ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references NATION(N_NATIONKEY);
COMMIT WORK;
-- For table PARTSUPP
ALTER TABLE PARTSUPP
ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY);
COMMIT WORK;
-- For table CUSTOMER
ALTER TABLE CUSTOMER
ADD PRIMARY KEY (C_CUSTKEY);
ALTER TABLE CUSTOMER
ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references NATION(N_NATIONKEY);
COMMIT WORK;
-- For table LINEITEM
ALTER TABLE LINEITEM
ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER);
COMMIT WORK;
-- For table ORDERS
ALTER TABLE ORDERS
ADD PRIMARY KEY (O_ORDERKEY);
COMMIT WORK;
-- For table PARTSUPP
ALTER TABLE PARTSUPP
ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references SUPPLIER(S_SUPPKEY);
COMMIT WORK;
ALTER TABLE PARTSUPP
ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references PART(P_PARTKEY);
COMMIT WORK;
-- For table ORDERS
ALTER TABLE ORDERS
ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references CUSTOMER(C_CUSTKEY);
COMMIT WORK;
-- For table LINEITEM
ALTER TABLE LINEITEM
ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY) references ORDERS(O_ORDERKEY);
COMMIT WORK;
ALTER TABLE LINEITEM
ADD FOREIGN KEY LINEITEM_FK2 (L_PARTKEY,L_SUPPKEY) references
PARTSUPP(PS_PARTKEY,PS_SUPPKEY);
COMMIT WORK;
- Create indexes and collect statistics by using the following script:
#! /usr/bin/bash
host=$1
port=$2
user=$3
password=$4
db=$5
sqls=("create index i_s_nationkey on supplier (s_nationkey);"
"create index i_ps_partkey on partsupp (ps_partkey);"
"create index i_ps_suppkey on partsupp (ps_suppkey);"
"create index i_c_nationkey on customer (c_nationkey);"
"create index i_o_custkey on orders (o_custkey);"
"create index i_o_orderdate on orders (o_orderdate);"
"create index i_l_orderkey on lineitem (l_orderkey);"
"create index i_l_partkey on lineitem (l_partkey);"
"create index i_l_suppkey on lineitem (l_suppkey);"
"create index i_l_partkey_suppkey on lineitem (l_partkey, l_suppkey);"
"create index i_l_shipdate on lineitem (l_shipdate);"
"create index i_l_commitdate on lineitem (l_commitdate);"
"create index i_l_receiptdate on lineitem (l_receiptdate);"
"create index i_n_regionkey on nation (n_regionkey);"
"analyze table supplier"
"analyze table part"
"analyze table partsupp"
"analyze table customer"
"analyze table orders"
"analyze table lineitem"
"analyze table nation"
"analyze table region")
for sql in "${sqls[@]}"
do
mysql -h$host -P$port -u$user -p$password -D$db -e "$sql"
done
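The script takes the connection parameters as positional arguments. A hypothetical invocation, assuming you saved the script as create_index.sh:
# Create the indexes and collect statistics on the test database.
chmod +x create_index.sh
./create_index.sh <host> 3306 <user> <password> tpch100g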
Note To evaluate the performance improvement brought by parallel query, you can run the following script to preload the index data into the InnoDB buffer pool:
#! /bin/bash
host=$1
port=$2
user=$3
password=$4
dbname=$5
MYSQL="mysql -h$host -P$port -u$user -p$password -D$dbname"
if [ -z "${dbname}" ]; then
echo "dbname not defined."
exit 1
fi
table_indexes=(
"supplier PRIMARY"
"supplier i_s_nationkey"
"part PRIMARY"
"partsupp PRIMARY"
"partsupp i_ps_partkey"
"partsupp i_ps_suppkey"
"customer PRIMARY"
"customer i_c_nationkey"
"orders PRIMARY"
"orders i_o_custkey"
"orders i_o_orderdate"
"lineitem PRIMARY"
"lineitem i_l_orderkey"
"lineitem i_l_partkey"
"lineitem i_l_suppkey"
"lineitem i_l_partkey_suppkey"
"lineitem i_l_shipdate"
"lineitem i_l_commitdate"
"lineitem i_l_receiptdate"
"nation i_n_regionkey"
"nation PRIMARY"
"region PRIMARY"
)
for table_index in "${table_indexes[@]}"
do
ti=($table_index)
table=${ti[0]}
index=${ti[1]}
SQL="select count(*) from ${table} force index(${index})"
echo "$MYSQL -e '$SQL'"
$MYSQL -e "$SQL"
done
- Run the query.
#! /usr/bin/env bash
host=$1
port=$2
user=$3
password=$4
database=$5
resfile=$6
echo "start test run at"`date "+%Y-%m-%d %H:%M:%S"`|tee -a ${resfile}.out
for (( i=1; i<=22;i=i+1 ))
do
queryfile="Q"${i}".sql"
start_time=`date "+%s. %N"`
echo "run query ${i}"|tee -a ${resfile}.out
mysql -h ${host} -P${port} -u${user} -p${password} $database -e" source $queryfile;" |tee -a ${resfile}.out
end_time=`date "+%s. %N"`
start_s=${start_time%.*}
start_nanos=${start_time#*.}
end_s=${end_time%.*}
end_nanos=${end_time#*.}
if [ "$end_nanos" -lt "$start_nanos" ];then
end_s=$(( 10#$end_s -1 ))
end_nanos=$(( 10#$end_nanos + 10 ** 9))
fi
time=$(( 10#$end_s - 10#$start_s )).`printf "%03d\n" $(( (10#$end_nanos - 10#$start_nanos)/10**6 ))`
echo ${queryfile} "the "${i}" run cost "${time}" second start at "`date -d @$start_time "+%Y-%m-%d %H:%M:%S"`" stop at "`date -d @$end_time "+%Y-%m-%d %H:%M:%S"` >> ${resfile}.time
done
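Assuming you saved the script above as run_queries.sh (a hypothetical name) and the 22 query files are in the current directory, a test pass for each parameter setting might look like this:
# Run one pass with parallel query at DOP 16 and one with it disabled (DOP 0);
# set loose_max_parallel_degree in the console before each pass.
./run_queries.sh <host> 3306 <user> <password> tpch100g pq16
./run_queries.sh <host> 3306 <user> <password> tpch100g pq0
# Per-query wall-clock times are appended to pq16.time and pq0.time.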
Result
The following figures show the comparison of test results when the parallel query parameter is set to 16 and to 0.


Note Q1 represents the first query, Q2 represents the second query, and so on.
The following table lists the test results.
Query | Time consumed (seconds), DOP = 16 | Time consumed (seconds), DOP = 0 | Speedup (time at DOP = 0 / time at DOP = 16)
Q1 | 80.18 | 1290.6 | 16
Q2 | 1.44 | 11.8 | 8
Q3 | 25.05 | 244.92 | 10
Q4 | 6.91 | 59.61 | 9
Q5 | 24.44 | 231.18 | 9
Q6 | 14.51 | 217.42 | 15
Q7 | 51.97 | 410.59 | 8
Q8 | 5.61 | 57.52 | 10
Q9 | 37.84 | 415.11 | 11
Q10 | 38.72 | 139.73 | 4
Q11 | 11.75 | 30.67 | 3
Q12 | 15.89 | 245.19 | 15
Q13 | 104.12 | 718.2 | 7
Q14 | 8.31 | 66.66 | 8
Q15 | 32.5 | 123.79 | 4
Q16 | 26.9 | 37.54 | 1
Q17 | 16.2 | 54.34 | 3
Q18 | 66.77 | 240.28 | 4
Q19 | 1.58 | 18.62 | 12
Q20 | 45.88 | 46.91 | 1
Q21 | 53.99 | 253.27 | 5
Q22 | 2.07 | 17.08 | 8