This topic describes how to run a TPC-H benchmark against a PolarDB for MySQL 8.0 Cluster Edition cluster to measure OLAP parallel query performance.
TPC-H stresses the query patterns that parallel query is designed to accelerate: multi-table JOINs, subqueries, and GROUP BY aggregations across large datasets. Running these 22 queries with and without parallel query enabled gives you a reproducible baseline for evaluating the performance gains in your own environment.
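Running the comparison requires toggling parallel query between runs. In PolarDB for MySQL 8.0, the degree of parallelism is governed by the `max_parallel_degree` parameter (0 disables parallel query). The statements below are a sketch based on that parameter name — confirm the exact name and whether it can be set at the session level for your cluster version in the PolarDB console.

```sql
-- Baseline run: disable parallel query for the session.
SET max_parallel_degree = 0;
-- Parallel run: allow up to 32 parallel workers per query.
SET max_parallel_degree = 32;
-- Per-statement override via a MySQL 8.0 SET_VAR optimizer hint:
SELECT /*+ SET_VAR(max_parallel_degree=32) */ COUNT(*) FROM lineitem;
```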
Test environment
The ECS instance and the PolarDB for MySQL cluster are deployed in the same region, zone, and VPC.
PolarDB for MySQL cluster
| Parameter | Value |
|---|---|
| Number of clusters | 1 |
| Database engine | MySQL 8.0.1 or MySQL 8.0.2 |
| Product edition | Enterprise Edition |
| Edition | Cluster Edition |
| Specification type | Dedicated |
| Node specifications | polar.mysql.x8.4xlarge (32 cores, 256 GB) |
| Number of nodes | 2 (1 primary node, 1 read-only node) |
The test uses the primary endpoint to connect to the cluster. For more information, see Manage the endpoints of a cluster.
ECS instance
| Parameter | Value |
|---|---|
| Instance type | ecs.c5.4xlarge (16 vCPU, 32 GiB) |
| Image | CentOS 7.0 64-bit |
| System disk | ESSD, 1000 GB |
Test tool
TPC-H is a benchmark developed by the Transaction Processing Performance Council (TPC) to evaluate database analytical query performance. It includes 8 tables and 22 complex SQL statements. Most queries combine multi-table JOINs, subqueries, and GROUP BY clauses — the workload patterns that benefit most from parallel execution.
This test is based on the TPC-H benchmark but does not satisfy all TPC-H requirements. The results cannot be compared with published TPC-H benchmark results.
Install TPC-H tools
Log in to the ECS instance as the root user before running the following commands.
This topic uses TPC-H_Tools_v2.18.0. Registration is required before downloading.
Register and download the TPC-H package, then upload it to the ECS instance. For more information, see Upload files.
1. Decompress the package. Replace `TPC-H-Tool.zip` with the actual filename.

    ```shell
    unzip TPC-H-Tool.zip
    ```

2. Go to the `dbgen` directory. Replace `TPC-H-Tool/dbgen` with the actual path.

    ```shell
    cd TPC-H-Tool/dbgen
    ```

3. Copy the `makefile.suite` file to `makefile`.

    ```shell
    cp makefile.suite makefile
    ```

4. Install GCC. This example uses CentOS. On Ubuntu, run `sudo apt install gcc` instead.

    ```shell
    sudo yum install gcc
    ```

5. Edit the `makefile` file and set the `CC`, `DATABASE`, `MACHINE`, and `WORKLOAD` parameters.

    ```shell
    vim makefile
    ```

    Press `i` to enter edit mode and set the following values:

    ```makefile
    ################
    ## CHANGE NAME OF ANSI COMPILER HERE
    ################
    CC = gcc
    # Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
    #                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
    # Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
    #                                  SGI, SUN, U2200, VMS, LINUX, WIN32
    # Current values for WORKLOAD are: TPCH
    DATABASE = MYSQL
    MACHINE  = LINUX
    WORKLOAD = TPCH
    ```

    Press `Esc`, type `:wq`, and press `Enter` to save and exit.

6. Edit the `tpcd.h` file and add MySQL macro definitions.

    ```shell
    vim tpcd.h
    ```

    Press `i` to enter edit mode, locate the `database portability defines` section, and add the following:

    ```c
    #ifdef MYSQL
    #define GEN_QUERY_PLAN "EXPLAIN PLAN"
    #define START_TRAN "START TRANSACTION"
    #define END_TRAN "COMMIT"
    #define SET_OUTPUT ""
    #define SET_ROWCOUNT "limit %d;\n"
    #define SET_DBASE "use %s;\n"
    #endif
    ```

    Press `Esc`, type `:wq`, and press `Enter` to save and exit.

7. Compile the source.

    ```shell
    make
    ```

    After compilation, two executables are generated in the directory:

    - `dbgen`: generates test data
    - `qgen`: generates SQL statements (not used in this test; the 22 standard TPC-H queries are used instead)

8. Generate the test data with a Scale Factor of 100 (approximately 100 GB).

    ```shell
    ./dbgen -s 100
    ```

    The `-s` parameter sets the Scale Factor, which controls dataset size. `-s 100` generates approximately 100 GB of data; the exact size may vary slightly based on table structure and data distribution.

9. (Optional) Generate queries using `qgen`. You can skip this step to keep results reproducible: download the 22 standard TPC-H SQL statements here and use them directly.

    1. Copy `qgen` and `dists.dss` into the `queries` directory.

        ```shell
        cp qgen queries
        cp dists.dss queries
        ```

    2. Create the `generate_queries.sh` script.

        ```shell
        vim generate_queries.sh
        ```

        Press `i` and add:

        ```shell
        #!/usr/bin/bash
        # Go to the query directory.
        cd queries
        # Generate 22 queries.
        for i in {1..22}
        do
        ./qgen -d $i -s 100 > db"$i".sql
        done
        ```

        Press `Esc`, type `:wq`, and press `Enter`.

    3. Make the script executable and run it.

        ```shell
        chmod +x generate_queries.sh
        ./generate_queries.sh
        ```
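The TPC-H specification fixes each table's base cardinality at SF=1, and every table except `nation` and `region` scales linearly with the Scale Factor, so the generated `.tbl` files can be sanity-checked by line count. A minimal sketch (base counts are from the TPC-H specification; `lineitem` is only approximate because its row count varies slightly with the generated data):

```shell
#!/usr/bin/env bash
# expected_rows TABLE SF -> expected line count of TABLE.tbl at Scale Factor SF.
# Base cardinalities at SF=1 are fixed by the TPC-H specification.
expected_rows() {
  local t=$1 sf=$2
  case $t in
    region)   echo 5 ;;                     # fixed size, does not scale
    nation)   echo 25 ;;                    # fixed size, does not scale
    supplier) echo $(( 10000   * sf )) ;;
    part)     echo $(( 200000  * sf )) ;;
    partsupp) echo $(( 800000  * sf )) ;;
    customer) echo $(( 150000  * sf )) ;;
    orders)   echo $(( 1500000 * sf )) ;;
    lineitem) echo $(( 6000000 * sf )) ;;   # approximate (~6,001,215 at SF=1)
    *) echo "unknown table: $t" >&2; return 1 ;;
  esac
}

# Compare a generated file against the expectation, e.g. for orders at SF=100:
#   [ "$(wc -l < orders.tbl)" -eq "$(expected_rows orders 100)" ] && echo OK
expected_rows orders 100   # prints 150000000
```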
Run the test
Connect to the PolarDB for MySQL cluster from the ECS instance, initialize the schema, load data, and run the 22 TPC-H queries.
Set up the schema and load data
1. Create the `load.ddl` file in the `dbgen` directory to define the data load sequence.

    ```shell
    vim load.ddl
    ```

    Press `i` and add:

    ```sql
    load data local INFILE 'customer.tbl' INTO TABLE customer FIELDS TERMINATED BY '|';
    load data local INFILE 'region.tbl' INTO TABLE region FIELDS TERMINATED BY '|';
    load data local INFILE 'nation.tbl' INTO TABLE nation FIELDS TERMINATED BY '|';
    load data local INFILE 'supplier.tbl' INTO TABLE supplier FIELDS TERMINATED BY '|';
    load data local INFILE 'part.tbl' INTO TABLE part FIELDS TERMINATED BY '|';
    load data local INFILE 'partsupp.tbl' INTO TABLE partsupp FIELDS TERMINATED BY '|';
    load data local INFILE 'orders.tbl' INTO TABLE orders FIELDS TERMINATED BY '|';
    load data local INFILE 'lineitem.tbl' INTO TABLE lineitem FIELDS TERMINATED BY '|';
    ```

    Press `Esc`, type `:wq`, and press `Enter`.

2. Modify `dss.ri` to define primary and foreign keys. Back up the original file and clear it:

    ```shell
    cp dss.ri dss_bk.ri
    > dss.ri
    ```

    Open `dss.ri` and replace its content:

    ```shell
    vim dss.ri
    ```

    Press `i` and add:

    ```sql
    use tpch100g;
    -- ALTER TABLE REGION DROP PRIMARY KEY;
    -- ALTER TABLE NATION DROP PRIMARY KEY;
    -- ALTER TABLE PART DROP PRIMARY KEY;
    -- ALTER TABLE SUPPLIER DROP PRIMARY KEY;
    -- ALTER TABLE PARTSUPP DROP PRIMARY KEY;
    -- ALTER TABLE ORDERS DROP PRIMARY KEY;
    -- ALTER TABLE LINEITEM DROP PRIMARY KEY;
    -- ALTER TABLE CUSTOMER DROP PRIMARY KEY;

    -- For table REGION
    ALTER TABLE REGION ADD PRIMARY KEY (R_REGIONKEY);

    -- For table NATION
    ALTER TABLE NATION ADD PRIMARY KEY (N_NATIONKEY);
    ALTER TABLE NATION ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references REGION(R_REGIONKEY);
    COMMIT WORK;

    -- For table PART
    ALTER TABLE PART ADD PRIMARY KEY (P_PARTKEY);
    COMMIT WORK;

    -- For table SUPPLIER
    ALTER TABLE SUPPLIER ADD PRIMARY KEY (S_SUPPKEY);
    ALTER TABLE SUPPLIER ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references NATION(N_NATIONKEY);
    COMMIT WORK;

    -- For table PARTSUPP
    ALTER TABLE PARTSUPP ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY);
    COMMIT WORK;

    -- For table CUSTOMER
    ALTER TABLE CUSTOMER ADD PRIMARY KEY (C_CUSTKEY);
    ALTER TABLE CUSTOMER ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references NATION(N_NATIONKEY);
    COMMIT WORK;

    -- For table LINEITEM
    ALTER TABLE LINEITEM ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER);
    COMMIT WORK;

    -- For table ORDERS
    ALTER TABLE ORDERS ADD PRIMARY KEY (O_ORDERKEY);
    COMMIT WORK;

    -- For table PARTSUPP
    ALTER TABLE PARTSUPP ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references SUPPLIER(S_SUPPKEY);
    COMMIT WORK;
    ALTER TABLE PARTSUPP ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references PART(P_PARTKEY);
    COMMIT WORK;

    -- For table ORDERS
    ALTER TABLE ORDERS ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references CUSTOMER(C_CUSTKEY);
    COMMIT WORK;

    -- For table LINEITEM
    ALTER TABLE LINEITEM ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY) references ORDERS(O_ORDERKEY);
    COMMIT WORK;
    ALTER TABLE LINEITEM ADD FOREIGN KEY LINEITEM_FK2 (L_PARTKEY,L_SUPPKEY) references PARTSUPP(PS_PARTKEY,PS_SUPPKEY);
    COMMIT WORK;
    ```

    Press `Esc`, type `:wq`, and press `Enter`.

3. Install the MySQL client. This example uses CentOS. Adjust the command for your operating system.

    ```shell
    sudo yum install mysql
    ```

4. Connect to the PolarDB for MySQL cluster using the privileged account and the primary endpoint. For more information, see Create a privileged account and Connect to a database.

    ```shell
    mysql --local-infile=1 -h <primary-endpoint> -P <port> -u <username> -p<password>
    ```

    There is no space between `-p` and the password. The `--local-infile=1` option enables `load data local infile` on the client side; without it, recent MySQL clients may reject the load in step 7 with an error stating that loading local data is disabled (the server-side `local_infile` setting must also allow it). If you see `Access denied for user 'xxx'@'xxx' (using password: YES)`, the username or password is incorrect.

5. Create the `tpch100g` database and switch to it.

    ```sql
    CREATE DATABASE tpch100g;
    use tpch100g;
    ```

6. Create the tables using the `dss.ddl` schema file in the `dbgen` directory.

    ```sql
    source ./dss.ddl
    ```

7. Load the data using the `load.ddl` file created in step 1.

    ```sql
    source ./load.ddl
    ```

8. Apply primary and foreign keys using the `dss.ri` file modified in step 2.

    ```sql
    source ./dss.ri
    ```
Create indexes (optional)
Indexes improve query performance for some of the 22 TPC-H queries. Run `exit;` to leave the MySQL client, then create and run the following script on the ECS instance.
```shell
#!/usr/bin/bash
host=$1
port=$2
user=$3
password=$4
db=$5
sqls=("create index i_s_nationkey on supplier (s_nationkey);"
"create index i_ps_partkey on partsupp (ps_partkey);"
"create index i_ps_suppkey on partsupp (ps_suppkey);"
"create index i_c_nationkey on customer (c_nationkey);"
"create index i_o_custkey on orders (o_custkey);"
"create index i_o_orderdate on orders (o_orderdate);"
"create index i_l_orderkey on lineitem (l_orderkey);"
"create index i_l_partkey on lineitem (l_partkey);"
"create index i_l_suppkey on lineitem (l_suppkey);"
"create index i_l_partkey_suppkey on lineitem (l_partkey, l_suppkey);"
"create index i_l_shipdate on lineitem (l_shipdate);"
"create index i_l_commitdate on lineitem (l_commitdate);"
"create index i_l_receiptdate on lineitem (l_receiptdate);"
"create index i_n_regionkey on nation (n_regionkey);"
"analyze table supplier"
"analyze table part"
"analyze table partsupp"
"analyze table customer"
"analyze table orders"
"analyze table lineitem"
"analyze table nation"
"analyze table region")
for sql in "${sqls[@]}"
do
    mysql -h$host -P$port -u$user -p$password -D$db -e "$sql"
done
```

Save the script as `create_indexes.sh`, make it executable, and run it:
```shell
chmod +x create_indexes.sh
./create_indexes.sh <host> <port> <user> <password> <database>
```

Replace the placeholders with your cluster details:
| Placeholder | Description |
|---|---|
| `<host>` | Primary endpoint of the cluster |
| `<port>` | Port number |
| `<user>` | Database username |
| `<password>` | Database password |
| `<database>` | Database name |
If you see `[Warning] Using a password on the command line interface can be insecure`, this is expected. The command passes the password as a command-line argument, which other users on the system can view by running `ps`. The warning does not affect the operation. After the test, change the database password in the PolarDB console.
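If you want to avoid the warning entirely, you can move the credentials out of the command line into a client option file and drop the `-p$password` arguments from the scripts. A sketch of `~/.my.cnf` (all values are placeholders for your own cluster details):

```ini
[client]
host=<primary-endpoint>
port=<port>
user=<username>
password=<password>
```

Restrict the file with `chmod 600 ~/.my.cnf`; the `mysql` client reads it automatically on startup.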
Preload indexes into memory
To measure the full benefit of parallel query, preload all indexes into the buffer pool before running the queries.
Parallel query bypasses the buffer pool during execution, so consecutive query runs do not benefit from cached data — each run incurs the full I/O cost. This means you may see elevated read I/O during the test; that is expected behavior, not a problem. Preloading the indexes establishes a consistent starting state so that all 22 queries begin from the same I/O baseline.
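Whether the preload achieved its goal can be spot-checked afterward with the standard MySQL 8.0 buffer pool statistics view (also exposed by PolarDB; query the node that will run the benchmark). A sketch:

```sql
-- Pages currently holding data versus free pages, per buffer pool instance.
SELECT POOL_ID, POOL_SIZE, DATABASE_PAGES, FREE_BUFFERS
FROM information_schema.INNODB_BUFFER_POOL_STATS;
```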
Run `exit;` to leave the MySQL client, then create and run the following script:
```shell
#!/bin/bash
host=$1
port=$2
user=$3
password=$4
dbname=$5
MYSQL="mysql -h$host -P$port -u$user -p$password -D$dbname"
if [ -z ${dbname} ]; then
    echo "dbname not defined."
    exit 1
fi
table_indexes=(
    "supplier PRIMARY"
    "supplier i_s_nationkey"
    "part PRIMARY"
    "partsupp PRIMARY"
    "partsupp i_ps_partkey"
    "partsupp i_ps_suppkey"
    "customer PRIMARY"
    "customer i_c_nationkey"
    "orders PRIMARY"
    "orders i_o_custkey"
    "orders i_o_orderdate"
    "lineitem PRIMARY"
    "lineitem i_l_orderkey"
    "lineitem i_l_partkey"
    "lineitem i_l_suppkey"
    "lineitem i_l_partkey_suppkey"
    "lineitem i_l_shipdate"
    "lineitem i_l_commitdate"
    "lineitem i_l_receiptdate"
    "nation i_n_regionkey"
    "nation PRIMARY"
    "region PRIMARY"
)
for table_index in "${table_indexes[@]}"
do
    ti=($table_index)
    table=${ti[0]}
    index=${ti[1]}
    SQL="select count(*) from ${table} force index(${index})"
    echo "$MYSQL -e '$SQL'"
    $MYSQL -e "$SQL"
done
```

Run the 22 TPC-H queries
Run `exit;` to leave the MySQL client, then create and run the following script from the `dbgen` directory:
```shell
#!/usr/bin/env bash
host=$1
port=$2
user=$3
password=$4
database=$5
resfile=$6
echo "start test run at"`date "+%Y-%m-%d %H:%M:%S"`|tee -a ${resfile}.out
for (( i=1; i<=22; i=i+1 ))
do
    queryfile="./queries/Q"${i}".sql"
    start_time=`date "+%s.%N"`
    echo "run query ${i}"|tee -a ${resfile}.out
    mysql -h ${host} -P${port} -u${user} -p${password} $database -e "source $queryfile;" |tee -a ${resfile}.out
    end_time=`date "+%s.%N"`
    start_s=${start_time%.*}
    start_nanos=${start_time#*.}
    end_s=${end_time%.*}
    end_nanos=${end_time#*.}
    # Borrow one second when the nanosecond field underflows.
    if [ "$end_nanos" -lt "$start_nanos" ]; then
        end_s=$(( 10#$end_s - 1 ))
        end_nanos=$(( 10#$end_nanos + 10**9 ))
    fi
    time=$(( 10#$end_s - 10#$start_s )).`printf "%03d\n" $(( (10#$end_nanos - 10#$start_nanos)/10**6 ))`
    echo ${queryfile} "query "${i}" cost "${time}" second start at "`date -d @$start_time "+%Y-%m-%d %H:%M:%S"`" stop at "`date -d @$end_time "+%Y-%m-%d %H:%M:%S"` >> ${resfile}.time
done
```

Save the script as `run_queries.sh`, make it executable, and run it:
```shell
chmod +x run_queries.sh
./run_queries.sh <host> <port> <user> <password> <database> <resfile>
```

Replace the placeholders with your cluster details:
| Placeholder | Description |
|---|---|
| `<host>` | Primary endpoint of the cluster |
| `<port>` | Port number |
| `<user>` | Database username |
| `<password>` | Database password |
| `<database>` | Database name |
| `<resfile>` | Name of the output results file |
Before running the script, confirm the following:
- The script has execute permissions (`chmod +x run_queries.sh`).
- The current directory is `dbgen`. The script references `./queries/Q${i}.sql`. If you run it from a different directory, update the `queryfile` path accordingly. An incorrect path causes the error `ERROR at line 1: Failed to open file 'QXX.sql'`.
- The 22 TPC-H query files are present in the `dbgen/queries` directory. Download them here if you have not already. If you generated queries with `qgen` instead, the files are named `db<i>.sql`; rename them or adjust `queryfile` to match.
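The per-query timing in `run_queries.sh` subtracts two `date +%s.%N` timestamps, borrowing one second when the nanosecond field underflows. That arithmetic can be isolated and checked on fixed inputs; a small sketch:

```shell
#!/usr/bin/env bash
# elapsed START END -> prints END-START in seconds with millisecond precision,
# where both arguments are "seconds.nanoseconds" strings as printed by `date +%s.%N`.
elapsed() {
  local start_s=${1%.*} start_nanos=${1#*.}
  local end_s=${2%.*}   end_nanos=${2#*.}
  # Borrow one second when the nanosecond field underflows.
  if [ "$end_nanos" -lt "$start_nanos" ]; then
    end_s=$(( 10#$end_s - 1 ))
    end_nanos=$(( 10#$end_nanos + 10**9 ))
  fi
  printf '%d.%03d\n' $(( 10#$end_s - 10#$start_s )) \
    $(( (10#$end_nanos - 10#$start_nanos) / 10**6 ))
}

elapsed 100.500000000 103.250000000   # prints 2.750
```

The `10#` prefix forces base-10 interpretation, which matters because `%N` pads with leading zeros and bash would otherwise read values such as `090000000` as invalid octal.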
If you see `[Warning] Using a password on the command line interface can be insecure`, this is expected. See the note in the index creation section above.

Results
For the full benchmark results, see Parallel query performance.