This topic describes how to run a TPC-H benchmark against a PolarDB for MySQL 8.0 Cluster Edition cluster to measure OLAP parallel query performance.
TPC-H stresses the query patterns that parallel query is designed to accelerate: multi-table JOINs, subqueries, and GROUP BY aggregations across large datasets. Running these 22 queries with and without parallel query enabled gives you a reproducible baseline for evaluating the performance gains in your own environment.
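Running the comparison requires toggling parallel query between runs. In PolarDB for MySQL 8.0, the degree of parallelism is governed by the `max_parallel_degree` parameter (0 disables parallel query). The statements below are a sketch based on that parameter name — confirm the exact name and whether it can be set at the session level for your cluster version in the PolarDB console.

```sql
-- Baseline run: disable parallel query for the session.
SET max_parallel_degree = 0;
-- Parallel run: allow up to 32 parallel workers per query.
SET max_parallel_degree = 32;
-- Per-statement override via a MySQL 8.0 SET_VAR optimizer hint:
SELECT /*+ SET_VAR(max_parallel_degree=32) */ COUNT(*) FROM lineitem;
```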
Test environment
The ECS instance and the PolarDB for MySQL cluster are deployed in the same region, zone, and VPC.
PolarDB for MySQL cluster
| Parameter | Value |
|---|---|
| Number of clusters | 1 |
| Database engine | MySQL 8.0.1 or MySQL 8.0.2 |
| Product edition | Enterprise Edition |
| Edition | Cluster Edition |
| Specification type | Dedicated |
| Node specifications | polar.mysql.x8.4xlarge (32 cores, 256 GB) |
| Number of nodes | 2 (1 primary node, 1 read-only node) |
The test uses the primary endpoint to connect to the cluster. For more information, see Manage the endpoints of a cluster.
ECS instance
| Parameter | Value |
|---|---|
| Instance type | ecs.c5.4xlarge (16 vCPU, 32 GiB) |
| Image | CentOS 7.0 64-bit |
| System disk | ESSD, 1000 GB |
Test tool
TPC-H is a benchmark developed by the Transaction Processing Performance Council (TPC) to evaluate database analytical query performance. It includes 8 tables and 22 complex SQL statements. Most queries combine multi-table JOINs, subqueries, and GROUP BY clauses — the workload patterns that benefit most from parallel execution.
This test is based on the TPC-H benchmark but does not satisfy all TPC-H requirements. The results cannot be compared with published TPC-H benchmark results.
Install TPC-H tools
Log in to the ECS instance as the root user before running the following commands.
This topic uses TPC-H_Tools_v2.18.0. Registration is required before downloading.
Register and download the TPC-H package, then upload it to the ECS instance. For more information, see Upload files.
1. Decompress the package. Replace `TPC-H-Tool.zip` with the actual filename.

    ```shell
    unzip TPC-H-Tool.zip
    ```

2. Go to the `dbgen` directory. Replace `TPC-H-Tool/dbgen` with the actual path.

    ```shell
    cd TPC-H-Tool/dbgen
    ```

3. Copy the `makefile.suite` file to `makefile`.

    ```shell
    cp makefile.suite makefile
    ```

4. Install GCC. This example uses CentOS. On Ubuntu, run `sudo apt install gcc` instead.

    ```shell
    sudo yum install gcc
    ```

5. Edit the `makefile` file and set the `CC`, `DATABASE`, `MACHINE`, and `WORKLOAD` parameters.

    ```shell
    vim makefile
    ```

    Press `i` to enter edit mode and set the following values:

    ```makefile
    ################
    ## CHANGE NAME OF ANSI COMPILER HERE
    ################
    CC = gcc
    # Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
    #                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
    # Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
    #                                  SGI, SUN, U2200, VMS, LINUX, WIN32
    # Current values for WORKLOAD are: TPCH
    DATABASE = MYSQL
    MACHINE  = LINUX
    WORKLOAD = TPCH
    ```

    Press `Esc`, type `:wq`, and press `Enter` to save and exit.

6. Edit the `tpcd.h` file and add MySQL macro definitions.

    ```shell
    vim tpcd.h
    ```

    Press `i` to enter edit mode, locate the `database portability defines` section, and add the following:

    ```c
    #ifdef MYSQL
    #define GEN_QUERY_PLAN "EXPLAIN PLAN"
    #define START_TRAN "START TRANSACTION"
    #define END_TRAN "COMMIT"
    #define SET_OUTPUT ""
    #define SET_ROWCOUNT "limit %d;\n"
    #define SET_DBASE "use %s;\n"
    #endif
    ```

    Press `Esc`, type `:wq`, and press `Enter` to save and exit.

7. Compile the source.

    ```shell
    make
    ```

    After compilation, two executables are generated in the directory:

    - `dbgen`: generates test data
    - `qgen`: generates SQL statements (not used in this test; the 22 standard TPC-H queries are used instead)

8. Generate the test data with a Scale Factor of 100 (approximately 100 GB).

    ```shell
    ./dbgen -s 100
    ```

    The `-s` parameter sets the Scale Factor, which controls dataset size. `-s 100` generates approximately 100 GB of data; the exact size may vary slightly based on table structure and data distribution.

9. (Optional) Generate queries using `qgen`. You can skip this step to keep results reproducible: download the 22 standard TPC-H SQL statements here and use them directly.

    1. Copy `qgen` and `dists.dss` into the `queries` directory.

        ```shell
        cp qgen queries
        cp dists.dss queries
        ```

    2. Create the `generate_queries.sh` script.

        ```shell
        vim generate_queries.sh
        ```

        Press `i` and add:

        ```shell
        #!/usr/bin/bash
        # Go to the query directory.
        cd queries
        # Generate 22 queries.
        for i in {1..22}
        do
        ./qgen -d $i -s 100 > db"$i".sql
        done
        ```

        Press `Esc`, type `:wq`, and press `Enter`.

    3. Make the script executable and run it.

        ```shell
        chmod +x generate_queries.sh
        ./generate_queries.sh
        ```
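The TPC-H specification fixes each table's base cardinality at SF=1, and every table except `nation` and `region` scales linearly with the Scale Factor, so the generated `.tbl` files can be sanity-checked by line count. A minimal sketch (base counts are from the TPC-H specification; `lineitem` is only approximate because its row count varies slightly with the generated data):

```shell
#!/usr/bin/env bash
# expected_rows TABLE SF -> expected line count of TABLE.tbl at Scale Factor SF.
# Base cardinalities at SF=1 are fixed by the TPC-H specification.
expected_rows() {
  local t=$1 sf=$2
  case $t in
    region)   echo 5 ;;                     # fixed size, does not scale
    nation)   echo 25 ;;                    # fixed size, does not scale
    supplier) echo $(( 10000   * sf )) ;;
    part)     echo $(( 200000  * sf )) ;;
    partsupp) echo $(( 800000  * sf )) ;;
    customer) echo $(( 150000  * sf )) ;;
    orders)   echo $(( 1500000 * sf )) ;;
    lineitem) echo $(( 6000000 * sf )) ;;   # approximate (~6,001,215 at SF=1)
    *) echo "unknown table: $t" >&2; return 1 ;;
  esac
}

# Compare a generated file against the expectation, e.g. for orders at SF=100:
#   [ "$(wc -l < orders.tbl)" -eq "$(expected_rows orders 100)" ] && echo OK
expected_rows orders 100   # prints 150000000
```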
Run the test
Connect to the PolarDB for MySQL cluster from the ECS instance, initialize the schema, load data, and run the 22 TPC-H queries.
Set up the schema and load data
1. Create the `load.ddl` file in the `dbgen` directory to define the data load sequence.

    ```shell
    vim load.ddl
    ```

    Press `i` and add:

    ```sql
    load data local INFILE 'customer.tbl' INTO TABLE customer FIELDS TERMINATED BY '|';
    load data local INFILE 'region.tbl' INTO TABLE region FIELDS TERMINATED BY '|';
    load data local INFILE 'nation.tbl' INTO TABLE nation FIELDS TERMINATED BY '|';
    load data local INFILE 'supplier.tbl' INTO TABLE supplier FIELDS TERMINATED BY '|';
    load data local INFILE 'part.tbl' INTO TABLE part FIELDS TERMINATED BY '|';
    load data local INFILE 'partsupp.tbl' INTO TABLE partsupp FIELDS TERMINATED BY '|';
    load data local INFILE 'orders.tbl' INTO TABLE orders FIELDS TERMINATED BY '|';
    load data local INFILE 'lineitem.tbl' INTO TABLE lineitem FIELDS TERMINATED BY '|';
    ```

    Press `Esc`, type `:wq`, and press `Enter`.

2. Modify `dss.ri` to define primary and foreign keys. Back up the original file and clear it:

    ```shell
    cp dss.ri dss_bk.ri
    > dss.ri
    ```

    Open `dss.ri` and replace its content:

    ```shell
    vim dss.ri
    ```

    Press `i` and add:

    ```sql
    use tpch100g;
    -- ALTER TABLE REGION DROP PRIMARY KEY;
    -- ALTER TABLE NATION DROP PRIMARY KEY;
    -- ALTER TABLE PART DROP PRIMARY KEY;
    -- ALTER TABLE SUPPLIER DROP PRIMARY KEY;
    -- ALTER TABLE PARTSUPP DROP PRIMARY KEY;
    -- ALTER TABLE ORDERS DROP PRIMARY KEY;
    -- ALTER TABLE LINEITEM DROP PRIMARY KEY;
    -- ALTER TABLE CUSTOMER DROP PRIMARY KEY;

    -- For table REGION
    ALTER TABLE REGION ADD PRIMARY KEY (R_REGIONKEY);

    -- For table NATION
    ALTER TABLE NATION ADD PRIMARY KEY (N_NATIONKEY);
    ALTER TABLE NATION ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references REGION(R_REGIONKEY);
    COMMIT WORK;

    -- For table PART
    ALTER TABLE PART ADD PRIMARY KEY (P_PARTKEY);
    COMMIT WORK;

    -- For table SUPPLIER
    ALTER TABLE SUPPLIER ADD PRIMARY KEY (S_SUPPKEY);
    ALTER TABLE SUPPLIER ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references NATION(N_NATIONKEY);
    COMMIT WORK;

    -- For table PARTSUPP
    ALTER TABLE PARTSUPP ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY);
    COMMIT WORK;

    -- For table CUSTOMER
    ALTER TABLE CUSTOMER ADD PRIMARY KEY (C_CUSTKEY);
    ALTER TABLE CUSTOMER ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references NATION(N_NATIONKEY);
    COMMIT WORK;

    -- For table LINEITEM
    ALTER TABLE LINEITEM ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER);
    COMMIT WORK;

    -- For table ORDERS
    ALTER TABLE ORDERS ADD PRIMARY KEY (O_ORDERKEY);
    COMMIT WORK;

    -- For table PARTSUPP
    ALTER TABLE PARTSUPP ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references SUPPLIER(S_SUPPKEY);
    COMMIT WORK;
    ALTER TABLE PARTSUPP ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references PART(P_PARTKEY);
    COMMIT WORK;

    -- For table ORDERS
    ALTER TABLE ORDERS ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references CUSTOMER(C_CUSTKEY);
    COMMIT WORK;

    -- For table LINEITEM
    ALTER TABLE LINEITEM ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY) references ORDERS(O_ORDERKEY);
    COMMIT WORK;
    ALTER TABLE LINEITEM ADD FOREIGN KEY LINEITEM_FK2 (L_PARTKEY,L_SUPPKEY) references PARTSUPP(PS_PARTKEY,PS_SUPPKEY);
    COMMIT WORK;
    ```

    Press `Esc`, type `:wq`, and press `Enter`.

3. Install the MySQL client. This example uses CentOS. Adjust the command for your operating system.

    ```shell
    sudo yum install mysql
    ```

4. Connect to the PolarDB for MySQL cluster using the privileged account and the primary endpoint. For more information, see Create a privileged account and Connect to a database.

    ```shell
    mysql --local-infile=1 -h <primary-endpoint> -P <port> -u <username> -p<password>
    ```

    There is no space between `-p` and the password. The `--local-infile=1` option enables `load data local infile` on the client side; without it, recent MySQL clients may reject the load in step 7 with an error stating that loading local data is disabled (the server-side `local_infile` setting must also allow it). If you see `Access denied for user 'xxx'@'xxx' (using password: YES)`, the username or password is incorrect.

5. Create the `tpch100g` database and switch to it.

    ```sql
    CREATE DATABASE tpch100g;
    use tpch100g;
    ```

6. Create the tables using the `dss.ddl` schema file in the `dbgen` directory.

    ```sql
    source ./dss.ddl
    ```

7. Load the data using the `load.ddl` file created in step 1.

    ```sql
    source ./load.ddl
    ```

8. Apply primary and foreign keys using the `dss.ri` file modified in step 2.

    ```sql
    source ./dss.ri
    ```
Create indexes (optional)
Indexes improve query performance for some of the 22 TPC-H queries. Run `exit;` to leave the MySQL client, then create and run the following script on the ECS instance.
```shell
#!/usr/bin/bash
host=$1
port=$2
user=$3
password=$4
db=$5
sqls=("create index i_s_nationkey on supplier (s_nationkey);"
"create index i_ps_partkey on partsupp (ps_partkey);"
"create index i_ps_suppkey on partsupp (ps_suppkey);"
"create index i_c_nationkey on customer (c_nationkey);"
"create index i_o_custkey on orders (o_custkey);"
"create index i_o_orderdate on orders (o_orderdate);"
"create index i_l_orderkey on lineitem (l_orderkey);"
"create index i_l_partkey on lineitem (l_partkey);"
"create index i_l_suppkey on lineitem (l_suppkey);"
"create index i_l_partkey_suppkey on lineitem (l_partkey, l_suppkey);"
"create index i_l_shipdate on lineitem (l_shipdate);"
"create index i_l_commitdate on lineitem (l_commitdate);"
"create index i_l_receiptdate on lineitem (l_receiptdate);"
"create index i_n_regionkey on nation (n_regionkey);"
"analyze table supplier"
"analyze table part"
"analyze table partsupp"
"analyze table customer"
"analyze table orders"
"analyze table lineitem"
"analyze table nation"
"analyze table region")
for sql in "${sqls[@]}"
do
    mysql -h$host -P$port -u$user -p$password -D$db -e "$sql"
done
```

Save the script as `create_indexes.sh`, make it executable, and run it:
```shell
chmod +x create_indexes.sh
./create_indexes.sh <host> <port> <user> <password> <database>
```

Replace the placeholders with your cluster details:
| Placeholder | Description |
|---|---|
| `<host>` | Primary endpoint of the cluster |
| `<port>` | Port number |
| `<user>` | Database username |
| `<password>` | Database password |
| `<database>` | Database name |
If you see `[Warning] Using a password on the command line interface can be insecure`, this is expected. The command passes the password as a command-line argument, which other users on the system can view by running `ps`. The warning does not affect the operation. After the test, change the database password in the PolarDB console.
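If you want to avoid the warning entirely, you can move the credentials out of the command line into a client option file and drop the `-p$password` arguments from the scripts. A sketch of `~/.my.cnf` (all values are placeholders for your own cluster details):

```ini
[client]
host=<primary-endpoint>
port=<port>
user=<username>
password=<password>
```

Restrict the file with `chmod 600 ~/.my.cnf`; the `mysql` client reads it automatically on startup.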
Preload indexes into memory
To measure the full benefit of parallel query, preload all indexes into the buffer pool before running the queries.
Parallel query bypasses the buffer pool during execution, so consecutive query runs do not benefit from cached data — each run incurs the full I/O cost. This means you may see elevated read I/O during the test; that is expected behavior, not a problem. Preloading the indexes establishes a consistent starting state so that all 22 queries begin from the same I/O baseline.
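Whether the preload achieved its goal can be spot-checked afterward with the standard MySQL 8.0 buffer pool statistics view (also exposed by PolarDB; query the node that will run the benchmark). A sketch:

```sql
-- Pages currently holding data versus free pages, per buffer pool instance.
SELECT POOL_ID, POOL_SIZE, DATABASE_PAGES, FREE_BUFFERS
FROM information_schema.INNODB_BUFFER_POOL_STATS;
```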
Run `exit;` to leave the MySQL client, then create and run the following script:
```shell
#!/bin/bash
host=$1
port=$2
user=$3
password=$4
dbname=$5
MYSQL="mysql -h$host -P$port -u$user -p$password -D$dbname"
if [ -z ${dbname} ]; then
    echo "dbname not defined."
    exit 1
fi
table_indexes=(
    "supplier PRIMARY"
    "supplier i_s_nationkey"
    "part PRIMARY"
    "partsupp PRIMARY"
    "partsupp i_ps_partkey"
    "partsupp i_ps_suppkey"
    "customer PRIMARY"
    "customer i_c_nationkey"
    "orders PRIMARY"
    "orders i_o_custkey"
    "orders i_o_orderdate"
    "lineitem PRIMARY"
    "lineitem i_l_orderkey"
    "lineitem i_l_partkey"
    "lineitem i_l_suppkey"
    "lineitem i_l_partkey_suppkey"
    "lineitem i_l_shipdate"
    "lineitem i_l_commitdate"
    "lineitem i_l_receiptdate"
    "nation i_n_regionkey"
    "nation PRIMARY"
    "region PRIMARY"
)
for table_index in "${table_indexes[@]}"
do
    ti=($table_index)
    table=${ti[0]}
    index=${ti[1]}
    SQL="select count(*) from ${table} force index(${index})"
    echo "$MYSQL -e '$SQL'"
    $MYSQL -e "$SQL"
done
```

Run the 22 TPC-H queries
Run `exit;` to leave the MySQL client, then create and run the following script from the `dbgen` directory:
```shell
#!/usr/bin/env bash
host=$1
port=$2
user=$3
password=$4
database=$5
resfile=$6
echo "start test run at"`date "+%Y-%m-%d %H:%M:%S"`|tee -a ${resfile}.out
for (( i=1; i<=22; i=i+1 ))
do
    queryfile="./queries/Q"${i}".sql"
    start_time=`date "+%s.%N"`
    echo "run query ${i}"|tee -a ${resfile}.out
    mysql -h ${host} -P${port} -u${user} -p${password} $database -e "source $queryfile;" |tee -a ${resfile}.out
    end_time=`date "+%s.%N"`
    start_s=${start_time%.*}
    start_nanos=${start_time#*.}
    end_s=${end_time%.*}
    end_nanos=${end_time#*.}
    # Borrow one second when the nanosecond field underflows.
    if [ "$end_nanos" -lt "$start_nanos" ]; then
        end_s=$(( 10#$end_s - 1 ))
        end_nanos=$(( 10#$end_nanos + 10**9 ))
    fi
    time=$(( 10#$end_s - 10#$start_s )).`printf "%03d\n" $(( (10#$end_nanos - 10#$start_nanos)/10**6 ))`
    echo ${queryfile} "query "${i}" cost "${time}" second start at "`date -d @$start_time "+%Y-%m-%d %H:%M:%S"`" stop at "`date -d @$end_time "+%Y-%m-%d %H:%M:%S"` >> ${resfile}.time
done
```

Save the script as `run_queries.sh`, make it executable, and run it:
```shell
chmod +x run_queries.sh
./run_queries.sh <host> <port> <user> <password> <database> <resfile>
```

Replace the placeholders with your cluster details:
| Placeholder | Description |
|---|---|
| `<host>` | Primary endpoint of the cluster |
| `<port>` | Port number |
| `<user>` | Database username |
| `<password>` | Database password |
| `<database>` | Database name |
| `<resfile>` | Name of the output results file |
Before running the script, confirm the following:
- The script has execute permissions (`chmod +x run_queries.sh`).
- The current directory is `dbgen`. The script references `./queries/Q${i}.sql`. If you run it from a different directory, update the `queryfile` path accordingly. An incorrect path causes the error `ERROR at line 1: Failed to open file 'QXX.sql'`.
- The 22 TPC-H query files are present in the `dbgen/queries` directory. Download them here if you have not already. If you generated queries with `qgen` instead, the files are named `db<i>.sql`; rename them or adjust `queryfile` to match.
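The per-query timing in `run_queries.sh` subtracts two `date +%s.%N` timestamps, borrowing one second when the nanosecond field underflows. That arithmetic can be isolated and checked on fixed inputs; a small sketch:

```shell
#!/usr/bin/env bash
# elapsed START END -> prints END-START in seconds with millisecond precision,
# where both arguments are "seconds.nanoseconds" strings as printed by `date +%s.%N`.
elapsed() {
  local start_s=${1%.*} start_nanos=${1#*.}
  local end_s=${2%.*}   end_nanos=${2#*.}
  # Borrow one second when the nanosecond field underflows.
  if [ "$end_nanos" -lt "$start_nanos" ]; then
    end_s=$(( 10#$end_s - 1 ))
    end_nanos=$(( 10#$end_nanos + 10**9 ))
  fi
  printf '%d.%03d\n' $(( 10#$end_s - 10#$start_s )) \
    $(( (10#$end_nanos - 10#$start_nanos) / 10**6 ))
}

elapsed 100.500000000 103.250000000   # prints 2.750
```

The `10#` prefix forces base-10 interpretation, which matters because `%N` pads with leading zeros and bash would otherwise read values such as `090000000` as invalid octal.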
If you see `[Warning] Using a password on the command line interface can be insecure`, this is expected. See the note in the index creation section above.

Results
For the full benchmark results, see Parallel query performance.