Test method for In-Memory Column Index - PolarDB - Alibaba Cloud Documentation Center

This topic describes how to use TPC-H benchmarking to test the performance of the In-Memory Column Index (IMCI). You can follow these instructions to run your own tests and quickly learn how to use the IMCI feature.

Precautions

The TPC-H implementation in this topic is based on TPC-H benchmarking. The results cannot be compared with official, published TPC-H benchmark results because these tests do not meet all TPC-H requirements.

Test environment

The ECS instance and PolarDB cluster for the test must be in the same region and zone.
Note
The performance test for the In-Memory Column Index (IMCI) is currently available in the China (Chengdu) region.
The network type is set to VPC.
Note
The ECS instance and the PolarDB cluster must be in the same VPC.
The PolarDB cluster used for the test has the following configuration:
- Node specifications: polar.mysql.x8.4xlarge (32-core 256 GB Dedicated) and polar.mysql.x4.xlarge (8-core 32 GB Dedicated).
- Version: MySQL 8.0.1 and MySQL 8.0.2.
- Edition: Cluster Exclusive.
- Number of nodes: 3 to 6 (one primary node, one read-only node, and 1 to 4 read-only column store nodes).
The ECS instance used for the test has the following configuration:
- Instance type: ecs.c5.4xlarge.
- An on-demand instance with a 200 GB, 2 TB, or 4 TB disk attached. These disks are used to test performance in 100 GB, 1 TB, and 3 TB environments, respectively.
- Image: CentOS 7.0 64-bit.

Test tool

TPC-H is an industry-standard benchmark developed and published by the Transaction Processing Performance Council (TPC). It is used to evaluate the analytical query capabilities of databases. TPC-H includes 8 data tables and 22 complex SQL queries. Most of these queries contain table joins, subqueries, and Group By aggregations.

Install the TPC-H tool

git clone https://github.com/yubinr/benchtpch.git
cd benchtpch/dbgen-src
make

After the command runs, two executable files are generated in the benchtpch/dbgen-src folder:

dbgen: A data generation tool. This tool is required to generate TPC-H table data when you use the official InfiniDB test script.
qgen: An SQL generation tool that generates initial test queries. To ensure repeatable results, use the 22 queries provided in the attachment.zip file because different seeds generate different queries.

Test steps

Connect to the PolarDB database from the ECS instance. For more information, see Connect to a database cluster.

Create a database and import data.

cd benchtpch/tpch
# Run data_kit.sh. The following command uses tpch100g as an example. (mysql uses /usr/bin/mysql)
sh ./data_kit.sh -g /root/benchtpch/dbgen-src/ -c <CHUNKS> -d <DATADIR> -s <SCALE_FACTOR> -b /usr/ -H <HOST>  -P <PORT> -u <NAME> -p <PASSWORD> --database <DATABASE> --ddl columnar_p.ddl --parallel 40

The following describes the command-line parameters:

Parameter name	Description
<CHUNKS>	The number of chunks to divide the generated data into. For 100 GB of data, set the number of chunks to 400.
<DATADIR>	The data catalog for the generated TPC-H data.
<SCALE_FACTOR>	Dbgen specifications
<HOST>	The primary endpoint of the PolarDB cluster.
<PORT>	The port number.
<NAME>	The database account.
<PASSWORD>	The password for the database account.
<DATABASE>	The database to import data into.

Run the ANALYZE TABLE operation on all used tables to obtain more accurate statistics information.

ANALYZE TABLE supplier;
ANALYZE TABLE part;
ANALYZE TABLE partsupp;
ANALYZE TABLE customer;
ANALYZE TABLE orders;
ANALYZE TABLE lineitem;
ANALYZE TABLE nation;
ANALYZE TABLE region;

Run the following script.

#!/usr/bin/env bash
host=$1
port=$2
user=$3
password=$4
database=$5
resfile=$6
echo "start test run at"`date "+%Y-%m-%d %H:%M:%S"`|tee -a ${resfile}.out
for (( i=1; i<=22;i=i+1 ))
do
queryfile="Q"${i}".sql"
# pre-run
mysql -h ${host}  -P${port} -u${user} -p${password} $database -e" source $queryfile;" > /dev/null 2>&1
start_time=`date "+%s.%N"`
echo "run query ${i}"|tee -a ${resfile}.out
mysql -h ${host}  -P${port} -u${user} -p${password} $database -e" source $queryfile;" |tee -a ${resfile}.out
end_time=`date "+%s.%N"`
start_s=${start_time%.*}
start_nanos=${start_time#*.}
end_s=${end_time%.*}
end_nanos=${end_time#*.}
if [ "$end_nanos" -lt "$start_nanos" ];then
        end_s=$(( 10#$end_s -1 ))
        end_nanos=$(( 10#$end_nanos + 10 ** 9))
fi
time=$(( 10#$end_s - 10#$start_s )).`printf "%03d\n" $(( (10#$end_nanos - 10#$start_nanos)/10**6 ))`
echo ${queryfile} "the "${j}" run cost "${time}" second start at"`date -d @$start_time "+%Y-%m-%d %H:%M:%S"`" stop at"`date -d @$end_time "+%Y-%m-%d %H:%M:%S"` >> ${resfile}.time
done

Test results

For the IMCI performance test results, see In-Memory Column Index (IMCI) performance.