This topic describes how to use TPC-H benchmarking to test the performance of the In-Memory Column Index (IMCI). You can follow these instructions to run your own tests and quickly learn how to use the IMCI feature.
Precautions
The TPC-H implementation in this topic is based on TPC-H benchmarking. The results cannot be compared with official, published TPC-H benchmark results because these tests do not meet all TPC-H requirements.
Test environment
The ECS instance and PolarDB cluster for the test must be in the same region and zone.
NoteThe performance test for the In-Memory Column Index (IMCI) is currently available in the China (Chengdu) region.
The network type is set to VPC.
NoteThe ECS instance and the PolarDB cluster must be in the same VPC.
The PolarDB cluster used for the test has the following configuration:
Node specifications: polar.mysql.x8.4xlarge (32-core 256 GB Dedicated) and polar.mysql.x4.xlarge (8-core 32 GB Dedicated).
Version: MySQL 8.0.1 and MySQL 8.0.2.
Edition: Cluster Exclusive.
Number of nodes: 3 to 6 (one primary node, one read-only node, and 1 to 4 read-only column store nodes).
The ECS instance used for the test has the following configuration:
Instance type: ecs.c5.4xlarge.
An on-demand instance with a 200 GB, 2 TB, or 4 TB disk attached. These disks are used to test performance in 100 GB, 1 TB, and 3 TB environments, respectively.
Image: CentOS 7.0 64-bit.
Test tool
TPC-H is an industry-standard benchmark developed and published by the Transaction Processing Performance Council (TPC). It is used to evaluate the analytical query capabilities of databases. TPC-H includes 8 data tables and 22 complex SQL queries. Most of these queries contain table joins, subqueries, and Group By aggregations.
Install the TPC-H tool
git clone https://github.com/yubinr/benchtpch.git
cd benchtpch/dbgen-src
makeAfter the command runs, two executable files are generated in the benchtpch/dbgen-src folder:
dbgen: A data generation tool. This tool is required to generate TPC-H table data when you use the official InfiniDB test script.
qgen: An SQL generation tool that generates initial test queries. To ensure repeatable results, use the 22 queries provided in the attachment.zip file because different seeds generate different queries.
Test steps
Connect to the PolarDB database from the ECS instance. For more information, see Connect to a database cluster.
Create a database and import data.
cd benchtpch/tpch # Run data_kit.sh. The following command uses tpch100g as an example. (mysql uses /usr/bin/mysql) sh ./data_kit.sh -g /root/benchtpch/dbgen-src/ -c <CHUNKS> -d <DATADIR> -s <SCALE_FACTOR> -b /usr/ -H <HOST> -P <PORT> -u <NAME> -p <PASSWORD> --database <DATABASE> --ddl columnar_p.ddl --parallel 40The following describes the command-line parameters:
Parameter name
Description
<CHUNKS>
The number of chunks to divide the generated data into. For 100 GB of data, set the number of chunks to 400.
<DATADIR>
The data catalog for the generated TPC-H data.
<SCALE_FACTOR>
Dbgen specifications
<HOST>
The primary endpoint of the PolarDB cluster.
<PORT>
The port number.
<NAME>
The database account.
<PASSWORD>
The password for the database account.
<DATABASE>
The database to import data into.
Run the
ANALYZE TABLEoperation on all used tables to obtain more accurate statistics information.ANALYZE TABLE supplier; ANALYZE TABLE part; ANALYZE TABLE partsupp; ANALYZE TABLE customer; ANALYZE TABLE orders; ANALYZE TABLE lineitem; ANALYZE TABLE nation; ANALYZE TABLE region;Run the following script.
#!/usr/bin/env bash host=$1 port=$2 user=$3 password=$4 database=$5 resfile=$6 echo "start test run at"`date "+%Y-%m-%d %H:%M:%S"`|tee -a ${resfile}.out for (( i=1; i<=22;i=i+1 )) do queryfile="Q"${i}".sql" # pre-run mysql -h ${host} -P${port} -u${user} -p${password} $database -e" source $queryfile;" > /dev/null 2>&1 start_time=`date "+%s.%N"` echo "run query ${i}"|tee -a ${resfile}.out mysql -h ${host} -P${port} -u${user} -p${password} $database -e" source $queryfile;" |tee -a ${resfile}.out end_time=`date "+%s.%N"` start_s=${start_time%.*} start_nanos=${start_time#*.} end_s=${end_time%.*} end_nanos=${end_time#*.} if [ "$end_nanos" -lt "$start_nanos" ];then end_s=$(( 10#$end_s -1 )) end_nanos=$(( 10#$end_nanos + 10 ** 9)) fi time=$(( 10#$end_s - 10#$start_s )).`printf "%03d\n" $(( (10#$end_nanos - 10#$start_nanos)/10**6 ))` echo ${queryfile} "the "${j}" run cost "${time}" second start at"`date -d @$start_time "+%Y-%m-%d %H:%M:%S"`" stop at"`date -d @$end_time "+%Y-%m-%d %H:%M:%S"` >> ${resfile}.time done
Test results
For the IMCI performance test results, see In-Memory Column Index (IMCI) performance.