This topic describes how to use the TPC Benchmark-H (TPC-H) to test the performance of In-Memory Column Indexes (IMCIs). To evaluate the performance of the IMCIs, you can test and compare the capabilities of databases by following the instructions described in this topic.
Usage notes
In this example, a test based on the TPC-H benchmark is implemented, but it does not meet all the requirements of the TPC-H benchmark test. Therefore, the test results may not match the published results of the TPC-H benchmark test.
Test environment
The Elastic Compute Service (ECS) instance and PolarDB cluster that are used in the test are deployed in the same zone of the same region.
The network type is Virtual Private Cloud (VPC).
NoteMake sure that the ECS instance and PolarDB cluster are in the same VPC.
The following list describes the information of the PolarDB cluster that is used in the test:
The specifications are the following types: 32 cores and 256 GB of memory (polar.mysql.x8.4xlarge) and 8 cores and 32 GB of memory (polar.mysql.x4.xlarge).
The version is MySQL 8.0.1 or MySQL 8.0.2.
The edition is Cluster and the compute node type is Dedicated.
The cluster consists of a primary node, a read-only node, and up to four read-only column store nodes.
The following list describes the information of the ECS instance that is used in the test:
The instance type is ecs.c5.4xlarge.
Cloud disks that are 200 GB, 2,000 GB, and 4,000 GB in size are attached to the instance. These disks are used to test the IMCI performance in test environments in which 100 GB, 1 TB and 3 TB of data is generated.
The image of the instance is 64-bit CentOS 7.0.
Test tool
TPC-H is a commonly used benchmark that is developed and released by the Transaction Processing Performance Council (TPC) to evaluate the analytical query capabilities of databases. The TPC-H benchmark contains eight tables and 22 complex SQL statements. Most of the queries contain JOIN clauses on several tables, subqueries, and GROUP BY clauses.
Install TPC-H tools
git clone https://github.com/yubinr/benchtpch.git
cd benchtpch/dbgen-src
makeAfter the command is run, two executable files are generated in the benchtpch/dbgen-src directory:
dbgen: the tool that is used to generate data. If you use the InfiniDB test script, you must use this tool to generate data for TPC-H.
qgen: a SQL generation tool that generates the initial test queries. Different seeds produce different queries. To ensure repeatable results, use the 22 queries provided in attachment.zip.
Test procedure
Connect to the PolarDB cluster from the ECS instance. For more information, see Connect to a cluster.
Create a database and import data.
cd benchtpch/tpch # Execute the data_kit.sh script. This example uses tpch100g. The path of MySQL is /usr/bin/mysql. sh ./data_kit.sh -g /root/benchtpch/dbgen-src/ -c <CHUNKS> -d <DATADIR> -s <SCALE_FACTOR> -b /usr/ -H <HOST> -P <PORT> -u <NAME> -p <PASSWORD> --database <DATABASE> --ddl columnar_p.ddl --parallel 40The following table describes parameters that are included in the preceding command line.
Parameter
Description
<CHUNKS>
The number of blocks that store generated data. If 100 GB of data is imported, we recommend that you set the number of blocks to 400.
<DATADIR>
The directory that stores the generated TPC-H data.
<SCALE_FACTOR>
The specification of the dbgen tool.
<HOST>
The primary endpoint of the PolarDB cluster.
<PORT>
The port number.
<NAME>
The database account.
<PASSWORD>
The password of the database account.
<DATABASE>
The database to which data is imported.
Run the
ANALYZE TABLEcommand on all tables used in the test to obtain accurate statistics.ANALYZE TABLE supplier; ANALYZE TABLE part; ANALYZE TABLE partsupp; ANALYZE TABLE customer; ANALYZE TABLE orders; ANALYZE TABLE lineitem; ANALYZE TABLE nation; ANALYZE TABLE region;Execute the following query statements:
#!/usr/bin/env bash host=$1 port=$2 user=$3 password=$4 database=$5 resfile=$6 echo "start test run at"`date "+%Y-%m-%d %H:%M:%S"`|tee -a ${resfile}.out for (( i=1; i<=22;i=i+1 )) do queryfile="Q"${i}".sql" # pre run mysql -h ${host} -P${port} -u${user} -p${password} $database -e" source $queryfile;" > /dev/null 2>&1 start_time=`date "+%s.%N"` echo "run query ${i}"|tee -a ${resfile}.out mysql -h ${host} -P${port} -u${user} -p${password} $database -e" source $queryfile;" |tee -a ${resfile}.out end_time=`date "+%s.%N"` start_s=${start_time%.*} start_nanos=${start_time#*.} end_s=${end_time%.*} end_nanos=${end_time#*.} if [ "$end_nanos" -lt "$start_nanos" ];then end_s=$(( 10#$end_s -1 )) end_nanos=$(( 10#$end_nanos + 10 ** 9)) fi time=$(( 10#$end_s - 10#$start_s )).`printf "%03d\n" $(( (10#$end_nanos - 10#$start_nanos)/10**6 ))` echo ${queryfile} "the "${j}" run cost "${time}" second start at"`date -d @$start_time "+%Y-%m-%d %H:%M:%S"`" stop at"`date -d @$end_time "+%Y-%m-%d %H:%M:%S"` >> ${resfile}.time done
Test results
For more information about IMCI performance test results, see IMCI performance.