×
Community Blog PolarDB Hands-on | Running TPC-H Benchmark on PolarDB IMCI

PolarDB Hands-on | Running TPC-H Benchmark on PolarDB IMCI

In this article, we will show you the full steps of generating TPC-H data and running TPC-H 22 queries on PolarDB IMCI.

Overview

PolarDB MySQL supports the In-Memory Column Index (IMCI), which can increase the speed of complex analytical queries by several orders of magnitude while maintaining high performance in real-time transaction processing. IMCI's key features include hybrid row-column storage for real-time transactions and analysis, parallel SQL operators that support vectorized execution, and 100% MySQL syntax compatibility. As shown in the figure below, PolarDB eliminates the complex ETL process by creating read-only nodes with IMCI. IMCI makes PolarDB become a cloud-native HTAP database. Users can easily get real-time data analysis and transaction processing capabilities a single PolarDB cluster. With the cluster endpoint, PolarDB will automatically dispatch OLAP and OLTP requests to different nodes.

1

Before You Start

  • Added a read-only columnar node in your PolarDB cluster.
  • Set HTAP Optimization: Transactional/Analytical ON. image.png

2

  • Created a database account: the account name is 'tpch', the password is 'Tpch_password'.

Steps for Running TPC-H Queries

Follow the steps below to experience the HTAP solution with PolarDB IMCI:

Step 1. Generate and load TPC-H data

First, log in to your ECS server via the ECS IP, and install the GIT and MySQL client.

$> sudo yum install -y mysql git

To easily generate and load TPC-H data, clone this Github project.

$> git clone  git@github.com:ApsaraDB/PolarDB-Hands-On.git
$> cp -rf PolarDB-Hands-On/PolarDB-M/IMCI/benchtpch ./

Then, compile the TPC-H data generation tool.

$> cd benchtpch/dbgen-src
$> echo "#define EOL_HANDLING 1" >> config.h 
$> make

Now, use the following command to generate 1GB TPC-H data (i.e., Scale Factor=1), and load it into PolarDB. Please fill in your PolarDB's cluster endpoint in the command. This step may take a few minutes, please wait and do not interrupt the process.

$> cd benchtpch
$> sh tpch/data_kit.sh \
      --parallel 2 \
      --base /usr  \
      -s 1 \
      -c 64 \
      --database tpch1g  \
      -g dbgen-src \
      --data tpchdata1g \
      --ddl tpch/columnar.ddl \
      --host <your polardb's cluster endpoint>  \
      --port 3306 \
      -u tpch \
      -p Tpch_password \
      all

Note: To generate data with different Scale Factor, you can use sh tpch/data_kit.sh --help for more information.

Step 2. Verify

You can check that the TPC-H data has been successfully loaded by the following commands:

Connect to PolarDB by the cluster endpoint.

$> mysql --host <your polardb's cluster endpoint> \
                 --port 3306 \
                 -utpch \
                 -pTpch_password

The mysql> prompt tells you that mysql is ready to enter SQL statements.

mysql> use tpch1g;
mysql> select count(*) from lineitem;

+----------+
| count(*) |
+----------+
|  6001215 |
+----------+

Step 3. Run TPC-H benchmark

Run TPC-H q1, a typical single table aggregation query.

mysql> use tpch1g;
mysql> select
        l_returnflag,
        l_linestatus,
        sum(l_quantity) as sum_qty,
        sum(l_extendedprice) as sum_base_price,
        sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
        avg(l_quantity) as avg_qty,
        avg(l_extendedprice) as avg_price,
        avg(l_discount) as avg_disc,
        count(*) as count_order
    from
        lineitem
    where
        l_shipdate <= date '1998-12-01' - interval '90' day
    group by
        l_returnflag,
        l_linestatus
    order by
        l_returnflag,
        l_linestatus;

+--------------+--------------+-------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
| l_returnflag | l_linestatus | sum_qty     | sum_base_price  | sum_disc_price    | sum_charge          | avg_qty   | avg_price    | avg_disc | count_order |
+--------------+--------------+-------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
| A            | F            | 37734107.00 |  56586554400.73 |  53758257134.8700 |  55909065222.827692 | 25.522006 | 38273.129735 | 0.049985 |     1478493 |
| N            | F            |   991417.00 |   1487504710.38 |   1413082168.0541 |   1469649223.194375 | 25.516472 | 38284.467761 | 0.050093 |       38854 |
| N            | O            | 74476040.00 | 111701729697.74 | 106118230307.6056 | 110367043872.497010 | 25.502227 | 38249.117989 | 0.049997 |     2920374 |
| R            | F            | 37719753.00 |  56568041380.90 |  53741292684.6040 |  55889619119.831932 | 25.505794 | 38250.854626 | 0.050009 |     1478870 |
+--------------+--------------+-------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
4 rows in set (0.25 sec)

Using the cluster endpoint, SQLs are automatically dispatched to read-only columnar nodes and accelerated using column indexes. To determine if a query is executed by columnar nodes, you can show the execution plan by EXPLAIN. For example, the IMCI execution plan of q1 is shown below.

3

Run TPC-H q3, a multi-table join query.

mysql> select
        l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate,
        o_shippriority
    from
        customer,
        orders,
        lineitem
    where
        c_mktsegment='BUILDING'
        and c_custkey =o_custkey
        and l_orderkey=o_orderkey
        and o_orderdate < date'1995-03-15'
        and l_shipdate > date'1995-03-15'
    group by
        l_orderkey,
        o_orderdate,
        o_shippriority
    order by
        revenue desc,
        o_orderdate
    limit 10;


+------------+-------------+-------------+----------------+
| l_orderkey | revenue     | o_orderdate | o_shippriority |
+------------+-------------+-------------+----------------+
|    2456423 | 406181.0111 | 1995-03-05  |              0 |
|    3459808 | 405838.6989 | 1995-03-04  |              0 |
|     492164 | 390324.0610 | 1995-02-19  |              0 |
|    1188320 | 384537.9359 | 1995-03-09  |              0 |
|    2435712 | 378673.0558 | 1995-02-26  |              0 |
|    4878020 | 378376.7952 | 1995-03-12  |              0 |
|    5521732 | 375153.9215 | 1995-03-13  |              0 |
|    2628192 | 373133.3094 | 1995-02-22  |              0 |
|     993600 | 371407.4595 | 1995-03-05  |              0 |
|    2300070 | 367371.1452 | 1995-03-13  |              0 |
+------------+-------------+-------------+----------------+
10 rows in set (0.10 sec)

All the TPC-H queries are located in TPC-H Queries.

Step 4. Run point queries via automatic request distribution

When using the cluster endpoint, PolarDB will automatically choose the appropriate row-store indexes for point query executions. The following command runs several point query SQLs.

mysql> select * from lineitem where l_orderkey = 1;
+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+------------------------------------+
| l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct    | l_shipmode | l_comment                          |
+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+------------------------------------+
|          1 |    155190 |      7706 |            1 |      17.00 |        21168.23 |       0.04 |  0.02 | N            | O            | 1996-03-13 | 1996-02-12   | 1996-03-22    | DELIVER IN PERSON | TRUCK      | egular courts above the            |
|          1 |     67310 |      7311 |            2 |      36.00 |        45983.16 |       0.09 |  0.06 | N            | O            | 1996-04-12 | 1996-02-28   | 1996-04-20    | TAKE BACK RETURN  | MAIL       | ly final dependencies: slyly bold  |
|          1 |     63700 |      3701 |            3 |       8.00 |        13309.60 |       0.10 |  0.02 | N            | O            | 1996-01-29 | 1996-03-05   | 1996-01-31    | TAKE BACK RETURN  | REG AIR    | riously. regular, express dep      |
|          1 |      2132 |      4633 |            4 |      28.00 |        28955.64 |       0.09 |  0.06 | N            | O            | 1996-04-21 | 1996-03-30   | 1996-05-16    | NONE              | AIR        | lites. fluffily even de            |
|          1 |     24027 |      1534 |            5 |      24.00 |        22824.48 |       0.10 |  0.04 | N            | O            | 1996-03-30 | 1996-03-14   | 1996-04-01    | NONE              | FOB        |  pending foxes. slyly re           |
|          1 |     15635 |       638 |            6 |      32.00 |        49620.16 |       0.07 |  0.02 | N            | O            | 1996-01-30 | 1996-02-07   | 1996-02-03    | DELIVER IN PERSON | MAIL       | arefully slyly ex                  |
+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+------------------------------------+
6 rows in set (0.00 sec)


mysql> select * from lineitem where l_orderkey between 1 and 1000 order by l_orderkey limit 10;
+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------------------+
| l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct    | l_shipmode | l_comment                           |
+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------------------+
|          1 |    155190 |      7706 |            1 |      17.00 |        21168.23 |       0.04 |  0.02 | N            | O            | 1996-03-13 | 1996-02-12   | 1996-03-22    | DELIVER IN PERSON | TRUCK      | egular courts above the             |
|          1 |     67310 |      7311 |            2 |      36.00 |        45983.16 |       0.09 |  0.06 | N            | O            | 1996-04-12 | 1996-02-28   | 1996-04-20    | TAKE BACK RETURN  | MAIL       | ly final dependencies: slyly bold   |
|          1 |     63700 |      3701 |            3 |       8.00 |        13309.60 |       0.10 |  0.02 | N            | O            | 1996-01-29 | 1996-03-05   | 1996-01-31    | TAKE BACK RETURN  | REG AIR    | riously. regular, express dep       |
|          1 |      2132 |      4633 |            4 |      28.00 |        28955.64 |       0.09 |  0.06 | N            | O            | 1996-04-21 | 1996-03-30   | 1996-05-16    | NONE              | AIR        | lites. fluffily even de             |
|          1 |     24027 |      1534 |            5 |      24.00 |        22824.48 |       0.10 |  0.04 | N            | O            | 1996-03-30 | 1996-03-14   | 1996-04-01    | NONE              | FOB        |  pending foxes. slyly re            |
|          1 |     15635 |       638 |            6 |      32.00 |        49620.16 |       0.07 |  0.02 | N            | O            | 1996-01-30 | 1996-02-07   | 1996-02-03    | DELIVER IN PERSON | MAIL       | arefully slyly ex                   |
|          2 |    106170 |      1191 |            1 |      38.00 |        44694.46 |       0.00 |  0.05 | N            | O            | 1997-01-28 | 1997-01-14   | 1997-02-02    | TAKE BACK RETURN  | RAIL       | ven requests. deposits breach a     |
|          3 |      4297 |      1798 |            1 |      45.00 |        54058.05 |       0.06 |  0.00 | R            | F            | 1994-02-02 | 1994-01-04   | 1994-02-23    | NONE              | AIR        | ongside of the furiously brave acco |
|          3 |     19036 |      6540 |            2 |      49.00 |        46796.47 |       0.10 |  0.00 | R            | F            | 1993-11-09 | 1993-12-20   | 1993-11-24    | TAKE BACK RETURN  | RAIL       |  unusual accounts. eve              |
|          3 |    128449 |      3474 |            3 |      27.00 |        39890.88 |       0.06 |  0.07 | A            | F            | 1994-01-16 | 1993-11-22   | 1994-01-23    | DELIVER IN PERSON | SHIP       | nal foxes wake.                     |
+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------------------+
10 rows in set (0.00 sec)

Use EXPLAIN to check the execution plan of the above query.

4

0 1 0
Share on

ApsaraDB

314 posts | 41 followers

You may also like

Comments

ApsaraDB

314 posts | 41 followers

Related Products

  • PolarDB

    Alibaba Cloud PolarDB is a cloud-native relational database service that decouples computing resources from storage resources

    Learn More
  • PolarDB-X

    Designed to address database challenges such as ultra-high concurrency, massive data storage, and large table performance bottlenecks.

    Learn More
  • PolarDB Stack

    A cloud-native database management platform that allows you to manage on-premises databases in the same way as in Alibaba Cloud.

    Learn More
  • Elastic High Performance Computing Solution

    High Performance Computing (HPC) and AI technology helps scientific research institutions to perform viral gene sequencing, conduct new drug research and development, and shorten the research and development cycle.

    Learn More