All Products
Search
Document Center

PolarDB:Analyze the IMCI-based query performance

Last Updated:Jul 10, 2024

PolarDB for MySQL provides the In-Memory Column Index (IMCI)-based query performance analysis feature. The feature collects detailed SQL statement execution information such as the execution duration, and returns the information together with the query plan obtained by using the EXPLAIN statement. This enables you to gain insights into the execution details of SQL statements and helps you analyze slow queries.

Prerequisites

Your cluster runs PolarDB for MySQL 8.0.1.1.42 or later. For information about how to check the version information of a cluster, see Query an engine version.

Usage notes

You must set the imci_analyze_query parameter to ON to enable the IMCI-based query performance analysis feature.

Parameter

Level

Description

imci_analyze_query

Session

Specifies whether to enable the IMCI-based query performance analysis feature. Valid values:

  • OFF (default)

  • ON

Examples

Simple query example

In the following example, a simple SQL statement is executed on the Transaction Processing Performance Council Benchmark H (TPC-H) schema. Sample SQL statement:

SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;
  1. Execute the EXPLAIN statement to retrieve detailed information about the execution plan for the query, including the estimated execution cost of each operator and the number of rows that each operator processes.

    EXPLAIN SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;

    Sample result:

    +----+------------------------+----------+--------+----------+---------------------------------------------------------------+
    | ID | Operator               | Name     | E-Rows | E-Cost   | Extra Info                                                    |
    +----+------------------------+----------+--------+----------+---------------------------------------------------------------+
    |  1 | Select Statement       |          |        |          | IMCI Execution Plan (max_dop = 1, max_query_mem = unlimited)  |
    |  2 | └─Hash Groupby         |          | 6      | 51218.50 | Group Key: lineitem.l_shipmode                                |
    |  3 |   └─Compute Scalar     |          | 1869   | 50191.00 |                                                               |
    |  4 |     └─Hash Join        |          | 1869   | 31000.00 | Join Cond: lineitem.l_orderkey = orders.o_orderkey            |
    |  5 |       ├─Table Scan     | lineitem | 2000   | 0.00     |                                                               |
    |  6 |       └─Table Scan     | orders   | 2000   | 0.00     |                                                               |
    +----+------------------------+----------+--------+----------+---------------------------------------------------------------+
  2. Enable the IMCI-based query performance analysis feature.

    SET imci_analyze_query = ON;
  3. Execute the SQL statement.

    SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;

    Sample result:

    +------------+----------+
    | l_shipmode | COUNT(*) |
    +------------+----------+
    | REG AIR    |      283 |
    | SHIP       |      269 |
    | FOB        |      299 |
    | RAIL       |      289 |
    | TRUCK      |      314 |
    | MAIL       |      274 |
    | AIR        |      272 |
    +------------+----------+
    7 rows in set (0.05 sec)

    The performance profiling data is stored in the information_schema.imci_sql_profiling table. After you set the imci_analyze_query parameter to ON, the table stores the performance profiling data for the most recent SQL statement executed by using IMCIs. You can retrieve the performance profiling data of the SQL statement by querying the table.

  4. Query the performance profiling data of the SQL statement.

    /*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;    
    Note

    The /*ROUTE_TO_LAST_USED*/ hint instructs PolarProxy to route the query to the node in which the most recent SQL statement was executed by using IMCIs. If you do not use the /*ROUTE_TO_LAST_USED*/ hint, the query may be routed to an incorrect node and the query results may not be obtained or may be incorrect.

    Sample result:

    +----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+
    | ID | Operator               | Name     | A-Rows | A-Cost   | Execution Time(s) | Extra Info                                                                                             |
    +----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+
    |  1 | Select Statement       |          |      0 | 52609.51 |                 0 | IMCI Execution Plan (max_dop = 1, real_dop = 1, max_query_mem = unlimited, real_query_mem = unlimited) |
    |  2 | └─Hash Groupby         |          |      7 |  52609.5 |             0.002 | Group Key: lineitem.l_shipmode                                                                         |
    |  3 |   └─Compute Scalar     |          |   2000 |    51501 |                 0 |                                                                                                        |
    |  4 |     └─Hash Join        |          |   2000 |    31000 |             0.007 | Join Cond: lineitem.l_orderkey = orders.o_orderkey                                                     |
    |  5 |       ├─Table Scan     | lineitem |   2000 |        0 |             0.001 |                                                                                                        |
    |  6 |       └─Table Scan     | orders   |   2000 |        0 |                 0 |                                                                                                        |
    +----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+

    In the output, the Extra Info column displays the join condition for the Hash Join operator and the group key for the Hash Groupby operator. The Extra Info column in the first row displays the usage of CPU and memory resources.

  5. (Optional) Perform operations such as aggregation on the information_schema.imci_sql_profiling table. For example, you can query the execution duration of the SQL statement.

    /*ROUTE_TO_LAST_USED*/SELECT SUM(`Execution Time(s)`) AS TOTAL_TIME FROM information_schema.imci_sql_profiling;

    Sample result:

    +----------------------+
    | TOTAL_TIME           |
    +----------------------+
    | 0.010000000000000002 |
    +----------------------+
    1 row in set (0.00 sec)

    The result indicates that the total execution duration of the SQL statement is 10 ms.

Complex query example

In the following example, a complex SQL statement is executed to query a TPC-H SF100 dataset. Sample statement:

SELECT
    c_name,
    sum(l_quantity)
FROM
    customer,
    orders,
    lineitem
WHERE
    o_orderkey IN (
        SELECT
            l_orderkey
        FROM
            lineitem
        WHERE 
            l_partkey > 18000000
    )
    AND c_custkey = o_custkey
    AND o_orderkey = l_orderkey
GROUP BY
    c_name
ORDER BY
    c_name
LIMIT 10;
  1. Execute the EXPLAIN statement to retrieve detailed information about the execution plan for the query.

    EXPLAIN SELECT
        c_name,
        sum(l_quantity)
    FROM
        customer,
        orders,
        lineitem
    WHERE
        o_orderkey IN (
            SELECT
                l_orderkey
            FROM
                lineitem
            WHERE 
                l_partkey > 18000000
        )
        AND c_custkey = o_custkey
        AND o_orderkey = l_orderkey
    GROUP BY
        c_name
    ORDER BY
        c_name
    LIMIT 10;

    Sample result:

    +----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+
    | ID | Operator                         | Name     | E-Rows    | E-Cost     | Extra Info                                                    |
    +----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+
    |  1 | Select Statement                 |          |           |            | IMCI Execution Plan (max_dop = 32, max_query_mem = unlimited) |
    |  2 | └─Limit                          |          | 10        | 7935739.96 | Offset=0 Limit=10                                             |
    |  3 |   └─Sort                         |          | 10        | 7935739.96 | Sort Key: c_name ASC                                          |
    |  4 |     └─Hash Groupby               |          | 1503700   | 7933273.99 | Group Key: customer.C_NAME                                    |
    |  5 |       └─Hash Right Semi Join     |          | 54010545  | 7865930.26 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY            |
    |  6 |         ├─Table Scan             | lineitem | 59785766  | 24001.52   | Cond: (L_PARTKEY > 18000000)                                  |
    |  7 |         └─Hash Join              |          | 538776190 | 5488090.33 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY            |
    |  8 |           ├─Hash Join            |          | 181006430 | 668535.99  | Join Cond: customer.C_CUSTKEY = orders.O_CUSTKEY              |
    |  9 |           │ ├─Table Scan         | customer | 15000000  | 600.00     |                                                               |
    | 10 |           │ └─Table Scan         | orders   | 150000000 | 6000.00    |                                                               |
    | 11 |           └─Table Scan           | lineitem | 600037902 | 24001.52   |                                                               |
    +----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+

    The execution cost of the SQL statement is 7935739. The Hash Join operator whose ID is 7 consumes a large amount of CPU resources. The execution cost is 5488090, which accounts for approximately 70% of the total cost. To identify the performance issue of the SQL statement, you can enable the IMCI-based query performance analysis feature, execute the SQL statement, and then view the performance information of the SQL statement.

  2. Enable the IMCI-based query performance analysis feature.

    SET imci_analyze_query = ON;
  3. Execute the SQL statement.

    SELECT
        c_name,
        sum(l_quantity)
    FROM
        customer,
        orders,
        lineitem
    WHERE
        o_orderkey IN (
            SELECT
                l_orderkey
            FROM
                lineitem
            WHERE 
                l_partkey > 18000000
        )
        AND c_custkey = o_custkey
        AND o_orderkey = l_orderkey
    GROUP BY
        c_name
    ORDER BY
        c_name
    LIMIT 10;

    Sample result:

    +--------------------+-----------------+
    | c_name             | sum(l_quantity) |
    +--------------------+-----------------+
    | Customer#000000001 |          172.00 |
    | Customer#000000002 |          663.00 |
    | Customer#000000004 |          174.00 |
    | Customer#000000005 |          488.00 |
    | Customer#000000007 |         1135.00 |
    | Customer#000000008 |          440.00 |
    | Customer#000000010 |          625.00 |
    | Customer#000000011 |          143.00 |
    | Customer#000000013 |         1032.00 |
    | Customer#000000014 |          564.00 |
    +--------------------+-----------------+
    10 rows in set (21.37 sec)
  4. Query the performance profiling data of the SQL statement.

    /*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;    
    Note

    The /*ROUTE_TO_LAST_USED*/ hint instructs PolarProxy to route the query to the node in which the most recent SQL statement was executed by using IMCIs. If you do not use the /*ROUTE_TO_LAST_USED*/ hint, the query may be routed to an incorrect node and the query results may not be obtained.

    Sample result:

    +----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
    | ID | Operator                         | Name     | A-Rows    | A-Cost     | Execution Time(s) | Extra Info                                         |
    +----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
    |  1 | Select Statement                 |          |         0 | 8336856.67 |                 0 |                                                    |
    |  2 | └─Limit                          |          |        10 | 8336856.67 |             0.002 | Offset=0 Limit=10                                  |
    |  3 |   └─Sort                         |          |         0 | 8336856.67 |             2.275 | Sort Key: c_name ASC                               |
    |  4 |     └─Hash Groupby               |          |   9813586 | 8320763.22 |           160.083 | Group Key: customer.C_NAME                         |
    |  5 |       └─Hash Right Semi Join     |          | 239598134 | 7994854.23 |            98.174 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY |
    |  6 |         ├─Table Scan             | lineitem |  60013756 |   24001.52 |              3.28 | Cond: (L_PARTKEY > 18000000)                       |
    |  7 |         └─Hash Join              |          | 600037902 | 5156677.35 |           301.503 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY |
    |  8 |           ├─Hash Join            |          | 150000000 |  629777.96 |            97.201 | Join Cond: customer.C_CUSTKEY = orders.O_CUSTKEY   |
    |  9 |           │ ├─Table Scan         | customer |  15000000 |        600 |             3.321 |                                                    |
    | 10 |           │ └─Table Scan         | orders   | 150000000 |       6000 |             0.241 |                                                    |
    | 11 |           └─Table Scan           | lineitem | 600037902 |   24001.52 |             0.661 |                                                    |
    +----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+

    The result indicates that the Hash Join operator whose ID is 7 consumes approximately half of the execution duration. This is because the lineitem table and the results of the Hash Join operator whose ID is 8 are large. As a result, a large amount of data is involved in the join operation. In a hash join operation, the overhead of building a hash table is much greater than that of a hash lookup. In this case, you only need to reduce the result size of the Hash Join operator whose ID is 8 to effectively reduce the query duration.

    Based on the SQL statement and the execution plan of the query, the query performance can be improved if the optimizer converts the o_orderkey in (...) condition into a semi-join subquery and pushes down the semi-join subquery.

  5. Enable cost-based semi-join pushdown.

    SET imci_optimizer_switch = 'semijoin_pushdown=on';
  6. Execute the EXPLAIN statement to retrieve detailed information about the execution plan for the query.

    EXPLAIN SELECT
        c_name,
        sum(l_quantity)
    FROM
        customer,
        orders,
        lineitem
    WHERE
        o_orderkey IN (
            SELECT
                l_orderkey
            FROM
                lineitem
            WHERE 
                l_partkey > 18000000
        )
        AND c_custkey = o_custkey
        AND o_orderkey = l_orderkey
    GROUP BY
        c_name
    ORDER BY
        c_name
    LIMIT 10;

    Sample result:

    +----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+
    | ID | Operator                               | Name     | E-Rows    | E-Cost     | Extra Info                                                    |
    +----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+
    |  1 | Select Statement                       |          |           |            | IMCI Execution Plan (max_dop = 32, max_query_mem = unlimited) |
    |  2 | └─Limit                                |          | 10        | 2800433.74 | Offset=0 Limit=10                                             |
    |  3 |   └─Sort                               |          | 10        | 2800433.74 | Sort Key: c_name ASC                                          |
    |  4 |     └─Hash Groupby                     |          | 14567321  | 2776544.58 | Group Key: customer.C_NAME                                    |
    |  5 |       └─Hash Join                      |          | 57918330  | 2631846.75 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY            |
    |  6 |         ├─Hash Join                    |          | 14567321  | 1014041.92 | Join Cond: orders.O_CUSTKEY = customer.C_CUSTKEY              |
    |  7 |         │ ├─Hash Right Semi Join       |          | 12071937  | 906226.67  | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY            |
    |  8 |         │ │ ├─Table Scan               | lineitem | 59785766  | 24001.52   | Cond: (L_PARTKEY > 18000000)                                  |
    |  9 |         │ │ └─Table Scan               | orders   | 150000000 | 6000.00    |                                                               |
    | 10 |         │ └─Table Scan                 | customer | 15000000  | 600.00     |                                                               |
    | 11 |         └─Table Scan                   | lineitem | 600037902 | 24001.52   |                                                               |
    +----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+

    The result indicates that the total query cost is reduced from 7935739 to 2800433 after the cost-based semi-join pushdown feature is enabled.

  7. Re-execute the SQL statement.

    SELECT
        c_name,
        sum(l_quantity)
    FROM
        customer,
        orders,
        lineitem
    WHERE
        o_orderkey IN (
            SELECT
                l_orderkey
            FROM
                lineitem
            WHERE 
                l_partkey > 18000000
        )
        AND c_custkey = o_custkey
        AND o_orderkey = l_orderkey
    GROUP BY
        c_name
    ORDER BY
        c_name
    LIMIT 10;

    Sample result:

    +--------------------+-----------------+
    | c_name             | sum(l_quantity) |
    +--------------------+-----------------+
    | Customer#000000001 |          172.00 |
    | Customer#000000002 |          663.00 |
    | Customer#000000004 |          174.00 |
    | Customer#000000005 |          488.00 |
    | Customer#000000007 |         1135.00 |
    | Customer#000000008 |          440.00 |
    | Customer#000000010 |          625.00 |
    | Customer#000000011 |          143.00 |
    | Customer#000000013 |         1032.00 |
    | Customer#000000014 |          564.00 |
    +--------------------+-----------------+
    10 rows in set (13.74 sec)

    The result indicates that the execution duration is reduced by approximately 40%.

  8. Query the performance profiling data of the SQL statement.

    /*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;    

    Sample result:

    +----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
    | ID | Operator                               | Name     | A-Rows    | A-Cost     | Execution Time(s) | Extra Info                                         |
    +----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
    |  1 | Select Statement                       |          |         0 | 4318488.35 |                 0 |                                                    |
    |  2 | └─Limit                                |          |        10 | 4318488.35 |             0.002 | Offset=0 Limit=10                                  |
    |  3 |   └─Sort                               |          |         0 | 4318488.34 |             3.076 | Sort Key: c_name ASC                               |
    |  4 |     └─Hash Groupby                     |          |   9813586 | 4302394.89 |           163.149 | Group Key: customer.C_NAME                         |
    |  5 |       └─Hash Join                      |          | 239598134 | 3976485.91 |           151.253 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY |
    |  6 |         ├─Hash Join                    |          |  49393149 | 1321335.54 |            55.392 | Join Cond: orders.O_CUSTKEY = customer.C_CUSTKEY   |
    |  7 |         │ ├─Hash Right Semi Join       |          |  49393149 |  954805.16 |            52.552 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY |
    |  8 |         │ │ ├─Table Scan               | lineitem |  60013756 |   24001.52 |             2.791 | Cond: (L_PARTKEY > 18000000)                       |
    |  9 |         │ │ └─Table Scan               | orders   | 150000000 |       6000 |             0.152 |                                                    |
    | 10 |         │ └─Table Scan                 | customer |  15000000 |        600 |             0.028 |                                                    |
    | 11 |         └─Table Scan                   | lineitem | 600037902 |   24001.52 |             0.642 |                                                    |
    +----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+

    The result indicates that after the semi-join subquery is pushed down, the previous large table join operation is eliminated, which significantly reduces the query duration.