PolarDB for MySQL provides the In-Memory Column Index (IMCI)-based query performance analysis feature. The feature collects detailed SQL statement execution information such as the execution duration, and returns the information together with the query plan obtained by using the EXPLAIN statement. This enables you to gain insights into the execution details of SQL statements and helps you analyze slow queries.
Prerequisites
Your cluster runs PolarDB for MySQL 8.0.1.1.42 or later. For information about how to check the version information of a cluster, see Query an engine version.
Usage notes
You must set the imci_analyze_query parameter to ON to enable the IMCI-based query performance analysis feature.
Parameter | Level | Description |
imci_analyze_query | Session | Specifies whether to enable the IMCI-based query performance analysis feature. Valid values:
|
Examples
Simple query example
In the following example, a simple SQL statement is executed on the Transaction Processing Performance Council Benchmark H (TPC-H) schema. Sample SQL statement:
SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;Execute the EXPLAIN statement to retrieve detailed information about the execution plan for the query, including the estimated execution cost of each operator and the number of rows that each operator processes.
EXPLAIN SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;Sample result:
+----+------------------------+----------+--------+----------+---------------------------------------------------------------+ | ID | Operator | Name | E-Rows | E-Cost | Extra Info | +----+------------------------+----------+--------+----------+---------------------------------------------------------------+ | 1 | Select Statement | | | | IMCI Execution Plan (max_dop = 1, max_query_mem = unlimited) | | 2 | └─Hash Groupby | | 6 | 51218.50 | Group Key: lineitem.l_shipmode | | 3 | └─Compute Scalar | | 1869 | 50191.00 | | | 4 | └─Hash Join | | 1869 | 31000.00 | Join Cond: lineitem.l_orderkey = orders.o_orderkey | | 5 | ├─Table Scan | lineitem | 2000 | 0.00 | | | 6 | └─Table Scan | orders | 2000 | 0.00 | | +----+------------------------+----------+--------+----------+---------------------------------------------------------------+Enable the IMCI-based query performance analysis feature.
SET imci_analyze_query = ON;Execute the SQL statement.
SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;Sample result:
+------------+----------+ | l_shipmode | COUNT(*) | +------------+----------+ | REG AIR | 283 | | SHIP | 269 | | FOB | 299 | | RAIL | 289 | | TRUCK | 314 | | MAIL | 274 | | AIR | 272 | +------------+----------+ 7 rows in set (0.05 sec)The performance profiling data is stored in the
information_schema.imci_sql_profilingtable. After you set theimci_analyze_queryparameter to ON, the table stores the performance profiling data for the most recent SQL statement executed by using IMCIs. You can retrieve the performance profiling data of the SQL statement by querying the table.Query the performance profiling data of the SQL statement.
/*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;NoteThe
/*ROUTE_TO_LAST_USED*/hint instructs PolarProxy to route the query to the node in which the most recent SQL statement was executed by using IMCIs. If you do not use the /*ROUTE_TO_LAST_USED*/ hint, the query may be routed to an incorrect node and the query results may not be obtained or may be incorrect.Sample result:
+----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+ | ID | Operator | Name | A-Rows | A-Cost | Execution Time(s) | Extra Info | +----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+ | 1 | Select Statement | | 0 | 52609.51 | 0 | IMCI Execution Plan (max_dop = 1, real_dop = 1, max_query_mem = unlimited, real_query_mem = unlimited) | | 2 | └─Hash Groupby | | 7 | 52609.5 | 0.002 | Group Key: lineitem.l_shipmode | | 3 | └─Compute Scalar | | 2000 | 51501 | 0 | | | 4 | └─Hash Join | | 2000 | 31000 | 0.007 | Join Cond: lineitem.l_orderkey = orders.o_orderkey | | 5 | ├─Table Scan | lineitem | 2000 | 0 | 0.001 | | | 6 | └─Table Scan | orders | 2000 | 0 | 0 | | +----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+In the output, the
Extra Infocolumn displays the join condition for theHash Joinoperator and the group key for theHash Groupbyoperator. TheExtra Infocolumn in the first row displays the usage of CPU and memory resources.(Optional) Perform operations such as aggregation on the
information_schema.imci_sql_profilingtable. For example, you can query the execution duration of the SQL statement./*ROUTE_TO_LAST_USED*/SELECT SUM(`Execution Time(s)`) AS TOTAL_TIME FROM information_schema.imci_sql_profiling;Sample result:
+----------------------+ | TOTAL_TIME | +----------------------+ | 0.010000000000000002 | +----------------------+ 1 row in set (0.00 sec)The result indicates that the total execution duration of the SQL statement is 10 ms.
Complex query example
In the following example, a complex SQL statement is executed to query a TPC-H SF100 dataset. Sample statement:
SELECT
c_name,
sum(l_quantity)
FROM
customer,
orders,
lineitem
WHERE
o_orderkey IN (
SELECT
l_orderkey
FROM
lineitem
WHERE
l_partkey > 18000000
)
AND c_custkey = o_custkey
AND o_orderkey = l_orderkey
GROUP BY
c_name
ORDER BY
c_name
LIMIT 10;Execute the EXPLAIN statement to retrieve detailed information about the execution plan for the query.
EXPLAIN SELECT c_name, sum(l_quantity) FROM customer, orders, lineitem WHERE o_orderkey IN ( SELECT l_orderkey FROM lineitem WHERE l_partkey > 18000000 ) AND c_custkey = o_custkey AND o_orderkey = l_orderkey GROUP BY c_name ORDER BY c_name LIMIT 10;Sample result:
+----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+ | ID | Operator | Name | E-Rows | E-Cost | Extra Info | +----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+ | 1 | Select Statement | | | | IMCI Execution Plan (max_dop = 32, max_query_mem = unlimited) | | 2 | └─Limit | | 10 | 7935739.96 | Offset=0 Limit=10 | | 3 | └─Sort | | 10 | 7935739.96 | Sort Key: c_name ASC | | 4 | └─Hash Groupby | | 1503700 | 7933273.99 | Group Key: customer.C_NAME | | 5 | └─Hash Right Semi Join | | 54010545 | 7865930.26 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY | | 6 | ├─Table Scan | lineitem | 59785766 | 24001.52 | Cond: (L_PARTKEY > 18000000) | | 7 | └─Hash Join | | 538776190 | 5488090.33 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY | | 8 | ├─Hash Join | | 181006430 | 668535.99 | Join Cond: customer.C_CUSTKEY = orders.O_CUSTKEY | | 9 | │ ├─Table Scan | customer | 15000000 | 600.00 | | | 10 | │ └─Table Scan | orders | 150000000 | 6000.00 | | | 11 | └─Table Scan | lineitem | 600037902 | 24001.52 | | +----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+The execution cost of the SQL statement is 7935739. The Hash Join operator whose ID is 7 consumes a large amount of CPU resources. The execution cost is 5488090, which accounts for approximately 70% of the total cost. To identify the performance issue of the SQL statement, you can enable the IMCI-based query performance analysis feature, execute the SQL statement, and then view the performance information of the SQL statement.
Enable the IMCI-based query performance analysis feature.
SET imci_analyze_query = ON;Execute the SQL statement.
SELECT c_name, sum(l_quantity) FROM customer, orders, lineitem WHERE o_orderkey IN ( SELECT l_orderkey FROM lineitem WHERE l_partkey > 18000000 ) AND c_custkey = o_custkey AND o_orderkey = l_orderkey GROUP BY c_name ORDER BY c_name LIMIT 10;Sample result:
+--------------------+-----------------+ | c_name | sum(l_quantity) | +--------------------+-----------------+ | Customer#000000001 | 172.00 | | Customer#000000002 | 663.00 | | Customer#000000004 | 174.00 | | Customer#000000005 | 488.00 | | Customer#000000007 | 1135.00 | | Customer#000000008 | 440.00 | | Customer#000000010 | 625.00 | | Customer#000000011 | 143.00 | | Customer#000000013 | 1032.00 | | Customer#000000014 | 564.00 | +--------------------+-----------------+ 10 rows in set (21.37 sec)Query the performance profiling data of the SQL statement.
/*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;NoteThe
/*ROUTE_TO_LAST_USED*/hint instructs PolarProxy to route the query to the node in which the most recent SQL statement was executed by using IMCIs. If you do not use the /*ROUTE_TO_LAST_USED*/ hint, the query may be routed to an incorrect node and the query results may not be obtained.Sample result:
+----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+ | ID | Operator | Name | A-Rows | A-Cost | Execution Time(s) | Extra Info | +----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+ | 1 | Select Statement | | 0 | 8336856.67 | 0 | | | 2 | └─Limit | | 10 | 8336856.67 | 0.002 | Offset=0 Limit=10 | | 3 | └─Sort | | 0 | 8336856.67 | 2.275 | Sort Key: c_name ASC | | 4 | └─Hash Groupby | | 9813586 | 8320763.22 | 160.083 | Group Key: customer.C_NAME | | 5 | └─Hash Right Semi Join | | 239598134 | 7994854.23 | 98.174 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY | | 6 | ├─Table Scan | lineitem | 60013756 | 24001.52 | 3.28 | Cond: (L_PARTKEY > 18000000) | | 7 | └─Hash Join | | 600037902 | 5156677.35 | 301.503 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY | | 8 | ├─Hash Join | | 150000000 | 629777.96 | 97.201 | Join Cond: customer.C_CUSTKEY = orders.O_CUSTKEY | | 9 | │ ├─Table Scan | customer | 15000000 | 600 | 3.321 | | | 10 | │ └─Table Scan | orders | 150000000 | 6000 | 0.241 | | | 11 | └─Table Scan | lineitem | 600037902 | 24001.52 | 0.661 | | +----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+The result indicates that the Hash Join operator whose ID is 7 consumes approximately half of the execution duration. This is because the
lineitemtable and the results of the Hash Join operator whose ID is 8 are large. As a result, a large amount of data is involved in the join operation. In a hash join operation, the overhead of building a hash table is much greater than that of a hash lookup. In this case, you only need to reduce the result size of the Hash Join operator whose ID is 8 to effectively reduce the query duration.Based on the SQL statement and the execution plan of the query, the query performance can be improved if the optimizer converts the
o_orderkey in (...)condition into a semi-join subquery and pushes down the semi-join subquery.Enable cost-based semi-join pushdown.
SET imci_optimizer_switch = 'semijoin_pushdown=on';Execute the EXPLAIN statement to retrieve detailed information about the execution plan for the query.
EXPLAIN SELECT c_name, sum(l_quantity) FROM customer, orders, lineitem WHERE o_orderkey IN ( SELECT l_orderkey FROM lineitem WHERE l_partkey > 18000000 ) AND c_custkey = o_custkey AND o_orderkey = l_orderkey GROUP BY c_name ORDER BY c_name LIMIT 10;Sample result:
+----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+ | ID | Operator | Name | E-Rows | E-Cost | Extra Info | +----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+ | 1 | Select Statement | | | | IMCI Execution Plan (max_dop = 32, max_query_mem = unlimited) | | 2 | └─Limit | | 10 | 2800433.74 | Offset=0 Limit=10 | | 3 | └─Sort | | 10 | 2800433.74 | Sort Key: c_name ASC | | 4 | └─Hash Groupby | | 14567321 | 2776544.58 | Group Key: customer.C_NAME | | 5 | └─Hash Join | | 57918330 | 2631846.75 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY | | 6 | ├─Hash Join | | 14567321 | 1014041.92 | Join Cond: orders.O_CUSTKEY = customer.C_CUSTKEY | | 7 | │ ├─Hash Right Semi Join | | 12071937 | 906226.67 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY | | 8 | │ │ ├─Table Scan | lineitem | 59785766 | 24001.52 | Cond: (L_PARTKEY > 18000000) | | 9 | │ │ └─Table Scan | orders | 150000000 | 6000.00 | | | 10 | │ └─Table Scan | customer | 15000000 | 600.00 | | | 11 | └─Table Scan | lineitem | 600037902 | 24001.52 | | +----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+The result indicates that the total query cost is reduced from 7935739 to 2800433 after the cost-based semi-join pushdown feature is enabled.
Re-execute the SQL statement.
SELECT c_name, sum(l_quantity) FROM customer, orders, lineitem WHERE o_orderkey IN ( SELECT l_orderkey FROM lineitem WHERE l_partkey > 18000000 ) AND c_custkey = o_custkey AND o_orderkey = l_orderkey GROUP BY c_name ORDER BY c_name LIMIT 10;Sample result:
+--------------------+-----------------+ | c_name | sum(l_quantity) | +--------------------+-----------------+ | Customer#000000001 | 172.00 | | Customer#000000002 | 663.00 | | Customer#000000004 | 174.00 | | Customer#000000005 | 488.00 | | Customer#000000007 | 1135.00 | | Customer#000000008 | 440.00 | | Customer#000000010 | 625.00 | | Customer#000000011 | 143.00 | | Customer#000000013 | 1032.00 | | Customer#000000014 | 564.00 | +--------------------+-----------------+ 10 rows in set (13.74 sec)The result indicates that the execution duration is reduced by approximately 40%.
Query the performance profiling data of the SQL statement.
/*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;Sample result:
+----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+ | ID | Operator | Name | A-Rows | A-Cost | Execution Time(s) | Extra Info | +----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+ | 1 | Select Statement | | 0 | 4318488.35 | 0 | | | 2 | └─Limit | | 10 | 4318488.35 | 0.002 | Offset=0 Limit=10 | | 3 | └─Sort | | 0 | 4318488.34 | 3.076 | Sort Key: c_name ASC | | 4 | └─Hash Groupby | | 9813586 | 4302394.89 | 163.149 | Group Key: customer.C_NAME | | 5 | └─Hash Join | | 239598134 | 3976485.91 | 151.253 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY | | 6 | ├─Hash Join | | 49393149 | 1321335.54 | 55.392 | Join Cond: orders.O_CUSTKEY = customer.C_CUSTKEY | | 7 | │ ├─Hash Right Semi Join | | 49393149 | 954805.16 | 52.552 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY | | 8 | │ │ ├─Table Scan | lineitem | 60013756 | 24001.52 | 2.791 | Cond: (L_PARTKEY > 18000000) | | 9 | │ │ └─Table Scan | orders | 150000000 | 6000 | 0.152 | | | 10 | │ └─Table Scan | customer | 15000000 | 600 | 0.028 | | | 11 | └─Table Scan | lineitem | 600037902 | 24001.52 | 0.642 | | +----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+The result indicates that after the semi-join subquery is pushed down, the previous large table join operation is eliminated, which significantly reduces the query duration.