全部產品
Search
文件中心

PolarDB:分析列存索引查詢效能

更新時間:Jul 06, 2024

PolarDB MySQL版新增列存索引查詢效能分析功能。該功能通過採集執行SQL語句時的耗時等資訊,並結合EXPLAIN獲得的查詢計劃一併返回給使用者。能夠協助使用者瞭解執行SQL語句時的耗時細節,並且能夠輔助分析慢SQL。

版本要求

叢集版本需為PolarDB MySQL版8.0.1.1.42及以上,您可以通過查詢版本號碼來確認叢集版本。

使用說明

您需要將參數imci_analyze_query的值設定為ON,來開啟列存索引查詢效能分析功能。

參數名稱

層級

說明

imci_analyze_query

Session

列存索引查詢效能分析功能控制開關。取值範圍如下:

  • OFF(預設):關閉列存索引查詢效能分析功能。

  • ON:開啟列存索引查詢效能分析功能。

樣本

簡單查詢樣本

以在TPC-H Schema上執行的一個簡單查詢語句為例,SQL語句如下:

SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;
  1. 首先,您可以通過EXPLAIN得到該查詢最佳化工具估計的各個運算元的執行代價以及返回行數資訊。

    EXPLAIN SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;

    查詢結果如下:

    +----+------------------------+----------+--------+----------+---------------------------------------------------------------+
    | ID | Operator               | Name     | E-Rows | E-Cost   | Extra Info                                                    |
    +----+------------------------+----------+--------+----------+---------------------------------------------------------------+
    |  1 | Select Statement       |          |        |          | IMCI Execution Plan (max_dop = 1, max_query_mem = unlimited)  |
    |  2 | └─Hash Groupby         |          | 6      | 51218.50 | Group Key: lineitem.l_shipmode                                |
    |  3 |   └─Compute Scalar     |          | 1869   | 50191.00 |                                                               |
    |  4 |     └─Hash Join        |          | 1869   | 31000.00 | Join Cond: lineitem.l_orderkey = orders.o_orderkey            |
    |  5 |       ├─Table Scan     | lineitem | 2000   | 0.00     |                                                               |
    |  6 |       └─Table Scan     | orders   | 2000   | 0.00     |                                                               |
    +----+------------------------+----------+--------+----------+---------------------------------------------------------------+
  2. 開啟效能分析功能。

    SET imci_analyze_query = ON;
  3. 執行SQL語句。

    SELECT l_shipmode, COUNT(*) FROM lineitem, orders WHERE l_orderkey = o_orderkey GROUP BY l_shipmode;

    執行結果如下:

    +------------+----------+
    | l_shipmode | COUNT(*) |
    +------------+----------+
    | REG AIR    |      283 |
    | SHIP       |      269 |
    | FOB        |      299 |
    | RAIL       |      289 |
    | TRUCK      |      314 |
    | MAIL       |      274 |
    | AIR        |      272 |
    +------------+----------+
    7 rows in set (0.05 sec)

    效能分析結果會存放在information_schema.imci_sql_profiling表中。在將參數imci_analyze_query的值設定為ON時,該表將存放最近一條使用列存索引執行的查詢語句的效能分析資訊,通過查詢這張表,就可以獲得執行的查詢語句的效能資訊。

  4. 查詢SQL語句的效能資訊。

    /*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;    
    說明

    /*ROUTE_TO_LAST_USED*/用於指示智能代理將查詢路由到上一條語句執行的節點上。否則可能會因為路由不到查詢執行的節點,導致查詢不到結果或結果不正確。

    查詢結果如下:

    +----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+
    | ID | Operator               | Name     | A-Rows | A-Cost   | Execution Time(s) | Extra Info                                                                                             |
    +----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+
    |  1 | Select Statement       |          |      0 | 52609.51 |                 0 | IMCI Execution Plan (max_dop = 1, real_dop = 1, max_query_mem = unlimited, real_query_mem = unlimited) |
    |  2 | └─Hash Groupby         |          |      7 |  52609.5 |             0.002 | Group Key: lineitem.l_shipmode                                                                         |
    |  3 |   └─Compute Scalar     |          |   2000 |    51501 |                 0 |                                                                                                        |
    |  4 |     └─Hash Join        |          |   2000 |    31000 |             0.007 | Join Cond: lineitem.l_orderkey = orders.o_orderkey                                                     |
    |  5 |       ├─Table Scan     | lineitem |   2000 |        0 |             0.001 |                                                                                                        |
    |  6 |       └─Table Scan     | orders   |   2000 |        0 |                 0 |                                                                                                        |
    +----+------------------------+----------+--------+----------+-------------------+--------------------------------------------------------------------------------------------------------+

    可以看到,Hash JoinHash Groupby對應的Extra Info資訊中標明了對應的串連條件以及分組列,而第1行的Extra Info則展示了執行查詢操作時使用的CPU和記憶體資訊。

  5. (可選)您可以在information_schema.imci_sql_profiling表上執行彙總等操作。如查看該查詢語句的執行時間等。

    /*ROUTE_TO_LAST_USED*/SELECT SUM(`Execution Time(s)`) AS TOTAL_TIME FROM information_schema.imci_sql_profiling;

    執行結果如下:

    +----------------------+
    | TOTAL_TIME           |
    +----------------------+
    | 0.010000000000000002 |
    +----------------------+
    1 row in set (0.00 sec)

    可以看到,該查詢語句的執行耗時為10ms。

複雜查詢樣本

以在TPC-H SF100資料集上執行一個複雜查詢為例,通過EXPLAIN和查詢分析功能分析查詢耗時分布以及效能。SQL語句如下:

SELECT
    c_name,
    sum(l_quantity)
FROM
    customer,
    orders,
    lineitem
WHERE
    o_orderkey IN (
        SELECT
            l_orderkey
        FROM
            lineitem
        WHERE 
            l_partkey > 18000000
    )
    AND c_custkey = o_custkey
    AND o_orderkey = l_orderkey
GROUP BY
    c_name
ORDER BY
    c_name
LIMIT 10;
  1. 通過EXPLAIN查看該查詢語句的執行計畫。

    EXPLAIN SELECT
        c_name,
        sum(l_quantity)
    FROM
        customer,
        orders,
        lineitem
    WHERE
        o_orderkey IN (
            SELECT
                l_orderkey
            FROM
                lineitem
            WHERE 
                l_partkey > 18000000
        )
        AND c_custkey = o_custkey
        AND o_orderkey = l_orderkey
    GROUP BY
        c_name
    ORDER BY
        c_name
    LIMIT 10;

    執行結果如下:

    +----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+
    | ID | Operator                         | Name     | E-Rows    | E-Cost     | Extra Info                                                    |
    +----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+
    |  1 | Select Statement                 |          |           |            | IMCI Execution Plan (max_dop = 32, max_query_mem = unlimited) |
    |  2 | └─Limit                          |          | 10        | 7935739.96 | Offset=0 Limit=10                                             |
    |  3 |   └─Sort                         |          | 10        | 7935739.96 | Sort Key: c_name ASC                                          |
    |  4 |     └─Hash Groupby               |          | 1503700   | 7933273.99 | Group Key: customer.C_NAME                                    |
    |  5 |       └─Hash Right Semi Join     |          | 54010545  | 7865930.26 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY            |
    |  6 |         ├─Table Scan             | lineitem | 59785766  | 24001.52   | Cond: (L_PARTKEY > 18000000)                                  |
    |  7 |         └─Hash Join              |          | 538776190 | 5488090.33 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY            |
    |  8 |           ├─Hash Join            |          | 181006430 | 668535.99  | Join Cond: customer.C_CUSTKEY = orders.O_CUSTKEY              |
    |  9 |           │ ├─Table Scan         | customer | 15000000  | 600.00     |                                                               |
    | 10 |           │ └─Table Scan         | orders   | 150000000 | 6000.00    |                                                               |
    | 11 |           └─Table Scan           | lineitem | 600037902 | 24001.52   |                                                               |
    +----+----------------------------------+----------+-----------+------------+---------------------------------------------------------------+

    該SQL語句的執行代價為7935739。其中,ID為7的Hash Join會消耗大量CPU(其代價為5488090,佔總代價約70%)。若要驗證這條查詢語句的效能問題,您可以開啟效能分析功能,開啟後再執行該SQL語句,並通過查看SQL語句的查詢效能資訊來分析具體原因。

  2. 開啟效能分析功能。

    SET imci_analyze_query = ON;
  3. 執行SQL語句。

    SELECT
        c_name,
        sum(l_quantity)
    FROM
        customer,
        orders,
        lineitem
    WHERE
        o_orderkey IN (
            SELECT
                l_orderkey
            FROM
                lineitem
            WHERE 
                l_partkey > 18000000
        )
        AND c_custkey = o_custkey
        AND o_orderkey = l_orderkey
    GROUP BY
        c_name
    ORDER BY
        c_name
    LIMIT 10;

    執行結果如下:

    +--------------------+-----------------+
    | c_name             | sum(l_quantity) |
    +--------------------+-----------------+
    | Customer#000000001 |          172.00 |
    | Customer#000000002 |          663.00 |
    | Customer#000000004 |          174.00 |
    | Customer#000000005 |          488.00 |
    | Customer#000000007 |         1135.00 |
    | Customer#000000008 |          440.00 |
    | Customer#000000010 |          625.00 |
    | Customer#000000011 |          143.00 |
    | Customer#000000013 |         1032.00 |
    | Customer#000000014 |          564.00 |
    +--------------------+-----------------+
    10 rows in set (21.37 sec)
  4. 查詢SQL語句的效能資訊。

    /*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;    
    說明

    /*ROUTE_TO_LAST_USED*/用於指示智能代理將查詢路由到上一條語句執行的節點上。否則可能會因為路由到錯誤節點而查詢不到結果。

    查詢結果如下:

    +----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
    | ID | Operator                         | Name     | A-Rows    | A-Cost     | Execution Time(s) | Extra Info                                         |
    +----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
    |  1 | Select Statement                 |          |         0 | 8336856.67 |                 0 |                                                    |
    |  2 | └─Limit                          |          |        10 | 8336856.67 |             0.002 | Offset=0 Limit=10                                  |
    |  3 |   └─Sort                         |          |         0 | 8336856.67 |             2.275 | Sort Key: c_name ASC                               |
    |  4 |     └─Hash Groupby               |          |   9813586 | 8320763.22 |           160.083 | Group Key: customer.C_NAME                         |
    |  5 |       └─Hash Right Semi Join     |          | 239598134 | 7994854.23 |            98.174 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY |
    |  6 |         ├─Table Scan             | lineitem |  60013756 |   24001.52 |              3.28 | Cond: (L_PARTKEY > 18000000)                       |
    |  7 |         └─Hash Join              |          | 600037902 | 5156677.35 |           301.503 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY |
    |  8 |           ├─Hash Join            |          | 150000000 |  629777.96 |            97.201 | Join Cond: customer.C_CUSTKEY = orders.O_CUSTKEY   |
    |  9 |           │ ├─Table Scan         | customer |  15000000 |        600 |             3.321 |                                                    |
    | 10 |           │ └─Table Scan         | orders   | 150000000 |       6000 |             0.241 |                                                    |
    | 11 |           └─Table Scan           | lineitem | 600037902 |   24001.52 |             0.661 |                                                    |
    +----+----------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+

    根據查詢的效能資訊可以看出:Hash Join(7)耗費了約一半的時間。這是因為lineitem表與Hash Join(8)表的結果都很大,從而導致join的資料量很大,因此耗時很長。在Hash Join中,構建雜湊表的開銷通常遠遠大於雜湊尋找,因此只需要減小Join(8)的結果大小,就能有效減少查詢的耗時。

    查看SQL語句和執行計畫,若最佳化器將o_orderkey in (...)轉換為半串連子查詢,且能夠將該條件下推,則能提高該SQL語句的查詢效能。

  5. 開啟基於代價的半串連下推功能。

    SET imci_optimizer_switch = 'semijoin_pushdown=on';
  6. 通過EXPLAIN查看該查詢語句的執行計畫。

    EXPLAIN SELECT
        c_name,
        sum(l_quantity)
    FROM
        customer,
        orders,
        lineitem
    WHERE
        o_orderkey IN (
            SELECT
                l_orderkey
            FROM
                lineitem
            WHERE 
                l_partkey > 18000000
        )
        AND c_custkey = o_custkey
        AND o_orderkey = l_orderkey
    GROUP BY
        c_name
    ORDER BY
        c_name
    LIMIT 10;

    執行結果如下:

    +----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+
    | ID | Operator                               | Name     | E-Rows    | E-Cost     | Extra Info                                                    |
    +----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+
    |  1 | Select Statement                       |          |           |            | IMCI Execution Plan (max_dop = 32, max_query_mem = unlimited) |
    |  2 | └─Limit                                |          | 10        | 2800433.74 | Offset=0 Limit=10                                             |
    |  3 |   └─Sort                               |          | 10        | 2800433.74 | Sort Key: c_name ASC                                          |
    |  4 |     └─Hash Groupby                     |          | 14567321  | 2776544.58 | Group Key: customer.C_NAME                                    |
    |  5 |       └─Hash Join                      |          | 57918330  | 2631846.75 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY            |
    |  6 |         ├─Hash Join                    |          | 14567321  | 1014041.92 | Join Cond: orders.O_CUSTKEY = customer.C_CUSTKEY              |
    |  7 |         │ ├─Hash Right Semi Join       |          | 12071937  | 906226.67  | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY            |
    |  8 |         │ │ ├─Table Scan               | lineitem | 59785766  | 24001.52   | Cond: (L_PARTKEY > 18000000)                                  |
    |  9 |         │ │ └─Table Scan               | orders   | 150000000 | 6000.00    |                                                               |
    | 10 |         │ └─Table Scan                 | customer | 15000000  | 600.00     |                                                               |
    | 11 |         └─Table Scan                   | lineitem | 600037902 | 24001.52   |                                                               |
    +----+----------------------------------------+----------+-----------+------------+---------------------------------------------------------------+

    可以看出,開啟半串連下推功能後,查詢總代價從7935739下降至2800433。

  7. 重新執行該SQL語句。

    SELECT
        c_name,
        sum(l_quantity)
    FROM
        customer,
        orders,
        lineitem
    WHERE
        o_orderkey IN (
            SELECT
                l_orderkey
            FROM
                lineitem
            WHERE 
                l_partkey > 18000000
        )
        AND c_custkey = o_custkey
        AND o_orderkey = l_orderkey
    GROUP BY
        c_name
    ORDER BY
        c_name
    LIMIT 10;

    執行結果如下:

    +--------------------+-----------------+
    | c_name             | sum(l_quantity) |
    +--------------------+-----------------+
    | Customer#000000001 |          172.00 |
    | Customer#000000002 |          663.00 |
    | Customer#000000004 |          174.00 |
    | Customer#000000005 |          488.00 |
    | Customer#000000007 |         1135.00 |
    | Customer#000000008 |          440.00 |
    | Customer#000000010 |          625.00 |
    | Customer#000000011 |          143.00 |
    | Customer#000000013 |         1032.00 |
    | Customer#000000014 |          564.00 |
    +--------------------+-----------------+
    10 rows in set (13.74 sec)

    可以看出,執行耗時降低了約40%。

  8. 查詢SQL語句的效能資訊。

    /*ROUTE_TO_LAST_USED*/SELECT * FROM information_schema.imci_sql_profiling;    

    執行結果如下:

    +----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
    | ID | Operator                               | Name     | A-Rows    | A-Cost     | Execution Time(s) | Extra Info                                         |
    +----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+
    |  1 | Select Statement                       |          |         0 | 4318488.35 |                 0 |                                                    |
    |  2 | └─Limit                                |          |        10 | 4318488.35 |             0.002 | Offset=0 Limit=10                                  |
    |  3 |   └─Sort                               |          |         0 | 4318488.34 |             3.076 | Sort Key: c_name ASC                               |
    |  4 |     └─Hash Groupby                     |          |   9813586 | 4302394.89 |           163.149 | Group Key: customer.C_NAME                         |
    |  5 |       └─Hash Join                      |          | 239598134 | 3976485.91 |           151.253 | Join Cond: orders.O_ORDERKEY = lineitem.L_ORDERKEY |
    |  6 |         ├─Hash Join                    |          |  49393149 | 1321335.54 |            55.392 | Join Cond: orders.O_CUSTKEY = customer.C_CUSTKEY   |
    |  7 |         │ ├─Hash Right Semi Join       |          |  49393149 |  954805.16 |            52.552 | Join Cond: lineitem.L_ORDERKEY = orders.O_ORDERKEY |
    |  8 |         │ │ ├─Table Scan               | lineitem |  60013756 |   24001.52 |             2.791 | Cond: (L_PARTKEY > 18000000)                       |
    |  9 |         │ │ └─Table Scan               | orders   | 150000000 |       6000 |             0.152 |                                                    |
    | 10 |         │ └─Table Scan                 | customer |  15000000 |        600 |             0.028 |                                                    |
    | 11 |         └─Table Scan                   | lineitem | 600037902 |   24001.52 |             0.642 |                                                    |
    +----+----------------------------------------+----------+-----------+------------+-------------------+----------------------------------------------------+

    從效能分析結果可以看出,半串連被下推,之前的大表Join被消除,查詢耗時顯著減少。