AnalyticDB for MySQL provides the SQL diagnostics feature to display execution plans of SQL queries in hierarchy charts. The hierarchy chart for an execution plan consists of two layers. The first layer is the stage layer, and the second layer is the operator layer. This topic describes how to use the stage and operator layers for execution plan hierarchy charts to analyze queries.

Execution plan hierarchy chart at the stage layer

At the stage layer, an execution plan hierarchy chart consists of multiple stages, and data flows from the bottom up. First, the stages that contain scan operators scan data. Then, the data is processed by intermediate stage nodes at different layers. Finally, the root node at the topmost layer returns the query results to the client.
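
The bottom-up flow described above can be pictured as a post-order walk over a tree of stages. The following is a minimal sketch, not AnalyticDB for MySQL code: the `Stage` class, the stage IDs, and the shape of the plan are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stage:
    """Hypothetical model of one stage (one rectangle) in the chart."""
    stage_id: int
    upstream: List["Stage"] = field(default_factory=list)


def bottom_up_order(root: Stage) -> List[int]:
    """Return stage IDs in data-flow order: upstream stages first, root last."""
    order: List[int] = []
    for child in root.upstream:
        order.extend(bottom_up_order(child))
    order.append(root.stage_id)
    return order


# Hypothetical plan: two scan stages feed an intermediate stage,
# which feeds the root stage that returns results to the client.
scan_a = Stage(3)
scan_b = Stage(2)
intermediate = Stage(1, upstream=[scan_a, scan_b])
root = Stage(0, upstream=[intermediate])

print(bottom_up_order(root))
```

Reading the chart in this order (scan stages, then intermediate stages, then the root) mirrors the direction in which rows move between stages.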

The execution plan hierarchy chart at the stage layer contains the following information:
  • Basic information

    Each rectangle in the chart represents a stage and contains information about the stage, including the stage ID, the data output type, and the duration or consumed memory. Memory information is displayed after you select By Memory.

  • Number of output rows

    The number on the line between two adjacent stages indicates the number of rows output from an upstream stage to a downstream stage. The larger the number of output rows, the thicker the line between stages.

  • Data output method
    The method used to transfer data between two adjacent stages from the upstream to the downstream. The following table describes the data output methods supported by AnalyticDB for MySQL.
    | Data output method | Description |
    | --- | --- |
    | Broadcast | The data of each compute node in an upstream stage is copied to all compute nodes of a downstream stage. |
    | Repartition | The data of each compute node in an upstream stage is partitioned based on specific rules and then distributed to the specified compute nodes of a downstream stage. |
    | Gather | The data of each compute node in an upstream stage is concentrated on a specific compute node in a downstream stage. |
  • View the details of the stages that rank top 10 in memory usage or execution duration
    The Top 10 Nodes in Descending Order by Duration or Memory tab on the right displays the IDs and proportions of the top 10 stages. Stages are ranked by the proportion of their execution duration to the total query duration or by the proportion of their memory usage to the total query memory.
    Note
    • By default, By Duration is selected. You can also select By Memory in the upper-right corner of the execution plan hierarchy chart.
    • Stages whose memory usage proportion or execution duration proportion is less than 1% are not displayed on the Top 10 Nodes in Descending Order by Duration or Memory tab.
    • The sum of the execution duration proportions or memory usage proportions of all stages in a query may not be equal to 100% due to differences in statistical methods.
  • Diagnostic Results
    Click a stage such as Stage[1] in the execution plan hierarchy chart to view the following diagnostics details of the stage in the right-side Diagnostic Results section:
    • Stage Diagnostics: provides a detailed description of the stage diagnosis results, including the diagnosed issues and the corresponding optimization solutions. These issues may be large amounts of broadcast data or data skew.
    • Operator Diagnostics: provides an overview of faulty operators in the current stage and their corresponding issues. The detailed descriptions and optimization solutions are available only in the execution plan hierarchy chart at the operator layer. For more information, see Execution plan hierarchy chart at the operator layer.

    For more information about stage diagnosis results, see Stage-level diagnostic results.

  • Statistics

    The Statistics section below the Diagnostic Results section shows the metric statistics for the stage that you want to view.

    | Metric | Description |
    | --- | --- |
    | Peak Memory | The maximum memory consumed by the stage. The system selects Bytes, KB, MB, GB, or TB as the unit based on the actual amount of memory consumed. |
    | Total Duration | The cumulative execution duration consumed by all nodes and threads of all operators in the stage. The system selects ms, s, m, or h as the unit based on the actual duration. Note: This cumulative duration cannot be compared with the total duration of the current query. |
    | Output Rows | The number of rows output from the stage. |
    | Amount of Output Data | The amount of data output from the stage. The system selects Bytes, KB, MB, GB, or TB as the unit based on the actual amount of data. |
    | Input Rows | The number of rows input to the stage. |
    | Amount of Input Data | The amount of data input to the stage. The system selects Bytes, KB, MB, GB, or TB as the unit based on the actual amount of data. |
    | Scanned Rows | The number of rows scanned by the stage. Note: This metric is displayed only when the stage contains scan operators. |
    | Amount of Scanned Data | The amount of data scanned by the stage. The system selects Bytes, KB, MB, GB, or TB as the unit based on the actual amount of data. Note: This metric is displayed only when the stage contains scan operators. |
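
The three data output methods described in this section (Broadcast, Repartition, and Gather) differ only in which downstream compute nodes receive each row. The following is a minimal sketch of that difference, not the engine's implementation. The function names, the node counts, and the modulo rule used for Repartition are assumptions made for illustration; the actual partitioning rules are not specified in this topic.

```python
from typing import Dict, List

Rows = List[int]


def broadcast(upstream: List[Rows], downstream_nodes: int) -> Dict[int, Rows]:
    """Copy the rows of every upstream node to every downstream node."""
    all_rows = [row for node_rows in upstream for row in node_rows]
    return {node: list(all_rows) for node in range(downstream_nodes)}


def repartition(upstream: List[Rows], downstream_nodes: int) -> Dict[int, Rows]:
    """Send each row to exactly one downstream node chosen by a rule.

    Assumption: the rule is `row % downstream_nodes`.
    """
    out: Dict[int, Rows] = {node: [] for node in range(downstream_nodes)}
    for node_rows in upstream:
        for row in node_rows:
            out[row % downstream_nodes].append(row)
    return out


def gather(upstream: List[Rows]) -> Dict[int, Rows]:
    """Concentrate the rows of every upstream node on one downstream node."""
    return {0: [row for node_rows in upstream for row in node_rows]}


upstream_data = [[1, 2], [3, 4]]  # rows held by two upstream compute nodes

print(broadcast(upstream_data, 2))
print(repartition(upstream_data, 2))
print(gather(upstream_data))
```

In this sketch, Broadcast delivers every row once per downstream node, so the transferred volume grows with the number of downstream nodes. This is consistent with large amounts of broadcast data being one of the issues that stage diagnostics can report.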

Execution plan hierarchy chart at the operator layer

At the operator layer, an execution plan hierarchy chart consists of multiple operators, and data flows from the bottom up. First, the most upstream operators (TableScan and RemoteSource) scan data or receive network data. Then, the data is processed by intermediate operators at different layers. Finally, the root node (StageOutput or Output) at the topmost layer returns the query results to the client.

You can move the pointer over the stage that you want to view and click View Stage Plans in the information box that appears. Then, you can go to the plan details page of the stage and view the execution plan hierarchy chart at the operator layer.
The execution plan hierarchy chart at the operator layer contains the following information:
  • Basic information

    Each rectangle in the chart represents an operator and contains information about the operator, including the operator name, the ID, the properties (such as the join conditions and algorithms of the JOIN operator), and the duration or consumed memory. Memory information is displayed after you select By Memory.

  • Number of output rows

    The number on the line between two adjacent operators indicates the number of rows output from an upstream operator to a downstream operator. The larger the number of output rows, the thicker the line between operators.

  • View the details of the operators that rank top 10 in memory usage or execution duration
    The Top 10 Nodes in Descending Order by Duration or Memory tab on the right displays the IDs and proportions of the top 10 operators. Operators are ranked by the proportion of their execution duration to the total query duration or by the proportion of their memory usage to the total query memory.
    Note
    • By default, By Duration is selected. You can also select By Memory in the upper-right corner of the execution plan hierarchy chart.
    • Operators whose memory usage proportion or execution duration proportion is less than 1% are not displayed on the Top 10 Nodes in Descending Order by Duration or Memory tab.
    • The sum of the execution duration proportions or memory usage proportions of all operators in a stage may not be equal to 100% due to differences in statistical methods.
  • Diagnostic Results
    Click an operator such as Join[572] in the execution plan hierarchy chart to view the diagnostics details of the operator in the right-side Diagnostic Results section. The diagnostics details include diagnosed issues and the corresponding optimization solutions. These issues may be data skew or large right tables in joins. For more information, see Operator-level diagnosis results.
  • Statistics

    The Statistics section below the Diagnostic Results section shows the metric statistics for the operator that you want to view.

    | Metric | Description |
    | --- | --- |
    | Peak Memory | The maximum memory consumed by the operator. The system selects Bytes, KB, MB, GB, or TB as the unit based on the actual amount of memory consumed. |
    | Time Consumed | The average duration of the operator with a specific degree of concurrency. The system selects ms, s, m, or h as the unit based on the actual duration. Note: This duration can be compared with the duration of the current query. |
    | Output Rows | The number of rows output from the operator. |
    | Amount of Output Data | The amount of data output from the operator. The system selects Bytes, KB, MB, GB, or TB as the unit based on the actual amount of data. |
    | Input Rows | The number of rows input to the operator. |
    | Amount of Input Data | The amount of data input to the operator. The system selects Bytes, KB, MB, GB, or TB as the unit based on the actual amount of data. |
    | Builder Statistics | The statistics for the builder, including the builder type, maximum memory, duration, number of input rows, number of output rows, and data amount. The following builder types are available: HashBuilder, which builds hash tables to complete hash joins; SetBuilder, which builds sets to complete semi joins; and NestLoopBuilder, which completes nested-loop joins (NLJs). Note: This metric is displayed only for the JOIN operator. |
    | Properties | The properties of the operator. Different operators have different properties. For example, the properties of the JOIN operator include the join type and the join method. |
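
The display rules of the Top 10 Nodes in Descending Order by Duration or Memory tab (rank in descending order, hide entries below 1%, show at most 10) can be sketched as follows. This is an illustration only: the function name, the input format, and the sample values are hypothetical, and the proportions are taken as given rather than computed the way the console computes them.

```python
from typing import Dict, List, Tuple


def top_nodes(proportions: Dict[str, float],
              limit: int = 10,
              min_share: float = 0.01) -> List[Tuple[str, float]]:
    """Rank nodes by proportion in descending order.

    `proportions` maps a node label to its share of duration or memory
    (0.25 means 25%). Nodes whose share is less than 1% are dropped,
    and at most `limit` nodes are returned.
    """
    visible = [(name, share) for name, share in proportions.items()
               if share >= min_share]
    visible.sort(key=lambda item: item[1], reverse=True)
    return visible[:limit]


# Hypothetical operator shares of the query duration. They do not add up
# to 100%, which the tab also does not guarantee.
sample = {
    "Join[572]": 0.62,
    "TableScan[10]": 0.30,
    "Aggregation[8]": 0.05,
    "Project[3]": 0.004,
}

for name, share in top_nodes(sample):
    print(f"{name}: {share:.1%}")
```

With the sample values, `Project[3]` is hidden because its share is below 1%, and the remaining three operators are listed from the largest share to the smallest.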