
AnalyticDB: Use stage and task details to analyze queries

Last Updated: Mar 28, 2026

When a complex SQL query runs slowly in AnalyticDB for MySQL, the bottleneck is often a single stage that blocks downstream stages or a task that receives far more data than its peers. The Stages & Tasks tab breaks down every query by stage and task so you can pinpoint the bottleneck, detect data skew, and identify whether time is lost to computation, queuing, or network latency.

Important

Task details are available only for queries that take more than 1 second to complete.

How it works

AnalyticDB for MySQL divides each query into stages that run on worker and executor nodes. Stages that have no dependencies can run in parallel; stages that depend on the output of earlier stages must run in series. Serial dependencies between stages are the most common reason complex SQL queries are slow.

Within a stage, multiple tasks run concurrently across nodes. Each task processes a partition of the stage's input data.
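The effect of serial dependencies can be sketched with a toy critical-path calculation. This is only an illustration of the execution model described above; the stage names, durations, and dependency graph are hypothetical, not AnalyticDB internals:

```python
from functools import lru_cache

# Toy model: query latency is bounded by the longest chain of dependent
# stages, not by the sum of all stage durations. All values are hypothetical.
stage_duration = {"s1": 2.0, "s2": 3.0, "s3": 1.0, "s4": 4.0}
depends_on = {"s1": [], "s2": [], "s3": ["s1", "s2"], "s4": ["s3"]}

@lru_cache(maxsize=None)
def finish_time(stage: str) -> float:
    # A stage can start only when every stage it depends on has finished.
    start = max((finish_time(d) for d in depends_on[stage]), default=0.0)
    return start + stage_duration[stage]

# s1 and s2 run in parallel; s3 and s4 then run serially after them.
latency = max(finish_time(s) for s in stage_duration)
print(latency)  # 8.0: the serial chain s2 -> s3 -> s4 (3 + 1 + 4)
```

Even though the four stages represent 10 seconds of total work, the query finishes in 8 seconds because only the serial chain determines latency; shortening stages off the critical path would not help.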

For each stage, AnalyticDB for MySQL tracks:

  • Input and output volume (rows and bytes)

  • Peak memory and cumulative CPU time across all operators

  • Execution status

For each task within a stage, AnalyticDB for MySQL tracks additional metrics that help you identify data skew, long tail issues, and where time is spent—computation versus queuing or network latency.
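These per-task metrics make skew and long-tail detection mechanical: compare each task against its peers. A minimal sketch, where the task IDs, numbers, and the 3x-median threshold are all hypothetical choices:

```python
from statistics import median

# Hypothetical per-task metrics for one stage, keyed by task ID.
tasks = {
    "2.0": {"input_mb": 110, "elapsed_s": 4.1},
    "2.1": {"input_mb": 105, "elapsed_s": 3.9},
    "2.2": {"input_mb": 980, "elapsed_s": 21.7},  # receives ~9x the data
    "2.3": {"input_mb": 102, "elapsed_s": 4.0},
}

def outliers(metric: str, factor: float = 3.0) -> list[str]:
    """Tasks whose metric exceeds `factor` times the median across tasks."""
    med = median(t[metric] for t in tasks.values())
    return [tid for tid, t in tasks.items() if t[metric] > factor * med]

skewed = outliers("input_mb")      # data skew on the stage's input
long_tail = outliers("elapsed_s")  # long-tail tasks in the stage
print(skewed, long_tail)  # ['2.2'] ['2.2']
```

Here the same task is both the skew and the long-tail outlier, which is the typical pattern: one task receives far more data and therefore runs far longer than its peers.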

View stage and task details in the console

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters. On the Clusters page, click an edition tab, find the cluster, and click the cluster ID.

  2. In the left-side navigation pane, click Diagnostics and Optimization.

  3. On the SQL Queries tab, find the query to diagnose and click Diagnose in the Actions column.

  4. Click the Stages & Tasks tab to see stage-level details. For a description of each column, see Stage parameters.

  5. Click a stage ID to drill down into the tasks for that stage. For a description of each column, see Task parameters.

Diagnose common problems

Match the symptoms below to the relevant parameters and next steps.

  • Query is slow overall. Where to look: stage list; sort by Cumulative duration. Next step: click into the top stage to inspect its tasks.

  • One task in a stage is much slower than others (long tail). Where to look: task list; compare Interval between start and end time across tasks. Next step: check Execution node. If the same node is slow across multiple queries, investigate it for resource contention.

  • Tasks in a stage receive unequal amounts of data (data skew). Where to look: task list; sort by Amount of input data. Next step: trace back to the upstream stage and review the GROUP BY or JOIN fields.

  • A stage reads much more data from the source table than expected. Where to look: task list; sort by Amount of read table data. Next step: check distribution field settings in the console. See Data modeling diagnostics.

  • A stage uses high memory or queries fail with memory errors. Where to look: task list; compare Peak memory across tasks. Next step: unequal peak memory indicates data skew; check the input data distribution.

  • Most of a task's elapsed time is not spent computing. Where to look: the task's Computing time ratio is low. Next step: investigate queuing delays and network latency.

  • A task spends most of its elapsed time computing. Where to look: the task's Computing time ratio is high. Next step: look for slow operators in the execution plan.

Stage parameters

Sort by Cumulative duration to identify the stages that consume the most CPU time. Click into those stages to inspect individual tasks.

  • Stage ID: The unique identifier of the stage. Corresponds to the stage ID in the execution plan tree.

  • Status: The execution status of the stage. Valid values: Completed (all tasks finished), Failed (one or more tasks failed), Running (tasks are still running), Canceled (one or more tasks were canceled).

  • Rows of input data: The number of rows input to the stage.

  • Amount of input data: The amount of data input to the stage.

  • Rows of output data: The number of rows output from the stage.

  • Amount of output data: The amount of data output from the stage.

  • Peak memory: The peak memory usage of the stage.

  • Cumulative duration: The total time consumed to execute all operators in the stage. Use this to identify stages with high CPU cost. When comparing with query duration, account for concurrent execution: a stage with high cumulative duration may still complete quickly if its tasks run in parallel.
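The caveat on Cumulative duration can be illustrated with a quick calculation; the numbers below are hypothetical:

```python
# Cumulative duration sums operator time across all concurrent tasks in a
# stage, so a large value does not imply slow wall-clock time.
cumulative_s = 48.0  # total operator time across all tasks (hypothetical)
concurrency = 16     # tasks running in parallel (hypothetical)

# With perfect parallelism, the stage's wall-clock time is bounded below by:
wall_clock_estimate = cumulative_s / concurrency
print(wall_clock_estimate)  # 3.0 seconds, despite 48s of cumulative work
```

In other words, sort by Cumulative duration to find CPU-hungry stages, but judge latency impact by per-task elapsed time, not by the cumulative figure alone.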

Task parameters

Use the task list to diagnose data skew, long tail issues, and compute efficiency.

  • Task ID: The unique identifier of the task, in the format {stage_id}.{task_id}. For example, 2.24 is task 24 in stage 2.

  • Status: The execution status of the task. Valid values: Completed, Failed, Terminated, Canceled.

  • Amount of input data: The number of rows and the amount of data input to the task. Sort this column across all tasks to check for data skew on the stage's input. If one task receives significantly more data than its peers, the skew is likely caused by inappropriate GROUP BY or JOIN fields. Trace back to the upstream stage to investigate.

  • Amount of output data: The number of rows and the amount of data output from the task. To understand why output volume is high, check whether combined fields exist in the GROUP BY or JOIN clause. For example, a.id = b.id joins on one field, while a.id = b.id AND a.age = b.age joins on two. Inspect the Aggregation or Join node in the operator plan tree for the current stage.

  • Peak memory: The peak memory usage of the task. Peak memory generally scales with input data volume. If queries fail with memory errors, compare peak memory across tasks to detect unbalanced input data distribution as the root cause.

  • Duration of reading table data: The cumulative time all TableScan operators in the stage spent reading table data. This is a cross-node, cross-thread cumulative value and cannot be directly compared to query duration. Compare it with Cumulative duration to determine what fraction of a stage's compute time is spent on data scans versus other processing.

  • Amount of read table data: The number of rows and the amount of data read by all TableScan operators in the stage. Sort this column to detect data skew on source table reads. If skew exists, check whether it is caused by distribution field settings in the console. For more information, see Data modeling diagnostics.

  • Created at: The time when the task was created.

  • Queuing duration: The time the task waited in the queue before execution started.

  • Ended at: The time when the task ended.

  • Interval between start and end time: The elapsed time between task creation and task completion. Compare this with overall query duration to determine how much this stage contributes to total query time. For example, if query duration is 6 seconds and a task's interval is 4 seconds, that stage is the primary driver of latency. For a worked example, see Calculate task duration and concurrency.

  • Cumulative duration: The total time consumed by all threads across all tasks in the stage. For a worked example, see Calculate task duration and concurrency.

  • Computing time ratio: The ratio of actual compute time to the task lifecycle. Calculated as (Cumulative duration / Subtask concurrency) / Interval between start and end time. A low ratio means the task spent most of its lifecycle not computing: investigate queuing delays and network latency. A high ratio means most time went to actual computation: look for slow operators in the execution plan. For a worked example, see Calculate task duration and concurrency.

  • Subtask concurrency: The number of threads concurrently executing the task on a node.

  • Execution node: The IP address of the node running the task. If long tail issues (some tasks taking significantly longer than others) repeatedly occur on the same node across multiple queries, investigate that node for resource contention or hardware issues.
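When scripting against exported task lists, the Task ID format above splits cleanly into its two components. A small illustrative helper, not part of any AnalyticDB API:

```python
# Parse a task ID of the form {stage_id}.{task_id}, as described above.
# For example, "2.24" identifies task 24 in stage 2.
def parse_task_id(task_id: str) -> tuple[int, int]:
    stage, task = task_id.split(".")
    return int(stage), int(task)

print(parse_task_id("2.24"))  # (2, 24)
```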

Calculate task duration and concurrency

This example walks through computing the interval between start and end time, cumulative duration, computing time ratio, and subtask concurrency for Task 2.1.

Task 2.1 belongs to Stage 2, which consists of four operators: StageOutput, Join, TableScan, and RemoteSource. The following figure shows the operator tree for Stage 2.

[Figure: operator tree for Stage 2]

These operators execute concurrently across multiple nodes, with data flowing in the direction of the arrows. On the node at IP address 192.168.12.23, Task 2.1 runs with four concurrent threads, which take 5s, 5s, 6s, and 6s, respectively, to compute data, as shown in the following figure.

[Figure: per-thread compute times for Task 2.1]

From these values:

  • Cumulative duration: 5s + 5s + 6s + 6s = 22s

  • Interval between start and end time: 10s

  • Computing time ratio: (22s / 4) / 10s = 0.55

The ratio of 0.55 means about 55% of the task's elapsed time was spent on computation. The remaining 45% reflects queuing, network latency, or synchronization overhead.
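The arithmetic above can be reproduced directly from the per-thread compute times and the 10-second interval given in the example:

```python
# Per-thread compute times for Task 2.1 and its observed interval,
# taken from the worked example above.
thread_times = [5, 5, 6, 6]
interval_s = 10
concurrency = len(thread_times)   # Subtask concurrency: 4

cumulative_s = sum(thread_times)  # Cumulative duration: 22s
ratio = (cumulative_s / concurrency) / interval_s
print(cumulative_s, ratio)  # 22 0.55
```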

API reference

Use the following API operations to retrieve stage and task details programmatically.

  • DescribeDiagnosisSQLInfo: Retrieves execution details for an SQL statement.

  • DescribeDiagnosisTasks: Retrieves execution details for the distributed tasks in a specific stage of a query.