Understanding the factors that affect query performance - AnalyticDB

Query performance in AnalyticDB for MySQL depends on six factors: cluster specifications, node count, data distribution, data size, query concurrency, and query complexity. Knowing how each factor works helps you choose the right cluster size, design data schemas that support parallel execution, and identify the root cause of slow queries.

How AnalyticDB for MySQL executes queries

AnalyticDB for MySQL uses a distributed data processing architecture. When a query arrives, the engine breaks it into stages and executes those stages in parallel across multiple nodes. Each node processes its share of the data in memory. Anything that disrupts even distribution of work or saturates a shared resource degrades overall performance.

Cluster specifications

Each cluster specification defines a fixed combination of CPU cores, memory size, and storage media. These attributes determine the per-node processing ceiling for different types of work:

JOIN-heavy or aggregation-heavy queries are CPU- and memory-bound. A specification with more CPU cores and larger memory reduces processing time for these workloads.
Data scan–intensive or simple aggregation queries are disk I/O–bound. A specification with faster storage media improves throughput for these workloads.

Select a specification that matches your dominant query pattern. For available options, see Specifications.

Number of nodes

More nodes provide more parallel processing capacity. Because AnalyticDB for MySQL splits each query into stages that run simultaneously on different nodes, adding nodes increases the total work the cluster can complete per unit of time.

Size your cluster based on query volume and latency requirements. For guidance, see Create a cluster.

Data distribution

Parallel execution is only as effective as data distribution allows. When data is evenly spread across storage nodes, all nodes finish their subtasks at roughly the same time. When data is skewed toward a subset of nodes, those nodes become bottlenecks—their subtasks take significantly longer, stalling the entire query.

Ensure that your data is distributed as evenly as possible across storage nodes to maximize parallel efficiency.

Data size

AnalyticDB for MySQL processes query data in memory rather than writing intermediate results to disk. When the amount of data involved in a query is large, the query holds memory and CPU resources for an extended period, reducing the cluster's capacity to handle other queries at the same time.

Large tables also create pressure on disk I/O: index filtering and reading detailed rows generate competition among concurrent queries for storage bandwidth, slowing down all affected queries.

Query concurrency

The cluster can handle only as many simultaneous queries as its node resources allow. When the number of concurrent queries exceeds that limit, excess queries queue in the backend and wait for resources to free up. The more congested the queue, the longer individual query latency grows.

The concurrency limit is determined by both cluster specifications and cluster size.

Query complexity

Complex queries place disproportionate demands on cluster resources:

Broad filter conditions: If a filter cannot eliminate most rows early, the engine reads large amounts of raw data from storage nodes before it can apply the condition.
Multiple JOINs: Each JOIN may require shuffling data between nodes over the network. Chains of JOINs multiply the number of transfers, increasing the risk of network congestion.
Large number of GROUP BY columns: Using a large number of columns in a GROUP BY clause holds large amounts of memory for aggregation state, which competes with memory needed by other query stages.