FAQs of elastic parallel query - PolarDB - Alibaba Cloud Documentation Center

This topic provides answers to some commonly asked questions about elastic parallel queries.

Can I use hints to enable the elastic parallel query feature for the PolarDB cluster endpoint in read and write mode?

We recommend that you enable the elastic parallel query feature on the Configure Nodes page in the PolarDB console. If you want to enable the elastic parallel query feature at the SQL statement level, you can set the query degree of parallelism to n by using /*+ PARALLEL(n) */ or /*+ SET_VAR(max_parallel_degree=n) */. The two hints are different:

When the /*+ PARALLEL(n) */ hint is used, the elastic parallel query feature is forcibly enabled regardless of whether queries are routed to the primary node or how much data is involved.
When the /*+ SET_VAR(max_parallel_degree=n) */ hint is used, whether the elastic parallel query feature is enabled depends on whether queries are routed to the primary node, and the query cost and rows of data evaluated by the optimizer.

For more information, see Parallel hints.

Are elastic parallel query parameters that I specify by using hints in a SQL statement in conflict with those that I configure in the console?

The parameters are not in conflict. The parameters specified by hints are valid for the current SQL statement and have higher priority than the global parameters configured in the console. However, we recommend that you configure elastic parallel query parameters on the Configure Nodes page in the PolarDB console. For more information, see Configure PolarProxy.

Can I use the elastic parallel query feature for clusters of low specifications such as general-purpose clusters of 4 cores and 8 GB of memory?

By default, the elastic parallel query feature is disabled for clusters that have less than 8 cores. If you determine that read-only nodes have a lot of idle computing resources, you can enable parallel queries depending on workloads. However, we recommend that you do not set the degree of parallelism high than the number of cores. In addition, you must increase the value of the max_parallel_workers parameter to release the limits on the elastic parallel query feature. We recommend that you set this parameter to twice the number of CPU cores. For more information, see DOP policies.

Do I use a partitioned table or the elastic parallel query feature if I have hundreds of millions rows of data in a single PolarDB table?

The elastic parallel query feature can coexist with partitioned tables. Partitioned tables facilitate data management. The elastic parallel query feature accelerates complex queries, and can be used on both partitioned tables and non-partitioned tables. For more information about partitioned tables and the elastic parallel query feature, see Partitioned tables and Elastic parallel query.

Can I enable the elastic parallel query feature when the PolarDB endpoint for read/write splitting is connected?

Yes.

Does PolarDB automatically allocate threads to the newly added CPU cores for parallel queries if the number of CPU cores is increased in scaling?

Yes, PolarDB automatically allocates threads to newly added CPU cores.

Do parallel queries vary with cluster types and kernel versions?

Different cluster types may have different hardware resources (such as the number of CPU cores), which may affect the degree of parallelism. Different PolarDB kernel versions affect the usage of parallel queries. The elastic parallel query feature is only supported on PolarDB for MySQL 8.0, but not on PolarDB for MySQL 5.6 or PolarDB for MySQL 5.7.

What do I do if the max_parallel_degree parameter is not found but I want to enable the elastic parallel query feature for a PolarDB cluster?

First, check whether the elastic parallel query feature is supported on the current cluster. Your PolarDB cluster must be of PolarDB for MySQL 8.0 and the revision version must meet the following requirements:

Single-node elastic parallel query: 8.0.1.0.5 or later, or 8.0.2.1.4.1 or later.
Multi-node elastic parallel query: 8.0.2.2.6 or later.

To enable the elastic parallel query feature, we recommend that you perform the following operations: Click Modify in the Cluster Endpoints section of the Overview page. In the Configure Nodes dialog box, set the degree of parallelism and parallel query engine. For more information, see Configure PolarProxy.

What do I do if a large amount of data is involved in the GROUP BY or ORDER BY operation on a PolarDB cluster and a very large temporary table is created?

The root cause for this issue is that large and dense data is generated in the temporary table in queries. To solve this issue, you can enable the multi-node elastic parallel query feature. For more information, see Overview.

What are appropriate degree of parallelism values for different cluster specifications when I enable the elastic parallel query feature?

We recommend that you set the initial value of the degree of parallelism to 1/4 of the number of CPU cores in the cluster. For example, if the cluster has 16 cores, we recommend that you set the initial value of the degree of parallelism to 4. If the workloads of the cluster are low after the elastic parallel query feature is enabled, you can increase the degree of parallelism.

For PolarDB for MySQL 8.0.1, only the single-node elastic parallel query feature is supported. We recommend that you do not set the degree of parallelism to more than 1/2 of the number of CPU cores.
For PolarDB for MySQL 8.0.2, the multi-node elastic parallel query feature is also supported. If multiple read-only nodes are used, you can increase the degree of parallelism of a single node. For example, if your cluster has two read-only nodes which each use eight CPU cores, you can set the degree of parallelism to 4. In this case, eight threads are started to execute a single query on the two nodes to accelerate the query.

How do I adjust the degree of parallelism based on workloads after the elastic parallel query feature is enabled on PolarDB?

For example, if a node in a cluster has 16 CPU cores, you can set the initial value of the degree of parallelism to 4. After the elastic parallel query feature is enabled, observe workload changes of the cluster in several complete business cycles. If the cluster has idle resources, adjust the degree of parallelism to 6, and then observe workload changes of the cluster again. You can adjust the degree of parallelism gradually in this way.

Note

You can check the CPU utilization and memory usage to determine whether the cluster has idle resources. If the CPU utilization is not higher than 50% and the memory usage is not higher than 80%, you can determine that the cluster has idle resources. On a PolarDB for MySQL 8.0.2 cluster, a single query can be distributed to multiple nodes to further improve query efficiency. Therefore, you can increase the degree of parallelism of a single node.

Does a large value of the degree of parallelism have negative impacts?

The elastic parallel query feature is used to concurrently use multiple CPU cores to accelerate queries. The elastic parallel query feature occupies additional computing resources. A large value of the degree of parallelism may cause high cluster workloads. We recommend that you set the value of the degree of parallelism to 1/4 of the number of CPU cores in the cluster. Do not set a degree of parallelism higher than the number of CPU cores. If a PolarDB for MySQL 8.0.2 cluster has multiple read-only nodes, you can enable the multi-node elastic parallel query feature and increase the degree of parallelism value.

Are all queries accelerated after I enable the elastic parallel query feature?

Not all queries are executed in parallel. In the following cases, queries are not executed in parallel:

Parallel queries are not supported. For more information, see Limits on parallel queries.
The rows of scanned data or costs fall short of the threshold for parallel queries. The records_threshold_for_parallelism and cost_threshold_for_parallelism parameters define the record and cost thresholds for parallel queries.
If the workloads of the cluster are too high, queries are no longer executed in parallel. For more information, see DOP control based on system resource usage.

What are the differences between the REPLICA_AUTO and TYPICAL values of the parallel_degree_policy parameter?

REPLICA_AUTO indicates that the elastic parallel query feature is disabled on the primary node and only enabled on read-only nodes. Whether queries are executed in parallel depends on real-time workloads. TYPICAL indicates that the elastic parallel query feature is enabled on the primary node and that real-time workloads are not considered.

Note

If parallel_degree_policy is set to TYPICAL, PolarDB ignores the resource usage of databases (such as CPU utilization) and sets the degree of parallelism (DOP) to the value of the max_parallel_degree parameter.

How do I determine whether a query is executed in parallel after I enable the elastic parallel query feature?

You can execute the EXPLAIN statement to check whether a query is executed in parallel. If a query is executed in parallel, the Parallel Scan (4 workers) information is included in the Extra field. For more information, see View elastic parallel query execution plans.

Note

If the cluster endpoint is in read and write mode, the EXPLAIN statement is routed to the primary node because the elastic parallel query feature is disabled on the primary node by default. The results of the EXPLAIN may be inaccurate. To obtain accurate results, you must add hints to the EXPLAIN statement, such as /* FORCE_SLAVE */ explain select count(*) from t1.

Why is the query performance not improved significantly or not improved at all while the EXPLAIN statement indicates that a SQL query is executed in parallel?

A parallel query is not accelerated much due to the following reasons:

You can view the execution plan by executing the explain /*+ FORCE_SLAVE() */ SELECT ... statement to determine whether the execution plan is complete and whether the query is executed in parallel. If the Parallel Scan information is displayed in the EXPLAIN result, it is a parallel query.
In a parallel query, the large query task is split into multiple small query tasks for concurrent execution. If only a small amount of data is involved in the query, the query task is only split into a few subtasks. For example, if the degree of parallelism is set to 4 but the data volume involved is low, only two subtasks are split at most. In this case, the acceleration result depends on the maximum number of subtasks. For a PolarDB for MySQL 8.0.1 cluster, the number of workers following the Parallel Scan keywords indicates the number of parallel threads. For a PolarDB for MySQL 8.0.2 cluster, the number following the with parallel partitions keywords is the number of generated shards. If the number of shards is less than the max_parallel_degree value, some workers are idle and the query performance is not improved as expected.
One leader thread and multiple worker threads are involved in the execution of a parallel query. The worker threads implement parallel computing, and the leader thread aggregates and returns the results. Not all operators in a query statement are pushed down to the worker threads for execution. Some operators are executed on the leader thread. The leader thread takes more time to execute operators due to the single-thread bottleneck. This affects the acceleration result.
The workloads of the cluster are unstable. When you execute the EXPLAIN statement, the workloads are low and you determine that parallel queries can be used. If the workloads become high when a query is executed, sequential execution is actually used. In this case, you can use performance monitoring to check whether workloads jitter.

Why are the results of parallel queries unstable when PolarDB uses offset or limit queries and sorting conditions are not specified?

If no sorting conditions are specified, the query results are expected to be unstable in terms of query semantics. If the following case occurs: The query results are stable before the elastic parallel query feature is enabled, but the query results become unstable after the elastic parallel query feature is enabled. PolarDB does not ensure that the query results are stable. The query results are stable only if queries always hit the same index. Otherwise, PolarDB cannot guarantee stable query results.

What happens if the elastic parallel query feature is disabled on the primary node?

The elastic parallel query feature consumes system resources. Therefore, enabling the elastic parallel query feature increases workloads on the primary node. By default, the elastic parallel query feature is not enabled on the primary node. If both the data to be written and the required write throughput are low, you can also enable the elastic parallel query feature on the primary node, specify the degree of parallelism, and set the parallel_degree_policy parameter to AUTO or TYPICAL.

Why is the query performance degraded when I enable the elastic parallel query feature and set the innodb_adaptive_hash_index parameter to on?

Parallel queries cause more threads to frequently query the same B-tree. The built hash indexes may fail from time to time and cause contention issues. These may deteriorate the query performance. For more information, see Overview.