Use Bucket Shuffle Join to optimize queries - ApsaraDB for SelectDB

This topic describes how to use Bucket Shuffle Join to optimize queries in ApsaraDB for SelectDB. This reduces the time consumed to transmit data between nodes and the memory overhead of Join queries, and improves the query performance.

Overview

Bucket Shuffle Join provides local optimization for specific Join queries to reduce the time of data transmission between nodes and accelerate the queries. For information about the design, implementation, and effects of Bucket Shuffle Join, see ISSUE 4394.

Terms

Left table: the left table in a Join query. You can perform the probe operation to change the order of the table by using Join Reorder.
Right table: the right table in a Join query. You can perform the build operation to change the order of the table by using Join Reorder.

How it works

The common distributed Join execution methods supported by ApsaraDB for SelectDB include Shuffle Join and Broadcast Join. These two Join execution methods result in high network overheads.

For example, a Join query is performed between Table A and Table B and the hash join algorithm is used. The query overhead varies based on the Join execution method. The following section describes the types of Join execution methods:

Broadcast Join: If data is distributed to three HashJoinNodes in Table A, you must send the full data of Table B to the three HashJoinNodes. In this case, the network and the memory overheads of the query are three times the data amount of Table B.
Shuffle Join: Shuffle Join hashes data in Table A and Table B, and distributes the data among the cluster nodes. In this case, the network overhead of the query is the sum of the data amounts of Table A and Table B, and the memory overhead of the query is the data amount of Table B.

The frontend (FE) node stores the data distribution information of each ApsaraDB for SelectDB table. If the Join statement hits the data distribution column of a table, Bucket Shuffle Join reduces the network and memory overheads generated by the Join statement based on the data distribution information.

The preceding figure shows how Bucket Shuffle Join works. A Join query is performed between Table A and Table B, and the equivalent expression of the Join query hits the data distribution column of Table A. In this case, Bucket Shuffle Join sends the data of Table B to the storage nodes of Table A based on the data distribution information of Table A. Bucket Shuffle Join generates the following overheads:

Network overhead: the data amount of Table B. This overhead is less than the network overhead generated by common Join execution methods. The network overhead generated by Broadcast Join is three times the data amount of Table B, and the network overhead generated by Shuffle Join is the sum of the data amounts of Table A and Table B.
Memory overhead: the data amount of Table B. This overhead is less than the memory overhead generated by common Join execution methods. The memory overhead generated by Broadcast Join is three times the data amount of Table B, and the memory overhead generated by Shuffle Join is the sum of the data amounts of Table A and Table B.

Compared with Broadcast Join and Shuffle Join, Bucket Shuffle Join improves the query performance and reduces the time consumed to transmit data between nodes and the memory overhead of queries. Compared with the original Join execution methods of ApsaraDB for SelectDB, Bucket Shuffle Join has the following advantages:

Bucket Shuffle Join reduces the network and memory overheads of Join queries to improve the performance of the queries, especially when the FE node can perform partition pruning and bucket pruning on the left table.
Bucket Shuffle Join is different from Colocate Join. If Bucket Shuffle Join is used, you do not need to know the data distribution information of tables or perform the corresponding modifications. Bucket Shuffle Join does not have mandatory requirements for data distribution of tables. This prevents data skew issues.
Bucket Shuffle Join provides more optimization directions for Join Reorder.

Usage

Specify the session variable

Set the enable_bucket_shuffle_join session variable to true. The FE node automatically plans queries whose Join execution method can be converted to Bucket Shuffle Join.

set enable_bucket_shuffle_join = true;

When the FE node plans distributed queries, Join execution methods are selected based on the following priority: Colocate Join > Bucket Shuffle Join > Broadcast Join > Shuffle Join. However, if you use a hint to specify a Join execution method, the preceding priority does not take effect and the specified Join execution method is used. Example:

SELECT * FROM test JOIN [shuffle] baseall ON test.k1 = baseall.k1;

View the Join execution method

You can run the explain command to check whether a Join query uses Bucket Shuffle Join:

|   2:HASH JOIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|   |  join op: INNER JOIN (BUCKET_SHUFFLE)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|   |  hash predicates:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|   |  colocate: false, reason: table not in the same group                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|   |  equal join conjunct: `test`.`k1` = `baseall`.`k1`

BUCKET_SHUFFLE indicates that the Join execution method is Bucket Shuffle Join.

Planning rules of Bucket Shuffle Join

In most cases, to improve the query performance by using Bucket Shuffle Join, you need to only set the session variable to true without the need to perform modifications based on the data distribution information. However, if you are familiar with the planning rules of Bucket Shuffle Join, you can write SQL statements in a more efficient manner.

Bucket Shuffle Join takes effect only if an equivalent Join condition is used. This is because Bucket Shuffle Join and Colocation Join determine the data distribution based on hash calculations.
The equivalent Join condition contains the bucket columns of two tables. If the bucket column of the left table is in the equivalent Join condition, Bucket Shuffle Join may be used for the query.
The calculated hash value varies based on the data type. Therefore, the data type of the bucket column in the left table must be the same as the data type of the bucket column in the right table. If the bucket columns of two tables in the equivalent Join condition are of different data types, planning cannot be performed.
Bucket Shuffle Join only applies to online analytical processing (OLAP) tables that are native to ApsaraDB for SelectDB. If an external table such as an Open Database Connectivity (ODBC), MySQL, or Elasticsearch table is used as the left table, planning cannot be performed.
For a partitioned table, the data distribution rules for each partition may vary. Bucket Shuffle Join takes effect when the left table contains only one partition. Therefore, you must use the where condition when you execute SQL statements to ensure that the partition pruning policy can take effect.
If the left table is a Colocate table, the data distribution rules for each partition are definite. In this case, Bucket Shuffle Join performs better on Colocate tables.