Optimize join queries in ApsaraDB for SelectDB - ApsaraDB for SelectDB

This topic describes how to optimize join queries in ApsaraDB for SelectDB and provides suggestions to improve the query speed.

Supported physical operators

ApsaraDB for SelectDB supports the following two physical join operators for join queries in a standalone engine.

Hash Join: You can create a hash table in the right table based on the column that exists in both the left and right tables. Then, ApsaraDB for SelectDB uses the hash table to perform join calculations in the left table in a streaming manner. This applies only to join queries that use equivalent conditions.
Nest Loop Join: ApsaraDB for SelectDB performs join calculations in two for loops. This applies to join queries that use non-equivalent conditions. For example, you can set conditions with operators such as ">" or "<" to query data, or obtain the cartesian product. This operator is a common join operator. However, it has poor performance.

Shuffle methods

ApsaraDB for SelectDB is a distributed massively parallel processor (MPP) database. When you perform join queries on ApsaraDB for SelectDB tables, you must shuffle data, and then join the data by using physical operators. ApsaraDB for SelectDB supports the following four shuffle methods. This section also provides examples on how to shuffle data by using each shuffle method.

In this example, Table S and Table R are joined. N indicates the number of nodes that are involved in the join calculation. T indicates the number of data records in a table.

Broadcast Join
This shuffle method requires that the full data of Table S is sent to Table R. Each node involved in the join calculation has the full data of Table S, which is T(R). This is a common shuffle method and supports the Hash Join and Nest Loop Join operators. The network overhead of this shuffle method is calculated by using the following formula: N × T(R).
The data of Table S is not moved, and the data of Table R is sent to the scan node that scans the data of Table S.
Shuffle Join
When you use the Hash Join operator to perform a join query, you can calculate the hash values of data in Table S and Table R based on the Join column and send the same hash values to the same node of a distributed system. Then, you can use the distributed system to improve the speed of the join query. The network overhead of this shuffle method is calculated based on the following formula: T(S) + T(R). However, this shuffle method supports only the Hash Join operator because the hash values of data are calculated based on the join column.
The data of Table S and Table R is calculated based on partitions. The results are sent to nodes of different partitions.
Bucket Shuffle Join
The data of ApsaraDB for SelectDB tables is stored in buckets based on the hash values of the data. This way, you can shuffle the data that you want to join based on the bucket columns in tables. For example, two tables (Table S and Table R) are to be joined, and the join column is the bucket column of Table S. In this case, the data of Table S does not need to be moved, and the join calculation can be performed by moving the data of Table R.
The network overhead of this shuffle method is T(R), which indicates that the join calculation can be performed by shuffling only the data of Table R. For more information about how to use the Bucket Shuffle Join method, see Bucket Shuffle Join.
The data of Table S is not moved, and the data of Table R is sent to the node that scans the data of Table S based on the results that are calculated based on partitions.
Colocation Join
For multiple tables that are associated with each other, the number of shards must be the same for each table when the tables are created. This way, you can skip data shuffle in queries, and directly perform join calculations. This improves the query performance. For more information about Colocation Join, see Colocation Join.
The data is partitioned in advance. You do not need to consider the network overhead. You can directly perform join calculations on your on-premises computer.

The following table describes the four shuffle methods.

Shuffle method	Network overhead	Physical operator	Scenario
BroadCast	N × T(R)	Hash Join or Nest Loop Join	Common scenarios
Shuffle	T(S) + T(R)	Hash Join	Common scenarios
Bucket Shuffle	T(R)	Hash Join	Distributed columns of the left table exist in the join condition, and only the data in a single partition of the left table is used for a join query.
Colocation	0	Hash Join	Distributed columns of the left table exist in the join condition, and the left and right tables belong to the same Colocate Group.

The preceding shuffle methods are sorted in descending order by flexibility. If a shuffle method has higher requirements for data distribution, the method provides higher performance for join calculation.

Runtime Filter

A Runtime Filter is dynamically generated during an execution plan. During a join query, HashJoinNode converts the right table into a filter condition and pushes the filter condition down to ScanNode. Then, ScanNode filters and prunes data when the left table is scanned. This method greatly reduces the amount of time required for data reading and computing during queries, and improves the query performance. For more information about Runtime Filter, see Runtime Filter.

Join Reorder

In multi-table join scenarios, the order of joins greatly affects the performance of the entire query.

For example, three tables are joined, as shown in the following figure. ScanA, ScanB, and ScanC indicate the data of Table A, Table B, and Table C that is scanned based on query conditions.

In the left part of the figure, the scanned data of Table A and Table B is joined and 2,000 rows of intermediate results are generated. Then, the intermediate results are joined with the scanned data of Table C.

In the right part of the figure, the order of joins is adjusted. The scanned data of Table A and Table C is joined, and 100 rows of intermediate results are generated. Then, the intermediate results are joined with the scanned data of Table B. The final join result is the same, but the generated intermediate results in the left figure are 20 times the results in the right figure. This causes a large performance gap.

ApsaraDB for SelectDB supports the rule-based Join Recorder algorithm. The following items describe the logic of this algorithm:

Join large tables with small tables to generate a smaller amount of intermediate results.
Put the tables that have join conditions forward before the From in the SELECT statement to filter data of tables that have join conditions.
The Hash Join operator has a higher priority than the Nest Loop Join operator. This is because the execution speed of the Hash Join operator is significantly faster than that of the Nest Loop Join operator.

Solutions to join query optimization

The following section describes the steps involved in the process of optimizing join queries. To optimize join queries, perform the following steps:

Use the profile provided by ApsaraDB for SelectDB to troubleshoot a query. The profile records various information in the entire query, which is very important for performance optimization.
Understand the join mechanism of ApsaraDB for SelectDB and analyze the reasons for a slow query.
Use session variables to modify some behaviors of join operations to optimize join operations.
Check the query plan to determine whether the optimization takes effect.

The preceding four steps describe the standard optimization process of join operations. If the query speed is not improved after you perform the preceding four steps, you may need to rewrite the SQL statements for data join, or adjust data distribution and recheck whether data is properly distributed. In this case, you need to manually modify all join statements that are used in a query. However, this method requires relatively high costs, and is used for further analysis if the preceding method does not improve the query speed.

Suggestions on optimizing join queries

The following section provides suggestions on how to optimize join queries in ApsaraDB for SelectDB:

When you perform a join query, select columns of the same type to reduce data cast, or select columns of simple types to accelerate the join calculation.
Select the key column to join data. For more information, see Runtime Filter. Late materialization can be effectively implemented by joining data of key columns.
Perform joins among large tables in Colocation mode. This is because joining large tables causes a large amount of network overhead, which increases the costs of data shuffle.
Properly use Runtime Filter. Runtime Filter is suitable for scenarios in which a high filter rate is required in joins. However, it also has some side effects. You must determine whether to use Runtime Filter based on the columns that are involved in SQL statements.
If multiple tables are joined, you must determine the rationality of the join. Make sure that the left table is a large table and the right table is a small table. In this case, the Hash Join operator performs better than the Nest Loop Join operator. You can rewrite SQL statements and use the hint method to adjust the order of joins.