This topic describes how to use hash joins in parallel queries in PolarDB.
The version of the Cluster Edition cluster is ApsaraDB PolarDB MySQL-compatible edition 8.0 and the revision version is 220.127.116.11.0 or later. For more information about how to check the version, see Query the kernel version number.
Use hash joins in parallel queries
PolarDB allows you to execute only the
EXPLAIN FORMAT=TREEstatement to check whether hash joins are used.
In the following example, two tables are created and data is inserted into the tables.
CREATE TABLE t1 (c1 INT, c2 INT); CREATE TABLE t2 (c1 INT, c2 INT); INSERT INTO t1 VALUES (1,1),(2,2),(3,3),(5,5); INSERT INTO t2 VALUES (1,1),(2,2),(3,3),(5,5);
The following example shows the execution plan for these two tables when hash joins are used:
EXPLAIN FORMAT=TREE EXPLAIN | -> Gather: 4 worker(s) (cost=10.60 rows=4) -> Inner hash join (t2.c1 = t1.c1) (cost=0.81 rows=1) -> Table scan on t2 (cost=0.09 rows=4) -> Hash -> Table scan on t1 with 0 parallel partitions (cost=0.16 rows=1)
In the preceding execution plan, the degree of parallelism (DOP) of 4 is used. This indicates that PolarDB uses four workers to perform parallel queries. Four workers scan table t1 in parallel. Each worker uses only part of the data in table t1 to create a hash table. Then, a JOIN operation is performed between the four hash tables and table t2. A leader gathers the joined results to generate the final query results.