This topic describes how to use hash joins for parallel queries in PolarDB.
The version of the Cluster cluster is PolarDB for MySQL 8.0 and the revision version is 188.8.131.52.0 or later. For more information about how to check the version, see Query the kernel version number.
Use hash joins in parallel queries
PolarDB allows you to execute only the
EXPLAIN FORMAT=TREEstatement to check whether hash joins are used.
In the following example, two tables are created and data is inserted into the tables.
CREATE TABLE t1 (c1 INT, c2 INT); CREATE TABLE t2 (c1 INT, c2 INT); INSERT INTO t1 VALUES (1,1),(2,2),(3,3),(5,5); INSERT INTO t2 VALUES (1,1),(2,2),(3,3),(5,5);
The following example shows the execution plan for these two tables when hash joins are used:
EXPLAIN FORMAT=TREE EXPLAIN -> Gather (slice: 1; workers: 4) (cost=10.82 rows=4) -> Parallel inner hash join (t2.c2 = t1.c1) (cost=0.57 rows=1) -> Parallel table scan on t2, with parallel partitions: 1 (cost=0.03 rows=1) -> Parallel hash -> Parallel table scan on t1, with parallel partitions: 1 (cost=0.16 rows=1)
In the preceding execution plan, the degree of parallelism (DOP) is set to 4. This indicates that PolarDB uses four workers to perform parallel queries. Four workers scan table t1 in parallel. Each worker uses only part of the data in table t1 to create a hash table. Then, a JOIN operation is performed between the four hash tables and table t2. A leader gathers the joined results to generate the final query results.