This topic describes how to use hash joins in parallel queries in PolarDB.

Prerequisites

The version of the Cluster Edition cluster is PolarDB for MySQL 8.0 and the revision version is 8.0.2.1.0 or later. For more information about how to check the version, see Query the kernel version.

Use hash joins in parallel queries

  • Statement

    PolarDB allows you to execute only the EXPLAIN FORMAT=TREE statement to check whether hash joins are used.

  • Examples

    In the following example, two tables are created and data is inserted into the tables.

    CREATE TABLE t1 (c1 INT, c2 INT);
    CREATE TABLE t2 (c1 INT, c2 INT);
    INSERT INTO t1 VALUES (1,1),(2,2),(3,3),(5,5);
    INSERT INTO t2 VALUES (1,1),(2,2),(3,3),(5,5);

    The following example shows the execution plan for these two tables when hash joins are used:

    EXPLAIN FORMAT=TREE
    
     EXPLAIN
     | -> Gather: 4 worker(s)  (cost=10.60 rows=4)
        -> Inner hash join (t2.c1 = t1.c1)  (cost=0.81 rows=1)
            -> Table scan on t2  (cost=0.09 rows=4)
            -> Hash
                -> Table scan on t1 with 0 parallel partitions  (cost=0.16 rows=1)

    In the preceding execution plan, the degree of parallelism (DOP) of 4 is used. This indicates that PolarDB uses four workers to perform parallel queries. Four workers scan table t1 in parallel. Each worker uses only part of the data in table t1 to create a hash table. Then, a JOIN operation is performed between the four hash tables and table t2. A leader gathers the joined results to generate the final query results.