How to Implement Join Operation on Distributed Databases

By Qimu

For the most part, JOIN operations are applied to relational databases through Nested-Loop Join, Hash Join, and Sort Merge Join. In online transaction processing (OLTP) scenarios, the most commonly used approach is Index Nested-Loop Join based on the index point queries, thus enabling Join to return result in a very short period of time. This is also how most developers feel about Join.

As a distributed database, PolarDB-X not only is syntactically compatible with MySQL, but also aims to maintain a consistent experience with standalone databases. Under distributed scenarios, both two tables of Join can be distributed. Therefore, the corresponding data must be obtained through multiple network requests. The question is, how can we achieve this efficiently?

Join Implementation in MySQL

Let's first see how Join is done on a standalone database. MySQL supports only a few Join algorithms:

Nested-Loop Join (NL Join)
Batched Key Access Join (BKA Join)
Block Nested-Loop Join (Version < 8.0.20)
Hash Join (Version > = 8.0.18)

If join key columns in any of the tables on both sides of the Join operation have indexes, MySQL usually uses BKA Join or NL Join based on indexes, which are applicable in most cases. If there are no indexes available on both sides of the Join, MySQL has to choose Block Nested-Loop Join or Hash Join as the next priority depending on MySQL versions. Today, we mainly focus on NL Join and BKA Join.

Nested-Loop Join is the simplest form of the Join operation and can be seen as a two-level For loop. For each row in the foreign tables or inner tables with cycle check (both known as the driving table), they are output as Join results if the Join conditions are met. If the join key has an index available in the inner table, then its loop can be greatly simplified by querying the index to get the rows that can be joined, without traversing the entire table. This indexed NL Join is also called Index Nested-Loop Join.

# Nested-Loop Join
for outer_row in outer_table:
   for inner_row in inner_table:
      if join_condition is True:
         output (outer_row, inner_row)

# Index Nested-Loop Join
for outer_row in outer_table:
   for inner_row in inner_index.lookup(outer_join_key):
      if join_condition is True:
         output (outer_row, inner_row)

In the following example, the orders table is joined with the customer table by the primary key c_custkey. Then, MySQL uses the Index NL Join to complete the Join operation.

/* Query 1 */
SELECT o_orderkey, o_custkey, c_name
FROM orders JOIN customer ON o_custkey = c_custkey
WHERE o_orderkey BETWEEN 1001 AND 1005

BKA Join is considered as a performance-optimized Index Nested-Loop Join. The reason why it is called Batched is that its implementation uses the Multi-Range Read (MRR) interface provided by the storage engine to perform index queries in batches. Moreover, it converts random indexed retrieval to sequential retrieval queries by primary key sorting, accelerating the disk I/O of index query to a certain extent.

In the following example, the join key hits a secondary index, and columns of the SELECT statement include columns that are not included in the secondary index. Therefore, indexing and retrieval need to be performed to obtain complete Join results.

/* Query 2 */
SELECT c_name, c_custkey, o_orderkey, o_totalprice
FROM customer JOIN orders ON c_cutkey = o_custkey
WHERE c_custkey BETWEEN 13 AND 15

Generally, the data volume driven by Join in OLTP queries is not large, and Join often has matching indexes. If this is the case, the costs of NL Join and BKA Join are linearly related to the data volume on the driving side, thus results can be quickly computed.

Lookup Join in PolarDB-X

The architecture of PolarDB-X is quite different from that of MySQL, which can be divided into the SQL layer and the storage layer. The compute node in the SQL layer needs to compute the shard where the data is stored, and then pull the required data from multiple data nodes.

For Join queries, if the join key and the shard key are the same, they can be pushed down to the DN layer for execution. Otherwise, the Join operation will be performed on the CN node. PolarDB-X supports a variety of Join algorithms, including Lookup Join, Nested-Loop Join, Hash Join, Sort-Merge Join, and other execution methods. The most commonly used one in OLTP queries is the Lookup Join, which is similar to BKA Join in MySQL. This article mainly introduces Lookup Join. Other Joins such as Hash Join and Nested-Loop Join will be described in future articles.

In addition to the functional requirements of Join, the following two performance requirements should be also considered in the design of the Lookup Join statement of PolarDB-X:

Batch. In a distributed database, each CN-to-DN query goes through a network remote procedure call (RPC), with a latency extremely greater than the local call of MySQL. Therefore, batch processing is more important.
Concurrency. As the data may be distributed on multiple DN nodes, it will introduce a lot of unnecessary waiting if it is traversed in sequence. The best practice is to query all DN nodes concurrently. By doing so, each batch of data only needs one network round-trip.

The Lookup Join execution process is as follows without indexed retrieval

Pull a batch of data from the driver. Generally, the data volume is small. If the data volume is large, the size of each batch is limited by the number of shards at the lookup side and whether sharding is available. The batch size directly affects query performance. If the batch size is very small, the number of RPCs increases. However, if the batch size is too large, the amount of data temporarily stored in the memory expands, and an out of memory (OOM) error may occur at high concurrency. By default, 50 values on average and 300 values at most are queried for each shard.
Compute the shard corresponding to each row of data in a batch. The lookup side is a shard table. Data in each row of the driving table needs to be stored in different shards. Only shards that contain data need to participate in the Join operation. If no values are routed to a shard, the shard does not need to perform lookup join.
Concurrently request for all shards that require lookup join, and create a hash table with the join key as the key and cache the table in the memory.
Similar to Hash Join, the Hash table is used to find the row to Join for each row on the driving side. Depending on the Join types, 0 row, 1 row, or multiple rows may be found.

/* Query 1 */
SELECT o_orderkey, o_custkey, c_name
FROM orders JOIN customer ON o_cutkey = c_custkey
WHERE o_orderkey BETWEEN 1001 AND 1005

Here are some interesting details about this process. For example, when more than one column (such as X = A AND Y = B) is found to perform lookup join, how to deal with them? For this purpose, a multi-column IN condition can be formed based on row-expression. What if NULL appears under the multi-column IN condition? What are the ways to process Anti-Join? These will not be detailed here. You are welcome to exchange comments.

For the vast majority of TP queries, Lookup Join only need to be performed once, minimizing the latency.

Global Indexes and Lookup Join

In addition, PolarDB-X also supports global indexes. Users can create global indexes for shard tables to accelerate queries of index keys. Similar to a local index, if a query contains columns that are not covered by the index, a global index also needs to be retrieved from the table. This is exactly the same as that in the Lookup Join method described in the previous section. Indexing and retrieval can be considered as a special 1:1 JOIN operation.

The Join dependent on global indexes is more complex. Recalling BKA Join in MySQL, two times of lookup joins are required:

Query a global index table by using the join key for the first time for the Join operation.
Query primary table with the primary key in the global index for the second time for indexing and retrieval.
Create a hash table with the primary key as the key based on the retrieval results, and join with the results of the second time, thus obtaining overall joined data.
Create a hash table of complete joined data the join key as the key, and join it with the data of the first time to get the final Join result.

/* Query 2 */
SELECT c_name, c_custkey, o_orderkey, o_totalprice
FROM customer JOIN orders ON c_cutkey = o_custkey
WHERE c_custkey BETWEEN 13 AND 15

Most OLTP data queries can be completed in a single batch, with a total latency of three rounds-trip operations. It is not difficult to prove that this is the best in distributed cases.

In addition, PolarDB-X also allows users to manually add more columns to the coverage column of the global index, sacrificing some write performance for better read performance. If all columns are overwritten, there is no need to retrieve the table. Instead, just two round-trip operations are required. PolarDB-X further encourages users to push down Join through a well-designed shard key as much as possible.

Community

How to Implement Join Operation on Distributed Databases

Join Implementation in MySQL

Lookup Join in PolarDB-X

Global Indexes and Lookup Join

References

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

PolarDB for MySQL

PolarDB for PostgreSQL

PolarDB for Xscale

Database for FinTech Solution