All Products
Search
Document Center

PolarDB:Use Hash Match to perform IMCI-enabled operations

Last Updated:Mar 28, 2026

The In-Memory Column Index (IMCI) executor in PolarDB represents intermediate results as line numbers by default. For large queries whose data exceeds available memory, this approach causes excessive random and repeated I/O, degrading execution efficiency. To address this, the IMCI executor introduces a set of operators that materialize intermediate results. This topic describes the Hash Match operator—the materialized counterpart of Hash Join—and explains how it implements intermediate result materialization.

Execution plan

The Hash Match operator runs in two phases: the build phase and the probe phase.

  • Build phase: Join predicates serve as keys to hash each row of the left table into a hash table.

  • Probe phase: The right table is scanned, rows are matched against the hash table using the same join predicates, and the result set is emitted based on the join type.

To avoid building one oversized hash table that causes severe conflicts or requires continuous resizing, the left table is partitioned by specific rules. Each partition gets its own independent hash table. During the probe phase, the system looks up only the hash table in the partition where each right-table row belongs.

Build phase

The build feature is implemented in doOpen, which runs two sub-phases sequentially: doBuild and doMerge. Each sub-phase uses a thread group for concurrent processing.

doBuild

Each worker thread builds independent hash tables for each partition using the left table data, producing a worker x partition matrix of hash maps:

Worker\PartitionPartition0Partition1...PartitionN
Worker0HashMap00HashMap01...HashMap0N
Worker1HashMap10HashMap11...HashMap1N
...............
WorkerMHashMapM0HashMapM1...HashMapMN

Each hash table entry is a UInt64 value that encodes the position of the chunk corresponding to that key. The UInt64 is split into three bit-level fields:

  • UInt16: the worker thread that owns the chunk

  • UInt16: the chunk offset

  • UInt32: the array index within the chunk

Worker threads fetch tuples from the left table in parallel and insert them into the hash table for their assigned worker and partition—without locking. This continues until all left-table data is processed.

doMerge

After doBuild, each partition has one hash table per worker thread. The probe phase requires a single hash table per partition, so doMerge consolidates all per-worker hash tables in a partition into one:

Build\PartitionPartition0Partition1...PartitionN
MergeHashMap0HashMap1...HashMapN

Each partition is merged by a single worker thread independently, eliminating lock synchronization. Because the number of partitions exceeds the number of worker threads, the workload is evenly distributed across workers.

Spill to disk in the build phase

Each worker thread receives an equal share of the total memory quota. When available memory cannot hold all hash tables and chunks, the operator spills entire partitions to disk, starting from the highest-numbered partition, until the remaining in-memory partitions can be processed normally. If no partition fits in memory, an OutOfMemoryError is thrown.

During doBuild, if a worker thread cannot add a key-value pair to a hash table or store a chunk due to insufficient memory, it spills the data for its highest-numbered partition to disk:

  1. Chunks are written to a temporary file and their memory is released.

  2. The hash table for that partition is deleted.

  3. When the partition is needed later, the chunks are read from the temporary file and the hash table is rebuilt.

The highest spilled partition number is visible to all worker threads. If a worker's highest spilled partition exceeds the current global maximum, it updates the global value and releases memory accordingly. For partitions above the current highest in-memory number, worker threads skip building hash tables and instead write data directly to chunks, spilling to disk when a chunk is full.

Probe phase

The probe phase reads the right table and matches rows against the hash tables built in the build phase, following the same partition rules. It is driven by memory fetch operations from parent nodes, with a thread group handling concurrent processing.

doFetch

Worker threads fetch right-table data and look up matching tuples in the hash table for each row's partition. Matched tuples are processed according to the join type. This repeats until all worker threads have consumed the right table.

Spill to disk in the probe phase

When the build phase spills some partitions to disk, the probe phase handles both in-memory and on-disk partitions separately.

During doFetch, after a worker thread fetches a tuple from the right table:

  • If the tuple's partition is in memory, the worker searches the hash table directly.

  • If the partition is on disk, the tuple is stored in that partition's chunk. When the chunk is full, it is flushed to disk and the chunk memory is released.

After all workers finish reading the right table, in-memory partitions are processed and their memory is freed. On-disk partitions are then processed one at a time, each handled independently by a single worker thread (no locking required). Because the number of partitions exceeds the number of worker threads, workload remains balanced.

Processing each on-disk partition follows the same build-then-probe pattern:

  1. Build: Worker threads read left-table partition data from temporary files, deserialize chunks, and build the hash table.

  2. Probe: Worker threads read right-table partition data from temporary files, deserialize chunks, and search for matches.

After all partitions are processed, the Hash Match operator completes. Although in-memory and on-disk partition processing is described separately here, both paths share a single unified code implementation.

Probe process steps

All probe operations go through three steps, each executed by the probe function:

  1. ProbeMem: Read right-table data and route each tuple to either in-memory processing (hash table lookup) or disk storage (written to a temporary file for later processing in ProbeDisk).

  2. ProbeLeft: Handle LEFT JOIN variants (LEFT OUTER JOIN, LEFT SEMI JOIN, LEFT ANTI SEMI JOIN). Traverse all key-value pairs in the hash table and filter matched or unmatched tuples.

  3. ProbeDisk: Process on-disk partitions one at a time by loading chunks from temporary files and running the probe function. For LEFT JOIN operations, the JOIN logic is also applied to each disk partition.

Join logic

The Hash Match operator supports the following join types:

Join typeOutputPostFilter supported
INNER JOINMatched left + right tuplesNo
LEFT OUTER JOINMatched left + right tuples; unmatched left tuples with NULL for rightYes
RIGHT OUTER JOINMatched left + right tuples; unmatched right tuples with NULL for leftYes
LEFT SEMI JOINLeft tuples (with NULL/TRUE/FALSE) based on truth tableYes
LEFT ANTI SEMI JOINLeft tuples for non-matching rows based on truth tableYes
RIGHT SEMI JOINRight tuples (with NULL/TRUE/FALSE) based on truth tableYes
RIGHT ANTI SEMI JOINRight tuples for non-matching rows based on truth tableYes

All join types share the same build phase structure, with minor differences in NULL value handling. The probe phase logic differs per join type.

Inner

When a right-table tuple is non-NULL and matches the hash table built from the left table, both left and right tuples are emitted.

LeftOuter

When a right-table tuple is non-NULL and matches, both left and right tuples are emitted. Unmatched left-table tuples are emitted with NULL in the right-table positions. If a PostFilter is present, matched hash table entries are verified by PostFilter before output.

RightOuter

When a right-table tuple is non-NULL and matches, both left and right tuples are emitted. Unmatched right-table tuples are emitted with NULL in the left-table positions. If a PostFilter is present, matched hash table entries are verified by PostFilter before output.

LeftSemi

LEFT SEMI JOIN follows a similar structure to LEFT OUTER JOIN, but only left-table tuples are emitted—not the right-table tuples. The output depends on match results and whether semi_probe_ is set, according to this truth table:

//+------------------------------+--------------+----------------+
//|           matched            | semi_probe_  | ! semi_probe_  |
//+------------------------------+--------------+----------------+
//| normal true                  | (left, TRUE) |   (left, ONLY) |
//+------------------------------+--------------+----------------+
//+------------------------------+--------------+----------------+
//|        ! matched             | semi_probe_  | ! semi_probe_  |
//+------------------------------+--------------+----------------+
//|NULL v.s. (empty set)         |              |                |
//|e.g., NULL IN (empty set)     | (left, FALSE)|  NO_OUTPUT     |
//+------------------------------+--------------+----------------+
//|NULL v.s. (set)               |              |                |
//|e.g., NULL IN (1, 2, 3)       | (left, NULL) |  NO_OUTPUT     |
//+------------------------------+--------------+----------------+
//|left_row v.s. (set with NULL) |              |                |
//|e.g., 10 IN (1, NULL, 3)      | (left, NULL) |  NO_OUTPUT     |
//+------------------------------+--------------+----------------+
//|normal false                  |              |                |
//|e.g., 10 IN (1, 2, 3)         | (left, FALSE)|  NO_OUTPUT     |
//+------------------------------+--------------+----------------+

If a PostFilter is present, matched hash table entries are verified by PostFilter.

LeftAntiSemi

LEFT ANTI SEMI JOIN is similar to LEFT OUTER JOIN. Only left-table tuples are emitted, based on the following truth table:

//+------------------------------+----------------+
//|        ! matched             | ! semi_probe_  |
//+------------------------------+----------------+
//|NULL v.s. (empty set)         |                |
//|e.g., NULL NOT IN (empty set) |   (left, ONLY) |
//+------------------------------+----------------+
//|NULL v.s. (set)               |                |
//|e.g., NULL NOT IN (1, 2, 3)   |   (left, ONLY) |
//+------------------------------+----------------+
//|left_row v.s. (set with NULL) |                |
//|e.g., 10 NOT IN (1, NULL, 3)  |   (left, ONLY) |
//+------------------------------+----------------+
//|normal false                  |                |
//|e.g., 10 NOT IN (1, 2, 3)     |   (left, ONLY) |
//+------------------------------+----------------+

If a PostFilter is present, matched hash table entries are verified by PostFilter.

RightSemi

RIGHT SEMI JOIN is similar to RIGHT OUTER JOIN. Only right-table tuples are emitted, based on this truth table:

//+------------------------------+--------------+----------------+
//|           matched            | semi_probe_  | ! semi_probe_  |
//+------------------------------+--------------+----------------+
//| normal true                  | (right, TRUE)|  (right, ONLY) |
//+------------------------------+--------------+----------------+
//+------------------------------+--------------+----------------+
//|        ! matched             | semi_probe_  | ! semi_probe_  |
//+------------------------------+--------------+----------------+
//|NULL v.s. (empty set)         |              |                |
//|e.g., NULL IN (empty set)     |(right, FALSE)|  NO_OUTPUT     |
//+------------------------------+--------------+----------------+
//|NULL v.s. (set)               |              |                |
//|e.g., NULL IN (1, 2, 3)       |(right, NULL) |  NO_OUTPUT     |
//+------------------------------+--------------+----------------+
//|left_row v.s. (set with NULL) |              |                |
//|e.g., 10 IN (1, NULL, 3)      |(right, NULL) |  NO_OUTPUT     |
//+------------------------------+--------------+----------------+
//|normal false                  |              |                |
//|e.g., 10 IN (1, 2, 3)         |(right, FALSE)|  NO_OUTPUT     |
//+------------------------------+--------------+----------------+

If a PostFilter is present, matched hash table entries are verified by PostFilter.

RightAntiSemi

RIGHT SEMI JOIN is similar to RIGHT OUTER JOIN. Only right-table tuples are emitted, based on the following truth table:

//+------------------------------+----------------+
//|        ! matched             | ! semi_probe_  |
//+------------------------------+----------------+
//|NULL v.s. (empty set)         |                |
//|e.g., NULL NOT IN (empty set) |  (right, ONLY) |
//+------------------------------+----------------+
//|NULL v.s. (set)               |                |
//|e.g., NULL NOT IN (1, 2, 3)   |  (right, ONLY) |
//+------------------------------+----------------+
//|left_row v.s. (set with NULL) |                |
//|e.g., 10 NOT IN (1, NULL, 3)  |  (right, ONLY) |
//+------------------------------+----------------+
//|normal false                  |                |
//|e.g., 10 NOT IN (1, 2, 3)     |  (right, ONLY) |
//+------------------------------+----------------+

If a PostFilter is present, matched hash table entries are verified by PostFilter before output.

Implementation of the Hash Match operator

The Hash Match operator unifies memory partition processing, disk partition processing, multiple join types, and the PostFilter feature into a single implementation.

HashMap

The HashMap exposes two core operations:

size_t PutValue(uint64_t hash_code, const char *key_buf, uint64_t key_len, const uint64_t tuple);
ValueIterator FindValue(uint64_t hash_code, const char *key_data, const uint64_t key_len, const bool need_mark = false);

Two iterators traverse the hash table:

enum IteratorType { Normal = 0, NoneMark = 1, Mark = 2, END };

class TableIterator {
 public:
  void Next();
  bool IsValid() const { return valid_; }
  ValueIterator GetIterator(IteratorType type);

 private:
  IteratorType type_ = IteratorType::END;
};

class ValueIterator {
  struct Listener {
    virtual void BlockEvent() {}
  };
  void SetListener(Listener *listener) { listener_ = listener; }

  void Next();
  bool IsValid() const { return valid_; }

 private:
  IteratorType type_ = IteratorType::Normal;
  Listener *listener_ = nullptr;
};
  • `TableIterator`: traverses all key-value pairs in the hash table; used by LEFT OUTER JOIN, LEFT ANTI SEMI JOIN, and LEFT SEMI JOIN.

  • `ValueIterator`: traverses all data blocks for a given key-value pair.

Both iterators support three iterative models—Normal, NoneMark, and Mark—to handle the different requirements of each join type.

Info

HMInfo holds global state shared by all worker threads, including in-memory partition numbers and partition objects. Each partition stores:

  • The merged hash table for that partition

  • Chunk sets for the left and right tables

  • Temporary files for the left and right tables

One temporary file is created per partition. Workers perform atomic reads and writes using the pread function, the pwrite function, and an offset atomic variable.

Local info

HMLocalInfo holds private state for each worker thread, including its local in-memory partition number and its left and right HMLocalPartition objects. Each HMLocalPartition stores the worker's hash table for that partition, the chunk set, and the chunk currently being written.

Fetcher

HashMatchFetcher abstracts all data-fetch operations used in the build and probe phases:

class HashMatchFetcher final {
  bool Fetch(Context &context, TupleChunk *&mat_chunk);
  // fetch from left or right child
  bool FetchMem(Context &context);
  // fetch from info chunks (include load from temp files)
  bool FetchDisk(Context &context, TupleChunk *&mat_chunk);

  size_t part_index_ = 0;
  TupleChunk chunk_;
};

FetchMem fetches from the left or right child operator. FetchDisk reads and deserializes chunk objects from Info or LocalInfo memory, including loading from temporary files.

Builder

HashMatchBuilder manages the build phase for both memory and disk partitions:

class HashMatchBuilder {
  void Build();
  virtual void ChunkResult(const size_t offset, const bool is_null,
                           const size_t part_index, const uint64_t hash_val,
                           const char *key_data, const size_t key_len) = 0;
  virtual void ChunkDone() = 0;

  HashMatchFetcher fetcher_;
};

class HashMatchMemBuilder final : public HashMatchBuilder {
  void ChunkResult(const size_t offset, const bool is_null,
                   const size_t part_index, const uint64_t hash_val,
                   const char *key_data, const size_t key_len) override;
  void ChunkDone() override;

  TupleChunk origin_chunk_;
};

class HashMatchDiskBuilder final : public HashMatchBuilder {
  void ChunkResult(const size_t offset, const bool is_null,
                   const size_t part_index, const uint64_t hash_val,
                   const char *key_data, const size_t key_len) override;
  void ChunkDone() override;

  const size_t part_index_ = 0;
};
  • `MemBuilder`: fetches left-table data and stores it in the chunk set. For in-memory partitions, writes to the hash table. For disk partitions, spills chunk data to disk.

  • `DiskBuilder`: reads the chunk set from a temporary file and builds the hash table for that partition.

image..png

Prober

HashMatchProber executes ProbeMem, ProbeLeft, and ProbeDisk through a single probe interface:

class HashMatchProber final {
 public:
  void ProbeResult(TupleChunk *tpchunk, size_t &chunk_off, const size_t chunk_size);
  bool ProbeIter(Context &context, TupleChunk *tpchunk, size_t &chunk_off, const size_t chunk_size);
  bool Probe(Context &context, TupleChunk *tpchunk, size_t &chunk_off, const size_t chunk_size, const bool disk);

 private:
  const HashMatch &join_;
  HMInfo *info_ = nullptr;
  HMLocalInfo *local_info_ = nullptr;
  size_t part_index_ = 0;

  PostFilter filter_;

  LeftIterator lit_;
  RightIterator rit_;
  TraverseIterator tit_;  // used for probe left
};

HashMatchProber::PostFilter handles joins that carry a PostFilter condition. Results from the probe phase are passed through PostFilter before being emitted:

struct PostFilter final {
  bool Evaluate();
  bool Probe(TupleChunk *tpchunk, size_t &chunk_off, const size_t chunk_size);

  const HashMatchProber &prober_;
  const RTExprTreePtr &post_expr_;
  const HashMatchExpr &left_expr_;
  const HashMatchExpr &right_expr_;
  std::shared_ptr<Expressions::ExprEnv> post_env_ = nullptr;
};

HashMatchProber uses three iterator types, all derived from a common Iterator base:

struct Iterator {
  virtual void InitExpr() {}
  virtual void FiniExpr() {}
  virtual void Init(const size_t part_index);
  virtual void Fini();
  virtual bool Valid(Context &context) { return false; }
  virtual void Next() = 0;

  HashMatchProber &prober_;
  PostFilter &filter_;
};

`LeftIterator` uses HashMap::ValueIterator to traverse all values for a specific key, then locates the corresponding chunk tuple. This provides all tuples for a given key to the join logic and PostFilter:

// for Probe
struct LeftIterator final : public Iterator, public ValueIterator::Listener {
  void BlockEvent() override;

  bool Valid(Context &context) override;
  void Next() override;
  bool Find(const size_t part_index, const uint64_t hash_val,
            const char *key_data, const uint64_t key_len);

  ValueIterator it_;
};

`RightIterator` continuously fetches chunk sets from the right table or temporary files using HashMatchFetcher, traversing all chunks and tuples. All join types use RightIterator in ProbeMem or ProbeDisk to fetch chunks, then look up the corresponding hash table partition. When a match is found, a LeftIterator is created to traverse all tuples for that key. LEFT JOINs additionally require a ProbeLeft pass:

// for Probe
struct RightIterator : public Iterator {
  bool Valid(Context &context) override;
  void Next() override;

  HashMatchFetcher fetcher_;
  TupleChunk origin_chunk_;
  size_t chunk_size_ = 0;
};

`TraverseIterator` is used for LEFT JOIN variants (LEFT OUTER JOIN, LEFT SEMI JOIN, LEFT ANTI SEMI JOIN). It uses HashMap::TableIterator to scan the entire hash table and filter matched or unmatched keys, then uses LeftIterator to retrieve all tuples for each key. After ProbeMem or ProbeDisk marks matched entries in the hash table, ProbeLeft uses TraverseIterator to filter the appropriate key-value pairs:

// for ProbeLeft
struct TraverseIterator final : public Iterator {
  bool Valid(Context &context) override;
  void Next() override;

  TableIterator tit_;
  LeftIterator lit_;
  IteratorType it_type_ = IteratorType::END;
};
image..png

Performance test

The following test compares Hash Join and Hash Match on TPC-H Q14 with a 1 TB dataset. Both operators use similar algorithms; the key difference is that Hash Match materializes intermediate results while Hash Join does not.

select
    100.00 * sum(case
                when p_type like 'PROMO%'
                then l_extendedprice * (1 - l_discount)
                else 0
                end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
    lineitem,
    part
where
    l_partkey = p_partkey
    and l_shipdate >= date '1995-09-01'
    and l_shipdate < date '1995-09-01' + interval '1' month;

With a Least Recently Used (LRU) cache and 100 GB executor memory:

Query (TPC-H, 1 TB)Hash JoinHash Match
Q1423.96 s12.56 s

With a Least Recently Used (LRU) cache and 32 GB executor memory:

Query (TPC-H, 1 TB)Hash JoinHash Match
Q14> 10 minutes35.73 s

At 100 GB executor memory, Hash Match completes Q14 in roughly half the time of Hash Join. The advantage is more pronounced under memory pressure: at 32 GB, Hash Join exceeds 10 minutes due to excessive I/O from non-materialized intermediate results, while Hash Match—by spilling partitions to disk in an organized manner—finishes in 35.73 seconds.