Use Runtime Filter in Hologres - Hologres - Alibaba Cloud Documentation Center

Hologres V2.0 and later support the Runtime Filter feature. This feature automatically optimizes the filter operation during multi-table join queries and helps improve the performance of join queries. This topic describes how to use the Runtime Filter feature in Hologres.

Background information

Scenarios

Hologres V2.0 and later support the Runtime Filter feature. This feature is suitable for scenarios where two or more tables are joined for queries, especially for scenarios where large tables are joined with small tables. After the Runtime Filter feature is enabled, the optimizer and execution engine automatically optimize the filter operation during queries, without the need of manual configurations. After the optimization, the I/O overhead is reduced and the performance of join queries is improved.

How it works

Before you understand the principles of Runtime Filter, you must understand the join process. In this example, the following SQL statement is executed to join two tables:

select * from test1 join test2 on test1.x = test2.x;

The following execution plan is used.

image..png

In the preceding execution plan, a hash table is built based on the test2 table, and data in the hash table is used to match data in the test1 table. Then, a result is returned. This join process involves the following concepts:

Build table: During a hash join of tables or subqueries, the table or subquery based on which a hash table is built is called the build table. The build table corresponds to the hash node in the execution plan.
Probe table: The other side of the hash join is the probe table. Data in the probe table is read and matched with data in the hash table that is built based on the build table.

In most cases, if the execution plan is valid, the small table is the build table and the large table is the probe table.

During hash joins, the Runtime Filter feature generates a lightweight filter based on the distribution of data in the build table and sends the filter to the probe table to prune data in the probe table. This helps reduce the amount of data in the probe table that needs to be joined and transmitted over the network. This way, the join performance is improved.

Therefore, the Runtime Filter feature delivers better performance for scenarios in which large and small tables are joined and the amounts of data in the tables vary greatly.

Limits and trigger conditions

Limits

Only Hologres V2.0 and later support the Runtime Filter feature.
In Hologres V2.0, the Runtime Filter feature can be triggered only when tables are joined based on one field. In Hologres V2.1 and later, the Runtime Filter feature allows you to join tables based on multiple fields. If the multiple fields meet the trigger conditions, Runtime Filter is triggered.

Trigger conditions

Hologres supports high-performance joins. The Runtime Filter feature is automatically triggered at the underlying layer if the following conditions are met for an SQL query:

The probe table contains 100,000 or more rows of data.
The amount of data to be scanned in the build table divided by the amount of data to be scanned in the probe table is less than or equal to 0.1. A smaller value makes it easier to trigger Runtime Filter.
The amount of data to be joined in the build table divided by the amount of data to be joined in the probe table is less than or equal to 0.1. A small value makes it easier to trigger Runtime Filter.

Examples

In the following examples, Runtime Filter is triggered to deliver better query performance.

Example 1: Join tables based on values of a single column.
```
begin; 
create table test1(x int, y int);
call set_table_property('test1', 'distribution_key', 'x');

create table test2(x int, y int);
call set_table_property('test2', 'distribution_key', 'x');
end;

insert into test1 select t, t from generate_series(1, 100000) t;
insert into test2 select t, t from generate_series(1, 1000) t;
analyze test1;
analyze test2;

explain select * from test1 join test2 on test1.x = test2.x;
```
The following execution plan is used.
The preceding execution plan shows that the test2 table has only 1,000 rows of data and the test1 table has 100,000 rows of data. The test2 table is the build table, and the test1 table is the probe table. The ratio of the to-be-scanned data amounts between the build table and the probe table is 0.01, which is less than 0.1. The ratio of the to-be-joined data amounts between the build table and the probe table is 0.01, which is less than 0.1. The default conditions for triggering Runtime Filter are met, and the Hologres engine automatically uses the Runtime Filter feature. The probe table (test1 table) contains the Runtime Filter Target Expr node, which indicates that the Runtime Filter feature is used on the probe table.

Example 2: Join tables based on values of multiple columns. This feature is supported in Hologres V2.1 and later.

create table t1(id1 int not null, id2 int not null);
create table t2(id1 int not null, id2 int not null);

-- Create two tables named t1 and t2. The t1 table contains less data than the t2 table. The t1 table is the build table and is broadcast.
insert into t1 select t, t+100 from generate_series(1, 10000) t;
insert into t2 select t/10, t/10+100 from generate_series(1, 10000000) t;

-- Update the execution plan.
analyze t1, t2;

-- Verify the Runtime Filter feature when tables are joined based on the values of multiple columns.
explain select * from t1 join t2 on t1.id1=t2.id1 and t1.id2=t2.id2;

The following execution plan is used.