Join operations are common in distributed systems. A join operation is both time- and resource-consuming because data is shuffled during the join process. This is the case especially when a large amount of data is involved. MaxCompute allows you to optimize shuffle operations by using the equi-join attribute of join operations.
Optimization principle
select * from (table1) A join (table2) B on A.a= B.b;In this example, MaxCompute can generate a filter based on data in Table A by using the equi-join attribute of the join operation. Then, MaxCompute can filter data in Table B before the shuffle or join operation starts. MaxCompute can even apply the filter conditions to the underlying storage to filter data at the source. This feature is called dynamic filtering.
The following figure shows the execution plans of the preceding SQL statement before and after dynamic filtering is enabled.

Scenario
Dynamic filtering is implemented based on the equi-join attribute of join operations. MaxCompute uses this feature to dynamically generate a filter during the running process of a job and filter data before a shuffle or join operation starts. This accelerates queries. The feature is suitable for the join between a dimension table and a fact table.
Dynamic bloom filter and range filter
As shown in the preceding figure, the original execution plan of a job does not contain a filter. After you enable dynamic filtering, MaxCompute automatically generates a filter based on the equi-join attribute of the join operation. The filter is used to check whether elements in Table B exist in datasets generated by Table A and filter out those that do not exist in the datasets.
A dynamic bloom filter can help you effectively reduce the time and resources consumed by a join operation. MaxCompute also supports dynamic range filters. A dynamic range filter uses the min and max metrics to filter data.

- Dynamic filter producer (DFP) operator: the producer of dynamic filters. It generates
a dynamic bloom filter based on data in the smaller table of a join operation. It
also obtains the values of the
minandmaxmetrics of the join key to generate a dynamic range filter. Then, the producer sends the filters to the consumer. - Dynamic filter consumer (DFC) operator: the consumer of dynamic filters. It uses the received dynamic bloom filter and range filter to filter data in the larger table of the join operation. The dynamic range filter tries to apply the filter conditions to the underlying storage. If this operation succeeds, it filters data at the source.
- A JOIN B: Either Table A or Table B can serve as the producer or consumer.
- A LEFT JOIN B: Table A can serve only as the producer, and Table B can serve only as the consumer.
- A RIGHT JOIN B: Table A can serve only as the consumer, and Table B can serve only as the producer.
- A FULL OUTER JOIN B: Dynamic filtering is not supported.
For information about how to enable dynamic filtering, see Enable dynamic filtering.
Dynamic partition pruning
The preceding example shows the optimization of join operations between non-partitioned tables. The join key in the example is not a partition key column. If the join key is a partition key column of a partitioned table, a dynamic bloom filter or range filter can still be used. The difference is that, before data in the partitioned table is filtered, MaxCompute reads data in all the partitions of the table. The reading process is time-consuming. In this case, you can use the dynamic partition pruning feature to prune the partitions that do not need to be read before MaxCompute starts to read data.
-- Table A is a non-partitioned table, and the value of column a in the table is 20200701.
-- Table B is a partitioned table. Its partition key column is ds, and this column contains three values (partition names): 20200701, 20200702, and 20200703.
select * from (table1) A join (table2) B on A.a= B.ds;After you enable dynamic partition pruning, the optimizer of MaxCompute checks whether the larger table of a join operation is a partitioned table. If it is a partitioned table, MaxCompute generates a dynamic bloom filter based on data in the smaller table. Then, MaxCompute checks the partition list, aggregates the partitions that need to be read, and prunes the partitions that do not need to be read. If all partitions to be read in a running process are pruned, the process is not scheduled.
In this example, the 20200702 and 20200703 partitions of Table B are pruned because column a of Table A has only one value, 20200701. This helps save resources and reduce the running duration of the job.
For information about how to enable dynamic partition pruning, see Enable dynamic partition pruning.
Enable dynamic filtering
- Method 1: Run the following command to forcibly enable dynamic filtering at the session
level. The command must be committed together with an SQL statement.
set odps.optimizer.force.dynamic.filter=true;Note Although you can use this method at the project level, we recommend that you use it at the session level. If you use it at the project level, dynamic filtering is enabled for all join jobs of a project. As a result, if data cannot be filtered in a join operation, the join efficiency is low.If this method is used, dynamic filters are generated for all join operations for which the feature is enabled.
- Method 2: Run the following command to optionally enable dynamic filtering at the
session level:
set odps.optimizer.enable.dynamic.filter=true;If this method is used, the optimizer estimates whether dynamic filtering can produce benefits in terms of resource or time consumption. If it can produce benefits, MaxCompute generates a dynamic filter. Otherwise, it does not generate a dynamic filter.
Note This method depends on metadata statistics, such as the ndv metric. For more information about metadata statistics, see Collect information for the optimizer of MaxCompute. Metadata statistics are the estimation results of the optimizer and may be inaccurate. Therefore, sometimes a dynamic filter may fail to be generated as expected. - Method 3: Use a hint in an SQL statement to enable dynamic filtering.
The hint must be in the
/*+dynamicfilter(Producer, Consumer1[, Consumer2,...])*/format. Each producer can generate dynamic filters for multiple consumers. Example:select /*+dynamicfilter(A, B)*/ * from (table1) A join (table2) B on A.a= B.b;
Enable dynamic partition pruning
Run the following command to enable dynamic partition pruning at the session level. The command must be committed together with an SQL statement.
set odps.optimizer.dynamic.filter.dpp.enable=true;Verify the dynamic filtering result
Make sure that you have enabled dynamic filtering and dynamic partition pruning by using the methods described in Enable dynamic filtering and Enable dynamic partition pruning. After you run a join job, view the Logview information of the job. If an operator similar to DynamicFilterConsumer1 appears on the Logview page, a dynamic filter has been generated and used to filter data.
