All Products
Search
Document Center

MaxCompute:SEMI JOIN

Last Updated:Jul 19, 2023

MaxCompute supports two types of SEMI JOIN operations: LEFT SEMI JOIN and LEFT ANTI JOIN. In a SEMI JOIN operation, data in the right table does not appear in the result set and is only used to filter data in the left table. This can improve query performance. This topic describes how to use LEFT SEMI JOIN and LEFT ANTI JOIN.

Description

MaxCompute supports the following types of SEMI JOIN operations:

  • LEFT SEMI JOIN

    A LEFT SEMI JOIN operation returns rows from the left table that has matching rows in the right table. For example, if one row of data in the left table meets a specified condition and the data appears in the right table, the data is included in the result set.

    In MaxCompute, the usage of LEFT SEMI JOIN is similar to that of IN SUBQUERY. For more information about IN SUBQUERY, see IN SUBQUERY. You can choose one of the two operations.

  • LEFT ANTI JOIN

    A LEFT ANTI JOIN operation returns rows from the left table that does not have matching rows in the right table. For example, if one row of data in the left table meets a specified condition and the data does not appear in the right table, the data is included in the result set.

    In MaxCompute, the usage of LEFT ANTI JOIN is similar to that of NOT IN SUBQUERY. For more information about NOT IN SUBQUERY, see NOT IN SUBQUERY.

SEMI JOIN supports MAPJOIN hints. These hints improve the performance of LEFT SEMI JOIN and LEFT ANTI JOIN. For more information about MAPJOIN hints, see MAPJOIN hints.

Sample data

Sample source data is provided for you to better understand the examples in this topic. The following statements describe how to create the sale_detail and sale_detail_sj tables and insert data into the tables.

-- Create a partitioned table named sale_detail. 
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

create table if not exists sale_detail_sj
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

-- Add partitions to the sale_detail table. 
alter table sale_detail add partition (sale_date='2013', region='china');
alter table sale_detail_sj add partition (sale_date='2013', region='china');

-- Insert data into the sale_detail table. 
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail_sj partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s5','c2',100.2),('s2','c2',100.2);

Query data in the sale_detail and sale_detail_sj table,Sample statement:

set odps.sql.allow.fullscan=true;
select * from sale_detail;
-- The following result is returned: 
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+
select * from sale_detail_sj;
-- The following result is returned: 
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s5         | c2          | 100.2       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
+------------+-------------+-------------+------------+------------+
                

Examples

  • Example 1: Query data of the total_price column from the sale_detail table and obtain the data that appears in the total_price column of the sale_detail_sj table. Sample statement:

    select * from sale_detail a left semi join sale_detail_sj b on a.total_price=b.total_price;

    The following result is returned:

    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s2         | c2          | 100.2       | 2013       | china      |
    | s1         | c1          | 100.1       | 2013       | china      |
    +------------+-------------+-------------+------------+------------+

    Some data of the total_price column in the sale_detail table appears in the total_price column of the sale_detail_sj table and such data is returned.

  • Example 2: Query data of the total_price column from the sale_detail table and obtain the data that does not appear in the total_price column of the sale_detail_sj table. Sample statement:

    select * from sale_detail a left anti join sale_detail_sj b on a.total_price=b.total_price;

    The following result is returned:

    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s3         | c3          | 100.3       | 2013       | china      |
    +------------+-------------+-------------+------------+------------+

    Some data of the total_price column in the sale_detail table does not appear in the total_price column of the sale_detail_sj table and such data is returned.