MaxCompute supports two types of SEMI JOIN operations: LEFT SEMI JOIN and LEFT ANTI JOIN. In SEMI JOIN, data in the right table does not appear in the result set and is only used to filter data in the left table. This topic describes how to use LEFT SEMI JOIN and LEFT ANTI JOIN.

Description

MaxCompute supports the following types of SEMI JOIN operations:
  • LEFT SEMI JOIN

    A LEFT SEMI JOIN operation returns rows from the left table that has matching rows in the right table. For example, if one row of data in the left table meets a specified condition and the data appears in the right table, the data is included in the result set.

    In MaxCompute, the usage of LEFT SEMI JOIN is similar to that of IN SUBQUERY. For more information about IN SUBQUERY, see IN SUBQUERY. You can choose one of the two operations.

  • LEFT ANTI JOIN

    A LEFT ANTI JOIN operation returns rows from the left table that does not have matching rows in the right table. For example, if one row of data in the left table meets a specified condition and the data does not appear in the right table, the data is included in the result set.

    In MaxCompute, the usage of LEFT ANTI JOIN is similar to that of NOT IN SUBQUERY. For more information about NOT IN SUBQUERY, see NOT IN SUBQUERY.

SEMI JOIN supports MAPJOIN hints. These hints improve the performance of LEFT SEMI JOIN and LEFT ANTI JOIN. For more information about MAPJOIN hints, see MAPJOIN hints.

Sample data

This topic provides sample source data and sample statements that are related to the data. This helps you understand how to prepare source data. The following statements describe how to create the sale_detail and sale_detail_sj tables and insert data into the tables.
-- Create the sale_detail and sale_detail_sj tables. The two tables are partitioned tables.
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

create table if not exists sale_detail_sj
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

-- Add one partition to each table.
alter table sale_detail add partition (sale_date='2013', region='china');
alter table sale_detail_sj add partition (sale_date='2013', region='china');

-- Insert data into the two tables.
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail_sj partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s5','c2',100.2),('s2','c2',100.2);

Examples

  • Example 1: Query data of the total_price column from the sale_detail table and obtain the data that appears in the total_price column of the sale_detail_sj table. Sample statement:
    select * from sale_detail a left semi join sale_detail_sj b on a.total_price=b.total_price;
    The following result is returned:
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s2         | c2          | 100.2       | 2013       | china      |
    | s1         | c1          | 100.1       | 2013       | china      |
    +------------+-------------+-------------+------------+------------+

    Some data of the total_price column in the sale_detail table appears in the total_price column of the sale_detail_sj table and such data is returned.

  • Example 2: Query data of the total_price column from the sale_detail table and obtain the data that does not appear in the total_price column of the sale_detail_sj table. Sample statement:
    select * from sale_detail a left anti join sale_detail_sj b on a.total_price=b.total_price;
    The following result is returned:
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | null       | c5          | NULL        | 2014       | shanghai   |
    | s3         | c3          | 100.3       | 2013       | china      |
    | s6         | c6          | 100.4       | 2014       | shanghai   |
    | s7         | c7          | 100.5       | 2014       | shanghai   |
    +------------+-------------+-------------+------------+------------+

    Some data of the total_price column in the sale_detail table does not appear in the total_price column of the sale_detail_sj table and such data is returned.