You can use a JOIN operation to join two tables and obtain query results from the joined tables. In MaxCompute, JOIN operations include LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, and INNER JOIN. MaxCompute supports multiple JOIN operations in an SQL statement, but does not support CROSS JOIN. A CROSS JOIN operation joins two tables without specifying conditions in the ON clause and returns the cartesian product of the input tables.

The command syntax is as follows:
join_table:
        table_reference join table_factor [join_condition]
        | table_reference {left outer|right outer|full outer|inner} join table_reference join_condition
    table_reference:
        table_factor
        | join_table
    table_factor:
        tbl_name [alias]
        | table_subquery alias
        | ( table_references )
    join_condition:
        on equality_expression ( and equality_expression )
Note
  • equality_expression indicates an equality expression.
  • If partition pruning conditions are specified in the WHERE clause, partition pruning takes effect for both tables. If partition pruning conditions are specified in the ON clause, partition pruning takes effect for the right table but not the left table. A full table scan is run for the left table. For more information, see Reasonableness evaluation of partition pruning.
  • LEFT OUTER JOIN: the LEFT JOIN operation. It returns all queried rows from the left table, including those that do not match any rows in the right table.
    select a.shop_name as ashop, b.shop_name as bshop from shop a
            left outer join sale_detail b on a.shop_name=b.shop_name;
        -- Both the shop and sale_detail tables have the shop_name column. Therefore, aliases are assigned to the shop_name columns in the SELECT clause to distinguish between the columns.
    Note If the values in some rows of the right table are duplicate, we recommend that you do not use LEFT JOIN consecutively many times. Otherwise, data bloat may occur during the JOIN operation and interrupt your jobs.
  • RIGHT OUTER JOIN: the RIGHT JOIN operation. It returns all queried rows from the right table, including those that do not match any rows in the left table.
    select a.shop_name as ashop, b.shop_name as bshop from shop a
            right outer join sale_detail b on a.shop_name=b.shop_name;
  • FULL OUTER JOIN: the FULL JOIN operation. It returns all queried rows from both the left and right tables.
    select a.shop_name as ashop, b.shop_name as bshop from shop a
            full outer join sale_detail b on a.shop_name=b.shop_name;

In the following example, the left table is test_table_a and the right table is test_table_b. The JOIN clause is used to join tables. The WHERE clause is used to specify filter conditions. To query rows whose value of the origin column is equal to that of the id column and whose data timestamp is greater than 20180101, you can use LEFT JOIN to return all queried rows in the left table test_table_a. Normally, if you use LEFT JOIN, it returns all queried rows in the left table. If you use RIGHT JOIN, it returns all queried rows in the right table.

An error example is as follows:
SELECT s.id
        ,s.name
        ,s.origin
        ,d.value
FROM    test_table_a s
LEFT JOIN   test_table_b d
ON      s.origin = d.id
WHERE   s.ds > "20180101" AND d.ds>"20180101";

If you use the JOIN clause before the WHERE clause, the JOIN operation is performed first. Then, the results obtained from the JOIN operation are filtered based on the conditions specified by the WHERE clause. The final results are the intersection of two tables but not all queried rows in the left table.

To return all queried rows in the left table, you can modify the SQL statement as follows:
SELECT s.id
        ,s.name
        ,s.origin
        ,d.value
FROM  (select * from  test_table_a where ds > "20180101" ) s
LEFT JOIN (select * from  test_table_b where ds > "20180101") d
ON s.origin = d.id;
INNER JOIN: returns a result table that combines the queried rows from two tables that meet the conditions specified by the ON clause. The keyword INNER can be omitted.
select a.shop_name from shop a inner join sale_detail b on a.shop_name=b.shop_name;
select a.shop_name from shop a join sale_detail b on a.shop_name=b.shop_name;
If you specify multiple conditions in the ON clause, you must specify equivalent conditions and use and to connect these conditions. You can specify a non-equivalent condition or use or to connect multiple conditions in the ON clause only for MAPJOIN.
select a.* from shop a full outer join sale_detail b on a.shop_name=b.shop_name
        full outer join sale_detail c on a.shop_name=c.shop_name;
    -- MaxCompute supports multiple JOIN operations in an SQL statement.
select a.* from shop a join sale_detail b on a.shop_name ! = b.shop_name;
    -- MaxCompute does not support a non-equivalent condition and returns an error.
MaxCompute supports the following implicit JOIN operation:
SELECT * FROM table1, table2 WHERE table1.id = table2.id;
-- The effect of this statement is equivalent to the following statement:
SELECT * FROM table1 JOIN table2 ON table1.id = table2.id;
MaxCompute also supports the NATURAL JOIN operation, in which the conditions used to join two tables are automatically determined based on the common fields in the two tables. MaxCompute supports OUTER NATURAL JOIN. You can use the USING clause so that the JOIN operation returns common fields only once. For example, to join the src table that contains the key1, key2, a1, and a2 fields and the src2 table that contains the key1, key2, b1, and b2 fields, you can run the following statement:
SELECT * FROM src NATURAL JOIN src2;
-- Both the src and src2 tables contain the key1 and key2 fields. In this case, the preceding statement is equivalent to the following statement:
SELECT src.key1 as key1, src.key2 as key2, src.a1, src.a2, src2.b1, src2.b2 FROM src INNER JOIN src2 ON src.key1 = src2.key1 AND src.key2 = src2.key2;
You can use parentheses () to specify the priority of JOIN operations. The JOIN operation enclosed in parentheses () has a higher priority. In the following example, the src2 JOIN src3 operation is performed first and the src JOIN operation is then performed on the obtained results.
SELECT * FROM src JOIN (src2 JOIN src3 on xxx) ON yyy;