The JOIN statement in EMR has the same semantics as a traditional JOIN statement for batch processing. This statement is used to combine data from two tables.

Syntax

tableReference [, tableReference ]* | tableexpression
[ joinType ] JOIN tableexpression [ joinCondition ];
Parameters in the syntax:
  • tableReference: a table name
  • tableexpression: an expression
  • joinCondition: a JOIN condition
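
Because the semantics match a traditional batch JOIN, the behavior can be illustrated on a small batch data set. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in batch SQL engine; it is not EMR or Spark code. The table names `orders` and `users` and their columns are hypothetical and serve only to show how the syntax elements map onto a concrete statement.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in batch SQL engine
# (an assumption for illustration; this is not EMR or Spark).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER);
    CREATE TABLE users  (user_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO users  VALUES (10, 'alice'), (20, 'bob');
""")

# tableReference = orders, joinType = INNER, tableexpression = users,
# joinCondition = ON o.user_id = u.user_id
inner_rows = conn.execute("""
    SELECT o.order_id, u.name
    FROM orders AS o
    INNER JOIN users AS u ON o.user_id = u.user_id
    ORDER BY o.order_id
""").fetchall()

# LEFT OUTER JOIN keeps every row of the left table; rows without a
# match on the right side carry NULL (None in Python) for right columns.
left_rows = conn.execute("""
    SELECT o.order_id, u.name
    FROM orders AS o
    LEFT OUTER JOIN users AS u ON o.user_id = u.user_id
    ORDER BY o.order_id
""").fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```

The order with `user_id` 30 has no matching user, so it is dropped by the INNER JOIN and kept with a NULL name by the LEFT OUTER JOIN.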

Limits

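Not all JOIN types are supported when at least one of the joined tables is a streaming table. The following table lists which combinations are supported. For more information, see the Structured Streaming Programming Guide in the Spark documentation.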
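
| Left table | Right table | JOIN type | Supported |
|---|---|---|---|
| Streaming table | Static table | INNER JOIN | Yes |
| Streaming table | Static table | LEFT OUTER JOIN | Yes |
| Streaming table | Static table | RIGHT OUTER JOIN | No |
| Streaming table | Static table | FULL OUTER JOIN | No |
| Static table | Streaming table | INNER JOIN | Yes |
| Static table | Streaming table | LEFT OUTER JOIN | No |
| Static table | Streaming table | RIGHT OUTER JOIN | Yes |
| Static table | Streaming table | FULL OUTER JOIN | No |
| Streaming table | Streaming table | INNER JOIN | Yes |
| Streaming table | Streaming table | LEFT OUTER JOIN | Yes |
| Streaming table | Streaming table | RIGHT OUTER JOIN | Yes |
| Streaming table | Streaming table | FULL OUTER JOIN | No |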
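
The support matrix above can also be encoded as a lookup, for example to validate a query plan before it is submitted. The following is a minimal sketch only: the name `is_join_supported` and the string labels `"streaming"` and `"static"` are hypothetical and are not part of any EMR or Spark API. The values are transcribed from the table above.

```python
# Support matrix transcribed from the table above.
# Key: (left table kind, right table kind, JOIN type).
SUPPORTED = {
    ("streaming", "static", "INNER"): True,
    ("streaming", "static", "LEFT OUTER"): True,
    ("streaming", "static", "RIGHT OUTER"): False,
    ("streaming", "static", "FULL OUTER"): False,
    ("static", "streaming", "INNER"): True,
    ("static", "streaming", "LEFT OUTER"): False,
    ("static", "streaming", "RIGHT OUTER"): True,
    ("static", "streaming", "FULL OUTER"): False,
    ("streaming", "streaming", "INNER"): True,
    ("streaming", "streaming", "LEFT OUTER"): True,
    ("streaming", "streaming", "RIGHT OUTER"): True,
    ("streaming", "streaming", "FULL OUTER"): False,
}


def is_join_supported(left: str, right: str, join_type: str) -> bool:
    """Hypothetical helper: look up one combination in the matrix.

    Raises ValueError for combinations the table does not list,
    such as a join between two static tables.
    """
    key = (left.lower(), right.lower(), join_type.upper())
    if key not in SUPPORTED:
        raise ValueError(f"combination not listed in the table: {key}")
    return SUPPORTED[key]


print(is_join_supported("streaming", "static", "left outer"))
print(is_join_supported("static", "streaming", "left outer"))
```

In the rows for joins with a static table, an outer join is listed as supported only when the streaming table is on the side whose rows the join preserves.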