The JOIN statement in EMR has the same semantics as a traditional JOIN statement for batch processing. This statement is used to combine data from two tables.
Syntax
tableReference [, tableReference ]* | tableexpression
[ joinType ] JOIN tableexpression [ joinCondition ];
Parameters in the syntax:
- tableReference: a table name
- tableexpression: an expression
- joinCondition: a JOIN condition
Limits
When you perform JOIN operations on streaming data, some JOIN types are not supported. For more information, see Spark documentation.
Left table | Right table | JOIN type | Supported |
---|---|---|---|
Streaming table | Static table | INNER JOIN | Yes |
LEFT OUTER JOIN | Yes | ||
RIGHT OUTER JOIN | No | ||
FULL OUTER JOIN | No | ||
Static table | Streaming table | INNER JOIN | Yes |
LEFT OUTER JOIN | No | ||
RIGHT OUTER JOIN | Yes | ||
FULL OUTER JOIN | No | ||
Streaming table | Streaming table | INNER JOIN | Yes |
LEFT OUTER JOIN | Yes | ||
RIGHT OUTER JOIN | Yes | ||
FULL OUTER JOIN | No |