This topic describes how to use the MERGE INTO statement in Spark Streaming SQL.
Syntax
mergeInto
: MERGE INTO target=tableIdentifier tableAlias
USING (source=tableIdentifier (timeTravel)? | '(' subquery = query ')') tableAlias
mergeCondition?
matchedClauses*
notMatchedClause?
mergeCondition
: ON condition=booleanExpression
matchedClauses
: deleteClause
| updateClause
notMatchedClause
: insertClause
deleteClause
: WHEN MATCHED (AND deleteCond=booleanExpression)? THEN deleteAction
| WHEN deleteCond=booleanExpression THEN deleteAction
updateClause
: WHEN MATCHED (AND updateCond=booleanExpression)? THEN updateAction
| WHEN updateCond=booleanExpression THEN updateAction
insertClause
: WHEN NOT MATCHED (AND insertCond=booleanExpression)? THEN insertAction
| WHEN insertCond=booleanExpression THEN insertAction
Example
- Define a source table and a target table.
-- source table source_table : id int, --primary key | name string, | opType string --data operation type -- target table target_table : id int, --primary key | name string
- Merge data into a Delta table.
MERGE INTO target_table t USING source_table s ON s.id = t.id WHEN MATCHED AND s.opType = 'delete' THEN DELETE WHEN MATCHED AND s.opType = 'update' THEN UPDATE SET id = s.id, name = s.name WHEN NOT MATCHED AND s.opType = 'insert' THEN INSERT (key, value) VALUES (key, value)
- Merge data into a Kudu table.
MERGE INTO target_table t USING source_table s WHEN s.opType = 'delete' THEN DELETE WHEN s.opType = 'update' THEN UPDATE SET * WHEN s.opType = 'insert' THEN INSERT *
Description
Take note of the following rules for the MERGE INTO statement:
- You can use a maximum of three clauses.
- When you use the WHEN NOT MATCHED clause, you can specify a maximum of two WHEN clauses, and you must place the WHEN NOT MATCHED clause at the end of the statement.
Take note of the following rules for clauses in the MERGE INTO statement:
- WHEN MATCHED:
- You can use only one UPDATE or DELETE operation.
- You can specify an optional condition in each WHEN MATCHED clause. If you use two WHEN MATCHED clauses, the first one must contain a condition.
- WHEN MATCHED clauses are executed in sequence. A data entry is updated or deleted based on only one WHEN MATCHED clause.
- You can use
UPDATE SET *
to indicateUPDATE SET column1 = source.column1 [, column2 = source.column2 ...]
. If the number of fields in the source table is different from that in the target table, a parsing exception is thrown.
- WHEN NOT MATCHED:
- You can use only INSERT operations and can specify an optional condition.
- You can use
INSERT *
to indicateINSERT (column1 [, column2 ...]) VALUES (source.value1 [, source.value2 ...])
. If the number of fields in the source table is different from that in the target table, a parsing exception is thrown.
If the target storage system supports the UPSERT operation, you can use a variant
of MERGE INTO to update or delete data. Take note of the following points on the variants
of MERGE INTO:
- You can use a maximum of three WHEN clauses.
mergeCondition
: You do not need to configure a merge condition.- You can omit the
MATCHED
andNOT MATCHED
keywords.