This topic describes the STREAM syntax in Spark SQL. The STREAM syntax is available in E-MapReduce V3.23.0 and later versions.

Why is the STREAM syntax required?

Before you run a streaming query, you must set the parameters that writeStream requires, such as checkpointLocation and outputMode. You can set these parameters with the SET syntax for the query specified by the queryName parameter. However, this method has limitations. For example, the parameters must be set before the query runs. To improve ease of use, E-MapReduce provides the STREAM syntax, which lets you set the writeStream parameters in the same statement that defines the query.

Note: E-MapReduce supports both the SET and STREAM syntaxes to set the parameters required for writeStream.

Syntax

CREATE STREAM queryName
OPTIONS (propertyName=propertyValue[,propertyName=propertyValue]*)
INSERT INTO tbName
queryStatement;

The following table lists the parameters required for writeStream.

Parameter          | Description                                                  | Default value
checkpointLocation | The directory of the checkpoint for the streaming query job. | None
outputMode         | The output mode of the query result.                         | Append
triggerType        | The execution mode of the streaming query.                   | ProcessingTime
triggerIntervalMs  | The interval between streaming queries. Unit: milliseconds.  | 0

Example

CREATE STREAM job1
OPTIONS(
checkpointLocation='/tmp/spark',
outputMode='Append',
triggerType='ProcessingTime',
triggerIntervalMs='3000')
INSERT INTO LargeOrders
SELECT * FROM Orders WHERE units > 1000;
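
A minimal variant of the example above, offered as a sketch rather than a verified statement. It assumes that any option omitted from OPTIONS falls back to the default listed in the parameter table, so only checkpointLocation, which has no default, is specified. The query name job2 and the checkpoint path /tmp/spark/job2 are hypothetical. Orders and LargeOrders are reused from the example above.

```sql
-- Sketch only: job2 and the checkpoint path are made-up names for illustration.
-- Assumes omitted options take the defaults from the parameter table:
-- outputMode=Append, triggerType=ProcessingTime, triggerIntervalMs=0.
CREATE STREAM job2
OPTIONS(
checkpointLocation='/tmp/spark/job2')
INSERT INTO LargeOrders
SELECT * FROM Orders WHERE units > 1000;
```

If the defaults do not apply in your E-MapReduce version, list all four options explicitly, as in the first example.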