This topic describes the STREAM syntax in Spark SQL. The STREAM syntax is available in E-MapReduce V3.23.0 and later versions.
Why is the STREAM syntax required?
Before you run a streaming query, you must set the parameters that writeStream requires, such as checkpointLocation and outputMode. You can use the SET syntax to set these parameters for the query identified by the queryName parameter, but this method has limitations. For example, you must set the parameters before you run the query. To improve ease of use, E-MapReduce provides the STREAM syntax, which lets you set the writeStream parameters in the same statement that defines the query.
Note: E-MapReduce supports both the SET syntax and the STREAM syntax for setting the parameters required for writeStream.
Syntax
CREATE STREAM queryName
OPTIONS (propertyName=propertyValue[,propertyName=propertyValue]*)
INSERT INTO tbName
queryStatement;
The following table lists the parameters required for writeStream.
Parameter | Description | Default value |
---|---|---|
checkpointLocation | The directory of the checkpoint for the streaming query job. | None |
outputMode | The output mode of the query result. | Append |
triggerType | The execution mode of the streaming query. | ProcessingTime |
triggerIntervalMs | The interval at which the streaming query is triggered. Unit: milliseconds. | 0 |
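To illustrate how the options and their defaults fit into the syntax, the following minimal Python sketch assembles a CREATE STREAM statement as a string. The helper `build_create_stream` and the `DEFAULTS` dictionary are hypothetical names introduced for this illustration and are not part of E-MapReduce or Spark. The sketch assumes that every option value is written as a single-quoted string, that values contain no quotes, and that checkpointLocation must always be supplied because the table lists no default for it.

```python
# Defaults copied from the parameter table above.
# checkpointLocation has no default, so it is not listed here.
DEFAULTS = {
    "outputMode": "Append",
    "triggerType": "ProcessingTime",
    "triggerIntervalMs": "0",
}


def build_create_stream(query_name, target_table, select_sql, options):
    """Render a CREATE STREAM statement (hypothetical helper, not an EMR API)."""
    # Caller-supplied options override the defaults.
    merged = {**DEFAULTS, **options}
    # Assumption: treat checkpointLocation as mandatory because it has no default.
    if "checkpointLocation" not in merged:
        raise ValueError("checkpointLocation has no default and must be set")
    # Options are separated by commas; values are not escaped in this sketch.
    pairs = ",\n".join(f"{name}='{value}'" for name, value in merged.items())
    return (
        f"CREATE STREAM {query_name}\n"
        f"OPTIONS(\n{pairs})\n"
        f"INSERT INTO {target_table}\n"
        f"{select_sql};"
    )


statement = build_create_stream(
    "job1",
    "LargeOrders",
    "SELECT * FROM Orders WHERE units > 1000",
    {"checkpointLocation": "/tmp/spark", "triggerIntervalMs": "3000"},
)
print(statement)
```

The generated text is only a string. Submitting it to Spark SQL is left out, because the way a statement is submitted depends on the client you use.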
Example
CREATE STREAM job1
OPTIONS(
checkpointLocation='/tmp/spark',
outputMode='Append',
triggerType='ProcessingTime',
triggerIntervalMs='3000')
INSERT INTO LargeOrders
SELECT * FROM Orders WHERE units > 1000;