This topic describes the built-in function involved in Spark Streaming SQL.

New built-in function

DELAY: indicates the maximum delay time for a specified time column. This built-in function implements the watermark semantics in Spark Structured Streaming.

  • Syntax
    WHERE delay ( colName ) < 'duration' 
  • Equivalence
    withWatermark("colName", "duration")
  • Example
    SELECT avg(inv_quantity_on_hand) qoh
    FROM kafka_inventory
    WHERE delay(inv_data_time) < '2 minutes'
    GROUP BY TUMBLING (inv_data_time, interval 1 minute)

Note

The preceding method of adding a built-in function to implement the watermark semantics, however, is somewhat restrictive. You can also implement the watermark semantics by declaring the semantics in the table or query definition. The watermark implementation method is still evolving. Follow this topic to get the latest updates.

Documentation

For more information, see Spark Stream SQL official documentation.