Realtime Compute aggregates data by using window functions based on the time attribute. To use window functions based on the event time in a job, you must define a watermark when you declare a source table.
WATERMARK [watermarkName] FOR <rowtime_field> AS withOffset(<rowtime_field>, offset)
|watermarkName||No||The name of the watermark.|
|<rowtime_field>||Yes||The column used to generate the watermark. <rowtime_field> is identified as the event time column and must be a column of the TIMESTAMP type defined in the table. You can use <rowtime_field> to define a window in the job code.|
|withOffset||Yes||The policy to generate a watermark. The watermark value is generated based on the following formula: <rowtime_field> - offset. The first parameter in withOffset must be <rowtime_field>.|
|offset||Yes||The offset between the watermark value and event time, in milliseconds.|
1501750584000 (2017-08-03 08:56:24.000). If you want to define a watermark based on the rowtime field and configure a 4-second offset, add the following definition:
WATERMARK FOR rowtime AS withOffset(rowtime, 4000)
1501750584000 - 4000 = 1501750580000 (2017-08-03 08:56:20.000). This means that all data whose timestamp is earlier than
1501750580000 (2017-08-03 08:56:20.000)has arrived.
- When you use an event time-based watermark, the rowtime field must be of the TIMESTAMP type. Currently, Realtime Compute supports 13-digit UNIX timestamps in milliseconds. If the rowtime field is of another type or the UNIX timestamp is not 13 digits in length, we recommend that you use a computed column to convert the time. For more information, see Computed column.
- The event time and processing time can only be declared in the source table. For more information, see Event time and Processing time.
- A watermark indicates that all the events whose timestamp t' is earlier than the watermark time t (t'< t ) have occurred. Currently, after the watermark time t takes effect, all subsequently received data records whose event time is earlier than t are discarded. In the future, Realtime Compute will allow you to change the configuration and ensure that subsequent data can be updated.
- Watermarks are important for out-of-order data streams because the watermarks help ensure that a window is correctly computed even if some events arrive late.
- If an operator has multiple input data streams for parallel processing, the event time of the data stream with the shortest time is used as the event time of the operator.