This topic describes the window functions, time attributes, and window types that Flink SQL supports.
Window functions
Flink SQL supports aggregation over infinite windows. You do not need to explicitly define windows in your SQL statements. Flink SQL also supports aggregation over a specific window. For example, to count the number of users who clicked a URL in the last minute, you can define a window to collect the data about user clicks that occur in the last minute. Then, you can compute the data in the window to obtain the result.
Flink SQL supports window aggregates and over aggregates. This topic describes window aggregates. Window aggregates support the windows that are defined based on the following two time attributes: event time and processing time. For each time attribute, Flink SQL supports three window types: tumbling window, sliding window, and session window.
Time attributes
- Event time: the event time that you provide in the table schema. In most cases, the event time is the original time when data is created.
- Processing time: the local system time at which the system processes an event.
Cascading windows
The event time attribute of the Rowtime column no longer takes effect after a window
operation is complete. You can use a helper function such as TUMBLE_ROWTIME
, HOP_ROWTIME
, or SESSION_ROWTIME
to obtain max(rowtime)
of the rowtime column in a window. You can use the obtained value as the rowtime
of the time window. The value is window_end - 1
and is of the TIMESTAMP data type. The TIMESTAMP value has the rowtime attribute.
For example, if the time span for a window is [00:00, 00:15]
, 00:14:59.999
is returned.
CREATE TABLE user_clicks(
username varchar,
click_url varchar,
ts timeStamp,
WATERMARK wk FOR ts as withOffset(ts, 2000) --Define a watermark for the rowtime.
) with (
type='datahub',
...
);
CREATE TABLE tumble_output(
window_start TIMESTAMP,
window_end TIMESTAMP,
username VARCHAR,
clicks BIGINT
) with (
type='print'
);
CREATE VIEW one_minute_window_output as
SELECT
// Use each TUMBLE_ROWTIME value as the aggregation time for the level-two window.
TUMBLE_ROWTIME(ts, INTERVAL '1' MINUTE) as rowtime,
username,
COUNT(click_url) as cnt
FROM user_clicks
GROUP BY TUMBLE(ts, INTERVAL '1' MINUTE), username;
INSERT INTO tumble_output
SELECT
TUMBLE_START(rowtime, INTERVAL '1' HOUR),
TUMBLE_END(rowtime, INTERVAL '1' HOUR),
username,
SUM(cnt)
FROM one_minute_window_output
GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), username;