The data aggregation node works like a window function in Flink SQL: it aggregates the messages in a parsing task by time window. The aggregated data can then be used for further analysis or output.
Instructions
The node currently supports the tumbling time window function of Flink SQL, TUMBLE. For more information, see TUMBLE.
The data aggregation node offers fixed-length time windows of 10s, 15s, 30s, 1min, 5min, 15min, and 30min.
If you need a time window longer than 30 minutes, use the SQL analysis workbench in Analysis Insight with hourly scheduling instead. For more information, see Step 3: Set a job scheduling policy and publish.
For custom requirements, contact technical support.
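As a rough illustration of how a tumbling window assigns messages, the sketch below maps a timestamp to the fixed-length window that contains it. The function name `tumbling_window` and the use of timestamps in seconds are assumptions made for this example, and the alignment of windows to multiples of the window length mirrors the default behavior of TUMBLE in Flink SQL rather than anything this page states about the node.

```python
from typing import Tuple

# Window lengths offered by the node, expressed in seconds.
WINDOW_LENGTHS_S = {
    "10s": 10, "15s": 15, "30s": 30, "1min": 60,
    "5min": 300, "15min": 900, "30min": 1800,
}


def tumbling_window(timestamp_s: int, length: str) -> Tuple[int, int]:
    """Return the [start, end) bounds of the window containing timestamp_s.

    Hypothetical helper: windows are assumed to start at multiples of the
    window length, so they never overlap and never leave gaps.
    """
    size = WINDOW_LENGTHS_S[length]
    start = timestamp_s - (timestamp_s % size)
    return start, start + size


# A message at second 125 falls into the 1-minute window [120, 180).
print(tumbling_window(125, "1min"))  # (120, 180)
```

Because every timestamp belongs to exactly one window, each message contributes to exactly one aggregation result.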
Scenarios
Consider a park energy-saving system that aims to reduce energy consumption and costs. One subtask is to determine the highest air conditioner temperature (temperature) of each meeting room every minute. In this case, configure a data aggregation node with room ID (roomId) as the grouping field, a window length of 1min, and MAX as the aggregate operation. The node outputs the room ID (roomId) and the highest temperature per minute (max_temperature).
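The scenario above can be sketched in a few lines to show what the aggregation computes. This is only an illustration of the logic, not the node's implementation: the message layout, the `ts` field (event time in seconds), and the `window_start` output field are assumptions introduced for the example, while `roomId`, `temperature`, and `max_temperature` come from the scenario.

```python
WINDOW_S = 60  # 1-minute window length, in seconds


def max_temperature_per_room(messages):
    """Group messages by (roomId, window start) and keep the MAX temperature."""
    best = {}
    for msg in messages:
        # Assumed: "ts" is the event time of the message in seconds.
        window_start = msg["ts"] - msg["ts"] % WINDOW_S
        key = (msg["roomId"], window_start)
        if key not in best or msg["temperature"] > best[key]:
            best[key] = msg["temperature"]
    # One output row per room and window; other input fields are not kept.
    return [
        {"roomId": room, "window_start": start, "max_temperature": temp}
        for (room, start), temp in sorted(best.items())
    ]


messages = [
    {"roomId": "A", "ts": 5, "temperature": 24.0},
    {"roomId": "A", "ts": 40, "temperature": 26.5},
    {"roomId": "B", "ts": 30, "temperature": 22.0},
    {"roomId": "A", "ts": 65, "temperature": 25.0},
]
for row in max_temperature_per_room(messages):
    print(row)
```

Room A produces one row for each of the two minutes it reported in, and room B produces one row, which matches a grouping by roomId with a 1min window and the MAX operation.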
Prerequisites
Make sure that data computing expressions or data filtering conditions are configured. For more information, see Configure data computing and data filtering.
Background information
For an overview of the data parsing feature, refer to Usage Notes.
For guidance on navigating the data parsing workspace, see Data Parsing Workspace Instructions.
Procedure
On the canvas in the middle of the page, click the add icon after the current node.
In the node list that appears, click the Data Aggregation node.
Click the Data Aggregation node on the canvas. In the configuration panel on the right, set the data aggregation parameters as described in the following table.
| Configuration Item | Parameter | Description | Example |
| --- | --- | --- | --- |
| Basic Configuration | Grouping Field | Select the field used to partition data within the window function. This field defines the aggregation granularity, and its value is not changed by the window aggregation logic. Typical choices are product key (ProductKey) and device name (DeviceName). | See "Scenarios": to calculate the highest temperature for each meeting room, select room ID (roomId) as the grouping field. |
| Basic Configuration | Window Length | Select one of the following window lengths: 10s, 15s, 30s, 1min, 5min, 15min, 30min. For windows exceeding 1 hour, use hourly scheduling in the SQL workbench. For more information, see Step 3: Set a job scheduling policy and publish. For custom requirements, contact technical support. | See "Scenarios": 1min. |
| Basic Configuration | Aggregated Field List | Configure the aggregated field, the aggregation result field name, and the aggregate operation. | See "Scenarios". Aggregated Field: air conditioner temperature (temperature). Aggregation Result Field Name: highest temperature per minute (max_temperature). Aggregate Operation: MAX. |
| Advanced Configuration | Latency Toleration | Unit: seconds. If out-of-order messages cause data to be reported after its window has ended, this setting allows the late data to still be assigned to the correct window. The mechanism corresponds to watermarks in Flink SQL. For more information, see Time attributes. | Use the default setting. |
Below is a specific configuration example for "Scenarios":
To complete the configuration of the data aggregation node, click Save in the upper-right corner of the data parsing workbench.
Important: The output of this node contains only the grouping field and the fields in the aggregated field list. Fields from preceding nodes are not passed through this node.
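The Latency Toleration parameter can be pictured with a simplified watermark model. In the sketch below, the watermark trails the newest event time by the tolerance, and a window is emitted once the watermark reaches its end. The class `TumblingMax` and its methods are hypothetical names. Dropping messages that arrive after their window was emitted is an assumption of this sketch; this page does not state how the node treats such data, and Flink itself generates watermarks differently in detail.

```python
class TumblingMax:
    """MAX over tumbling windows, with a watermark that trails the newest event time."""

    def __init__(self, window_s, tolerance_s):
        self.window_s = window_s
        self.tolerance_s = tolerance_s
        self.open_windows = {}           # window start -> running MAX
        self.watermark = float("-inf")   # no event seen yet

    def on_message(self, ts, value):
        """Feed one message; return the (window_start, max) results that closed."""
        start = ts - ts % self.window_s
        if start + self.window_s <= self.watermark:
            # Assumption: data whose window was already emitted is discarded.
            return []
        self.open_windows[start] = max(value, self.open_windows.get(start, value))
        # The watermark lags the newest event time by the configured tolerance.
        self.watermark = max(self.watermark, ts - self.tolerance_s)
        closed = sorted(
            s for s in self.open_windows if s + self.window_s <= self.watermark
        )
        return [(s, self.open_windows.pop(s)) for s in closed]


agg = TumblingMax(window_s=60, tolerance_s=5)
for ts, temperature in [(10, 20.0), (62, 21.0), (58, 30.0), (66, 22.0)]:
    print(ts, agg.on_message(ts, temperature))
```

With a tolerance of 5 seconds, the message with timestamp 58 arrives after the message with timestamp 62 but is still counted in the window [0, 60), which is emitted with the value 30.0 once the message at 66 advances the watermark past 60. With a tolerance of 0, the same message would arrive after its window was emitted.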
What to do next
After you configure the data aggregation node, set up other processing nodes or configure the target node to complete the parsing task.
For instructions on setting up value conversion, see Configuration Value Conversion.
For instructions on setting up adjacent message calculations, see Configure adjacent message computing.