The Flink SQL Streaming node in Data Studio lets you write standard SQL to define real-time stream processing logic. It supports event time and processing time, provides state management and fault tolerance, and integrates with systems such as Kafka and Hadoop Distributed File System (HDFS). It also provides detailed logs and performance monitoring tools.
Prerequisites
Before you begin, make sure you have:
- Attached a Realtime Compute for Apache Flink computing resource in the Management Center. For more information, see Attach a computing resource.
- Created a Flink SQL Streaming node. For more information, see Create a node for a scheduled workflow.
Step 1: Write the SQL code
In the SQL editing area on the Flink SQL Streaming node editing page, write your task code. To pass values dynamically at runtime, define variables using the ${variable_name} format in your SQL, then assign values in the Script Parameters section of the Real-time Configuration pane on the right.
The following example creates a source table and a sink table, then filters rows based on a parameterized name length:
```sql
-- Create the source table datagen_source.
CREATE TEMPORARY TABLE datagen_source (
  name VARCHAR
) WITH (
  'connector' = 'datagen'
);

-- Create the sink table blackhole_sink.
CREATE TEMPORARY TABLE blackhole_sink (
  name VARCHAR
) WITH (
  'connector' = 'blackhole'
);

-- Insert data from the source table into the sink table.
INSERT INTO blackhole_sink
SELECT
  name
FROM datagen_source
WHERE LENGTH(name) > ${name_length};
```
In this example, set name_length to 5 in Script Parameters. The job then filters out rows where the name is 5 characters or shorter.
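To make the substitution and filter behavior concrete, here is a minimal Python sketch of what the platform does with the script parameter. The `substitute` helper and the sample names are illustrative only; they are not part of the product:

```python
# Sketch of ${variable_name} substitution and the resulting filter,
# assuming name_length is set to 5 in Script Parameters.
script_parameters = {"name_length": "5"}

sql_template = (
    "SELECT name FROM datagen_source WHERE LENGTH(name) > ${name_length}"
)

def substitute(template: str, params: dict) -> str:
    """Replace each ${key} placeholder with its configured value."""
    for key, value in params.items():
        template = template.replace("${" + key + "}", value)
    return template

print(substitute(sql_template, script_parameters))

# The WHERE clause keeps only names longer than 5 characters:
rows = ["Al", "Martin", "Grace", "Ada Lovelace"]
kept = [n for n in rows if len(n) > int(script_parameters["name_length"])]
print(kept)  # "Grace" (exactly 5 characters) is filtered out
```

Note that the comparison is strict: a 5-character name does not satisfy `LENGTH(name) > 5` and is dropped.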
Step 2: Configure the node
Configure Flink resources
Open the Real-time Configuration pane on the right side of the editing page and locate the Flink Resource Information section.
Choose a resource mode
Before setting individual parameters, decide which resource mode fits your use case:
| Mode | When to use |
|---|---|
| Basic Mode (default) | New users or straightforward jobs. Uses simplified settings so you can launch quickly without deep knowledge of Flink internals. |
| Expert Mode | Experienced users who need fine-grained control over performance and resource allocation for complex or high-throughput jobs. |
For background on Flink's architecture—including how JobManagers and TaskManagers interact—see Flink Architecture.
Basic mode parameters
| Parameter | Description |
|---|---|
| Flink Cluster | The fully managed Flink computing resource attached in the Management Center. |
| Flink Engine Version | Select the engine version for your job. |
| Resource Group | Select a Serverless resource group that can connect to the Flink network. |
| JobManager CPU | CPU allocated to the JobManager, which coordinates distributed task execution. Minimum: 0.5 cores. Recommended: 1 core. Maximum: 16 cores. Adjust based on cluster size and job complexity. |
| JobManager memory | Memory allocated to the JobManager. Recommended range: 2–64 GiB. Adjust based on cluster size and job requirements. |
| TaskManager CPU | CPU allocated to each TaskManager, which executes subtasks. Minimum: 0.5 cores. Recommended: 1 core. Maximum: 16 cores. |
| TaskManager Memory | Memory allocated to each TaskManager. Minimum: 2 GiB. Maximum: 64 GiB. |
| Concurrency | The number of parallel tasks in the job. Higher concurrency improves throughput but requires more resources. Set this based on your cluster capacity and job characteristics. |
| Slots per TaskManager | The number of task slots each TaskManager provides. Adjust to balance resource utilization and parallel processing. |
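When sizing these parameters, a useful back-of-the-envelope check is that the number of TaskManagers is the job concurrency divided by the slots per TaskManager, rounded up, and total CPU and memory scale with that count. A sketch with illustrative numbers (this assumes every slot hosts one parallel subtask, the common case when all operators share one slot sharing group):

```python
import math

def taskmanager_footprint(concurrency, slots_per_tm, tm_cpu, tm_memory_gib):
    """Estimate TaskManager count and total resources for a job.

    Assumes one parallel subtask per slot; actual scheduling can differ.
    """
    tm_count = math.ceil(concurrency / slots_per_tm)
    return {
        "taskmanagers": tm_count,
        "total_cpu": tm_count * tm_cpu,
        "total_memory_gib": tm_count * tm_memory_gib,
    }

# Example: concurrency 8, 2 slots per TaskManager, 1 core / 4 GiB each.
print(taskmanager_footprint(8, 2, 1, 4))
```

Doubling slots per TaskManager halves the TaskManager count (and the total footprint) for the same concurrency, at the cost of more subtasks competing within each TaskManager.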
Expert mode parameters
| Parameter | Description |
|---|---|
| JobManager CPU | Minimum: 0.25 cores. Maximum: 16 cores. |
| JobManager memory | Recommended range: 1–64 GiB. |
| Slots per TaskManager | Configurable to optimize resource utilization. |
| Multi-SSG Mode | By default, all operators share one Slot Sharing Group (SSG), so you cannot set resources per operator individually. Enable Multi-SSG Mode to assign each operator its own slot and configure resources directly on that slot. |
For more details on resource configuration, see Configure job resources.
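The slot-count effect of Multi-SSG Mode follows from standard Flink slot-sharing rules: operators in one shared group need only as many slots as the widest operator, while operators in separate groups each claim their own slots. A sketch (operator names and parallelisms are made up):

```python
def required_slots(operator_parallelism, shared=True):
    """Slots needed under Flink slot-sharing rules.

    shared=True  -> all operators in one slot sharing group (the default):
                    a slot can host one subtask of each operator, so the
                    job needs only max(parallelism) slots.
    shared=False -> each operator in its own group: slot demands add up,
                    and each slot's resources can be set per operator.
    """
    if shared:
        return max(operator_parallelism.values())
    return sum(operator_parallelism.values())

ops = {"source": 4, "join": 8, "sink": 2}  # illustrative parallelisms
print(required_slots(ops, shared=True))   # single shared group
print(required_slots(ops, shared=False))  # Multi-SSG style
```

This is the trade-off the mode exposes: per-operator resource control in exchange for a larger total slot demand.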
(Optional) Configure script parameters
To use the ${variable_name} variables defined in your SQL, click Add Parameter in the Script Parameters section of the Real-time Configuration pane, then fill in Parameter Name and Parameter Value.
(Optional) Configure Flink runtime parameters
In the Flink Running Parameters section of the Real-time Configuration pane, configure the following settings. For more information, see Configure job deployment information.
| Parameter | Description |
|---|---|
| System Checkpoint Interval | How often Flink takes a checkpoint snapshot of the job's state. Shorter intervals reduce recovery time after a failure but add overhead to normal processing. If left blank, checkpoints are disabled. |
| Minimum Interval Between System Checkpoints | The minimum wait time between two consecutive checkpoints. This prevents checkpoints from running back-to-back when checkpoint concurrency is set to 1, protecting throughput. |
| State Data TTL | How long state data is retained without being accessed or updated before it expires. Default: 36 hours. After expiry, the state is purged, which keeps memory usage in check. <br><br> **Important:** SQL operations such as unbounded stream joins and aggregations without windowing accumulate state continuously. Without a TTL, state can grow indefinitely, leading to memory pressure and higher costs. The default of 36 hours differs from the open-source Flink default. |
| Other Configurations | Additional Flink runtime parameters in `key:value` format. For example: `taskmanager.network.memory.max:4g`. |
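The State Data TTL behavior described above can be modeled simply: a state entry expires once it has gone unread and unwritten for longer than the TTL. A minimal sketch (times are in hours; the key names are illustrative):

```python
DEFAULT_TTL_HOURS = 36  # the node's default State Data TTL

def expired_keys(last_access_hours, now_hours, ttl_hours=DEFAULT_TTL_HOURS):
    """Return state keys whose idle time exceeds the TTL.

    last_access_hours maps each state key to the hour it was last read
    or updated. Expired entries are purged, which is what keeps
    unbounded joins and aggregations from growing without bound.
    """
    return sorted(
        key for key, last in last_access_hours.items()
        if now_hours - last > ttl_hours
    )

state = {"user_1": 0, "user_2": 30, "user_3": 59}
print(expired_keys(state, now_hours=60))  # only user_1 has been idle > 36 h
```

The corollary is the one the Important note warns about: with no TTL configured, `expired_keys` would never return anything, and state for every key ever seen would be retained forever.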
After configuring the parameters, click Save.
Step 3: Publish and start the node
Publish the node
Publish the node to the Operation Center before running it. Follow the on-screen instructions to publish. For more information, see Publish a node or workflow.
Publishing also syncs the task to the Flink vvp space. After publishing, the task appears in the Flink vvp Operation Center under Job O&M.
Start the node
1. After publishing, click Go To O&M under Publish To Production Environment.
2. In the Operation Center, go to Task O&M > Real-time Task O&M > Real-time Computing Task.
3. Find the task and click Start in the Actions column. You can then view its running status.
What's next
- Configure job resources: fine-tune JobManager and TaskManager allocation for production workloads
- Configure job deployment information: full reference for Flink runtime parameters
- Publish a node or workflow: publish workflows that include multiple nodes