Create a Flink SQL task - Dataphin - Alibaba Cloud Documentation Center

This topic describes how to create a Flink SQL task that uses the open source Flink real-time engine.

Prerequisites

Before you begin, ensure that the real-time engine is enabled for your project and that Flink is configured as the compute source. For more information, see Create a general project.

Permission description

Only super administrators, project administrators, and developers can create Flink SQL compute tasks.

Step 1: Create a Flink SQL task

On the Dataphin homepage, in the top menu bar, select Develop > Data Development.
In the top menu bar, select a project. If your project uses the Dev-Prod mode, also select an environment.
In the navigation pane on the left, select Data Processing > Compute Task. In the list of compute tasks, click the icon and select Flink SQL.

In the Create Flink SQL Task dialog box, configure the task parameters.

Parameter	Description
Task Name	The name must meet the following requirements: it can contain only lowercase letters, digits, and underscores (_); it must start with a letter; it must be 4 to 63 characters in length; and it must be unique within the project.
Production Environment Resource Queue/Development Environment Resource Queue	If the Flink compute source for the project uses the Kubernetes deployment mode, you can select any resource group that is configured for real-time tasks, including resource groups in registered external clusters. If the Flink compute source uses the YARN deployment mode, you can select the resource queue or resource group in the production environment cluster where the Flink SQL task is located. Note: If your project is in Basic mode, you can configure only the production environment resource queue.
Production Environment Engine Version/Development Environment Engine Version	Select the Flink engine version for the task. Dataphin supports the following versions: 1.20.1, 1.15.3, 1.14.2, and 1.13.1. Note: If your project is in Basic mode, you can configure only the production environment engine version.
Storage Directory	Select the folder in which to store the task. If the folder you want does not exist, create one: click the icon above the compute task list on the left to open the Create Folder dialog box, enter a Name for the folder, select its location in Select Directory, and click OK.
Creation Method	Select one of the following creation methods: Create Empty Task creates an empty Flink SQL task; Reference Sample Code creates a task from the built-in sample code; Use Template creates a task based on a real-time computing task template.
Description	Enter a brief description of the Flink SQL task. The description can be up to 1,000 characters in length.

Click OK.
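If you create tasks from scripts or naming templates, you can check candidate names against the Task Name rules before you open the dialog box. The following Python sketch is a hypothetical helper, not part of Dataphin: `validate_task_name` encodes only the rules listed in the table, and because Dataphin enforces uniqueness against the project itself, the sketch checks uniqueness only against a list of existing names that you pass in.

```python
import re
from typing import Iterable, List


def validate_task_name(name: str, existing_names: Iterable[str] = ()) -> List[str]:
    """Return the Task Name rules that `name` violates (an empty list means valid).

    Hypothetical helper: it mirrors the documented rules only; Dataphin itself
    performs the authoritative check when you click OK.
    """
    problems: List[str] = []
    if not 4 <= len(name) <= 63:
        problems.append("must be 4 to 63 characters long")
    # [a-z] and [0-9] match ASCII characters only, so uppercase and non-ASCII fail here.
    if re.fullmatch(r"[a-z0-9_]*", name) is None:
        problems.append("may contain only lowercase letters, digits, and underscores")
    if re.match(r"[a-z]", name) is None:
        problems.append("must start with a letter")
    # Uniqueness is checked only against the names supplied by the caller.
    if name in set(existing_names):
        problems.append("must be unique within the project")
    return problems


print(validate_task_name("ods_orders_rt"))  # []
print(validate_task_name("1_Orders"))
```

The second call reports two violations: the name contains an uppercase letter and starts with a digit.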

Step 2: Develop and precompile the Flink SQL task code

On the Flink SQL task code page, write your task code.
Dataphin lets you quickly create meta tables from native Data Definition Language (DDL) statements. If Dataphin detects a native CREATE TABLE or CREATE TEMPORARY TABLE statement in the code, you can click the prompt icon in the editor to create a meta table from it. For more information, see Flink SQL task development methods.
After you finish writing the code, click the Format button in the top menu bar to automatically format the SQL code.
Click Precompile in the top menu bar to check the syntax of the code and the required permissions.
If the precompilation is successful, the message Precompilation Successful is displayed. If the precompilation fails, the message Precompilation Failed is displayed. You can click Console at the bottom of the page to view the failure logs.
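As an illustration of native DDL that the meta table prompt can pick up, the sample below defines a source and a sink with the `datagen` and `print` connectors that ship with open source Flink. The Python function `find_native_ddl` is a hypothetical approximation of the detection step, not Dataphin's implementation: it uses a regular expression, so unlike a real SQL parser it does not skip comments or string literals.

```python
import re
from typing import List, Tuple

# Sample task code; 'datagen' and 'print' are connectors bundled with open source Flink.
SAMPLE_SQL = """
CREATE TEMPORARY TABLE orders_src (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5'
);

CREATE TABLE orders_sink (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
) WITH (
  'connector' = 'print'
);

INSERT INTO orders_sink SELECT order_id, amount FROM orders_src;
"""

# Hypothetical approximation of: CREATE [TEMPORARY] TABLE [IF NOT EXISTS] <name>
_DDL_PATTERN = re.compile(
    r"\bCREATE\s+(TEMPORARY\s+)?TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?([\w.`]+)",
    re.IGNORECASE,
)


def find_native_ddl(sql: str) -> List[Tuple[str, bool]]:
    """Return (table_name, is_temporary) for each CREATE [TEMPORARY] TABLE match."""
    return [
        # Backticks are dropped so `db`.`t` is reported as db.t.
        (m.group(2).replace("`", ""), m.group(1) is not None)
        for m in _DDL_PATTERN.finditer(sql)
    ]


print(find_native_ddl(SAMPLE_SQL))  # [('orders_src', True), ('orders_sink', False)]
```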

Step 3: Configure the Flink task

Click Configuration in the editor sidebar.
In the configuration dialog box, you can configure the Real-time Mode and Offline Mode for the Flink task.
Note
Dataphin real-time computing supports integrated stream and batch processing on a unified compute engine. You can configure both stream and batch settings for the same task code to generate instances in different modes. To enable batch processing, turn on Offline Mode on the task configuration page and configure settings such as resources, scheduling, and dependencies.
- Real-time Mode
  - Resource Configuration (Required): Configure the resource queues, engine versions, degree of parallelism, Task Manager count, Job Manager Memory, and Task Manager Memory for the task in the production and development environments. For more information, see Configure Ververica Flink real-time mode resources.
  - Variable Configuration: Assign values to variables that are used in the compute task code. This allows the variables to be automatically replaced with their assigned values during execution. For more information, see Real-time mode variable configuration.
  - Checkpoint Configuration: Configure checkpoints for the Flink SQL task. Checkpoints allow the task to be restored to its pre-crash state if the program restarts after an unexpected failure. For more information, see Real-time mode Checkpoint configuration.
  - State Configuration: Configure the time after which data in the state is automatically cleaned up. For more information, see Real-time mode State configuration.
  - Runtime Parameters: Configure runtime parameters to control the execution behavior and performance of Flink applications. For more information, see Real-time mode runtime parameter configuration.
  - Dependency Files: Configure the resource files that the task depends on. For more information, see Real-time mode dependency file configuration.
  - Dependencies: Configure the upstream and downstream dependencies of the task so that you can quickly identify related tasks when you troubleshoot and debug. For more information, see Real-time mode dependency configuration.
- Offline Mode (Beta)
  Important
  Offline mode is not supported if the real-time compute source for the project is an open source Flink cluster that uses the Kubernetes (k8s) deployment mode.
  - Resource Configuration (Required): Configure the resource queues, engine versions, degree of parallelism, Task Manager count, Job Manager Memory, and Task Manager Memory for the task in the production and development environments. For more information, see Configure open source Flink offline mode resources.
  - Variable Configuration: Assign values to variables that are used in the compute task code. This allows the variables to be automatically replaced with their assigned values during execution. For more information, see Offline mode variable configuration.
  - Runtime Parameters: Configure runtime parameters to control the execution behavior and performance of Flink applications. For more information, see Offline mode runtime parameter configuration.
  - Dependency Files: Configure the resource files that the Flink SQL task depends on. For more information, see Offline mode dependency file configuration.
  - Scheduling Configuration (Required): Define how the task is periodically scheduled in the production environment. You can use the scheduling properties to configure the scheduling cycle and effective date of the task. For more information, see Offline mode scheduling configuration.
  - Dependencies (Required): Configure the upstream and downstream dependencies of the task so that you can quickly identify related tasks when you troubleshoot and debug. For more information, see Offline mode dependency configuration.
Click OK.
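Several of the settings above correspond broadly to standard open source Flink options: the stream and batch split maps to `execution.runtime-mode`, checkpointing to `execution.checkpointing.interval`, and state cleanup to `table.exec.state.ttl`. The following Python sketch expresses a few of them as Flink SQL client `SET` statements and substitutes variables into the code. It is illustrative only: `FlinkTaskSettings`, `flink_options`, and `render_script` are hypothetical names, the `${name}` variable syntax is an assumption, one parallelism value is shared by both modes for brevity, and you should configure these values in the Configuration dialog box rather than rely on `SET` statements being honored in Dataphin task code.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict


@dataclass
class FlinkTaskSettings:
    """Hypothetical subset of the settings in the Configuration dialog box."""
    parallelism: int = 2
    checkpoint_interval_s: int = 60
    state_ttl_hours: int = 24
    offline_enabled: bool = False


def flink_options(settings: FlinkTaskSettings, mode: str) -> Dict[str, str]:
    """Express the settings as open source Flink options for 'real-time' or 'offline'."""
    if mode == "real-time":
        return {
            "execution.runtime-mode": "streaming",
            "parallelism.default": str(settings.parallelism),
            # Checkpoint Configuration: periodic snapshots used to recover after a failure.
            "execution.checkpointing.interval": f"{settings.checkpoint_interval_s} s",
            # State Configuration: state not updated for longer than the TTL may be cleaned up.
            "table.exec.state.ttl": f"{settings.state_ttl_hours} h",
        }
    if mode == "offline":
        # Batch instances exist only when Offline Mode is turned on for the task.
        if not settings.offline_enabled:
            raise ValueError("Offline Mode is not enabled for this task")
        return {
            "execution.runtime-mode": "batch",
            "parallelism.default": str(settings.parallelism),
        }
    raise ValueError(f"unknown mode: {mode}")


def render_script(sql: str, settings: FlinkTaskSettings, mode: str,
                  variables: Dict[str, str]) -> str:
    """Prefix the SQL with SET statements and substitute ${name} variables.

    Assumption: variables are referenced as ${name}. Template.substitute raises
    KeyError when a referenced variable has no assigned value.
    """
    set_lines = [f"SET '{k}' = '{v}';" for k, v in flink_options(settings, mode).items()]
    return "\n".join(set_lines + [Template(sql).substitute(variables)])


sql = "INSERT INTO orders_sink SELECT * FROM orders_src WHERE region = '${region}';"
print(render_script(sql, FlinkTaskSettings(), "real-time", {"region": "cn"}))
```

The same SQL text yields a streaming script or, when offline mode is enabled, a batch script, which mirrors the idea of generating instances in different modes from one piece of code.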

Step 4: Debug the Flink task code

You can debug your Flink code in Dataphin. Click the Debug button in the top menu bar to sample data for the task and perform local debugging to verify that the code is correct.
In the debug configuration dialog box, select Real-time Mode - FLINK Stream Task (real-time mode debugging) or Offline Mode - FLINK Batch Task (offline mode debugging).
- Real-time mode debugging: This mode samples data from the corresponding real-time physical table and performs local debugging in Flink Stream mode. For more information, see Real-time mode debugging.
- Offline mode debugging: This mode samples data from the corresponding offline physical table and performs local debugging in Flink Batch mode. For more information, see Offline mode debugging.

Note

Currently, you can debug in only one mode at a time. After you select a mode, data is sampled from the corresponding table for debugging.

Step 5: Submit the Flink SQL task

Click the Submit button in the top menu bar.
In the Submit dialog box, review the Submission Content and Pre-check information, and enter Submission Remarks.
Click OK And Submit.

Note

If your project is in Dev-Prod mode, you must publish the Flink SQL task to the production environment. For more information, see Manage publishing tasks.

What to do next

After the task is submitted, you can perform operations and maintenance (O&M) on it in the Operation Center to ensure that it runs properly. For more information, see View and manage real-time tasks.