This topic describes how to develop an SQL job in fully managed Flink and the limits that apply to job development.
- SQL jobs that are published by using the SQL editor support only Flink 1.11, Flink 1.12, and Flink 1.13.
- For more information about the upstream and downstream storage supported by SQL, see Upstream and downstream storage.
To help you write and manage Flink SQL jobs and make job development more efficient, fully managed Flink offers you a set of Flink SQL features. You can use these features to manage metadata, register user-defined functions (UDFs), and use the SQL editor.
- Log on to the console of fully managed Flink and create a job.
- Log on to the Realtime Compute for Apache Flink console.
- On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
- In the left-side navigation pane, click Draft Editor.
- Click Create.
- In the New Draft dialog box, configure the parameters of the job. The following table describes the parameters.
Name: The name of the job.
Note: The job name must be unique in the current project.
Type: The following file types are supported by streaming jobs and batch jobs:
Note: Batch jobs are supported by Ververica Platform (VVP) 2.4.1 and later minor versions and Ververica Runtime (VVR) 3.0.1 and later minor versions.
Deployment Target: The cluster in which the job is deployed. You must select a cluster type before you can select a cluster. The following cluster types are supported:
- Per-Job Clusters: suitable for jobs that consume a large number of resources or jobs that run in a continuous and stable manner. This is the default value. Each job requires an independent JobManager to achieve resource isolation between jobs. Therefore, the resource utilization of JobManagers for jobs that involve a small amount of data is low.
- Session Clusters: suitable for jobs that consume few resources or jobs that start and stop frequently. Multiple jobs can reuse the same JobManager. This improves the resource utilization of the JobManager.
Note: To enable the SQL preview feature, you must select Session Clusters and turn on Use for SQL Editor previews. For more information, see Debug a job and Configure a development and test environment (session cluster).
Storage Location: The folder in which the code file of the job is saved. You can click the icon to the right of an existing folder to create a subfolder.
- Click OK.
- On the Draft Editor page, write data definition language (DDL) and data manipulation language (DML) statements. Sample statements:

-- Create a source table named datagen_source.
CREATE TEMPORARY TABLE datagen_source (
  name VARCHAR
) WITH (
  'connector' = 'datagen'
);

-- Create a result table named blackhole_sink.
CREATE TEMPORARY TABLE blackhole_sink (
  name VARCHAR
) WITH (
  'connector' = 'blackhole'
);

-- Insert data from the source table datagen_source into the result table blackhole_sink.
INSERT INTO blackhole_sink
SELECT name FROM datagen_source;

- On the right side of the Draft Editor page, click Advanced and enter the configuration information. The following table describes the parameters.
Section: Basic Configurations
Deployment Target: You can change the cluster that you selected when you created the job to another one.
Additional Dependencies: If you want to add more dependency files, select a file or enter a valid file address in this field.
Note: Only Per-Job clusters support the Additional Dependencies parameter. Session clusters do not.
Section: Configurations
Engine Version: You can view the engine version of Flink that is used by the job.
Edit Labels: You can specify labels for jobs so that you can easily search for jobs by label.
Section: Behavior
Max Job Creation Attempts: The number of retries allowed after the instance fails to be created.
Stop with Drain: If this feature is enabled, all event time-based windows are triggered when you manually stop or suspend a job.
Section: Flink Configuration
Checkpointing Interval: The interval at which checkpoints are scheduled. If you do not configure this parameter, the checkpointing feature is disabled.
Min Time Between Checkpoints: The minimum interval between two checkpoints. This parameter takes effect if the maximum number of concurrent checkpoints is 1.
Enabled Unaligned Checkpoints: If you turn on Enabled Unaligned Checkpoints, the running time of a checkpoint is significantly reduced when backpressure exists. However, this increases the state size of a single checkpoint.
Flink Restart Strategy Configuration: If a task fails and the checkpointing feature is disabled, JobManager cannot be restarted. If the checkpointing feature is enabled, JobManager is restarted. Valid values:
- Failure Rate: JobManager is restarted after a failure unless the number of failures within the specified interval exceeds the upper limit.
- Fixed Delay: JobManager is restarted at a fixed interval.
- No Restarts: JobManager is not restarted. This is the default value.
Additional Configuration: Other Flink settings, such as
Section: Logging
Log Archiving: By default, Allow Log Archives is turned on and the Log Archive Expires parameter is set to 7. Unit: days. After Allow Log Archives is turned on in the Logging section, you can view the logs of a historical job instance on the Logs tab. For more information, see View the logs of a historical job instance.
Note:
- In VVR 3.X, only VVR 3.0.7 and later minor versions allow you to turn on Allow Log Archives in the Logging section of the Advanced tab for a job.
- In VVR 4.X, only VVR 4.0.11 and later minor versions allow you to turn on Allow Log Archives in the Logging section of the Advanced tab for a job.
Root Log Level: You can specify the following log levels. The levels are listed in ascending order of urgency.
- TRACE: records finer-grained information than DEBUG logs.
- DEBUG: records the status of the system.
- INFO: records important system information.
- WARN: records the information about potential issues.
- ERROR: records the information about errors and exceptions that occur.
Log Levels: Enter the log name and log level.
Logging Profile: The log template. You can use the system template or configure a custom template.
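The interaction between Root Log Level and the per-logger entries under Log Levels can be sketched in Python. This is only an illustration, not the platform's implementation. It assumes Log4j-style resolution, in which the most specific matching logger name wins and the root level applies otherwise. The logger names and the `effective_level` and `is_logged` helpers are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    # Ascending order of urgency, as listed for Root Log Level.
    TRACE = 0
    DEBUG = 1
    INFO = 2
    WARN = 3
    ERROR = 4


def effective_level(logger_name, root_level, overrides):
    """Return the level that applies to logger_name.

    Assumption: the longest dotted prefix found in `overrides` wins,
    and the root level is used when no entry matches.
    """
    name = logger_name
    while name:
        if name in overrides:
            return overrides[name]
        # Drop the last dotted segment and try the parent logger.
        name = name.rpartition(".")[0]
    return root_level


def is_logged(message_level, logger_name, root_level, overrides):
    """A message is recorded if it is at least as urgent as the effective level."""
    return message_level >= effective_level(logger_name, root_level, overrides)


# Hypothetical settings: root level INFO, one package raised to DEBUG.
root = Level.INFO
overrides = {"org.apache.flink.runtime.checkpoint": Level.DEBUG}

print(is_logged(Level.DEBUG, "org.apache.flink.runtime.checkpoint.CheckpointCoordinator", root, overrides))
print(is_logged(Level.DEBUG, "org.apache.kafka.clients", root, overrides))
```

Under these assumptions, the first call reports that the DEBUG message is recorded because the package-level entry applies. The second call falls back to the root level INFO and the message is dropped.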
- On the right side of the Draft Editor page, click the Resources tab and configure the parameters. You can select one of the following configuration modes:
- Basic: the resource configuration mode provided by Apache Flink. In this mode, you can configure the following parameters.
Parallelism: The global parallelism for the job. The default value is 1.
Job Manager CPUs: The default value is 1.
Job Manager Memory: The minimum value is 1 GiB. We recommend that you use GiB or MiB as the unit. For example, you can set this parameter to 1024 MiB or 1.5 GiB.
Task Manager CPUs: The default value is 1.
Task Manager Memory: The minimum value is 1 GiB. We recommend that you use GiB or MiB as the unit. For example, you can set this parameter to 1024 MiB or 1.5 GiB.
- Expert (BETA): a new resource configuration mode that is provided by fully managed Flink. In this configuration mode, you can control the resources used by jobs in a fine-grained manner to meet your business requirements for high job throughput.
The system automatically runs jobs on native Kubernetes based on your resource configurations. The system also automatically determines the specifications and number of TaskManagers based on slot specifications and job parallelism.
Note: Only SQL jobs support the expert mode.
- Auto (BETA): an automatic configuration mode that is based on the expert mode. In this configuration mode, jobs use the resource configurations that are configured in expert mode. Autopilot is also enabled.
In auto configuration mode, you do not need to configure related resources. When you run a job, Autopilot automatically generates resource configurations for the job and adjusts the resource configurations based on the status of the job. This optimizes resource utilization of the job without affecting the health of the job. For more information about Autopilot, see Configure Autopilot.
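The resource values described above can be sanity-checked before you enter them in the console. The following Python sketch parses memory values such as 1024 MiB or 1.5 GiB, checks them against the 1 GiB minimum, and estimates how many TaskManagers a given parallelism might need. All helper names are hypothetical, and the TaskManager estimate assumes one slot per parallel subtask with fully packed TaskManagers. The platform's actual sizing logic may differ.

```python
import math
import re

# Multipliers expressed in MiB for the two recommended units.
_UNITS = {"MiB": 1, "GiB": 1024}


def parse_memory_mib(text):
    """Convert a value such as '1024 MiB' or '1.5 GiB' to MiB."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(MiB|GiB)\s*", text)
    if match is None:
        raise ValueError(f"expected '<number> MiB' or '<number> GiB', got {text!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]


def check_minimum(text, minimum_mib=1024):
    """Return the value in MiB, or raise if it is below the 1 GiB minimum."""
    mib = parse_memory_mib(text)
    if mib < minimum_mib:
        raise ValueError(f"{text!r} is below the minimum of {minimum_mib} MiB")
    return mib


def task_managers_needed(parallelism, slots_per_task_manager):
    """Assumption: each parallel subtask occupies one slot and slots are packed fully."""
    return math.ceil(parallelism / slots_per_task_manager)


print(check_minimum("1024 MiB"))
print(check_minimum("1.5 GiB"))
print(task_managers_needed(10, 4))
```

A value such as 512 MiB raises a `ValueError` in this sketch, which mirrors the documented 1 GiB minimum for Job Manager Memory and Task Manager Memory.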
- Click Save.
- Click Validate.
- Click Publish. After job development and the syntax check are complete, you can publish the job to the production environment.