Debug a deployment - Realtime Compute for Apache Flink - Alibaba Cloud Documentation Center

You can use the debugging feature to simulate a running deployment, check its output, and verify the business logic of SELECT and INSERT statements. This improves development efficiency and reduces the risk of poor data quality. This topic describes how to debug a Flink SQL deployment.

Background information

The deployment debugging feature allows you to verify the logic of a deployment in the console of fully managed Flink. During debugging, no data is written to the result table, regardless of the result table type. You can debug with upstream online data or with debugging data that you specify. You can debug complex deployments that contain multiple SELECT or INSERT statements, as well as queries whose results contain update operations (UPSERT), such as count(*) aggregations.
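To illustrate why an aggregation such as count(*) produces update operations instead of append-only output, the following minimal Python sketch simulates a grouped count and emits a changelog in the row-kind notation that Flink uses (+I for an insert, -U and +U for the old and new values of an update). It is a conceptual model only; the function name and the tuple layout are illustrative assumptions, not part of the debugging feature.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A changelog row is (row_kind, key, count):
#   "+I" inserts a new result row,
#   "-U" retracts the previous value of an updated row,
#   "+U" emits the new value of that row.
ChangelogRow = Tuple[str, str, int]


def grouped_count_changelog(keys: Iterable[str]) -> List[ChangelogRow]:
    """Simulate SELECT key, COUNT(*) FROM src GROUP BY key as a changelog."""
    counts: Dict[str, int] = defaultdict(int)
    changelog: List[ChangelogRow] = []
    for key in keys:
        old = counts[key]
        counts[key] = old + 1
        if old == 0:
            # First row for this key: a new result row is inserted.
            changelog.append(("+I", key, 1))
        else:
            # Later rows change an existing result row: retract it, then emit the update.
            changelog.append(("-U", key, old))
            changelog.append(("+U", key, old + 1))
    return changelog


if __name__ == "__main__":
    for row in grouped_count_changelog(["a", "b", "a"]):
        print(row)
```

For the input a, b, a, the sketch emits +I rows for a and b, then a -U/+U pair as the count for a changes from 1 to 2. Output that contains -U and +U rows is not append-only.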

Limits

To use the deployment debugging feature, you must create a session cluster.
You can debug only SQL deployments.
You cannot debug deployments that contain CREATE TABLE AS or CREATE DATABASE AS statements.
MySQL CDC source tables do not produce append-only data. Therefore, session clusters that use Ververica Runtime (VVR) 4.0.8 or earlier cannot debug data from MySQL CDC source tables.
By default, fully managed Flink reads at most 1,000 data records during debugging and stops reading when this limit is reached.
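Conceptually, the 1,000-record limit behaves like reading a bounded prefix of a possibly unbounded source. The Python sketch below models that behavior with itertools.islice; read_for_debugging is a hypothetical name, and the sketch does not describe how fully managed Flink actually implements the limit.

```python
from itertools import count, islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")

# Default upper limit from the Limits section; modeled as a parameter here.
DEBUG_RECORD_LIMIT = 1000


def read_for_debugging(source: Iterable[T], limit: int = DEBUG_RECORD_LIMIT) -> List[T]:
    """Read at most `limit` records from `source`, then stop pulling from it."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # islice stops requesting items once `limit` items have been produced,
    # so an unbounded source is never read past the limit.
    return list(islice(source, limit))


if __name__ == "__main__":
    stream = count()  # an unbounded source: 0, 1, 2, ...
    records = read_for_debugging(stream)
    print(len(records), records[-1], next(stream))
```

Because reading stops at the limit, the next value still available from the source after the call is the 1,001st record.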

Precautions

A session cluster consumes resources as soon as it is created. The amount of resources depends on the configuration that you select when you create the cluster.
A session cluster that uses VVR 3.0.4 or later consumes an additional 0.5 compute units (CUs) while it is running.
Session clusters are intended for development and test environments. We recommend that you do not use session clusters in the production environment. Because deployments in a session cluster share one JobManager, debugging deployments in a session cluster uses JobManager resources more efficiently. In a production environment, however, this JobManager reuse mechanism reduces the stability of the deployments and may cause the following issues:
- If the JobManager fails, all deployments in the cluster are affected.
- If a TaskManager fails, all deployments that have tasks running on that TaskManager are affected.
- Tasks that run on the same TaskManager are not isolated in separate processes, so they may affect each other.
If the session cluster uses the default configurations, take note of the following points:
- For small deployments with a parallelism of 1, we recommend that a cluster run no more than 100 such deployments in total.
- For complex deployments, we recommend that the maximum parallelism of a single deployment be no more than 512, and that a single cluster run no more than 32 medium-sized deployments with a parallelism of 64. If you exceed these recommendations, issues such as heartbeat timeouts may occur and affect cluster stability. In this case, increase the heartbeat interval and the heartbeat timeout period.
- To run more tasks at the same time, increase the resource configuration of the session cluster.
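If you plan the capacity of a session cluster programmatically, the recommendations above can be expressed as simple checks. The following Python sketch is hypothetical (PlannedDeployment and check_session_cluster_plan are illustrative names, not a Flink or Alibaba Cloud API). It assumes that a small deployment has a parallelism of 1 and that any deployment with a parallelism of 64 or more counts toward the medium-sized limit.

```python
from dataclasses import dataclass
from typing import List

# Thresholds from the recommendations for a session cluster with the default
# configuration. How deployments are classified as "small" or "medium-sized"
# below is a simplifying assumption.
MAX_SMALL_DEPLOYMENTS = 100          # deployments with a parallelism of 1
MAX_PARALLELISM_PER_DEPLOYMENT = 512
MEDIUM_PARALLELISM = 64
MAX_MEDIUM_DEPLOYMENTS = 32


@dataclass
class PlannedDeployment:
    name: str
    parallelism: int


def check_session_cluster_plan(deployments: List[PlannedDeployment]) -> List[str]:
    """Return warnings for a planned set of deployments in one session cluster."""
    warnings: List[str] = []

    small = [d for d in deployments if d.parallelism == 1]
    if len(small) > MAX_SMALL_DEPLOYMENTS:
        warnings.append(
            f"{len(small)} small deployments exceed the recommended {MAX_SMALL_DEPLOYMENTS}"
        )

    for d in deployments:
        if d.parallelism > MAX_PARALLELISM_PER_DEPLOYMENT:
            warnings.append(
                f"{d.name}: parallelism {d.parallelism} exceeds {MAX_PARALLELISM_PER_DEPLOYMENT}"
            )

    # Assumption: any deployment with a parallelism of 64 or more is medium-sized.
    medium = [d for d in deployments if d.parallelism >= MEDIUM_PARALLELISM]
    if len(medium) > MAX_MEDIUM_DEPLOYMENTS:
        warnings.append(
            f"{len(medium)} medium-sized deployments exceed the recommended {MAX_MEDIUM_DEPLOYMENTS}"
        )
    return warnings


if __name__ == "__main__":
    plan = [PlannedDeployment(f"job-{i}", 64) for i in range(33)]
    for warning in check_session_cluster_plan(plan):
        print(warning)
```

A warning from such a check is a signal to increase the heartbeat settings or the cluster resources, or to split the deployments across clusters.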

Procedure

Step 1: Create a session cluster

Go to the Session Clusters page.
1. Log on to the Realtime Compute for Apache Flink console.
2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
3. In the left-side navigation pane, click Session Clusters.
In the upper-left corner of the Session Clusters page, click Create Session Cluster.

Configure the parameters.

The following table describes the parameters.

Section	Parameter	Description
Standard	Name	The name of the session cluster that you want to create.
	Deployment Target	The queue to which the session cluster is deployed. For more information about how to create a queue, see Manage queues.
	State	The desired state of the cluster. Valid values: RUNNING: The cluster keeps running after it is configured. STOPPED: The cluster is stopped after it is configured, and the deployments that are deployed in the cluster are also stopped.
	Label key	The key of a label. You can configure labels in the Labels section to quickly find a deployment on the Overview page.
	Label value	The value of the label.
Configuration	Engine version	The version of the Flink engine that is used by the session cluster. For more information about engine versions, see Engine version and Lifecycle policies. We recommend that you use a Recommended or Stable version. Engine versions are classified into the following types: Recommended: the latest minor version of the latest major version. Stable: the latest minor version of a major version that is still in the service period of the product; defects found in earlier versions are fixed in this version. Normal: other minor versions that are still in the service period of the product. Deprecated: versions whose service period has ended.
	Flink Restart Strategy	Valid values: Failure Rate: The deployment is restarted after each failure until the number of failures within the specified interval exceeds the upper limit, at which point the deployment fails. If you select this option, you must configure the Failure Rate Interval, Max failures per interval, and Delay Between Restart Attempts parameters. Fixed Delay: The deployment is restarted up to a fixed number of times, with a fixed delay between attempts. If you select this option, you must configure the Number of Restart Attempts and Delay Between Restart Attempts parameters. No Restarts: Deployments are not restarted if they fail. Important: If you leave this parameter empty, the default Apache Flink restart strategy is used. In this case, if a task fails and checkpointing is disabled, the deployment is not restarted; if checkpointing is enabled, the deployment is restarted.
	Other Configuration	Configure other Flink settings, such as `taskmanager.numberOfTaskSlots: 1`.
Resources	Number of Task Managers	By default, the value is the same as the parallelism.
	JobManager CPU Cores	Default value: 1.
	JobManager Memory	Minimum value: 1 GiB. Recommended value: 4 GiB. Specify the value in GiB or MiB, for example, 1024 MiB or 1.5 GiB. We recommend that you configure both the JobManager resources and the heartbeat-related parameters. Take note of the following points: The JobManager handles TaskManager heartbeats, task serialization, and resource scheduling. Therefore, we recommend that the JobManager resources be no less than the default configuration and that you adjust them based on the workload of your cluster. To ensure cluster stability, prevent heartbeat timeouts that are caused by a busy JobManager main thread: we recommend that you set the heartbeat interval (heartbeat.interval) to at least 10 seconds and the heartbeat timeout period (heartbeat.timeout) to at least 50 seconds. Increase these values as the number of TaskManagers and deployments grows.
	TaskManager CPU Cores	Default value: 2.
	TaskManager Memory	Minimum value: 1 GiB. Recommended value: 8 GiB. Specify the value in GiB or MiB, for example, 1024 MiB or 1.5 GiB. We recommend that you configure both the number of slots for each TaskManager (taskmanager.numberOfTaskSlots) and the TaskManager resources. Take note of the following points: For small deployments with a parallelism of 1, we recommend a CPU-to-memory ratio of 1:4 for each slot, with at least 1 CPU core and 2 GiB of memory per slot. For complex deployments, we recommend at least 1 CPU core and 4 GiB of memory per slot. With the default TaskManager resources, we recommend that you set the number of slots to 2. Important: If a TaskManager has too few resources, the stability of the deployments that run on it is affected, and because the fixed overhead of the TaskManager is shared by only a few slots, resource utilization is low. If a TaskManager has a large amount of resources, many deployments run on it, and a failure of that TaskManager affects all of them.
Logging	Root Log Level	The following log levels are supported, listed in ascending order of severity: TRACE: records finer-grained information than DEBUG. DEBUG: records the status of the system. INFO: records important system information. WARN: records information about potential issues. ERROR: records information about errors and exceptions that occur.
	Log Levels	The names and levels of individual loggers.
	Logging Profile	The log template. You can use a system template or configure a custom template.
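The following Python sketch is a simplified model of how the three restart options in the table decide what happens after a failure: each on_failure call returns the delay before the next restart attempt, or None if the deployment should fail. The class names are hypothetical, time is passed in explicitly so that the logic is easy to follow, and Flink's actual implementation may handle boundary cases (for example, a failure exactly at the edge of the interval) differently.

```python
from collections import deque
from typing import Deque, Optional


class NoRestart:
    """No Restarts: a failed deployment is never restarted."""

    def on_failure(self, now_s: float) -> Optional[float]:
        return None


class FixedDelayRestart:
    """Fixed Delay: restart up to max_attempts times, waiting delay_s before each attempt."""

    def __init__(self, max_attempts: int, delay_s: float) -> None:
        self.max_attempts = max_attempts
        self.delay_s = delay_s
        self.attempts = 0

    def on_failure(self, now_s: float) -> Optional[float]:
        # now_s is unused here; it keeps the interface uniform across strategies.
        if self.attempts >= self.max_attempts:
            return None  # attempts exhausted: the deployment fails
        self.attempts += 1
        return self.delay_s


class FailureRateRestart:
    """Failure Rate: restart unless more than max_failures occur within interval_s."""

    def __init__(self, max_failures: int, interval_s: float, delay_s: float) -> None:
        self.max_failures = max_failures
        self.interval_s = interval_s
        self.delay_s = delay_s
        self._failures: Deque[float] = deque()

    def on_failure(self, now_s: float) -> Optional[float]:
        self._failures.append(now_s)
        # Drop failures that are older than the sliding interval.
        while self._failures and now_s - self._failures[0] > self.interval_s:
            self._failures.popleft()
        if len(self._failures) > self.max_failures:
            return None  # failure rate exceeded: the deployment fails
        return self.delay_s


if __name__ == "__main__":
    strategy = FailureRateRestart(max_failures=2, interval_s=60.0, delay_s=5.0)
    print([strategy.on_failure(t) for t in (0.0, 10.0, 20.0)])
```

With at most two failures allowed per 60 seconds, the third failure within that window makes the deployment fail, whereas failures spread further apart keep being restarted.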

Note

For more information about the options related to the integration between Flink and resource orchestration frameworks such as Kubernetes and YARN, see Resource Orchestration Frameworks.
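The heartbeat and slot recommendations in the JobManager Memory and TaskManager Memory rows can be turned into the key: value lines that you enter in Other Configuration. The following Python sketch is a hypothetical helper, not a console or Flink API. It renders heartbeat.interval, heartbeat.timeout, and taskmanager.numberOfTaskSlots; it assumes that plain millisecond values are acceptable for the heartbeat keys in your engine version; and it checks only the per-slot minimums stated in the table, not the 1:4 ratio.

```python
from typing import List, Tuple

# Minimum heartbeat settings recommended in the parameter table, in milliseconds.
# Writing plain millisecond values for these keys is an assumption about the
# format that your engine version accepts.
MIN_HEARTBEAT_INTERVAL_MS = 10_000   # 10 seconds
MIN_HEARTBEAT_TIMEOUT_MS = 50_000    # 50 seconds


def build_other_configuration(
    heartbeat_interval_ms: int, heartbeat_timeout_ms: int, slots_per_taskmanager: int
) -> str:
    """Render key: value lines for the Other Configuration field."""
    if heartbeat_interval_ms < MIN_HEARTBEAT_INTERVAL_MS:
        raise ValueError("heartbeat.interval should be at least 10 seconds")
    if heartbeat_timeout_ms < MIN_HEARTBEAT_TIMEOUT_MS:
        raise ValueError("heartbeat.timeout should be at least 50 seconds")
    if heartbeat_timeout_ms < heartbeat_interval_ms:
        raise ValueError("heartbeat.timeout should not be shorter than heartbeat.interval")
    if slots_per_taskmanager < 1:
        raise ValueError("taskmanager.numberOfTaskSlots must be at least 1")
    return "\n".join([
        f"heartbeat.interval: {heartbeat_interval_ms}",
        f"heartbeat.timeout: {heartbeat_timeout_ms}",
        f"taskmanager.numberOfTaskSlots: {slots_per_taskmanager}",
    ])


def per_slot_resources(tm_cpu_cores: float, tm_memory_gib: float, slots: int) -> Tuple[float, float]:
    """Split TaskManager CPU cores and memory evenly across its slots."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    return tm_cpu_cores / slots, tm_memory_gib / slots


def check_slot_sizing(
    tm_cpu_cores: float, tm_memory_gib: float, slots: int, complex_deployment: bool = False
) -> List[str]:
    """Compare per-slot resources with the minimums recommended in the table."""
    cpu, memory = per_slot_resources(tm_cpu_cores, tm_memory_gib, slots)
    min_memory = 4.0 if complex_deployment else 2.0
    warnings: List[str] = []
    if cpu < 1.0:
        warnings.append(f"{cpu:g} CPU cores per slot is below 1")
    if memory < min_memory:
        warnings.append(f"{memory:g} GiB per slot is below {min_memory:g} GiB")
    return warnings


if __name__ == "__main__":
    # Default TaskManager resources (2 CPU cores, 8 GiB) split across 2 slots.
    print(check_slot_sizing(2, 8, 2, complex_deployment=True))
    print(build_other_configuration(10_000, 50_000, 2))
```

With the default TaskManager resources and two slots, each slot gets 1 CPU core and 4 GiB of memory, which meets the minimums for both small and complex deployments.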

After you configure the parameters, click Create Session Cluster.
After a session cluster is created, you can select the session cluster in the Debug dialog box when you debug a deployment or select the session cluster in the Deploy draft dialog box when you deploy a draft.

Step 2: Debug a deployment

Create an SQL deployment and write code for the deployment. For more information, see Develop an SQL draft.
In the upper-right corner of the SQL Editor page, click Debug. In the Debug dialog box, select a session cluster from the Session Cluster drop-down list. Then, click Next.

Configure debugging data.

If you use online data for debugging, click Confirm.

If you use debugging data to debug a deployment, click Download mock data template, enter the debugging data in the template, and then click Upload mock data to upload the debugging data.

The following table describes the parameters in the Debug Mock Data step.

Parameter	Description
Download mock data template	You can download the debugging data template to edit data. The template is adapted to the data structure of the source table.
Upload mock data	If you want to debug a deployment by using debugging data, download the debugging data template, edit it, upload the data, and then select Use mock data. Limits on debugging data files: Only CSV files are supported. A CSV file must contain a table header, such as id (INT). A CSV file can contain at most 1,000 data records and cannot exceed 1 MB.
Data Preview	After you upload the debugging data, click the icon on the left side of the name of the source table to preview the data and download the debugging data.
Code Preview	The deployment debugging feature automatically modifies the DDL statements of source tables and result tables for debugging, but it does not change the code of the deployment itself. You can preview the code details in the lower part of Code Preview.
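Before you upload a file, you can check it locally against the limits on debugging data files. The following Python sketch is a hypothetical pre-check, not part of the console. It assumes that each header cell follows the name (TYPE) pattern of the example id (INT), that the file is UTF-8 encoded, and that 1 MB means 1,048,576 bytes. The template that you download from the console remains the authoritative format.

```python
import csv
import io
import re
from typing import List, Sequence

MAX_RECORDS = 1000
MAX_BYTES = 1024 * 1024  # "1 MB", assumed here to mean 1 MiB
# Assumed header format, based on the example "id (INT)": a column name
# followed by a type in parentheses.
HEADER_CELL = re.compile(r"^\s*[^\s(]+\s*\((.+)\)\s*$")


def make_mock_csv(header: Sequence[str], rows: Sequence[Sequence[object]]) -> bytes:
    """Write a header row and data rows as UTF-8 CSV bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def validate_mock_csv(data: bytes) -> List[str]:
    """Return a list of problems; an empty list means the file passes these checks."""
    problems: List[str] = []
    if len(data) > MAX_BYTES:
        problems.append(f"file is {len(data)} bytes; the limit is {MAX_BYTES}")
    # utf-8-sig also accepts a leading byte order mark, which spreadsheet tools often add.
    rows = [row for row in csv.reader(io.StringIO(data.decode("utf-8-sig"))) if row]
    if not rows:
        problems.append("file has no table header")
        return problems
    header, records = rows[0], rows[1:]
    untyped = [cell for cell in header if not HEADER_CELL.match(cell)]
    if untyped:
        problems.append(f"header cells without a type: {untyped}")
    if len(records) > MAX_RECORDS:
        problems.append(f"{len(records)} records; the limit is {MAX_RECORDS}")
    return problems


if __name__ == "__main__":
    sample = make_mock_csv(["id (INT)", "name (STRING)"], [[1, "alice"], [2, "bob"]])
    print(validate_mock_csv(sample))
```

Empty lines are ignored when records are counted, so the record count reflects only rows that contain data.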

Click Confirm.
After you click Confirm, the debugging result appears in the lower part of the SQL script editor.

References

For more information about how to deploy a draft after you develop the draft or debug the deployment for the draft, see Create a deployment.
For more information about how to start a deployment for a draft after the draft is deployed, see Start a deployment.
For more information about how to debug a Flink JAR deployment and Flink Python deployment, see Develop a JAR draft or Debug a deployment.
For more information about how to create and deploy an SQL draft and start the deployment for the draft, see Getting started with a Flink SQL deployment.