Realtime Compute for Apache Flink: Getting started with a Flink JAR deployment

Last Updated: Mar 27, 2024

This topic describes how to create and start a JAR streaming deployment and a JAR batch deployment in the console of fully managed Flink.

Prerequisites

  • If you log on to the console of fully managed Flink as a RAM user or by using an Alibaba Cloud account that assumes a RAM role, make sure that the RAM user or RAM role has the required permissions on the console of fully managed Flink. For more information, see Permission management.

  • A workspace of fully managed Flink is created. For more information, see Activate fully managed Flink.

Step 1: Develop a JAR package

JAR packages cannot be developed in the console of fully managed Flink. Therefore, you must develop JAR packages in your on-premises environment. For more information about how to develop and debug DataStream API drafts and how to use connectors, see Develop a JAR draft.

The subsequent operations in this topic are performed on a JAR streaming deployment and a JAR batch deployment in the console of fully managed Flink. A test JAR package and an input data file are provided for these operations.

  • You can click FlinkQuickStart-1.0-SNAPSHOT.jar to download the test JAR package.

    Note

    This test JAR package counts the number of times each word appears in the input data. If you want to analyze the source code, click FlinkQuickStart.zip to download the source package and compile the code. A sketch of the word-count logic is provided after this list.

  • You can click Shakespeare to download the input data file Shakespeare.
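
The following sketch shows, for reference only, what a streaming word-count entry point of this kind might look like. It is a minimal example that is written against the open source Flink DataStream API. The actual source code of FlinkQuickStart-1.0-SNAPSHOT.jar may differ, and the package and class name org.example.WordCountStreaming are reused here only because they match the entry point class that is configured later in this topic.

// A minimal sketch of a streaming word-count entry point, for reference only.
// The actual code in the test JAR package may differ.
package org.example;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WordCountStreaming {
    public static void main(String[] args) throws Exception {
        // Read the --input argument that is configured as Entry Point Main Arguments.
        ParameterTool params = ParameterTool.fromArgs(args);
        String inputPath = params.getRequired("input");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.readTextFile(inputPath)
            // Split each line into words and emit (word, 1) pairs.
            .flatMap((FlatMapFunction<String, Tuple2<String, Long>>) (line, out) -> {
                for (String word : line.toLowerCase().split("\\W+")) {
                    if (!word.isEmpty()) {
                        out.collect(Tuple2.of(word, 1L));
                    }
                }
            })
            .returns(Types.TUPLE(Types.STRING, Types.LONG))
            .keyBy(t -> t.f0)
            .sum(1)
            // Print the running counts. The output is written to the Taskmanager.out log file.
            .print();

        env.execute("WordCountStreaming");
    }
}

The --input value in this sketch corresponds to the Entry Point Main Arguments parameter that you configure in Step 3.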

Step 2: Upload the test JAR package and data file

  1. Log on to the Realtime Compute for Apache Flink console.

  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane, click Artifacts.

  4. In the upper-left corner of the Artifacts page, click Upload Artifact and select the JAR package that you want to upload.

Note

In this example, the input data file, the test JAR package, and the output data file are stored in an Object Storage Service (OSS) bucket named flink-test-oss. The uploaded files are stored in the oss://flink-test-oss/artifacts/namespaces/flink-test-default directory.
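
If you prefer to stage the input data file in OSS programmatically instead of uploading it in the console, the following sketch shows one possible way to do so with the OSS Java SDK (aliyun-sdk-oss). The endpoint, the credential environment variables, and the local file path are placeholders that you must replace with your own values. Upload Artifact in the console remains the documented way to make the JAR package available on the Artifacts page.

// Optional sketch: upload the input data file to the OSS directory that is used in this example.
// The endpoint, credentials, and local file path below are placeholders.
import com.aliyun.oss.OSS;
import com.aliyun.oss.OSSClientBuilder;

import java.io.File;

public class UploadInputFile {
    public static void main(String[] args) {
        String endpoint = "https://oss-cn-hangzhou.aliyuncs.com"; // Placeholder endpoint.
        String accessKeyId = System.getenv("OSS_ACCESS_KEY_ID");
        String accessKeySecret = System.getenv("OSS_ACCESS_KEY_SECRET");

        OSS ossClient = new OSSClientBuilder().build(endpoint, accessKeyId, accessKeySecret);
        try {
            // Upload the Shakespeare input data file to the example directory.
            ossClient.putObject(
                "flink-test-oss",
                "artifacts/namespaces/flink-test-default/Shakespeare",
                new File("/path/to/Shakespeare"));
        } finally {
            ossClient.shutdown();
        }
    }
}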

Step 3: Create a JAR deployment

Streaming deployment

  1. Log on to the Realtime Compute for Apache Flink console.

  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane, click Deployments. On the Deployments page, click Create Deployment.

  4. In the Create Deployment dialog box, configure the parameters of the deployment. The following list describes each parameter and the value that is used in this example.

    • Deployment Type: Select JAR or PYTHON. Example: JAR.

    • Deployment Mode: Select Stream Mode or Batch Mode. Example: Stream Mode.

    • Deployment Name: Enter the name of the JAR deployment that you want to create. Example: flink-streaming-test-jar.

    • Engine Version: The engine version of Flink that is used by the deployment. For more information about engine versions, version mappings, and important time points in the lifecycle of each version, see Engine version. Example: vvr-6.0.7-flink-1.15.

    • JAR URI: Click FlinkQuickStart-1.0-SNAPSHOT.jar to download the test JAR package. Then, click the upload icon on the right side of the JAR URI field to select and upload the test JAR package. Example: oss://flink-test-oss/artifacts/namespaces/flink-test-default/FlinkQuickStart-1.0-SNAPSHOT.jar.

    • Entry Point Class: The entry point class of the program. If no main class is specified in the JAR package, enter the fully qualified name of the entry point class in this field. Example: org.example.WordCountStreaming.

      Note: In this example, the test JAR package contains both streaming deployment code and batch deployment code. Therefore, you must configure this parameter to specify the program entry point for the streaming deployment.

    • Entry Point Main Arguments: The arguments that are passed to and read in the main method. In this example, specify the OSS directory in which the input data file is stored. Example: --input oss://flink-test-oss/artifacts/namespaces/flink-test-default/Shakespeare.

    • Additional Dependencies: The OSS bucket in which the required dependency files are stored, or the URLs of the dependency files. Example: not required.

    • Deployment Target: Select the desired queue or session cluster from the drop-down list. We recommend that you do not use session clusters in the production environment. For more information, see Manage queues and Step 1: Create a session cluster. Example: default-queue.

      Note: Metrics of deployments that are deployed in session clusters cannot be displayed. Session clusters do not support the monitoring and alerting feature or the Autopilot feature. Session clusters are suitable only for development and test environments. For more information, see Debug a deployment.

    • Description: Optional. The description of the deployment. Example: not required.

    • Label: After you specify labels for a deployment, you can search for the deployment by label key and label value on the Deployments page. You can specify a maximum of three labels for a deployment. Example: not required.

    • More Setting: If you turn on the switch, you must configure the following parameters. Example: not required.

      • Kerberos Name: Select a Hive cluster that supports Kerberos authentication from the drop-down list. For more information, see Register a Hive cluster that supports Kerberos authentication.

      • principal: The Kerberos principal, which can be a user or a service. A Kerberos principal uniquely identifies an identity in the Kerberos system.

    Note

    For more information about how to configure the parameters, see Create a deployment.

  5. Click Deploy.

Batch deployment

  1. Log on to the Realtime Compute for Apache Flink console.

  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane, click Deployments. On the Deployments page, click Create Deployment.

  4. In the Create Deployment dialog box, configure the parameters of the deployment. The following list describes each parameter and the value that is used in this example.

    • Deployment Type: Select JAR or PYTHON. Example: JAR.

    • Deployment Mode: Select Stream Mode or Batch Mode. Example: Batch Mode.

    • Deployment Name: Enter the name of the JAR deployment that you want to create. Example: flink-batch-test-jar.

    • Engine Version: The engine version of Flink that is used by the deployment. For more information about engine versions, version mappings, and important time points in the lifecycle of each version, see Engine version. Example: vvr-6.0.7-flink-1.15.

    • JAR URI: Click FlinkQuickStart-1.0-SNAPSHOT.jar to download the test JAR package. Then, click the upload icon on the right side of the JAR URI field to select and upload the test JAR package. Example: oss://flink-test-oss/artifacts/namespaces/flink-test-default/FlinkQuickStart-1.0-SNAPSHOT.jar.

    • Entry Point Class: The entry point class of the program. If no main class is specified in the JAR package, enter the fully qualified name of the entry point class in this field. Example: org.example.WordCountBatch.

      Note: In this example, the test JAR package contains both streaming deployment code and batch deployment code. Therefore, you must configure this parameter to specify the program entry point for the batch deployment. A sketch of what such a batch entry point might look like is provided after these steps.

    • Entry Point Main Arguments: The arguments that are passed to and read in the main method. In this example, specify the OSS directory in which the input data file is stored and the OSS path of the output data file. Example: --input oss://flink-test-oss/artifacts/namespaces/flink-test-default/Shakespeare --output oss://flink-test-oss/artifacts/namespaces/flink-test-default/batch-quickstart-test-output.txt.

      Note: In this example, the output data file is stored in the same directory as the test JAR package. You need to only specify the directory and name of the output data file. You do not need to create the output data file in advance.

    • Additional Dependencies: The OSS bucket in which the required dependency files are stored, or the URLs of the dependency files. Example: not required.

    • Deployment Target: Select the desired queue or session cluster from the drop-down list. We recommend that you do not use session clusters in the production environment. For more information, see Manage queues and Step 1: Create a session cluster. Example: default-queue.

      Note: Metrics of deployments that are deployed in session clusters cannot be displayed. Session clusters do not support the monitoring and alerting feature or the Autopilot feature. Session clusters are suitable only for development and test environments. For more information, see Debug a deployment.

    • Description: Optional. The description of the deployment. Example: not required.

    • Label: After you specify labels for a deployment, you can search for the deployment by label key and label value on the Deployments page. You can specify a maximum of three labels for a deployment. Example: not required.

    • More Setting: If you turn on the switch, you must configure the following parameters. Example: not required.

      • Kerberos Name: Select a Hive cluster that supports Kerberos authentication from the drop-down list. For more information, see Register a Hive cluster that supports Kerberos authentication.

      • principal: The Kerberos principal, which can be a user or a service. A Kerberos principal uniquely identifies an identity in the Kerberos system.

  5. Click Deploy.
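
For reference, the following sketch shows what a batch word-count entry point such as org.example.WordCountBatch might look like. It mirrors the streaming sketch in Step 1 but runs in batch execution mode and writes the result to the path that is passed in the --output argument. The actual code in the test JAR package may differ.

// A minimal sketch of a batch word-count entry point, for reference only.
package org.example;

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WordCountBatch {
    public static void main(String[] args) throws Exception {
        // --input and --output are passed as Entry Point Main Arguments.
        ParameterTool params = ParameterTool.fromArgs(args);
        String inputPath = params.getRequired("input");
        String outputPath = params.getRequired("output");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Run the bounded job in batch execution mode.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        env.readTextFile(inputPath)
            // Split each line into words and emit (word, 1) pairs.
            .flatMap((FlatMapFunction<String, Tuple2<String, Long>>) (line, out) -> {
                for (String word : line.toLowerCase().split("\\W+")) {
                    if (!word.isEmpty()) {
                        out.collect(Tuple2.of(word, 1L));
                    }
                }
            })
            .returns(Types.TUPLE(Types.STRING, Types.LONG))
            .keyBy(t -> t.f0)
            .sum(1)
            // Write the result to the --output path. Parallelism 1 produces a single output file.
            .writeAsText(outputPath, FileSystem.WriteMode.OVERWRITE)
            .setParallelism(1);

        env.execute("WordCountBatch");
    }
}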

Step 4: Start the deployment and view the computing result

  1. On the Deployments page in the console of fully managed Flink, find the desired deployment and click Start in the Actions column.

  2. In the Start Job dialog box, configure the parameters.

    For more information about the parameters that you must configure when you start your deployment, see Start a deployment.

  3. In the Start Job dialog box, click Start.

    After the deployment is started, the deployment status changes to RUNNING. This indicates that the deployment is running properly.

    Important

    If you want to start a batch deployment, select BATCH instead of STREAM from the deployment type drop-down list on the Deployments page. By default, the Deployments page displays streaming deployments.

  4. View the computing result.

    Note

    The Taskmanager.out log file contains a maximum of 2000 data records. Therefore, the number of data records in the computing result of the streaming deployment differs from that of the batch deployment. For more information about this limit, see Print connector.

    • Computing result of a streaming deployment

      On the Deployments page, click the name of the desired deployment. On the page that appears, click Exploration. On the Running Task Managers tab, click the value in the Path, ID column. On the page that appears, click the Log List tab. Find the log file whose name ends with .out in the Log Name column and click the name of the log file. Then, search for the shakespeare keyword in the log file to view the computing result.

    • Computing result of a batch deployment

      Log on to the OSS console and view the computing result of a batch deployment in the directory in which the output data file is stored.

      In this example, the output data file is stored at oss://flink-test-oss/artifacts/namespaces/flink-test-default/batch-quickstart-test-output.txt.

(Optional) Step 5: Cancel the deployment

If you modify your deployment, you must publish the deployment again, cancel it, and then restart it for the modification to take effect. If the deployment fails and cannot reuse state data to recover, you must also cancel and then restart the deployment. For more information about how to cancel a deployment, see Cancel a deployment.

References