This topic describes how to create and start a JAR streaming deployment and a JAR batch deployment in the console of fully managed Flink.
Prerequisites
If you log on to the console of fully managed Flink as a RAM user or by using an Alibaba Cloud account that assumes a RAM role, make sure that the RAM user or RAM role has the required permissions on the console. For more information, see Permission management.
A workspace of fully managed Flink is created. For more information, see Activate fully managed Flink.
Step 1: Develop a JAR package
JAR packages cannot be developed in the console of fully managed Flink. Therefore, you must develop JAR packages in your on-premises environment. For more information about how to develop and debug DataStream API drafts and how to use connectors, see Develop a JAR draft.
A test JAR package and an input data file are provided for the subsequent operations.
You can click FlinkQuickStart-1.0-SNAPSHOT.jar to download the test JAR package.
Note: This test JAR package is used to count the number of times each word appears. If you want to analyze the source code, click FlinkQuickStart.zip to download the package and compile the code.
You can click Shakespeare to download the input data file Shakespeare.
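The core logic of a streaming word count can be sketched in plain Java. The following is a hypothetical illustration of what the test JAR computes, not its actual source: a running count is kept per word, and an updated count is emitted each time a word arrives, which mirrors how a streaming deployment emits one updated result per input record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamingWordCount {
    // Emit "word count" lines as each word arrives, keeping a running count.
    static List<String> process(List<String> words) {
        Map<String, Long> counts = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String word : words) {
            long current = counts.merge(word, 1L, Long::sum);
            emitted.add(word + " " + current);
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Each occurrence of "to" produces a new, incremented result.
        for (String line : process(List.of("to", "be", "or", "not", "to", "be"))) {
            System.out.println(line);
        }
    }
}
```

In the real streaming deployment, each emitted record is printed to the Taskmanager.out log file, which is why you can view the result by searching the log in Step 4.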
Step 2: Upload the test JAR package and data file
Log on to the Realtime Compute for Apache Flink console.
On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Artifacts.
In the upper-left corner of the Artifacts page, click Upload Artifact and select the JAR package that you want to upload.
In this example, the input data file, the test JAR package, and the output data file are stored in a bucket named flink-test-oss in the Object Storage Service (OSS) console. The uploaded files are stored in the oss://flink-test-oss/artifacts/namespaces/flink-test-default directory.
Step 3: Create a JAR deployment
Streaming deployment
Log on to the Realtime Compute for Apache Flink console.
On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Deployments. On the Deployments page, click Create Deployment.
In the Create Deployment dialog box, configure the parameters of the deployment. The following table describes the parameters.
Parameter
Description
Example
Deployment Type
Select JAR or PYTHON.
JAR
Deployment Mode
Select Stream Mode or Batch Mode.
Stream Mode
Deployment Name
Enter the name of the JAR deployment that you want to create.
flink-streaming-test-jar
Engine Version
The engine version of Flink that is used by the deployment. For more information about engine versions, version mappings, and important time points in the lifecycle of each version, see Engine version.
vvr-6.0.7-flink-1.15
JAR URI
Click FlinkQuickStart-1.0-SNAPSHOT.jar to download the test JAR package, and click the icon on the right side of the JAR URI field to select the test JAR package and upload the package.
oss://flink-test-oss/artifacts/namespaces/flink-test-default/FlinkQuickStart-1.0-SNAPSHOT.jar
Entry Point Class
The entry point class of the program. If the JAR package does not specify a main class in its manifest file, enter the fully qualified name of the main class in the Entry Point Class field.
Note: In this example, the test JAR package contains both streaming deployment code and batch deployment code. Therefore, you must configure this parameter to specify the program entry point for the streaming deployment.
org.example.WordCountStreaming
Entry Point Main Arguments
Enter the arguments that are passed to the main method. In this example, enter the OSS directory in which the input data file is stored.
--input oss://flink-test-oss/artifacts/namespaces/flink-test-default/Shakespeare
Additional Dependencies
You can enter the OSS bucket in which the required dependency file is stored or the URL of the dependency file.
Not required
Deployment Target
Select the desired queue or session cluster from the drop-down list. For more information, see Manage queues and Step 1: Create a session cluster.
Note: Metrics of deployments that are deployed in session clusters cannot be displayed, and session clusters do not support the monitoring and alerting feature or the Autopilot feature. Session clusters are suitable only for development and test environments. We recommend that you do not use session clusters in the production environment. For more information, see Debug a deployment.
default-queue
Description
Optional. You can enter a description for the deployment.
Not required
Label
After you specify labels for a deployment, you can search for the deployment by label key and label value on the Deployments page. You can specify a maximum of three labels for a deployment.
Not required
More Setting
If you turn on the switch, you must configure the following parameters:
Kerberos Name: Select a Hive cluster that supports Kerberos authentication from the drop-down list. For more information about how to create a Hive cluster that supports Kerberos authentication, see Register a Hive cluster that supports Kerberos authentication.
principal: a Kerberos principal, which can be a user or a service. A Kerberos principal is used to uniquely identify an identity in the Kerberos encryption system.
Not required
Note: For more information about how to configure the parameters, see Create a deployment.
Click Deploy.
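The values in the Entry Point Main Arguments field are passed verbatim to the main method of the entry point class. The following is a minimal, hypothetical sketch of how such `--key value` pairs can be parsed; the actual test JAR may use a different helper, such as Flink's ParameterTool.

```java
import java.util.HashMap;
import java.util.Map;

public class MainArgs {
    // Parse "--key value" pairs from the arguments passed to main().
    static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (args[i].startsWith("--")) {
                params.put(args[i].substring(2), args[i + 1]);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Simulate the arguments configured for the streaming deployment.
        Map<String, String> p = parse(new String[] {
            "--input", "oss://flink-test-oss/artifacts/namespaces/flink-test-default/Shakespeare"
        });
        System.out.println(p.get("input"));
    }
}
```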
Batch deployment
Log on to the Realtime Compute for Apache Flink console.
On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Deployments. On the Deployments page, click Create Deployment.
In the Create Deployment dialog box, configure the parameters of the deployment. The following table describes the parameters.
Parameter
Description
Example
Deployment Type
Select JAR or PYTHON.
JAR
Deployment Mode
Select Stream Mode or Batch Mode.
Batch Mode
Deployment Name
Enter the name of the JAR deployment that you want to create.
flink-batch-test-jar
Engine Version
The engine version of Flink that is used by the deployment. For more information about engine versions, version mappings, and important time points in the lifecycle of each version, see Engine version.
vvr-6.0.7-flink-1.15
JAR URI
Click FlinkQuickStart-1.0-SNAPSHOT.jar to download the test JAR package, and click the icon on the right side of the JAR URI field to select the test JAR package and upload the package.
oss://flink-test-oss/artifacts/namespaces/flink-test-default/FlinkQuickStart-1.0-SNAPSHOT.jar
Entry Point Class
The entry point class of the program. If the JAR package does not specify a main class in its manifest file, enter the fully qualified name of the main class in the Entry Point Class field.
Note: In this example, the test JAR package contains both streaming deployment code and batch deployment code. Therefore, you must configure this parameter to specify the program entry point for the batch deployment.
org.example.WordCountBatch
Entry Point Main Arguments
Enter the arguments that are passed to the main method. In this example, enter the OSS directory in which the input data file is stored and the OSS path of the output data file.
Note: In this example, the output data file is stored in the same directory as the test JAR package. You need to specify only the path and name of the output data file. You do not need to create the output data file in the specified directory in advance.
--input oss://flink-test-oss/artifacts/namespaces/flink-test-default/Shakespeare --output oss://flink-test-oss/artifacts/namespaces/flink-test-default/batch-quickstart-test-output.txt
Additional Dependencies
You can enter the OSS bucket in which the required dependency file is stored or the URL of the dependency file.
Not required
Deployment Target
Select the desired queue or session cluster from the drop-down list. For more information, see Manage queues and Step 1: Create a session cluster.
Note: Metrics of deployments that are deployed in session clusters cannot be displayed, and session clusters do not support the monitoring and alerting feature or the Autopilot feature. Session clusters are suitable only for development and test environments. We recommend that you do not use session clusters in the production environment. For more information, see Debug a deployment.
default-queue
Description
Optional. You can enter a description for the deployment.
Not required
Label
After you specify labels for a deployment, you can search for the deployment by label key and label value on the Deployments page. You can specify a maximum of three labels for a deployment.
Not required
More Setting
If you turn on the switch, you must configure the following parameters:
Kerberos Name: Select a Hive cluster that supports Kerberos authentication from the drop-down list. For more information about how to create a Hive cluster that supports Kerberos authentication, see Register a Hive cluster that supports Kerberos authentication.
principal: a Kerberos principal, which can be a user or a service. A Kerberos principal is used to uniquely identify an identity in the Kerberos encryption system.
Not required
Click Deploy.
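Unlike the streaming deployment, which prints results to the Taskmanager.out log file, the batch deployment writes its result to the file specified by the `--output` argument. The following plain-Java sketch illustrates that behavior under stated assumptions: it is a hypothetical stand-in for the test JAR, and local files stand in for the OSS paths used by the real deployment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BatchWordCount {
    // Produce sorted "word count" output lines for the given input lines.
    static List<String> countLines(List<String> input) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : input) {
            for (String token : line.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1L, Long::sum);
                }
            }
        }
        List<String> out = new ArrayList<>();
        counts.forEach((word, count) -> out.add(word + " " + count));
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Local temp files stand in for the --input and --output OSS paths.
        Path in = Files.createTempFile("input", ".txt");
        Files.write(in, List.of("To be, or not to be"));
        Path out = Files.createTempFile("output", ".txt");
        Files.write(out, countLines(Files.readAllLines(in)));
        System.out.println(Files.readAllLines(out));
    }
}
```

Because the batch job finishes and writes a complete result file, you view its output in the OSS console rather than in the TaskManager logs.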
Step 4: Start the deployment and view the computing result
On the Deployments page in the console of fully managed Flink, find the desired deployment and click Start in the Actions column.
In the Start Job dialog box, configure the parameters.
For more information about the parameters that you must configure when you start your deployment, see Start a deployment.
In the Start Job dialog box, click Start.
After the deployment is started, the deployment status changes to RUNNING. This indicates that the deployment is running properly.
Important: If you want to start a batch deployment, you must change the deployment type filter from STREAM to BATCH in the drop-down list on the Deployments page. By default, the system displays streaming deployments.
View the computing result
Note: The Taskmanager.out log file can contain a maximum of 2,000 data records. As a result, the number of data records in the computing result of a streaming deployment may differ from that of a batch deployment. For more information about this limit, see Print connector.
Computing result of a streaming deployment
On the Deployments page, click the name of the desired deployment.
On the page that appears, click Exploration.
On the Running Task Managers tab, click the value in the Path, ID column.
On the page that appears, click the Log List tab.
In the Log Name column, find the log file whose name ends with .out and click the name of the log file.
Search for the shakespeare keyword in the log file to view the computing result.
Computing result of a batch deployment
Log on to the OSS console and view the computing result of a batch deployment in the directory in which the output data file is stored.
In this example, the output data file is stored at oss://flink-test-oss/artifacts/namespaces/flink-test-default/batch-quickstart-test-output.txt.
(Optional) Step 5: Cancel the deployment
If you modify your deployment, you must publish the deployment again, cancel the deployment, and then restart it for the modification to take effect. If the deployment fails and cannot reuse state data to recover, you must also cancel and then restart the deployment. For more information about how to cancel a deployment, see Cancel a deployment.
References
After you deploy a JAR draft, you can configure automatic tuning for the draft to improve resource utilization. For more information about automatic tuning, see Configure automatic tuning.
For more information about how to create an SQL deployment, see Getting started with a Flink SQL deployment.
For more information about how to create a Python deployment, see Getting started with a Flink Python deployment.
For more information about how to ingest data into data warehouses in real time, see Ingest data into data warehouses in real time.
For more information about how to build a real-time data warehouse by using Realtime Compute for Apache Flink and Hologres, see Build a real-time data warehouse by using Realtime Compute for Apache Flink and Hologres.
For more information about how to build a streaming data lakehouse by using Realtime Compute for Apache Flink and Apache Paimon, see Build a streaming data lakehouse by using Realtime Compute for Apache Flink and Apache Paimon.