Realtime Compute for Apache Flink: Getting started with a Flink JAR deployment

Last Updated: Mar 27, 2024

This topic describes how to create and start a JAR streaming deployment and a JAR batch deployment in the console of fully managed Flink.

Prerequisites

  • If you log on to the console of fully managed Flink as a RAM user or by using an Alibaba Cloud account that assumes a RAM role, make sure that the RAM user or RAM role has the required permissions on the console of fully managed Flink. For more information, see Permission management.

  • A workspace of fully managed Flink is created. For more information, see Activate fully managed Flink.

Step 1: Develop a JAR package

JAR packages cannot be developed in the console of fully managed Flink. Therefore, you must develop JAR packages in your on-premises environment. For more information about how to develop and debug DataStream API drafts and how to use connectors, see Develop a JAR draft.

The subsequent operations in this topic are performed on a JAR streaming deployment and a JAR batch deployment in the console of fully managed Flink. A test JAR package and an input data file are provided for these operations.

  • You can click FlinkQuickStart-1.0-SNAPSHOT.jar to download the test JAR package.

    Note

    This test JAR package counts the number of times each word appears in the input data. If you want to analyze the source code, click FlinkQuickStart.zip to download the source package and compile the code. A sketch of the word-count logic is provided after this list.

  • You can click Shakespeare to download the input data file Shakespeare.
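
The following sketch shows, for reference only, what a streaming word-count entry point of this kind might look like. It is a minimal example that is written against the open source Flink DataStream API. The actual source code of FlinkQuickStart-1.0-SNAPSHOT.jar may differ, and the package and class name org.example.WordCountStreaming are reused here only because they match the entry point class that is configured later in this topic.

// A minimal sketch of a streaming word-count entry point, for reference only.
// The actual code in the test JAR package may differ.
package org.example;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WordCountStreaming {
    public static void main(String[] args) throws Exception {
        // Read the --input argument that is configured as Entry Point Main Arguments.
        ParameterTool params = ParameterTool.fromArgs(args);
        String inputPath = params.getRequired("input");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.readTextFile(inputPath)
            // Split each line into words and emit (word, 1) pairs.
            .flatMap((FlatMapFunction<String, Tuple2<String, Long>>) (line, out) -> {
                for (String word : line.toLowerCase().split("\\W+")) {
                    if (!word.isEmpty()) {
                        out.collect(Tuple2.of(word, 1L));
                    }
                }
            })
            .returns(Types.TUPLE(Types.STRING, Types.LONG))
            .keyBy(t -> t.f0)
            .sum(1)
            // Print the running counts. The output is written to the Taskmanager.out log file.
            .print();

        env.execute("WordCountStreaming");
    }
}

The --input value in this sketch corresponds to the Entry Point Main Arguments parameter that you configure in Step 3.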

Step 2: Upload the test JAR package and data file

  1. Log on to the Realtime Compute for Apache Flink console.

  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane, click Artifacts.

  4. In the upper-left corner of the Artifacts page, click Upload Artifact and select the JAR package that you want to upload.

Note

In this example, the input data file, the test JAR package, and the output data file are stored in an Object Storage Service (OSS) bucket named flink-test-oss. The uploaded files are stored in the oss://flink-test-oss/artifacts/namespaces/flink-test-default directory.
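
If you prefer to stage the input data file in OSS programmatically instead of uploading it in the console, the following sketch shows one possible way to do so with the OSS Java SDK (aliyun-sdk-oss). The endpoint, the credential environment variables, and the local file path are placeholders that you must replace with your own values. Upload Artifact in the console remains the documented way to make the JAR package available on the Artifacts page.

// Optional sketch: upload the input data file to the OSS directory that is used in this example.
// The endpoint, credentials, and local file path below are placeholders.
import com.aliyun.oss.OSS;
import com.aliyun.oss.OSSClientBuilder;

import java.io.File;

public class UploadInputFile {
    public static void main(String[] args) {
        String endpoint = "https://oss-cn-hangzhou.aliyuncs.com"; // Placeholder endpoint.
        String accessKeyId = System.getenv("OSS_ACCESS_KEY_ID");
        String accessKeySecret = System.getenv("OSS_ACCESS_KEY_SECRET");

        OSS ossClient = new OSSClientBuilder().build(endpoint, accessKeyId, accessKeySecret);
        try {
            // Upload the Shakespeare input data file to the example directory.
            ossClient.putObject(
                "flink-test-oss",
                "artifacts/namespaces/flink-test-default/Shakespeare",
                new File("/path/to/Shakespeare"));
        } finally {
            ossClient.shutdown();
        }
    }
}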

Step 3: Create a JAR deployment

Streaming deployment

  1. Log on to the Realtime Compute for Apache Flink console.

  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane, click Deployments. On the Deployments page, click Create Deployment.

  4. In the Create Deployment dialog box, configure the parameters of the deployment. The following list describes each parameter and the value that is used in this example.

    • Deployment Type: Select JAR or PYTHON. Example: JAR.

    • Deployment Mode: Select Stream Mode or Batch Mode. Example: Stream Mode.

    • Deployment Name: Enter the name of the JAR deployment that you want to create. Example: flink-streaming-test-jar.

    • Engine Version: The engine version of Flink that is used by the deployment. For more information about engine versions, version mappings, and important time points in the lifecycle of each version, see Engine version. Example: vvr-6.0.7-flink-1.15.

    • JAR URI: Click FlinkQuickStart-1.0-SNAPSHOT.jar to download the test JAR package. Then, click the upload icon on the right side of the JAR URI field to select and upload the test JAR package. Example: oss://flink-test-oss/artifacts/namespaces/flink-test-default/FlinkQuickStart-1.0-SNAPSHOT.jar.

    • Entry Point Class: The entry point class of the program. If no main class is specified in the JAR package, enter the fully qualified name of the entry point class in this field. Example: org.example.WordCountStreaming.

      Note: In this example, the test JAR package contains both streaming deployment code and batch deployment code. Therefore, you must configure this parameter to specify the program entry point for the streaming deployment.

    • Entry Point Main Arguments: The arguments that are passed to and read in the main method. In this example, specify the OSS directory in which the input data file is stored. Example: --input oss://flink-test-oss/artifacts/namespaces/flink-test-default/Shakespeare.

    • Additional Dependencies: The OSS bucket in which the required dependency files are stored, or the URLs of the dependency files. Example: not required.

    • Deployment Target: Select the desired queue or session cluster from the drop-down list. We recommend that you do not use session clusters in the production environment. For more information, see Manage queues and Step 1: Create a session cluster. Example: default-queue.

      Note: Metrics of deployments that are deployed in session clusters cannot be displayed. Session clusters do not support the monitoring and alerting feature or the Autopilot feature. Session clusters are suitable only for development and test environments. For more information, see Debug a deployment.

    • Description: Optional. The description of the deployment. Example: not required.

    • Label: After you specify labels for a deployment, you can search for the deployment by label key and label value on the Deployments page. You can specify a maximum of three labels for a deployment. Example: not required.

    • More Setting: If you turn on the switch, you must configure the following parameters. Example: not required.

      • Kerberos Name: Select a Hive cluster that supports Kerberos authentication from the drop-down list. For more information, see Register a Hive cluster that supports Kerberos authentication.

      • principal: The Kerberos principal, which can be a user or a service. A Kerberos principal uniquely identifies an identity in the Kerberos system.

    Note

    For more information about how to configure the parameters, see Create a deployment.

  5. Click Deploy.

Batch deployment

  1. Log on to the Realtime Compute for Apache Flink console.

  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane, click Deployments. On the Deployments page, click Create Deployment.

  4. In the Create Deployment dialog box, configure the parameters of the deployment. The following list describes each parameter and the value that is used in this example.

    • Deployment Type: Select JAR or PYTHON. Example: JAR.

    • Deployment Mode: Select Stream Mode or Batch Mode. Example: Batch Mode.

    • Deployment Name: Enter the name of the JAR deployment that you want to create. Example: flink-batch-test-jar.

    • Engine Version: The engine version of Flink that is used by the deployment. For more information about engine versions, version mappings, and important time points in the lifecycle of each version, see Engine version. Example: vvr-6.0.7-flink-1.15.

    • JAR URI: Click FlinkQuickStart-1.0-SNAPSHOT.jar to download the test JAR package. Then, click the upload icon on the right side of the JAR URI field to select and upload the test JAR package. Example: oss://flink-test-oss/artifacts/namespaces/flink-test-default/FlinkQuickStart-1.0-SNAPSHOT.jar.

    • Entry Point Class: The entry point class of the program. If no main class is specified in the JAR package, enter the fully qualified name of the entry point class in this field. Example: org.example.WordCountBatch.

      Note: In this example, the test JAR package contains both streaming deployment code and batch deployment code. Therefore, you must configure this parameter to specify the program entry point for the batch deployment. A sketch of what such a batch entry point might look like is provided after these steps.

    • Entry Point Main Arguments: The arguments that are passed to and read in the main method. In this example, specify the OSS directory in which the input data file is stored and the OSS path of the output data file. Example: --input oss://flink-test-oss/artifacts/namespaces/flink-test-default/Shakespeare --output oss://flink-test-oss/artifacts/namespaces/flink-test-default/batch-quickstart-test-output.txt.

      Note: In this example, the output data file is stored in the same directory as the test JAR package. You need to only specify the directory and name of the output data file. You do not need to create the output data file in advance.

    • Additional Dependencies: The OSS bucket in which the required dependency files are stored, or the URLs of the dependency files. Example: not required.

    • Deployment Target: Select the desired queue or session cluster from the drop-down list. We recommend that you do not use session clusters in the production environment. For more information, see Manage queues and Step 1: Create a session cluster. Example: default-queue.

      Note: Metrics of deployments that are deployed in session clusters cannot be displayed. Session clusters do not support the monitoring and alerting feature or the Autopilot feature. Session clusters are suitable only for development and test environments. For more information, see Debug a deployment.

    • Description: Optional. The description of the deployment. Example: not required.

    • Label: After you specify labels for a deployment, you can search for the deployment by label key and label value on the Deployments page. You can specify a maximum of three labels for a deployment. Example: not required.

    • More Setting: If you turn on the switch, you must configure the following parameters. Example: not required.

      • Kerberos Name: Select a Hive cluster that supports Kerberos authentication from the drop-down list. For more information, see Register a Hive cluster that supports Kerberos authentication.

      • principal: The Kerberos principal, which can be a user or a service. A Kerberos principal uniquely identifies an identity in the Kerberos system.

  5. Click Deploy.
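
For reference, the following sketch shows what a batch word-count entry point such as org.example.WordCountBatch might look like. It mirrors the streaming sketch in Step 1 but runs in batch execution mode and writes the result to the path that is passed in the --output argument. The actual code in the test JAR package may differ.

// A minimal sketch of a batch word-count entry point, for reference only.
package org.example;

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WordCountBatch {
    public static void main(String[] args) throws Exception {
        // --input and --output are passed as Entry Point Main Arguments.
        ParameterTool params = ParameterTool.fromArgs(args);
        String inputPath = params.getRequired("input");
        String outputPath = params.getRequired("output");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Run the bounded job in batch execution mode.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        env.readTextFile(inputPath)
            // Split each line into words and emit (word, 1) pairs.
            .flatMap((FlatMapFunction<String, Tuple2<String, Long>>) (line, out) -> {
                for (String word : line.toLowerCase().split("\\W+")) {
                    if (!word.isEmpty()) {
                        out.collect(Tuple2.of(word, 1L));
                    }
                }
            })
            .returns(Types.TUPLE(Types.STRING, Types.LONG))
            .keyBy(t -> t.f0)
            .sum(1)
            // Write the result to the --output path. Parallelism 1 produces a single output file.
            .writeAsText(outputPath, FileSystem.WriteMode.OVERWRITE)
            .setParallelism(1);

        env.execute("WordCountBatch");
    }
}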

Step 4: Start the deployment and view the computing result

  1. On the Deployments page in the console of fully managed Flink, find the desired deployment and click Start in the Actions column.

  2. In the Start Job dialog box, configure the parameters.

    For more information about the parameters that you must configure when you start your deployment, see Start a deployment.

  3. In the Start Job dialog box, click Start.

    After the deployment is started, the deployment status changes to RUNNING. This indicates that the deployment is running properly.

    Important

    If you want to start a batch deployment, select BATCH instead of STREAM from the deployment type drop-down list on the Deployments page. By default, the Deployments page displays streaming deployments.

  4. View the computing result.

    Note

    The Taskmanager.out log file contains a maximum of 2000 data records. Therefore, the number of data records in the computing result of the streaming deployment differs from that of the batch deployment. For more information about this limit, see Print connector.

    • Computing result of a streaming deployment

      On the Deployments page, click the name of the desired deployment. On the page that appears, click Exploration. On the Running Task Managers tab, click the value in the Path, ID column. On the page that appears, click the Log List tab. Find the log file whose name ends with .out in the Log Name column and click the name of the log file. Then, search for the shakespeare keyword in the log file to view the computing result.

    • Computing result of a batch deployment

      Log on to the OSS console and view the computing result of a batch deployment in the directory in which the output data file is stored.

      In this example, the output data file is stored at oss://flink-test-oss/artifacts/namespaces/flink-test-default/batch-quickstart-test-output.txt.

(Optional) Step 5: Cancel the deployment

If you modify your deployment, you must publish the deployment again, cancel it, and then restart it for the modification to take effect. If the deployment fails and cannot reuse state data to recover, you must also cancel and then restart the deployment. For more information about how to cancel a deployment, see Cancel a deployment.

References