This topic describes how to submit a Flink job in an E-MapReduce (EMR) Dataflow cluster and how to view the job status.

Background information

In a Dataflow cluster, Flink is deployed on YARN. You can use one of the following methods to submit a Flink job:
  • Submit a Flink job on the Data Platform tab in the EMR console.

    For more information, see Configure a VVR-based Flink job.

  • Use SSH to log on to your Dataflow cluster that is in Flink mode and run a command in the CLI to submit a Flink job.
    The following deployment modes are supported by Flink on YARN in a Dataflow cluster.

    Session mode
      Description: A Flink cluster is created based on the resource parameters that you configure, and all jobs are submitted to this cluster. The Flink cluster is not automatically released after all jobs finish.
      If an exception in one job stops a TaskManager, all other jobs that run on that TaskManager also fail. In addition, the cluster has only one JobManager, so the load on the JobManager rises as the number of jobs increases.
      • Advantage: Resources for submitted jobs are allocated faster than in the other modes.
      • Disadvantage: All jobs run on the same Flink cluster, so they compete for resources and can affect each other.
      Use case: Jobs that require a short startup time and have a short runtime.

    Per-job cluster mode
      Description: Each time you submit a Flink job, YARN starts a dedicated Flink cluster to run the job. The Flink cluster is released after the job finishes or is canceled.
      • Advantage: Resources are isolated between jobs, so an exception in one job does not affect other jobs. Each JobManager runs only one job, which prevents the JobManager from being overloaded.
      • Disadvantage: YARN starts a dedicated Flink cluster for every job, which results in high overhead.
      Use case: Jobs that have a long runtime.

    Application mode
      Description: Each time you submit a Flink application, YARN starts a dedicated Flink cluster to run the application. An application contains one or more jobs. The Flink cluster is released after the application finishes or is canceled.
      Unlike in per-job cluster mode, the main() method in the JAR package of the application is executed by the JobManager of the cluster rather than by the client. If the JAR package contains multiple jobs, all of them run on this cluster.
      • Advantage: The client does not need to run main() to build the jobs, which reduces the load on the client during submission.
      • Disadvantage: YARN starts a dedicated Flink cluster for every application, which takes time.
    Select a mode based on your business requirements. The following sections describe how to submit a job and view its status in each mode.
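The three modes differ mainly in which flink CLI subcommand and -t target you use. The following Python sketch collects the commands used in this topic in one place. It only builds argument lists and does not run anything. The names DeployMode, submit_command, and list_command are hypothetical helpers for illustration, not part of Flink or EMR.

```python
from __future__ import annotations

from enum import Enum


class DeployMode(Enum):
    # Each value is the -t target that this topic passes to `flink list`.
    SESSION = "yarn-session"
    PER_JOB = "yarn-per-job"
    APPLICATION = "yarn-application"


def submit_command(mode: DeployMode, jar: str) -> list[str]:
    """Build the submission command that this topic uses for the given mode."""
    if mode is DeployMode.SESSION:
        # Assumes a detached YARN session was already started with
        # `yarn-session.sh --detached`; `flink run` then submits to that session.
        return ["flink", "run", jar]
    if mode is DeployMode.PER_JOB:
        # YARN starts a dedicated Flink cluster for this single job.
        return ["flink", "run", "-t", mode.value, "--detached", jar]
    # Application mode: main() of the JAR runs on the JobManager.
    return ["flink", "run-application", "-t", mode.value, jar]


def list_command(mode: DeployMode, app_id: str) -> list[str]:
    """Build the `flink list` command that shows the jobs in a YARN application."""
    return ["flink", "list", "-t", mode.value, f"-Dyarn.application.id={app_id}"]
```

To run one of these commands from a script, you could pass the list to subprocess.run, for example subprocess.run(submit_command(DeployMode.PER_JOB, jar), check=True). Passing a list instead of a single string avoids shell quoting issues with paths.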

Prerequisites

A Dataflow cluster that is in Flink mode is created. For more information, see Create a cluster.

Submit a job in session mode and view the job status

  1. Log on to your cluster over SSH. For more information, see Log on to a cluster.
  2. Run the following command to start a YARN session:
    yarn-session.sh --detached
  3. Run the following command to submit a job:
    flink run /usr/lib/flink-current/examples/streaming/TopSpeedWindowing.jar
    Note The TopSpeedWindowing example in Flink is used in this topic. TopSpeedWindowing is a streaming job that runs for a long period of time.
    The command returns output that contains the YARN application ID of the Flink job.
  4. Run the following command to view the job status:
    flink list -t yarn-session -Dyarn.application.id=<application_XXXX_YY>
    Note In this example, <application_XXXX_YY> is the YARN application ID that was returned when you submitted the job.

    You can also view the job status on the web UI of Flink (VVR). For more information, see View the job status on the web UI of Flink (VVR).
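If you script these steps, you need to pick the YARN application ID out of the submission output before you can pass it to flink list. The following Python sketch assumes only that the ID appears somewhere in the text in the standard YARN form application_<cluster timestamp>_<sequence number>; it does not depend on the exact wording of Flink's log lines. The function name find_application_ids is hypothetical.

```python
from __future__ import annotations

import re

# YARN application IDs have the form application_<cluster timestamp>_<sequence number>.
APP_ID_PATTERN = re.compile(r"\bapplication_\d+_\d+\b")


def find_application_ids(output: str) -> list[str]:
    """Return the distinct YARN application IDs in `output`, in order of first appearance."""
    seen: list[str] = []
    for app_id in APP_ID_PATTERN.findall(output):
        # The same ID is often printed several times (for example, in a proxy URL).
        if app_id not in seen:
            seen.append(app_id)
    return seen
```

In the session-mode workflow above, the first ID returned by this helper is the value to substitute for <application_XXXX_YY>.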

Submit a job in per-job cluster mode and view the job status

  1. Log on to your cluster over SSH. For more information, see Log on to a cluster.
  2. Run the following command to submit a job:
    flink run -t yarn-per-job --detached /usr/lib/flink-current/examples/streaming/TopSpeedWindowing.jar
    The command returns output that contains the YARN application ID of the Flink job.
  3. Run the following command to view the job status:
    flink list -t yarn-per-job -Dyarn.application.id=<application_XXXX_YY>
    Note In this example, <application_XXXX_YY> is the YARN application ID that was returned when you submitted the job.

    You can also view the job status on the web UI of Flink (VVR). For more information, see View the job status on the web UI of Flink (VVR).
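The flink list command prints the jobs that run in the specified YARN application. To check a job's state from a script, you can parse that output. The following Python sketch assumes that each job line has the form <start time> : <32-hex-digit job ID> : <job name> (<state>), which is how the Flink CLI formats job entries in the versions we are aware of; verify this against the output of the Flink version in your cluster. The names JobStatus and parse_flink_list are hypothetical.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Assumed job line format: "<start time> : <32-hex-digit job ID> : <job name> (<state>)".
JOB_LINE = re.compile(
    r"^(?P<start>.+?) : (?P<job_id>[0-9a-f]{32}) : (?P<name>.+) \((?P<state>[A-Z_]+)\)$"
)


class JobStatus(NamedTuple):
    job_id: str
    name: str
    state: str


def parse_flink_list(output: str) -> list[JobStatus]:
    """Extract job ID, name, and state from `flink list` output; other lines are ignored."""
    jobs: list[JobStatus] = []
    for line in output.splitlines():
        match = JOB_LINE.match(line.strip())
        if match:
            # The greedy name group lets job names contain parentheses;
            # only the last "(...)" on the line is treated as the state.
            jobs.append(JobStatus(match["job_id"], match["name"], match["state"]))
    return jobs
```

Header lines, separators, and messages such as "No running jobs." do not match the pattern and are skipped, so an empty list means that no job lines were found.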

Submit a job in application mode and view the job status

  1. Log on to your cluster over SSH. For more information, see Log on to a cluster.
  2. Run the following command to submit a job:
    flink run-application -t yarn-application /usr/lib/flink-current/examples/streaming/TopSpeedWindowing.jar
    The command returns output that contains the YARN application ID of the Flink job.
  3. Run the following command to view the job status:
    flink list -t yarn-application -Dyarn.application.id=<application_XXXX_YY>
    Note In this example, <application_XXXX_YY> is the YARN application ID that was returned when you submitted the job.

    You can also view the job status on the web UI of Flink (VVR). For more information, see View the job status on the web UI of Flink (VVR).
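The application-mode submission can also be driven from a script. The following Python sketch runs the same flink run-application command shown above and returns the YARN application ID found in the output. Because the ID may be written to either stdout or stderr depending on the logging configuration, the sketch searches both. The names run_cli and submit_application are hypothetical, and the runner parameter exists so that the command can be replaced in tests.

```python
import re
import subprocess
from typing import Callable, List, Optional

# YARN application IDs have the form application_<cluster timestamp>_<sequence number>.
APP_ID = re.compile(r"\bapplication_\d+_\d+\b")

# A runner takes an argument list and returns the command's output as text.
Runner = Callable[[List[str]], str]


def run_cli(args: List[str]) -> str:
    """Run a command and return stdout and stderr together; raise CalledProcessError on failure."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout + result.stderr


def submit_application(jar: str, runner: Runner = run_cli) -> Optional[str]:
    """Submit `jar` in YARN application mode and return the application ID, or None if absent."""
    output = runner(["flink", "run-application", "-t", "yarn-application", jar])
    match = APP_ID.search(output)
    return match.group(0) if match else None
```

For example, app_id = submit_application("/usr/lib/flink-current/examples/streaming/TopSpeedWindowing.jar") submits the example job on a cluster node where the flink command is on the PATH, and app_id can then be used in the flink list command above.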

View the job status on the web UI of Flink (VVR)

  1. Access the web UI of Flink (VVR).
    1. Log on to the Alibaba Cloud EMR console.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. Click the Cluster Management tab.
    4. On the Cluster Management page, find your cluster and click Details in the Actions column.
    5. In the left-side navigation pane of the Cluster Overview page, click Connect Strings.
    6. On the Public Connect Strings page, click the URL in the Connect String column that corresponds to YARN UI.
  2. Click the ID of your application.
  3. Click the link in the Tracking URL field.
    The Apache Flink Dashboard page appears. You can view the status of jobs on this page.
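The application state and the Tracking URL shown in the YARN UI are also available from the YARN ResourceManager REST API at /ws/v1/cluster/apps/<application ID>. The following Python sketch queries that endpoint and keeps the fields used in the steps above. It assumes that the ResourceManager web address (passed as rm_address, for example a hypothetical http://<master-node>:8088) is reachable from where the script runs without extra authentication; your cluster's network or security settings may require otherwise. The function names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any, Dict


def app_report_url(rm_address: str, app_id: str) -> str:
    """Build the ResourceManager REST URL for one application."""
    return f"{rm_address.rstrip('/')}/ws/v1/cluster/apps/{app_id}"


def summarize_app(report: Dict[str, Any]) -> Dict[str, str]:
    """Keep the application state, final status, and tracking URL from the REST response."""
    app = report["app"]
    return {
        "state": app.get("state", ""),
        "final_status": app.get("finalStatus", ""),
        # For a running Flink application, this URL opens the Apache Flink Dashboard.
        "tracking_url": app.get("trackingUrl", ""),
    }


def fetch_app_summary(rm_address: str, app_id: str, timeout: float = 10.0) -> Dict[str, str]:
    """Query the ResourceManager over HTTP and summarize the application report."""
    with urllib.request.urlopen(app_report_url(rm_address, app_id), timeout=timeout) as resp:
        return summarize_app(json.load(resp))
```

Opening the returned tracking_url in a browser corresponds to clicking the Tracking URL link in step 3.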

References

For more information about Flink on YARN, see Apache Hadoop YARN.