This topic describes how to submit a Flink job and view the job status in the E-MapReduce (EMR) console.
Background information
You can use one of the following methods to submit a Flink job:
- Submit a Flink job on the Data Platform tab in the EMR console.
  For more information, see Configure a VVR-based Flink job.
- Use SSH to log on to your Dataflow cluster that is in Flink mode and run a command in the CLI to submit a Flink job.
The following table describes the deployment modes that are supported by Flink on YARN in a Dataflow cluster.
| Deployment mode | Description | Advantage and disadvantage |
| --- | --- | --- |
| Session mode | In this mode, a Flink cluster is created based on the resource parameters that you configure, and all jobs are submitted to that Flink cluster. The Flink cluster is not automatically released after all jobs are complete. If an exception occurs on a job and a TaskManager is stopped, all other jobs that are running on that TaskManager fail. In addition, only one JobManager is deployed in the cluster. As a result, the load on the JobManager rises as the number of jobs increases. | Advantage: The time required to allocate resources for submitted jobs is shorter than in the other modes. Disadvantage: All jobs run on the same Flink cluster. As a result, the jobs compete for resources and affect each other. This mode is suitable for jobs that require a short start time and have a short runtime. |
| Per-job cluster mode | In this mode, each time you submit a Flink job, YARN starts a Flink cluster to run the job. After the job is complete or is canceled, the Flink cluster is released. | Advantage: Resources that are occupied by jobs are isolated. If an exception occurs on a job, the other jobs are not affected. Each JobManager runs only one job, which prevents the JobManager from being overloaded by multiple jobs. Disadvantage: Each time you run a job, YARN starts a dedicated Flink cluster, which results in high startup overhead. This mode is suitable for jobs that have a long runtime. |
| Application mode | In this mode, each time you submit a Flink application, YARN starts a Flink cluster to run the application. An application contains one or more jobs. After the application is complete or is canceled, the Flink cluster that runs the application is released. Unlike in per-job cluster mode, the main() method in the JAR package of the application is executed on the JobManager of the cluster instead of on the client. If the JAR package contains multiple jobs, all of the jobs run on the cluster. | Advantage: This mode reduces the workload on the client when the client submits jobs. Disadvantage: Each time you run an application, YARN starts a dedicated Flink cluster, which is time-consuming. |
You can select a mode to submit jobs and view the status of the jobs based on your business requirements.
Prerequisites
A Dataflow cluster that is in Flink mode is created. For more information, see Create a cluster.
Submit a job in session mode and view the job status
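The following is a minimal sketch of a session-mode submission from the CLI, based on the open source Apache Flink commands `yarn-session.sh`, `flink run`, and `flink list`. The installation path, the example JAR, the placeholder application ID `application_XXXX_YY`, and the `DRY_RUN` switch are assumptions made for illustration and are not taken from this topic. By default the script only prints the commands so that you can review them before you run anything.

```shell
# Assumption: Flink installation path on the cluster node; adjust to your environment.
FLINK_HOME="${FLINK_HOME:-/opt/apps/FLINK/flink-current}"
# Assumption: example JAR that ships with open source Flink.
JOB_JAR="${JOB_JAR:-$FLINK_HOME/examples/streaming/TopSpeedWindowing.jar}"
# Hypothetical switch: 1 prints the commands, 0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Start a long-running Flink session cluster on YARN in detached mode.
start_session() {
  run "$FLINK_HOME/bin/yarn-session.sh" --detached
}

# Submit a job to the session cluster identified by the YARN application ID in $1.
submit_to_session() {
  run "$FLINK_HOME/bin/flink" run -t yarn-session -Dyarn.application.id="$1" "$JOB_JAR"
}

# List the jobs in the session cluster to check their status.
list_session_jobs() {
  run "$FLINK_HOME/bin/flink" list -t yarn-session -Dyarn.application.id="$1"
}

# Placeholder ID; the real value is printed when the session cluster starts.
APP_ID="${APP_ID:-application_XXXX_YY}"
start_session
submit_to_session "$APP_ID"
list_session_jobs "$APP_ID"
```

Because the session cluster is not released automatically, it keeps occupying YARN resources until you stop it, for example with `yarn application -kill` followed by the application ID.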
Submit a job in per-job cluster mode and view the job status
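As an illustration, a per-job submission with the open source Flink CLI might look like the following sketch. It uses the `yarn-per-job` target of `flink run`. The installation path, the example JAR, the placeholder application ID, and the `DRY_RUN` switch are assumptions and are not values from this topic. The commands are only printed unless `DRY_RUN=0` is set.

```shell
# Assumption: Flink installation path on the cluster node; adjust to your environment.
FLINK_HOME="${FLINK_HOME:-/opt/apps/FLINK/flink-current}"
# Assumption: example JAR that ships with open source Flink.
JOB_JAR="${JOB_JAR:-$FLINK_HOME/examples/streaming/TopSpeedWindowing.jar}"
# Hypothetical switch: 1 prints the commands, 0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Submit the job; YARN starts a dedicated Flink cluster for it.
# --detached returns control to the shell after the job is submitted.
submit_per_job() {
  run "$FLINK_HOME/bin/flink" run -t yarn-per-job --detached "$JOB_JAR"
}

# List the jobs of the per-job cluster identified by the YARN application ID in $1.
list_per_job() {
  run "$FLINK_HOME/bin/flink" list -t yarn-per-job -Dyarn.application.id="$1"
}

# Placeholder ID; the real value is printed when the job is submitted.
APP_ID="${APP_ID:-application_XXXX_YY}"
submit_per_job
list_per_job "$APP_ID"
```

Each call to `submit_per_job` results in a separate YARN application, which matches the isolation behavior described for this mode.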
Submit a job in application mode and view the job status
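The sketch below shows how an application-mode submission could look with the open source Flink CLI, which uses the `run-application` action and the `yarn-application` target. The installation path, the example JAR, the placeholder application ID and job ID, and the `DRY_RUN` switch are assumptions for illustration only. The commands are printed rather than run unless `DRY_RUN=0` is set.

```shell
# Assumption: Flink installation path on the cluster node; adjust to your environment.
FLINK_HOME="${FLINK_HOME:-/opt/apps/FLINK/flink-current}"
# Assumption: example JAR that ships with open source Flink.
JOB_JAR="${JOB_JAR:-$FLINK_HOME/examples/streaming/TopSpeedWindowing.jar}"
# Hypothetical switch: 1 prints the commands, 0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Submit the application; main() of the JAR runs on the JobManager, not on the client.
submit_application() {
  run "$FLINK_HOME/bin/flink" run-application -t yarn-application "$JOB_JAR"
}

# List the jobs of the application cluster identified by the YARN application ID in $1.
list_application_jobs() {
  run "$FLINK_HOME/bin/flink" list -t yarn-application -Dyarn.application.id="$1"
}

# Cancel the job with ID $2 in the application cluster with YARN application ID $1.
cancel_application_job() {
  run "$FLINK_HOME/bin/flink" cancel -t yarn-application -Dyarn.application.id="$1" "$2"
}

# Placeholder ID; the real value is printed when the application is submitted.
APP_ID="${APP_ID:-application_XXXX_YY}"
submit_application
list_application_jobs "$APP_ID"
```

To cancel a job, you would call `cancel_application_job` with the application ID and a job ID taken from the output of the list command.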
View the job status on the web UI of Flink (VVR)
References
For more information about Flink on YARN, see Apache Hadoop YARN.