Realtime Compute for Apache Flink: Deploy a job

Last Updated: Dec 26, 2025

After you develop a job, you must deploy it before you can run it. Deployment isolates the development environment from the production environment: deploying a job does not affect jobs that are already running, and the new deployment takes effect only after the job is started or restarted. This topic describes how to deploy SQL, YAML, JAR, and Python jobs.

Prerequisites

A job has been developed.

Upload resources

Before you deploy a job, you must upload the required JAR packages, Python job files, or Python dependencies to the Flink development console.

  1. Log on to the Realtime Compute for Apache Flink console.

  2. Find the target workspace and click Console in the Actions column.

  3. In the navigation pane on the left, click File Management.

  4. Click Upload Resource and select the JAR package, Python job file, or Python dependency to upload.

Note

If your job is a Python API job, upload the official JAR package of PyFlink. For more information about the download URLs of the official JAR packages, see PyFlink V1.11 and PyFlink V1.12.

Procedure

  1. Log on to the Realtime Compute for Apache Flink console.

  2. Find the target workspace and click Console in the Actions column. The steps to deploy a job vary based on the job type.

    Deploy an SQL job

    1. Develop an SQL job on the Data Development > ETL page. For more information, see Job development map.

    2. Click Deploy.

    3. Configure the parameters.

      Parameter

      Description

      Description

      Optional. Enter a description for the job.

      Job Tags

      After you configure tags for a job, you can quickly find the job by Tag Key and Tag Value on the Operation Center > Job O&M page. You can create a maximum of three tags for a job.

      Deployment Target

      From the drop-down list, select a target resource queue or session cluster (not for production use). For more information, see Manage resource queues and Step 1: Create a session cluster.

      Note

      Jobs deployed to a session cluster do not support monitoring and alerting or auto-tuning. Do not use session clusters in a production environment. Session clusters can be used as a staging environment. For more information, see Debug a job.

      Skip deep check before deployment

      If you select this option, the deep check is skipped before deployment.

    4. Click OK.

      On the Job O&M page, you can view the deployed SQL job and start it as needed.

    Deploy a YAML job

    Only Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 8.0.9 or later supports deploying YAML jobs.

    1. Develop a YAML job on the Data Development > Data Ingestion page. For more information, see Develop a Flink CDC data ingestion job (public preview).

    2. Click Deploy.

    3. Configure the parameters.

      Parameter

      Description

      Description

      Optional. Enter a description for the job.

      Job Tags

      After you configure tags for a job, you can quickly find the job by Tag Key and Tag Value on the Operation Center > Job O&M page. You can create a maximum of three tags for a job.

      Deployment Target

      From the drop-down list, select a target resource queue. For more information, see Manage resource queues.

      Skip deep check before deployment

      If you select this option, the deep check is skipped before deployment.

    4. Click OK.

      On the Job O&M page, you can view the deployed YAML job and start it as needed.

    Deploy a JAR job

    1. On the Operation Center > Job O&M page, choose Deploy Job > JAR Job.

    2. Enter the following configuration information.

      Parameter

      Description

      Deployment Mode

      Select Stream or Batch.

      Deployment Name

      Enter a name for the JAR job.

      Engine Version

      For more information about engine versions, see Engine versions and Lifecycle policy. We recommend that you use a recommended or stable version. The version tags are described as follows:

      • Recommended: The latest minor version of the latest major version.

      • Stable: The latest minor version of a major version that is still within its service period. Bugs in previous versions are fixed in this version.

      • Normal: Other minor versions that are still within their service period.

      • EOS: Versions that have reached their end of service (EOS).

      JAR URI

      Select a file or upload a new file. You can drag a file to this area or click the upload icon to select a file to upload.

      Note
      • Realtime Compute for Apache Flink VVR 8.0.6 and later supports access only to the bucket that is bound to the Flink workspace. Access to other buckets is not supported.

      • If your job is a Python API job, specify the official JAR package of PyFlink. For more information about the download URLs of the official JAR packages, see PyFlink V1.11 and PyFlink V1.12.

      Entry Point Class

      The entry point class of the program. If your JAR package does not specify a main class, enter the fully qualified name of the entry point class.

      Note

      If your job is a Python API job, set Entry Point Class to org.apache.flink.client.python.PythonDriver.

      Entry Point Main Arguments

      The parameters that you want to pass to the job. The parameters are passed to the main method, where you can read them.

      Note
      • The parameter information cannot exceed 1024 characters. Do not pass complex parameters, such as parameters that include line breaks, spaces, or other special characters. To pass complex parameters, use additional dependency files.

      • If your job is a Python API job, you must first upload your Python job file. After the Python job file is uploaded, it is stored in the /flink/usrlib/ directory on the node where the job runs by default.

        For example, if your Python job file is named word_count.py, set Entry Point Main Arguments to -py /flink/usrlib/word_count.py.

        You must specify the full path of the Python job file. The /flink/usrlib/ directory cannot be omitted or changed.
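
        For illustration, the following minimal sketch shows how such a word_count.py might read extra values that are appended after -py /flink/usrlib/word_count.py in Entry Point Main Arguments. The --output argument is hypothetical and is used only as an example.

        # word_count.py: minimal sketch of reading Entry Point Main Arguments.
        # Values appended after "-py /flink/usrlib/word_count.py" are forwarded to
        # the script and appear in sys.argv; --output is a hypothetical argument.
        import argparse
        import sys

        def main(argv):
            parser = argparse.ArgumentParser()
            parser.add_argument("--output", default="/tmp/word_count_output")
            args = parser.parse_args(argv)
            # ... build the PyFlink job here and write results to args.output ...

        if __name__ == "__main__":
            main(sys.argv[1:])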

      Additional Dependency Files

      • (Recommended) Select the uploaded additional dependency files.

        You must upload the dependency file in advance. To upload the file, go to Resource Management in the navigation pane on the left of the development console of Realtime Compute for Apache Flink, or click the Update JAR icon to the right of Additional Dependencies when you create a deployment. The uploaded dependency file is stored in the artifacts directory of the Object Storage Service (OSS) bucket that you associated with your Realtime Compute for Apache Flink workspace when you activated the workspace. The file path is in the format of oss://<Name of your associated OSS bucket>/artifacts/namespaces/<namespace name>.

      • Enter the OSS path of the target additional dependency files.

        The OSS path must point to the OSS bucket that you selected when you created the current Flink workspace.

      • Enter the URL of the target additional dependency files.

        The URL must point to an external storage system that Realtime Compute for Apache Flink can access, for example, a storage system that allows public-read access or requires no permissions. Only URLs that end with a file name are supported, such as http://xxxxxx/<file>.

      Note
      • Additional dependency files that you specify by using any of the three methods are downloaded to the destination machine. When the job runs, the files are loaded into the /flink/usrlib/ directory of the pods where the JobManager (JM) and TaskManagers (TMs) reside.

      • If you select a session cluster as the Deployment Target, you cannot configure additional dependency files for the job.
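
      For illustration, the following minimal sketch reads such a dependency file from job code that runs on the JM or TM pods. The file name lookup.csv is hypothetical.

      # Minimal sketch: reading a file that is configured as an additional dependency file.
      # Such files are placed in /flink/usrlib/ on the JobManager and TaskManager pods.
      # "lookup.csv" is a hypothetical example file name.
      with open("/flink/usrlib/lookup.csv") as f:
          lookup_rows = f.read().splitlines()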

      Deployment Target

      From the drop-down list, select a target resource queue or session cluster (not for production use). For more information, see Manage resource queues and Step 1: Create a session cluster.

      Note

      Jobs deployed to a session cluster do not support monitoring and alerting or auto-tuning. Do not use session clusters in a production environment. Session clusters can be used as a staging environment. For more information, see Debug a job.

      Description

      Optional. Enter a description for the job.

      Job Tags

      After you configure tags for a job, you can quickly find the job by Tag Key and Tag Value on the Job O&M page. You can create a maximum of three tags for a job.

      More Settings

      Turn on this switch to configure the following parameters:

      • Kerberos Cluster: From the drop-down list, select a Kerberos cluster that you created. For more information about how to create a Kerberos cluster, see Create a Kerberos cluster.

      • principal: A Kerberos principal can be a user or a service. It is used to uniquely identify an identity in the Kerberos authentication system.

    3. Click Deploy.

      On the Job O&M page, you can view the deployed JAR job and start it as needed.

    Deploy a Python job

    1. On the Operation Center > Job O&M page, choose Deploy Job > Python Job.

    2. Configure the parameters.

      Parameter

      Description

      Deployment Mode

      You can deploy in stream mode or batch mode.

      Deployment Name

      Enter a name for the Python job.

      Engine Version

      For more information about engine versions, see Engine versions and Lifecycle policy. We recommend that you use a recommended or stable version. The version tags are described as follows:

      • Recommended: The latest minor version of the latest major version.

      • Stable: The latest minor version of a major version that is still within its service period. Bugs in previous versions are fixed in this version.

      • Normal: Other minor versions that are still within their service period.

      • EOS: Versions that have reached their end of service (EOS).

      Python File Path

      Select the Python job file. The file can be a .py file or a .zip file.

      If you leave the Entry Module parameter empty, the Python job file must be a .py file.

      Entry Module

      The entry module of the program, for example, example.word_count.

      This parameter is required if the Python File Path is a .zip file.
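
      For illustration, the following minimal sketch shows an entry module for a .zip-packaged job. It assumes that Python File Path points to a hypothetical example.zip that contains an example package and that Entry Module is set to example.word_count. The Table API environment setup shown here may vary based on the engine version.

      # example/word_count.py inside the hypothetical example.zip.
      # This module runs when Entry Module is set to example.word_count.
      from pyflink.table import EnvironmentSettings, TableEnvironment

      def run():
          t_env = TableEnvironment.create(
              EnvironmentSettings.new_instance().in_streaming_mode().build()
          )
          # ... define sources, transformations, and sinks on t_env here ...

      if __name__ == "__main__":
          run()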

      Entry Point Main Arguments

      The parameters that are passed to the Python job.

      Python Libraries

      Third-party Python packages. The packages are added to the PYTHONPATH of the Python worker process and can be directly accessed in Python user-defined functions (UDFs). For more information about how to use third-party Python packages, see Use third-party Python packages.
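
      For illustration, the following minimal sketch shows a Python UDF that imports such a third-party package. The jieba package is only an assumed example.

      # Minimal sketch of a Python UDF that uses a third-party package shipped
      # through the Python Libraries parameter; "jieba" is an assumed example.
      from pyflink.table import DataTypes
      from pyflink.table.udf import udf

      @udf(result_type=DataTypes.STRING())
      def cut_words(line: str) -> str:
          # The package is resolved from the PYTHONPATH of the Python worker process.
          import jieba
          return " ".join(jieba.cut(line))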

      Python Archives

      Archive files. Only files in ZIP format are supported, such as .zip, .jar, .whl, and .egg files.

      Archive files are decompressed to the working directory of the Python worker process. For example, if the archive file is named mydata.zip and contains the file mydata/data.txt, you can write the following code in a Python UDF to access the file.

      def map():
          # mydata.zip is decompressed to the working directory of the Python worker process.
          with open("mydata.zip/mydata/data.txt") as f:
              data = f.read()
          ...

      For more information about Python Archives, see Use a custom Python virtual environment and Use data files.

      Additional Dependency Files

      Select your Python job files and dependent data files.

      • (Recommended) Select the uploaded additional dependency files.

        You must upload the dependency file in advance. To upload the file, go to Resource Management in the navigation pane on the left of the development console of Realtime Compute for Apache Flink, or click the Update JAR icon to the right of Additional Dependencies when you create a deployment. The uploaded dependency file is stored in the artifacts directory of the Object Storage Service (OSS) bucket that you associated with your Realtime Compute for Apache Flink workspace when you activated the workspace. The file path is in the format of oss://<Name of your associated OSS bucket>/artifacts/namespaces/<namespace name>.

      • Enter the OSS path of the target additional dependency files.

        The OSS path must point to the OSS bucket that you selected when you created the current Flink workspace.

      • Enter the URL of the target additional dependency files.

        The URL must point to an external storage system that Realtime Compute for Apache Flink can access, for example, a storage system that allows public-read access or requires no permissions. Only URLs that end with a file name are supported, such as http://xxxxxx/<file>.

      Note
      • Additional dependency files that you specify by using any of the three methods are downloaded to the destination machine. When the job runs, the files are loaded into the /flink/usrlib/ directory of the pods where the JobManager (JM) and TaskManagers (TMs) reside.

      • If you select a session cluster as the Deployment Target, you cannot configure additional dependency files for the job.

      Important

      If your job uses JAR package dependencies, you must configure the pipeline.classpaths parameter after you deploy the job so that the dependencies can be referenced. For more information, see Use JAR package dependencies.

      Deployment Target

      From the drop-down list, select a target resource queue or session cluster (not for production use). For more information, see Manage resource queues and Step 1: Create a session cluster.

      Note

      Jobs deployed to a session cluster do not support monitoring and alerting or auto-tuning. Do not use session clusters in a production environment. Session clusters can be used as a staging environment. For more information, see Debug a job.

      Description

      Optional. Enter a description for the job.

      Job Tags

      After you configure tags for a job, you can quickly find the job by Tag Key and Tag Value on the Job O&M page. You can create a maximum of three tags for a job.

      More Settings

      Turn on this switch to configure the following parameters:

      • Kerberos Cluster: From the drop-down list, select a Kerberos cluster that you created. For more information about how to create a Kerberos cluster, see Create a Kerberos cluster.

      • principal: A Kerberos principal can be a user or a service. It is used to uniquely identify an identity in the Kerberos authentication system.

    3. Click Deploy.

      On the Job O&M page, you can view the deployed Python job and start it as needed.

References

  • You can configure the resources and deployment information for a job before or after it goes online. For more information, see Configure job deployment information and Configure job resources.

  • After you deploy a job, you must start it on the Operation Center > Job O&M page for it to run online. For more information about how to start a job, see Start a job.