
Realtime Compute for Apache Flink:Develop a Python API draft

Last Updated:Oct 12, 2023

This topic describes the background information, limits, development methods, and debugging methods of Python API draft development in fully managed Flink. This topic also describes how to use a connector.

Background information

You must develop Python API drafts in your on-premises environment. After you develop a draft, you can deploy it and start the deployment in the console of fully managed Flink. For more information, see Getting started with a Flink Python deployment.

The following table lists the software packages that are installed in fully managed Flink workspaces.

Software package     Version
apache-beam          2.23.0
avro-python3         1.9.1
certifi              2020.12.5
cloudpickle          1.2.2
crcmod               1.7
cython               0.29.16
dill                 0.3.1.1
docopt               0.6.2
fastavro             0.23.6
future               0.18.2
grpcio               1.29.0
hdfs                 2.6.0
httplib2             0.17.4
idna                 2.10
jsonpickle           1.2
mock                 2.0.0
numpy                1.19.5
oauth2client         3.0.0
pandas               0.25.3
pbr                  5.5.1
pip                  20.1.1
protobuf             3.15.3
py4j                 0.10.8.1
pyarrow              0.17.1
pyasn1-modules       0.2.8
pyasn1               0.4.8
pydot                1.4.2
pymongo              3.11.3
pyparsing            2.4.7
python-dateutil      2.8.0
pytz                 2021.1
requests             2.25.1
rsa                  4.7.2
setuptools           47.1.0
six                  1.15.0
typing-extensions    3.7.4.3
urllib3              1.26.3
wheel                0.36.2

Limits

Services provided by fully managed Flink are subject to their deployment and network environments. Therefore, take note of the following points when you develop Python API drafts in fully managed Flink:

  • Only Apache Flink 1.12 and later are supported.

  • Python 3.7.9 is pre-installed in your fully managed Flink workspace, and common Python libraries such as pandas, NumPy, and PyArrow are pre-installed in the Python environment. Therefore, you must develop code in Python 3.7.

  • Java Development Kit (JDK) 1.8 is used in the running environment of fully managed Flink. If your Python API draft depends on a third-party JAR package, make sure that the JAR package is compatible with JDK 1.8.

  • Only open source Scala 2.11 is supported. If your Python API draft depends on a third-party JAR package, make sure that you use a JAR package that is compatible with Scala 2.11.

Develop a draft

You develop business code for fully managed Flink in your on-premises environment. For more information, see the following references:

  • For more information about how to develop business code of Apache Flink 1.15, see Python API.

  • Issues may occur when you develop code in Apache Flink. For more information about the issues and fixes, see FAQ.

Debug a deployment

In the code of Python user-defined functions (UDFs), you can use the logging module to generate logs and locate errors based on the logs. The following code shows an example.

import logging

from pyflink.table import DataTypes
from pyflink.table.udf import udf

@udf(result_type=DataTypes.BIGINT())
def add(i, j):
    logging.info("hello world")
    return i + j

After logs are generated, you can view the logs in the log file of TaskManager.
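When you test UDF logic locally before deploying it, the same logging calls behave like ordinary Python logging. The following stand-in (a plain function rather than a decorated UDF, so it runs without a Flink cluster) shows how to make INFO-level messages visible in a local run:

```python
import logging

# By default the root logger may filter out INFO messages in a local
# script; basicConfig makes them visible on the console. Inside a
# deployed UDF, the output lands in the TaskManager log file instead.
logging.basicConfig(level=logging.INFO)

def add(i, j):
    logging.info("add called with i=%s, j=%s", i, j)
    return i + j

print(add(1, 2))
```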

Use a connector

For more information about the connectors supported by fully managed Flink, see Supported connectors. To use a connector, perform the following steps:

  1. Log on to the Realtime Compute for Apache Flink console.

  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane, click Artifacts.

  4. On the Artifacts page, click Upload Artifact and select the Python package of the connector that you want to upload.

    You can upload a Python connector package that you developed or a connector package provided by fully managed Flink. For the download links of the official Python packages provided by fully managed Flink, see Connectors.

  5. On the Deployments page, click Create Deployment. In the Create Deployment dialog box, select the Python package of the desired connector from the drop-down list of Additional Dependencies.

  6. On the Deployments page, click the name of the desired deployment. In the upper-right corner of the Parameters section on the Configuration tab, click Edit. Then, add related configurations to the Other Configuration field.

    If your deployment depends on the Python packages of multiple connectors and the packages are named connector-1.jar and connector-2.jar, configure the following information:

    pipeline.classpaths: 'file:///flink/usrlib/connector-1.jar;file:///flink/usrlib/connector-2.jar'
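If your deployment depends on many connector packages, assembling the semicolon-separated value by hand is error-prone. The following hypothetical helper (not part of any Flink API) builds the pipeline.classpaths value from a list of JAR file names under the /flink/usrlib directory used above:

```python
# Hypothetical helper: assemble the pipeline.classpaths value from a
# list of connector JAR file names. Each entry becomes a file:// URI,
# and entries are joined with semicolons, matching the format expected
# in the Other Configuration field.
jars = ["connector-1.jar", "connector-2.jar"]
classpaths = ";".join(f"file:///flink/usrlib/{name}" for name in jars)
print(classpaths)
```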