A Flink compute source provides Flink-based compute resources for projects in Dataphin. You must bind a Flink compute source to a project before you can develop Flink compute jobs. This topic describes how to create a Flink compute source.
Prerequisites
You must enable Apache Flink as the real-time compute engine for your tenant. For more information, see Set Real-time Compute Engine.
You must have a custom user role with the New Compute Source permission, or be a Super Admin or Project Admin. For more information, see Data Warehouse Planning Permission List.
Procedure
In the top menu bar on the Dataphin homepage, select Planning > Compute Source.
On the Compute Source page, click Add Compute Source and select Flink Compute Source.
On the Create Compute Source page, configure the parameters.
Basic Information
Parameter
Description
Compute Type
Select Flink.
Compute Source Name
Enter a name for the compute source. The name must meet the following requirements:
Can contain Chinese characters, letters, digits, underscores (_), and hyphens (-).
Cannot exceed 64 characters in length.
Compute Source Description
Enter a description for the compute source. The description cannot exceed 128 characters.
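The naming rules above can be checked before you fill in the form. The following Python sketch is illustrative only: the function name `check_compute_source` is hypothetical, it assumes the listed characters are the only ones allowed, and it approximates "Chinese characters" with the CJK Unified Ideographs range, which may not match Dataphin's own validation.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF); Dataphin's actual check may differ.
NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")
MAX_DESCRIPTION_LENGTH = 128


def check_compute_source(name: str, description: str = "") -> list[str]:
    """Return a list of rule violations; an empty list means both fields pass."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name: 1 to 64 Chinese characters, letters, digits, '_' or '-'")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description: at most 128 characters")
    return problems


if __name__ == "__main__":
    print(check_compute_source("flink_prod-01", "Flink compute source for streaming jobs"))
```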
Select a deployment mode and configure its parameters
Dataphin supports two deployment modes: Yarn and Kubernetes. The required parameters depend on the selected deployment mode.
Yarn
Cluster Basic Information
Parameter
Description
Configuration File
Upload the cluster's yarn-site.xml, core-site.xml, and hdfs-site.xml configuration files.
Cluster Kerberos
Kerberos is an authentication protocol that uses symmetric-key cryptography to authenticate client/server applications. It supports Single Sign-On (SSO), allowing an authenticated client to access multiple services such as HBase and HDFS.
If your cluster uses Kerberos authentication, enable this option and upload a Krb5 authentication file or configure the KDC server address.
Krb5 Authentication File: You need to upload a Krb5 file for Kerberos authentication.
KDC Server Address: The address of the KDC server for Kerberos authentication. You can specify multiple addresses separated by commas (,).
Cluster Type (Optional)
Select the type of your cluster to use for testing the connection. Supported types include Aliyun E-MapReduce 5.x, CDH 5.x Hadoop, CDH 6.x Hadoop, Cloudera Data Platform 7.x, AsiaInfo DP 5.3 Hadoop, and Transwarp TDH 6.x Hadoop.
Important: Although the connection test can succeed without a cluster type, we recommend selecting one to prevent potential connection failures.
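Before uploading, it can help to confirm that all three configuration files are on hand and that the KDC address list is well formed. The following Python sketch is a local pre-check only. The function names are hypothetical, and trimming whitespace around each address is an assumption rather than documented Dataphin behavior.

```python
from pathlib import Path

# The three files named in the Configuration File parameter.
REQUIRED_FILES = ("yarn-site.xml", "core-site.xml", "hdfs-site.xml")


def missing_config_files(config_dir: str) -> list[str]:
    """List the required Hadoop client files that are absent from config_dir."""
    base = Path(config_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


def split_kdc_addresses(raw: str) -> list[str]:
    """Split a comma-separated KDC Server Address value, dropping blank entries.

    Assumption: surrounding whitespace is not significant.
    """
    return [part.strip() for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    # "/etc/hadoop/conf" is only an example location for the client files.
    print(missing_config_files("/etc/hadoop/conf"))
    print(split_kdc_addresses("kdc1.example.com:88, kdc2.example.com:88"))
```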
Flink Compute Resources
Parameter
Description
Compute Resource Type
Select Resource Queue or Session Cluster.
Resource Queue
If you select Resource Queue as the compute resource type, enter the name of the YARN queue to which Flink jobs will be submitted. The name must follow these rules:
Length: The queue name cannot exceed 256 characters.
Character limit: The queue name can contain only English letters, digits, spaces, and the following special characters: - _ . @ ' ( )
Case-sensitivity: The queue name is case-sensitive.
Uniqueness: The queue name must be unique within the compute source.
To configure multiple resource queues, click + Add.
Note: You can add a maximum of 10 resource queues.
To remove a resource queue, click the delete icon. You must keep at least one resource queue. If you delete a queue, you can no longer submit existing jobs that rely on it.
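The resource queue rules above (length, allowed characters, case-sensitivity, uniqueness, and the limit of 1 to 10 queues) can be sketched as a small Python check. The function name `check_resource_queues` is hypothetical, and the check only mirrors the rules as stated here. It does not confirm that a queue actually exists in YARN.

```python
import re

# Letters, digits, spaces, and the special characters - _ . @ ' ( )
QUEUE_NAME_PATTERN = re.compile(r"[A-Za-z0-9 \-_.@'()]{1,256}")
MAX_QUEUES = 10


def check_resource_queues(names: list[str]) -> list[str]:
    """Return rule violations for the queue names of one compute source."""
    problems = []
    if not 1 <= len(names) <= MAX_QUEUES:
        problems.append("between 1 and 10 resource queues are required")
    for name in names:
        if not QUEUE_NAME_PATTERN.fullmatch(name):
            problems.append(f"invalid queue name: {name!r}")
    # Names are case-sensitive, so "Default" and "default" count as distinct.
    if len(set(names)) != len(names):
        problems.append("queue names must be unique within the compute source")
    return problems


if __name__ == "__main__":
    print(check_resource_queues(["root.default", "root.flink"]))
```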
Session Cluster
If you select Session Cluster as the compute resource type, select one or more Session Clusters. The drop-down list contains all clusters created in Session Cluster, regardless of their status.
Flink Kerberos Authentication
Note: You can configure Flink Kerberos authentication only if you select Resource Queue as the compute resource type.
Flink Kerberos: If the Flink cluster uses Kerberos authentication, enable Flink Kerberos, upload a Keytab File, and configure a Principal.
Keytab File: Upload the keytab file. You can obtain the keytab file from the Flink server.
Principal: Enter the Kerberos username that corresponds to the Flink keytab file.
Username: When Flink Kerberos is disabled, enter the cluster username used to submit Flink jobs.
CheckPoint Storage
File system: Supports HDFS, OSS-HDFS, and Aliyun OSS (supported only for Flink 1.14 and 1.15). Different file systems require different parameters.
Note: The OSS-HDFS file system is supported only with the Aliyun E-MapReduce 5.x Hadoop compute engine.
If you select HDFS as the file system, configure the following parameter:
Directory Path: Enter the directory path for checkpoint storage, and ensure that Flink has permission to access this path. For example, hdfs://cdh-cluster-00001:8020/openflink/savepoint/. If your HDFS is a high-availability (HA) cluster, you can specify a high-availability path, such as hdfs://nameservice/path.
If you select OSS-HDFS as the file system, configure the following parameters:
Directory Path: Enter the directory path for checkpoint storage, and ensure that Flink has permission to access this path. For example, hdfs://cdh-cluster-00001:8020/openflink/savepoint/. If your HDFS is a high-availability (HA) cluster, you can specify the path in the format of hdfs://nameservice/path.
AccessKey ID and AccessKey Secret: Enter the AccessKey ID and AccessKey Secret that are used to access the cluster's OSS. You can use an existing AccessKey. To create a new one, see Create an AccessKey.
Note: To prevent AccessKey Secret leaks, the AccessKey Secret is displayed only upon creation and cannot be retrieved.
If you select Aliyun OSS as the file system, configure the following parameters:
Endpoint: Enter the connection address for the OSS service.
Directory path: The format is oss://{Bucket}/{Object}.
AccessKey ID and AccessKey Secret: Enter the AccessKey ID and AccessKey Secret to access the cluster's OSS. You can use an existing AccessKey pair or create a new one. For more information, see Create an AccessKey.
Note: To prevent AccessKey Secret leaks, the AccessKey Secret is displayed only upon creation and cannot be retrieved.
Important: The AccessKey credentials you configure here override any credentials set in the core-site.xml file.
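The directory path formats described for HDFS and Aliyun OSS can be checked with a short Python sketch. This is a format check under stated assumptions, not Dataphin's validation. The function name `check_checkpoint_path` is hypothetical, OSS-HDFS is deliberately left out, and the check does not verify that Flink has permission to access the path.

```python
from urllib.parse import urlparse


def check_checkpoint_path(file_system: str, path: str) -> bool:
    """Return True if path matches the documented format for file_system."""
    parsed = urlparse(path)
    if file_system == "HDFS":
        # hdfs://host:port/dir, or hdfs://nameservice/dir for an HA cluster
        return (
            parsed.scheme == "hdfs"
            and bool(parsed.netloc)
            and parsed.path.startswith("/")
        )
    if file_system == "Aliyun OSS":
        # oss://{Bucket}/{Object}: a bucket plus a non-empty object path
        return parsed.scheme == "oss" and bool(parsed.netloc) and len(parsed.path) > 1
    raise ValueError(f"unsupported file system: {file_system}")


if __name__ == "__main__":
    print(check_checkpoint_path("HDFS", "hdfs://nameservice/path"))
    # "my-bucket" is a placeholder bucket name.
    print(check_checkpoint_path("Aliyun OSS", "oss://my-bucket/flink/checkpoints"))
```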
Kubernetes
Cluster Basic Information
No cluster basic information is required for the Kubernetes deployment mode.
Flink Compute Engine Configuration
For the Kubernetes deployment mode, you can select one of the following file systems: NFS, S3, or Azure Blob Storage. The required parameters vary based on the selected file system.
NFS
Parameter
Description
Server
Enter the domain name of the NFS server.
Version
Select the NFS version. Supported versions are NFSv3 and NFSv4.
Contents
Enter the storage directory path for CheckPoint on NFS. For example, /data/checkpoint.
Maximum capacity
Enter the maximum storage capacity for NFS in GiB. Exceeding this limit will disrupt CheckPoint storage.
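Because exceeding the maximum capacity disrupts checkpoint storage, you may want to watch how full the directory is. The following Python sketch measures a mounted checkpoint directory and compares it with the configured limit. The function names and the 80% warning threshold are hypothetical choices for illustration and are not part of Dataphin.

```python
import os

GIB = 1024 ** 3  # bytes per GiB


def directory_size_gib(path: str) -> float:
    """Sum the sizes of all regular files under path, in GiB."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):  # skip broken symlinks
                total += os.path.getsize(file_path)
    return total / GIB


def is_near_capacity(used_gib: float, max_gib: float, threshold: float = 0.8) -> bool:
    """Return True once usage reaches the given fraction of the limit."""
    return used_gib >= max_gib * threshold


if __name__ == "__main__":
    # "/data/checkpoint" is the example directory from the table above;
    # the 100 GiB limit is a placeholder value.
    used = directory_size_gib("/data/checkpoint")
    print(f"{used:.2f} GiB used, near capacity: {is_near_capacity(used, 100)}")
```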
S3
Parameter
Description
Endpoint (Optional)
Enter the endpoint address, for example, http://s3.us-east-2.amazonaws.com.
Note: This field is optional for Amazon S3 but required for all other S3-compatible services.
Directory Path
Enter the storage path. The default path is s3://{YOUR-BUCKET}/{path}. We recommend that you use a dedicated directory for checkpoint storage and clean it up regularly.
Access Key, Secret Key
Enter the AccessKey ID and AccessKey Secret for accessing your S3-compatible storage. Click the icon to view the plain text.
Azure Blob Storage
Parameter
Description
Protocol
Currently, only ABFS is supported.
Authentication Method
Currently, only Shared Key is supported.
Directory Path
Enter the storage path. The default is abfs://{YOUR-CONTAINER}@${YOUR-AZURE-ACCOUNT}.dfs.core.windows.net/{object-path}.
Access Key
Enter the access key for your Azure Blob Storage account. Click the icon to view the plain text.
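The S3 and ABFS directory path formats in the tables above can be broken into their parts with Python's standard `urllib.parse` module. The following sketch is illustrative. The function names are hypothetical, and the bucket, container, and account values in the usage lines are placeholders.

```python
from urllib.parse import urlparse

ABFS_HOST_SUFFIX = ".dfs.core.windows.net"


def parse_s3_path(path: str) -> tuple[str, str]:
    """Split s3://{YOUR-BUCKET}/{path} into (bucket, key prefix)."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// path: {path}")
    return parsed.netloc, parsed.path.lstrip("/")


def parse_abfs_path(path: str) -> tuple[str, str, str]:
    """Split an abfs:// path into (container, storage account, object path)."""
    parsed = urlparse(path)
    host = parsed.hostname or ""
    # urlparse exposes the part before "@" as the username, which is the container.
    if (
        parsed.scheme != "abfs"
        or not parsed.username
        or not host.endswith(ABFS_HOST_SUFFIX)
    ):
        raise ValueError(f"not an abfs:// path: {path}")
    account = host[: -len(ABFS_HOST_SUFFIX)]
    return parsed.username, account, parsed.path.lstrip("/")


if __name__ == "__main__":
    print(parse_s3_path("s3://my-bucket/flink/checkpoints"))
    print(parse_abfs_path("abfs://checkpoints@myaccount.dfs.core.windows.net/flink/cp"))
```

Keeping the bucket or container separate from the object path makes it easier to confirm that checkpoints go to a dedicated directory, as recommended for S3.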
Click Test Connection to test the connectivity between Dataphin and the cluster.
After the test succeeds, click Submit.
The Kubernetes deployment mode does not support connection testing. In this mode, click Submit directly.
Next steps
After you create the Flink compute source, you can bind it to a project. For more information, see Create a general project.