A Flink compute source hosts Flink-based compute resources for a Dataphin project. Only projects bound to a Flink compute source support developing compute jobs using the Flink engine. This topic explains how to create a Flink compute source.
Prerequisites
Your tenant has enabled Apache Flink as the real-time computing engine. For more information, see Configure the real-time computing engine.
Only accounts with the create compute source permission or those assigned the super administrator or project administrator role can create a compute source. For more information, see Data warehouse planning permissions.
Procedure
On the Dataphin homepage, in the top menu bar, click Planning, and then click Compute Source.
On the Compute Sources page, click Add Compute Source, and then select Flink Compute Source.
On the Create Compute Source page, configure the parameters.
Configure basic compute source information
Parameter
Description
Compute type
Select Flink.
Compute source name
Enter a name for the compute source. The name must meet the following requirements:
It can contain Chinese characters, digits, letters, underscores (_), or hyphens (-).
It cannot exceed 64 characters.
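The naming rules above can be expressed as a small check. The following is an illustrative sketch, not Dataphin's exact validation logic; the CJK character range used to approximate "Chinese characters" is an assumption:

```python
import re

# Illustrative approximation of the documented rules: Chinese characters,
# digits, letters, underscores (_), and hyphens (-); at most 64 characters.
# The \u4e00-\u9fff range is an assumed approximation of "Chinese characters".
_NAME_RE = re.compile(r"^[\u4e00-\u9fffA-Za-z0-9_-]+$")

def is_valid_compute_source_name(name: str) -> bool:
    """Check a candidate compute source name against the documented rules."""
    return 0 < len(name) <= 64 and _NAME_RE.fullmatch(name) is not None
```

For example, `flink_prod-01` passes, while a name containing spaces or exceeding 64 characters does not.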
Compute source description
Enter a description for the compute source. The description cannot exceed 128 characters.
Configure cluster basic information and Flink compute engine settings
Dataphin supports multiple cluster deployment modes, such as YARN and Kubernetes. Each mode requires different parameters.
YARN deployment mode
Cluster basic information
Parameter
Description
Configuration File
Upload the cluster configuration files. You must upload yarn-site.xml, core-site.xml, and hdfs-site.xml.
Cluster Kerberos
Kerberos is an identity authentication protocol based on symmetric keys. It provides identity authentication for other services and supports single sign-on (SSO). After authenticating once, a client can access multiple services such as HBase and HDFS.
If your cluster uses Kerberos authentication, enable Cluster Kerberos and upload a krb5 file or configure the KDC server address.
Krb5 file: Upload a krb5 file for Kerberos authentication.
KDC server address: Enter the KDC server address to help complete Kerberos authentication. You can enter multiple addresses separated by commas (,).
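For reference, a minimal krb5 file typically has the shape below. The realm, hostnames, and port are placeholders for your environment, not values specific to Dataphin:

```ini
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc01.example.com:88
        admin_server = kdc01.example.com
    }
```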
Cluster type
Optional. Select the cluster type to test connectivity. Options include Aliyun E-MapReduce 5.x, CDH 5.x Hadoop, CDH 6.x Hadoop, Cloudera Data Platform 7.x, AsiaInfo DP 5.3 Hadoop, and StarRocks TDH 6.x Hadoop.
Important: In most cases, you can test connectivity without selecting a cluster type. In rare cases, omitting this selection may cause the test to fail. We recommend selecting a cluster type.
Flink compute engine settings
Parameter
Description
Flink job resource queue
The YARN queue name where Flink jobs are submitted. Follow these naming rules and limits:
Length limit: The queue name cannot exceed 255 characters.
Character limit: The queue name can contain only letters, digits, periods (.), and underscores (_). No other special characters are allowed.
Case sensitivity: Queue names are case sensitive. Uppercase and lowercase letters are treated as different characters.
Uniqueness: The queue name must be unique within the compute source. It cannot duplicate any other queue name.
To add multiple job queues, click + Add.
Note: You can add up to 10 resource queues.
To delete a resource queue, click the delete icon. After a queue is deleted, existing jobs that use it can no longer be submitted.
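The queue-name rules above can be summarized in code. This is a minimal sketch of the documented limits (255 characters; letters, digits, periods, and underscores; case-sensitive uniqueness; at most 10 queues), not Dataphin's actual validator:

```python
import re

# Documented character set: letters, digits, periods (.), underscores (_).
_QUEUE_RE = re.compile(r"^[A-Za-z0-9._]+$")

def validate_queues(queues: list[str]) -> list[str]:
    """Return a list of rule violations for a proposed set of YARN queue names."""
    errors = []
    seen = set()
    for q in queues:
        if not q or len(q) > 255:
            errors.append(f"{q!r}: must be 1-255 characters")
        elif not _QUEUE_RE.fullmatch(q):
            errors.append(f"{q!r}: only letters, digits, '.', and '_' allowed")
        if q in seen:  # case-sensitive: 'Prod' and 'prod' are distinct names
            errors.append(f"{q!r}: duplicate queue name")
        seen.add(q)
    if len(queues) > 10:
        errors.append("at most 10 resource queues are allowed")
    return errors
```

For example, `root.flink` and `root.Flink` are accepted as two distinct queues, while a hyphenated name such as `flink-prod` is rejected under the stated character limit.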
Checkpoint storage
File system: Supported file systems are HDFS, OSS-HDFS, and Aliyun OSS (supported only for Flink 1.14 and 1.15). Different file systems require different parameters.
Note: OSS-HDFS is supported only for Aliyun E-MapReduce 5.x Hadoop compute engines.
When the file system is HDFS, configure the following parameters:
Directory path: Enter the directory path where Checkpoint data is stored. Ensure Flink has permission to access this path. Example: hdfs://cdh-cluster-00001:8020/openflink/savepoint/. If your HDFS cluster is highly available (HA), use an HA path such as hdfs://nameservice/path.
Flink Kerberos: If your Flink cluster uses Kerberos authentication, enable Flink Kerberos, upload a keytab file, and configure a principal.
Keytab File: Upload a keytab file. You can obtain this file from your Flink server.
Principal: Enter the Kerberos username that corresponds to the Flink keytab file.
Username: When Flink Kerberos is disabled, enter the cluster username used to submit Flink jobs.
When the file system is OSS-HDFS, configure the following parameters:
Directory path: Enter the directory path where Checkpoint data is stored. Ensure Flink has permission to access this path. Example: hdfs://cdh-cluster-00001:8020/openflink/savepoint/. If your HDFS cluster is highly available (HA), use an HA path such as hdfs://nameservice/path.
AccessKey ID and AccessKey Secret: Enter the AccessKey ID and AccessKey Secret used to access OSS in your cluster. Use an existing AccessKey or create one by following Create an AccessKey.
Note: To reduce the risk of exposing your AccessKey, the AccessKey Secret appears only once, at creation, and cannot be viewed later. Store it securely.
Flink Kerberos: If your Flink cluster uses Kerberos authentication, enable Flink Kerberos and upload a keytab file and configure a principal.
Keytab File: Upload a keytab file. You can obtain this file from your Flink server.
Principal: Enter the Kerberos username that corresponds to the Flink keytab file.
Username: When Flink Kerberos is disabled, enter the cluster username used to submit Flink jobs.
When the file system is Aliyun OSS, configure the following parameters:
Endpoint: Enter the connection endpoint for the OSS service.
Directory path: Enter the path in the format
oss://{Bucket}/{Object}.AccessKey ID and AccessKey Secret: Enter the AccessKey ID and AccessKey Secret used to access OSS in your cluster. Use an existing AccessKey or create a new one by following Create an AccessKey.
Note: To reduce the risk of exposing your AccessKey, the AccessKey Secret appears only once, at creation, and cannot be viewed later. Store it securely.
Flink Kerberos: If your Flink cluster uses Kerberos authentication, enable Flink Kerberos and upload a keytab file and configure a principal.
Keytab File: Upload a keytab file. You can obtain this file from your Flink server.
Principal: Enter the Kerberos username that corresponds to the Flink keytab file.
Username: When Flink Kerberos is disabled, enter the cluster username used to submit Flink jobs.
Important: The settings you enter here take precedence over any AccessKey settings in core-site.xml.
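In open-source Flink, the checkpoint storage and Kerberos settings above correspond to standard configuration options. The sketch below shows the equivalent flink-conf.yaml entries with placeholder values; Dataphin manages these internally, so this is for orientation only, not something you edit in Dataphin:

```yaml
# Checkpoint storage on an HA HDFS cluster (placeholder path)
state.checkpoints.dir: hdfs://nameservice/openflink/savepoint/

# When Flink Kerberos is enabled (placeholder keytab path and principal)
security.kerberos.login.keytab: /path/to/flink.keytab
security.kerberos.login.principal: flink_user@EXAMPLE.COM
```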
Kubernetes deployment mode
Cluster basic information
No cluster basic information is required for Kubernetes deployments.
Flink compute engine settings
For Kubernetes deployments, you can select NFS, S3, or Azure Blob Storage as the file system. Each option requires different parameters.
NFS
Parameter
Description
Server
Enter the domain name of the NFS server.
Version
Select the NFS version. Options are NFSv3 and NFSv4.
Directory
Enter the directory path on the NFS server where Checkpoint data is stored. Example: /data/checkpoint.
Maximum capacity
Enter the maximum storage capacity supported by the NFS server. Exceeding this limit affects Checkpoint storage. Unit: Gi.
S3
Parameter
Description
Endpoint (optional)
Enter the correct endpoint. Example: http://s3.us-east-2.amazonaws.com.
Note: Do not enter an endpoint for Amazon S3 itself. For all other S3-compatible services, this field is required.
Directory path
Enter the storage path. Default: s3://{YOUR-BUCKET}/{path}. We recommend using a dedicated directory for Checkpoint storage and cleaning it up regularly.
Access Key and Secret Key
Enter the Access Key and Secret Key used to access Amazon S3. Click the view icon to display the plaintext values.
Azure Blob Storage
Parameter
Description
Protocol
Only ABFS is supported.
Authentication Type
Only Shared Key is supported.
Directory path
Enter the storage path. Default: abfs://{YOUR-CONTAINER}@{YOUR-AZURE-ACCOUNT}.dfs.core.windows.net/{object-path}.
Access key
Enter the access key for your Azure Blob Storage account. Click the view icon to display the plaintext value.
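As with the YARN modes, these Kubernetes checkpoint settings map onto standard Flink filesystem options. The sketch below shows the equivalent open-source flink-conf.yaml entries with placeholder bucket, container, and account names; Dataphin manages these internally, so this is for orientation only:

```yaml
# S3-compatible checkpoint storage (placeholder values; per the note above,
# set an endpoint only for non-Amazon S3-compatible services)
state.checkpoints.dir: s3://my-bucket/flink/checkpoints/
s3.endpoint: http://s3.example-service.com
s3.access-key: <ACCESS_KEY>
s3.secret-key: <SECRET_KEY>

# Azure Blob Storage (ABFS) with Shared Key authentication (placeholders)
# state.checkpoints.dir: abfs://my-container@myaccount.dfs.core.windows.net/checkpoints/
# fs.azure.account.key.myaccount.dfs.core.windows.net: <SHARED_KEY>
```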
Click Test Connection to test the connectivity between Dataphin and the cluster. After the test succeeds, click Submit.
Note: The Kubernetes deployment mode does not support connection testing. Click Submit directly.
What to do next
After creating a Flink compute source, bind it to a project. For more information, see Create a general project.