Creating a Kafka data source enables Dataphin to read business data from Kafka or write data to Kafka. This topic describes how to create a Kafka data source.
Background information
Kafka is a message queue used to process real-time data. Dataphin supports Kafka 0.9, Kafka 0.10, and Kafka 0.11. Before you can develop data in Dataphin or export data from Dataphin to Kafka, you must create a Kafka data source. For more information about the features of different Kafka versions, see the official Kafka documentation.
Permissions
In Dataphin, only custom roles with the Create Data Source permission and the super administrator, data source administrator, domain architect, and project administrator roles can create data sources.
Procedure
In the top navigation bar of the Dataphin homepage, choose Management Center > Datasource Management.
On the Datasource page, click +Create Data Source.
On the Create Data Source page, select Kafka in the Message Queue section.
If you have recently used Kafka, you can also select Kafka in the Recently Used section, or enter a keyword in the search box to quickly find Kafka.
On the Create Kafka Data Source page, configure the parameters for connecting to the data source.
Configure the basic information of the data source.
Parameter
Description
Datasource Name
The name must meet the following requirements:
The name can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).
The name cannot exceed 64 characters in length.
Datasource Code
After you configure the data source code, you can reference tables in the data source in a Flink_SQL task in the data source code.table name or data source code.schema.table name format. If you need to automatically access the data source in the corresponding environment based on the current environment, you can use the variable format ${data source code}.table or ${data source code}.schema.table. For more information, see Dataphin data source table development method.
Important: The data source code cannot be modified after it is configured.
After the data source code is configured successfully, you can preview data on the object details page in the asset directory and asset inventory.
In Flink SQL, referencing tables by data source code is currently supported only for MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources.
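As a sketch, the two reference formats described above might appear in a Flink SQL task as follows. The data source code my_mysql and the table name orders are hypothetical placeholders, not values from your environment:

```sql
-- Direct reference: data source code.table name
SELECT * FROM my_mysql.orders;

-- Variable format: ${data source code}.table, resolved to the production
-- or development data source based on the current environment
SELECT * FROM ${my_mysql}.orders;
```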
Data Source Description
A brief description of the data source. The description cannot exceed 128 characters in length.
Data Source Configuration
Select the data source that you want to configure:
If your business data source distinguishes between production and development data sources, select Production + Development Data Source.
If your business data source does not distinguish between production and development data sources, select Production Data Source.
Tag
You can categorize data sources by adding tags. For information about how to create tags, see Manage data source tags.
Configure the parameters for connecting the data source to Dataphin.
If you set Data Source Configuration to Production + Development Data Source, you must configure connection information for both the production and development data sources. If you set it to Production Data Source, you only need to configure connection information for the production data source.
Note: In most cases, the production data source and the development data source should be configured as different data sources to isolate the development environment from the production environment and reduce the impact of development work on the production data source. However, Dataphin also supports configuring them as the same data source, that is, using the same parameter values.
Parameter
Description
Connection Url
The endpoint of the Kafka cluster, in the host:port format. To specify the addresses of multiple nodes, separate them with commas (,). Dataphin supports Kafka 0.9, Kafka 0.10, and Kafka 0.11. For information about how to configure the endpoints of different Kafka versions, see the official Kafka documentation.
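To make the expected format concrete, the following minimal sketch parses a connection URL of the host:port,host:port form into (host, port) pairs. The hostnames kafka01 through kafka03 are placeholders, and parse_endpoints is an illustrative helper, not part of Dataphin:

```python
def parse_endpoints(url: str):
    """Split a connection URL of the form host:port[,host:port...]
    into (host, port) pairs, validating each entry."""
    endpoints = []
    for entry in url.split(","):
        # rpartition tolerates hostnames while isolating the trailing port
        host, sep, port = entry.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"invalid endpoint: {entry!r}")
        endpoints.append((host, int(port)))
    return endpoints

print(parse_endpoints("kafka01:9092,kafka02:9092,kafka03:9092"))
# → [('kafka01', 9092), ('kafka02', 9092), ('kafka03', 9092)]
```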
Authentication Type
Three authentication types are supported: No Authentication, Kerberos, and Username + Password.
No Authentication: If your Kafka cluster uses no authentication, you can select this option.
Kerberos: Kerberos is an identity authentication protocol based on symmetric key technology and is commonly used for authentication between cluster components. Enabling Kerberos can enhance cluster security. If you select Kerberos authentication, you need to configure the following parameters:
Krb5 File: Upload the Krb5 configuration file for Kerberos.
Keytab File: Upload the Keytab file for user authentication.
Principal: Enter the Principal name for Kerberos authentication. For example, XXXX/hadoopclient@xxx.xxx.
JAAS File: If your JAAS file contains only the Krb5 file, Keytab file, and Principal parameters, you can fill in only the Principal. If your JAAS file contains parameters other than these, you must upload the JAAS file.
Username + Password: For username + password authentication, you need to configure the following parameters:
Encryption Method: You can select PLAIN, SCRAM-SHA-256, or SCRAM-SHA-512.
Note: If you select SCRAM-SHA-256 or SCRAM-SHA-512 as the encryption method, SSL encryption is not supported, and the data source can be used only for offline integration.
Username, Password: Enter the username and password for connecting to the Kafka cluster.
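For reference, a typical Kafka client JAAS file for Kerberos authentication looks like the following sketch. The keytab path and principal are placeholders to replace with your own values; Dataphin's Krb5 File, Keytab File, and Principal fields carry the same information:

```conf
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/etc/security/keytabs/kafka_client.keytab"
  principal="kafka-user/hadoopclient@EXAMPLE.COM";
};
```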
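Similarly, the Username + Password option with a SCRAM encryption method corresponds to the standard Kafka client SASL settings sketched below. The username and password are placeholders; SASL_PLAINTEXT is used here because, as noted above, SCRAM with SSL is not supported:

```properties
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-256
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="dataphin_user" \
  password="dataphin_password";
```

For the PLAIN encryption method, the mechanism would be PLAIN with the PlainLoginModule instead.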
SSL Encryption
If you need to encrypt data transmission between Dataphin and Kafka by using SSL, you can enable SSL encryption. To enable SSL encryption, you need to configure the following parameters:
Note: SSL encryption is not supported with Kerberos authentication, or with Username + Password authentication when the encryption method is SCRAM-SHA-256 or SCRAM-SHA-512.
Truststore Certificate: The Truststore certificate used for SSL encryption.
Truststore Certificate Password: The password of the Truststore certificate.
Hostname Endpoint Identification Algorithm: Optional. The endpoint identification algorithm used to validate the server hostname against the server certificate, for example, https. If this parameter is not specified, hostname verification is disabled.
SSL Mutual Authentication: SSL mutual authentication between Dataphin and Kafka. It is commonly used in scenarios that require strict control and verification of communication, such as financial transactions or sensitive data transmission. To enable SSL mutual authentication, you need to upload a Keystore certificate and enter the Keystore certificate password and Keystore private key password.
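The SSL parameters above map onto the standard Kafka client SSL settings, sketched below with placeholder paths and passwords. The keystore lines are needed only when SSL mutual authentication is enabled:

```properties
security.protocol=SSL
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=truststore-password
# Corresponds to Hostname Endpoint Identification Algorithm;
# leave unset to disable hostname verification
ssl.endpoint.identification.algorithm=https
# Only for SSL mutual authentication:
ssl.keystore.location=/path/to/client.keystore.jks
ssl.keystore.password=keystore-password
ssl.key.password=key-password
```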
Schema Registry
Schema Registry is a feature supported by Confluent Kafka. If your Confluent Kafka has Schema Registry enabled, select this option to enable it.
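When Schema Registry is enabled, Confluent client serializers locate it through a setting like the following sketch; the URL is a placeholder for your own Schema Registry deployment:

```properties
schema.registry.url=http://schema-registry-host:8081
```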
Select a Default Resource Group. This resource group is used to run tasks related to the current data source, including database SQL, offline database migration, and data preview.
Click Test Connection, or click OK directly to save the configuration and finish creating the Kafka data source.
If you click Test Connection, the system tests whether the data source can connect to Dataphin. If you click OK directly, the system automatically tests the connection for all selected clusters; however, the data source can still be created even if every selected cluster fails the connection test.
Test Connection tests the connection for the Default Cluster or Registered Scheduling Clusters that have been registered in Dataphin and are in normal use. The Default Cluster is selected by default and cannot be deselected. If there are no resource groups under a Registered Scheduling Cluster, connection testing is not supported. You need to create a resource group first before testing the connection.
The selected clusters are only used to test network connectivity with the current data source and are not used for running related tasks later.
The test connection usually takes less than 2 minutes. If it times out, you can click the status icon to view the specific reason and retry. Regardless of whether the test result is Connection Failed, Connection Successful, or Succeeded With Warning, the system records the time at which the final result was generated.
Note: Only the test results for the Default Cluster include three connection statuses: Succeeded With Warning, Connection Successful, and Connection Failed. The test results for Registered Scheduling Clusters in Dataphin include only two connection statuses: Connection Successful and Connection Failed.
When the test result is Connection Failed, you can click the status icon to view the specific failure reason. When the test result is Succeeded With Warning, the application cluster connection succeeded but the scheduling cluster connection failed; in this case, the data source cannot be used for data development and integration. You can click the status icon to view the log information.