By creating a Databricks data source, Dataphin can read from and write data to Databricks. This topic explains how to create a Databricks data source.
Permission description
Only custom global roles that include the Create Data Source permission point, as well as built-in roles such as Super Administrator, Data Source Administrator, Section Architect, and Project Administrator, can create data sources.
Procedure
On the Dataphin home page, select Management Center from the top menu bar, then choose Datasource Management.
On the Datasource page, click + Create Data Source.
In the Create Data Source dialog box, select Databricks in the Big Data area.
If you've recently used Databricks, you can select it from the Recently Used area. You can also type 'Databricks' into the search box to quickly find it.
In the Create Databricks Data Source dialog box, configure the connection parameters for the data source.
Enter the basic information for the data source.
Parameter
Description
Datasource Name
Enter the name of the data source. The naming convention is as follows:
Can only contain Chinese characters, uppercase and lowercase English letters, numbers, underscores (_), or hyphens (-).
Cannot exceed 64 characters in length.
Datasource Code
After you configure the datasource code, you can directly reference tables of this data source in Flink SQL tasks or through the Dataphin JDBC client, using the format datasource code.table name or datasource code.schema.table name for quick consumption. If you need the data source to switch automatically based on the task execution environment, use the variable format ${datasource code}.table or ${datasource code}.schema.table. For more information, see Dataphin Data Source Table Development Method.
Important: Once the datasource code is successfully configured, it cannot be modified.
After the datasource code is successfully configured, data preview becomes available on the object details pages of the asset directory and asset checklist.
In Flink SQL, currently only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are supported.
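As an illustration of the two reference formats described above, the following sketch shows how the strings are assembled. The datasource code db_demo and the schema and table names are hypothetical, not values from this guide; the actual substitution of the variable format is performed by Dataphin at task execution time.

```python
# Hypothetical names for illustration only.
datasource_code = "db_demo"
schema, table = "sales", "orders"

# Fixed reference: always points at this exact data source.
direct_ref = f"{datasource_code}.{schema}.{table}"

# Variable reference: Dataphin substitutes the production or development
# data source depending on where the task runs.
variable_ref = "${" + datasource_code + "}." + schema + "." + table

print(direct_ref)    # db_demo.sales.orders
print(variable_ref)  # ${db_demo}.sales.orders
```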
Version
Currently, only version 2.6.40 is supported.
Datasource Description
A brief description of the Databricks data source. Must not exceed 128 characters.
Datasource Configuration
Select the mode based on whether your business distinguishes between a production data source and a development data source:
If the business data source distinguishes between production data source and development data source, select Production + Development Data Source.
If the business data source does not distinguish between production data source and development data source, select Production Data Source.
Tag
You can classify and tag the data source based on tags. For information on how to create tags, see Manage Data Source Tags.
Set up the connection details between the data source and Dataphin.
If you selected Production + Development Data Source, configure the connection information for both the production and development data sources. If you selected Production Data Source, configure only the connection information for the production data source.
Note: Typically, production and development data sources should be separate to maintain environment isolation and minimize the impact of development activities on production. However, Dataphin allows the same data source configuration for both if needed.
Parameter
Description
Server Address
Enter the IP address (or hostname) and port number of the server. Only one server address is supported; additional addresses cannot be added.
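For illustration, a minimal sketch of splitting a single host:port server address into its two parts. The workspace hostname below is a made-up example, not a real address:

```python
def split_server_address(address: str) -> tuple:
    """Split a single 'host:port' server address into host and port."""
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)

# Hypothetical Databricks workspace hostname, for illustration only.
print(split_server_address("adb-1234567890123456.7.azuredatabricks.net:443"))
```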
Parameter Checking (optional)
Click + Parameter Configuration to add a row, then enter the Parameter Name and corresponding Parameter Value. To delete a parameter, click the delete icon at the end of its row.
Parameter names and parameter values support uppercase and lowercase English letters, numbers, half-width periods (.), underscores (_), and hyphens (-), with a length not exceeding 256 characters.
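The character and length rule above can be expressed as a simple check. This is an illustrative sketch, not Dataphin's actual validation logic; the parameter names below are made up:

```python
import re

# Allowed per the rule above: upper/lowercase letters, digits, half-width
# periods, underscores, and hyphens; length at most 256 characters.
PARAM_RE = re.compile(r"[A-Za-z0-9._-]{1,256}")

def is_valid_param(text: str) -> bool:
    return PARAM_RE.fullmatch(text) is not None

print(is_valid_param("socket_timeout"))   # True
print(is_valid_param("fetch.size-v2"))    # True
print(is_valid_param("bad value!"))       # False: space and '!' not allowed
```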
Authentication Mechanism
Token-based authentication: Authentication based on personal token.
M2M-based authentication: Authentication based on Service Principal.
Catalog
Enter the catalog associated with the username.
Schema
Enter the schema associated with the username.
Username And Password
Enter the username and password (or credentials) of the authentication user. To ensure normal task execution, ensure that the user has the necessary data permissions.
HTTP Path
Enter the HTTP path in the format /sql/1.0/warehouses/warehouse_id.
Connection Retries
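As an illustration, the expected path shape can be checked with a pattern. The alphanumeric warehouse-ID character set used here is an assumption for this sketch, and the IDs shown are made up:

```python
import re

# Assumed shape: /sql/1.0/warehouses/<warehouse_id>, where the ID is
# alphanumeric (an assumption for this sketch).
HTTP_PATH_RE = re.compile(r"/sql/1\.0/warehouses/[A-Za-z0-9]+")

def looks_like_warehouse_path(path: str) -> bool:
    return HTTP_PATH_RE.fullmatch(path) is not None

print(looks_like_warehouse_path("/sql/1.0/warehouses/abc123def456"))  # True
print(looks_like_warehouse_path("/sql/1.0/endpoints/abc123"))         # False
```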
If the database connection times out, Dataphin automatically retries until the configured number of retries is reached. If the connection still fails after the maximum number of retries, the connection attempt fails.
Note: The default number of retries is 1; values from 0 to 10 are supported.
The connection retry count applies by default to Offline Integration Tasks and Global Quality (the asset quality module must be enabled). Offline integration tasks can also be configured with task-level retry counts separately.
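The retry behavior described above can be sketched as a simple loop: one initial attempt plus up to max_retries further attempts. This mimics, rather than reproduces, Dataphin's internal logic; the flaky connector below is a stand-in for a real connection function:

```python
import time

def connect_with_retries(connect, max_retries=1, delay_s=1.0):
    """One initial attempt plus up to max_retries retries (default 1,
    valid range 0-10 per the docs); re-raises the last error on failure."""
    last_err = None
    for attempt in range(1 + max_retries):
        try:
            return connect()
        except ConnectionError as err:
            last_err = err
            if attempt < max_retries:
                time.sleep(delay_s)
    raise last_err

# Demo with a fake connector that times out twice, then succeeds.
calls = {"n": 0}
def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection timed out")
    return "connected"

print(connect_with_retries(flaky_connect, max_retries=2, delay_s=0))  # connected
```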
Click Test Connection to verify that the data source can communicate properly with Dataphin.
Once the test succeeds, choose the Default Resource Group. This resource group runs the tasks associated with the current data source, such as database SQL, offline full database migration, and data preview.
Click OK to complete the creation of the Databricks data source.