
Dataphin: Create Databricks Data Source

Last Updated: Mar 05, 2025

After you create a Databricks data source, Dataphin can read data from and write data to Databricks. This topic describes how to create a Databricks data source.

Permission description

Only the Super Administrator, Data Source Administrator, Section Architect, and Project Administrator roles, as well as custom global roles that have been granted the New Data Source permission point, can create data sources.

Procedure

  1. On the Dataphin home page, select Management Center from the top menu bar, then choose Datasource Management.

  2. On the Datasource page, click + Create Data Source.

  3. In the Create Data Source dialog box, within the Big Data area, select Databricks.

    If you've recently used Databricks, you can select it from the Recently Used area. You can also type 'Databricks' into the search box to quickly find it.

  4. In the Create Databricks Data Source dialog box, configure the connection parameters for the data source.

    1. Enter the basic information for the data source.


      Datasource Name

      Enter the name of the data source. The naming convention is as follows:

      • Can only contain Chinese characters, uppercase and lowercase English letters, numbers, underscores (_), or hyphens (-).

      • Cannot exceed 64 characters in length.

      Datasource Code

      After you configure the datasource code, you can directly access Dataphin data source tables in Flink SQL tasks or through the Dataphin JDBC client by using the format datasource code.table name or datasource code.schema.table name. If you need the data source to switch automatically based on the task execution environment, use the variable format ${datasource code}.table or ${datasource code}.schema.table (see the example after this table). For more information, see Dataphin Data Source Table Development Method.

      Important
      • Once the datasource code is successfully configured, it cannot be modified.

      • After the datasource code is successfully configured, data preview can be performed on the object details page of the asset directory and the asset checklist.

      • In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are currently supported.

      Version

      Currently, only version 2.6.40 is supported.

      Datasource Description

      A brief description of the Databricks data source. Must not exceed 128 characters.

      Datasource Configuration

      Select an option based on whether the business data source distinguishes between production and development data sources:

      • If it does, select Production + Development Data Source.

      • If it does not, select Production Data Source.

      Tag

      You can use tags to classify the data source. For information about how to create tags, see Manage Data Source Tags.
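
      For example, with a hypothetical datasource code dbx_demo and a hypothetical Databricks table orders in schema sales, a Flink SQL task could reference the table directly, or use the variable form so that the production or development data source is resolved automatically at run time:

        -- Direct reference by datasource code
        SELECT * FROM dbx_demo.sales.orders;

        -- Variable form: resolves according to the task execution environment
        SELECT * FROM ${dbx_demo}.sales.orders;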

    2. Set up the connection details between the data source and Dataphin.

      If you selected Production + Development Data Source, you must configure connection information for both the production and the development data source. If you selected Production Data Source, you only need to configure the connection information for the production data source.

      Note

      Typically, production and development data sources should be kept separate to maintain environment isolation and minimize the impact of development activities on production. However, Dataphin allows both to use the same configuration if needed.


      Server Address

      Enter the IP address and port number of the server. Only one server address is supported; additional addresses cannot be added.

      Parameter Configuration (optional)

      Click + Parameter Configuration to add a row, then enter the Parameter Name and the corresponding Parameter Value. To delete a parameter, click the delete icon at the end of its row.

      Parameter names and parameter values support uppercase and lowercase English letters, numbers, half-width periods (.), underscores (_), and hyphens (-), with a length not exceeding 256 characters.

      Authentication Mechanism

      • Token-based authentication: authentication based on a personal access token.

      • M2M-based authentication: authentication based on a Service Principal (machine-to-machine).

      Catalog

      Enter the catalog associated with the username.

      Schema

      Enter the schema associated with the username.

      Username And Password

      Enter the username and password (or credentials) of the authentication user. To ensure that tasks run properly, make sure the user has the required data permissions.

      HTTP Path

      Enter the HTTP path in the format /sql/1.0/warehouses/warehouses_id, where warehouses_id is the ID of the Databricks SQL warehouse.

      Connection Retries

      If the database connection times out, Dataphin automatically retries the connection until the configured number of retries is reached. If the connection still fails after the maximum number of retries, the connection attempt fails.

      Note
      • The default number of retries is 1. Values from 0 to 10 are supported.

      • The connection retry count is applied by default to Offline Integration Tasks and Global Quality (the asset quality module must be enabled). Offline integration tasks also support configuring a task-level retry count separately.
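
      For orientation, the parameters above correspond to the fields of a standard Databricks JDBC connection. Assuming token-based authentication, a connection string typically takes the following shape (the host, warehouse ID, catalog, schema, and token are placeholders; Dataphin assembles the actual connection internally, so this is only an illustration of how the fields fit together):

        jdbc:databricks://<server-address>:443;httpPath=/sql/1.0/warehouses/<warehouse_id>;AuthMech=3;UID=token;PWD=<personal-access-token>;ConnCatalog=<catalog>;ConnSchema=<schema>

      Here, AuthMech=3 with UID=token is the Databricks JDBC driver's convention for personal access token authentication.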

  5. Click Test Connection to verify that the data source can communicate with Dataphin.

  6. Once the test is successful, choose the Default Resource Group. This resource group is used to run tasks that involve the current data source, such as database SQL tasks, offline full-database migration, and data preview.

  7. Click OK to complete the creation of the Databricks data source.