
Dataphin: Create a Databricks data source

Last Updated: May 28, 2025

By creating a Databricks data source, you can enable Dataphin to read business data from Databricks or write data to Databricks. This topic describes how to create a Databricks data source.

Permission requirements

Only custom global roles with the Create Data Source permission and the super administrator, data source administrator, domain architect, and project administrator roles can create data sources.

Procedure

  1. On the Dataphin homepage, choose Management Hub > Datasource Management from the top navigation bar.

  2. On the Datasource page, click +Create Data Source.

  3. On the Create Data Source page, select Databricks in the Big Data section.

    If you have used Databricks recently, you can also select it in the Recently Used section, or enter a keyword in the search box to find it quickly.

  4. On the Create Databricks Data Source page, configure the parameters for connecting to the data source.

    1. Configure the basic information of the data source.


      Datasource Name

      Enter a name for the data source. The name must meet the following requirements:

      • It can contain only Chinese characters, letters, digits, underscores (_), or hyphens (-).

      • The name can be up to 64 characters in length.

      Datasource Code

      After you configure the data source code, you can directly access Dataphin data source tables in Flink SQL tasks or through the Dataphin JDBC client, using the format data_source_code.table_name or data_source_code.schema.table_name. If you need tasks to switch data sources automatically based on the execution environment, reference tables in the variable format ${data_source_code}.table or ${data_source_code}.schema.table. For more information, see Dataphin data source table development method.

      Important
      • The data source code cannot be modified after it is configured successfully.

      • After the data source code is configured successfully, you can preview data on the object details page in the asset directory and asset inventory.

      • In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are currently supported.
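The environment-based switching described above can be sketched as a simple substitution. This is an illustrative sketch, not Dataphin internals: the "_dev" suffix convention and the names my_databricks, sales, and orders are assumptions made for the example.

```python
# Illustrative sketch (not Dataphin code): how the variable format
# ${data_source_code}.schema.table could resolve to a concrete table
# reference depending on the task execution environment.

def resolve_table_ref(template: str, ds_code: str, env: str) -> str:
    """Substitute ${ds_code} with an environment-specific data source code.

    Assumption for this sketch: the development environment appends a
    "_dev" suffix to the data source code.
    """
    code = ds_code if env == "prod" else f"{ds_code}_dev"
    return template.replace("${" + ds_code + "}", code)

# A statement written against the variable format:
sql = "SELECT * FROM ${my_databricks}.sales.orders"

print(resolve_table_ref(sql, "my_databricks", "prod"))
# SELECT * FROM my_databricks.sales.orders
print(resolve_table_ref(sql, "my_databricks", "dev"))
# SELECT * FROM my_databricks_dev.sales.orders
```

Writing tasks against the variable format keeps one SQL text valid in both environments, which is the point of the switching mechanism.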

      Version

      Currently, only version 2.6.40 is supported.

      Data Source Description

      A brief description of the Databricks data source. The description cannot exceed 128 characters.

      Time Zone

      Date and time data in integration tasks is processed according to the configured time zone. The default time zone is GMT+00:00. Click Modify to select the target time zone. The options are as follows:

      • GMT: GMT-12:00, GMT-11:00, GMT-10:00, GMT-09:30, GMT-09:00, GMT-08:00, GMT-07:00, GMT-06:00, GMT-05:00, GMT-04:00, GMT-03:30, GMT-03:00, GMT-02:30, GMT-02:00, GMT-01:00, GMT+00:00, GMT+01:00, GMT+02:00, GMT+03:00, GMT+03:30, GMT+04:00, GMT+04:30, GMT+05:00, GMT+05:30, GMT+05:45, GMT+06:00, GMT+06:30, GMT+07:00, GMT+08:00, GMT+08:45, GMT+09:00, GMT+09:30, GMT+10:00, GMT+10:30, GMT+11:00, GMT+12:00, GMT+12:45, GMT+13:00, GMT+14:00.

      • Daylight Saving Time: Africa/Cairo, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, America/Sao_Paulo, Asia/Bangkok, Asia/Dubai, Asia/Kolkata, Asia/Shanghai, Asia/Tokyo, Atlantic/Azores, Australia/Sydney, Europe/Berlin, Europe/London, Europe/Moscow, Europe/Paris, Pacific/Auckland, Pacific/Honolulu.
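The effect of the time zone setting can be illustrated with the Python standard library. This is a sketch of the general behavior, not Dataphin code; the sample timestamp is arbitrary.

```python
# Illustrative sketch: the same instant rendered under different configured
# time zones. Fixed GMT offsets are shown; the named zones in the second
# option (e.g. America/New_York) additionally apply daylight saving rules.
from datetime import datetime, timezone, timedelta

instant = datetime(2025, 5, 28, 12, 0, 0, tzinfo=timezone.utc)

# Rendered under the default GMT+00:00 zone:
print(instant.strftime("%Y-%m-%d %H:%M:%S"))  # 2025-05-28 12:00:00

# Rendered under GMT+08:00, if Modify was used to pick that offset:
gmt8 = timezone(timedelta(hours=8))
print(instant.astimezone(gmt8).strftime("%Y-%m-%d %H:%M:%S"))  # 2025-05-28 20:00:00
```

Because the zone only changes how the instant is rendered, choosing the wrong one shifts every timestamp written by an integration task by a fixed offset.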

      Data Source Configuration

      Based on whether the business data source distinguishes between production and development data sources:

      • If the business data source distinguishes between production and development data sources, select Production + Development Data Source.

      • If the business data source does not distinguish between production and development data sources, select Production Data Source.

      Tag

      You can categorize data sources by adding tags. For information about how to create tags, see Manage data source tags.

    2. Configure the connection parameters between the data source and Dataphin.

      If you set Data Source Configuration to Production + Development Data Source, you must configure connection information for both the production and the development data source. If you set it to Production Data Source, you only need to configure connection information for the production data source.

      Note

      Typically, production and development data sources should be configured as different data sources to achieve environment isolation between development and production data sources, reducing the impact of development data sources on production data sources. However, Dataphin also supports configuring them as the same data source with identical parameter values.


      Server Address

      Enter the IP address and port number of the server. Only one server address is supported.

      Parameter Configuration (optional)

      Click +Parameter Configuration to add a row in which you can enter a parameter name and its value. To delete a parameter, click the delete icon at the end of its row.

      Parameter names and values can contain uppercase and lowercase letters, digits, periods (.), underscores (_), and hyphens (-). They cannot exceed 256 characters in length.

      Authentication Mechanism

      • Token-based authentication: authenticate with a personal access token.

      • M2M-based authentication: machine-to-machine (M2M) authentication with a service principal.

      Catalog

      Enter the catalog associated with the username.

      Schema

      Enter the schema associated with the username.

      Username, Password

      Enter the username and password (or credentials) of the authentication user. To ensure that tasks are executed properly, make sure that the user has the required data permissions.

      HTTP Path

      Enter the HTTP path in the format /sql/1.0/warehouses/warehouse_id, where warehouse_id is the ID of your Databricks SQL warehouse.

      Connection Retries

      If the database connection times out, the system will automatically retry the connection until the specified number of retries is reached. If the connection still fails after the maximum number of retries, the connection is considered failed.

      Note
      • The default number of retries is 1. You can configure a value between 0 and 10.

      • The connection retry count will be applied by default to offline integration tasks and global quality (requires the asset quality function module to be enabled). In offline integration tasks, you can configure task-level retry counts separately.
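The retry behavior described above can be sketched as a bounded retry loop. This is an illustrative sketch of the general pattern, not Dataphin's implementation; connect is a hypothetical stand-in for the real connection attempt.

```python
# Illustrative sketch of the connection-retry behavior: retry on timeout
# until the configured retry count (0-10, default 1) is exhausted, then
# report the connection as failed.
import time


def connect_with_retries(connect, max_retries: int = 1, delay_s: float = 0.0):
    if not 0 <= max_retries <= 10:
        raise ValueError("retry count must be between 0 and 10")
    attempts = 1 + max_retries  # the initial attempt plus the retries
    last_err = None
    for _ in range(attempts):
        try:
            return connect()
        except TimeoutError as err:
            last_err = err
            if delay_s:
                time.sleep(delay_s)
    raise ConnectionError(f"connection failed after {attempts} attempts") from last_err
```

For example, with the default max_retries=1, a connection that times out once and then succeeds is transparent to the caller, while a connection that times out twice raises ConnectionError.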

  5. Select a Default Resource Group, which is used to run tasks related to the current data source, including database SQL, offline database migration, data preview, and more.

  6. Perform a Connection Test or directly click OK to save and complete the creation of the Databricks data source.

    Click Connection Test to test whether Dataphin can connect to the data source. If you click OK directly, the system automatically tests the connection for the configured data sources. Note that the data source can still be created even if the connection test fails.
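Before running a connection test, the connection parameters from step 4 can be sanity-checked locally. The accepted shapes below are assumptions based on the formats described in this topic (host:port for the server address, /sql/1.0/warehouses/<warehouse_id> for the HTTP path), not an official Dataphin validator, and the hostname is a made-up example.

```python
# Illustrative local sanity checks for the connection parameters.
import re


def check_server_address(addr: str) -> bool:
    """Accept host:port, where port is in 1-65535 (assumed shape)."""
    m = re.fullmatch(r"(?P<host>[\w.-]+):(?P<port>\d{1,5})", addr)
    return bool(m) and 1 <= int(m.group("port")) <= 65535


def check_http_path(path: str) -> bool:
    """Accept the /sql/1.0/warehouses/<warehouse_id> shape (assumed)."""
    return re.fullmatch(r"/sql/1\.0/warehouses/[\w-]+", path) is not None


print(check_server_address("adb-123456789.0.databricks.com:443"))  # True
print(check_http_path("/sql/1.0/warehouses/abc123def456"))         # True
print(check_http_path("sql/1.0/warehouses/abc123def456"))          # False (missing leading slash)
```

Catching a malformed address or path this way avoids waiting on a connection test that is guaranteed to fail.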