
Dataphin:Create a PolarDB-X data source

Last Updated:May 28, 2025

By creating a PolarDB-X data source, you can enable Dataphin to read business data from PolarDB-X or write data to PolarDB-X. This topic describes how to create a PolarDB-X data source.

Background information

PolarDB-X (the upgraded version of DRDS) is a high-performance, cloud-native distributed database independently developed by Alibaba Cloud to meet cloud-era database requirements for high throughput, large storage, low latency, easy scalability, and ultra-high availability. If you use PolarDB-X, you must first create a PolarDB-X data source before you can connect to Dataphin for data development.

Limits

Only PolarDB-X 1.0 data sources are currently supported.

Permission description

Only custom global roles that include the Create Data Source permission point and the system roles Super Administrator, Data Source Administrator, Domain Architect, and Project Administrator can create data sources.

Procedure

  1. In the top navigation bar of the Dataphin homepage, choose Management Center > Datasource Management.

  2. On the Datasource page, click +Create Data Source.

  3. On the Create Data Source page, in the Relational Database section, select PolarDB-X (formerly DRDS).

    If you have recently used PolarDB-X, you can also select it in the Recently Used section, or enter PolarDB-X in the search box to quickly filter the list.

  4. On the Create PolarDB-X (formerly DRDS) Data Source page, configure the parameters for connecting to the data source.

    1. Configure the basic information of the data source.

      Parameter

      Description

      Datasource Name

      Enter a name for the data source. The name must meet the following requirements:

      • It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).

      • It cannot exceed 64 characters in length.

      Datasource Code

      After you configure the data source code, you can directly reference Dataphin data source tables in Flink_SQL tasks or through the Dataphin JDBC client by using the format data source code.table name or data source code.schema.table name for quick consumption. If you need to automatically switch data sources based on the task execution environment, use the variable format ${data source code}.table or ${data source code}.schema.table. For more information, see Dataphin data source table development method.

      Important
      • The data source code cannot be modified after it is configured successfully.

      • After the data source code is configured successfully, you can preview data on the object details page in the asset directory and asset inventory.

      • In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are currently supported.
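The variable format described above lets the same task reference different physical data sources depending on the execution environment. The following Python sketch only illustrates the substitution idea; the data source codes, the table name, and the mapping are hypothetical, and this is not Dataphin's actual resolution logic.

```python
import re

# Hypothetical mapping from (data source code, execution environment) to the
# physical data source; in Dataphin, this resolution is done by the platform.
SOURCES = {
    ("my_polardbx", "prod"): "my_polardbx",      # production data source
    ("my_polardbx", "dev"): "my_polardbx_dev",   # development data source
}

def resolve(reference: str, env: str) -> str:
    """Replace ${code} placeholders with the environment-specific source name."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: SOURCES[(m.group(1), env)],
                  reference)

print(resolve("${my_polardbx}.orders", "dev"))   # my_polardbx_dev.orders
print(resolve("${my_polardbx}.orders", "prod"))  # my_polardbx.orders
```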

      Data Source Description

      A brief description of the data source. It cannot exceed 128 characters.

      Data Source Configuration

      Select the data source to be configured:

      • If the business data source distinguishes between production data source and development data source, select Production + Development Data Source.

      • If the business data source does not distinguish between production data source and development data source, select Production Data Source.

      Tag

      You can categorize and tag data sources based on tags. For information about how to create tags, see Manage data source tags.

    2. Configure the connection parameters between the data source and Dataphin.

      If you select Production + Development Data Source for Data Source Configuration, you must configure the connection information for both the production and the development data source. If you select Production Data Source, you only need to configure the connection information for the production data source.

      Note

      Typically, the production data source and development data source should be configured as different data sources to isolate the development environment from the production environment, reducing the impact of the development data source on the production data source. However, Dataphin also supports configuring them as the same data source, meaning with identical parameter values.

      Parameter

      Description

      JDBC URL

      The format of the connection address is jdbc:mysql://host:port/dbname.

      Username, Password

      The username and password for logging in to the database.
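As a quick sanity check of the documented connection format, the following Python sketch assembles a JDBC URL from its parts. The host, port, and database name are placeholders for illustration, not real connection details.

```python
def build_jdbc_url(host: str, port: int, dbname: str) -> str:
    """Compose a JDBC URL in the documented format jdbc:mysql://host:port/dbname."""
    return f"jdbc:mysql://{host}:{port}/{dbname}"

# Placeholder values for illustration only.
print(build_jdbc_url("pxc-example-host", 3306, "demo_db"))
# jdbc:mysql://pxc-example-host:3306/demo_db
```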

    3. Configure advanced settings for the data source.

      Parameter

      Description

      connectTimeout

      The connection timeout of the database, in milliseconds. The default value is 900000 milliseconds (15 minutes).

      Note
      • If the JDBC URL already specifies connectTimeout, the value configured in the JDBC URL takes precedence.

      • For data sources created before Dataphin V3.11, the default connectTimeout is -1, indicating no timeout limit.

      socketTimeout

      The socket read timeout of the database connection, in milliseconds. The default value is 1800000 milliseconds (30 minutes).

      Note
      • If the JDBC URL already specifies socketTimeout, the value configured in the JDBC URL takes precedence.

      • For data sources created before Dataphin V3.11, the default socketTimeout is -1, indicating no timeout limit.
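The precedence rule in the notes above (timeout values in the JDBC URL override Dataphin's defaults) can be sketched as follows. This is a simplified illustration, not Dataphin's actual parser.

```python
from urllib.parse import urlparse, parse_qs

# Dataphin defaults as documented: 15 and 30 minutes, in milliseconds.
DEFAULTS = {"connectTimeout": 900000, "socketTimeout": 1800000}

def effective_timeouts(jdbc_url: str) -> dict:
    # Strip the "jdbc:" prefix so urlparse can read the query string.
    query = urlparse(jdbc_url[len("jdbc:"):]).query
    params = {k: int(v[0]) for k, v in parse_qs(query).items()}
    # A timeout set in the JDBC URL takes precedence over the default.
    return {name: params.get(name, default) for name, default in DEFAULTS.items()}

print(effective_timeouts("jdbc:mysql://host:3306/db"))
# {'connectTimeout': 900000, 'socketTimeout': 1800000}
print(effective_timeouts("jdbc:mysql://host:3306/db?connectTimeout=60000"))
# {'connectTimeout': 60000, 'socketTimeout': 1800000}
```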

      Connection Retries

      If the database connection times out, Dataphin automatically retries until the configured number of retries is reached. If the connection still fails after the maximum number of retries, the connection attempt fails.

      Note
      • The default number of retries is 1. Values from 0 to 10 are supported.

      • The connection retry count will be applied by default to offline integration tasks and global quality (requires the asset quality function module to be enabled). In offline integration tasks, task-level retry counts can be configured separately.
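The retry behavior described above can be sketched as a simple loop: one initial attempt plus up to the configured number of retries (0 to 10, default 1). `connect` is a stand-in for a real connection attempt; this is an illustration, not Dataphin code.

```python
def connect_with_retries(connect, retries: int = 1):
    """Try `connect` once, then retry up to `retries` times on timeout."""
    if not 0 <= retries <= 10:
        raise ValueError("retries must be between 0 and 10")
    last_error = None
    for _ in range(1 + retries):  # initial attempt plus the configured retries
        try:
            return connect()
        except TimeoutError as e:
            last_error = e
    # All attempts timed out: the connection fails.
    raise ConnectionError("connection failed after retries") from last_error

# Usage: a connector that times out once, then succeeds on the retry.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("timed out")
    return "connected"

print(connect_with_retries(flaky, retries=1))  # connected
```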

  5. Select Default Resource Group, which is used to run tasks related to the current data source, including database SQL, offline full database migration, data preview, and more.

  6. Perform a Test Connection or directly click OK to save and complete the creation of the PolarDB-X data source.

    Click Test Connection to have the system check whether the data source can connect to Dataphin. If you click OK directly, the system automatically tests the connection for all selected clusters; the data source is created even if all selected clusters fail to connect.

    Test Connection tests the connection for the Default Cluster or Registered Scheduling Clusters that have been registered in Dataphin and are in normal use. The Default Cluster is selected by default and cannot be deselected. If there are no resource groups under a Registered Scheduling Cluster, connection testing is not supported. You need to create a resource group first before testing the connection.

    • The selected clusters are only used to test network connectivity with the current data source and are not used for running related tasks later.

    • A connection test usually takes less than 2 minutes. If it times out, you can click the icon next to the result to view the specific reason and retry.

    • Regardless of whether the test result is Connection Failed, Connection Successful, or Succeeded With Warning, the system will record the generation time of the final result.

      Note

      Only the test results for the Default Cluster include three connection statuses: Succeeded With Warning, Connection Successful, and Connection Failed. The test results for Registered Scheduling Clusters in Dataphin only include two connection statuses: Connection Successful and Connection Failed.

    • When the test result is Connection Failed, you can click the icon next to the result to view the specific failure reason.

    • When the test result is Succeeded With Warning, the connection to the application cluster succeeded but the connection to the scheduling cluster failed. In this case, the data source cannot be used for data development and integration. You can click the icon next to the result to view the log information.