You can create a Microsoft SQL Server data source in Dataphin to read data from or write data to a Microsoft SQL Server database. This topic describes how to create a Microsoft SQL Server data source.
Permission requirements
The following roles can create data sources: super administrator, data source administrator, domain architect, project administrator, and custom global roles with the Create Data Source permission.
Procedure
On the Dataphin home page, choose Management Hub > Data Source Management from the top menu bar.
On the Data Source page, click +Create Data Source.
On the Create Data Source page, select Microsoft SQL Server from the Relational Database section.
If you have recently used Microsoft SQL Server, you can also select it from the Recently Used section. Alternatively, you can enter Microsoft SQL Server in the search box to find it quickly.
On the Create Microsoft SQL Server Data Source page, configure the connection parameters for the data source.
Configure the basic information.
Parameter
Description
Datasource Name
Enter a name for the data source. The name must follow these rules:
Can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).
Must be no more than 64 characters in length.
Datasource Code
After you configure the data source code, you can directly access Dataphin data source tables in Flink SQL nodes or using the Dataphin Java Database Connectivity (JDBC) client. Use the format
datasource_code.table_nameordatasource_code.schema.table_namefor quick data access. To automatically switch data sources based on the node execution environment, use the variable format${datasource_code}.tableor${datasource_code}.schema.table. For more information, see Develop Dataphin data source tables.ImportantThe data source code cannot be modified after it is configured.
You can preview data on the object details page in the asset directory and asset inventory only after the data source code is configured.
In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are supported.
Data Source Description
A brief description of the data source. The description cannot exceed 128 characters.
Time Zone
Time-formatted data in integration nodes is processed based on the selected time zone. The default time zone is Asia/Shanghai. Click Modify to select a time zone. The options are as follows:
GMT: GMT-12:00, GMT-11:00, GMT-10:00, GMT-09:30, GMT-09:00, GMT-08:00, GMT-07:00, GMT-06:00, GMT-05:00, GMT-04:00, GMT-03:00, GMT-03:00, GMT-02:30, GMT-02:00, GMT-01:00, GMT+00:00, GMT+01:00, GMT+02:00, GMT+03:00, GMT+03:30, GMT+04:00, GMT+04:30, GMT+05:00, GMT+05:30, GMT+05:45, GMT+06:00, GMT+06:30, GMT+07:00, GMT+08:00, GMT+08:45, GMT+09:00, GMT+09:30, GMT+10:00, GMT+10:30, GMT+11:00, GMT+12:00, GMT+12:45, GMT+13:00, GMT+14:00.
Daylight Saving Time: Africa/Cairo, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, America/Sao_Paulo, Asia/Bangkok, Asia/Dubai, Asia/Kolkata, Asia/Shanghai, Asia/Tokyo, Atlantic/Azores, Australia/Sydney, Europe/Berlin, Europe/London, Europe/Moscow, Europe/Paris, Pacific/Auckland, Pacific/Honolulu.
Data Source Configuration
Select the data source to configure:
If your business data source distinguishes between production and development data sources, select Production + Development Data Source.
If your business data source does not distinguish between production and development data sources, select Production Data Source.
Tag
You can classify and label data sources. For more information about how to create tags, see Manage data source tags.
Configure the connection parameters.
If you set Data Source Configuration to Production + Development Data Source, you must configure the connection information for both environments. If you set it to Production Data Source, you only need to configure the connection information for the production data source.
NoteTypically, the production and development data sources are different to isolate the two environments. This practice reduces the impact of the development environment on the production environment. However, you can also configure them as the same data source using identical parameter values.
For Configuration Method, select JDBC URL or Host. JDBC URL is the default method.
JDBC URL configuration
Parameter
Description
JDBC URL
The endpoint format is
jdbc:sqlserver://host:port/dbname.Schema
Enter the schema associated with the username.
Username, Password
Enter the username and password for the target database.
Host configuration
Host configuration
Parameter
Description
Server Address
Enter the IP address and port number of the server.
Click +Add to add multiple IP addresses and port numbers. Click the
icon to delete extra IP addresses and port numbers. You must retain at least one IP address and port number.dbname
Enter the database name.
Parameter configuration
Parameter
Description
Parameter
Parameter name: Select an existing parameter name or enter a custom parameter name.
A custom parameter name can contain only letters, digits, periods (.), underscores (_), and hyphens (-).
Parameter value: This parameter is required if you select a parameter name. The value can contain only letters, digits, periods (.), underscores (_), and hyphens (-). The value can be up to 256 characters in length.
NoteClick +Add Parameter to add multiple parameters. Click the
icon to delete extra parameters. You can add up to 30 parameters.Schema
Enter the schema information associated with the username.
Username, Password
The username and password used to log on to the Microsoft SQL Server instance.
NoteIf you set the instanceName parameter, it takes precedence and the configured port is ignored.
If you select Host as the configuration method and then switch to the JDBC URL method, the system automatically concatenates the server IP address and port number to create a JDBC URL.
Configure advanced settings.
Parameter
Description
loginTimeout
The login timeout period for the database, in seconds. Default value: 900 (15 minutes).
NoteIf your JDBC URL contains a loginTimeout configuration, the timeout value in the JDBC URL is used.
For data sources created before Dataphin V3.11, the default value of loginTimeout is
-1, which indicates no timeout limit.
socketTimeout
The socket timeout period for the database, in milliseconds. Default value: 1800000 (30 minutes).
NoteIf your JDBC URL contains a socketTimeout configuration, the timeout value in the JDBC URL is used.
For data sources created before Dataphin V3.11, the default value of socketTimeout is
-1, which indicates no timeout limit.
Connection Retries
If the database connection times out, the system automatically retries the connection for the specified number of times. If the connection still fails after the maximum number of retries, the connection fails.
NoteThe default number of retries is 1. You can set this parameter to a value from 0 to 10.
The number of connection retries is applied to offline integration nodes and global quality by default. To use the global quality feature, you must activate the asset quality module. For offline integration nodes, you can configure the number of retries at the node level.
NoteValue precedence for duplicate parameters:
If a parameter is configured in the JDBC URL, Advanced Settings, and Host configuration, the value in the JDBC URL takes precedence.
If a parameter is configured in both the JDBC URL and Advanced Settings, the value in the JDBC URL takes precedence.
If a parameter is configured in both the Advanced Settings and Host configuration, the value in the Advanced Settings takes precedence.
Select a Default Resource Group. This resource group is used to run nodes related to this data source, such as database SQL nodes, offline full database migration tasks, and data previews.
Click Test Connection to verify the connection, or click OK to save the configuration and create the Microsoft SQL Server data source.
Click Test Connection to test whether Dataphin can connect to the data source. If you click OK directly, the system automatically tests the connection for all selected clusters. A data source can be created even if the connection test fails.
Test Connection tests the connection for the Default Cluster or Registered Scheduling Clusters that have been registered in Dataphin and are in normal use. The Default Cluster is selected by default and cannot be deselected. If there are no resource groups under a Registered Scheduling Cluster, connection testing is not supported. You need to create a resource group first before testing the connection.
The selected clusters are only used to test network connectivity with the current data source and are not used for running related tasks later.
The test connection usually takes less than 2 minutes. If it times out, you can click the
icon to view the specific reason and retry.Regardless of whether the test result is Connection Failed, Connection Successful, or Succeeded With Warning, the system will record the generation time of the final result.
NoteOnly the test results for the Default Cluster include three connection statuses: Succeeded With Warning, Connection Successful, and Connection Failed. The test results for Registered Scheduling Clusters in Dataphin only include two connection statuses: Connection Successful and Connection Failed.
When the test result is Connection Failed, you can click the
icon to view the specific failure reason.When the test result is Succeeded With Warning, it means that the application cluster connection is successful but the scheduling cluster connection failed. The current data source cannot be used for data development and integration. You can click the
icon to view the log information.