By creating a PostgreSQL data source, you can enable Dataphin to read business data from PostgreSQL or write data to PostgreSQL. This topic describes how to create a PostgreSQL data source.
Permission requirements
Only custom global roles with the Create Data Source permission and the super administrator, data source administrator, board architect, and project administrator roles can create data sources.
Procedure
On the Dataphin homepage, choose Management Center > Datasource Management from the top navigation bar.
On the Datasource page, click +Create Data Source.
On the Create Data Source page, select PostgreSQL in the Relational Database section.
If you have recently used PostgreSQL, you can select it in the Recently Used section, or search for it by entering keywords in the search box.
On the Create PostgreSQL Data Source page, configure the parameters for connecting to the data source.
Configure the basic information of the data source.
Parameter
Description
Datasource Name
Enter a name for the data source. The name must meet the following requirements:
It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).
The name can be up to 64 characters in length.
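The naming rules above can be checked before you fill in the form. The following is a minimal sketch and not part of Dataphin: the function name is_valid_datasource_name is hypothetical, and it assumes that "Chinese characters" means the CJK Unified Ideographs range (U+4E00 to U+9FFF) and that an empty name is not allowed.

```python
import re

# Assumption: "Chinese characters" is approximated by U+4E00 to U+9FFF;
# the product may accept a wider range. The trailing "-" is a literal hyphen.
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")


def is_valid_datasource_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and has 1 to 64 characters."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, is_valid_datasource_name("pg_orders-prod") returns True, while a name that contains a space or has 65 characters is rejected.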
Datasource Code
After you configure the data source code, you can access Dataphin data source tables directly in Flink_SQL tasks or by using the Dataphin JDBC client, in the format data source code.table name or data source code.schema.table name, for quick consumption. If you need to automatically switch data sources based on the task execution environment, access them using the variable format ${data source code}.table or ${data source code}.schema.table. For more information, see Dataphin data source table development method.
Important: The data source code cannot be modified after it is configured successfully.
After the data source code is configured successfully, you can preview data on the object details page in the asset directory and asset inventory.
In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are currently supported.
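The reference formats described for the data source code can be illustrated with a small string helper. This is only a sketch: the function table_reference and the sample code pg_sales are hypothetical, and the helper only builds the text of the reference; it does not connect to Dataphin.

```python
from typing import Optional


def table_reference(code: str, table: str, schema: Optional[str] = None,
                    as_variable: bool = False) -> str:
    """Build 'code.table' or 'code.schema.table'; wrap the code as ${code} if requested."""
    prefix = "${" + code + "}" if as_variable else code
    parts = [prefix, schema, table] if schema else [prefix, table]
    return ".".join(parts)


fixed_ref = table_reference("pg_sales", "orders", schema="public")
variable_ref = table_reference("pg_sales", "orders", schema="public", as_variable=True)
```

Here fixed_ref is pg_sales.public.orders and variable_ref is ${pg_sales}.public.orders. The variable form is the one the text recommends when the data source should switch with the task execution environment.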
Data Source Description
A brief description of the data source. It cannot exceed 128 characters.
Time Zone
Time format data in integration tasks will be processed according to the current time zone. The default time zone is Asia/Shanghai. Click Modify to select the target time zone from the following options:
GMT: GMT-12:00, GMT-11:00, GMT-10:00, GMT-09:30, GMT-09:00, GMT-08:00, GMT-07:00, GMT-06:00, GMT-05:00, GMT-04:00, GMT-03:00, GMT-02:30, GMT-02:00, GMT-01:00, GMT+00:00, GMT+01:00, GMT+02:00, GMT+03:00, GMT+03:30, GMT+04:00, GMT+04:30, GMT+05:00, GMT+05:30, GMT+05:45, GMT+06:00, GMT+06:30, GMT+07:00, GMT+08:00, GMT+08:45, GMT+09:00, GMT+09:30, GMT+10:00, GMT+10:30, GMT+11:00, GMT+12:00, GMT+12:45, GMT+13:00, GMT+14:00.
Daylight Saving Time: Africa/Cairo, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, America/Sao_Paulo, Asia/Bangkok, Asia/Dubai, Asia/Kolkata, Asia/Shanghai, Asia/Tokyo, Atlantic/Azores, Australia/Sydney, Europe/Berlin, Europe/London, Europe/Moscow, Europe/Paris, Pacific/Auckland, Pacific/Honolulu.
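The GMT options are fixed offsets, whereas region names such as America/New_York have offsets that change with daylight saving time. The following sketch shows what a fixed GMT label means for a timestamp. It is an illustration only: the helper gmt_label_to_timezone is hypothetical, and how Dataphin applies the selected time zone internally is not described here.

```python
import re
from datetime import datetime, timedelta, timezone

_GMT_LABEL = re.compile(r"GMT([+-])(\d{2}):(\d{2})")


def gmt_label_to_timezone(label: str) -> timezone:
    """Convert a label such as 'GMT+05:45' into a fixed-offset timezone object."""
    match = _GMT_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a GMT offset label: {label!r}")
    sign, hours, minutes = match.groups()
    offset = timedelta(hours=int(hours), minutes=int(minutes))
    if sign == "-":
        offset = -offset
    return timezone(offset, name=label)


# The same instant, viewed in a fixed-offset zone.
utc_noon = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
local = utc_noon.astimezone(gmt_label_to_timezone("GMT+05:45"))
```

With GMT+05:45, 12:00 UTC is shown as 17:45 local time on any date, because a fixed offset never shifts for daylight saving time.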
Data Source Configuration
Select the data source to configure:
If your business data source distinguishes between production and development data sources, select Production + Development Data Source.
If your business data source does not distinguish between production and development data sources, select Production Data Source.
Tag
You can categorize and tag data sources using tags. For information on how to create tags, see Manage data source tags.
Configure the connection parameters between the data source and Dataphin.
If you selected Production + Development Data Source for your data source configuration, you need to configure the connection information for both the production and the development data source. If you selected Production Data Source, you only need to configure the connection information for the production data source.
Note: Production and development data sources should typically be configured as separate data sources, which isolates the two environments and reduces the impact of development on production. However, Dataphin also supports configuring them as the same data source with identical parameter values.
For Configuration Method, you can choose either JDBC URL or Host. The default selection is JDBC URL.
JDBC URL configuration method
Parameter
Description
JDBC URL
The format of the connection address is jdbc:postgresql://host:port/dbname.
Schema
Enter the schema associated with the username.
Username, Password
Enter the username and password of the database.
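The JDBC URL format above can be assembled and checked with a few lines of code. This is a minimal sketch under stated assumptions: the function names build_jdbc_url and parse_jdbc_url and the sample address are hypothetical, and the pattern covers only a single host given as a host name or IPv4 address, optionally followed by a query string.

```python
import re

# Matches jdbc:postgresql://host:port/dbname, with an optional ?key=value query.
_JDBC_URL = re.compile(
    r"jdbc:postgresql://(?P<host>[^:/?#,]+):(?P<port>\d+)/(?P<dbname>[^/?#]+)(?:\?.*)?"
)


def build_jdbc_url(host: str, port: int, dbname: str) -> str:
    """Assemble a URL in the format jdbc:postgresql://host:port/dbname."""
    return f"jdbc:postgresql://{host}:{port}/{dbname}"


def parse_jdbc_url(url: str) -> dict:
    """Split a URL into host, port, and dbname; raise ValueError if the format differs."""
    match = _JDBC_URL.fullmatch(url)
    if match is None:
        raise ValueError(f"unexpected JDBC URL format: {url!r}")
    parts = match.groupdict()
    parts["port"] = int(parts["port"])
    return parts
```

For example, build_jdbc_url("10.0.0.8", 5432, "sales") yields jdbc:postgresql://10.0.0.8:5432/sales, and parse_jdbc_url recovers the three parts from that string.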
Host configuration method
Parameter
Description
Server Address
Enter the IP address and port number of the server.
You can click +Add to add multiple sets of IP addresses and port numbers, and click the delete icon to remove extra sets. You must keep at least one set.
dbname
Enter the database name.
Parameter configuration
Parameter
Description
Parameter
Parameter name: Supports selecting existing parameter names or entering custom parameter names.
Custom parameter names can only contain uppercase and lowercase letters, digits, periods (.), underscores (_), and hyphens (-).
Parameter value: When a parameter name is selected, the parameter value is required. It can only contain uppercase and lowercase letters, digits, periods (.), underscores (_), and hyphens (-), and cannot exceed 256 characters in length.
Note: You can add multiple parameters by clicking +Add Parameter, and delete extra parameters by clicking the delete icon. You can add up to 30 parameters.
Schema
Enter the schema associated with the username.
Username, Password
Enter the username and password of the database.
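The constraints on parameter names and values in the Host configuration method can be expressed as a simple check. The sketch below is illustrative only: validate_parameters is a hypothetical helper, and it assumes that parameters are passed as a dictionary of name and value strings.

```python
import re

# Letters, digits, periods, underscores, and hyphens, as listed in the table.
_ALLOWED = re.compile(r"[A-Za-z0-9._-]+")
MAX_VALUE_LENGTH = 256
MAX_PARAMETERS = 30


def validate_parameters(params: dict) -> list:
    """Return a list of problems found; an empty list means the parameters pass."""
    problems = []
    if len(params) > MAX_PARAMETERS:
        problems.append(f"too many parameters: {len(params)} (maximum {MAX_PARAMETERS})")
    for name, value in params.items():
        if not _ALLOWED.fullmatch(name):
            problems.append(f"invalid parameter name: {name!r}")
        if not value:
            # A value is required once a parameter name is selected.
            problems.append(f"missing value for parameter: {name!r}")
        elif len(value) > MAX_VALUE_LENGTH or not _ALLOWED.fullmatch(value):
            problems.append(f"invalid value for parameter: {name!r}")
    return problems
```

For example, validate_parameters({"sslmode": "require"}) returns an empty list, while a name that contains a space or an empty value is reported.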
Note: If you create the data source with the Host configuration method and later switch to the JDBC URL configuration method, the system concatenates the server's IP address and port number into a JDBC URL and fills it in automatically.
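The concatenation described in this note can be pictured as follows. This is a sketch, not Dataphin's actual implementation: the function hosts_to_jdbc_url is hypothetical, and joining several servers with commas is an assumption based on the multi-host URL syntax of the PostgreSQL JDBC driver; the exact URL that Dataphin generates may differ.

```python
from typing import List, Tuple


def hosts_to_jdbc_url(servers: List[Tuple[str, int]], dbname: str) -> str:
    """Join (ip, port) pairs and a database name into a PostgreSQL JDBC URL."""
    if not servers:
        # The Host configuration method requires at least one address and port.
        raise ValueError("at least one server address is required")
    hosts = ",".join(f"{host}:{port}" for host, port in servers)
    return f"jdbc:postgresql://{hosts}/{dbname}"


single = hosts_to_jdbc_url([("10.0.0.8", 5432)], "sales")
multiple = hosts_to_jdbc_url([("10.0.0.8", 5432), ("10.0.0.9", 5432)], "sales")
```

Here single is jdbc:postgresql://10.0.0.8:5432/sales and multiple is jdbc:postgresql://10.0.0.8:5432,10.0.0.9:5432/sales.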
Configure advanced settings for the data source.
Parameter
Description
connectTimeout
The connection timeout of the database, in seconds. The default is 900 seconds (15 minutes).
Note: If connectTimeout is configured in the JDBC URL, the value in the JDBC URL is used.
For data sources created before Dataphin V3.11, the default connectTimeout is -1, which means no timeout limit.
socketTimeout
The socket timeout of the database, in seconds. The default is 1800 seconds (30 minutes).
Note: If socketTimeout is configured in the JDBC URL, the value in the JDBC URL is used.
For data sources created before Dataphin V3.11, the default socketTimeout is -1, which means no timeout limit.
Connection Retries
If the database connection times out, the system automatically retries the connection until the configured number of retries is reached. If the connection is still unsuccessful after the maximum number of retries, the connection fails.
Note: The default number of retries is 1, and you can configure a value between 0 and 10.
The connection retry count will be applied by default to offline integration tasks and global quality (requires the asset quality function module to be enabled). In offline integration tasks, you can configure task-level retry counts separately.
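The retry behavior described for Connection Retries can be sketched as a loop that makes one initial attempt plus the configured number of retries. This is an illustration under assumptions: connect_with_retries and flaky_connect are hypothetical names, a timeout is modeled as Python's built-in TimeoutError, and any delay that Dataphin applies between attempts is not modeled.

```python
from typing import Callable


def connect_with_retries(connect: Callable[[], object], retries: int = 1) -> object:
    """Call connect(); on timeout, retry up to `retries` more times, then fail."""
    if not 0 <= retries <= 10:
        raise ValueError("retries must be between 0 and 10")
    last_error = None
    for _ in range(1 + retries):  # one initial attempt plus the retries
        try:
            return connect()
        except TimeoutError as error:
            last_error = error
    raise ConnectionError(f"connection failed after {retries} retries") from last_error


# Simulated connection that times out once and then succeeds.
calls = []


def flaky_connect() -> str:
    calls.append(1)
    if len(calls) < 2:
        raise TimeoutError("simulated timeout")
    return "connected"


result = connect_with_retries(flaky_connect, retries=1)
```

With the default of 1 retry, the simulated connection succeeds on its second attempt. With retries set to 0, the first timeout would already end in a failure.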
Note: Rules for duplicate parameter values:
If a parameter exists simultaneously in JDBC URL, Advanced Settings parameters, and Host Configuration method's parameter configuration, the value in the JDBC URL takes precedence.
If a parameter exists simultaneously in JDBC URL and Advanced Settings parameters, the value in the JDBC URL takes precedence.
If a parameter exists simultaneously in Advanced Settings parameters and Host Configuration method's parameter configuration, the value in the Advanced Settings parameter configuration takes precedence.
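The three precedence rules amount to a single ordering: JDBC URL first, then Advanced Settings, then the parameter configuration of the Host configuration method. The following sketch applies that ordering. The function names and sample values are hypothetical, and the sketch assumes that JDBC URL parameters are written as a query string after a question mark.

```python
from urllib.parse import parse_qsl


def jdbc_url_parameters(url: str) -> dict:
    """Extract key=value pairs from the query string of a JDBC URL."""
    _, _, query = url.partition("?")
    return dict(parse_qsl(query))


def resolve_parameters(jdbc_url: str, advanced_settings: dict, host_parameters: dict) -> dict:
    """Merge the three sources so that higher-precedence values overwrite lower ones."""
    resolved = dict(host_parameters)                 # lowest precedence
    resolved.update(advanced_settings)               # overrides Host parameter configuration
    resolved.update(jdbc_url_parameters(jdbc_url))   # highest precedence
    return resolved


effective = resolve_parameters(
    "jdbc:postgresql://10.0.0.8:5432/sales?connectTimeout=30",
    advanced_settings={"connectTimeout": "900", "socketTimeout": "1800"},
    host_parameters={"socketTimeout": "600", "sslmode": "require"},
)
```

In this example, connectTimeout resolves to 30 (from the JDBC URL), socketTimeout to 1800 (from Advanced Settings), and sslmode to require (from the Host parameter configuration, because no other source sets it).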
Select the Default Resource Group, which is used to run tasks related to the current data source, including database SQL, offline database migration, and data preview.
Click Test Connection, or click OK directly, to save and complete the creation of the PostgreSQL data source.
Click Test Connection, and the system will test whether the data source can connect normally with Dataphin. If you directly click OK, the system will automatically test the connection for all selected clusters, but the data source can still be created normally even if all selected clusters fail to connect.
Test Connection tests the connection for the Default Cluster or Registered Scheduling Clusters that have been registered in Dataphin and are in normal use. The Default Cluster is selected by default and cannot be deselected. If there are no resource groups under a Registered Scheduling Cluster, connection testing is not supported. You need to create a resource group first before testing the connection.
The selected clusters are only used to test network connectivity with the current data source and are not used for running related tasks later.
The test connection usually takes less than 2 minutes. If it times out, you can click the icon to view the specific reason and retry.
Regardless of whether the test result is Connection Failed, Connection Successful, or Succeeded With Warning, the system records the generation time of the final result.
Note: Only the test results for the Default Cluster include three connection statuses: Succeeded With Warning, Connection Successful, and Connection Failed. The test results for Registered Scheduling Clusters in Dataphin only include two connection statuses: Connection Successful and Connection Failed.
When the test result is Connection Failed, you can click the icon to view the specific failure reason.
When the test result is Succeeded With Warning, the application cluster connection succeeded but the scheduling cluster connection failed, and the current data source cannot be used for data development and integration. You can click the icon to view the log information.
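The three statuses reported for the Default Cluster can be summarized as a small decision function. This is a sketch under assumptions: default_cluster_status is a hypothetical name, and only the Succeeded With Warning case is defined explicitly in the text; treating "both succeed" as Connection Successful and "application cluster fails" as Connection Failed is an assumption.

```python
def default_cluster_status(application_ok: bool, scheduling_ok: bool) -> str:
    """Map the two connection results of the Default Cluster to a status label."""
    if application_ok and scheduling_ok:
        # Assumption: both connections succeeding is reported as a full success.
        return "Connection Successful"
    if application_ok:
        # Stated in the text: application cluster connected, scheduling cluster did not.
        return "Succeeded With Warning"
    # Assumption: any application cluster failure is reported as a failure.
    return "Connection Failed"
```

For example, default_cluster_status(True, False) returns Succeeded With Warning, which is the case where the data source cannot yet be used for data development and integration.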