
Dataphin: Create an AnalyticDB for PostgreSQL data source

Last Updated: Jul 07, 2025

By creating an AnalyticDB for PostgreSQL data source, you can enable Dataphin to read business data from or write data to AnalyticDB for PostgreSQL. This topic describes how to create an AnalyticDB for PostgreSQL data source.

Background information

AnalyticDB for PostgreSQL is Alibaba Cloud's cloud-native data warehouse, compatible with ANSI SQL 2003 and with the PostgreSQL and Oracle database ecosystems. It uses a massively parallel processing (MPP) architecture to provide comprehensive transaction processing capabilities, high-throughput writes, and a unified batch and streaming processing engine. Through hybrid row-column storage and Alibaba Cloud's in-house compute engine, it delivers high-performance data processing and real-time analysis. If you use AnalyticDB for PostgreSQL and want to connect it to Dataphin for data development, you must first create an AnalyticDB for PostgreSQL data source.

Permission requirements

Only custom global roles with the Create Data Source permission and system roles such as Super Administrator, Data Source Administrator, Domain Architect, and Project Administrator can create data sources.

Procedure

  1. In the top navigation bar of the Dataphin homepage, choose Management Center > Datasource Management.

  2. On the Datasource page, click +Create Data Source.

  3. On the Create Data Source page, select AnalyticDB For PostgreSQL in the Relational Database section.

    If you have recently used AnalyticDB for PostgreSQL, you can select it in the Recently Used section. You can also enter keywords related to AnalyticDB for PostgreSQL in the search box to find it quickly.

  4. On the Create AnalyticDB For PostgreSQL Data Source page, configure the connection parameters.

    1. Configure the basic information of the data source.

      Datasource Name

      The name must meet the following requirements:

      • It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).

      • The name can be up to 64 characters in length.

      Datasource Code

      After you configure the data source code, you can reference tables in the data source in Flink SQL tasks by using the format data_source_code.table_name or data_source_code.schema.table_name. For example, if the data source code is my_ds, you can reference a table as my_ds.public.orders. If you need to automatically access the data source that matches the current environment, use the variable format ${data_source_code}.table_name or ${data_source_code}.schema.table_name. For more information, see Dataphin data source table development method.

      Important
      • The data source code cannot be modified after it is configured successfully.

      • After the data source code is configured successfully, you can preview data on the object details page in the asset directory and asset inventory.

      • In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are currently supported.

      Version

      You can select 6.x or 7.x. The default selection is 6.x.

      Data Source Description

      A brief description of the data source. It cannot exceed 128 characters.

      Time Zone

      The time zone is used to process time-formatted data in integration tasks; a short illustration follows this parameter list. The default time zone is Asia/Shanghai. Click Modify to select a target time zone from the following options:

      • GMT: GMT-12:00, GMT-11:00, GMT-10:00, GMT-09:30, GMT-09:00, GMT-08:00, GMT-07:00, GMT-06:00, GMT-05:00, GMT-04:00, GMT-03:00, GMT-02:30, GMT-02:00, GMT-01:00, GMT+00:00, GMT+01:00, GMT+02:00, GMT+03:00, GMT+03:30, GMT+04:00, GMT+04:30, GMT+05:00, GMT+05:30, GMT+05:45, GMT+06:00, GMT+06:30, GMT+07:00, GMT+08:00, GMT+08:45, GMT+09:00, GMT+09:30, GMT+10:00, GMT+10:30, GMT+11:00, GMT+12:00, GMT+12:45, GMT+13:00, GMT+14:00.

      • Daylight Saving Time: Africa/Cairo, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, America/Sao_Paulo, Asia/Bangkok, Asia/Dubai, Asia/Kolkata, Asia/Shanghai, Asia/Tokyo, Atlantic/Azores, Australia/Sydney, Europe/Berlin, Europe/London, Europe/Moscow, Europe/Paris, Pacific/Auckland, Pacific/Honolulu.

      Data Source Configuration

      Select the data source configuration mode:

      • If your business data source distinguishes between production and development data sources, select Production + Development Data Source.

      • If your business data source does not distinguish between production and development data sources, select Production Data Source.

      Tag

      You can categorize and tag data sources using tags. For information on how to create tags, see Manage data source tags.
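
      As an illustration of what the Time Zone setting affects (a standalone Java sketch, not Dataphin code; the instant and zones are arbitrary examples), the same instant renders differently depending on the configured zone:

      import java.time.Instant;
      import java.time.ZoneId;
      import java.time.format.DateTimeFormatter;

      public class TimeZoneDemo {
          public static void main(String[] args) {
              // One fixed instant, such as a value read from the source database.
              Instant instant = Instant.parse("2025-07-07T00:00:00Z");
              DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

              // Rendered in the default Dataphin time zone (Asia/Shanghai, GMT+08:00).
              System.out.println(fmt.format(instant.atZone(ZoneId.of("Asia/Shanghai"))));
              // Prints: 2025-07-07 08:00:00

              // Rendered in GMT+00:00 instead.
              System.out.println(fmt.format(instant.atZone(ZoneId.of("GMT+00:00"))));
              // Prints: 2025-07-07 00:00:00
          }
      }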

    2. Configure the connection parameters between the data source and Dataphin.

      If you selected Production + Development Data Source, you must configure connection information for both the production and the development data source. If you selected Production Data Source, you only need to configure connection information for the production data source.

      Note

      Typically, production and development data sources should be configured as separate data sources to achieve environment isolation between development and production, reducing the impact of development activities on production data sources. However, Dataphin also supports configuring them as the same data source with identical parameter values.

      For Configuration Method, you can choose either JDBC URL or Host. The default selection is JDBC URL.

      JDBC URL configuration method

      JDBC URL

      Enter the JDBC URL of the database in the format jdbc:postgresql://<server address>:<port>/<database name>. You can configure one or more server addresses. Separate multiple addresses with commas (,). A connection sketch follows this parameter list.

      Schema

      Enter the schema associated with the username.

      Username, Password

      The username and password of the database.
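
      Before filling in these parameters, you may want to verify the JDBC URL outside Dataphin. The following is a minimal Java sketch that assumes the open-source PostgreSQL JDBC driver is on the classpath (AnalyticDB for PostgreSQL is compatible with the PostgreSQL protocol); the addresses, database name, schema, and credentials are placeholders, and currentSchema is the pgJDBC property assumed here to play the role of the Schema field:

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.ResultSet;
      import java.sql.Statement;
      import java.util.Properties;

      public class AdbPgConnectionCheck {
          public static void main(String[] args) throws Exception {
              // Placeholder values; multiple addresses are comma-separated,
              // as in the Dataphin JDBC URL field.
              String url = "jdbc:postgresql://adb-pg-host1:5432,adb-pg-host2:5432/mydb";

              Properties props = new Properties();
              props.setProperty("user", "myuser");           // database username
              props.setProperty("password", "mypassword");   // database password
              props.setProperty("currentSchema", "public");  // counterpart of the Schema field

              try (Connection conn = DriverManager.getConnection(url, props);
                   Statement stmt = conn.createStatement();
                   ResultSet rs = stmt.executeQuery("SELECT 1")) {
                  if (rs.next()) {
                      System.out.println("Connection OK");
                  }
              }
          }
      }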

      Host configuration method
      • Server address configuration

        Server Address

        Enter the IP address and port number of the server.

        You can click +Add to add multiple sets of IP addresses and port numbers, and click the delete icon to remove extra entries. You must keep at least one set.

        dbname

        Enter the database name.

      • Parameter configuration

        Parameter

        • Parameter name: You can select an existing parameter name or enter a custom one.

          Custom parameter names can contain only letters, digits, periods (.), underscores (_), and hyphens (-).

        • Parameter value: Required once a parameter name is selected. It can contain only letters, digits, periods (.), underscores (_), and hyphens (-), and cannot exceed 256 characters.

        Note

        You can click +Add Parameter to add multiple parameters, and click the delete icon to remove extra parameters. You can add up to 30 parameters.

        Schema (optional)

        Enter the schema to read from. Cross-schema table selection is supported; select the schema in which the target table is located. If not specified, the schema configured in the data source is used by default.

        Username, Password

        Enter the username and password of the database.

      Note

      After you create a data source with the Host configuration method, if you switch to the JDBC URL configuration method, the system concatenates the configured server addresses, port numbers, and database name into a JDBC URL (see the sketch that follows).
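
      As a sketch of that concatenation (illustrative only, not Dataphin's actual code; the addresses and database name are placeholders):

      public class HostToJdbcUrl {
          public static void main(String[] args) {
              // The Server Address entries and dbname from the Host configuration.
              String[] serverAddresses = {"adb-pg-host1:5432", "adb-pg-host2:5432"};
              String dbname = "mydb";

              // Comma-separate the address:port pairs, then append the database name.
              String url = "jdbc:postgresql://" + String.join(",", serverAddresses) + "/" + dbname;
              System.out.println(url);
              // Prints: jdbc:postgresql://adb-pg-host1:5432,adb-pg-host2:5432/mydb
          }
      }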

    3. Configure advanced settings for the data source.

      connectTimeout

      The connection timeout period of the database, in seconds. The default is 900 seconds (15 minutes).

      Note
      • If the JDBC URL already contains a connectTimeout setting, that value is used as the timeout.

      • For data sources created before Dataphin V3.11, the default connectTimeout is -1, which means no timeout limit.

      socketTimeout

      The socket timeout period of the database, in seconds. The default is 1800 seconds (30 minutes).

      Note
      • If the JDBC URL already contains a socketTimeout setting, that value is used as the timeout.

      • For data sources created before Dataphin V3.11, the default socketTimeout is -1, which means no timeout limit.

      Connection Retry Count

      If the database connection times out, the system automatically retries until the specified retry count is reached. If the connection still fails after the maximum number of retries, the connection attempt fails.

      Note
      • The default retry count is 1. You can configure a value between 0 and 10.

      • The connection retry count will be applied by default to Offline Integration Tasks and Global Quality (requires the Asset Quality function module to be enabled). In offline integration tasks, you can configure task-level retry counts separately.
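
      The retry semantics described above can be pictured with this minimal Java sketch (an illustration of the documented behavior, not Dataphin's implementation; connect() is a hypothetical helper that opens a connection or throws on timeout):

      import java.sql.Connection;
      import java.sql.SQLException;

      public class RetryIllustration {
          // Hypothetical helper: opens a connection or throws on timeout.
          static Connection connect() throws SQLException {
              throw new SQLException("connection timed out"); // placeholder body
          }

          // One initial attempt plus up to maxRetries retries (0-10, default 1).
          static Connection connectWithRetries(int maxRetries) throws SQLException {
              SQLException last = null;
              for (int attempt = 0; attempt <= maxRetries; attempt++) {
                  try {
                      return connect();
                  } catch (SQLException e) {
                      last = e; // remember the failure; retry if attempts remain
                  }
              }
              throw last; // maximum retry count reached; the connection fails
          }
      }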

    Note

    Rules for duplicate parameter values:

    • If a parameter exists in JDBC URL, Advanced Settings, and Host Configuration parameter settings simultaneously, the value in the JDBC URL takes precedence.

    • If a parameter exists in both JDBC URL and Advanced Settings, the value in the JDBC URL takes precedence.

    • If a parameter exists in both Advanced Settings and Host Configuration parameter settings, the value in Advanced Settings takes precedence.
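
    These precedence rules can be summarized in a compact Java sketch (illustrative only; the parameter maps are hypothetical stand-ins for the three configuration sources):

    import java.util.Map;

    public class ParamPrecedence {
        // Resolve one parameter per the documented priority:
        // JDBC URL > Advanced Settings > Host Configuration parameters.
        static String resolve(String name,
                              Map<String, String> jdbcUrlParams,
                              Map<String, String> advancedSettings,
                              Map<String, String> hostParams) {
            if (jdbcUrlParams.containsKey(name)) return jdbcUrlParams.get(name);
            if (advancedSettings.containsKey(name)) return advancedSettings.get(name);
            return hostParams.get(name);
        }

        public static void main(String[] args) {
            // connectTimeout appears in all three places; the JDBC URL value wins.
            String value = resolve("connectTimeout",
                    Map.of("connectTimeout", "600"),   // from the JDBC URL
                    Map.of("connectTimeout", "900"),   // from Advanced Settings
                    Map.of("connectTimeout", "300"));  // from Host Configuration
            System.out.println(value); // Prints: 600
        }
    }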

  5. Select a Default Resource Group. This resource group is used to run tasks related to the current data source, including database SQL, offline database migration, data preview, and more.

  6. Click Test Connection or directly click OK to save and complete the creation of the AnalyticDB for PostgreSQL data source.

    When you click Test Connection, the system tests whether Dataphin can connect to the data source. If you click OK directly, the system automatically tests the connection first. However, the data source can still be created even if the connection test fails.