The AnalyticDB for PostgreSQL input component reads data from an AnalyticDB for PostgreSQL data source. When you need to synchronize data from an AnalyticDB for PostgreSQL data source to other data sources, you must first configure the source data source information for the AnalyticDB for PostgreSQL input component, and then configure the target data source for data synchronization. This topic describes how to configure an AnalyticDB for PostgreSQL input component.
Prerequisites
An AnalyticDB for PostgreSQL data source is created. For more information, see Create an AnalyticDB for PostgreSQL data source.
The account used to configure the AnalyticDB for PostgreSQL input component has read permission on the data source. If the account does not have the permission, you must request it first. For more information, see Request data source permissions.
Procedure
In the top navigation bar of the Dataphin homepage, choose Develop > Data Integration.
In the top navigation bar of the Integration page, select a project (In Dev-Prod mode, you need to select an environment).
In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop to open its configuration page.
Click Component Library in the upper-right corner of the page to open the Component Library panel.
In the left-side navigation pane of the Component Library panel, select Inputs. Find the AnalyticDB for PostgreSQL component in the input component list on the right and drag it to the canvas.
Click the icon on the AnalyticDB for PostgreSQL input component card to open the AnalyticDB for PostgreSQL Input Configuration dialog box. In the dialog box, configure the following parameters.
Parameter
Description
Step Name
The name of the AnalyticDB for PostgreSQL input component. Dataphin automatically generates a step name, which you can modify based on your business scenario. The name must meet the following requirements:
It can contain only Chinese characters, letters, underscores (_), and digits.
It can be up to 64 characters in length.
Datasource
The data source drop-down list displays all AnalyticDB for PostgreSQL data sources at the project level in the current Dataphin instance, including data sources for which you do not have read permission. Click the copy icon to copy the name of the current data source.
For a data source for which you do not have read permission, click Request next to the data source to request it. For more information, see Request data source permissions.
If no AnalyticDB for PostgreSQL data source is available, click Create Data Source to create one. For more information, see Create an AnalyticDB for PostgreSQL data source.
Time Zone
The system processes time-formatted data based on the configured time zone. The default is the time zone configured in the selected data source and cannot be modified.
Note: For tasks created before V5.1.2, you can select Default Data Source Configuration or Channel Configuration Time Zone. The default is Channel Configuration Time Zone.
Default Data Source Configuration: The default time zone of the selected data source.
Channel Configuration Time Zone: The time zone configured in Properties > Channel Configuration for the current integration task.
Schema (optional)
You can select tables across schemas. Select the schema where the source table is located. If you do not specify a schema, the schema configured in the data source is used, as in the sketch below.
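For context, a minimal sketch of what the Schema setting maps to on the PostgreSQL side; the schema name sales and table name orders are hypothetical:

-- With Schema set to sales, the component reads the schema-qualified table:
SELECT * FROM sales.orders;
-- With Schema left empty, the schema configured in the data source is used, for example:
SELECT * FROM public.orders;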
Source Table Quantity
Select the source table quantity. Options include Single Table and Multiple Tables:
Single Table: Applicable to scenarios where business data from one table is synchronized to one target table.
Multiple Tables: Applicable to scenarios where business data from multiple tables with the same structure is synchronized to the same target table. Data from the multiple tables is combined using a union algorithm before it is written, conceptually like the sketch after this list.
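A minimal sketch of the union behavior, assuming two hypothetical source tables table_001 and table_002 with identical structures; this illustrates the concept, not Dataphin's internal implementation:

-- A multi-table read behaves conceptually like a UNION ALL over the matched tables:
SELECT id, user_name FROM table_001
UNION ALL
SELECT id, user_name FROM table_002;
-- Rows from all matched tables are concatenated into a single stream.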
Table
Select the source table:
If you set Source Table Quantity to Single Table, enter a keyword to search for tables, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks the table status. Click the copy icon to copy the name of the selected table.
If you set Source Table Quantity to Multiple Tables, perform the following steps to add tables:
In the input box, enter a table expression that matches tables with the same structure. Enumeration format, regular-expression-like format, and a combination of both are supported. For example, table_[001-100];table_102 matches table_001 through table_100 plus table_102.
Click Exact Match. In the Confirm Matching Details dialog box, view the list of matched tables.
Click OK.
Shard Key (optional)
The system shards data based on the configured shard key field, which works together with the concurrency setting to read data in parallel. You can use a column of the source table as the shard key. We recommend that you use the primary key or an indexed column as the shard key to ensure transmission performance. The sketch below illustrates the general range-splitting technique.
Important: If you select a column of a date/time type, the system identifies its minimum and maximum values and force-splits the data based on the total time range and the concurrency. An even distribution across shards is not guaranteed.
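A minimal sketch of range-based sharding, assuming a hypothetical table orders with an indexed integer primary key id and a concurrency of 2; it illustrates the general technique, not Dataphin's exact queries:

-- 1. Determine the overall key range:
SELECT MIN(id), MAX(id) FROM orders;
-- 2. Split the range into one slice per concurrent reader, e.g. for MIN = 1 and MAX = 1000:
SELECT * FROM orders WHERE id >= 1 AND id < 501;     -- reader 1
SELECT * FROM orders WHERE id >= 501 AND id <= 1000; -- reader 2
-- An index on the shard key keeps each slice scan efficient.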
Batch Read Count (optional)
The number of records to read per batch. Instead of reading records one by one, you can configure a batch size (for example, 1,024 records) when reading from the source database. Batching reduces the number of interactions with the data source, improves I/O efficiency, and reduces network latency.
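For context, a sketch of what batched reads typically look like on the PostgreSQL side, using a server-side cursor; the table orders is hypothetical, and this is not necessarily Dataphin's exact mechanism:

BEGIN;
DECLARE batch_cur CURSOR FOR SELECT id, user_name FROM orders;
FETCH 1024 FROM batch_cur;  -- one round trip returns up to 1,024 rows
FETCH 1024 FROM batch_cur;  -- repeat until no rows are returned
CLOSE batch_cur;
COMMIT;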
Input Filter (optional)
Configure filter conditions for data extraction. (A sketch of the resulting queries follows these examples.)
Configure a static value to extract the matching data, for example, ds=20210101.
Configure a variable parameter to extract a specific portion of the data, for example, ds=${bizdate}.
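A minimal sketch of how a filter condition constrains the rows that are extracted; the table and column names are hypothetical, and ${bizdate} is resolved from scheduling parameters before the query runs:

-- Static filter: ds=20210101 extracts only that day's data
SELECT * FROM orders WHERE ds = '20210101';
-- Variable filter: ds=${bizdate} is substituted at run time, for example ds = '20240601'
SELECT * FROM orders WHERE ds = '${bizdate}';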
Output Fields
Note: When the tenant's compute engine is AnalyticDB for PostgreSQL, the output fields of the AnalyticDB for PostgreSQL input component support viewing the classification and grading of fields. Other compute engines do not support this feature.
The output fields area displays all fields from the selected table that match the filtering conditions. You can perform the following operations:
Field Management: If you do not need to output certain fields to downstream components, you can delete them:
To delete a small number of fields, click the delete icon in the Actions column of each field you do not need.
To delete many fields at once, click Field Management. In the Field Management dialog box, select the fields, click the left arrow icon to move them from the selected input fields to the unselected input fields, and then click OK.
Batch Add: Click Batch Add to configure fields in JSON, TEXT, or DDL format.
Note: After you complete a batch add and click OK, the result overwrites the previously configured field information.
Batch configuration in JSON format, for example:
[
  { "index": 1, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" },
  { "index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }
]
Note: index indicates the column number of the specified object, name indicates the field name after import, and type indicates the field type after import. For example, "index":3,"name":"user_id","type":"String" means that the 4th column of the file is imported, with user_id as the field name and String as the field type.
Batch configuration in TEXT format, for example:
1,id,int(10),Long,comment1
2,user_name,varchar(255),String,comment2
The row delimiter separates the entries for each field. The default is a line feed (\n); line feed (\n), semicolon (;), and period (.) are supported.
The column delimiter separates the field name, field type, and other attributes within an entry. The default is a comma (,). The field type can be omitted. (See the example below for the semicolon row delimiter.)
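For illustration, the same two fields written with a semicolon (;) as the row delimiter instead of a line feed; this is a direct application of the delimiter rules above:

1,id,int(10),Long,comment1;2,user_name,varchar(255),String,comment2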
Batch configuration in DDL format, for example:
CREATE TABLE tablename (
  user_id serial,
  username VARCHAR(50),
  password VARCHAR(50),
  email VARCHAR(255),
  created_on TIMESTAMP
);
Create Output Field: Click + Create Output Field, then enter Column, Type, and Comment, and select Mapping Type as prompted. After you complete the configuration for the current row, click the save icon to save it.
Click OK to complete the property configuration of the AnalyticDB for PostgreSQL input component.