The AnalyticDB for PostgreSQL input component is designed to read data from an AnalyticDB for PostgreSQL data source. When synchronizing data from this source to other destinations, it's necessary to configure the source data source information for the AnalyticDB for PostgreSQL input component before setting up the target data source for synchronization. This topic describes the configuration process for the AnalyticDB for PostgreSQL input component.
Prerequisites
An AnalyticDB for PostgreSQL data source has been created. For more information, see Create AnalyticDB for PostgreSQL Data Source.
To configure the properties of the AnalyticDB for PostgreSQL input component, your account must have read permission on the data source. If you lack this permission, request it first. For more information, see Request Data Source Permission.
Procedure
On the Dataphin home page, navigate to the top menu bar and select Development > Data Integration.
In the top menu bar of the integration page, select the project (in Dev-Prod mode, you must also select an environment).
In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, select the offline pipeline that you want to develop to open its configuration page.
Click Component Library in the upper-right corner to open the Component Library panel.
In the Component Library panel's left-side navigation pane, select Input. Then, in the right-side list of input components, locate the AnalyticDB for PostgreSQL component and drag it onto the canvas.
On the AnalyticDB for PostgreSQL input component card, click the configuration icon to open the AnalyticDB for PostgreSQL Input Configuration dialog box.
In the AnalyticDB for PostgreSQL Input Configuration dialog box, set the required parameters as listed.
Parameter
Description
Step Name
This is the name of the AnalyticDB for PostgreSQL input component. Dataphin automatically generates the step name, but you can modify it according to the business scenario. The naming convention is as follows:
It can only contain Chinese characters, letters, underscores (_), and numbers.
It can be up to 64 characters in length.
Datasource
The data source drop-down list displays all AnalyticDB for PostgreSQL data sources at the project level in the current Dataphin tenant, including data sources with and without read permission. Click the copy icon to copy the name of the current data source.
For a data source without read permission, you can click Request next to the data source to request read permission. For more information, see Request Data Source Permission.
If you do not have an AnalyticDB for PostgreSQL type data source, click Create Data Source to create a data source. For more information, see Create AnalyticDB for PostgreSQL Data Source.
Schema (optional)
Cross-schema table selection is supported. Select the schema where the table is located. If left unspecified, the schema configured in the data source is used by default.
Source Table Quantity
Select the source table quantity. The source table quantity includes Single Table and Multiple Tables:
Single Table: Suitable for scenarios where business data from one table is synchronized to one target table.
Multiple Tables: Suitable for scenarios where business data from multiple tables is synchronized to the same target table. When data from multiple tables is written to the same destination table, a union of the tables is performed.
Table
Select the source table:
If Source Table Quantity is set to Single Table, enter a table name keyword to search, or enter the exact table name and click Precise Search. After you select the table, the system automatically checks the table status. Click the copy icon to copy the name of the selected table.
If Source Table Quantity is set to Multiple Tables, perform the following operations to add tables.
In the input box, enter a table expression to filter tables with the same structure. Enumeration form, regular-expression-like form, and a mix of both are supported, for example table_[001-100];table_102.
Click Precise Search to view the list of matching tables in the Confirm Match Details dialog box.
Click Confirm.
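To make the multi-table matching concrete, the following sketch expands an enumeration-and-range expression such as table_[001-100];table_102 into a list of table names. The helper name expand_tables and its parsing rules are hypothetical illustrations, not Dataphin's actual matcher:

```python
import re

def expand_tables(expr: str) -> list[str]:
    """Expand a table expression such as 'table_[001-100];table_102'
    into concrete table names (illustrative sketch only)."""
    names = []
    for part in expr.split(";"):
        m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if m:
            prefix, lo, hi, suffix = m.groups()
            width = len(lo)  # preserve zero padding, e.g. 001 .. 100
            for i in range(int(lo), int(hi) + 1):
                names.append(f"{prefix}{str(i).zfill(width)}{suffix}")
        else:
            names.append(part)  # plain enumeration entry
    return names

tables = expand_tables("table_[001-100];table_102")
```

Under this reading, the example expression matches the 100 tables table_001 through table_100 plus the single table table_102.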
Shard Key (optional)
The system shards data based on the configured shard key field, which can be used together with the concurrency configuration to achieve concurrent reading. Any column of the source table can be used as the shard key, but using the primary key or an indexed column is recommended to ensure transmission performance.
Important: When a date/time type column is selected, the system detects the minimum and maximum values and splits the total time range evenly across the configured concurrency. An even distribution of rows across shards is not guaranteed.
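The sharding behavior described above can be sketched as follows: given the detected minimum and maximum of the shard column and a concurrency N, the value range is cut into N contiguous slices, each read by one concurrent task. This is an assumption-laden illustration using integer keys, not Dataphin's internal algorithm:

```python
def split_range(lo: int, hi: int, concurrency: int) -> list[tuple[int, int]]:
    """Split the closed range [lo, hi] into `concurrency` contiguous
    half-open slices. Only the VALUE range is divided evenly; the row
    count per slice may be skewed, as the note above warns."""
    span = hi - lo + 1
    step, rem = divmod(span, concurrency)
    slices, start = [], lo
    for i in range(concurrency):
        end = start + step + (1 if i < rem else 0)
        slices.append((start, end))  # rows where start <= key < end
        start = end
    return slices

# Each slice would become one reader task, e.g. a query with
# WHERE shard_key >= start AND shard_key < end.
slices = split_range(1, 100, 4)
```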
Batch Read Count (optional)
The number of data records read per request. When reading from the source database, you can configure a batch read count (for example, 1,024 records) instead of reading rows one by one. This reduces the number of interactions with the data source, improves I/O efficiency, and lowers network latency.
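The batch-read pattern can be sketched with the standard DB-API fetchmany call. The snippet below uses Python's built-in sqlite3 as a self-contained stand-in for a real AnalyticDB for PostgreSQL connection, purely to show the pattern; the BATCH value and table are illustrative:

```python
import sqlite3

# Stand-in database: sqlite3 instead of AnalyticDB for PostgreSQL,
# so the sketch runs without external dependencies.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 [(i, f"v{i}") for i in range(3000)])

BATCH = 1024  # batch read count: rows fetched per round trip
cur = conn.execute("SELECT id, val FROM src")
total = 0
while True:
    rows = cur.fetchmany(BATCH)  # one batch instead of row-by-row reads
    if not rows:
        break
    total += len(rows)           # process the batch here
conn.close()
```

With 3,000 rows and a batch size of 1,024, the loop needs three fetches instead of 3,000 single-row round trips.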
Input Filter (optional)
Configure the filter conditions for extracting data. The detailed configuration instructions are as follows:
Configure a static condition to extract the corresponding data, for example ds=20210101.
Configure variable parameters to extract a portion of the data, for example ds=${bizdate}.
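A variable filter such as ds=${bizdate} is resolved at run time by substituting the scheduling parameter's value. The sketch below shows one way such substitution could work, using Python's string.Template; it is a hypothetical illustration, not Dataphin's actual parameter engine:

```python
from string import Template

# ${bizdate} is a placeholder resolved from the scheduling context;
# the value below is an illustrative example.
filter_expr = "ds=${bizdate}"
resolved = Template(filter_expr).substitute(bizdate="20210101")
```

After substitution, the filter behaves exactly like the static form ds=20210101.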
Output Fields
The output fields area displays all fields hit by the selected table and filter conditions. If you do not need to output certain fields to downstream components, you can delete the corresponding fields:
Note: When the tenant's compute engine is AnalyticDB for PostgreSQL, the output fields of the AnalyticDB for PostgreSQL input component support viewing the classification and level of fields. Other compute engines do not support this.
Single Field Deletion Scenario: To delete a small number of fields, click the delete icon in the operation column for each extra field.
Batch Field Deletion Scenario: To delete many fields, click Field Management, select multiple fields in the Field Management dialog box, click the shift-left icon to move the selected input fields to the unselected list, and then click Confirm to complete the batch deletion.
Click Confirm to finalize the property configuration for the AnalyticDB for PostgreSQL input component.