The Greenplum input component reads data from a Greenplum data source. To synchronize data from Greenplum to another data source, first configure the Greenplum input component to read from the source, then configure the target data source. This topic describes how to configure the Greenplum input component.
Prerequisites
A Greenplum data source has been created. For more information, see create GreenPlum data source.
To configure the properties of the Greenplum input component, your account must have read-through permission for the data source. If it does not, request the permission first. For more information, see request data source permission.
Procedure
Go to the development page for the offline pipeline script. For more information, see offline pipeline component development entry.
Navigate to the Greenplum Input Configuration page by following these steps:
Click Component Library, then click Input. Drag the Greenplum input component onto the canvas, and then click the icon on the component to open its configuration.
In the Greenplum Input Configuration dialog box, set the parameters as follows:
Parameter
Description
Step Name
Enter a name for the Greenplum input component. Dataphin generates a default name, which you can change to suit your business needs. The name must meet the following requirements:
It can contain only Chinese characters, letters, underscores (_), and digits.
It cannot exceed 64 characters.
Datasource
The data source drop-down list shows all Greenplum data sources in Dataphin, including those for which you have read-through permission and those for which you do not.
If you do not have read-through permission for a data source, click Request next to the data source name to request it. For more information, see request data source permission.
If you have not yet created a Greenplum data source, click Create to create one. For more information, see create GreenPlum data source.
Schema
Supports reading tables across schemas. Select the schema that contains the source table. The data source connection may specify a schema by default, but you can select a different schema if you have the required permissions.
Source Table Quantity
Choose the quantity of source tables, which can be either a single table or multiple tables:
Single table: Ideal for situations where data from a single source table needs to be synchronized with a single target table.
Multiple tables: Suitable when data from several source tables is consolidated into one target table. Table names can be matched in enumeration form, regex-like form, or a mix of both, for example, table_[001-100];table_102.
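To make the mixed form concrete, the following Python sketch expands an expression such as table_[001-100];table_102 into individual table names. It assumes that semicolons separate entries and that a bracketed range such as [001-100] is inclusive and keeps its zero padding. The function name expand_table_pattern is hypothetical, and the matching rules that Dataphin actually applies may differ.

```python
import re

# Matches one bracketed numeric range such as [001-100].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_table_pattern(expr: str) -> list:
    """Expand a semicolon-separated table expression into table names.

    Assumption (not confirmed by the Dataphin documentation): a range like
    [001-100] is inclusive and preserves the zero-padded width of its
    lower bound.
    """
    names = []
    for part in expr.split(";"):
        part = part.strip()
        if not part:
            continue
        match = _RANGE.search(part)
        if match is None:
            # Enumeration form: the entry is a literal table name.
            names.append(part)
            continue
        start, end = match.group(1), match.group(2)
        width = len(start)
        prefix, suffix = part[: match.start()], part[match.end():]
        for i in range(int(start), int(end) + 1):
            names.append(f"{prefix}{i:0{width}d}{suffix}")
    return names


tables = expand_table_pattern("table_[001-100];table_102")
print(len(tables), tables[0], tables[99], tables[-1])
# prints: 101 table_001 table_100 table_102
```

Under these assumptions, the example expression covers 101 tables: table_001 through table_100, plus table_102.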
Shard Key
The column used to split the source data into shards that are read concurrently. Use an integer-type column from the source table as the shard key, preferably the primary key or an indexed column, to improve synchronization efficiency.
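One way to see why an integer shard key helps: a reader can split the key's value range into contiguous intervals and read each interval concurrently. The sketch below is illustrative only. The names split_key_range, my_schema.my_table, and id are hypothetical, and this topic does not describe the splitting strategy that Dataphin uses internally.

```python
def split_key_range(min_key: int, max_key: int, shards: int) -> list:
    """Split the inclusive range [min_key, max_key] into up to `shards` intervals."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    if max_key < min_key:
        return []
    total = max_key - min_key + 1
    shards = min(shards, total)  # never create empty intervals
    base, extra = divmod(total, shards)
    ranges = []
    start = min_key
    for i in range(shards):
        # The first `extra` intervals take one additional key each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Each interval becomes the predicate of one concurrent read.
for low, high in split_key_range(1, 10, 3):
    print(f"SELECT * FROM my_schema.my_table WHERE id BETWEEN {low} AND {high}")
```

Range predicates like these are cheap to evaluate when the key is indexed, which is why a primary key or indexed column is the preferred choice.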
Batch Read Count
Specify the number of records to read in a batch to optimize I/O efficiency and reduce network latency, as opposed to reading records individually.
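Batch reading corresponds to the common pattern of fetching rows in fixed-size chunks instead of one at a time. The sketch below uses the standard Python DB-API fetchmany call. An in-memory SQLite database stands in for Greenplum only so that the example is self-contained; read_in_batches and the table src are hypothetical names.

```python
import sqlite3
from typing import Iterator


def read_in_batches(cursor: sqlite3.Cursor, batch_size: int) -> Iterator[list]:
    """Yield lists of at most `batch_size` rows from an executed cursor."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# SQLite is a stand-in here; the same DB-API pattern applies to other drivers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, ds TEXT)")
conn.executemany(
    "INSERT INTO src (id, ds) VALUES (?, ?)",
    [(i, "20210101") for i in range(1, 11)],
)

cur = conn.execute("SELECT id, ds FROM src ORDER BY id")
batch_sizes = [len(batch) for batch in read_in_batches(cur, 4)]
print(batch_sizes)  # [4, 4, 2]
conn.close()
```

With ten rows and a batch size of 4, the reader makes three fetches instead of ten, which is the I/O saving that this parameter controls.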
Input Filter
Set up the filter conditions to extract data according to the following configuration instructions:
Set static field values to extract specific data, for example, ds=20210101.
Set variable parameters to extract a portion of the data, for example, ds=${bizdate}.
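The difference between the two filter styles can be sketched in Python. The example assumes that the filter is appended to the read query as a WHERE condition and that ${bizdate} is replaced with a value supplied for each run; build_query and my_schema.my_table are hypothetical names, and Dataphin performs the actual substitution itself.

```python
from string import Template


def build_query(table: str, filter_template: str, params: dict) -> str:
    """Substitute ${name} placeholders in the filter and append it as a WHERE clause."""
    condition = Template(filter_template).substitute(params)
    return f"SELECT * FROM {table} WHERE {condition}"


# Static filter: the condition contains no placeholders and is used unchanged.
print(build_query("my_schema.my_table", "ds=20210101", {}))

# Variable filter: ${bizdate} is replaced with the value supplied for the run.
print(build_query("my_schema.my_table", "ds=${bizdate}", {"bizdate": "20210101"}))
```

Both calls produce the same query text here. The variable form becomes useful when the pipeline runs repeatedly, because each run can supply a different bizdate without any change to the configuration.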
Output fields
The output fields area lists all fields returned for the selected table and filter conditions. To prevent specific fields from being sent to downstream components, remove them:
Delete a single field: To delete a small number of fields, click the icon in the Operation column of each field that you want to remove.
Delete fields in batches: To delete many fields at once, click Field Management, select the fields in the Field Management dialog box, click the left shift icon to move the selected fields to the unselected list, and then click Confirm.
To finalize the configuration, click Confirm.