The openGauss input component reads data from an openGauss data source. To synchronize data from an openGauss data source to another data source, first configure the openGauss input component to read the source data, and then configure the destination data source. This topic describes how to configure the openGauss input component.
Prerequisites
An openGauss data source is created. For more information, see Create an openGauss data source.
The account used to configure the openGauss input component must have read-through permissions on the data source. If the account does not have the required permissions, request them. For more information, see Request, renew, and return data source permissions.
Procedure
On the Dataphin home page, choose Develop > Data Integration from the top menu bar.
In the top menu bar of the Data Integration page, select a Project. If you are in Dev-Prod mode, you must also select an environment.
In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.
In the upper-right corner of the page, click Component Library to open the Component Library panel.
In the navigation pane on the left of the Component Library panel, select Input. Find the openGauss component in the list of input components on the right and drag it to the canvas.
Click the icon on the openGauss input component card to open the openGauss Input Configuration dialog box.
In the openGauss Input Configuration dialog box, configure the parameters.
Parameter
Description
Step Name
The name of the openGauss input component. Dataphin automatically generates a step name. You can also change it as needed. The naming convention is as follows:
Can contain only Chinese characters, letters, underscores (_), and numbers.
Cannot exceed 64 characters in length.
Datasource
The drop-down list displays all openGauss data sources in the current Dataphin project, including data sources for which you have read-through permissions and those for which you do not. Click the icon to copy the name of the current data source.
For data sources for which you lack read-through permissions, click Request next to the data source to request the permissions. For more information, see Request, renew, and return data source permissions.
If you do not have an openGauss data source, click Create to create one. For more information, see Create an openGauss data source.
Schema
Cross-schema table reads are supported. Select the schema where the source table is located.
Number of source tables
Select the number of source tables. Options include Single table and Multiple tables:
Single table: Use this option to synchronize data from one table to one destination table.
Multiple tables: Use this option to synchronize data from multiple tables to a single destination table. When data from multiple tables is written to one table, a union algorithm is used.
Table match pattern
Select General rule or Database regex.
Note: This parameter is available only when Number of source tables is set to Multiple tables.
Table
Select the source table or tables:
If Number of source tables is set to Single table, you can enter a keyword to search for the table name, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks its status. Click the icon to copy the name of the selected table.
If Number of source tables is set to Multiple tables, enter an expression to add tables based on the selected match pattern.
If Table match pattern is set to General rule, enter an expression in the input box to filter for tables with the same structure. The system supports enumerations, regular expression-like patterns, and a mix of both. For example: table_[001-100];table_102.
If Table match pattern is set to Database regex, enter a regular expression supported by the current database. The system uses this expression to match tables in the source database. At runtime, the node re-evaluates the regular expression and synchronizes the current set of matched tables.
After you enter the expression, click Exact Match to view a list of matched tables in the Confirm Match Details dialog box.
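To make the General rule behavior concrete, the sketch below expands an enumeration expression such as table_[001-100];table_102 into individual table names. The function name and the exact expansion rules are assumptions for illustration; this is not Dataphin's actual matching engine.

```python
import re

def expand_pattern(pattern: str) -> list[str]:
    """Expand a General-rule style expression such as
    'table_[001-003];table_102' into concrete table names.
    Illustrative sketch only, not Dataphin's matcher."""
    names = []
    for part in pattern.split(";"):
        # A bracketed numeric range like [001-003] is enumerated;
        # anything else is taken as a literal table name.
        m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if m:
            prefix, lo, hi, suffix = m.groups()
            width = len(lo)  # preserve zero-padding such as 001
            for i in range(int(lo), int(hi) + 1):
                names.append(f"{prefix}{str(i).zfill(width)}{suffix}")
        else:
            names.append(part)
    return names

print(expand_pattern("table_[001-003];table_102"))
```

Clicking Exact Match would show a list like this in the Confirm Match Details dialog box.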
Split key
Use a column of an integer type in the source table as the split key. A primary key or an indexed column is recommended for the split key. When reading data, the system partitions the data based on the configured split key to enable concurrent reads. This improves data synchronization efficiency.
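The idea behind the split key can be sketched as follows: the reader takes the minimum and maximum values of the integer column and divides that range into one slice per concurrent reader, each slice becoming a WHERE clause. The function and the even-range strategy are assumptions for illustration, not Dataphin's internal algorithm.

```python
def split_ranges(min_id: int, max_id: int, concurrency: int, column: str = "id") -> list[str]:
    """Partition [min_id, max_id] into roughly equal slices, one per
    concurrent reader, and emit a WHERE clause for each slice.
    Hypothetical sketch of split-key based parallel reads."""
    span = max_id - min_id + 1
    step = -(-span // concurrency)  # ceiling division
    clauses = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + step - 1, max_id)
        clauses.append(f"{column} >= {lo} AND {column} <= {hi}")
        lo = hi + 1
    return clauses

for clause in split_ranges(1, 100, 4):
    print(clause)
```

Each clause can then be read by a separate connection, which is why a primary key or indexed column works best: the database can serve each range scan efficiently.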
Batch read size
The number of records to read at one time. When reading from the source database, configure a specific batch size, such as 1024 records, instead of reading one record at a time. This reduces interactions with the data source, improves I/O efficiency, and lowers network latency.
Input filter
Enter the filter conditions for the input, such as ds=${bizdate}. The Input filter is suitable for the following scenarios:
Filtering a fixed portion of the data.
Parameter-based filtering.
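Parameter-based filtering works by substituting scheduling parameters such as ${bizdate} into the filter expression before the query runs. The sketch below shows the substitution idea with a hypothetical helper; Dataphin's actual parameter engine and parameter set are not shown here.

```python
import re

def render_filter(expr: str, params: dict) -> str:
    """Replace ${name} placeholders in a filter expression with
    concrete values. Illustrative sketch of parameter-based
    filtering, not Dataphin's templating engine."""
    return re.sub(r"\$\{(\w+)\}", lambda m: params[m.group(1)], expr)

print(render_filter("ds=${bizdate}", {"bizdate": "20240601"}))
```

At run time, the rendered condition (for example, ds=20240601) is applied when reading the source table.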
Output fields
The Output fields section displays all fields from the selected tables that match the filter conditions. The following operations are supported:
Field management: If you do not need to output certain fields to a downstream component, you can delete them:
To delete a single field: Click the delete icon in the Actions column of the field.
To delete multiple fields in a batch: Click Field Management. In the Field Management dialog box, select multiple fields, click the left arrow icon to move the selected input fields to the unselected input fields list, and then click OK.
Batch add: Click Batch Add to configure fields in a batch using JSON, TEXT, or DDL format.
Note: After you add fields in a batch and click OK, the existing field configuration is overwritten.
To configure in JSON format, for example:
[{"index": 1, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1"}, {"index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2"}]
Note: index specifies the column number of the object, name specifies the field name after import, and type specifies the field type after import. For example, "index":3,"name":"user_id","type":"String" means that the fourth column of the file is imported with the field name user_id and the field type String.
To configure in TEXT format, for example:
1,id,int(10),Long,comment1
2,user_name,varchar(255),String,comment2
The row delimiter separates the entries for each field. The default delimiter is a line feed (\n). Semicolons (;) and periods (.) are also supported.
The column delimiter separates the properties of a field, such as the field name and field type. The default is a half-width comma (,). The field type is optional.
To configure in DDL format, for example:
CREATE TABLE tablename (
    user_id serial,
    username VARCHAR(50),
    password VARCHAR(50),
    email VARCHAR(255),
    created_on TIMESTAMP
);
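To show how the TEXT format above maps onto field configurations, the sketch below parses each row into its index, name, and type. The dictionary keys mirror the JSON example's property names; the parser itself is a hypothetical illustration, not Dataphin's importer.

```python
def parse_text_fields(text: str, row_delim="\n", col_delim=","):
    """Parse batch-add TEXT rows (index,name,type,mapType,comment)
    into field dicts. Illustrative sketch; key names follow the
    JSON batch-add example."""
    fields = []
    for line in text.strip().split(row_delim):
        parts = line.split(col_delim)
        fields.append({
            "index": int(parts[0]),
            "name": parts[1],
            # The field type is optional in TEXT format.
            "type": parts[2] if len(parts) > 2 else None,
        })
    return fields

sample = "1,id,int(10),Long,comment1\n2,user_name,varchar(255),String,comment2"
print(parse_text_fields(sample))
```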
Create an output field: Click + Create Output Field. Follow the on-screen instructions to enter the Column, Type, and Comment, and select the Mapping Type. After you configure the current row, click the save icon to save it.
Click Confirm to save the configuration of the openGauss input component.