
Dataphin:Configure Amazon RDS for DB2 input component

Last Updated: Mar 05, 2026

The Amazon RDS for DB2 input component reads data from Amazon RDS for DB2 data sources. To synchronize data from an Amazon RDS for DB2 data source to another data source, you must first configure the source information for the Amazon RDS for DB2 input component, and then configure the target data source for data synchronization. This topic describes how to configure the Amazon RDS for DB2 input component.

Prerequisites

Before you begin, make sure that you have completed the following operations:

Procedure

  1. In the top navigation bar of the Dataphin homepage, choose Develop > Data Integration.

  2. In the top navigation bar of the Integration page, select a project (in Dev-Prod mode, you must also select an environment).

  3. In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop to open its configuration page.

  4. Click Component Library in the upper-right corner of the page to open the Component Library panel.

  5. In the left-side navigation pane of the Component Library panel, select Input. Find the Amazon RDS for DB2 component in the input component list on the right, and drag it to the canvas.

  6. Click the configuration icon on the Amazon RDS for DB2 input component card to open the Amazon RDS for DB2 Input Configuration dialog box.

  7. In the Amazon RDS for DB2 Input Configuration dialog box, configure the parameters.

    Parameter

    Description

    Step Name

    The name of the Amazon RDS for DB2 input component. Dataphin automatically generates a step name, which you can modify based on your business scenario. The name must meet the following requirements:

    • It can contain only Chinese characters, letters, underscores (_), and digits.

    • It cannot exceed 64 characters in length.

    Datasource

    The data source drop-down list displays all Amazon RDS for DB2 data sources in the current Dataphin instance, including those for which you have read permission and those for which you do not. Click the copy icon to copy the name of the current data source.

    For data sources for which you do not have read permission, you can click Request next to the data source to request the permission. For more information, see Request, renew, and return data source permissions.

    If you do not have an Amazon RDS for DB2 data source, click Create Data Source to create one. For more information, see Create an Amazon RDS for DB2 data source.

    Table

    You can enter a keyword to search for tables, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks the table status. Click the copy icon to copy the name of the selected table.

    Shard Key (optional)

    The system shards data based on the configured shard key field. This can be used with the concurrency configuration to implement concurrent reading. You can use a column in the source data table as the shard key. We recommend that you use the primary key or a column with an index as the shard key to ensure transmission performance.

    Important

    When you select a field of a date or time type as the shard key, the system identifies its maximum and minimum values and shards data based on the total time range and the configured concurrency. Even distribution of data across shards is not guaranteed.
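    The range-based sharding described above can be sketched as follows. This is a simplified illustration of splitting a numeric shard key's value range across concurrent readers, not Dataphin's actual algorithm:

    ```python
    def shard_ranges(min_val: int, max_val: int, concurrency: int):
        """Split [min_val, max_val] into `concurrency` contiguous ranges.

        A simplified illustration of range-based sharding on a numeric
        shard key; Dataphin's internal implementation may differ.
        """
        span = max_val - min_val + 1
        step = -(-span // concurrency)  # ceiling division
        ranges = []
        lo = min_val
        while lo <= max_val:
            hi = min(lo + step - 1, max_val)
            ranges.append((lo, hi))
            lo = hi + 1
        return ranges

    # Each range becomes one concurrent reader's scan, conceptually a
    # clause such as "WHERE id BETWEEN 1 AND 250".
    print(shard_ranges(1, 1000, 4))  # [(1, 250), (251, 500), (501, 750), (751, 1000)]
    ```

    Note how a date or time shard key is split the same way over its total time range, which is why skewed data can produce uneven shards.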

    Batch Read Count (optional)

    The number of records to read at a time. When reading data from the source database, you can configure a specific batch read count (such as 1,024 records) instead of reading records one by one. This reduces the number of interactions with the data source, improves I/O efficiency, and reduces network latency.
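    The effect of a batch read count can be sketched with the Python DB-API's `fetchmany`. This is a minimal illustration against an in-memory SQLite table standing in for DB2, not Dataphin's actual reader:

    ```python
    import sqlite3

    def read_in_batches(cursor, batch_size=1024):
        """Yield rows in fixed-size batches instead of one by one,
        reducing round trips to the data source."""
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            yield rows

    # Demo: 2,500 rows read as three batches instead of 2,500 round trips.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2500)])
    cur = conn.execute("SELECT id FROM t")
    batches = list(read_in_batches(cur, batch_size=1024))
    print([len(b) for b in batches])  # [1024, 1024, 452]
    ```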

    Input Filter (optional)

    The filter condition for extracting data. The configuration instructions are as follows:

    • Configure a static value to extract the corresponding data, for example, ds=20210101.

    • Configure a variable parameter to extract a specific part of the data, for example, ds=${bizdate}.
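    The two filter styles above can be sketched with `string.Template`, which uses the same `${name}` syntax as the example. This is a simplified stand-in for Dataphin's parameter substitution, and `bizdate` here is just the example variable from above:

    ```python
    from string import Template

    def render_filter(filter_expr: str, params: dict) -> str:
        """Substitute ${...} variables in a filter expression before it
        is applied to the source query."""
        return Template(filter_expr).substitute(params)

    # A static filter passes through unchanged; a variable filter is
    # resolved with the scheduling parameters.
    print(render_filter("ds=20210101", {}))                         # ds=20210101
    print(render_filter("ds=${bizdate}", {"bizdate": "20210101"}))  # ds=20210101
    ```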

    Output Fields

    The output fields area displays all fields that match the selected table and filter conditions. You can perform the following operations:

    • Field Management: If you do not need to output certain fields to downstream components, you can delete these fields:

      • Delete a single field: If you need to delete a small number of fields, you can click the delete icon in the Operation column to delete the unnecessary fields.

      • Delete multiple fields in batch: If you need to delete many fields, you can click Field Management. In the Field Management dialog box, select multiple fields and click the left-shift icon to move the selected input fields to the unselected input fields. Then click OK to complete the batch deletion of fields.


    • Batch Add: Click Batch Add to configure fields in JSON, TEXT, or DDL format.

      Note

      After you complete the batch addition and click OK, the system will overwrite the configured field information.

      • Batch configuration in JSON format, for example:

        [
          {
            "index": 1,
            "name": "id",
            "type": "int(10)",
            "mapType": "Long",
            "comment": "comment1"
          },
          {
            "index": 2,
            "name": "user_name",
            "type": "varchar(255)",
            "mapType": "String",
            "comment": "comment2"
          }
        ]
        Note

        index indicates the column number of the source column, name indicates the field name after import, and type indicates the field type after import. For example, "index":3,"name":"user_id","type":"String" imports the 4th column of the file, names the field user_id, and sets the field type to String.

      • Batch configuration in TEXT format, for example:

        1,id,int(10),Long,comment1
        2,user_name,varchar(255),String,comment2
        • The row delimiter separates the information of each field. The default is a line feed (\n). Supported delimiters include the line feed (\n), semicolon (;), and period (.).

        • The column delimiter separates the field name and field type. The default is a comma (,), and only the comma is supported. The field type can be omitted.

      • Batch configuration in DDL format, for example:

        CREATE TABLE tablename (
            user_id serial,
            username VARCHAR(50),
            password VARCHAR(50),
            email VARCHAR(255),
            created_on TIMESTAMP
        );
    • Create a new output field: Click + Create Output Field and fill in Column, Type, Remarks, and select Mapping Type as prompted. After you complete the configuration for the current row, click the save icon to save it.
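    A batch field configuration in JSON format can also be checked programmatically before you paste it in. The following sketch parses a configuration like the earlier example and validates its keys; the required key set is an assumption based on that example, not a documented schema:

    ```python
    import json

    # Keys assumed required, mirroring the JSON example above.
    REQUIRED_KEYS = {"index", "name", "type"}

    def parse_field_config(text: str):
        """Parse a JSON field configuration and verify each entry has
        the expected keys; return (index, name, type) triples."""
        fields = json.loads(text)
        for f in fields:
            missing = REQUIRED_KEYS - f.keys()
            if missing:
                raise ValueError(f"field {f!r} is missing keys: {missing}")
        return [(f["index"], f["name"], f["type"]) for f in fields]

    config = '''[
      {"index": 1, "name": "id", "type": "int(10)", "mapType": "Long"},
      {"index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String"}
    ]'''
    print(parse_field_config(config))
    ```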

  8. Click OK to complete the property configuration of the Amazon RDS for DB2 input component.