Sync IBM DB2 Data in Batch Pipelines via Input Component - Dataphin

The IBM DB2 input component enables reading data from an IBM DB2 data source. When synchronizing data from an IBM DB2 data source to other destinations, it is essential to configure the source information for the IBM DB2 input component before setting up the target data source for synchronization. This topic describes the configuration process for the IBM DB2 input component.

Prerequisites

Before you begin, ensure the following prerequisites are met:

An IBM DB2 data source has been created. For more information, see Create an IBM DB2 data source.
To configure the IBM DB2 input component properties, the account must possess read-through permissions for the data source. Should the account lack these permissions, you must obtain them for the data source. For more information, see Request, renew, and return data source permissions .

Procedure

On the Dataphin home page, navigate to the top menu bar and select Development > Data Integration.
At the top of the integration page, select Project (choose an environment in Dev-Prod mode).
In the left-side navigation pane, click Batch Pipeline. From the Batch Pipeline list, select the offline pipeline you want to develop to access its configuration page.
Click Component Library in the upper-right corner to open the Component Library panel.
In the Component Library panel's left-side navigation, select Input. Then, locate the IBM DB2 component in the list on the right and drag it onto the canvas.
Click the icon on the IBM DB2 input component card to open the IBM DB2 Input Configuration dialog box.

In the IBM DB2 Input Configuration dialog box, set the parameters.

Parameter	Description
Step Name	The name of the IBM DB2 input component. Dataphin automatically generates a step name. You can also modify the name based on your business scenario. The naming conventions are as follows: Only Chinese characters, letters, underscores (_), and digits are allowed. The name cannot exceed 64 characters.
Datasource	The drop-down list of data sources displays all IBM DB2 data sources in Dataphin, including data sources for which you have read-through permissions and those for which you do not. Click the icon to copy the name of the current data source. For data sources without read-through permissions, you can click Request next to the data source to request read-through permissions. For more information, see Request, renew, and return data source permissions. If you do not have an IBM DB2 data source, click Create Data Source to create one. For more information, see Create an IBM DB2 data source.
Table	You can search for a table by entering a keyword or enter the exact table name and click Exact Search. After you select a table, the system automatically detects the table status. Click the icon to copy the name of the selected table.
Shard Key (Optional)	The system shards data based on the configured shard key field. You can use this feature with the concurrency configuration to achieve concurrent reading. You can use a column in the source data table as the shard key. It is recommended to use a primary key or an indexed column as the shard key to ensure transmission performance. Important If you select a date-time type, the system identifies the maximum and minimum values and performs brute-force sharding based on the total time range and concurrency. Even distribution is not guaranteed.
Batch Read Count (Optional)	The number of records to read at a time. When reading data from the source database, you can configure a specific batch read count (such as 1,024 records) instead of reading one record at a time. This reduces the number of interactions with the data source, improves I/O efficiency, and reduces network latency.
Input Filter (Optional)	Configure the filter conditions for extracting data. The configuration details are as follows: Configure a static field to extract the corresponding data. For example, `ds=20210101`. Configure a variable parameter to extract a specific portion of data. For example, `ds=${bizdate}`.
Output Fields	The output fields section displays all fields in the selected table that match the filter conditions. You can perform the following operations: Field Management: If you do not need to output certain fields to downstream components, you can delete the corresponding fields: Delete a single field: If you need to delete a small number of fields, click the icon in the operation column to delete the extra fields. Delete multiple fields: If you need to delete many fields, click Field Management. In the Field Management dialog box, select multiple fields, click the left arrow icon to move the selected input fields to the unselected input fields, and then click Confirm to complete the batch deletion of fields. Batch addition: Click Batch Addition to support batch configuration in JSON, TEXT, and DDL formats. Note After batch adding is complete, clicking Confirm will overwrite the existing field information. Batch configure fields in JSON format. For example: `// Example: [{ "index": 1, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" }, { "index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }]` Note The index specifies the column number of the object. The name specifies the name of the field after it is imported. The type specifies the type of the field after it is imported. For example, `"index":3,"name":"user_id","type":"String"` indicates that the fourth column in the file is imported. The field name is user_id, and the field type is String. Batch configure fields in TEXT format. For example: `// Example: 1,id,int(10),Long,comment1 2,user_name,varchar(255),Long,comment2` The row delimiter is used to separate the information of each field. The default is a line feed (\n). The supported delimiters include line feed (\n), semicolon (;), and period (.). The column delimiter is used to separate field names from field types, with the default being a half-width comma (,). It can support`','`. The field type can be omitted, with the default being`','`. Batch configure fields in DDL format. For example: `CREATE TABLE tablename ( user_id serial, username VARCHAR(50), password VARCHAR(50), email VARCHAR (255), created_on TIMESTAMP, );` Create a new output field: Click +create Output Field. Follow the instructions on the page to fill in Column, Type, and Comment, and select Mapping Type. After completing the configuration for the current row, click the icon to save.

Click Confirm to finalize the IBM DB2 input component's property configuration.