This topic describes the data types and parameters that Db2 Reader supports and how to configure it by using the code editor.
Background information
Db2 Reader allows you to read data from Db2. Db2 Reader connects to a remote Db2 database and executes a SELECT statement to select and read data from the database.
- Db2 Reader generates the SELECT statement based on the table, column, and where parameters that you have configured, and sends the generated SELECT statement to the Db2 database.
- If you specify the querySql parameter, Db2 Reader directly sends the value of this parameter to the Db2 database.
The version of the Db2 JDBC driver that Db2 Reader uses is IBM Data Server Driver for JDBC and SQLJ 4.11.77. For more information about the mapping between the versions of Db2 JDBC drivers and Db2 versions, see IBM Support.
Data types
Db2 Reader supports most Db2 data types. Make sure that your data types are supported.
Category | Db2 data type |
---|---|
Integer | SMALLINT |
Floating point | DECIMAL, REAL, and DOUBLE |
String | CHAR, CHARACTER, VARCHAR, GRAPHIC, VARGRAPHIC, LONG VARCHAR, CLOB, LONG VARGRAPHIC, and DBCLOB |
Date and time | DATE, TIME, and TIMESTAMP |
Boolean | N/A |
Binary | BLOB |
Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
datasource | The connection name. It must be the same as the name of the created connection. You can create connections in the code editor. | Yes | N/A |
jdbcUrl | The JDBC URL of the Db2 database. In accordance with official Db2 specifications,
the URL must be in the jdbc:db2://ip:port/database format. You can also specify the information of the attachment facility.
|
Yes | N/A |
username | The username that you can use to connect to the database. | Yes | N/A |
password | The password that you can use to connect to the database. | Yes | N/A |
table | The name of the source table from which Db2 Reader reads data. You can select only one source table for each sync node. | Yes | N/A |
column | The columns to be synchronized from the source table. The columns are described in
a JSON array. The default value is [*], which indicates all columns.
|
Yes | N/A |
splitPk | The field that is used for data sharding when Db2 Reader reads data. If you specify
the splitPk parameter, the table is sharded based on the shard key that is specified by this
parameter. Data Integration then runs concurrent threads to synchronize data. This way, data
can be synchronized more efficiently.
|
No | "" |
where | The WHERE clause. Db2 Reader generates a SELECT statement based on the table, column, and where parameters that you have configured, and uses the generated SELECT statement to select
and read data.
To read data that is generated on the current day, set the where parameter to |
No | N/A |
querySql | The SELECT statement that is used for refined data filtering. If you specify this parameter,
Data Integration filters data based on this parameter. For example, if you want to join multiple tables for data synchronization, set this
parameter to |
No | N/A |
fetchSize | The number of data records to read at a time. This parameter determines the number
of interactions between Data Integration and the database and affects reading efficiency.
Note A value greater than 2048 may lead to out of memory (OOM) during the data synchronization
process.
|
No | 1024 |
Configure Db2 Reader by using the codeless UI
The codeless user interface (UI) is not supported for Db2 Reader.
Configure Db2 Reader by using the code editor
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"db2",// The reader type.
"parameter":{
"password":"",// The password that you can use to connect to the Db2 database.
"jdbcUrl":"",// The JDBC URL of the Db2 database.
"column":[
"id"
],
"where":"",// The WHERE clause.
"splitPk":"",// The shard key.
"table":"",// The name of the source table.
"username":""// The username that you can use to connect to the Db2 database.
},
"name":"Reader",
"category":"reader"
},
{
"stepType":"stream",
"parameter":{},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The maximum number of dirty data records allowed.
},
"speed":{
"throttle":false,// Specifies whether to enable bandwidth throttling.
"concurrent":1,// The maximum number of concurrent threads.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Usage notes
- Data synchronization between primary and secondary databases
A secondary Db2 database can be deployed for disaster recovery. The secondary database continuously synchronizes data from the primary database based on binlogs. Especially when network conditions are unfavorable, data latency between the primary and secondary databases is unavoidable, which can lead to data inconsistency.
- Concurrency control
Db2 is a relational database management system (RDBMS), which ensures strong consistency for data queries. A database snapshot is created before a sync node starts. Db2 Reader reads data from the database snapshot. Therefore, if new data is written to the database during data synchronization, Db2 Reader cannot obtain the new data.
Data consistency cannot be ensured when you enable Db2 Reader to run concurrent threads in a single sync node.
Db2 Reader shards the table based on the splitPk parameter and runs multiple concurrent threads to synchronize data. These concurrent threads belong to different transactions. They read data at different time points. This means that the concurrent threads observe different snapshots.
Theoretically, the data inconsistency issue is unavoidable if a single sync node includes multiple threads. However, two workarounds are available:- Do not enable concurrent threads in a single sync node. Essentially, do not specify the splitPk parameter. This way, data consistency is ensured although data is synchronized at a low efficiency.
- Disable writers to ensure that the data remains unchanged during data synchronization. For example, lock the table and disable data synchronization between primary and secondary databases. This way, data is efficiently synchronized but your ongoing services may be interrupted.
- Character encoding
Db2 Reader uses JDBC, which can automatically convert the encoding format of characters. Therefore, you do not need to specify the encoding format.
- Incremental data synchronization
Db2 Reader connects to a database by using JDBC and uses a SELECT statement with a
WHERE
clause to read incremental data.- For data in batches, incremental add, update, and delete operations are distinguished by timestamps. The delete operations include logical delete operations. Specify the WHERE clause based on the timestamp. The timestamp must be later than the latest timestamp in the last synchronization.
- For streaming data, specify the WHERE clause based on the data record ID. The data record ID must be greater than the maximum ID in the last synchronization.
If incremental data cannot be distinguished, Db2 Reader can perform only full synchronization but not incremental synchronization.
- Syntax validation
Db2 Reader allows you to specify custom SELECT statements by using the querySql parameter but does not verify the syntax of the custom SELECT statements.