The DB2 data source provides a bidirectional channel for reading data from and writing data to DB2 databases. This feature helps you quickly process large amounts of data. This topic describes the data synchronization capabilities of DataWorks for DB2.
Supported versions
DB2 Reader and DB2 Writer use the IBM Data Server Driver for JDBC and SQLJ 4.11.77 driver. For more information about the version mapping between the DB2 driver and the database service, see the official documentation.
Limits
DB2 Reader and DB2 Writer support Serverless resource groups and exclusive resource groups for Data Integration. Public resource groups and custom resource groups are not supported.
DB2 Writer retrieves data from the Reader through the data synchronization framework and writes the data to DB2 by using `insert into` statements. If a primary key or unique index conflict occurs, the conflicting rows are not written. For better performance, DB2 Writer uses `PreparedStatement` with batch submission and sets `rewriteBatchedStatements=true`. Data is buffered in the thread context buffer, and a write request is sent only when the buffer reaches a specified threshold.
Note: The task requires at least the `insert into` permission. Other permissions may be required depending on the statements that you specify in the preSql and postSql parameters when you configure the task.
Offline synchronization supports reading data from views.
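The buffered batch-write pattern described above can be sketched as follows. This is an illustrative sketch only, with Python's sqlite3 standing in for a DB2 JDBC connection; the table name, columns, and threshold value are hypothetical, and the real plugin's internals may differ.

```python
# Sketch of the writer's buffering behavior: rows accumulate in a buffer,
# and a single batched INSERT is submitted each time the threshold is hit.
import sqlite3

def buffered_write(conn, rows, batch_size=1024):
    """Buffer rows and flush them with one batched INSERT per threshold."""
    buffer = []
    cur = conn.cursor()
    for row in rows:
        buffer.append(row)
        if len(buffer) >= batch_size:  # buffer reached the threshold
            cur.executemany("INSERT INTO t (id, name) VALUES (?, ?)", buffer)
            buffer.clear()             # one round trip per batch
    if buffer:                         # flush the remainder
        cur.executemany("INSERT INTO t (id, name) VALUES (?, ?)", buffer)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
buffered_write(conn, [(i, f"row-{i}") for i in range(2500)], batch_size=1024)
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 2500
```

Batching trades a small amount of latency per row for far fewer network round trips, which is why the writer exposes a threshold rather than submitting each row individually.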
The DB2 AES encryption algorithm is not supported.
Supported field types
DB2 Reader and DB2 Writer support most DB2 data types for offline read and write operations. However, some data types are not supported. We recommend that you check your data types carefully.
The following table lists the DB2 data types that DB2 Reader and DB2 Writer support.
Type category | DB2 data type |
Integer types | SMALLINT |
Floating-point types | DECIMAL, REAL, and DOUBLE |
String types | CHAR, CHARACTER, VARCHAR, GRAPHIC, VARGRAPHIC, LONG VARCHAR, CLOB, LONG VARGRAPHIC, and DBCLOB |
Date and time types | DATE, TIME, and TIMESTAMP |
Boolean types | None |
Binary types | BLOB |
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.
Develop a data synchronization task
For information about the entry point and the procedure for configuring a synchronization task, see the following configuration guides.
Guide to configuring an offline sync task for a single table
For more information about the procedure, see Configure a task in the codeless UI and Configure a task in the code editor.
For all parameters and a script demo for the code editor, see Appendix: Script demo and parameter description.
Guide to configuring an offline read sync task for a full database
For more information about the procedure, see Configure a real-time sync task for a full database.
FAQ
Data restoration issues in active/standby synchronization
In an active/standby disaster recovery configuration for DB2, the secondary database continuously replays logs from the primary database to restore data. A time lag is inherent in this synchronization process and can be exacerbated by network latency. As a result, the data on the secondary database may not be a complete, real-time mirror of the primary database, and significant data discrepancies can exist between the two databases.
Consistency constraints
DB2 is a relational database management system (RDBMS) that provides query interfaces with strong consistency. For example, if another data source writes data to the database while a sync task is running, DB2 Reader does not retrieve the updated data because of the database's snapshot feature.
The preceding description applies to the data synchronization consistency of DB2 Reader in a single-threaded model. DB2 Reader can concurrently extract data based on your configurations. Therefore, strict data consistency cannot be guaranteed.
When DB2 Reader partitions data based on the splitPk parameter, it initiates multiple concurrent tasks to synchronize the data. These concurrent tasks do not belong to the same read transaction and are separated by time intervals. For raw data, snapshots read by multiple concurrent tasks may be incomplete or inconsistent.
A technical solution for consistent snapshots in a multi-threaded environment is not available. You can address this issue only from an engineering perspective, which involves trade-offs. Consider the following solutions based on your requirements:
Use single-threaded synchronization. In this mode, data is not partitioned. The disadvantage is that the synchronization speed is slow, but data consistency is well-maintained.
Disable other data writers to ensure that the source data is static. For example, you can lock tables or disable synchronization to the secondary database. The disadvantage is that this may affect online services.
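To make the splitPk-based concurrency discussed above concrete, the following sketch shows one plausible way a split key's [min, max] range can be cut into non-overlapping slices, each of which becomes the WHERE clause of one concurrent read task. The table and column names are hypothetical, and the actual splitting algorithm that DB2 Reader uses may differ.

```python
# Illustrative splitPk partitioning: divide [min_pk, max_pk] into even,
# non-overlapping ranges, one per concurrent task.
def split_ranges(min_pk, max_pk, num_tasks):
    """Return (low, high) bounds that cover [min_pk, max_pk] without overlap."""
    step = (max_pk - min_pk + num_tasks) // num_tasks  # ceiling division
    bounds = []
    low = min_pk
    while low <= max_pk:
        high = min(low + step - 1, max_pk)
        bounds.append((low, high))
        low = high + 1
    return bounds

def partition_queries(table, split_pk, min_pk, max_pk, num_tasks):
    """Build one SELECT per range; each runs in its own read transaction."""
    return [
        f"SELECT * FROM {table} WHERE {split_pk} BETWEEN {lo} AND {hi}"
        for lo, hi in split_ranges(min_pk, max_pk, num_tasks)
    ]

for q in partition_queries("orders", "id", 1, 10000, 4):
    print(q)
```

Because each generated query runs in a separate transaction at a slightly different time, rows modified between the first and last query can be seen inconsistently, which is exactly the multi-threaded consistency caveat described above.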
Database encoding issues
DB2 Reader uses Java Database Connectivity (JDBC) to extract data. JDBC is natively compatible with various encodings and performs encoding conversion at the underlying layer. Therefore, you do not need to specify an encoding for DB2 Reader. DB2 Reader can automatically detect and transcode the encoding.
Methods for incremental data synchronization
DB2 Reader uses JDBC SELECT statements to extract data. Therefore, you can use `SELECT ... WHERE ...` statements to perform incremental data extraction in the following ways:
When an online application writes data to the database, a `modify` field is populated with a timestamp that indicates when the change occurred. This includes new, updated, and logically deleted data. For this type of application, you can configure DB2 Reader to add a `WHERE` clause that uses the timestamp of the last synchronization.
For new append-only data, you can configure DB2 Reader to add a `WHERE` clause that uses the maximum auto-increment ID from the previous synchronization.
If your business logic does not use a specific field to distinguish between new and modified data, DB2 Reader cannot perform incremental data synchronization. In this case, only full data synchronization is possible.
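The two incremental strategies above can be sketched as simple query builders. This is a minimal illustration: the column names (`gmt_modified`, `id`) and the table name are hypothetical, not fields that DB2 Reader itself defines.

```python
# Sketch of the two incremental-extraction strategies: by modification
# timestamp (new, updated, and logically deleted rows) or by max ID
# (append-only data).
def incremental_by_timestamp(table, last_sync_time):
    """Read rows whose modification timestamp is after the last sync."""
    return f"SELECT * FROM {table} WHERE gmt_modified > '{last_sync_time}'"

def incremental_by_max_id(table, last_max_id):
    """Read rows past the maximum ID seen in the previous sync."""
    return f"SELECT * FROM {table} WHERE id > {last_max_id}"

print(incremental_by_timestamp("orders", "2024-01-01 00:00:00"))
print(incremental_by_max_id("orders", 42))
```

In practice, the last synchronization timestamp or maximum ID would be persisted by your scheduling layer and substituted into the reader's where parameter for each run.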
SQL security
DB2 Reader provides the querySql parameter that lets you customize SELECT statements for data extraction. DB2 Reader does not perform any security checks on the querySql statement.
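As a sketch of how querySql is used, the following reader parameter block (built here as a Python dict for illustration) omits table and column, which the synchronization system ignores when querySql is present. The join, table names, and field names are made up.

```python
# Hypothetical reader "parameter" block that uses querySql instead of
# table/column. When querySql is set, table and column are ignored,
# so they are simply omitted here.
import json

reader_parameter = {
    "username": "db2_user",
    "password": "******",
    "jdbcUrl": "jdbc:db2://ip:port/database",
    "querySql": (
        "SELECT a.id, a.name, b.amount "
        "FROM orders a JOIN order_detail b ON a.id = b.order_id"
    ),
}

# table and column are intentionally absent because querySql takes precedence.
print(json.dumps(reader_parameter, indent=2))
```

Because the statement is passed through unchecked, treat querySql like any other privileged SQL: restrict who can edit task configurations and grant the data source account only the permissions it needs.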
Appendix: Script demo and parameter description
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a task in the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Reader script demo
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"db2",// The plug-in name.
"parameter":{
"password":"",// The password.
"jdbcUrl":"",// The JDBC connection string for the DB2 database.
"column":[
"id"
],
"where":"",// The filter condition.
"splitPk":"",// The field used for data partitioning.
"table":"",// The table name.
"username":""// The username.
},
"name":"Reader",
"category":"reader"
},
{
"stepType":"stream",
"parameter":{},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The number of error records.
},
"speed":{
"throttle":true,// If you set throttle to false, the mbps parameter does not take effect and no rate limit is imposed. If you set throttle to true, a rate limit is imposed.
"concurrent":1, // The number of concurrent jobs.
"mbps":"12"// The rate limit. 1 mbps is equal to 1 MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Reader script parameters
Parameter | Description | Required | Default value |
datasource | The name of the data source. The code editor supports adding data sources. The value of this parameter must be the same as the name of the added data source. | Yes | None |
jdbcUrl | The JDBC connection string for the DB2 database. The JDBC URL must follow the official DB2 specifications. The format is jdbc:db2://ip:port/database. | Yes | None |
username | The username for the data source. | Yes | None |
password | The password for the specified username of the data source. | Yes | None |
table | The table from which you want to synchronize data. A job can synchronize data from only one table. | Yes | None |
column | The columns that you want to synchronize from the configured table. Use a JSON array to describe the fields. By default, all columns are used, for example, ["*"]. | Yes | None |
splitPk | If you specify the splitPk parameter when DB2 Reader extracts data, the field that splitPk represents is used for data partitioning. The data synchronization system then starts concurrent tasks to synchronize the data more efficiently. | No | "" |
where | The filter condition. DB2 Reader constructs a SQL statement based on the specified column, table, and where parameters, and then extracts data based on that statement. In business scenarios, you might want to synchronize only the data generated on the current day, which you can do by setting a where condition on the appropriate date or timestamp column. | No | None |
querySql | In some business scenarios, the where parameter is not sufficient to describe the filter condition. You can use this parameter to customize the filter SQL statement, for example, to synchronize data after a multi-table join. When you configure this parameter, the data synchronization system ignores the table and column parameters and uses the content of this parameter to filter data. | No | None |
fetchSize | The number of records to retrieve in each batch from the database server. This value determines the number of network interactions between the data synchronization system and the server, and a larger value can significantly improve data extraction performance. Note: If the value of fetchSize is too large (greater than 2048), an out-of-memory (OOM) error may occur in the data synchronization process. | No | 1024 |
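The effect of fetchSize can be sketched as follows, with sqlite3's `fetchmany` standing in for the JDBC fetch-size behavior. This is only an analogy: with the real DB2 JDBC driver, the driver itself pulls rows from the server in fetchSize-sized batches.

```python
# Sketch of what fetchSize controls: rows are pulled from the server in
# batches rather than one at a time, reducing round trips.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(3000)])

def read_in_batches(conn, sql, fetch_size=1024):
    """Yield result rows in fetch_size-sized batches."""
    cur = conn.cursor()
    cur.execute(sql)
    while True:
        batch = cur.fetchmany(fetch_size)  # one "network interaction"
        if not batch:
            break
        yield batch

batches = list(read_in_batches(conn, "SELECT id FROM t", fetch_size=1024))
print([len(b) for b in batches])  # [1024, 1024, 952]
```

Each batch corresponds to one round trip, which is why raising fetchSize helps throughput but increases per-batch memory use (and, past roughly 2048, risks OOM in the synchronization process).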
Writer script demo
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"db2",// The plug-in name.
"parameter":{
"postSql":[],// The SQL statement to execute after the data synchronization task.
"password":"",// The password.
"jdbcUrl":"jdbc:db2://ip:port/database",// The JDBC connection string for the DB2 database.
"column":[
"id"
],
"batchSize":1024,// The number of records to submit in a single batch.
"table":"",// The table name.
"username":"",// The username.
"preSql":[]// The SQL statement to execute before the data synchronization task.
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The number of error records.
},
"speed":{
"throttle":true,// If you set throttle to false, the mbps parameter does not take effect and no rate limit is imposed. If you set throttle to true, a rate limit is imposed.
"concurrent":1, // The number of concurrent jobs.
"mbps":"12"// The rate limit. 1 mbps is equal to 1 MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Writer script parameters
Parameter | Description | Required | Default value |
jdbcUrl | The JDBC connection string for the DB2 database. The JDBC URL must follow the official DB2 specifications. The format is jdbc:db2://ip:port/database. You can also append connection attributes to the URL. | Yes | None |
username | The username for the data source. | Yes | None |
password | The password for the specified username of the data source. | Yes | None |
table | The table to which you want to synchronize data. | Yes | None |
column | The fields in the destination table to which you want to write data. Separate the fields with commas (,). Example: "column": ["id", "name", "age"]. To write to all columns in order, use an asterisk (*). Example: "column": ["*"] . | Yes | None |
preSql | The SQL statement to execute before the data synchronization task. Currently, only one SQL statement can be executed. For example, you can use this to purge old data. | No | None |
postSql | The SQL statement to execute after the data synchronization task. In the codeless UI, only one SQL statement is allowed. In the code editor, multiple SQL statements are supported. For example, you can use this to add a timestamp. | No | None |
batchSize | The number of records to submit in a single batch. This value can significantly reduce the number of network interactions between the data synchronization system and DB2, and improve overall throughput. If this value is set too high, an OOM error may occur in the data synchronization process. | No | 1024 |