The DB2 data source connects DataWorks to IBM DB2 databases for offline data synchronization. Use it to read from and write to DB2 in batch synchronization tasks.
Supported versions
DB2 Reader and DB2 Writer use the IBM Data Server Driver for JDBC and SQLJ 4.11.77. For version mapping between the DB2 driver and the database service, see the official documentation.
Supported capabilities
| Capability | Supported |
|---|---|
| Offline read (full synchronization) | Yes |
| Offline write (full synchronization) | Yes |
| Read from views | Yes |
| Incremental synchronization | Yes (via where filter or querySql) |
| AES encryption | No |
| Serverless resource groups | Yes |
| Exclusive resource groups for Data Integration | Yes |
| Public resource groups | No |
| Custom resource groups | No |
Supported field types
DB2 Reader and DB2 Writer support the following DB2 data types for offline read and write operations. Check your data types before configuring a synchronization task.
| Type category | DB2 data types |
|---|---|
| Integer types | SMALLINT |
| Floating-point types | DECIMAL, REAL, DOUBLE |
| String types | CHAR, CHARACTER, VARCHAR, GRAPHIC, VARGRAPHIC, LONG VARCHAR, CLOB, LONG VARGRAPHIC, DBCLOB |
| Date and time types | DATE, TIME, TIMESTAMP |
| Boolean types | None |
| Binary types | BLOB |
Limitations
- Resource groups: DB2 Reader and DB2 Writer support Serverless resource groups and exclusive resource groups for Data Integration. Public resource groups and custom resource groups are not supported.
- Write behavior: DB2 Writer writes data using `insert into` statements. If a primary key or unique index conflict occurs, the conflicting rows are not written. For performance, DB2 Writer uses `PreparedStatement` with batch execution (`rewriteBatchedStatements=true`), buffering data in a thread context buffer and sending a write request when the buffer reaches the configured threshold. The synchronization task requires at least the `insert into` permission; additional permissions may be required depending on the SQL statements configured in the `preSql` and `postSql` parameters.
- Encryption: The DB2 AES encryption algorithm is not supported.
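As a sketch, the write behavior described above maps onto a writer step like the following. The table name, columns, and the `preSql` statement are illustrative assumptions, not values from this document; note that a `delete` statement in `preSql` requires the corresponding delete permission in addition to `insert into`:

```json
{
  "stepType": "db2",
  "parameter": {
    "jdbcUrl": "jdbc:db2://ip:port/database",
    "username": "",
    "password": "",
    "table": "ods_orders",
    "column": ["id", "name"],
    "preSql": ["delete from ods_orders"],
    "postSql": [],
    "batchSize": 1024
  },
  "name": "Writer",
  "category": "writer"
}
```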
Add a data source
Add the DB2 data source to DataWorks before developing a synchronization task. Follow the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.
Develop a data synchronization task
Offline sync task for a single table
- For configuration steps, see Configure a task in the codeless UI and Configure a task in the code editor.
- For all parameters and a script reference, see Appendix: Script demo and parameter description.
Offline sync task for a full database
For configuration steps, see Configure a real-time sync task for a full database.
FAQ
Why might the standby database be missing data during active/standby synchronization?
In an active/standby disaster recovery setup, the standby database continuously replays change logs shipped from the primary database. This replication has an inherent time lag, which network latency can worsen. As a result, the standby database may not reflect the primary database in real time, and significant data discrepancies between the two are possible.
Does DB2 Reader guarantee data consistency during concurrent extraction?
DB2 is a relational database management system (RDBMS) with strong consistency query interfaces. In single-threaded mode, DB2 Reader reads a consistent snapshot—if another source writes data while the task runs, DB2 Reader does not retrieve those updates.
When you configure concurrent extraction using the splitPk parameter, DB2 Reader launches multiple tasks that are not part of the same read transaction and run at different times. In this case, the snapshots read by concurrent tasks may be incomplete or inconsistent. Two workarounds are available:
- Use single-threaded synchronization (no data partitioning). This preserves data consistency but reduces synchronization speed.
- Disable writes to the source during synchronization, for example by locking tables or pausing replication to the standby database. This keeps source data static but may affect online services.
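The first workaround maps directly onto the task configuration: leave `splitPk` empty in the reader parameters and set the concurrency to 1 in the `setting` block. A minimal sketch (values are illustrative):

```json
"setting": {
  "errorLimit": { "record": "0" },
  "speed": {
    "throttle": true,
    "concurrent": 1,
    "mbps": "12"
  }
}
```

With `"splitPk": ""` in the reader parameters, DB2 Reader extracts the table in a single thread and reads one consistent snapshot.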
Do I need to configure the encoding for DB2 Reader?
No. DB2 Reader uses Java Database Connectivity (JDBC) to extract data. JDBC handles encoding conversion at the underlying layer, so DB2 Reader automatically detects and transcodes the encoding.
How do I set up incremental data synchronization?
DB2 Reader extracts data using JDBC `SELECT` statements, so you can use `SELECT ... WHERE ...` for incremental extraction in two ways:

- If the source application populates a timestamp field (such as a `modify` column) when data changes, configure a `where` clause using the timestamp from the last synchronization run. This captures new, updated, and logically deleted rows.
- For append-only data, configure a `where` clause using the maximum auto-increment ID from the previous synchronization run.
If no field distinguishes new data from existing data, only full synchronization is possible.
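As a sketch, the timestamp-based approach might look like the following reader `parameter` fragment, assuming a hypothetical `modify_time` column populated by the source application and the DataWorks `$bizdate` scheduling parameter (data source, table, and column names are illustrative):

```json
"parameter": {
  "datasource": "my_db2_source",
  "table": "orders",
  "column": ["id", "amount", "modify_time"],
  "where": "modify_time > $bizdate"
}
```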
Is the querySql parameter safe from SQL injection?
DB2 Reader does not perform security checks on the querySql statement. Review and validate any custom SQL before configuring this parameter.
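Because `querySql` overrides the `table`, `column`, and `where` parameters, a join-based reader configuration only needs the query itself. A sketch using the join example from the parameter reference (the data source name is an illustrative assumption):

```json
"parameter": {
  "datasource": "my_db2_source",
  "querySql": "select a,b from table_a join table_b on table_a.id = table_b.id"
}
```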
Appendix: Script demo and parameter description
Use the following scripts and parameter references when configuring a batch synchronization task in the code editor. For the general procedure, see Configure a task in the code editor.
Reader script demo
```json
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "db2",
            "parameter": {
                "datasource": "",
                "jdbcUrl": "",
                "username": "",
                "password": "",
                "table": "",
                "column": [
                    "id"
                ],
                "where": "",
                "splitPk": ""
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
```
Reader parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| `datasource` | The name of the data source as configured in DataWorks. | Yes | None |
| `jdbcUrl` | The JDBC URL for the DB2 database. Format: `jdbc:db2://ip:port/database`. You can append connection control attributes. | Yes | None |
| `username` | The username for the data source. | Yes | None |
| `password` | The password for the username. | Yes | None |
| `table` | The source table. A single job can read from one table only. | Yes | None |
| `column` | The columns to read, specified as a JSON array. All columns: `["*"]`. Supports column pruning, column reordering, and constants. Constants must follow DB2 SQL syntax, for example: `["id", "1", "'const name'", "null", "upper('abc_lower')", "2.3", "true"]`. Cannot be empty. | Yes | None |
| `splitPk` | The field used to partition data for concurrent extraction. Use the table's primary key for even distribution and to avoid data hotspots. Only integer fields are supported; floating-point numbers, strings, and dates are not. If an unsupported type is specified, DB2 Reader returns an error. | No | `""` |
| `where` | A filter condition. DB2 Reader constructs a SELECT statement from `column`, `table`, and `where`, then extracts matching data. For example, set `where` to `gmt_create>$bizdate` for daily incremental synchronization. If left blank, all rows in the table are synchronized. | No | None |
| `querySql` | A custom SELECT statement for cases where `where` is not sufficient, for example to synchronize the results of a multi-table join: `select a,b from table_a join table_b on table_a.id = table_b.id`. When `querySql` is set, the `table`, `column`, and `where` parameters are ignored. | No | None |
| `fetchSize` | The number of records retrieved per batch from the database. A larger value reduces network round trips and improves extraction performance. | No | 1024 |

If `fetchSize` exceeds 2048, an out-of-memory (OOM) error may occur during synchronization.
Writer script demo
```json
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "db2",
            "parameter": {
                "jdbcUrl": "jdbc:db2://ip:port/database",
                "username": "",
                "password": "",
                "table": "",
                "column": [
                    "id"
                ],
                "preSql": [],
                "postSql": [],
                "batchSize": 1024
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
```
Writer parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| `jdbcUrl` | The JDBC URL for the DB2 database. Format: `jdbc:db2://ip:port/database`. You can append connection control attributes. | Yes | None |
| `username` | The username for the data source. | Yes | None |
| `password` | The password for the username. | Yes | None |
| `table` | The destination table. | Yes | None |
| `column` | The destination columns to write to, specified as a JSON array. Example: `["id", "name", "age"]`. To write to all columns in order, use `["*"]`. | Yes | None |
| `preSql` | A SQL statement to execute before the synchronization task starts. Use this to clear old data or perform setup operations. Only one statement is supported. | No | None |
| `postSql` | A SQL statement to execute after the synchronization task completes. In the codeless UI, only one statement is supported. In the code editor, multiple statements are supported. | No | None |
| `batchSize` | The number of records submitted per batch. A larger value reduces network round trips and improves write throughput. If set too high, an OOM error may occur. | No | 1024 |