The DB2 data source provides a bidirectional channel for reading data from and writing data to DB2 databases. This feature helps you quickly process large amounts of data. This topic describes the data synchronization capabilities of DataWorks for DB2.
Supported versions
DB2 Reader and DB2 Writer use the IBM Data Server Driver for JDBC and SQLJ 4.11.77 driver. For more information about the version mapping between the DB2 driver and the database service, see the official documentation.
Limits
DB2 Reader and DB2 Writer support Serverless resource groups and exclusive resource groups for Data Integration. Public resource groups and custom resource groups are not supported.
DB2 Writer retrieves data from the Reader through the data synchronization framework and writes the data to DB2 by using `insert into` statements. If a primary key or unique index conflict occurs, the conflicting rows are not written. For better performance, DB2 Writer uses `PreparedStatement` with batch submission and sets `rewriteBatchedStatements=true`. Data is buffered in the thread context buffer, and a write request is sent only when the buffer reaches a specified threshold.
Note: The task requires at least the `insert into` permission. Other permissions may be required depending on the statements that you specify in the preSql and postSql parameters when you configure the task.
Offline synchronization supports reading data from views.
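The buffered batch-write pattern described above can be sketched as follows. This is an illustrative sketch only, with Python's sqlite3 standing in for a DB2 JDBC connection; the table name, columns, and threshold value are hypothetical, and the real plugin's internals may differ.

```python
# Sketch of the writer's buffering behavior: rows accumulate in a buffer,
# and a single batched INSERT is submitted each time the threshold is hit.
import sqlite3

def buffered_write(conn, rows, batch_size=1024):
    """Buffer rows and flush them with one batched INSERT per threshold."""
    buffer = []
    cur = conn.cursor()
    for row in rows:
        buffer.append(row)
        if len(buffer) >= batch_size:  # buffer reached the threshold
            cur.executemany("INSERT INTO t (id, name) VALUES (?, ?)", buffer)
            buffer.clear()             # one round trip per batch
    if buffer:                         # flush the remainder
        cur.executemany("INSERT INTO t (id, name) VALUES (?, ?)", buffer)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
buffered_write(conn, [(i, f"row-{i}") for i in range(2500)], batch_size=1024)
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 2500
```

Batching trades a small amount of latency per row for far fewer network round trips, which is why the writer exposes a threshold rather than submitting each row individually.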
The DB2 AES encryption algorithm is not supported.
Supported field types
DB2 Reader and DB2 Writer support most DB2 data types for offline read and write operations. However, some data types are not supported. We recommend that you check your data types carefully.
The following table lists the DB2 data types that DB2 Reader and DB2 Writer support.
Type category | DB2 data type |
Integer types | SMALLINT |
Floating-point types | DECIMAL, REAL, and DOUBLE |
String types | CHAR, CHARACTER, VARCHAR, GRAPHIC, VARGRAPHIC, LONG VARCHAR, CLOB, LONG VARGRAPHIC, and DBCLOB |
Date and time types | DATE, TIME, and TIMESTAMP |
Boolean types | None |
Binary types | BLOB |
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.
Develop a data synchronization task
For information about the entry point and the procedure for configuring a synchronization task, see the following configuration guides.
Guide to configuring an offline sync task for a single table
For more information about the procedure, see Configure a task in the codeless UI and Configure a task in the code editor.
For all parameters and a script demo for the code editor, see Appendix: Script demo and parameter description.
Guide to configuring an offline read sync task for a full database
For more information about the procedure, see Configure a real-time sync task for a full database.
FAQ
Data restoration issues in active/standby synchronization
In an active/standby disaster recovery configuration for DB2, the secondary database continuously replays logs from the primary database to restore data. A time lag is inherent in this synchronization process and can be exacerbated by network latency. As a result, the data on the secondary database may not be a complete, real-time mirror of the primary database, and significant data discrepancies can exist between the two databases.
Consistency constraints
DB2 is a relational database management system (RDBMS) that provides query interfaces with strong consistency. For example, if another data source writes data to the database while a sync task is running, DB2 Reader does not retrieve the updated data because of the database's snapshot feature.
The preceding description applies to the data synchronization consistency of DB2 Reader in a single-threaded model. DB2 Reader can concurrently extract data based on your configurations. Therefore, strict data consistency cannot be guaranteed.
When DB2 Reader partitions data based on the splitPk parameter, it initiates multiple concurrent tasks to synchronize the data. These concurrent tasks do not belong to the same read transaction and are separated by time intervals. For raw data, snapshots read by multiple concurrent tasks may be incomplete or inconsistent.
A technical solution for consistent snapshots in a multi-threaded environment is not available. You can address this issue only from an engineering perspective, which involves trade-offs. Consider the following solutions based on your requirements:
Use single-threaded synchronization. In this mode, data is not partitioned. The disadvantage is that the synchronization speed is slow, but data consistency is well-maintained.
Disable other data writers to ensure that the source data is static. For example, you can lock tables or disable synchronization to the secondary database. The disadvantage is that this may affect online services.
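To make the splitPk-based concurrency discussed above concrete, the following sketch shows one plausible way a split key's [min, max] range can be cut into non-overlapping slices, each of which becomes the WHERE clause of one concurrent read task. The table and column names are hypothetical, and the actual splitting algorithm that DB2 Reader uses may differ.

```python
# Illustrative splitPk partitioning: divide [min_pk, max_pk] into even,
# non-overlapping ranges, one per concurrent task.
def split_ranges(min_pk, max_pk, num_tasks):
    """Return (low, high) bounds that cover [min_pk, max_pk] without overlap."""
    step = (max_pk - min_pk + num_tasks) // num_tasks  # ceiling division
    bounds = []
    low = min_pk
    while low <= max_pk:
        high = min(low + step - 1, max_pk)
        bounds.append((low, high))
        low = high + 1
    return bounds

def partition_queries(table, split_pk, min_pk, max_pk, num_tasks):
    """Build one SELECT per range; each runs in its own read transaction."""
    return [
        f"SELECT * FROM {table} WHERE {split_pk} BETWEEN {lo} AND {hi}"
        for lo, hi in split_ranges(min_pk, max_pk, num_tasks)
    ]

for q in partition_queries("orders", "id", 1, 10000, 4):
    print(q)
```

Because each generated query runs in a separate transaction at a slightly different time, rows modified between the first and last query can be seen inconsistently, which is exactly the multi-threaded consistency caveat described above.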
Database encoding issues
DB2 Reader uses Java Database Connectivity (JDBC) to extract data. JDBC is natively compatible with various encodings and performs encoding conversion at the underlying layer. Therefore, you do not need to specify an encoding for DB2 Reader. DB2 Reader can automatically detect and transcode the encoding.
Methods for incremental data synchronization
DB2 Reader uses JDBC SELECT statements to extract data. Therefore, you can use `SELECT ... WHERE ...` statements to perform incremental data extraction in the following ways:
When an online application writes data to the database, a `modify` field is populated with a timestamp that indicates when the change occurred. This includes new, updated, and logically deleted data. For this type of application, you can configure DB2 Reader to add a `WHERE` clause that uses the timestamp of the last synchronization.
For new append-only data, you can configure DB2 Reader to add a `WHERE` clause that uses the maximum auto-increment ID from the previous synchronization.
If your business logic does not use a specific field to distinguish between new and modified data, DB2 Reader cannot perform incremental data synchronization. In this case, only full data synchronization is possible.
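The two incremental strategies above can be sketched as simple query builders. This is a minimal illustration: the column names (`gmt_modified`, `id`) and the table name are hypothetical, not fields that DB2 Reader itself defines.

```python
# Sketch of the two incremental-extraction strategies: by modification
# timestamp (new, updated, and logically deleted rows) or by max ID
# (append-only data).
def incremental_by_timestamp(table, last_sync_time):
    """Read rows whose modification timestamp is after the last sync."""
    return f"SELECT * FROM {table} WHERE gmt_modified > '{last_sync_time}'"

def incremental_by_max_id(table, last_max_id):
    """Read rows past the maximum ID seen in the previous sync."""
    return f"SELECT * FROM {table} WHERE id > {last_max_id}"

print(incremental_by_timestamp("orders", "2024-01-01 00:00:00"))
print(incremental_by_max_id("orders", 42))
```

In practice, the last synchronization timestamp or maximum ID would be persisted by your scheduling layer and substituted into the reader's where parameter for each run.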
SQL security
DB2 Reader provides the querySql parameter that lets you customize SELECT statements for data extraction. DB2 Reader does not perform any security checks on the querySql statement.
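As a sketch of how querySql is used, the following reader parameter block (built here as a Python dict for illustration) omits table and column, which the synchronization system ignores when querySql is present. The join, table names, and field names are made up.

```python
# Hypothetical reader "parameter" block that uses querySql instead of
# table/column. When querySql is set, table and column are ignored,
# so they are simply omitted here.
import json

reader_parameter = {
    "username": "db2_user",
    "password": "******",
    "jdbcUrl": "jdbc:db2://ip:port/database",
    "querySql": (
        "SELECT a.id, a.name, b.amount "
        "FROM orders a JOIN order_detail b ON a.id = b.order_id"
    ),
}

# table and column are intentionally absent because querySql takes precedence.
print(json.dumps(reader_parameter, indent=2))
```

Because the statement is passed through unchecked, treat querySql like any other privileged SQL: restrict who can edit task configurations and grant the data source account only the permissions it needs.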
Appendix: Script demo and parameter description
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a task in the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Reader script demo
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"db2",// The plug-in name.
"parameter":{
"password":"",// The password.
"jdbcUrl":"",// The JDBC connection string for the DB2 database.
"column":[
"id"
],
"where":"",// The filter condition.
"splitPk":"",// The field used for data partitioning.
"table":"",// The table name.
"username":""// The username.
},
"name":"Reader",
"category":"reader"
},
{
"stepType":"stream",
"parameter":{},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The number of error records.
},
"speed":{
"throttle":true,// If you set throttle to false, the mbps parameter does not take effect and no rate limit is imposed. If you set throttle to true, a rate limit is imposed.
"concurrent":1, // The number of concurrent jobs.
"mbps":"12"// The rate limit. 1 mbps is equal to 1 MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Reader script parameters
Parameter | Description | Required | Default value |
datasource | The name of the data source. The code editor supports adding data sources. The value of this parameter must be the same as the name of the added data source. | Yes | None |
jdbcUrl | The JDBC connection string for the DB2 database. The JDBC URL must follow the official DB2 specifications. The format is jdbc:db2://ip:port/database. | Yes | None |
username | The username for the data source. | Yes | None |
password | The password for the specified username of the data source. | Yes | None |
table | The table from which you want to synchronize data. A job can synchronize data from only one table. | Yes | None |
column | The columns that you want to synchronize from the configured table. Use a JSON array to describe the fields. By default, all columns are used, for example, ["*"]. | Yes | None |
splitPk | If you specify the splitPk parameter when DB2 Reader extracts data, the field that splitPk represents is used for data partitioning. The data synchronization system then starts concurrent tasks to synchronize the data more efficiently. | No | "" |
where | The filter condition. DB2 Reader constructs a SQL statement based on the specified column, table, and where parameters, and then extracts data based on that statement. In business scenarios, you might want to synchronize only the data generated on the current day, which you can do by setting a where condition on the appropriate date or timestamp column. | No | None |
querySql | In some business scenarios, the where parameter is not sufficient to describe the filter condition. You can use this parameter to customize the filter SQL statement, for example, to synchronize data after a multi-table join. When you configure this parameter, the data synchronization system ignores the table and column parameters and uses the content of this parameter to filter data. | No | None |
fetchSize | The number of records to retrieve in each batch from the database server. This value determines the number of network interactions between the data synchronization system and the server, and a larger value can significantly improve data extraction performance. Note: If the value of fetchSize is too large (greater than 2048), an out-of-memory (OOM) error may occur in the data synchronization process. | No | 1024 |
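The effect of fetchSize can be sketched as follows, with sqlite3's `fetchmany` standing in for the JDBC fetch-size behavior. This is only an analogy: with the real DB2 JDBC driver, the driver itself pulls rows from the server in fetchSize-sized batches.

```python
# Sketch of what fetchSize controls: rows are pulled from the server in
# batches rather than one at a time, reducing round trips.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(3000)])

def read_in_batches(conn, sql, fetch_size=1024):
    """Yield result rows in fetch_size-sized batches."""
    cur = conn.cursor()
    cur.execute(sql)
    while True:
        batch = cur.fetchmany(fetch_size)  # one "network interaction"
        if not batch:
            break
        yield batch

batches = list(read_in_batches(conn, "SELECT id FROM t", fetch_size=1024))
print([len(b) for b in batches])  # [1024, 1024, 952]
```

Each batch corresponds to one round trip, which is why raising fetchSize helps throughput but increases per-batch memory use (and, past roughly 2048, risks OOM in the synchronization process).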
Writer script demo
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"db2",// The plug-in name.
"parameter":{
"postSql":[],// The SQL statement to execute after the data synchronization task.
"password":"",// The password.
"jdbcUrl":"jdbc:db2://ip:port/database",// The JDBC connection string for the DB2 database.
"column":[
"id"
],
"batchSize":1024,// The number of records to submit in a single batch.
"table":"",// The table name.
"username":"",// The username.
"preSql":[]// The SQL statement to execute before the data synchronization task.
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The number of error records.
},
"speed":{
"throttle":true,// If you set throttle to false, the mbps parameter does not take effect and no rate limit is imposed. If you set throttle to true, a rate limit is imposed.
"concurrent":1, // The number of concurrent jobs.
"mbps":"12"// The rate limit. 1 mbps is equal to 1 MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Writer script parameters
Parameter | Description | Required | Default value |
jdbcUrl | The JDBC connection string for the DB2 database. The JDBC URL must follow the official DB2 specifications. The format is jdbc:db2://ip:port/database. You can also append connection attributes to the URL. | Yes | None |
username | The username for the data source. | Yes | None |
password | The password for the specified username of the data source. | Yes | None |
table | The table to which you want to synchronize data. | Yes | None |
column | The fields in the destination table to which you want to write data. Separate the fields with commas (,). Example: "column": ["id", "name", "age"]. To write to all columns in order, use an asterisk (*). Example: "column": ["*"] . | Yes | None |
preSql | The SQL statement to execute before the data synchronization task. Currently, only one SQL statement can be executed. For example, you can use this to purge old data. | No | None |
postSql | The SQL statement to execute after the data synchronization task. In the codeless UI, only one SQL statement is allowed. In the code editor, multiple SQL statements are supported. For example, you can use this to add a timestamp. | No | None |
batchSize | The number of records to submit in a single batch. This value can significantly reduce the number of network interactions between the data synchronization system and DB2, and improve overall throughput. If this value is set too high, an OOM error may occur in the data synchronization process. | No | 1024 |