
DataWorks:DB2

Last Updated:Feb 28, 2026

The DB2 data source provides a bidirectional channel for reading data from and writing data to DB2 databases. This feature helps you quickly process large amounts of data. This topic describes the data synchronization capabilities of DataWorks for DB2.

Supported versions

DB2 Reader and DB2 Writer use the IBM Data Server Driver for JDBC and SQLJ (version 4.11.77). For more information about the version mapping between the DB2 driver and the database service, see the official documentation.

Limits

  • DB2 Reader and DB2 Writer support Serverless resource groups and exclusive resource groups for Data Integration. Public resource groups and custom resource groups are not supported.

  • DB2 Writer retrieves data from the Reader through the data synchronization framework and writes it to DB2 by using INSERT INTO statements. If a primary key or unique index conflict occurs, the conflicting rows are not written. For better performance, DB2 Writer uses PreparedStatement with batching and sets rewriteBatchedStatements=true. Data is buffered in the thread context, and a write request is sent only when the buffered data reaches a specified threshold.

    Note

    The task requires at least the insert into permission. Other permissions may be required depending on the statements that you specify in the preSql and postSql parameters when you configure the task.

  • Offline synchronization supports reading data from views.

  • The DB2 AES encryption algorithm is not supported.
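The buffered batch-write pattern described above (accumulate rows, submit a batch only when a threshold is reached, skip rows that hit a key conflict) can be sketched in Python. This is an illustration only, using SQLite as a stand-in for DB2; the BufferedBatchWriter class and the table t are hypothetical names, not part of DataWorks:

```python
import sqlite3

# Illustrative sketch (not DataWorks source code): buffer rows in memory and
# submit them with executemany only when the buffer reaches batch_size, the
# same batching pattern DB2 Writer uses to reduce network round trips.
class BufferedBatchWriter:
    def __init__(self, conn, table, columns, batch_size=1024):
        self.conn = conn
        self.batch_size = batch_size
        placeholders = ", ".join("?" for _ in columns)
        # "OR IGNORE" mimics skipping rows that hit a primary-key conflict.
        self.sql = (f"INSERT OR IGNORE INTO {table} "
                    f"({', '.join(columns)}) VALUES ({placeholders})")
        self.buffer = []
        self.flushes = 0

    def write(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.conn.executemany(self.sql, self.buffer)
            self.conn.commit()
            self.buffer.clear()
            self.flushes += 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
w = BufferedBatchWriter(conn, "t", ["id", "name"], batch_size=2)
for row in [(1, "a"), (2, "b"), (2, "dup"), (3, "c"), (4, "d")]:
    w.write(row)
w.flush()  # submit the final partial batch
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 4 (the duplicate id 2 was skipped)
```

A larger batch_size means fewer round trips but more memory held per flush, which is the same trade-off the batchSize writer parameter controls.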

Supported field types

DB2 Reader and DB2 Writer support most DB2 data types for offline read and write operations. However, some data types are not supported. We recommend that you check your data types carefully.

The following table lists the DB2 data types that DB2 Reader and DB2 Writer support.

Integer types: SMALLINT

Floating-point types: DECIMAL, REAL, DOUBLE

String types: CHAR, CHARACTER, VARCHAR, GRAPHIC, VARGRAPHIC, LONG VARCHAR, CLOB, LONG VARGRAPHIC, DBCLOB

Date and time types: DATE, TIME, TIMESTAMP

Boolean types: none supported

Binary types: BLOB

Add a data source

Before you develop a synchronization task in DataWorks, add the required data source to DataWorks by following the instructions in Data source management. When you add a data source, you can view the parameter descriptions in the DataWorks console to understand what each parameter means.

Develop a data synchronization task

For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.

Guide to configuring an offline sync task for a single table

Guide to configuring an offline read sync task for a full database

For more information about the procedure, see Configure a real-time sync task for a full database.

FAQ

  • Data restoration issues in active/standby synchronization

    In an active/standby disaster recovery configuration for DB2, the secondary database continuously restores data from the primary database by replaying logs. This replication inherently lags behind the primary. The lag, especially when exacerbated by network latency, means that the data on the secondary database may not be a complete, real-time mirror of the primary database. As a result, significant data discrepancies can exist between the two databases.

  • Consistency constraints

    DB2 is a relational database management system (RDBMS) that provides query interfaces with strong consistency. For example, if another data source writes data to the database while a sync task is running, DB2 Reader does not retrieve the updated data because of the database's snapshot feature.

    The preceding description applies to the data synchronization consistency of DB2 Reader in a single-threaded model. DB2 Reader can concurrently extract data based on your configurations. Therefore, strict data consistency cannot be guaranteed.

    When DB2 Reader partitions data based on the splitPk parameter, it initiates multiple concurrent tasks to synchronize the data. These concurrent tasks do not belong to the same read transaction and are separated by time intervals. For raw data, snapshots read by multiple concurrent tasks may be incomplete or inconsistent.

    A technical solution for consistent snapshots in a multi-threaded environment is not available. You can address this issue only from an engineering perspective, which involves trade-offs. Consider the following solutions based on your requirements:

    • Use single-threaded synchronization. In this mode, data is not partitioned. The disadvantage is that the synchronization speed is slow, but data consistency is well-maintained.

    • Disable other data writers to ensure that the source data is static. For example, you can lock tables or disable synchronization to the secondary database. The disadvantage is that this may affect online services.

  • Database encoding issues

    DB2 Reader uses Java Database Connectivity (JDBC) to extract data. JDBC is natively compatible with various encodings and performs encoding conversion at the underlying layer. Therefore, you do not need to specify an encoding for DB2 Reader. DB2 Reader can automatically detect and transcode the encoding.

  • Methods for incremental data synchronization

    DB2 Reader uses JDBC SELECT statements to extract data. Therefore, you can use SELECT…WHERE… to perform incremental data extraction in the following ways:

    • When an online application writes data to the database, it populates a modify field with a timestamp that indicates when the change occurred. This applies to new, updated, and logically deleted data. For this type of application, you can configure DB2 Reader to add a WHERE clause that uses the timestamp of the last synchronization.

    • For new, append-only data, you can configure DB2 Reader to add a WHERE clause that uses the maximum auto-increment ID from the previous synchronization.

    If your business logic does not use a specific field to distinguish between new and modified data, DB2 Reader cannot perform incremental data synchronization. In this case, only full data synchronization is possible.
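    The two incremental-extraction patterns above can be sketched as generated SELECT statements. This is an illustration only; the table orders and the columns gmt_modified, id, and status are hypothetical examples:

```python
# Illustrative only: build the two incremental WHERE clauses described above.
# Table and column names are hypothetical, not DataWorks conventions.
def incremental_by_timestamp(last_sync):
    # New, updated, and logically deleted rows all refresh the modify timestamp.
    return ("SELECT id, status FROM orders "
            f"WHERE gmt_modified > TIMESTAMP('{last_sync}')")

def incremental_by_max_id(last_max_id):
    # Append-only data: read only rows beyond the last auto-increment ID seen.
    return f"SELECT id, status FROM orders WHERE id > {last_max_id}"

print(incremental_by_timestamp("2024-01-01 00:00:00"))
print(incremental_by_max_id(42))
```

    In practice, the value carried between runs (the last timestamp or the last maximum ID) must be recorded by the scheduling side, for example as a scheduling parameter such as $bizdate.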

  • SQL security

    DB2 Reader provides the querySql parameter that lets you customize SELECT statements for data extraction. DB2 Reader does not perform any security checks on the querySql statement.
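Because DB2 Reader performs no checks on querySql, a pipeline that accepts user-supplied statements may want its own guard before submitting the task. The helper below is a minimal, hypothetical sketch (not a DataWorks API) that rejects obvious non-SELECT or multi-statement input:

```python
# Illustrative only: a minimal read-only guard for user-supplied querySql.
# This is a hypothetical pipeline-side check, not part of DB2 Reader.
def looks_like_read_only(query_sql):
    stmt = query_sql.strip().lower()
    # Must start with SELECT and must not chain extra statements with ";".
    return stmt.startswith("select") and ";" not in stmt.rstrip(";")

print(looks_like_read_only("select a, b from t"))         # True
print(looks_like_read_only("delete from t"))              # False
print(looks_like_read_only("select 1; drop table t"))     # False
```

A string check like this is only a first line of defense; restricting the database account's permissions to SELECT on the required tables is the more reliable control.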

Appendix: Script demo and parameter description

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a task in the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Reader script demo

{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"db2",// The plug-in name.
            "parameter":{
                "password":"",// The password.
                "jdbcUrl":"",// The JDBC connection string for the DB2 database.
                "column":[
                    "id"
                ],
                "where":"",// The filter condition.
                "splitPk":"",// The field used for data partitioning.
                "table":"",// The table name. 
                "username":""// The username.
            },
            "name":"Reader",
            "category":"reader"
        },
        { 
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of error records allowed.
        },
        "speed":{
             "throttle":true,// If you set throttle to false, the mbps parameter does not take effect and no rate limit is imposed. If you set throttle to true, a rate limit is imposed.
             "concurrent":1, // The number of concurrent jobs.
             "mbps":"12"// The rate limit. 1 mbps is equal to 1 MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Reader script parameters

datasource

Required: Yes. Default value: None.

The name of the data source. The code editor supports adding data sources. The value of this parameter must match the name of an added data source.

jdbcUrl

Required: Yes. Default value: None.

The JDBC connection string for the DB2 database. The JDBC URL must follow the official DB2 format: jdbc:db2://ip:port/database. You can also append connection properties to the URL.

username

Required: Yes. Default value: None.

The username for the data source.

password

Required: Yes. Default value: None.

The password for the specified username of the data source.

table

Required: Yes. Default value: None.

The table from which you want to synchronize data. A job can synchronize data from only one table.

column

Required: Yes. Default value: None.

The columns that you want to synchronize from the configured table, described as a JSON array of field names. To read all columns, set this parameter to ["*"]:

  • Column pruning is supported. You can export a subset of columns.

  • Column reordering is supported. You can export columns in an order different from the table schema.

  • Constants are supported. They must follow the DB2 SQL syntax. For example, ["id", "1", "'const name'", "null", "upper('abc_lower')", "2.3", "true"]:

    • id is a regular column name.

    • 1 is an integer constant.

    • 'const name' is a string constant. It must be enclosed in single quotation marks.

    • null is a null value.

    • upper('abc_lower') is a function expression.

    • 2.3 is a floating-point constant.

    • true is a Boolean value.

  • The column parameter must explicitly specify the set of columns to synchronize. It cannot be empty.

splitPk

Required: No. Default value: "".

If you specify splitPk, DB2 Reader partitions the data by the field that splitPk represents, and the data synchronization system starts concurrent tasks to synchronize the data more efficiently:

  • We recommend that you set splitPk to the primary key of the table. Primary keys are usually evenly distributed, which helps prevent data hotspots in the partitions.

  • splitPk currently supports partitioning only on integer fields. Floating-point, string, date, and other data types are not supported. If you specify a field of an unsupported type, DB2 Reader reports an error.

where

Required: No. Default value: None.

The filter condition. DB2 Reader constructs an SQL statement from the specified column, table, and where parameters, and extracts data based on that statement.

For example, to synchronize only the data generated on the current day, you can set the where condition to gmt_create > $bizdate. The where condition can be used for efficient incremental synchronization. If you leave this parameter empty, all data in the table is synchronized.

querySql

Required: No. Default value: None.

In some business scenarios, the where parameter is not sufficient to describe the filter condition. You can use this parameter to specify a custom SQL statement for data extraction. When you configure querySql, the data synchronization system ignores the table, column, and where parameters and uses this statement to filter data.

For example, to synchronize the result of a multi-table join, use select a,b from table_a join table_b on table_a.id = table_b.id.

fetchSize

Required: No. Default value: 1024.

The number of records to retrieve in each batch from the database server. A larger value reduces the number of network round trips between the data synchronization system and the server, which can significantly improve data extraction performance.

Note

If the value of fetchSize is too large (greater than 2048), an out-of-memory (OOM) error may occur during data synchronization.
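How splitPk-style partitioning divides an integer key range into one shard per concurrent task can be sketched as follows. This illustrates the general technique only, not DataWorks' actual splitting logic; split_ranges is a hypothetical helper:

```python
# Illustrative sketch: divide the [min_pk, max_pk] integer range into
# num_splits contiguous, non-overlapping shards of nearly equal size.
def split_ranges(min_pk, max_pk, num_splits):
    total = max_pk - min_pk + 1
    step, extra = divmod(total, num_splits)
    ranges, lo = [], min_pk
    for i in range(num_splits):
        # The first `extra` shards absorb one leftover row each.
        hi = lo + step + (1 if i < extra else 0) - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

print(split_ranges(1, 100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```

Each shard then becomes a WHERE splitPk BETWEEN lo AND hi predicate executed by its own concurrent task, which is also why a skewed or non-integer splitPk column leads to uneven or unsupported partitioning.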

Writer script demo

{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"db2",// The plug-in name.
            "parameter":{
                "postSql":[],// The SQL statement to execute after the data synchronization task.
                "password":"",// The password.
                "jdbcUrl":"jdbc:db2://ip:port/database",// The JDBC connection string for the DB2 database.
                "column":[
                    "id"
                ],
                "batchSize":1024,// The number of records to submit in a single batch.
                "table":"",// The table name.
                "username":"",// The username.
                "preSql":[]// The SQL statement to execute before the data synchronization task.
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of error records allowed.
        },
        "speed":{
            "throttle":true,// If you set throttle to false, the mbps parameter does not take effect and no rate limit is imposed. If you set throttle to true, a rate limit is imposed.
            "concurrent":1, // The number of concurrent jobs.
            "mbps":"12"// The rate limit. 1 mbps is equal to 1 MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Writer script parameters

jdbcUrl

Required: Yes. Default value: None.

The JDBC connection string for the DB2 database. The JDBC URL must follow the official DB2 format: jdbc:db2://ip:port/database. You can also append connection properties to the URL.

username

Required: Yes. Default value: None.

The username for the data source.

password

Required: Yes. Default value: None.

The password for the specified username of the data source.

table

Required: Yes. Default value: None.

The table to which you want to synchronize data.

column

Required: Yes. Default value: None.

The fields in the destination table to which you want to write data. Separate the fields with commas (,). Example: "column": ["id", "name", "age"]. To write to all columns in order, use an asterisk (*). Example: "column": ["*"].

preSql

Required: No. Default value: None.

The SQL statement to execute before the data synchronization task starts. Currently, only one SQL statement can be executed. For example, you can use this parameter to purge old data before the task writes new data.

postSql

Required: No. Default value: None.

The SQL statement to execute after the data synchronization task completes. In the codeless UI, only one SQL statement is allowed. In the code editor, multiple SQL statements are supported. For example, you can use this parameter to add a timestamp after the write completes.

batchSize

Required: No. Default value: 1024.

The number of records to submit in a single batch. A larger value reduces the number of network round trips between the data synchronization system and DB2, which can improve overall throughput. If the value is too large, an OOM error may occur during data synchronization.