This topic describes the data types and parameters supported by Db2 Reader and how to configure it by using the code editor.

Db2 Reader allows you to read data from Db2. Db2 Reader connects to a remote Db2 database and runs a SELECT statement to select and read data from the database.

Specifically, Db2 Reader connects to a remote Db2 database through Java Database Connectivity (JDBC), generates a SELECT statement based on your configurations, and then sends the statement to the database. The Db2 database runs the statement and returns the result. Then, Db2 Reader assembles the returned data to abstract datasets in custom data types supported by Data Integration, and sends the datasets to a writer.

  • Db2 Reader generates the SELECT statement based on the table, column, and where parameters that you have configured, and sends the generated SELECT statement to the Db2 database.
  • If you specify the querySql parameter, Db2 Reader directly sends the value of this parameter to the Db2 database.

Db2 Reader supports most Db2 data types. Make sure that your data types are supported.

The following table lists the data types supported by Db2 Reader.
Category Db2 data type
Integer SMALLINT
Floating point DECIMAL, REAL, and DOUBLE
String CHAR, CHARACTER, VARCHAR, GRAPHIC, VARGRAPHIC, LONG VARCHAR, CLOB, LONG VARGRAPHIC, and DBCLOB
Date and time DATE, TIME, and TIMESTAMP
Boolean N/A
Binary BLOB

Parameters

Parameter Description Required Default value
datasource The connection name. It must be identical to the name of the added connection. You can add connections in the code editor. Yes None
jdbcUrl The JDBC URL for connecting to the Db2 database. In accordance with official Db2 specifications, the URL must be in the jdbc:db2://ip:port/database format. You can also specify the information of the attachment facility. Yes None
username The username for connecting to the database. Yes None
password The password for connecting to the database. Yes None
table The name of the table to be synchronized. You can select only one source table for each sync node. Yes None
column The columns to be synchronized from the source table. The columns are described in a JSON array. The default value is [ * ], which indicates all columns.
  • Column pruning is supported. You can select and export specific columns.
  • Change of the column order is supported. You can export the columns in an order different from that specified in the schema of the table.
  • Constants are supported. The column names must be arranged in compliance with the SQL syntax supported by Db2, for example, ["id", "1", "'const name'", "null", "upper('abc_lower')", "2.3" , "true"].
    • id: a column name.
    • 1: an integer constant.
    • 'const name': a string constant, which is enclosed in single quotation marks (' ').
    • null: a null pointer.
    • upper('abc_lower'): a function expression.
    • 2.3: a floating-point constant.
    • true: a Boolean value.
  • The column parameter must explicitly specify a set of columns to be synchronized. The parameter cannot be left empty.
Yes None
splitPk The field used for data sharding when Db2 Reader extracts data. If you specify the splitPk parameter, the table is sharded based on the shard key specified by this parameter. Data Integration then runs concurrent threads to synchronize data. This improves efficiency.
  • We recommend that you set the splitPk parameter to the primary key of the table. Based on the primary key, data can be well distributed to different shards, but not intensively distributed to certain shards.
  • Currently, the splitPk parameter supports data sharding only for integers but not for other data types such as string, floating point, and date. If you specify this parameter to a column of an unsupported type, Db2 Reader returns an error.
No ""
where The WHERE clause. Db2 Reader generates a SELECT statement based on the table, column, and where parameters that you have configured, and uses the generated SELECT statement to select and read data. For example, set this parameter to gmt_create>$bizdate. You can use the WHERE clause to synchronize incremental data. If you do not specify the where parameter or leave it empty, all data is synchronized. No None
querySql The SELECT statement used for refined data filtering. If you specify this parameter, Data Integration directly filters data based on this parameter.

For example, if you want to join multiple tables for data synchronization, set this parameter to select a,b from table_a join table_b on table_a.id = table_b.id. If you specify the querySql parameter, Db2 Reader ignores the table, column, and where parameters that you have configured.

No None
fetchSize The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the database and affects reading efficiency.
Note A value greater than 2048 may lead to out of memory (OOM) during the data synchronization process.
No 1024

Configure Db2 Reader by using the codeless UI

Currently, the codeless user interface (UI) is not supported for Db2 Reader.

Configure Db2 Reader by using the code editor

In the following code, a node is configured to read data from a Db2 database.
{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"db2",// The reader type.
            "parameter":{
                "password":"",// The password for connecting to the database.
                "jdbcUrl":"",// The JDBC URL for connecting to the Db2 database.
                "column":[
                    "id"
                ],
                "where":"",// The WHERE clause.
                "splitPk":"", // The field used for data sharding. If you specify the splitPk parameter, the table is sharded based on the shard key specified by this parameter.
                "table":"",// The name of the table to be synchronized. 
                "username":""// The username for connecting to the database.
            },
            "name":"Reader",
            "category":"reader"
        },
        {// The following template is used to configure Stream Writer. For more information, see the corresponding topic.
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":false,// Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
            "concurrent":1,// The maximum number of concurrent threads.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Additional instructions

  • Data synchronization between primary and secondary databases

    A secondary Db2 database can be deployed for disaster recovery. The secondary database continuously synchronizes data from the primary database based on binlogs. Especially when network conditions are unfavorable, data latency between the primary and secondary databases is unavoidable, which can lead to data inconsistency.

  • Concurrency control

    Db2 is a relational database management system (RDBMS), which supports strong consistency for data queries. A database snapshot is created before a sync node starts. Db2 Reader reads data from the database snapshot. Therefore, if new data is written to the database during data synchronization, Db2 Reader cannot obtain the new data.

    Data consistency cannot be guaranteed when you enable Db2 Reader to run concurrent threads on a single sync node.

    Db2 Reader shards the table based on the splitPk parameter and runs multiple concurrent threads to synchronize data. These concurrent threads belong to different transactions, and they read data at different times. This means that the concurrent threads observe different snapshots.

    Theoretically, the data inconsistency issue is unavoidable if a single sync node includes multiple threads. However, two workarounds are available:
    • Do not enable concurrent threads on a single sync node. Essentially, do not specify the splitPk parameter. In this way, data consistency is guaranteed although data is synchronized at a low efficiency.
    • Disable writers to make sure that the data is unchanged during data synchronization. For example, lock the table and disable data synchronization between primary and secondary databases. In this way, data is synchronized efficiently but your ongoing services may be interrupted.
  • Character encoding

    Db2 Reader uses JDBC, which can automatically convert the encoding of characters. Therefore, you do not need to specify the encoding format.

  • Incremental data synchronization
    Db2 Reader connects to a database through JDBC and uses a SELECT statement with a WHERE clause to read incremental data in the following ways:
    • For data in batches, incremental add, update, and delete operations (including logical delete operations) are distinguished by timestamps. Specify the WHERE clause based on the timestamp. The timestamp must be later than the latest timestamp in the last synchronization.
    • For streaming data, specify the WHERE clause based on the data record ID. The data record ID must be larger than the maximum ID involved in the last synchronization.

    If incremental data cannot be distinguished, Db2 Reader cannot perform incremental synchronization but can perform full synchronization only.

  • Syntax validation

    Db2 Reader allows you to specify custom SELECT statements by using the querySql parameter but does not verify the syntax of the custom SELECT statements.