All Products
Document Center

Configure DB2 Reader

Last Updated: Apr 17, 2018

DB2 Reader plug-in enables data reading from DB2. At the underlying implementation level, DB2 Reader connects to a remote DB2 database through JDBC and runs corresponding SQL statements to select data from the DB2 database.

Specifically, DB2 Reader connects to a remote DB2 database through the JDBC connector. The SELECT SQL query statements are generated and sent to the remote DB2 database based on your configuration. Then, the SQL statements are run and the returned results are assembled into abstract datasets using the custom data types of data integration. Datasets are passed to the downstream writer for processing.

  • DB2 Reader concatenates the table, column, and WHERE information you configured into SQL statements and sends them to the DB2 database.

  • DB2 Reader directly sends the querySql information you configured to the DB2 database.

DB2 Reader supports most of the DB2 data types. Check whether your data type is supported.

DB2 Reader converts DB2 data types as follows:

Category DB2 data type
Floating point decimal, real, double
String char, character, varchar, graphic, vargraphic, long varchar, clob, long vargraphic, dbclob
Date and time date, time, timestamp
Binary blob

Parameter description

Parameter Description Required Default value
datasource Data source name. It must be identical to the data source name added. Adding data source is supported in script mode. Yes None
jdbcUrl Information of the JDBC connection to the DB2 database. In accordance with the DB2 official specification, jdbcUrl in the db2 format is jdbc:db2://ip:port/database, and the connection accessory control information can be filled in. Yes None
username User name for the data source. Yes None
password Password corresponding to the specified user name for the data source. Yes None
table The table selected for synchronization. One job can only synchronize one table. Yes None
column The column name set to be synchronized in the configured table. Field information is described with arrays in JSON. [ * ] indicates all columns by default.
- Column pruning is supported, which means you can select some columns to export.
- Change of column order is supported, which means you can export the columns in an order different from the schema order of the table.
- Constant configuration is supported. You must follow the DB2 SQL syntax format, for example, [“id”, “1”, “‘const name’”, “null”, “upper(‘abc_lower’)”, “2.3” , “true”], where id refers to the ordinary column name, 1 is an integer constant, ‘const name’ is a string constant (note that a pair of single quotation marks must be added), null refers to null pointer, upper(‘abc_lower’) is a function expression, 2.3 is a floating point number, and true is a Boolean value.
- column must displays the column set specified by the user to be synchronized and cannot be blank.
Yes None
splitPk If you specify the splitPk when using the DB2 Reader to extract data, it means that you want to use the fields represented by splitPk for data sharding. Then, the data synchronization system starts concurrent tasks to synchronize data, which greatly improves the efficiency of data synchronization.
- We recommend splitPk users to use table main key which is usually very even and therefore, data hotpoints seldom occur to data shards.
- Currently, splitPk only supports data sharding for integer data types. Other types such as float, string, and date are not supported. If you specify an unsupported data type, DB2 Reader reports an error.
No Null
where Filtering condition. DB2 Reader concatenates an SQL command based on specified column, table, and where conditions and extracts data according to the SQL. In actual business scenarios, the data on the current day are usually required to be synchronized. You can specify the where condition as gmt_create > $bizdate.
The where condition can be used to synchronize incremental business data effectively. If the value is null, it means synchronizing all the information in the table.
No None
querySql In some business scenarios, the where condition is insufficient for filtration. In such cases, you can customize a filter SQL using this configuration item. When this item is configured, the data synchronization system filters data using this configuration item directly instead of such configuration items as table and column. For example, for data synchronization after multi-table join, use select a,b from table_a join table_b on =
When querySql is configured, DB2 Reader directly ignores the configuration of table, column, and where conditions.
No None
fetchSize It defines the pieces of batch data that the plug-in and database server can fetch each time. The value determines the number of network interactions between the data synchronization system and the server, which can greatly improve data extraction performance.
Note: A value greater than 2048 may lead to OOM for data synchronization.
No 1024

Development in wizard mode

Development in wizard mode is unavailable currently.

Development in script mode

Configure a job to synchronously extract data from a DB2 database:

  1. {
  2. "type": "job",
  3. "version": "1.0",
  4. "configuration": {
  5. "reader": {
  6. "plugin": "db2",
  7. "parameter": {
  8. "jdbcUrl": "jdbc:db2://ip:port/database",
  9. "username": "username",
  10. "password": "password",
  11. "table": "table",
  12. "column": [
  13. "id", "name", "age"
  14. ],
  15. "fetchSize": 1024,
  16. "splitPk": "id",
  17. "where": "id > 100"
  18. }
  19. },
  20. "writer": {
  21. }
  22. }
  23. }

Configure a job to synchronize tasks from a database with custom SQLs to MaxCompute:

  1. {
  2. "type": "job",
  3. "version": "1.0",
  4. "configuration": {
  5. "reader": {
  6. "plugin": "db2",
  7. "parameter": {
  8. "jdbcUrl": "jdbc:db2://ip:port/database",
  9. "username": "username",
  10. "password": "password",
  11. "querySql": "SELECT * from tableName",
  12. "fetchSize": 1024
  13. }
  14. },
  15. "writer": {
  16. }
  17. }
  18. }

Supplemental instructions

Data restoration of master/slave synchronization

Master/slave synchronization means that DB2 uses a master/slave disaster recovery mode in which the slave database continuously restores data from the master database through binlog. Due to the time difference in master/slave data synchronization, especially in some special situations such as network latency, the restored data in the slave database after synchronization are significantly different from the data of the master database, that is to say, the data synchronized in the slave database are not a full image of the master database at the current time.

Consistency constraints

DB2 is a RDBMS system in terms of data storage, which can provide APIs for querying strong consistency data. For example, if another data writer writes data to the database during a synchronization task, DB2 Reader does not get the newly written data because of the snapshot features of the database. For the snapshot features of databases, see MVCC Wikipedia.

These are the features of data synchronization consistency in the single-threaded model of DB2 Reader. Because DB2 Reader uses concurrent data extraction according to your configuration information, robust data consistency cannot be guaranteed. After DB2 Reader completes data sharding based on splitPk, multiple concurrent tasks are successively enabled to synchronize data. Since multiple concurrent tasks do not belong to the same read transaction and time intervals exist between the concurrent tasks, the data is not complete and consistent data snapshot information.

Currently, the consistency snapshot demands in the multi-threaded model cannot be met technically, which can only be solved from the engineering point of view. The engineering approaches have both advantages and disadvantages. The following solutions are provided for your consideration:

  • Use single-threaded synchronization without data sharding. This is slow but can make sure robust data consistency.

  • Close other data writers to make sure the current data is static. For example, you can lock the table or close slave database synchronization. The disadvantage is that the online businesses may be affected.

Database encoding

DB2 Reader extracts data using JDBC at the underlying level. JDBC is applicable to all types of encodings and can complete transcoding at the underlying level. Therefore, DB2 Reader can identify the encoding and complete transcoding automatically without the need to specify the encoding.

Incremental data synchronization

Since DB2 Reader extracts data using JDBC SELECT statements, you can extract incremental data using the SELECT…WHERE…conditions in the following ways:

  • When database online applications write data into the database, the modify field is filled with the modification timestamp, including addition, update, and deletion (logical deletion). For this type of applications, DB2 Reader only requires the WHERE condition followed by the timestamp of the last synchronization phase.
  • For new streamline data, DB2 Reader requires the WHERE condition followed by the maximum auto-increment ID of the last synchronization phase.

In case that no field is provided for the business to identify the addition or modification of data, DB2 Reader cannot perform incremental data synchronization and can only perform full data synchronization.

SQL security

DB2 Reader provides querySql statements for you to SELECT data by yourself. DB2 Reader conducts no security verification on querySql. The security during use is ensured by the data integration users.