This topic describes the data types and parameters that relational database management system (RDBMS) Reader supports and how to configure it by using the code editor.

Notice RDBMS Reader supports only exclusive resource groups for Data Integration, but not the default resource group or custom resource groups. For more information, see Exclusive resource group for Data Integration and Create a custom resource group for Data Integration.

Background information

RDBMS Reader allows you to read data from an RDBMS database. RDBMS Reader connects to a remote RDBMS database and executes a SELECT statement to select and read data from the database. RDBMS Reader can read data from databases such as Dameng (DM), Db2, PPAS, and Sybase databases. If you want RDBMS Reader to read data from a common relational database, register the driver for the corresponding database type.

RDBMS Reader connects to a remote RDBMS database by using Java Database Connectivity (JDBC), generates a SELECT statement based on your configurations, and then sends the statement to the database. The RDBMS database executes the statement and returns the result. Then, RDBMS Reader assembles the returned data to abstract datasets in custom data types that Data Integration supports, and sends the datasets to a writer.
  • RDBMS Reader generates the SQL statement based on the table, column, and where parameters that you have configured, and sends the generated SQL statement to the RDBMS database.
  • If you specify the querySql parameter, RDBMS Reader sends the value of this parameter to the RDBMS database.

RDBMS Reader supports most data types of a common relational database, such as numbers and characters. Make sure that your data types are supported.

Parameters

Parameter Description Required Default value
jdbcUrl The JDBC URL that you can use to connect to the RDBMS database. The format must be in accordance with the official RDBMS specifications. You can also specify the information of the attachment facility. The format varies based on the database type. Data Integration selects an appropriate driver for data reading based on the format.
  • Format for DM databases: jdbc:dm://ip:port/database
  • Format for Db2 databases: jdbc:db2://ip:port/database
  • Format for PPAS databases: jdbc:edb://ip:port/database
You can enable RDBMS Reader to support a new database by using the following method:
  • Go to the RDBMS Reader directory. In the directory, ${DATAX_HOME} indicates the main directory of Data Integration.
  • Open the plugin.json file in the RDBMS Reader directory, and add the driver of your database to the drivers array in the file. RDBMS Reader dynamically selects the appropriate database driver to connect to the database when nodes are run.
{
    "name": "rdbmsreader",
    "class": "com.alibaba.datax.plugin.reader.rdbmsreader.RdbmsReader",
    "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.",
    "developer": "alibaba",
    "drivers": [
        "dm.jdbc.driver.DmDriver",
        "com.ibm.db2.jcc.DB2Driver",
        "com.sybase.jdbc3.jdbc.SybDriver",
        "com.edb.Driver"
    ]
}
```
- Add the package of the driver to the libs directory in the RDBMS Reader directory.
```
$tree
.
|-- libs
|   |-- Dm7JdbcDriver16.jar
|   |-- commons-collections-3.0.jar
|   |-- commons-io-2.4.jar
|   |-- commons-lang3-3.3.2.jar
|   |-- commons-math3-3.1.1.jar
|   |-- datax-common-0.0.1-SNAPSHOT.jar
|   |-- datax-service-face-1.0.23-20160120.024328-1.jar
|   |-- db2jcc4.jar
|   |-- druid-1.0.15.jar
|   |-- edb-jdbc16.jar
|   |-- fastjson-1.1.46.sec01.jar
|   |-- guava-r05.jar
|   |-- hamcrest-core-1.3.jar
|   |-- jconn3-1.0.0-SNAPSHOT.jar
|   |-- logback-classic-1.0.13.jar
|   |-- logback-core-1.0.13.jar
|   |-- plugin-rdbms-util-0.0.1-SNAPSHOT.jar
|   `-- slf4j-api-1.7.10.jar
|-- plugin.json
|-- plugin_job_template.json
`-- rdbmsreader-0.0.1-SNAPSHOT.jar
Yes N/A
username The username that you can use to connect to the database. Yes N/A
password The password that you can use to connect to the database. Yes N/A
table The name of the source table. Yes N/A
column The columns to be synchronized from the source table. The columns are described in a JSON array. The default value is [ * ], which indicates all columns.
  • Column pruning is supported. You can select specific columns to export.
  • The column order can be changed. You can configure RDBMS Reader to export the specified columns in an order different from that specified in the schema of the table.
  • Constants are supported. The column names must be arranged in JSON format, for example, ["id","1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"].
    • id: a column name.
    • 1: an integer constant.
    • 'bazhen.csy': a string constant.
    • null: a null pointer.
    • to_char(a + 1): a function expression.
    • 2.3: a floating-point constant.
    • true: a Boolean value.
  • The column parameter must explicitly specify a set of columns to be synchronized. The parameter cannot be left empty.
Yes N/A
splitPk The field that is used for data sharding when RDBMS Reader reads data. If you specify the splitPk parameter, the table is sharded based on the shard key that is specified by this parameter. Data Integration then runs concurrent threads to synchronize data. This way, data can be synchronized more efficiently.
  • We recommend that you set the splitPk parameter to the primary key of the table. Based on the primary key, data can be well distributed to different shards, but not intensively distributed to specific shards.
  • The splitPk parameter supports data sharding only for integers but not for other data types such as string, floating point, and date. If you specify this parameter to a column of an unsupported type, RDBMS Reader returns an error.
  • If you do not specify the splitPk parameter or leave it empty, RDBMS Reader synchronizes data by using a single thread.
No N/A
where The WHERE clause. RDBMS Reader generates a SELECT statement based on the table, column, and where parameters that you have configured, and uses the generated SELECT statement to select and read data. For example, set this parameter to limit 10.
To synchronize data that is generated on the current day, set the where parameter to gmt_create > $bizdate.
  • You can use the WHERE clause to synchronize incremental data.
  • If you do not specify the where parameter or leave it empty, all data is read.
No N/A
querySql The SELECT statement that is used for refined data filtering. If you specify this parameter, Data Integration filters data based on this parameter.

For example, if you need to join multiple tables for data synchronization, set this parameter to select a,b from table_a join table_b on table_a.id = table_b.id. If you specify the querySql parameter, RDBMS Reader ignores the table, column, and where parameters that you have configured.

No N/A
fetchSize The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the database and affects reading efficiency.
Note A value greater than 2048 may lead to out of memory (OOM) during the data synchronization process.
No 1,024

Configure RDBMS Reader by using the codeless UI

The codeless user interface (UI) is not supported for RDBMS Reader.

Configure RDBMS Reader by using the code editor

The following example shows how to configure a sync node to read data from an RDBMS database. For more information, see Create a sync node by using the code editor.
{
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "concurrent": 1,
            "throttle": false
        }
    },
    "steps": [
        {
            "category": "reader",
            "name": "Reader",
            "parameter": {
                "connection": [
                    {
                        "jdbcUrl": [
                            "jdbc:dm://ip:port/database"
                        ],
                        "table": [
                            "table"
                        ]
                    }
                ],
                "username": "username",
                "password": "password",
                "table": "table",
                "column": [
                    "*"
                ],
                "preSql": [
                    "delete from XXX;"
                ]
            },
            "stepType": "rdbms"
        },
        {
            "category": "writer",
            "name": "Writer",
            "parameter": {},
            "stepType": "stream"
        }
    ],
    "type": "job",
    "version": "2.0"
}