This topic describes the data types and parameters that are supported by DM Reader and how to configure DM Reader by using the codeless user interface (UI) and code editor.

Notice DM Reader supports only exclusive resource groups for Data Integration, but not the default resource group or custom resource groups for Data Integration. For more information, see Create and use an exclusive resource group for Data Integration and Create a custom resource group for Data Integration.

Background information

DM Reader reads data from databases such as Dameng (DM), Db2, PPAS, and Sybase. DM Reader is commonly used to read data from relational databases. To enable DM Reader to read data from a relational database, you must register the driver for the relational database.

DM Reader connects to a remote database by using Java Database Connectivity (JDBC), generates an SQL statement based on your configurations, and then sends the statement to the database. The system executes the statement on the database and returns data. Then, DM Reader assembles the returned data into abstract datasets of the data types supported by Data Integration and sends the datasets to a writer.
  • DM Reader generates the SQL statement based on the settings of the table, column, and where parameters and sends the generated statement to the remote database.
  • If you specify the querySql parameter, DM Reader sends the value of this parameter to the remote database.

DM Reader supports most data types of common relational databases, such as numeric and string data types. Make sure that the data types of your database are supported.

Parameters

Parameter Description Required Default value
jdbcUrl The JDBC URL that you can use to connect to the source database. The format must be in accordance with the official specifications of the database. You can also specify the information of the attachment facility. The format varies based on the database type. Data Integration selects the most suitable driver based on the format and uses this driver to read data from the source database.
  • Format for DM databases: jdbc:dm://IP address:Port number/database
  • Format for Db2 databases: jdbc:db2://IP address:Port number/database
  • Format for PPAS databases: jdbc:edb://IP address:Port number/database
You can use the following method to enable DM Reader to support a new type of database:
  • Go to the directory of DM Reader. In the directory, ${DATAX_HOME} indicates the main directory of Data Integration.
  • Open the plugin.json file in the directory of DM Reader and add the driver of the database to the drivers array in the file. When a synchronization node runs, DM Reader dynamically selects the most suitable database driver to connect to the database.
{
    "name": "rdbmsreader",
    "class": "com.alibaba.datax.plugin.reader.rdbmsreader.RdbmsReader",
    "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.",
    "developer": "alibaba",
    "drivers": [
        "dm.jdbc.driver.DmDriver",
        "com.ibm.db2.jcc.DB2Driver",
        "com.sybase.jdbc3.jdbc.SybDriver",
        "com.edb.Driver"
    ]
}
```
- Add the package of the driver to the libs subdirectory in the directory of DB Reader. 
```
$tree
.
|-- libs
|   |-- Dm7JdbcDriver16.jar
|   |-- commons-collections-3.0.jar
|   |-- commons-io-2.4.jar
|   |-- commons-lang3-3.3.2.jar
|   |-- commons-math3-3.1.1.jar
|   |-- datax-common-0.0.1-SNAPSHOT.jar
|   |-- datax-service-face-1.0.23-20160120.024328-1.jar
|   |-- db2jcc4.jar
|   |-- druid-1.0.15.jar
|   |-- edb-jdbc16.jar
|   |-- fastjson-1.1.46.sec01.jar
|   |-- guava-r05.jar
|   |-- hamcrest-core-1.3.jar
|   |-- jconn3-1.0.0-SNAPSHOT.jar
|   |-- logback-classic-1.0.13.jar
|   |-- logback-core-1.0.13.jar
|   |-- plugin-rdbms-util-0.0.1-SNAPSHOT.jar
|   `-- slf4j-api-1.7.10.jar
|-- plugin.json
|-- plugin_job_template.json
`-- rdbmsreader-0.0.1-SNAPSHOT.jar
Yes No default value
username The username that is used to connect to the source database. Yes No default value
password The password that is used to connect to the source database. Yes No default value
table The name of the table from which you want to read data. Yes No default value
column The names of the columns from which you want to read data. Specify the names in a JSON array. The default value is [ * ], which indicates all the columns in the source table.
  • You can select specific columns to read.
  • The column order can be changed. This indicates that you can specify columns in an order different from the order specified by the schema of the source table.
  • Constants are supported. The column names must be arranged in the JSON format, such as ["id","1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"].
    • id: a column name.
    • 1: an integer constant.
    • 'bazhen.csy': a string constant.
    • null: a null pointer.
    • to_char(a + 1): a function expression.
    • 2.3: a floating-point constant.
    • true: a Boolean value.
  • The column parameter must explicitly specify all the columns from which you want to read data. The parameter cannot be left empty.
Yes No default value
splitPk The field that is used for data sharding when DM Reader reads data. If you specify this parameter, the table is sharded based on the value of this parameter. Data Integration then runs parallel threads to read data. This way, data can be synchronized more efficiently.
  • We recommend that you set the splitPk parameter to the name of the primary key column of the table. Data can be evenly distributed to different shards based on the primary key column, instead of being intensively distributed only to specific shards.
  • The splitPk parameter supports sharding only for data of integer data types. If you set this parameter to a field of an unsupported data type, such as a string, floating point, or date data type, DM Reader returns an error.
  • If you do not specify the splitPk parameter, DM Reader uses a single thread to read all the data in the source table.
No No default value
where The WHERE clause. DM Reader generates an SQL statement based on the settings of the column, table, and where parameters and uses the generated statement to read data. For example, when you perform a test, you can set the where parameter to limit 10.
To read the data that is generated on the current day, you can set the where parameter to gmt_create > $bizdate.
  • You can use the WHERE clause to read incremental data.
  • If the where parameter is not provided or is left empty, DM Reader reads all data.
No No default value
querySql The SQL statement that is used for refined data filtering. If you specify this parameter, data is filtered based only on the value of this parameter.

For example, if you want to join multiple tables for data synchronization, set this parameter to select a,b from table_a join table_b on table_a.id = table_b.id. If you specify this parameter, DM Reader ignores the settings of the column, table, and where parameters.

No No default value
fetchSize The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the source database and affects read efficiency.
Note If you set this parameter to a value greater than 2048, an out of memory (OOM) error may occur during data synchronization.
No 1,024

Configure DM Reader by using the codeless UI

This method is not supported.

Configure DM Reader by using the code editor

In the following code, a synchronization node is configured to read data from a DM database. For more information about how to configure a synchronization node by using the code editor, see Create a sync node by using the code editor.
{
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle":true,// Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
                  "concurrent":1 // The maximum number of parallel threads. 
                  "mbps":"12",// The maximum transmission rate.

        }
    },
    "steps": [
        {
            "category": "reader",
            "name": "Reader",
            "parameter": {
                "connection": [
                    {
                        "jdbcUrl": [
                            "jdbc:dm://ip:port/database"
                        ],
                        "table": [
                            "table"
                        ]
                    }
                ],
                "username": "username",
                "password": "password",
                "table": "table",
                "column": [
                    "*"
                ],
                "preSql": [
                    "delete from XXX;"
                ]
            },
            "stepType": "rdbms"
        },
        {
            "category": "writer",
            "name": "Writer",
            "parameter": {},
            "stepType": "stream"
        }
    ],
    "type": "job",
    "version": "2.0"
}