ApsaraDB for OceanBase is a financial-grade distributed relational database developed by Alibaba Cloud and Ant Financial. This topic describes how ApsaraDB for OceanBase Reader works, the supported parameters, and how to configure it by using the code editor.

Notice Currently, ApsaraDB for OceanBase Reader only supports exclusive resource groups and does not support the default resource group or custom resource groups. For more information, see Use exclusive resource groups for data integration and Add a custom resource group.

Background

ApsaraDB for OceanBase implements automated and non-disruptive disaster recovery across cities with the Five Data Centers in Three Regions solution. It provides high availability for financial services through conventional hardware. With the online scaling capability, ApsaraDB for OceanBase is a Chinese database that has undergone strict verification in terms of functionality, stability, scalability, and performance.

ApsaraDB for OceanBase Reader allows you to read data from ApsaraDB for OceanBase in Oracle or MySQL mode.

ApsaraDB for OceanBase Reader connects to a remote ApsaraDB for OceanBase database through Java Database Connectivity (JDBC), and runs a SELECT statement to select and read data from the ApsaraDB for OceanBase database.

ApsaraDB for OceanBase Reader connects to a remote ApsaraDB for OceanBase database through the Java client, generates a SELECT statement based on your configurations, and then sends the statement to the database. The ApsaraDB for OceanBase database runs the statement and returns the result. Then, ApsaraDB for OceanBase Reader assembles the returned data to abstract datasets in custom data types supported by Data Integration, and sends the datasets to a writer.
  • ApsaraDB for OceanBase Reader generates a SELECT statement based on the table, column, and where parameters that you have configured, and sends the generated SELECT statement to the ApsaraDB for OceanBase database.
  • If you specify the querySql parameter, ApsaraDB for OceanBase Reader directly sends the value of this parameter to the ApsaraDB for OceanBase database.
Note ApsaraDB for OceanBase supports the Oracle and MySQL modes. The WHERE clause and columns that contain functions must be in compliance with the SQL syntax supported by Oracle or MySQL. Otherwise, the SQL statement may fail to be run.
ApsaraDB for OceanBase Reader accesses an ApsaraDB for OceanBase database through the OceanBase driver. Confirm the compatibility between the driver version and your ApsaraDB for OceanBase database. ApsaraDB for OceanBase Reader uses the following version of the OceanBase database driver:
<dependency>
    <groupId>com.alipay.OceanBase</groupId>
    <artifactId>OceanBase-connector-java</artifactId>
    <version>3.1.0</version>
</dependency>

Parameters

Parameter Description Required Default value
datasource The connection name. It must be identical to the name of the ApsaraDB for OceanBase connection that you added in DataWorks.

You can connect to the ApsaraDB for OceanBase database based on the settings of the jdbcUrl or username parameter.

No None
jdbcUrl The JDBC URL for connecting to the ApsaraDB for OceanBase database. You can specify multiple JDBC URLs for a database. The JDBC URLs are described in a JSON array.

If you specify multiple JDBC URLs, ApsaraDB for OceanBase Reader verifies the connectivity of the URLs in sequence to find a valid URL.

If no URL is valid, ApsaraDB for OceanBase Reader returns an error.
Note The jdbcUrl parameter must be included in the connection parameter.

The value of the jdbcUrl parameter must be in compliance with the standard format supported by ApsaraDB for OceanBase. You can also specify the information of the attachment facility. Example: jdbc:OceanBase://127.0.0.1:3306/database. One of the jdbcUrl parameter and the datasource parameter must be specified.

No None
username The username for connecting to the ApsaraDB for OceanBase database. No None
password The password for connecting to the ApsaraDB for OceanBase database. No None
table The name of the table to be synchronized. ApsaraDB for OceanBase Reader can read data from multiple tables. The tables are described in a JSON array.
If you specify multiple tables, make sure that the tables have the same schema. ApsaraDB for OceanBase Reader does not check whether the tables have the same schema.
Note The table parameter must be included in the connection parameter.
Yes None
column The columns to be synchronized from the source table. The columns are described in a JSON array. The default value is [ * ], which indicates all columns.
  • Column pruning is supported. You can select and export specific columns.
  • Change of the column order is supported. You can export the columns in an order different from that specified in the schema of the table.
  • Constants are supported. Example: '123'.
  • Functions are supported. Example: date('now').
  • The column parameter must explicitly specify a set of columns to be synchronized. The parameter cannot be left empty.
Yes None
splitPk The column used for data sharding when ApsaraDB for OceanBase Reader extracts data. If you specify the splitPk parameter, the table is sharded based on the shard key specified by this parameter. Data Integration then runs concurrent threads to synchronize data. This improves efficiency.
  • We recommend that you set the splitPk parameter to the primary key of the table. Based on the primary key, data can be well distributed to different shards, but not intensively distributed to certain shards.
  • Currently, the splitPk parameter supports data sharding only for integers but not for other data types such as string, floating point, and date. If you specify this parameter to a value of an unsupported type, ApsaraDB for OceanBase Reader returns an error.
  • If you leave the splitPk parameter empty, ApsaraDB for OceanBase Reader extracts the data in the source table through a single thread.
No None
where The WHERE clause. ApsaraDB for OceanBase Reader generates a SELECT statement based on the table, column, and where parameters that you have configured, and uses the generated SELECT statement to select and read data.
For example, you can set the where parameter to limit 10 during testing. To synchronize data generated on the current day, set the where parameter to gmt_create > $bizdate.
  • You can use the WHERE clause to synchronize incremental data.
  • If you do not specify the where parameter or leave it empty, all data is synchronized.
No None
querySql The SELECT statement used for refined data filtering. If you specify this parameter, Data Integration directly filters data based on this parameter.

If you specify the querySql parameter, ApsaraDB for OceanBase Reader ignores the table, column, where, and splitPk parameters that you have configured.

No None
fetchSize The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the database and affects reading efficiency.
Note A value greater than 2048 may lead to out of memory (OOM) during the data synchronization process.
No 1024

Configure ApsaraDB for OceanBase Reader by using the codeless UI

Currently, the codeless user interface (UI) is not supported for ApsaraDB for OceanBase Reader.

Configure ApsaraDB for OceanBase Reader by using the code editor

In the following code, a node is configured to read data from an ApsaraDB for OceanBase database.
{
    "type": "job",
    "steps": [
        {
            "stepType": "apsaradb_for_OceanBase", // The reader type.
            "parameter": {
                "datasource": "", // The connection name.
                "where": "",
                "column":[// The columns to be synchronized.
                    "id",
                    "name"
                ],
                "splitPk": ""
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {
                "print": false,
                "fieldDelimiter": ","
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0"// The maximum number of dirty data records allowed.
        },
        "speed": {
            "throttle": false,// Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
            "concurrent": 1,// The maximum number of concurrent threads.
        }
    }
}