The ApsaraDB for OceanBase data source lets you read from and write to ApsaraDB for OceanBase in DataWorks. Configure it to run batch and real-time data synchronization tasks.
Supported versions and tenant modes
The ApsaraDB for OceanBase Reader and Writer plug-ins support OceanBase 2.x, 3.x, and 4.x. The following table shows which capabilities are available for each version and tenant mode.
| Capability | OceanBase version | Tenant mode |
|---|---|---|
| Batch read | 2.x, 3.x, 4.x | Oracle and MySQL |
| Batch write | 2.x, 3.x, 4.x | Oracle and MySQL |
| Real-time read | 2.x, 3.x, 4.x (V3.0 or later for entire database) | MySQL only |
Prerequisites
Before you begin, make sure you have:
- Added the CIDR block of your serverless resource group or exclusive resource group for Data Integration to the ApsaraDB for OceanBase instance whitelist. See Configure an IP address whitelist.
- Created a database account with the required permissions on the ApsaraDB for OceanBase instance. See Create an account.
The required permissions depend on your synchronization mode:
| Sync mode | Required permissions |
|---|---|
| Batch read | If you use partition-based reads, the account needs system permissions. |
| Batch write | At least insert into permissions. Additional permissions may be required based on the SQL statements in preSql and postSql. |
| Real-time read | Binlog feature enabled. See Binlog-related operations (Alibaba Cloud instances) and Binlog-related operations (OceanBase Cloud instances). |
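For batch write, a minimal grant might look like the following. This is an illustrative sketch for a MySQL-tenant instance; the account name `dw_sync` and database name `demo_db` are hypothetical placeholders, and the exact statements you need depend on your preSql and postSql.

```sql
-- Hypothetical account and database names; adjust to your instance.
-- Batch write requires at least insert permissions on the destination tables.
GRANT INSERT ON demo_db.* TO dw_sync;

-- If preSql or postSql run DELETE or UPDATE statements, grant those as well.
GRANT DELETE, UPDATE ON demo_db.* TO dw_sync;
```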
Add a data source
Add ApsaraDB for OceanBase as a data source in DataWorks before configuring any synchronization task. Follow the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.
Configure a data synchronization task
Choose the synchronization mode that fits your use case. Use batch synchronization if you need periodic bulk transfers or incremental loads. Use real-time synchronization if you need low-latency change capture from OceanBase to a downstream system.
For real-time synchronization of an entire database, the ApsaraDB for OceanBase instance must be version V3.0 or later. Real-time synchronization also requires MySQL tenant mode.
| Sync mode | Supported sources | Configuration guide |
|---|---|---|
| Batch synchronization (single table) | All data sources supported by Data Integration | Configure a task in the codeless UI |
| Real-time synchronization (single table) | Kafka | Configure a real-time full-database synchronization task |
| Real-time synchronization (entire database) | MySQL | Configure a real-time full-database synchronization task |
Limitations
Batch read
- ApsaraDB for OceanBase supports two tenant modes: Oracle and MySQL. The `where` clause and function columns in the `column` parameter must use syntax compatible with your instance's tenant mode.
- Reading from views is supported.
- Do not modify data while a batch synchronization task is running. Concurrent modifications can cause data duplication or data loss.
- Partition-based reads require the account to have system permissions.
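Batch reads can run concurrently when a shard column (the `splitPk` parameter, described in the reader parameters below) is configured. The following Python sketch illustrates the idea of range-based sharding over an integer primary key; it is not the plug-in's actual implementation.

```python
def split_pk_ranges(pk_min: int, pk_max: int, concurrency: int):
    """Divide [pk_min, pk_max] into contiguous shards, one per reader thread.

    Illustrative sketch of how an integer splitPk enables concurrent reads;
    the real plug-in's splitting strategy may differ.
    """
    total = pk_max - pk_min + 1
    step = -(-total // concurrency)  # ceiling division
    ranges = []
    lo = pk_min
    while lo <= pk_max:
        hi = min(lo + step - 1, pk_max)
        ranges.append((lo, hi))  # each shard becomes: WHERE id BETWEEN lo AND hi
        lo = hi + 1
    return ranges

# Example: ids 1..100 split across 4 concurrent readers
print(split_pk_ranges(1, 100, 4))
# → [(1, 25), (26, 50), (51, 75), (76, 100)]
```

An uneven key distribution leaves some shards doing most of the work, which is why the documentation recommends sharding on the primary key.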
Batch write
- Use the batch write method. It buffers rows and submits a single write request after reaching the configured row count, reducing round trips to the database.
- The `preSql` and `postSql` syntax must be compatible with the tenant mode of your instance.
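The buffered write strategy described above can be sketched as follows. This is an illustrative Python sketch of the batching idea, not the writer plug-in's actual internals: rows accumulate in memory and are flushed as a single multi-row INSERT once the configured batch size is reached.

```python
class BatchWriter:
    """Sketch of buffered batch writes: one round trip per batch_size rows."""

    def __init__(self, execute_sql, table, columns, batch_size=1024):
        self.execute_sql = execute_sql  # e.g. a DB-API cursor's execute method
        self.table = table
        self.columns = columns
        self.batch_size = batch_size
        self.buffer = []

    def write(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # Build one multi-row INSERT: (%s, %s), (%s, %s), ...
        row_placeholder = "(" + ", ".join(["%s"] * len(self.columns)) + ")"
        placeholders = ", ".join(row_placeholder for _ in self.buffer)
        cols = ", ".join(self.columns)
        sql = f"INSERT INTO {self.table} ({cols}) VALUES {placeholders}"
        params = [value for row in self.buffer for value in row]
        self.execute_sql(sql, params)  # one network round trip per batch
        self.buffer.clear()
```

Call `flush()` once more after the last `write()` so a partially filled final batch is not lost.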
Real-time read
- Only the MySQL tenant mode is supported.
- Data sources that use a connection string are not supported for full-database real-time synchronization.
- ApsaraDB for OceanBase is a distributed relational database that consolidates multiple physically distributed databases into a single logical database. When synchronizing in real time to AnalyticDB for MySQL, only a single physical database is supported; logical databases are not.
Appendix: Script mode parameters
Use the code editor to configure batch synchronization tasks as JSON scripts. All examples use the unified script format. For format requirements, see Use the code editor.
Reader
Script example
```json
{
  "type": "job",
  "steps": [
    {
      "stepType": "apsaradb_for_OceanBase",
      "parameter": {
        "datasource": "",
        "column": [
          "id",
          "name"
        ],
        "where": "",
        "splitPk": ""
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {
        "print": false,
        "fieldDelimiter": ","
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "version": "2.0",
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  }
}
```
Reader parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `datasource` | Yes | None | The ApsaraDB for OceanBase data source name. Use this parameter or the `jdbcUrl` + `username` + `password` combination. |
| `jdbcUrl` | No | None | The JDBC URL of the source database. Specify as a JSON array to configure multiple URLs; the plug-in tries them in order until a connection succeeds. If all connections fail, the task fails. Must be nested within `connection`. Format: `jdbc:oceanbase://127.0.0.1:3306/database`. |
| `username` | No | None | The database account username. |
| `password` | No | None | The database account password. |
| `table` | Yes | None | The tables to read from, specified as a JSON array. All tables must share the same schema. Must be nested within `connection`. |
| `column` | Yes | None | The columns to read, specified as a JSON array. Set to `["*"]` to read all columns. Supports column subsets, any column order, constant values (for example, `'123'`), and function columns (for example, `date('now')`). Cannot be blank. |
| `splitPk` | No | Blank | The column used for data sharding. Enables concurrent reads, which improves performance. Use the primary key for even data distribution. Supports integer data types only; other types cause an error. Leave blank to read with a single thread. |
| `where` | No | None | A filter condition appended to the SQL query. Leave blank to run a full table synchronization. Supports incremental synchronization; for example, set to `gmt_create > $bizdate`. |
| `querySql` | No | None | A custom SQL statement for complex filter conditions. When set, the plug-in ignores `table`, `column`, `where`, and `splitPk`. |
| `fetchSize` | No | 1,024 | The number of records fetched per batch. Larger values reduce network round trips. Values above 2,048 may cause out-of-memory (OOM) errors. |
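As an illustration of the `connection`-nested form, the following hypothetical reader step configures `jdbcUrl` and `table` inside `connection`, with an incremental `where` filter and a `splitPk` for concurrent reads. The database, table, and account names are placeholders.

```json
{
  "stepType": "apsaradb_for_OceanBase",
  "parameter": {
    "username": "demo_user",
    "password": "******",
    "column": ["id", "name", "gmt_create"],
    "where": "gmt_create > $bizdate",
    "splitPk": "id",
    "connection": [
      {
        "table": ["orders_0", "orders_1"],
        "jdbcUrl": ["jdbc:oceanbase://127.0.0.1:3306/demo_db"]
      }
    ]
  },
  "name": "Reader",
  "category": "reader"
}
```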
Writer
Script example
```json
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "apsaradb_for_OceanBase",
      "parameter": {
        "datasource": "The data source name",
        "table": "apsaradb_for_OceanBase_table",
        "column": [
          "id",
          "name"
        ],
        "obWriteMode": "insert",
        "preSql": [
          "delete from @table where db_id = -1"
        ],
        "postSql": [
          "update @table set db_modify_time = now() where db_id = 1"
        ]
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
```
Writer parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `datasource` | No | None | The ApsaraDB for OceanBase data source name. |
| `jdbcUrl` | Yes | None | The JDBC URL of the destination database. Specify only one URL; writing to multiple primary databases is not supported. Must be nested within `connection`. Format: `jdbc:oceanbase://127.0.0.1:3306/database`. |
| `username` | Yes | None | The database account username. |
| `password` | Yes | None | The database account password. |
| `table` | Yes | None | The destination table name. Only one table is supported. Must be nested within `connection`. |
| `column` | Yes | None | The destination columns to write to, specified as a JSON array. Example: `["id", "name", "age"]`. Cannot be blank. |
| `obWriteMode` | No | `insert` | The write mode. Valid values: `insert` uses `insert into ...`, and conflicting rows fail; `update` uses `... on duplicate key update ...` (MySQL tenant mode), and conflicting rows are updated; `merge` uses `merge into ... when matched then update ...` (Oracle tenant mode), and conflicting rows are updated. |
| `onClauseColumns` | No | None | Oracle tenant mode only. Required when `obWriteMode` is `merge`; otherwise the plug-in falls back to `insert` mode. Specify primary key or unique constraint columns, separated by commas. Example: `ID,C1`. |
| `obUpdateColumns` | No | All columns | The columns to update when a write conflict occurs. Applies only when `obWriteMode` is `merge` or `update`. Separate multiple columns with commas. Example: `c2,c3`. |
| `preSql` | No | None | An SQL statement to run before the task writes data. Use `@table` to reference the destination table name. |
| `postSql` | No | None | An SQL statement to run after the task writes data. |
| `batchSize` | No | 1,024 | The number of records submitted per batch. Larger values reduce network round trips. Values above 2,048 may cause out-of-memory (OOM) errors. |
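To illustrate how the merge-related parameters fit together, the following hypothetical writer parameter block targets an Oracle-tenant instance; the data source, table, and column names are placeholders. `onClauseColumns` names the unique key used to detect conflicts, and `obUpdateColumns` limits which columns a conflicting row updates.

```json
{
  "stepType": "apsaradb_for_OceanBase",
  "parameter": {
    "datasource": "my_oceanbase_source",
    "table": "T_USER",
    "column": ["ID", "C1", "C2", "C3"],
    "obWriteMode": "merge",
    "onClauseColumns": "ID,C1",
    "obUpdateColumns": "C2,C3"
  },
  "name": "Writer",
  "category": "writer"
}
```

If `onClauseColumns` were omitted here, the plug-in would fall back to `insert` mode as noted above.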