The ApsaraDB for OceanBase data source lets you read from and write to ApsaraDB for OceanBase in DataWorks. Configure it to run batch and real-time data synchronization tasks.
Supported versions and tenant modes
The ApsaraDB for OceanBase Reader and Writer plug-ins support OceanBase 2.x, 3.x, and 4.x. The following table shows which capabilities are available for each version and tenant mode.
| Capability | OceanBase version | Tenant mode |
|---|---|---|
| Batch read | 2.x, 3.x, 4.x | Oracle and MySQL |
| Batch write | 2.x, 3.x, 4.x | Oracle and MySQL |
| Real-time read | 2.x, 3.x, 4.x (V3.0 or later for entire database) | MySQL only |
Prerequisites
Before you begin, make sure you have:
- Added the CIDR block of your serverless resource group or exclusive resource group for Data Integration to the ApsaraDB for OceanBase instance whitelist. See Configure an IP address whitelist.
- Created a database account with the required permissions on the ApsaraDB for OceanBase instance. See Create an account.
The required permissions depend on your synchronization mode:
| Sync mode | Required permissions |
|---|---|
| Batch read | If you use partition-based reads, the account needs system permissions. |
| Batch write | At least insert into permissions. Additional permissions may be required based on the SQL statements in preSql and postSql. |
| Real-time read | Binlog feature enabled. See Binlog-related operations (Alibaba Cloud instances) and Binlog-related operations (OceanBase Cloud instances). |
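For batch write, a minimal grant might look like the following. This is an illustrative sketch for a MySQL-tenant instance; the account name `dw_sync` and database name `demo_db` are hypothetical placeholders, and the exact statements you need depend on your preSql and postSql.

```sql
-- Hypothetical account and database names; adjust to your instance.
-- Batch write requires at least insert permissions on the destination tables.
GRANT INSERT ON demo_db.* TO dw_sync;

-- If preSql or postSql run DELETE or UPDATE statements, grant those as well.
GRANT DELETE, UPDATE ON demo_db.* TO dw_sync;
```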
Add a data source
Add ApsaraDB for OceanBase as a data source in DataWorks before configuring any synchronization task. Follow the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.
Configure a data synchronization task
Choose the synchronization mode that fits your use case. Use batch synchronization if you need periodic bulk transfers or incremental loads. Use real-time synchronization if you need low-latency change capture from OceanBase to a downstream system.
For real-time synchronization of an entire database, the ApsaraDB for OceanBase instance must be version V3.0 or later. Real-time synchronization also requires MySQL tenant mode.
| Sync mode | Supported sources | Configuration guide |
|---|---|---|
| Batch synchronization (single table) | All data sources supported by Data Integration | Configure a task in the codeless UI |
| Real-time synchronization (single table) | Kafka | Configure a real-time full-database synchronization task |
| Real-time synchronization (entire database) | MySQL | Configure a real-time full-database synchronization task |
Limitations
Batch read
- ApsaraDB for OceanBase supports two tenant modes: Oracle and MySQL. The `where` clause and function columns in the `column` parameter must use syntax compatible with your instance's tenant mode.
- Reading from views is supported.
- Do not modify data while a batch synchronization task is running. Concurrent modifications can cause data duplication or data loss.
- Partition-based reads require the account to have system permissions.
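Batch reads can run concurrently when a shard column (the `splitPk` parameter, described in the reader parameters below) is configured. The following Python sketch illustrates the idea of range-based sharding over an integer primary key; it is not the plug-in's actual implementation.

```python
def split_pk_ranges(pk_min: int, pk_max: int, concurrency: int):
    """Divide [pk_min, pk_max] into contiguous shards, one per reader thread.

    Illustrative sketch of how an integer splitPk enables concurrent reads;
    the real plug-in's splitting strategy may differ.
    """
    total = pk_max - pk_min + 1
    step = -(-total // concurrency)  # ceiling division
    ranges = []
    lo = pk_min
    while lo <= pk_max:
        hi = min(lo + step - 1, pk_max)
        ranges.append((lo, hi))  # each shard becomes: WHERE id BETWEEN lo AND hi
        lo = hi + 1
    return ranges

# Example: ids 1..100 split across 4 concurrent readers
print(split_pk_ranges(1, 100, 4))
# → [(1, 25), (26, 50), (51, 75), (76, 100)]
```

An uneven key distribution leaves some shards doing most of the work, which is why the documentation recommends sharding on the primary key.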
Batch write
- Use the batch write method. It buffers rows and submits a single write request after reaching the configured row count, reducing round trips to the database.
- The `preSql` and `postSql` syntax must be compatible with the tenant mode of your instance.
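The buffered write strategy described above can be sketched as follows. This is an illustrative Python sketch of the batching idea, not the writer plug-in's actual internals: rows accumulate in memory and are flushed as a single multi-row INSERT once the configured batch size is reached.

```python
class BatchWriter:
    """Sketch of buffered batch writes: one round trip per batch_size rows."""

    def __init__(self, execute_sql, table, columns, batch_size=1024):
        self.execute_sql = execute_sql  # e.g. a DB-API cursor's execute method
        self.table = table
        self.columns = columns
        self.batch_size = batch_size
        self.buffer = []

    def write(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # Build one multi-row INSERT: (%s, %s), (%s, %s), ...
        row_placeholder = "(" + ", ".join(["%s"] * len(self.columns)) + ")"
        placeholders = ", ".join(row_placeholder for _ in self.buffer)
        cols = ", ".join(self.columns)
        sql = f"INSERT INTO {self.table} ({cols}) VALUES {placeholders}"
        params = [value for row in self.buffer for value in row]
        self.execute_sql(sql, params)  # one network round trip per batch
        self.buffer.clear()
```

Call `flush()` once more after the last `write()` so a partially filled final batch is not lost.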
Real-time read
- Only the MySQL tenant mode is supported.
- Data sources that use a connection string are not supported for full-database real-time synchronization.
- ApsaraDB for OceanBase is a distributed relational database that consolidates multiple physically distributed databases into a single logical database. When synchronizing in real time to AnalyticDB for MySQL, only a single physical database is supported; logical databases are not.
Appendix: Script mode parameters
Use the code editor to configure batch synchronization tasks as JSON scripts. All examples use the unified script format. For format requirements, see Use the code editor.
Reader
Script example
```json
{
  "type": "job",
  "steps": [
    {
      "stepType": "apsaradb_for_OceanBase",
      "parameter": {
        "datasource": "",
        "column": [
          "id",
          "name"
        ],
        "where": "",
        "splitPk": ""
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {
        "print": false,
        "fieldDelimiter": ","
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "version": "2.0",
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  }
}
```
Reader parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `datasource` | Yes | None | The ApsaraDB for OceanBase data source name. Use this parameter or the `jdbcUrl` + `username` + `password` combination. |
| `jdbcUrl` | No | None | The JDBC URL of the source database. Specify as a JSON array to configure multiple URLs; the plug-in tries them in order until a connection succeeds. If all connections fail, the task fails. Must be nested within `connection`. Format: `jdbc:oceanbase://127.0.0.1:3306/database`. |
| `username` | No | None | The database account username. |
| `password` | No | None | The database account password. |
| `table` | Yes | None | The tables to read from, specified as a JSON array. All tables must share the same schema. Must be nested within `connection`. |
| `column` | Yes | None | The columns to read, specified as a JSON array. Set to `["*"]` to read all columns. Supports column subsets, any column order, constant values (for example, `'123'`), and function columns (for example, `date('now')`). Cannot be blank. |
| `splitPk` | No | Blank | The column used for data sharding. Enables concurrent reads, which improves performance. Use the primary key for even data distribution. Supports integer data types only; other types cause an error. Leave blank to read with a single thread. |
| `where` | No | None | A filter condition appended to the SQL query. Leave blank to run a full table synchronization. Supports incremental synchronization; for example, set to `gmt_create > $bizdate`. |
| `querySql` | No | None | A custom SQL statement for complex filter conditions. When set, the plug-in ignores `table`, `column`, `where`, and `splitPk`. |
| `fetchSize` | No | 1,024 | The number of records fetched per batch. Larger values reduce network round trips. Values above 2,048 may cause out-of-memory (OOM) errors. |
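As an illustration of the `connection`-nested form, the following hypothetical reader step configures `jdbcUrl` and `table` inside `connection`, with an incremental `where` filter and a `splitPk` for concurrent reads. The database, table, and account names are placeholders.

```json
{
  "stepType": "apsaradb_for_OceanBase",
  "parameter": {
    "username": "demo_user",
    "password": "******",
    "column": ["id", "name", "gmt_create"],
    "where": "gmt_create > $bizdate",
    "splitPk": "id",
    "connection": [
      {
        "table": ["orders_0", "orders_1"],
        "jdbcUrl": ["jdbc:oceanbase://127.0.0.1:3306/demo_db"]
      }
    ]
  },
  "name": "Reader",
  "category": "reader"
}
```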
Writer
Script example
```json
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "apsaradb_for_OceanBase",
      "parameter": {
        "datasource": "The data source name",
        "table": "apsaradb_for_OceanBase_table",
        "column": [
          "id",
          "name"
        ],
        "obWriteMode": "insert",
        "preSql": [
          "delete from @table where db_id = -1"
        ],
        "postSql": [
          "update @table set db_modify_time = now() where db_id = 1"
        ]
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
```
Writer parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `datasource` | No | None | The ApsaraDB for OceanBase data source name. |
| `jdbcUrl` | Yes | None | The JDBC URL of the destination database. Specify only one URL; writing to multiple primary databases is not supported. Must be nested within `connection`. Format: `jdbc:oceanbase://127.0.0.1:3306/database`. |
| `username` | Yes | None | The database account username. |
| `password` | Yes | None | The database account password. |
| `table` | Yes | None | The destination table name. Only one table is supported. Must be nested within `connection`. |
| `column` | Yes | None | The destination columns to write to, specified as a JSON array. Example: `["id", "name", "age"]`. Cannot be blank. |
| `obWriteMode` | No | `insert` | The write mode. Valid values: `insert` uses `insert into ...`, and conflicting rows fail; `update` uses `... on duplicate key update ...` (MySQL tenant mode), and conflicting rows are updated; `merge` uses `merge into ... when matched then update ...` (Oracle tenant mode), and conflicting rows are updated. |
| `onClauseColumns` | No | None | Oracle tenant mode only. Required when `obWriteMode` is `merge`; otherwise the plug-in falls back to `insert` mode. Specify primary key or unique constraint columns, separated by commas. Example: `ID,C1`. |
| `obUpdateColumns` | No | All columns | The columns to update when a write conflict occurs. Applies only when `obWriteMode` is `merge` or `update`. Separate multiple columns with commas. Example: `c2,c3`. |
| `preSql` | No | None | An SQL statement to run before the task writes data. Use `@table` to reference the destination table name. |
| `postSql` | No | None | An SQL statement to run after the task writes data. |
| `batchSize` | No | 1,024 | The number of records submitted per batch. Larger values reduce network round trips. Values above 2,048 may cause out-of-memory (OOM) errors. |
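To illustrate how the merge-related parameters fit together, the following hypothetical writer parameter block targets an Oracle-tenant instance; the data source, table, and column names are placeholders. `onClauseColumns` names the unique key used to detect conflicts, and `obUpdateColumns` limits which columns a conflicting row updates.

```json
{
  "stepType": "apsaradb_for_OceanBase",
  "parameter": {
    "datasource": "my_oceanbase_source",
    "table": "T_USER",
    "column": ["ID", "C1", "C2", "C3"],
    "obWriteMode": "merge",
    "onClauseColumns": "ID,C1",
    "obUpdateColumns": "C2,C3"
  },
  "name": "Writer",
  "category": "writer"
}
```

If `onClauseColumns` were omitted here, the plug-in would fall back to `insert` mode as noted above.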