The OceanBase data source supports reading data from and writing data to OceanBase. You can use an OceanBase data source to configure a data synchronization task. This topic describes the data synchronization capabilities that DataWorks provides for OceanBase.
Supported versions
OceanBase Reader and OceanBase Writer support the following OceanBase versions for reading and writing offline data.
OceanBase 2.x
OceanBase 3.x
OceanBase 4.x
Limits
Offline read
OceanBase supports the Oracle and MySQL tenant modes. When you configure the where filter condition and the function columns in the column parameter, the configuration must comply with the SQL syntax of the corresponding tenant mode. Otherwise, the SQL statement may fail to be executed.
You can read data from views.
Do not modify data that is being synchronized during an offline read from OceanBase. Otherwise, data quality issues, such as data duplication or data loss, may occur.
If the data source is configured with Read By Partition, the account used to access the data source must have system permissions.
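For example, function columns must use the dialect of the tenant mode. The following reader-parameter sketch is illustrative only; the column and where values are placeholders, and a MySQL-mode function such as now() would need an Oracle-mode counterpart (for example, sysdate) in an Oracle-mode tenant.

```json
{
    "column": [
        "id",
        "name",
        "now()" // Valid in MySQL tenant mode; use an Oracle-mode function such as sysdate in Oracle tenant mode
    ],
    "where": "gmt_modified >= '2024-01-01'"
}
```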
Offline write
The account used for the task requires at least the permissions to execute the insert into... statement. Other permissions may be required depending on the statements that you specify in the preSql and postSql parameters.
Data is written in batch mode. A write request is initiated after the number of rows reaches a specified threshold.
OceanBase supports the Oracle and MySQL tenant modes. The statements that you specify in the preSql and postSql parameters must comply with the SQL syntax of the corresponding tenant mode. Otherwise, the SQL statements may fail to be executed.
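As an illustrative sketch of dialect-sensitive preSql and postSql statements, where the table name, column names, and filter values are placeholders:

```json
{
    "preSql": [
        "delete from demo_table where ds = '2024-01-01'"
    ],
    "postSql": [
        "update demo_table set sync_time = now() where ds = '2024-01-01'" // now() is MySQL-mode syntax; an Oracle-mode tenant would use a function such as systimestamp
    ]
}
```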
Real-time read
OceanBase is a distributed relational database that integrates data from multiple physically distributed databases into a single logical database. However, when you synchronize data from OceanBase to AnalyticDB for MySQL in real time, you can synchronize data from only a single physical database. You cannot synchronize data from a logical database.
Data sources added using a connection string are not supported for real-time tasks that synchronize an entire database.
The database version must be 3.0 or later for a real-time sync task.
Preparations before data synchronization
Before you synchronize data in DataWorks, you must prepare the OceanBase environment. This ensures that the OceanBase data synchronization task can be configured and run as expected. The following sections describe the required preparations.
Configure a whitelist
Add the VPC CIDR block of the serverless resource group or exclusive resource group for Data Integration to the whitelist of your OceanBase instance. For more information, see Add a whitelist.
Create an account and grant permissions
Create a database logon account. This account must have the required operation permissions on OceanBase. For more information, see Create an account.
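A minimal sketch of what such an account might look like in MySQL tenant mode. The user name, password, and database name are placeholders, and the exact grants depend on the statements that your task runs:

```sql
-- MySQL tenant mode; names and password are placeholders
CREATE USER 'sync_user' IDENTIFIED BY '********';

-- Offline read tasks need at least SELECT on the source tables
GRANT SELECT ON demo_db.* TO 'sync_user';

-- Offline write tasks need at least INSERT; add DELETE or UPDATE
-- if your preSql or postSql statements require them
GRANT INSERT ON demo_db.* TO 'sync_user';
```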
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data Source Management. When you add a data source, you can view the infotips of parameters in the DataWorks console to understand their meanings.
Develop a data synchronization task
For information about the entry point and the procedure for configuring a synchronization task, see the following configuration guides.
Configure a batch sync task for a single table
For more information about the procedure, see Configure a task in the codeless UI and Configure a task in the code editor.
For information about all parameters and a script demo for the code editor, see Appendix: Script demo and parameter descriptions.
Configure a real-time sync task for an entire database
For more information about the procedure, see Configure a real-time sync task in DataStudio.
Configure a full and incremental (real-time) read sync task for a single table or an entire database
For more information about the procedure, see Configure a real-time sync task for an entire database.
Appendix: Script demo and parameter descriptions
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a task in the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Reader script demo
{
"type": "job",
"steps": [
{
"stepType": "apsaradb_for_OceanBase", // Plugin name
"parameter": {
"datasource": "", // Data source name
"where": "",
"column": [ // Fields
"id",
"name"
],
"splitPk": ""
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "stream",
"parameter": {
"print": false,
"fieldDelimiter": ","
},
"name": "Writer",
"category": "writer"
}
],
"version": "2.0",
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
},
"setting": {
"errorLimit": {
"record": "0" // Number of error records
},
"speed": {
"throttle": true, // Specifies whether to enable throttling. If set to true, throttling is enabled. If set to false, throttling is disabled and the mbps parameter does not take effect.
"concurrent": 1, // Number of concurrent jobs
"mbps":"12"// Throttling rate. 1 mbps = 1 MB/s.
}
}
}
Reader script parameters
Parameter | Description | Required | Default value |
datasource | The name of the ApsaraDB for OceanBase data source that you added. You can use this parameter if your DataWorks version supports adding ApsaraDB for OceanBase data sources. Alternatively, you can configure the connection by using the jdbcUrl and username parameters. | Yes | None |
jdbcUrl | The JDBC connection information of the source database. Use a JSON array to specify multiple connection addresses for a database. If you configure multiple addresses, ApsaraDB for OceanBase Reader probes the connectivity of the addresses in sequence until a valid address is found. If all connections fail, ApsaraDB for OceanBase Reader reports an error. Note: The jdbcUrl parameter must be included in the connection configuration unit. The jdbcUrl format must comply with the official ApsaraDB for OceanBase specifications. You can also specify connection attachment control information. | No | None |
username | The username for the data source. | No | None |
password | The password for the specified username. | No | None |
table | The tables to synchronize. Use a JSON array to specify multiple tables. If you configure multiple tables, make sure that the tables have the same schema. ApsaraDB for OceanBase Reader does not check whether the tables have a unified logical structure. Note: The table parameter must be included in the connection configuration unit. | Yes | None |
column | The set of columns to synchronize from the configured tables. Use a JSON array to describe the fields. By default, all columns are used. Example: [ * ]. | Yes | None |
splitPk | If you specify the splitPk parameter, ApsaraDB for OceanBase Reader uses the specified field to shard the data, and Data Integration runs parallel threads to read the shards. This way, data can be synchronized more efficiently. | No | Empty |
where | ApsaraDB for OceanBase Reader concatenates an SQL statement based on the specified column, table, and where parameters, and uses the statement to extract data. For example, you can set the where condition to limit 10 for testing. In actual business scenarios, you typically synchronize data for the current day by setting the where condition to a date-based filter. | No | None |
querySql | In some business scenarios, the where parameter is not sufficient to describe the filter conditions. In such cases, you can use this parameter to specify a custom filter SQL statement. If you configure querySql, ApsaraDB for OceanBase Reader ignores the table, column, where, and splitPk parameters and uses the content of querySql to filter data. | No | None |
fetchSize | The number of data records to retrieve in each batch. A larger value reduces the number of network interactions between Data Integration and the server and can significantly improve data extraction performance. Note: A fetchSize value greater than 2048 may cause an out-of-memory (OOM) error for the data synchronization process. | No | 1,024 |
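The reader demo above identifies the source by the datasource parameter and does not show the connection configuration unit. The following hedged sketch assembles the jdbcUrl, username, table, and related parameters described in this table. The host, port, credentials, and table names are placeholders, and the URL format should be checked against the official ApsaraDB for OceanBase specifications.

```json
{
    "stepType": "apsaradb_for_OceanBase",
    "parameter": {
        "username": "sync_user", // Placeholder account
        "password": "********",
        "column": ["id", "name"],
        "where": "gmt_modified >= '2024-01-01'", // Illustrative filter
        "splitPk": "id",
        "fetchSize": 1024,
        "connection": [
            {
                "table": ["demo_table"],
                "jdbcUrl": ["jdbc:oceanbase://demo-host:2883/demo_db"] // Placeholder address
            }
        ]
    },
    "name": "Reader",
    "category": "reader"
}
```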
Writer script demo
{
"type":"job",
"version":"2.0",// Version number.
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"apsaradb_for_OceanBase",// Plugin name.
"parameter":{
"datasource": "Data source name",
"column": [// Fields.
"id",
"name"
],
"table": "apsaradb_for_OceanBase_table",// Table name.
"preSql": [ // The SQL statement to execute before the data synchronization task runs.
"delete from @table where db_id = -1"
],
"postSql": [// The SQL statement to execute after the data synchronization task runs.
"update @table set db_modify_time = now() where db_id = 1"
],
"obWriteMode": "insert",
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// Number of error records.
},
"speed":{
"throttle":true,// Specifies whether to enable throttling. If set to true, throttling is enabled. If set to false, throttling is disabled and the mbps parameter does not take effect.
"concurrent":1, // Number of concurrent jobs.
"mbps":"12"// Throttling rate. 1 mbps = 1 MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Writer script parameters
Parameter | Description | Required | Default value |
datasource | The name of the ApsaraDB for OceanBase data source that you added. You can use this parameter if your DataWorks version supports adding ApsaraDB for OceanBase data sources. Alternatively, you can configure the connection by using the jdbcUrl and username parameters. | No | None |
jdbcUrl | The JDBC connection information of the destination database. Note: The jdbcUrl parameter must be included in the connection configuration unit. | Yes | None |
username | The username for the data source. | Yes | None |
password | The password for the specified username. | Yes | None |
table | The name of the table to which you want to write data. Use a JSON array to specify the table name. Note The table parameter must be included in the connection configuration unit. | Yes | None |
column | The fields in the destination table to which you want to write data. Separate the fields with commas (,). Example: "id", "name". Note: You must specify the column parameter. It cannot be empty. | Yes | None |
obWriteMode | The mode used to write data to the destination table. | No | insert |
onClauseColumns | The field or fields used to determine write conflicts. This parameter is used in Oracle tenant mode. Set this parameter to the primary key field or a unique constraint field. To specify multiple fields, separate them with commas (,). | No | None |
obUpdateColumns | The fields to update when a data write conflict occurs. To specify multiple fields, separate them with commas (,). | No | All fields |
preSql | The standard statement to execute before data is written to the destination table. If the SQL statement needs to operate on the destination table, use @table to reference it, as shown in the writer script demo. | No | None |
postSql | The standard statement to execute after data is written to the destination table. | No | None |
batchSize | The number of records to commit in each batch. A larger value reduces the number of network interactions between the data synchronization system and the server and improves overall throughput. Note: A large batchSize value (greater than 2048) may cause an out-of-memory (OOM) error during data synchronization. | No | 1,024 |
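To tie the conflict-handling parameters together, the following is a hedged sketch of a writer parameter fragment for an Oracle-mode tenant. The field names are placeholders, and onClauseColumns should be validated against the destination table's primary key or unique constraints.

```json
{
    "obWriteMode": "update", // Update conflicting rows instead of inserting duplicates
    "onClauseColumns": "id", // Placeholder: the primary key or unique constraint field
    "obUpdateColumns": "name,gmt_modified" // Placeholder: the fields to update on conflict
}
```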