The Oracle data source provides a bidirectional channel for reading data from and writing data to Oracle. You can configure data synchronization tasks using the codeless UI or the code editor. This topic describes the capabilities of DataWorks for Oracle data synchronization.
Supported versions
Offline read and write
Version | Offline read (Oracle Reader) | Offline write (Oracle Writer) |
Oracle 11.2 or 11gR2 | Supported | Supported |
Oracle 12.1 or 12cR1 | Supported | Supported |
Oracle 12.2 or 12cR2 | Supported (New version features are not supported) | Supported (New version features are not supported) |
Oracle 18.3 | Supported (New version features are not supported) | Supported (New version features are not supported) |
Oracle 19.x | Supported (New version features are not supported) | Supported (New version features are not supported) |
Oracle 21.1 | Supported (New version features are not supported) | Supported (New version features are not supported) |
The Oracle Writer plugin uses the ojdbc6-12.1.1.jar driver.
Real-time read
Supported: Oracle 11g R2 and later, 12c non-CDB, 18c non-CDB, 19c non-CDB, and 19c CDB databases.
Not supported: 12c CDB and 18c CDB databases.
A container database (CDB) is a feature introduced in Oracle 12c that hosts multiple pluggable databases (PDBs).
Limits
The character sets of the Oracle instance, NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET, must be AL32UTF8, AL16UTF16, ZHS16GBK, or UTF8.
A single Oracle instance supports real-time synchronization of up to 500 GB of incremental data per day.
When DataWorks reads data from a secondary Oracle database in real time, only physical standby databases are supported. DataWorks pulls only archived logs. This results in higher latency compared to synchronization from a primary database. The latency depends on the generation speed of the archived logs.
Real-time synchronization does not support views. Offline synchronization supports reading data from views.
Real-time reading of Oracle data in DataWorks is implemented using the Oracle LogMiner log analysis tool. You must enable supplemental logging and archived logs on the primary Oracle database.
Index-organized tables are not supported for synchronization.
Tables where all columns are of a Large Object (LOB) type, such as BLOB, CLOB, or NCLOB, are not supported.
Chinese characters in Oracle table names and column names are not supported.
Oracle table names cannot exceed 30 bytes.
Supported field types
Field type | Offline read (Oracle Reader) | Offline write (Oracle Writer) | Real-time read |
NUMBER | Supported | Supported | Supported |
BINARY FLOAT | Supported | Supported | Supported |
BINARY DOUBLE | Supported | Supported | Supported |
CHAR | Supported | Supported | Supported |
NCHAR | Supported | Supported | Supported |
VARCHAR2 | Supported | Supported | Supported |
NVARCHAR2 | Supported | Supported | Supported |
DATE | Supported | Supported | Supported |
TIMESTAMP | Supported | Supported | Supported |
TIMESTAMP WITH TIME ZONE | Supported | Supported | Not supported |
TIMESTAMP WITH LOCAL TIME ZONE | Supported | Supported | Not supported |
CLOB | Supported | Supported | Supported |
BLOB | Supported | Supported | Supported |
RAW | Supported | Supported | Supported |
ROWID | Not supported | Not supported | Supported |
UROWID | Not supported | Not supported | Supported |
FLOAT | Supported | Supported | Supported |
INTERVAL DAY TO SECOND | Not supported | Not supported | Supported |
INTERVAL YEAR TO MONTH | Not supported | Not supported | Supported |
BFILE | Not supported | Not supported | Not supported |
LONG | Not supported | Not supported | Not supported |
LONG RAW | Not supported | Not supported | Not supported |
NCLOB | Supported | Supported | Not supported |
STRUCT | Supported | Supported | Not supported |
User-Defined Types | Not supported | Not supported | Not supported |
AnyType | Not supported | Not supported | Not supported |
AnyData | Not supported | Not supported | Not supported |
AnyDataSet | Not supported | Not supported | Not supported |
XmlType | Not supported | Not supported | Not supported |
Spatial Types | Not supported | Not supported | Not supported |
Media Types | Not supported | Not supported | Not supported |
The following table lists the Oracle data type mappings for Oracle Reader.
Type classification | Oracle data type |
Integer types | NUMBER, ROWID, INTEGER, INT, and SMALLINT |
Floating-point types | NUMERIC, DECIMAL, FLOAT, DOUBLE PRECISION, and REAL |
String types | LONG, CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR2, CLOB, NCLOB, CHARACTER, CHARACTER VARYING, CHAR VARYING, NATIONAL CHARACTER, NATIONAL CHAR, NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, and NCHAR VARYING |
Date and time types | TIMESTAMP and DATE |
Boolean type | BIT and BOOL |
Binary types | BLOB, BFILE, RAW, and LONG RAW |
Prerequisites: Prepare the Oracle environment
Before you synchronize data in DataWorks, you must prepare the Oracle environment as described in this topic. This preparation ensures that the service runs as expected when you configure and run Oracle data synchronization tasks. The following sections describe the preparation steps.
Confirm the Oracle database version
The supported Oracle database versions vary depending on the synchronization scenario. Before you start, verify that your Oracle database version is supported.
Run either of the following statements to view the Oracle database version.
Statement 1:
select * from v$version;
Statement 2:
select version from v$instance;
If the database version is 12c or 18c, run the following statement to check whether the database is a CDB. Data Integration real-time sync tasks do not support the 12c CDB and 18c CDB versions of Oracle databases.
select name,cdb,open_mode,con_id from v$database;
Create an account and configure permissions
You need to create a database logon account that has the required permissions for Oracle.
Create an account. For more information, see Create an Oracle account.
Configure permissions.
Run the following statements to grant permissions to the account. Replace sync_account with the account that you created.
grant create session to sync_account; -- Grants the permission to log on to the database.
grant connect to sync_account; -- Grants the permission to connect to the database.
grant select on nls_database_parameters to sync_account; -- Grants the permission to query the nls_database_parameters system configuration.
grant select on all_users to sync_account; -- Grants the permission to query all users in the database.
grant select on all_objects to sync_account; -- Grants the permission to query all objects in the database.
grant select on DBA_MVIEWS to sync_account; -- Grants the permission to view materialized views in the database.
grant select on DBA_MVIEW_LOGS to sync_account; -- Grants the permission to view materialized view logs in the database.
grant select on DBA_CONSTRAINTS to sync_account; -- Grants the permission to view constraint information for all tables in the database.
grant select on DBA_CONS_COLUMNS to sync_account; -- Grants the permission to view information about all columns in specified constraints for all tables in the database.
grant select on all_tab_cols to sync_account; -- Grants the permission to view information about columns in tables, views, and clusters.
grant select on sys.obj$ to sync_account; -- Grants the permission to view objects in the database. The sys.obj$ table is the base object table in the Oracle data dictionary and stores all Oracle objects.
grant select on SYS.COL$ to sync_account; -- Grants the permission to view column definition information in database tables. SYS.COL$ stores column definitions.
grant select on sys.USER$ to sync_account; -- Grants the permission to view system tables. sys.USER$ is the base table that stores database users.
grant select on sys.cdef$ to sync_account; -- Grants the permission to view system tables.
grant select on sys.con$ to sync_account; -- Grants the permission to view constraint information. sys.con$ records Oracle constraint information.
grant select on all_indexes to sync_account; -- Grants the permission to view all indexes in the database.
grant select on v_$database to sync_account; -- Grants the permission to query the v$database view.
grant select on V_$ARCHIVE_DEST to sync_account; -- Grants the permission to query the V$ARCHIVE_DEST view.
grant select on v_$log to sync_account; -- Grants the permission to query the v$log view, which displays log file information from the control file.
grant select on v_$logfile to sync_account; -- Grants the permission to query the v$logfile view, which contains information about redo log files.
grant select on v_$archived_log to sync_account; -- Grants the permission to query the v$archived_log view, which contains information about archived logs.
grant select on V_$LOGMNR_CONTENTS to sync_account; -- Grants the permission to query the V$LOGMNR_CONTENTS view.
grant select on DUAL to sync_account; -- Grants the permission to query the DUAL table. DUAL is a virtual table used to conform to the SELECT syntax and contains only one record.
grant select on v_$parameter to sync_account; -- Grants the permission to query the v$parameter view, a dynamic dictionary view that stores database parameter settings.
grant select any transaction to sync_account; -- Grants the permission to view any transaction in the database.
grant execute on SYS.DBMS_LOGMNR to sync_account; -- Grants the permission to use the LogMiner tool, which helps you analyze transactions and recover lost data.
grant alter session to sync_account; -- Grants the permission to modify database session settings.
grant select on dba_objects to sync_account; -- Grants the permission to view all objects in the database.
grant select on v_$standby_log to sync_account; -- Grants the permission to query the v$standby_log view, which contains standby redo log information from the standby database.
grant select on v_$ARCHIVE_GAP to sync_account; -- Grants the permission to query for missing archived logs.
grant select on sys.ICOL$ to sync_account; -- Grants the permission to view index column information. sys.ICOL$ is the base table that stores index column definitions.
grant select on v_$thread to sync_account; -- Grants the permission to view instance thread information. The V$THREAD view stores the thread status of each instance in a RAC cluster and is used to determine instance status and log sources during incremental parsing.
To perform offline full data synchronization, run the following statement to grant the account the permission to query all tables.
grant select any table to sync_account;
For Oracle 12c and later, run the following statement to grant the account the log mining permission. For versions earlier than Oracle 12c, you do not need to run this statement because the log mining feature is built in.
grant LOGMINING TO sync_account;
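After you grant the permissions, you can optionally verify them by logging on to the database as the new account and querying the standard privilege views. The following check is a minimal sketch; sync_account is still a placeholder.
select * from session_privs; -- Lists the system privileges available to the current session.
select * from user_tab_privs; -- Lists the object privileges, such as SELECT on the system views, that the account has received.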
Enable archived logs and supplemental logging, and switch Redo log files
Enable archived logs by running the following SQL statements.
shutdown immediate;
startup mount;
alter database archivelog;
alter database open;
Enable supplemental logging.
You can enable supplemental logging as needed by running the following SQL statements.
-- To capture complete data changes during real-time synchronization, enable supplemental logging. You can enable it at the database level by running the following SQL statements:
alter database add supplemental log data(primary key) columns; -- Enables supplemental logging for primary key columns of the database.
alter database add supplemental log data(unique) columns; -- Enables supplemental logging for unique index columns of the database.
-- You can also enable it at the table level:
alter table schema.table add supplemental log data(primary key) columns; -- Enables supplemental logging for primary key columns of a specific table.
alter table schema.table add supplemental log data(unique) columns; -- Enables supplemental logging for unique index columns of a specific table.
Switch Redo log files.
After you enable supplemental logging, run the following command multiple times (five times is recommended) to switch the Redo log files.
alter system switch logfile;
Note: Running the command multiple times ensures that when the current log file is full, the system switches to the next one. This prevents the loss of operation records and facilitates data restoration.
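To confirm that the settings took effect, you can query v$database. The following statements are a minimal sketch of such a check: LOG_MODE should return ARCHIVELOG, and the supplemental logging columns should return YES for the levels that you enabled.
select log_mode from v$database; -- Expected value: ARCHIVELOG.
select supplemental_log_data_pk, supplemental_log_data_ui from v$database; -- Expected value: YES if database-level supplemental logging is enabled for primary key and unique index columns.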
Check the database character encoding
Run the following statement to check the character encoding formats of the database:
select * from v$nls_parameters where PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
v$nls_parameters stores the settings for database parameters.
NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET are the database character set and the national character set. They specify the storage class for the two main types of character data in Oracle.
Data Integration supports only the UTF8, AL32UTF8, AL16UTF16, and ZHS16GBK character encodings for data synchronization. If your database uses an unsupported character encoding, you must change the encoding before you can synchronize the data.
Check the data types of database tables
You can use a SELECT statement to query the data types in a database table. The following example shows how to view the data types in the 'tablename' table.
select COLUMN_NAME,DATA_TYPE from all_tab_columns where TABLE_NAME='tablename';
COLUMN_NAME: The name of the table column.
DATA_TYPE: The data type of the corresponding column.
all_tab_columns: A view that stores information about all columns in database tables.
TABLE_NAME: The name of the target table to query. When you run the statement, replace 'tablename' with the name of your table.
You can also run select * from tablename; to query all information about the target table and view the data types.
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data Source Management. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.
Develop a data synchronization task: Oracle synchronization process guide
For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.
Guide to configuring an offline sync task for a single table
For more information about the procedure, see Configure using the codeless UI and Configure using the code editor.
For a full list of parameters and a script demo for the code editor, see Appendix: Script demo and parameter description.
Guide to configuring a real-time sync task for a single table
For more information about the procedure, see Configure a real-time sync task in DataStudio.
Guide to configuring synchronization for an entire database, including offline, real-time full and incremental, and sharded databases
For more information about the procedure, see Configure a real-time sync task for an entire database.
FAQ
Real-time sync tasks for Oracle, PolarDB, or MySQL report repeated errors
Data restoration issues in active/standby synchronization
This issue occurs when you use an active/standby disaster recovery architecture for Oracle. If the primary database fails and the system switches to the secondary database, the secondary database continuously restores data from the primary database using archived logs. Due to time differences in data synchronization and factors such as network latency, the data on the secondary database may differ significantly from the data on the primary database. Therefore, the data synchronized from the secondary database is not a complete, up-to-date snapshot.
Consistency constraints
Oracle is a relational database management system (RDBMS) that provides strongly consistent data query interfaces. For example, during a sync task, if another data writer writes data to the database, Oracle Reader does not retrieve the new data because of the database snapshot feature.
The consistency described above applies to data synchronization in a single-threaded model of Oracle Reader. If Oracle Reader extracts data concurrently based on your configuration, strong consistency cannot be guaranteed.
When Oracle Reader splits data based on the splitPk parameter, it starts multiple concurrent tasks to synchronize the data. These tasks do not belong to the same read transaction and run at different times. Therefore, the synchronized data is not a complete and consistent data snapshot.
A technical solution for a consistent snapshot across multiple threads is not currently available. You can use an engineering approach to address this issue. The following solutions involve trade-offs. Choose one based on your requirements.
Use single-threaded synchronization without data sharding. This method is slower but ensures consistency. A minimal configuration sketch is shown after this list.
Disable other data writers to ensure the data is static. For example, you can lock tables or disable synchronization to the secondary database. This may affect online business.
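The following reader and speed fragments are a minimal sketch of the single-threaded approach described in the first solution. The data source and table names are the placeholders from the demo in this topic; the key points are that splitPk is left empty so that the data is not sharded and that concurrent is set to 1.
"parameter": {
    "datasource": "oracle_test", // Placeholder data source name.
    "table": "AUTOTEST.table01", // Placeholder table name.
    "column": ["*"],
    "splitPk": "" // Leave splitPk empty so that the data is read in a single shard.
},
"speed": {
    "concurrent": 1, // A single thread reads the data in one pass, which avoids cross-task inconsistency.
    "throttle": false
}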
Database encoding issues
Oracle Reader uses Java Database Connectivity (JDBC) to extract data. JDBC is compatible with various encodings and performs encoding conversion at the underlying layer. Therefore, you do not need to specify an encoding for Oracle Reader. It can automatically retrieve and transcode the encoding.
Methods for incremental data synchronization
Oracle Reader uses `SELECT` statements to extract data. You can use `SELECT…WHERE…` to perform incremental data extraction in the following ways:
When an online application writes to the database, it populates a `modify` field with a timestamp for changes, including additions, updates, and logical deletions. In this scenario, Oracle Reader can add a `WHERE` clause with the timestamp of the previous synchronization phase, as shown in the example after this list.
For new stream data, Oracle Reader can add a `WHERE` clause with the maximum auto-increment ID from the previous phase.
If your business logic does not use a field to distinguish between new and modified data, Oracle Reader cannot perform incremental data synchronization. It can only synchronize the full data.
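For example, if the table records modification times in a timestamp column (modify_time below is a hypothetical column name), a where condition along the following lines limits the read to rows changed since the previous synchronization. This is a sketch only; adjust the column name and the boundary value to your own schema and scheduling setup.
"where": "modify_time >= TO_DATE('2024-06-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')" // Reads only rows modified at or after the last successful sync.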
SQL security
Oracle Reader provides the querySql feature, which lets you customize the `SELECT` extraction statement. Oracle Reader does not perform any security checks on the querySql statement.
Appendix: Script demo and parameter description
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Reader script demo
{
"type": "job",
"version": "2.0",
"steps": [
{
"stepType": "oracle",
"parameter": {
"selectedDatabase": "AUTOTEST",
"indexes": [],
"datasource": "oracle_test",
"envType": 0,
"useSpecialSecret": true,
"column": [
"id"
],
"where": "",
"splitPk": "id",
"encoding": "UTF-8",
"table": "AUTOTEST.table01"
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "odps",
"parameter": {
},
"name": "Writer",
"category": "writer"
},
{
"name": "Processor",
"stepType": null,
"category": "processor",
"copies": 1,
"parameter": {
"nodes": [],
"edges": [],
"groups": [],
"version": "2.0"
}
}
],
"setting": {
"executeMode": null,
"errorLimit": {
"record": ""
},
"speed": {
"concurrent": 2,
"throttle": false
}
},
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
}
}
Reader script parameters
Parameter | Description | Required | Default value |
datasource | The name of the data source. The code editor supports adding data sources. The value of this parameter must be the same as the name of the added data source. | Yes | None |
selectedDatabase | The schema of the database to synchronize. | Yes | None |
table | The name of the table to synchronize. The format must be schema.tableName, and the schema must match the value of selectedDatabase. Note: For example, if selectedDatabase is AUTOTEST, set table to AUTOTEST.table01. | Yes | None |
column | The names of the columns to synchronize from the specified table. Use a JSON array to describe the fields. By default, all columns are used, for example, ["*"]. | Yes | None |
splitFactor | The splitting factor. This specifies the number of chunks to split the data into for synchronization. If you configure multiple concurrent threads, the data is split into a number of chunks equal to concurrency × splitFactor. For example, if concurrency is 5 and splitFactor is 5, the data is split into 5 × 5 = 25 chunks and processed by 5 concurrent threads. Note The recommended value range is 1 to 100. A value that is too large may cause an out-of-memory (OOM) error. | No | 5 |
splitMode | The splitting mode. Note: The splitMode parameter must be used together with the splitPk parameter. | No | randomSampling |
splitPk | If you specify the splitPk parameter when Oracle Reader extracts data, the field specified by splitPk is used to shard the data. Data synchronization then starts concurrent tasks, which improves efficiency. Note: The splitPk field cannot be ROWID for a view. | No | None |
where | The filter condition. Oracle Reader constructs a SQL statement based on the specified column, table, and where conditions, and then extracts data based on that SQL statement. For example, you can specify row_number() as the `where` condition during testing. | No | None |
querySql (Advanced mode. Not supported in the codeless UI.) | In some business scenarios, the `where` parameter is not sufficient to describe the filter conditions. You can use this parameter to customize the filter SQL. When you configure this parameter, the data synchronization system ignores the table and column configurations and uses the content of this parameter to filter data. For example, you can use querySql to synchronize the result of a multi-table join (see the sketch after this table). | No | None |
fetchSize | This parameter defines the number of data records to fetch in each batch from the database server. This value determines the number of network interactions between the data synchronization system and the server, which can significantly improve data extraction performance. Note A `fetchSize` value that is too large (>2048) may cause an OOM error in the data synchronization process. | No | 1,024 |
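As an illustration of querySql, the reader parameter could look like the following fragment. The table and column names are hypothetical, and this is a sketch rather than a complete configuration; when querySql is set, the table and column configurations are ignored.
"parameter": {
    "datasource": "oracle_test", // Placeholder data source name.
    "querySql": [
        "select o.id, o.amount, u.name from orders o join users u on o.user_id = u.id" // Hypothetical multi-table join.
    ]
}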
Writer script demo
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"oracle",// The plugin name.
"parameter":{
"postSql":[],// The SQL statement to execute after the data synchronization task.
"datasource":"",
"session":[],// The database connection session parameters.
"column":[// The fields.
"id",
"name"
],
"encoding":"UTF-8",// The encoding format.
"batchSize":1024,// The number of records in a batch.
"table":"",// The table name.
"preSql":[]// The SQL statement to execute before the data synchronization task.
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The number of error records.
},
"speed":{
"throttle":true,// If throttle is false, the mbps parameter is ignored and throttling is disabled. If throttle is true, throttling is enabled.
"concurrent":1, // The number of concurrent jobs.
"mbps":"12"// The throttling rate. 1 mbps = 1 MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Writer script parameters
Parameter | Description | Required | Default value |
datasource | The name of the data source. The code editor supports adding data sources. The value of this parameter must be the same as the name of the added data source. | Yes | None |
table | The name of the destination table. If the table schema is different from the configured username, specify the table in the schema.table format. | Yes | None |
writeMode | The import mode. Only insert into is supported. If a primary key or unique index conflict occurs, the conflicting rows fail to be written and are recorded as dirty data. | No | insert into |
column | The fields in the destination table to which you want to write data. Separate the fields with commas. Example: "column": ["id","name"]. | Yes | None |
preSql | The SQL statement to execute before the data synchronization task starts. The codeless UI supports only one SQL statement. The code editor supports multiple SQL statements, for example, to clear old data (see the sketch after this table). | No | None |
postSql | The SQL statement to execute after the data synchronization task is complete. The codeless UI supports only one SQL statement. The code editor supports multiple SQL statements, for example, to add a timestamp. | No | None |
batchSize | The number of records to submit in a batch. A larger value can significantly reduce network interactions between the data synchronization system and Oracle, improving overall throughput. If this value is too large, it may cause an OOM error in the data synchronization process. | No | 1,024 |
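The following writer fragment is a minimal sketch of how preSql and postSql might be used. The data source and table names are the placeholders from the demos in this topic, and clearing old data before the write is only one possible use.
"parameter": {
    "datasource": "oracle_test", // Placeholder data source name.
    "table": "AUTOTEST.table01",
    "column": ["id", "name"],
    "preSql": ["delete from AUTOTEST.table01"], // Clears old data before the synchronization starts.
    "postSql": [] // Optionally, add statements to run after the synchronization completes.
}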