DataWorks provides Oracle Reader and Oracle Writer to read from and write to Oracle data sources. You can use the codeless user interface (UI) or code editor to configure data synchronization tasks for Oracle data sources. This topic describes the supported data synchronization capabilities for Oracle data sources.
Supported versions
Batch data read and write
Version | Batch data read (Oracle Reader) | Batch data write (Oracle Writer) |
Oracle 11.2 or 11gR2 | Supported | Supported |
Oracle 12.1 or 12cR1 | Supported | Supported |
Oracle 12.2 or 12cR2 | Supported (without new features) | Supported (without new features) |
Oracle 18.3 | Supported (without new features) | Supported (without new features) |
Oracle 19.x | Supported (without new features) | Supported (without new features) |
Oracle 21.1 | Supported (without new features) | Supported (without new features) |
The Oracle Writer plug-in uses the ojdbc6-12.1.1.jar driver.
Real-time data read
Supported versions:
11g R2 and later,12c non-CDB,18c non-CDB,19c non-CDB, or19c CDB.Unsupported versions:
12c CDBand18c CDB.To use an Oracle database as a source for a real-time full-database synchronization task, you must enable the archive log feature. For more information, see Prepare an Oracle environment.
A container database (CDB) is a new feature in Oracle 12c and later. A CDB can host multiple pluggable databases (PDBs).
Limitations
The
NLS_CHARACTERSETandNLS_NCHAR_CHARACTERSETof the Oracle instance must be one of the following:AL32UTF8,AL16UTF16,ZHS16GBK, orUTF8.For real-time synchronization, a single Oracle instance supports up to 500 GB of incremental data per day.
DataWorks supports real-time reads only from physical Oracle secondary databases. This method pulls archive logs, which causes higher latency than synchronizing from a primary database. The latency depends on the archive log generation speed.
While real-time synchronization does not support views, batch synchronization supports reading from them.
DataWorks uses the Oracle LogMiner utility for real-time data reads. To synchronize data from either a primary or secondary database, you must enable supplemental logging and archive logs on the primary database.
Index-organized tables are not supported.
Tables containing only LOB-type columns (
BLOB,CLOB, orNCLOB) are not supported.Oracle table names and column names that contain Chinese characters are not supported.
Oracle table names cannot exceed 30 bytes.
Supported data types
Type | Batch read (Oracle Reader) | Batch write (Oracle Writer) | Real-time read |
NUMBER | Supported | Supported | Supported |
BINARY FLOAT | Supported | Supported | Supported |
BINARY DOUBLE | Supported | Supported | Supported |
CHAR | Supported | Supported | Supported |
NCHAR | Supported | Supported | Supported |
VARCHAR2 | Supported | Supported | Supported |
NVARCHAR2 | Supported | Supported | Supported |
DATE | Supported | Supported | Supported |
TIMESTAMP | Supported | Supported | Supported |
TIMESTAMP WITH TIME ZONE | Supported | Supported | Not supported |
TIMESTAMP WITH LOCAL TIME ZONE | Supported | Supported | Not supported |
CLOB | Supported | Supported | Supported |
BLOB | Supported | Supported | Supported |
RAW | Supported | Supported | Supported |
ROWID | Not supported | Not supported | Supported |
UROWID | Not supported | Not supported | Supported |
FLOAT | Supported | Supported | Supported |
INTERVAL DAY TO SECOND | Not supported | Not supported | Supported |
INTERVAL YEAR TO MONTH | Not supported | Not supported | Supported |
BFILE | Not supported | Not supported | Not supported |
LONG | Not supported | Not supported | Not supported |
LONG RAW | Not supported | Not supported | Not supported |
NCLOB | Supported | Supported | Not supported |
STRUCT | Supported | Supported | Not supported |
user-defined types | Not supported | Not supported | Not supported |
AnyType | Not supported | Not supported | Not supported |
AnyData | Not supported | Not supported | Not supported |
AnyDataSet | Not supported | Not supported | Not supported |
XmlType | Not supported | Not supported | Not supported |
spatial types | Not supported | Not supported | Not supported |
media types | Not supported | Not supported | Not supported |
The following table categorizes the Oracle data types for Oracle Reader.
Category | Oracle data type |
integer | NUMBER, RAWID, INTEGER, INT, and SMALLINT |
floating point | NUMERIC, DECIMAL, FLOAT, DOUBLE PRECISION, and REAL |
string | LONG, CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR2, CLOB, NCLOB, CHARACTER, CHARACTER VARYING, CHAR VARYING, NATIONAL CHARACTER, NATIONAL CHAR, NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, and NCHAR VARYING |
date and time | TIMESTAMP and DATE |
boolean | BIT and BOOL |
binary | BLOB, BFILE, RAW, and LONG RAW |
Prepare an Oracle environment
To synchronize data from Oracle using DataWorks, you must first prepare your environment. This topic outlines the required preparations to ensure that data synchronization tasks run as expected.
Check the Oracle database version
The supported Oracle database versions vary by synchronization scenario. Before synchronizing data, verify that your Oracle database version is supported.
To check your Oracle database version, run either of the following statements.
Statement 1:
select * from v$version;Statement 2:
select version from v$instance;
If your Oracle database version is
12cor18c, you need to use the following statement to confirm whether the database is acdbdatabase. DataWorks Data Integration real-time synchronization tasks do not currently support Oraclecdbdatabases of version12cor18c.select name,cdb,open_mode,con_id from v$database;
Create an account and grant privileges
Create a database account with the required privileges in Oracle.
For details, see Create an Oracle account.
Grant privileges.
Run the following commands to grant the required privileges to the account. In the following statements, replace
'synchronization account'with the account you created.grant create session to 'synchronization account'; // Grants the privilege to log on to the database. grant connect to 'synchronization account'; // Grants the privilege to connect to the database. grant select on nls_database_parameters to 'synchronization account'; // Grants the privilege to query the nls_database_parameters system configuration. grant select on all_users to 'synchronization account'; // Grants the privilege to query all users in the database. grant select on all_objects to 'synchronization account'; // Grants the privilege to query all objects in the database. grant select on DBA_MVIEWS to 'synchronization account'; // Grants the privilege to view the materialized views of the database. grant select on DBA_MVIEW_LOGS to 'synchronization account'; // Grants the privilege to view the materialized view logs of the database. grant select on DBA_CONSTRAINTS to 'synchronization account'; // Grants the privilege to view constraint information for all tables in the database. grant select on DBA_CONS_COLUMNS to 'synchronization account'; // Grants the privilege to view information about columns in specified constraints for all tables in the database. grant select on all_tab_cols to 'synchronization account'; // Grants the privilege to view information about columns in tables, views, and clusters. grant select on sys.obj$ to 'synchronization account'; // Grants the privilege to view objects in the database. The sys.obj$ table is a base table in the Oracle data dictionary that stores all Oracle objects. grant select on SYS.COL$ to 'synchronization account'; // Grants the privilege to view column definitions in database tables. SYS.COL$ stores column definition information. grant select on sys.USER$ to 'synchronization account'; // Grants the privilege to view the sys.USER$ system table, which contains details about database users. grant select on sys.cdef$ to 'synchronization account'; // Grants the privilege to view a system table of the database. grant select on sys.con$ to 'synchronization account'; // Grants the privilege to view constraint information. sys.con$ records Oracle constraint information. grant select on all_indexes to 'synchronization account'; // Grants the privilege to view all indexes in the database. grant select on v_$database to 'synchronization account'; // Grants the privilege to view the v_$database view. grant select on V_$ARCHIVE_DEST to 'synchronization account'; // Grants the privilege to view the V_$ARCHIVE_DEST view. grant select on v_$log to 'synchronization account'; // Grants the privilege to view the v_$log view, which displays log file information from the control file. grant select on v_$logfile to 'synchronization account'; // Grants the privilege to view the v_$logfile view, which contains information about redo log files. grant select on v_$archived_log to 'synchronization account'; // Grants the privilege to view the v$archived_log view, which contains information about archived logs. grant select on V_$LOGMNR_CONTENTS to 'synchronization account'; // Grants the privilege to view the V_$LOGMNR_CONTENTS view. grant select on DUAL to 'synchronization account'; // Grants the privilege to view the DUAL table. DUAL is a virtual table used to conform to SELECT syntax rules. In Oracle, DUAL contains only one record. grant select on v_$parameter to 'synchronization account'; // Grants the privilege to view the v_$parameter view. v$parameter is a dynamic data dictionary table that stores database parameter settings. grant select any transaction to 'synchronization account'; // Grants the privilege to view any transaction in the database. grant execute on SYS.DBMS_LOGMNR to 'synchronization account'; // Grants the privilege to use the LogMiner tool, which helps you analyze transactions and recover lost data. grant alter session to 'synchronization account'; // Grants the privilege to modify the database session. grant select on dba_objects to 'synchronization account'; // Grants the privilege to view all objects in the database. grant select on v_$standby_log to 'synchronization account'; // Grants the privilege to view the v_$standby_log view, which contains information about archived logs of the standby database. grant select on v_$ARCHIVE_GAP to 'synchronization account'; // Grants the privilege to query for missing archived logs. grant select on sys.ICOL$ to 'synchronization account'; // Grants the privilege to view information about index-organized table columns. grant select on V_$THREAD to 'synchronization account'; // Grants the privilege to view instance thread information. The V_$THREAD view is a thread status table in Oracle dynamic performance views. It stores the thread status of each instance in a RAC cluster and is used during incremental parsing to determine the instance status and log source.To perform offline full synchronization, also run the following statement to grant the synchronization account the privilege to query all tables.
grant select any table to 'synchronization account';For Oracle 12c and later versions, run the following statement to grant log mining privileges. This command is not required for versions earlier than Oracle 12c, as log mining is a built-in feature.
grant LOGMINING TO 'synchronization account';
Enable logging and switch redo log files
To enable archive logging, run the following SQL statements:
shutdown immediate; startup mount; alter database archivelog; alter database open;Enable supplemental logging.
Enable supplemental logging at the database or table level to meet your business requirements:
// Enable database-level supplemental logging to capture all data changes for real-time synchronization. alter database add supplemental log data(primary key) columns; // Enables supplemental logging for the primary key columns of the database. alter database add supplemental log data(unique) columns; // Enables supplemental logging for the unique index columns of the database. // Alternatively, enable table-level supplemental logging: alter schema.table add supplemental log data(primary key) columns; // Enables supplemental logging for the primary key columns of a specified table. alter schema.table add supplemental log data(unique) columns; // Enables supplemental logging for the unique index columns of a specified table.Switch redo log files.
After enabling supplemental logging, run the following command multiple times (five times is recommended) to switch the redo log files.
alter system switch logfile;NoteRunning this command multiple times ensures that the new supplemental logging settings are applied to all redo log files, which is critical for complete data capture and recovery.
Check the database character encoding
Run the following command to check the character encoding of your database.
select * from v$nls_parameters where PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');The v$nls_parameters view stores database parameter settings.
NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET are the database character set and national character set, respectively. They define the encoding for character data in Oracle.
DataWorks data synchronization supports only the UTF8, AL32UTF8, AL16UTF16, and ZHS16GBK character encodings. If your database uses an unsupported character encoding, change it before performing data synchronization.
Check table data types
Use a SELECT statement to query the data types of a table. The following example shows how to query the data types for a table named 'tablename'.
select COLUMN_NAME,DATA_TYPE from all_tab_columns where TABLE_NAME='tablename'; COLUMN_NAME: the name of a column in the table.
DATA_TYPE: the data type of the corresponding column.
all_tab_columns: a view that stores information about all columns in the tables of the database.
TABLE_NAME: the name of the table to query. When you run this statement, replace 'tablename' with the name of your table.
You can also execute select * from 'tablename'; to query all information from the target table and obtain the data types.
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.
Oracle data synchronization
For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.
Single-table batch synchronization
See Codeless UI Configuration and Script Mode Configuration.
For a full list of parameters and a sample script, see Appendix: Sample Script and Parameters.
Single-table real-time synchronization
See Configure a real-time synchronization task in DataStudio.
Full-database synchronization
FAQ
Data recovery issues during primary-secondary synchronization
In a disaster recovery setup, a secondary database continuously restores data from the primary database by using binary logs. Due to the time lag inherent in synchronization, especially with network latency, the data on the secondary database can differ significantly from the primary database. This means the synchronized data is not a complete, real-time snapshot.
Consistency constraints
Oracle is a relational database management system (RDBMS) that provides strong consistency for queries. For example, when a synchronization task runs, Oracle Reader reads from a database snapshot. Due to this snapshot isolation, Oracle Reader does not retrieve new data written to the database after the task has started.
A single-threaded model maintains this consistency. If you configure Oracle Reader to extract data in parallel, strong consistency cannot be guaranteed.
When Oracle Reader performs sharding based on the splitPk parameter, it launches multiple parallel tasks to complete data synchronization. These tasks are not part of the same read transaction and are separated by time intervals. As a result, the synchronized data is not a complete, consistent snapshot.
Achieving a consistent snapshot across multiple threads is not technically feasible. The following engineering solutions offer trade-offs. Choose the one that best fits your needs.
Use single-threaded synchronization without sharding. This approach ensures consistency but is slower.
Ensure the data is static by preventing other writers from accessing it. For example, you can use a table lock or pause synchronization to the secondary database. However, this may impact your online services.
Database encoding
Oracle Reader uses Java Database Connectivity (JDBC) to extract data. JDBC handles various character encodings and performs conversions automatically. Therefore, you do not need to specify an encoding.
How incremental data synchronization works
Oracle Reader uses JDBC SELECT statements to extract data. You can perform incremental data synchronization using a
SELECT…WHERE…clause in one of the following ways:For online applications that populate a timestamp column for new, updated, or logical deletes, you can configure the WHERE clause to filter records based on the timestamp of the last synchronization.
For append-only data, you can use the WHERE clause to filter records with an auto-increment ID greater than the maximum ID from the previous synchronization.
If your business logic does not provide a column to distinguish new or modified data, Oracle Reader cannot perform incremental data synchronization and must synchronize the full data instead.
SQL security
The Oracle Reader querySql feature lets you define a custom SELECT statement for data extraction. Oracle Reader does not perform any security validation on the statement you provide in querySql.
Appendix: Script examples and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Script mode configuration. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Reader script example
{
"type": "job",
"version": "2.0",
"steps": [
{
"stepType": "oracle",
"parameter": {
"selectedDatabase": "AUTOTEST",
"indexes": [],
"datasource": "oracle_test",
"envType": 0,
"useSpecialSecret": true,
"column": [
"id"
],
"where": "",
"splitPk": "id",
"encoding": "UTF-8",
"table": "AUTOTEST.table01"
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "odps",
"parameter": {
},
"name": "Writer",
"category": "writer"
},
{
"name": "Processor",
"stepType": null,
"category": "processor",
"copies": 1,
"parameter": {
"nodes": [],
"edges": [],
"groups": [],
"version": "2.0"
}
}
],
"setting": {
"executeMode": null,
"errorLimit": {
"record": ""
},
"speed": {
"concurrent": 2,
"throttle": false
}
},
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
}
}Reader parameters
Parameter | Description | Required | Default |
datasource | The name of the data source. This name must match the name of the data source that you added in the code editor. | Yes | None |
selectedDatabase | The schema of the database from which you want to read data. | Yes | None |
table | The name of the table from which you want to read data. The table name must be in the Note For example, if | Yes | None |
column | The columns to read. To read all columns, specify
| Yes | None |
splitFactor | The sharding factor, which specifies the number of shards for data synchronization. If you configure parallel processing, the number of shards is calculated by using the following formula: Number of parallel threads × sharding factor. For example, if the number of parallel threads is 5 and the sharding factor is 5, the data is divided into 25 shards and processed by 5 parallel threads. Note A value from 1 to 100 is recommended. A value greater than 100 may cause an out-of-memory (OOM) error. | No | 5 |
splitMode | The shard mode. Valid values:
Note The
| No | randomSampling |
splitPk | When Oracle Reader extracts data, if you specify splitPk, it means that you want to use the field represented by splitPk for data sharding. As a result, data synchronization starts parallel tasks, which can improve its efficiency.
Note If you are reading data from a view, you cannot set the | No | None |
where | The WHERE clause used to filter data. Reader constructs an SQL query based on the
| No | None |
querySql (available only in the code editor) | In some scenarios, the where configuration is insufficient to describe the filter conditions. You can use this parameter to define a custom SQL query. When you configure this parameter, the data synchronization system ignores configurations such as table and column and directly uses the content of this parameter to filter the data. For example, to synchronize data from a multi-table join, use a query such as | No | None |
fetchSize | The number of records to fetch from the database server in each batch. This parameter determines the number of network interactions between Data Integration and the database server and can significantly improve data extraction performance. Note If you set this parameter to a value greater than 2,048, an out-of-memory (OOM) error may occur during data synchronization. | No | 1,024 |
Writer script example
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"oracle",// The plugin name.
"parameter":{
"postSql":[],// The SQL statement to execute after the task completes.
"datasource":"",
"session":[],// The parameters for the database connection session.
"column":[// The destination columns.
"id",
"name"
],
"encoding":"UTF-8",// The encoding format.
"batchSize":1024,// The number of records to submit in a batch.
"table":"",// The destination table name.
"preSql":[]// The SQL statement to execute before the task starts.
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The number of error records.
},
"speed":{
"throttle":true,// Enables or disables throttling. If false, the mbps parameter is ignored.
"concurrent":1, // The maximum number of parallel threads.
"mbps":"12"// The maximum rate for throttling. Unit: MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}Writer parameters
Parameter | Description | Required | Default |
datasource | The name of the data source. This name must match the name of the data source that you added in the code editor. | Yes | None |
table | The name of the destination table. If the schema of the table is different from the username specified in the data source configuration, you must specify the table name in the | Yes | None |
writeMode | The write mode. Only | No | insert into |
column | The destination columns to which you want to write data. Specify the columns as a JSON array. For example: | Yes | None |
preSql | The SQL statement to execute before the data synchronization task starts. You can execute only one SQL statement in the codeless UI but multiple SQL statements in the code editor. For example, you can specify an SQL statement to clear existing data. | No | None |
postSql | The SQL statement to execute after the data synchronization task is complete. You can execute only one SQL statement in the codeless UI but multiple SQL statements in the code editor. For example, you can specify an SQL statement to add a timestamp. | No | None |
batchSize | The number of records to write in each batch. This parameter can significantly reduce the number of network interactions between Data Integration and Oracle and improve the overall throughput. An excessively large value may cause an OOM error. | No | 1,024 |