DataWorks provides Oracle Reader and Oracle Writer for you to read data from and write data to Oracle data sources. You can use the codeless user interface (UI) or code editor to configure data synchronization tasks for Oracle data sources. This topic describes the capabilities of synchronizing data from or to Oracle data sources.
Supported Oracle versions
Batch data read and write
Version | Oracle Reader for batch data read | Oracle Writer for batch data write |
Oracle 11.2 or 11gR2 | Supported | Supported |
Oracle 12.1 or 12cR1 | Supported | Supported |
Oracle 12.2 or 12cR2 | Supported (features introduced in later Oracle versions are not supported) | Supported (features introduced in later Oracle versions are not supported) |
Oracle 18.3 | Supported (features introduced in later Oracle versions are not supported) | Supported (features introduced in later Oracle versions are not supported) |
Oracle 19.x | Supported (features introduced in later Oracle versions are not supported) | Supported (features introduced in later Oracle versions are not supported) |
Oracle 21.1 | Supported (features introduced in later Oracle versions are not supported) | Supported (features introduced in later Oracle versions are not supported) |
Oracle Writer uses the ojdbc6-12.1.1.jar driver.
Real-time data read
Supported Oracle database versions:
Versions later than 11gR2
12c (non-CDB)
18c (non-CDB)
19c (non-CDB)
19c (CDB)
Unsupported Oracle database versions:
12c (CDB)
18c (CDB)
An Oracle database of Oracle 12c or a later version can serve as a container database (CDB) to host multiple pluggable databases (PDBs).
Limits
The following encoding formats are supported for data synchronization: UTF-8, AL32UTF8, AL16UTF16, and ZHS16GBK.
A maximum of 500 GB incremental data can be synchronized from an Oracle instance in real time every day.
Real-time synchronization of data from an Oracle data source in Data Integration is implemented based on the Oracle LogMiner utility that is used to analyze log files.
Data of views can be read during batch synchronization.
Data types
Data type | Oracle Reader for batch data read | Oracle Writer for batch data write | Real-time data read |
NUMBER | Supported | Supported | Supported |
BINARY FLOAT | Supported | Supported | Supported |
BINARY DOUBLE | Supported | Supported | Supported |
CHAR | Supported | Supported | Supported |
NCHAR | Supported | Supported | Supported |
VARCHAR2 | Supported | Supported | Supported |
NVARCHAR2 | Supported | Supported | Supported |
DATE | Supported | Supported | Supported |
TIMESTAMP | Supported | Supported | Supported |
TIMESTAMP WITH TIME ZONE | Supported | Supported | Not supported |
TIMESTAMP WITH LOCAL TIME ZONE | Supported | Supported | Not supported |
CLOB | Supported | Supported | Supported |
BLOB | Supported | Supported | Supported |
RAW | Supported | Supported | Supported |
ROWID | Not supported | Not supported | Supported |
UROWID | Not supported | Not supported | Supported |
FLOAT | Supported | Supported | Supported |
INTERVAL DAY TO SECOND | Not supported | Not supported | Supported |
INTERVAL YEAR TO MONTH | Not supported | Not supported | Supported |
BFILE | Not supported | Not supported | Not supported |
LONG | Not supported | Not supported | Not supported |
LONG RAW | Not supported | Not supported | Not supported |
NCLOB | Supported | Supported | Not supported |
STRUCT | Supported | Supported | Not supported |
User-Defined Types | Not supported | Not supported | Not supported |
AnyType | Not supported | Not supported | Not supported |
AnyData | Not supported | Not supported | Not supported |
AnyDataSet | Not supported | Not supported | Not supported |
XmlType | Not supported | Not supported | Not supported |
Spatial Types | Not supported | Not supported | Not supported |
Media Types | Not supported | Not supported | Not supported |
The following table lists the data type mappings based on which Oracle Reader converts data types.
Category | Oracle data type |
Integer | NUMBER, ROWID, INTEGER, INT, and SMALLINT |
Floating point | NUMERIC, DECIMAL, FLOAT, DOUBLE PRECISION, and REAL |
String | LONG, CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR2, CLOB, NCLOB, CHARACTER, CHARACTER VARYING, CHAR VARYING, NATIONAL CHARACTER, NATIONAL CHAR, NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, and NCHAR VARYING |
Date and time | TIMESTAMP and DATE |
Boolean | BIT and BOOLEAN |
Binary | BLOB, BFILE, RAW, and LONG RAW |
Prepare an Oracle environment before data synchronization
Before you use DataWorks to synchronize data from or to an Oracle data source, you must prepare an Oracle environment. This ensures that a data synchronization task can be configured and can synchronize data from or to the Oracle data source as expected. The following information describes how to prepare an Oracle environment for data synchronization from or to an Oracle data source.
Preparation 1: Check the version of your Oracle database
The supported Oracle database versions vary based on synchronization scenarios. Check whether the version of your Oracle database is supported before you synchronize data from or to the database.
You can execute one of the following statements to check the version of your Oracle database:
Statement 1:
select * from v$version;
Statement 2:
select version from v$instance;
If the version of the Oracle database is 12c, 18c, or 19c, you must also execute the following statement to check whether the database serves as a container database (CDB). 12c and 18c databases that serve as CDBs are not supported by real-time synchronization tasks of Data Integration in DataWorks.
select name,cdb,open_mode,con_id from v$database;
Preparation 2: Prepare an account that has the required permissions
You must prepare an account that is used to log on to the Oracle database. The account must have the required permissions on the Oracle database.
Create an account. For more information, see Create an account to access an Oracle database.
Grant the required permissions to the account.
You can execute the following statements to grant the required permissions to the account. Replace 'Account for data synchronization' with the created account when you execute a specific statement.
grant create session to 'Account for data synchronization'; -- Authorize the synchronization account to access the database.
grant connect to 'Account for data synchronization'; -- Authorize the synchronization account to connect to the database.
grant select on nls_database_parameters to 'Account for data synchronization'; -- Authorize the synchronization account to query the settings of nls_database_parameters.
grant select on all_users to 'Account for data synchronization'; -- Authorize the synchronization account to query all users in the database.
grant select on all_objects to 'Account for data synchronization'; -- Authorize the synchronization account to query all objects in the database.
grant select on DBA_MVIEWS to 'Account for data synchronization'; -- Authorize the synchronization account to view the materialized views of the database.
grant select on DBA_MVIEW_LOGS to 'Account for data synchronization'; -- Authorize the synchronization account to view the materialized view logs of the database.
grant select on DBA_CONSTRAINTS to 'Account for data synchronization'; -- Authorize the synchronization account to view the constraints on all tables in the database.
grant select on DBA_CONS_COLUMNS to 'Account for data synchronization'; -- Authorize the synchronization account to view information about the columns that meet the specified constraints on all tables in the database.
grant select on all_tab_cols to 'Account for data synchronization'; -- Authorize the synchronization account to view information about columns in tables, views, and clusters of the database.
grant select on sys.obj$ to 'Account for data synchronization'; -- Authorize the synchronization account to view objects in the database. sys.obj$ is a data dictionary table that contains all objects.
grant select on SYS.COL$ to 'Account for data synchronization'; -- Authorize the synchronization account to view definitions of columns in tables of the database. SYS.COL$ stores the column definitions.
grant select on sys.USER$ to 'Account for data synchronization'; -- Authorize the synchronization account to view the system user table of the database.
grant select on sys.cdef$ to 'Account for data synchronization'; -- Authorize the synchronization account to view the system table of the database.
grant select on sys.con$ to 'Account for data synchronization'; -- Authorize the synchronization account to view the constraints of the database. sys.con$ records the constraints.
grant select on all_indexes to 'Account for data synchronization'; -- Authorize the synchronization account to view all indexes of the database.
grant select on v_$database to 'Account for data synchronization'; -- Authorize the synchronization account to query the v_$database view of the database.
grant select on V_$ARCHIVE_DEST to 'Account for data synchronization'; -- Authorize the synchronization account to query the V_$ARCHIVE_DEST view of the database.
grant select on v_$log to 'Account for data synchronization'; -- Authorize the synchronization account to query the v_$log view of the database. v_$log displays log information from control files.
grant select on v_$logfile to 'Account for data synchronization'; -- Authorize the synchronization account to query the v_$logfile view of the database. v_$logfile contains information about redo log files.
grant select on v_$archived_log to 'Account for data synchronization'; -- Authorize the synchronization account to query the v_$archived_log view of the database. v_$archived_log contains information about archived logs.
grant select on V_$LOGMNR_CONTENTS to 'Account for data synchronization'; -- Authorize the synchronization account to query the V_$LOGMNR_CONTENTS view of the database.
grant select on DUAL to 'Account for data synchronization'; -- Authorize the synchronization account to query the DUAL table of the database. DUAL is a virtual table that contains only one record.
grant select on v_$parameter to 'Account for data synchronization'; -- Authorize the synchronization account to query the v_$parameter view of the database. v$parameter is a dynamic dictionary view that stores the values of parameters in the database.
grant select any transaction to 'Account for data synchronization'; -- Authorize the synchronization account to view transactions of the database.
grant execute on SYS.DBMS_LOGMNR to 'Account for data synchronization'; -- Authorize the synchronization account to use the LogMiner utility, which helps you analyze transactions and retrieve lost data.
grant alter session to 'Account for data synchronization'; -- Authorize the synchronization account to modify connection settings of the database.
grant select on dba_objects to 'Account for data synchronization'; -- Authorize the synchronization account to view all objects in the database.
grant select on v_$standby_log to 'Account for data synchronization'; -- Authorize the synchronization account to query the v_$standby_log view of the database. v_$standby_log contains archived logs of the standby database.
grant select on v_$ARCHIVE_GAP to 'Account for data synchronization'; -- Authorize the synchronization account to query missing archived logs.
To implement batch synchronization of full data, you must also execute the following statement to grant the query permissions on all tables to the synchronization account:
grant select any table to 'Account for data synchronization';
For an Oracle database of Oracle 12c or a later version, you must execute the following statement to grant the log mining permissions to the synchronization account. The log mining feature is a built-in feature for Oracle databases of versions earlier than 12c. You do not need to execute the following statement in such databases.
grant LOGMINING TO 'Account for data synchronization';
Preparation 3: Enable archive logging and supplemental logging, and switch to another redo log file
Execute the following SQL statements to enable archive logging:
shutdown immediate;
startup mount;
alter database archivelog;
alter database open;
Enable supplemental logging.
SQL statements:
alter database add supplemental log data(primary key) columns; -- Enable supplemental logging for primary key columns.
alter database add supplemental log data(unique) columns; -- Enable supplemental logging for unique index columns.
Switch to another redo log file.
After you enable supplemental logging, you must execute the following statement multiple times to switch to another redo log file. We recommend that you execute the statement five times:
alter system switch logfile;
Note: This ensures that data can be written to the next log file after the current log file is full, so records of performed operations are not lost. This facilitates data recovery.
Preparation 4: Check character encoding formats of your Oracle database
Execute the following statement to check character encoding formats of the database:
select * from v$nls_parameters where PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
v$nls_parameters stores values of parameters in the database.
NLS_CHARACTERSET indicates a database character set. NLS_NCHAR_CHARACTERSET indicates a national character set. These two sets are used to store data of the respective character type.
The following encoding formats are supported for data synchronization: UTF-8, AL32UTF8, AL16UTF16, and ZHS16GBK. If the database contains the character encoding formats that are not supported for data synchronization, change the formats before you synchronize data.
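The check above can be automated when many databases must be validated. The following is a minimal sketch, not DataWorks code: it assumes you have already fetched the two rows returned by the v$nls_parameters query into a dictionary, and the helper name is hypothetical.

```python
# Character sets supported for data synchronization, per the list above.
SUPPORTED_CHARSETS = {"UTF-8", "AL32UTF8", "AL16UTF16", "ZHS16GBK"}

def unsupported_charsets(nls_parameters):
    """Return the (parameter, value) pairs whose character set is not supported.

    nls_parameters maps PARAMETER names (e.g. NLS_CHARACTERSET) to their values,
    as fetched from the v$nls_parameters query shown above.
    """
    return [(param, value) for param, value in nls_parameters.items()
            if value not in SUPPORTED_CHARSETS]

# Illustrative values only; your database may return different character sets.
params = {"NLS_CHARACTERSET": "AL32UTF8", "NLS_NCHAR_CHARACTERSET": "AL16UTF16"}
print(unsupported_charsets(params))  # -> [] when everything is supported
```

If the returned list is not empty, change those character encoding formats before you synchronize data, as described above.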
Preparation 5: Check the data types of tables in your Oracle database
You can execute the SELECT statement to query the data types of tables in the database. Sample statement that is executed to query the data types of the 'tablename' table:
select COLUMN_NAME,DATA_TYPE from all_tab_columns where TABLE_NAME='tablename';
COLUMN_NAME: the name of the column.
DATA_TYPE: the data type of the column.
all_tab_columns: the view that stores information about all columns in tables of the database.
TABLE_NAME: the name of the table that you want to query. When you execute the preceding statement, replace 'tablename' with the name of the table that you want to query.
You can also execute the select * from tablename; statement to query information about the desired table and obtain the data types of its columns.
Develop a data synchronization task
For information about the entry point for and the procedure of configuring a data synchronization task, see the following sections. For information about the parameter settings, view the infotip of each parameter on the configuration tab of the task.
Add a data source
Before you configure a data synchronization task to synchronize data from or to a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.
Configure a batch synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.
Configure a real-time synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a real-time synchronization task in DataStudio.
Configure synchronization settings to implement batch synchronization of all data in a database, real-time synchronization of full data or incremental data in a database, and real-time synchronization of data from sharded tables in a sharded database
For more information about the configuration procedure, see Configure a synchronization task in Data Integration.
Additional information
Data synchronization between primary and secondary databases
A secondary Oracle database can be deployed for disaster recovery. The secondary database continuously synchronizes data from the primary database based on redo logs. Data latency between the primary and secondary databases cannot be prevented. This may result in data inconsistency.
Data consistency control
Oracle is a relational database management system (RDBMS) that supports strong consistency for data queries. A database snapshot is created before a synchronization task starts. Oracle Reader reads data from the database snapshot. Therefore, if new data is written to the database during data synchronization, Oracle Reader cannot obtain the new data.
Data consistency cannot be ensured if you enable Oracle Reader to use parallel threads to read data in a synchronization task.
Oracle Reader shards the source table based on the value of the splitPk parameter and uses parallel threads to read data. These parallel threads belong to different transactions and read data at different points in time. Therefore, the parallel threads observe different snapshots.
Data inconsistencies cannot be prevented if parallel threads are used for a synchronization task. The following workarounds can be used:
Enable Oracle Reader to use a single thread to read data in a synchronization task. To do so, do not specify a shard key (splitPk) for Oracle Reader. This way, data consistency is ensured, but data is synchronized less efficiently.
Make sure that no data is written to the source table during data synchronization. This ensures that the data in the source table remains unchanged during data synchronization. For example, you can lock the source table or disable data synchronization between primary and secondary databases. This way, data is efficiently synchronized, but your ongoing services may be interrupted.
Character encoding
Oracle Reader uses Java Database Connectivity (JDBC) to read data. This enables Oracle Reader to automatically convert the encoding format of characters. Therefore, you do not need to specify the encoding format.
Incremental data synchronization
Oracle Reader connects to a database by using JDBC and uses a SELECT statement with a WHERE clause to read incremental data.
For batch data, incremental insert, update, and delete operations (including logical delete operations) are distinguished by timestamps. Specify the WHERE clause based on a timestamp that is later than the latest timestamp involved in the previous synchronization.
For streaming data, specify the WHERE clause based on the ID of a specific record. The ID must be greater than the maximum ID involved in the previous synchronization.
If the data that is added or modified cannot be distinguished, Oracle Reader can read only full data.
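As a rough illustration of the timestamp-based approach described above, the WHERE clause can be derived from the watermark of the previous run. The column name gmt_modified and the helper below are hypothetical examples, not part of the Oracle Reader API:

```python
def build_incremental_where(ts_column, last_sync_ts):
    """Build a WHERE clause that selects only rows changed after the last sync.

    ts_column and last_sync_ts are caller-supplied; the timestamp format string
    must match how timestamps are stored in the source table.
    """
    return f"{ts_column} > TO_TIMESTAMP('{last_sync_ts}', 'YYYY-MM-DD HH24:MI:SS')"

# The resulting string would be used as the "where" parameter of Oracle Reader.
print(build_incremental_where("gmt_modified", "2024-01-01 00:00:00"))
```

For streaming data, the same idea applies with a monotonically increasing ID column instead of a timestamp.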
Syntax validation
Oracle Reader allows you to specify custom SELECT statements by using the querySql parameter but does not verify the syntax of these statements.
Appendix: Code and parameters
Appendix: Configure a batch synchronization task by using the code editor
If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader and writer of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader and writer in the code editor.
Code for Oracle Reader
{
"type": "job",
"version": "2.0",
"steps": [
{
"stepType": "oracle",
"parameter": {
"selectedDatabase": "AUTOTEST",
"indexes": [],
"datasource": "oracle_test",
"envType": 0,
"useSpecialSecret": true,
"column": [
"id"
],
"where": "",
"splitPk": "id",
"encoding": "UTF-8",
"table": "AUTOTEST.table01"
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "odps",
"parameter": {
},
"name": "Writer",
"category": "writer"
},
{
"name": "Processor",
"stepType": null,
"category": "processor",
"copies": 1,
"parameter": {
"nodes": [],
"edges": [],
"groups": [],
"version": "2.0"
}
}
],
"setting": {
"executeMode": null,
"errorLimit": {
"record": ""
},
"speed": {
"concurrent": 2,
"throttle": false
}
},
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
}
}
Parameters in code for Oracle Reader
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. | Yes | No default value |
selectedDatabase | The schema of the database from which you want to read data. | Yes | No default value |
table | The name of the table from which you want to read data. The name must be in the Schema name.Table name format. Note: For example, if the selectedDatabase parameter is set to AUTOTEST and the table name is table01, set this parameter to AUTOTEST.table01. | Yes | No default value |
column | The names of the columns from which you want to read data. Specify the names in a JSON array. The default value is ["*"], which indicates all the columns in the source table. | Yes | No default value |
splitFactor | The sharding factor, which determines the number of parts into which data to be synchronized is sharded. If you configure parallelism for your batch synchronization task, the number of parts is calculated based on the following formula: Number of parallel threads × Sharding factor. For example, if both the number of parallel threads and the sharding factor are 5, data is sharded into 25 parts. Note: We recommend that you specify a sharding factor between 1 and 100. A sharding factor greater than 100 may cause an out of memory (OOM) error. | No | 5 |
splitMode | The shard mode that Oracle Reader uses when it shards data based on the shard key. Note: The splitMode parameter must be used together with the splitPk parameter. | No | randomSampling |
splitPk | The field that is used for data sharding when Oracle Reader reads data. If you configure this parameter, the source table is sharded based on the value of this parameter, and Data Integration runs parallel threads to read data. This improves data synchronization efficiency. Note: If you use Oracle Reader to read data from a view, you cannot set the splitPk parameter to a field of the ROWID data type. | No | No default value |
where | The WHERE clause. Oracle Reader generates an SQL statement based on the settings of the table, column, and where parameters and uses the statement to read data. For example, you can set this parameter to row_number() in a test. | No | No default value |
querySql (available only in the code editor) | The SQL statement that is used for refined data filtering. If you configure this parameter, Data Integration filters data based on the value of this parameter. For example, if you want to join multiple tables for data synchronization, you can set this parameter to the required join statement. | No | No default value |
fetchSize | The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the database and affects read efficiency. Note If you set this parameter to a value greater than 2048, an OOM error may occur during data synchronization. | No | 1,024 |
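The relationship between parallelism and the splitFactor parameter described in the table above is simple arithmetic. This is a sketch for illustration only, not DataWorks code:

```python
def shard_count(concurrent_threads, split_factor=5):
    """Number of parts data is sharded into: parallel threads x sharding factor.

    The default split_factor of 5 matches the default value in the table above.
    """
    return concurrent_threads * split_factor

print(shard_count(5, 5))  # -> 25, matching the example in the table above
```

Keeping the product of threads and sharding factor moderate avoids the OOM risk noted for large sharding factors.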
Code for Oracle Writer
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"oracle",// The plug-in name.
"parameter":{
"postSql":[],// The SQL statement that you want to execute after the synchronization task is run.
"datasource":"",
"session":[],// The settings of the session to the database.
"column":[// The names of the columns.
"id",
"name"
],
"encoding":"UTF-8",// The encoding format.
"batchSize":1024,// The number of data records to write at a time.
"table":"",// The name of the table.
"preSql":[]// The SQL statement that you want to execute before the synchronization task is run.
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The maximum number of dirty data records allowed.
},
"speed":{
"throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
"concurrent":1, // The maximum number of parallel threads.
"mbps":"12"// The maximum transmission rate. Unit: MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Parameters in code for Oracle Writer
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. | Yes | No default value |
table | The name of the table to which you want to write data. If the table uses the default schema for the destination database, set this parameter to the table name. If the table uses a custom schema, configure this parameter in the Schema name.Table name format. | Yes | No default value |
writeMode | The write mode. Valid value: insert into. If a primary key conflict or unique index conflict occurs, data cannot be written to the conflicting rows, and the data that is not written to these rows is regarded as dirty data. | No | insert into |
column | The names of the columns to which you want to write data. Specify the names in a JSON array, such as ["id","name"]. | Yes | No default value |
preSql | The SQL statement that you want to execute before the synchronization task is run. For example, you can set this parameter to the SQL statement that is used to delete outdated data. You can execute only one SQL statement on the codeless UI and multiple SQL statements in the code editor. | No | No default value |
postSql | The SQL statement that you want to execute after the synchronization task is run. For example, you can set this parameter to the SQL statement that is used to add a timestamp. You can execute only one SQL statement on the codeless UI and multiple SQL statements in the code editor. | No | No default value |
batchSize | The number of data records to write at a time. Set this parameter to an appropriate value based on your business requirements. This greatly reduces the interactions between Data Integration and Oracle and increases throughput. If you set this parameter to an excessively large value, an OOM error may occur during data synchronization. | No | 1,024 |
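To illustrate the batchSize trade-off described above: Oracle Writer groups records so that one round trip to the database writes many rows at once. The sketch below shows only the grouping logic under that assumption, not the actual JDBC calls that Oracle Writer performs:

```python
def into_batches(records, batch_size=1024):
    """Group records into lists of at most batch_size records each.

    Each batch would be written to Oracle in a single round trip, which is why
    a larger batch_size reduces the number of interactions with the database.
    """
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]

batches = into_batches(list(range(2500)), batch_size=1024)
print([len(b) for b in batches])  # -> [1024, 1024, 452]
```

A larger batch also holds more rows in memory at once, which is why an excessively large batchSize can trigger the OOM error mentioned above.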