This topic describes the data types and parameters supported by Oracle Reader and how to configure it by using the codeless user interface (UI) and code editor.

Oracle Reader allows you to read data from Oracle. Oracle Reader connects to a remote Oracle database and runs a SELECT statement to select and read data from the database.
Note Currently, Relational Database Service (RDS) and Distributed Relational Database Service (DRDS) do not support the Oracle storage engine.
Specifically, Oracle Reader connects to a remote Oracle database through Java Database Connectivity (JDBC), generates a SELECT statement based on your configurations, and then sends the statement to the database. The Oracle database runs the statement and returns the result. Then, Oracle Reader assembles the returned data to abstract datasets in custom data types supported by Data Integration, and sends the datasets to a writer.
  • Oracle Reader generates the SELECT statement based on the table, column, and where parameters that you have configured, and sends the generated SELECT statement to the Oracle database.
  • If you specify the querySql parameter, Oracle Reader directly sends the value of this parameter to the Oracle database.
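
For example, the following minimal sketch shows how a configuration that uses the table, column, and where parameters maps to the generated SELECT statement. The table name, column names, and filter condition are hypothetical and for illustration only.
    "table":"orders", // A hypothetical table name.
    "column":["id", "name"], // Hypothetical column names.
    "where":"gmt_create > to_date('20230101','yyyymmdd')" // A hypothetical filter condition.
    // Oracle Reader would generate a statement similar to:
    // SELECT id, name FROM orders WHERE gmt_create > to_date('20230101','yyyymmdd')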

Data types

Oracle Reader supports most Oracle data types. Before you configure a sync node, make sure that the data types of the columns to be synchronized are supported.

The following table lists the data types supported by Oracle Reader.
Category Oracle data type
Integer NUMBER, ROWID, INTEGER, INT, and SMALLINT
Floating point NUMERIC, DECIMAL, FLOAT, DOUBLE PRECISION, and REAL
String LONG, CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR2, CLOB, NCLOB, CHARACTER, CHARACTER VARYING, CHAR VARYING, NATIONAL CHARACTER, NATIONAL CHAR, NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, and NCHAR VARYING
Date and time TIMESTAMP and DATE
Boolean BIT and BOOLEAN
Binary BLOB, BFILE, RAW, and LONG RAW

Parameters

Parameter Description Required Default value
datasource The connection name. It must be identical to the name of the added connection. You can add connections in the code editor. Yes None
table The name of the table to be synchronized. Yes None
column The columns to be synchronized from the source table. The columns are described in a JSON array. The default value is [ * ], which indicates all columns.
  • Column pruning is supported. You can select and export specific columns.
  • Change of the column order is supported, which means that you can export the columns in an order different from that specified in the schema of the table.
  • Constants are supported. The column names must be arranged in JSON format.
    ["id", "1", "'mingya.wmy'", "null", "to_char(a + 1)", "2.3" , "true"]
    • id: a column name.
    • 1: an integer constant.
    • 'mingya.wmy': a string constant, which is enclosed in single quotation marks (' ').
    • null: a null value.
    • to_char(a + 1): a function expression.
    • 2.3: a floating-point constant.
    • true: a Boolean value.
  • The column parameter must be specified.
Yes None
splitPk The field used for data sharding when Oracle Reader extracts data. If you specify the splitPk parameter, the table is sharded based on the shard key specified by this parameter. Data Integration then runs concurrent threads to synchronize data. This improves efficiency.
  • We recommend that you set the splitPk parameter to the primary key of the table. The primary key is usually evenly distributed, so the shards created based on it are less likely to contain data hot spots.
  • The data types supported by the splitPk parameter include INTEGER, STRING, FLOAT, and DATE.
  • If you do not specify the splitPk parameter or leave it empty, Oracle Reader synchronizes data through a single thread.
No None
where The WHERE clause. Oracle Reader generates a SELECT statement based on the table, column, and where parameters that you have configured, and uses the generated SELECT statement to select and read data. For example, set this parameter to row_number().
  • You can use the WHERE clause to synchronize incremental data.
  • If you do not specify the where parameter or leave it empty, all data is synchronized.
No None
querySql (only available in the code editor) The SELECT statement used for refined data filtering. If you specify this parameter, Data Integration directly filters data based on this parameter. For example, if you want to join multiple tables for data synchronization, set this parameter to select a,b from table_a join table_b on table_a.id = table_b.id. If you specify the querySql parameter, Oracle Reader ignores the table, column, and where parameters that you have configured. No None
fetchSize The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the database and affects reading efficiency.
Note A value greater than 2048 may lead to an out of memory (OOM) error during data synchronization.
No 1024
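
The following minimal sketch shows how the splitPk and fetchSize parameters might be combined in the reader configuration so that the table is sharded and read by concurrent threads. The connection, table, and column names are hypothetical.
    "parameter":{
        "datasource":"oracle_source", // A hypothetical connection name.
        "table":"orders", // A hypothetical table name.
        "column":["id", "name", "gmt_create"],
        "splitPk":"id", // Shard by the primary key so that concurrent threads read different ranges.
        "where":"", // Leave empty to synchronize all data.
        "fetchSize":1024 // Read 1,024 records per interaction. Values greater than 2048 may cause OOM.
    }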

Configure Oracle Reader by using the codeless UI

  1. Configure the connections.
    Configure the source and destination connections for the sync node.
    Parameter Description
    Connection The datasource parameter in the preceding parameter description. Select a connection type, and enter the name of a connection that has been configured in DataWorks.
    Table The table parameter in the preceding parameter description.
    Filter The filter condition for the data to be synchronized. Currently, filtering based on the limit keyword is not supported. The SQL syntax is determined by the selected connection.
    Shard Key The shard key. You can use a column in the source table as the shard key. We recommend that you use the primary key or an indexed column. Only integer fields are supported.
    If data sharding is performed based on the configured shard key, data can be read concurrently to improve data synchronization efficiency.
    Note The Shard Key parameter appears only when you configure the source connection for a sync node.
  2. Configure field mapping, that is, the column parameter in the preceding parameter description.
    Fields in the source table on the left have a one-to-one mapping with fields in the destination table on the right. You can click Add to add a field, or move the pointer over a field and click the Delete icon to delete the field.
    Parameter Description
    Map Fields with the Same Name Click Map Fields with the Same Name to establish a mapping between fields with the same name. Note that the data types of the fields must match.
    Map Fields in the Same Line Click Map Fields in the Same Line to establish a mapping between fields in the same row. Note that the data types of the fields must match.
    Delete All Mappings Click Delete All Mappings to remove mappings that have been established.
    Auto Layout Click Auto Layout. The fields are automatically sorted based on specified rules.
    Change Fields Click the Change Fields icon. In the Change Fields dialog box that appears, you can manually edit fields in the source table. Each field occupies a row. The first and the last blank rows are included, whereas other blank rows are ignored.
    Add
    • Click Add to add a field. You can enter constants. Each constant must be enclosed in single quotation marks (' '), such as 'abc' and '123'.
    • You can use scheduling parameters, such as ${bizdate}.
    • You can enter functions supported by relational databases, such as now() and count(1).
    • Fields that cannot be parsed are indicated by Unidentified.
  3. Configure channel control policies.
    Parameter Description
    Expected Maximum Concurrency The maximum number of concurrent threads that the sync node uses to read data from the source and write data to the destination. You can configure the concurrency for the node on the codeless UI.
    Bandwidth Throttling Specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and set a maximum transmission rate to avoid heavy read workload of the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to a proper value.
    Dirty Data Records Allowed The maximum number of dirty data records allowed.
    Resource Group The resource group used for running the sync node. If a large number of nodes including this sync node are deployed on the default resource group, the sync node may need to wait for resources. We recommend that you purchase an exclusive resource group for data integration or add a custom resource group. For more information, see DataWorks exclusive resources and Add a custom resource group.

Configure Oracle Reader by using the code editor

In the following code, a node is configured to read data from an Oracle database.
{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"oracle",
            "parameter":{
                "fetchSize":1024, // The number of data records to read at a time.
                "datasource":"", // The connection name.
                "column":[ // The columns to be synchronized.
                    "id",
                    "name"
                ],
                "where":"", // The WHERE clause.
                "splitPk":"", // The shard key.
                "table":"" // The name of the table to be synchronized.
            },
            "name":"Reader",
            "category":"reader"
        },
        {// The following template is used to configure Stream Writer. For more information, see the corresponding topic.
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":false,// Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
            "concurrent":1,// The maximum number of concurrent threads.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
} "to":"Writer"
            }
        ]
    }
}
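
If you need refined data filtering, such as a join across multiple tables, you can use the querySql parameter instead of the table, column, and where parameters in the reader step. The following minimal sketch reuses the join example from the parameter description; the table names and join condition are for illustration only.
{
    "stepType":"oracle",
    "parameter":{
        "datasource":"", // The connection name.
        "querySql":"select a,b from table_a join table_b on table_a.id = table_b.id" // When querySql is specified, the table, column, and where parameters are ignored.
    },
    "name":"Reader",
    "category":"reader"
}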

Additional instructions

  • Data synchronization between primary and secondary databases

    A secondary Oracle database can be deployed for disaster recovery. The secondary database continuously replicates data from the primary database based on redo logs. Data latency between the primary and secondary databases is unavoidable, especially when network conditions are unfavorable, and can lead to data inconsistency between the two databases.

  • Concurrency control

    Oracle is a relational database management system (RDBMS), which supports strong consistency for data queries. A database snapshot is created before a sync node starts. Oracle Reader reads data from the database snapshot. Therefore, if new data is written to the database during data synchronization, Oracle Reader cannot obtain the new data.

    Data consistency cannot be guaranteed when you enable Oracle Reader to run concurrent threads on a single sync node.

    Oracle Reader shards the table based on the splitPk parameter and runs multiple concurrent threads to synchronize data. These concurrent threads belong to different transactions. They read data at different time points. This means that the concurrent threads observe different snapshots.

    Theoretically, the data inconsistency issue is unavoidable if a single sync node includes multiple threads. However, two workarounds are available:
    • Do not enable concurrent threads on a single sync node. That is, do not specify the splitPk parameter. This guarantees data consistency but lowers synchronization efficiency.
    • Prevent writes to the source during data synchronization. For example, lock the table or disable data synchronization between the primary and secondary databases. This keeps synchronization efficient, but your ongoing services may be interrupted.
  • Character encoding

    Oracle Reader uses JDBC, which can automatically convert the encoding of characters. Therefore, you do not need to specify the encoding format.

  • Incremental data synchronization
    Oracle Reader connects to a database through JDBC and uses a SELECT statement with a WHERE clause to read incremental data in the following ways:
    • For batch data, incremental add, update, and delete operations (including logical delete operations) are distinguished by a timestamp. Specify the WHERE clause based on this timestamp: it must be later than the latest timestamp of the previous synchronization.
    • For streaming data, specify the WHERE clause based on the data record ID: the ID must be greater than the maximum ID involved in the previous synchronization.

    If incremental data cannot be distinguished, Oracle Reader cannot perform incremental synchronization but can perform full synchronization only.
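
    For example, for the timestamp-based approach, you can combine the where parameter with a scheduling parameter such as ${bizdate}. In the following sketch, gmt_create is a hypothetical timestamp column:
      "where":"gmt_create >= to_date('${bizdate}','yyyymmdd') and gmt_create < to_date('${bizdate}','yyyymmdd') + 1" // Reads only the records generated on the business date.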

  • Syntax validation

    Oracle Reader allows you to specify custom SELECT statements by using the querySql parameter but does not verify the syntax of the custom SELECT statements.