DataWorks: Oracle data source

Last Updated: Oct 17, 2025

The Oracle data source provides a bidirectional channel for reading data from and writing data to Oracle. You can configure data synchronization tasks using the codeless UI or the code editor. This topic describes the capabilities of DataWorks for Oracle data synchronization.

Supported versions

Offline read and write

Version | Offline read (Oracle Reader) | Offline write (Oracle Writer)
Oracle 11.2 (11gR2) | Supported | Supported
Oracle 12.1 (12cR1) | Supported | Supported
Oracle 12.2 (12cR2) | Supported (new version features are not supported) | Supported (new version features are not supported)
Oracle 18.3 | Supported (new version features are not supported) | Supported (new version features are not supported)
Oracle 19.x | Supported (new version features are not supported) | Supported (new version features are not supported)
Oracle 21.1 | Supported (new version features are not supported) | Supported (new version features are not supported)

Important

The Oracle Writer plugin uses the ojdbc6-12.1.1.jar driver.

Real-time read

  • Supported:

    11g R2 and later, 12c non-CDB, 18c non-CDB, 19c non-CDB, or 19c CDB databases.

  • Not supported:

    12c CDB and 18c CDB databases.

Note

A container database (CDB) is a feature introduced in Oracle 12c that hosts multiple pluggable databases (PDBs).

Limits

  • The character sets of the Oracle instance, NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET, must be AL32UTF8, AL16UTF16, ZHS16GBK, or UTF8.

  • A single Oracle instance supports real-time synchronization of up to 500 GB of incremental data per day.

  • When DataWorks reads data from a secondary Oracle database in real time, only physical standby databases are supported. DataWorks pulls only archived logs. This results in higher latency compared to synchronization from a primary database. The latency depends on the generation speed of the archived logs.

  • Real-time synchronization does not support views. Offline synchronization supports reading data from views.

  • Real-time reading of Oracle data in DataWorks is implemented using the Oracle LogMiner log analysis tool. You must enable supplemental logging and archived logs on the primary Oracle database.

  • Index-organized tables are not supported for synchronization.

  • Tables where all columns are of a Large Object (LOB) type, such as BLOB, CLOB, or NCLOB, are not supported.

  • Chinese characters in Oracle table names and column names are not supported.

  • Oracle table names cannot exceed 30 bytes.
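You can screen for objects that hit these limits before you configure a task. The following queries are a minimal sketch against the standard ALL_TABLES and ALL_TAB_COLUMNS dictionary views; the AUTOTEST owner is an example, so substitute your own schema.

-- Index-organized tables (not supported)
select table_name from all_tables where owner = 'AUTOTEST' and iot_type is not null;

-- Table names longer than 30 bytes (not supported)
select table_name from all_tables where owner = 'AUTOTEST' and lengthb(table_name) > 30;

-- Tables in which every column is a LOB type (not supported)
select table_name
from all_tab_columns
where owner = 'AUTOTEST'
group by table_name
having count(*) = sum(case when data_type in ('BLOB', 'CLOB', 'NCLOB') then 1 else 0 end);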

Supported field types

Field type | Offline read (Oracle Reader) | Offline write (Oracle Writer) | Real-time read
NUMBER | Supported | Supported | Supported
BINARY_FLOAT | Supported | Supported | Supported
BINARY_DOUBLE | Supported | Supported | Supported
CHAR | Supported | Supported | Supported
NCHAR | Supported | Supported | Supported
VARCHAR2 | Supported | Supported | Supported
NVARCHAR2 | Supported | Supported | Supported
DATE | Supported | Supported | Supported
TIMESTAMP | Supported | Supported | Supported
TIMESTAMP WITH TIME ZONE | Supported | Supported | Not supported
TIMESTAMP WITH LOCAL TIME ZONE | Supported | Supported | Not supported
CLOB | Supported | Supported | Supported
BLOB | Supported | Supported | Supported
RAW | Supported | Supported | Supported
ROWID | Not supported | Not supported | Supported
UROWID | Not supported | Not supported | Supported
FLOAT | Supported | Supported | Supported
INTERVAL DAY TO SECOND | Not supported | Not supported | Supported
INTERVAL YEAR TO MONTH | Not supported | Not supported | Supported
BFILE | Not supported | Not supported | Not supported
LONG | Not supported | Not supported | Not supported
LONG RAW | Not supported | Not supported | Not supported
NCLOB | Supported | Supported | Not supported
STRUCT | Supported | Supported | Not supported
User-defined types | Not supported | Not supported | Not supported
AnyType | Not supported | Not supported | Not supported
AnyData | Not supported | Not supported | Not supported
AnyDataSet | Not supported | Not supported | Not supported
XMLType | Not supported | Not supported | Not supported
Spatial types | Not supported | Not supported | Not supported
Media types | Not supported | Not supported | Not supported

The following table lists the Oracle data type mappings for Oracle Reader.

Type classification | Oracle data types
Integer types | NUMBER, ROWID, INTEGER, INT, and SMALLINT
Floating-point types | NUMERIC, DECIMAL, FLOAT, DOUBLE PRECISION, and REAL
String types | LONG, CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR2, CLOB, NCLOB, CHARACTER, CHARACTER VARYING, CHAR VARYING, NATIONAL CHARACTER, NATIONAL CHAR, NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, and NCHAR VARYING
Date and time types | TIMESTAMP and DATE
Boolean types | BIT and BOOL
Binary types | BLOB, BFILE, RAW, and LONG RAW
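Before you configure a task, you can also scan a schema for columns whose types fall into the unsupported categories listed above. The following query is a minimal sketch against the standard ALL_TAB_COLUMNS dictionary view; extend the IN list to cover the types you care about, and replace the example AUTOTEST owner with your schema.

select table_name, column_name, data_type
from all_tab_columns
where owner = 'AUTOTEST'
  and data_type in ('LONG', 'LONG RAW', 'BFILE', 'UROWID')
order by table_name, column_id;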

Prerequisites: Prepare the Oracle environment

Before you synchronize data in DataWorks, you must prepare the Oracle environment as described in this topic. This preparation ensures that the service runs as expected when you configure and run Oracle data synchronization tasks. The following sections describe the preparation steps.

Confirm the Oracle database version

The supported Oracle database versions vary depending on the synchronization scenario. Before you start, verify that your Oracle database version is supported.

  1. Run either of the following statements to view the Oracle database version.

    • Statement 1:

      select * from v$version;
    • Statement 2:

      select version from v$instance;
  2. If the database version is 12c or 18c, run the following statement to check whether the database is a CDB. Data Integration real-time sync tasks do not support the CDB architecture on 12c and 18c.

    select name,cdb,open_mode,con_id from v$database;

    In the output, CDB = YES indicates a CDB and CDB = NO indicates a non-CDB database.

Create an account and configure permissions

You need to create a database logon account that has the required permissions for Oracle.

  1. Create an account. For more information, see Create an Oracle account.

  2. Configure permissions.

    Run the following commands to grant permissions to the account. Replace 'sync_account' with the account that you created.

    grant create session to 'sync_account';  -- Permission to log on to the database.
    grant connect to 'sync_account';  -- Permission to connect to the database.
    grant select on nls_database_parameters to 'sync_account';  -- Permission to query the nls_database_parameters system configuration.
    grant select on all_users to 'sync_account';  -- Permission to query all users in the database.
    grant select on all_objects to 'sync_account';  -- Permission to query all objects in the database.
    grant select on DBA_MVIEWS to 'sync_account';  -- Permission to view materialized views in the database.
    grant select on DBA_MVIEW_LOGS to 'sync_account';  -- Permission to view materialized view logs in the database.
    grant select on DBA_CONSTRAINTS to 'sync_account';  -- Permission to view constraint information for all tables in the database.
    grant select on DBA_CONS_COLUMNS to 'sync_account';  -- Permission to view the columns that belong to the constraints on all tables.
    grant select on all_tab_cols to 'sync_account';  -- Permission to view column information for tables, views, and clusters.
    grant select on sys.obj$ to 'sync_account';  -- Permission to view database objects. sys.obj$ is the base object table in the Oracle data dictionary and stores all Oracle objects.
    grant select on SYS.COL$ to 'sync_account';  -- Permission to view column definitions. SYS.COL$ stores the column definitions of database tables.
    grant select on sys.USER$ to 'sync_account';  -- Permission to view the sys.USER$ system table, which stores user account information.
    grant select on sys.cdef$ to 'sync_account';  -- Permission to view the sys.cdef$ system table, which stores constraint definitions.
    grant select on sys.con$ to 'sync_account';  -- Permission to view constraint information. sys.con$ records Oracle constraint information.
    grant select on all_indexes to 'sync_account';  -- Permission to view all indexes in the database.
    grant select on v_$database to 'sync_account';  -- Permission to query the v$database view.
    grant select on V_$ARCHIVE_DEST to 'sync_account';  -- Permission to query the V$ARCHIVE_DEST view.
    grant select on v_$log to 'sync_account';  -- Permission to query the v$log view, which displays log file information from the control file.
    grant select on v_$logfile to 'sync_account';  -- Permission to query the v$logfile view, which contains information about redo log files.
    grant select on v_$archived_log to 'sync_account';  -- Permission to query the v$archived_log view, which contains information about archived logs.
    grant select on V_$LOGMNR_CONTENTS to 'sync_account';  -- Permission to query the V$LOGMNR_CONTENTS view.
    grant select on DUAL to 'sync_account';  -- Permission to query the DUAL table, a virtual single-row table used to satisfy the SELECT syntax.
    grant select on v_$parameter to 'sync_account';  -- Permission to query the v$parameter view, a dynamic dictionary view that stores database parameter settings.
    grant select any transaction to 'sync_account';  -- Permission to view any transaction in the database.
    grant execute on SYS.DBMS_LOGMNR to 'sync_account';  -- Permission to use the LogMiner tool, which analyzes transactions and helps recover lost data.
    grant alter session to 'sync_account';  -- Permission to modify database session settings.
    grant select on dba_objects to 'sync_account';  -- Permission to view all objects in the database.
    grant select on v_$standby_log to 'sync_account';  -- Permission to query the v$standby_log view, which contains information about the standby redo logs of a standby database.
    grant select on v_$ARCHIVE_GAP to 'sync_account';  -- Permission to query for missing archived logs.
    grant select on sys.ICOL$ to 'sync_account';  -- Permission to view the sys.ICOL$ system table, which stores index column information.
    grant select on V$THREAD to 'sync_account';  -- Permission to view instance thread information. V$THREAD is a dynamic performance view that stores the thread status of each instance in a RAC cluster; it is used to determine instance status and log sources during incremental parsing.

    To perform offline full data synchronization, run the following command to grant the account the permission to query all tables.

    grant select any table to 'sync_account'; 

    For Oracle 12c and later, run the following command to grant the account the log mining permission. For versions earlier than Oracle 12c, you do not need to run this command because the log mining feature is built-in.

    grant LOGMINING TO 'sync_account';
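    To confirm that the grants took effect, log on as the new account and list its active system privileges (a quick check that uses the standard SESSION_PRIVS view; object grants, such as select on the v$ views, are not listed there):

    select * from session_privs;  -- should include CREATE SESSION, SELECT ANY TRANSACTION, and, for 12c and later, LOGMINING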

Enable archived logs and supplemental logging, and switch Redo log files

  1. Enable archived logs by running the following SQL statements.

    shutdown immediate;
    startup mount;
    alter database archivelog;
    alter database open;
  2. Enable supplemental logging.

    You can enable supplemental logging as needed by running the following SQL statements.

    -- To capture complete data changes during real-time synchronization, enable supplemental logging.
    -- Database level:
    alter database add supplemental log data(primary key) columns; -- Enables supplemental logging for primary key columns of the database.
    alter database add supplemental log data(unique) columns; -- Enables supplemental logging for unique index columns of the database.

    -- Table level (replace schema.table with your table):
    alter table schema.table add supplemental log data(primary key) columns; -- Enables supplemental logging for primary key columns of a specific table.
    alter table schema.table add supplemental log data(unique) columns; -- Enables supplemental logging for unique index columns of a specific table.
  3. Switch Redo log files.

    After you enable supplemental logging, run the following command multiple times (five times is recommended) to switch the Redo log files.

    alter system switch logfile;
    Note

    Running the command multiple times ensures that when the current log file is full, the system switches to the next one. This prevents the loss of operation records and facilitates data restoration.
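After you complete the preceding steps, you can verify the configuration with a single query (a quick check against the standard v$database view):

select log_mode, supplemental_log_data_pk, supplemental_log_data_ui from v$database;
-- LOG_MODE should be ARCHIVELOG, and the two supplemental logging columns should return YES
-- for the levels you enabled. Table-level settings are listed in all_log_groups instead.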

Check the database character encoding

Run the following statement to check the character encoding formats of the database:

select * from v$nls_parameters where PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
  • v$nls_parameters stores the database parameter settings.

  • NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET are the database character set and the national character set. They determine how the two main classes of character data are stored in Oracle: the database character set applies to CHAR, VARCHAR2, and CLOB columns, and the national character set applies to NCHAR, NVARCHAR2, and NCLOB columns.

Note

Data Integration supports only the UTF8, AL32UTF8, AL16UTF16, and ZHS16GBK character encodings for data synchronization. If your database uses an unsupported character encoding, you must change the encoding before you can synchronize the data.

Check the data types of database tables

You can use a SELECT statement to query the data types in a database table. The following example shows how to view the data types in the 'tablename' table.

select COLUMN_NAME,DATA_TYPE from all_tab_columns where TABLE_NAME='tablename'; 
  • COLUMN_NAME: The name of the table column.

  • DATA_TYPE: The data type of the corresponding column.

  • all_tab_columns: A view that stores information about all columns in database tables.

  • TABLE_NAME: The name of the table to query. When you run the statement, replace 'tablename' with the name of your table. Table names are stored in uppercase in the data dictionary unless they were created as quoted identifiers.

You can also run select * from tablename; to query all data in the target table and inspect the column types.

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data Source Management. When you add the data source, you can view the infotips of the parameters in the DataWorks console to understand their meanings.

Develop a data synchronization task: Oracle synchronization process guide

For the entry point and the procedure for configuring a synchronization task, see the following configuration guides.

Guide to configuring an offline sync task for a single table

Guide to configuring a real-time sync task for a single table

For more information about the procedure, see Configure a real-time sync task in DataStudio.

Guide to configuring synchronization for an entire database, including offline, real-time full and incremental, and sharded databases

For more information about the procedure, see Configure a real-time sync task for an entire database.

FAQ

  • Real-time sync tasks for Oracle, PolarDB, or MySQL report repeated errors

  • Data restoration issues in active/standby synchronization

    This issue occurs when you use an active/standby disaster recovery architecture for Oracle. If the primary database fails and the system switches to the secondary database, the secondary database continuously restores data from the primary database using archived logs. Due to time differences in data synchronization and factors such as network latency, the data on the secondary database may differ significantly from the data on the primary database. Therefore, the data synchronized from the secondary database is not a complete, up-to-date snapshot.

  • Consistency constraints

    Oracle is a relational database management system (RDBMS) that provides strongly consistent data query interfaces. For example, during a sync task, if another data writer writes data to the database, Oracle Reader does not retrieve the new data because of the database snapshot feature.

    The consistency described above applies to data synchronization in a single-threaded model of Oracle Reader. If Oracle Reader extracts data concurrently based on your configuration, strong consistency cannot be guaranteed.

    When Oracle Reader splits data based on the splitPk parameter, it starts multiple concurrent tasks to synchronize the data. These tasks do not belong to the same read transaction and run at different times. Therefore, the synchronized data is not a complete and consistent data snapshot.

    A technical solution for a consistent snapshot across multiple threads is not currently available. You can use an engineering approach to address this issue. The following solutions involve trade-offs. Choose one based on your requirements.

    • Use single-threaded synchronization without data sharding. This method is slower but ensures consistency.

    • Disable other data writers to ensure the data is static. For example, you can lock tables or disable synchronization to the secondary database. This may affect online business.
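    If you take the second approach, one way to make the data static for the duration of the extraction is a table-level lock, as in the following sketch. AUTOTEST.table01 is the sample table from the script demo in the appendix; the lock is released when the transaction commits or rolls back.

    lock table AUTOTEST.table01 in share mode;  -- allows concurrent reads but blocks writes until the transaction ends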

  • Database encoding issues

    Oracle Reader uses Java Database Connectivity (JDBC) to extract data. JDBC is compatible with various encodings and performs encoding conversion at the underlying layer. Therefore, you do not need to specify an encoding for Oracle Reader. It can automatically retrieve and transcode the encoding.

  • Methods for incremental data synchronization

    Oracle Reader uses `SELECT` statements to extract data. You can use SELECT…WHERE… to perform incremental data extraction in the following ways:

    • When an online application writes to the database, it populates a `modify` field with a timestamp for changes, including additions, updates, and logical deletions. In this scenario, Oracle Reader can add a `WHERE` clause with the timestamp of the previous synchronization phase.

    • For new stream data, Oracle Reader can add a `WHERE` clause with the maximum auto-increment ID from the previous phase.

    If your business logic does not use a field to distinguish between new and modified data, Oracle Reader cannot perform incremental data synchronization. It can only synchronize the full data.
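    For example, with a timestamp-based modify field, the extraction query for one synchronization window amounts to the following sketch. The table and column names are illustrative.

    select id, name, modify_time
    from AUTOTEST.table01
    where modify_time >= timestamp '2025-10-16 00:00:00'
      and modify_time < timestamp '2025-10-17 00:00:00';  -- the window since the previous synchronization run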

  • SQL security

    Oracle Reader provides the querySql feature, which lets you customize the `SELECT` extraction statement. Oracle Reader does not perform any security checks on the querySql statement.

Appendix: Script demo and parameter description

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Reader script demo

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "oracle",
            "parameter": {
                "selectedDatabase": "AUTOTEST",
                "indexes": [],
                "datasource": "oracle_test",
                "envType": 0,
                "useSpecialSecret": true,
                "column": [
                    "id"
                ],
                "where": "",
                "splitPk": "id",
                "encoding": "UTF-8",
                "table": "AUTOTEST.table01"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "odps",
            "parameter": {
            },
            "name": "Writer",
            "category": "writer"
        },
        {
            "name": "Processor",
            "stepType": null,
            "category": "processor",
            "copies": 1,
            "parameter": {
                "nodes": [],
                "edges": [],
                "groups": [],
                "version": "2.0"
            }
        }
    ],
    "setting": {
        "executeMode": null,
        "errorLimit": {
            "record": ""
        },
        "speed": {
            "concurrent": 2,
            "throttle": false
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Reader script parameters

Parameter

Description

Required

Default value

datasource

The name of the data source. The code editor supports adding data sources. The value of this parameter must be the same as the name of the added data source.

Yes

None

selectedDatabase

The schema of the database to synchronize.

Yes

None

table

The name of the table to synchronize. The format must be schema.tableName.

Note

For example, if selectedDatabase is AUTOTEST and the table name is table01, set table to AUTOTEST.table01.

Yes

None

column

The names of the columns to synchronize from the specified table. Use a JSON array to describe the fields. By default, all columns are used, for example, ["*"].

  • You can crop columns, which means you can export a subset of columns.

  • You can reorder columns, which means you can export columns in an order different from the table schema.

  • You can configure constants in JSON format.

    ["id", "1", "'mingya.wmy'", "null", "to_char(a + 1)", "2.3" , "true"]
    • id is a regular column name.

    • 1 is an integer constant.

    • 'mingya.wmy' is a string constant. Note that it must be enclosed in single quotation marks.

    • null is a null value.

    • to_char(a + 1) is an expression.

    • 2.3 is a floating-point number.

    • true is a Boolean value.

  • You must explicitly specify a value for column. It cannot be empty.

Yes

None

splitFactor

The splitting factor. This specifies the number of chunks to split the data into for synchronization. If you configure multiple concurrent threads, the data is split into a number of chunks equal to concurrency × splitFactor. For example, if concurrency is 5 and splitFactor is 5, the data is split into 5 × 5 = 25 chunks and processed by 5 concurrent threads.

Note

The recommended value range is 1 to 100. A value that is too large may cause an out-of-memory (OOM) error.

No

5

splitMode

The splitting mode. Valid values:

  • averageInterval: Splits the data at even intervals. This mode finds the maximum and minimum values of the splitPk and divides the range evenly by the number of chunks.

  • randomSampling: Splits the data based on random sampling. This mode randomly selects a number of data points from all data as split points.

Note

The splitMode parameter must be used with the splitPk parameter.

  • When splitPk is a numeric type, set splitMode to averageInterval.

  • When splitPk is a string type, set splitMode to randomSampling.

No

randomSampling

splitPk

When you specify the splitPk parameter, Oracle Reader shards the data on the field that splitPk represents and starts concurrent tasks, which improves synchronization efficiency. To check whether a candidate field is evenly distributed, see the sketch after this parameter list.

  • We recommend that you use the primary key of the table as the splitPk. Primary keys are usually evenly distributed, which helps prevent data hot spots in the resulting shards.

  • The splitPk parameter supports only numeric and string types, and it must be used with the splitMode parameter.

    • When splitPk is a numeric type, set splitMode to averageInterval.

    • When splitPk is a string type, set splitMode to randomSampling.

  • If you do not specify splitPk, the table is not sharded, and Oracle Reader synchronizes the full data over a single channel.

Note

You cannot use ROWID as the splitPk when you read from a view.

No

None

where

The filter condition. Oracle Reader constructs an SQL statement from the configured column, table, and where parameters and extracts data based on that statement. For example, during testing you can specify a condition based on row_number() to limit the rows returned.

  • The `where` condition can be used for incremental business data synchronization.

  • If the `where` condition is not configured or is empty, the entire table is synchronized.

No

None

querySql (Advanced mode. Not supported in the codeless UI.)

In some business scenarios, the where parameter is not sufficient to describe the filter conditions, so you can use this parameter to customize the extraction SQL. When you configure querySql, the data synchronization system ignores the table, column, and where configurations and uses the content of querySql to filter data. For example, to synchronize data after a multi-table join, use select a,b from table_a join table_b on table_a.id = table_b.id.

No

None

fetchSize

The number of data records to fetch from the database server in each batch. This value determines the number of network round trips between the data synchronization system and the server; increasing it can significantly improve data extraction performance.

Note

A `fetchSize` value that is too large (>2048) may cause an OOM error in the data synchronization process.

No

1,024
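As mentioned in the splitPk description above, you can gauge whether a candidate shard key is evenly distributed before you rely on it. The following query is a sketch; AUTOTEST.table01 and id come from the reader script demo.

select min(id) as min_id, max(id) as max_id, count(*) as total_rows, count(distinct id) as distinct_ids
from AUTOTEST.table01;
-- A numeric key whose values spread evenly between min_id and max_id works well with averageInterval.
-- A skewed key or a string key is better served by randomSampling.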

Writer script demo

{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        { 
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"oracle",// The plugin name.
            "parameter":{
                "postSql":[],// The SQL statement to execute after the data synchronization task.
                "datasource":"",
                "session":[],// The database connection session parameters.
                "column":[// The fields.
                    "id",
                    "name"
                ],
                "encoding":"UTF-8",// The encoding format.
                "batchSize":1024,// The number of records in a batch.
                "table":"",// The table name.
                "preSql":[]// The SQL statement to execute before the data synchronization task.
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The number of error records.
        },
        "speed":{
            "throttle":true,// If throttle is false, the mbps parameter is ignored and throttling is disabled. If throttle is true, throttling is enabled.
            "concurrent":1, // The number of concurrent jobs.
            "mbps":"12"// The throttling rate. 1 mbps = 1 MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Writer script parameters

Parameter

Description

Required

Default value

datasource

The name of the data source. The code editor supports adding data sources. The value of this parameter must be the same as the name of the added data source.

Yes

None

table

The name of the destination table. If the schema of the destination table is different from the username in the data source configuration, specify the table in the schema.table format.

Yes

None

writeMode

The import mode. Only insert into is supported. If a primary key or unique index conflict occurs, the conflicting rows fail to be written and are recorded as dirty data.

No

insert into

column

The fields in the destination table to which you want to write data. Separate the fields with commas. Example: "column": ["id","name","age"]. To write to all columns in order, use an asterisk (*). Example: "column":["*"].

Yes

None

preSql

The SQL statement to execute before the data synchronization task starts. The codeless UI supports only one SQL statement. The code editor supports multiple SQL statements, for example, to clear old data.

No

None

postSql

The SQL statement to execute after the data synchronization task is complete. The codeless UI supports only one SQL statement. The code editor supports multiple SQL statements, for example, to add a timestamp.

No

None

batchSize

The number of records to submit in a batch. A larger value can significantly reduce network interactions between the data synchronization system and Oracle, improving overall throughput. If this value is too large, it may cause an OOM error in the data synchronization process.

No

1,024
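As an illustration of preSql and postSql, the following statements show a common pattern. This is a sketch: AUTOTEST.table01 is the sample table from the reader script demo, and sync_audit is a hypothetical bookkeeping table that you would create yourself.

-- preSql: clear previously loaded rows before the task writes
truncate table AUTOTEST.table01;

-- postSql: record the load time after the task completes (sync_audit is hypothetical)
insert into sync_audit (table_name, loaded_at) values ('AUTOTEST.table01', sysdate);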