All Products
Search
Document Center

DataWorks:Oracle data source

Last Updated:Jun 09, 2026

DataWorks provides Oracle Reader and Oracle Writer to read from and write to Oracle data sources. You can use the codeless user interface (UI) or code editor to configure data synchronization tasks for Oracle data sources. This topic describes the supported data synchronization capabilities for Oracle data sources.

Supported versions

Batch data read and write

Version

Batch data read (Oracle Reader)

Batch data write (Oracle Writer)

Oracle 11.2 or 11gR2

Supported

Supported

Oracle 12.1 or 12cR1

Supported

Supported

Oracle 12.2 or 12cR2

Supported (without new features)

Supported (without new features)

Oracle 18.3

Supported (without new features)

Supported (without new features)

Oracle 19.x

Supported (without new features)

Supported (without new features)

Oracle 21.1

Supported (without new features)

Supported (without new features)

Important

The Oracle Writer plug-in uses the ojdbc6-12.1.1.jar driver.

Real-time data read

  • Supported versions:

    11g R2 and later, 12c non-CDB, 18c non-CDB, 19c non-CDB, or 19c CDB.

  • Unsupported versions:

    12c CDB and 18c CDB.

  • To use an Oracle database as a source for a real-time full-database synchronization task, you must enable the archive log feature. For more information, see Prepare an Oracle environment.

Note

A container database (CDB) is a new feature in Oracle 12c and later. A CDB can host multiple pluggable databases (PDBs).

Limitations

  • The NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET of the Oracle instance must be one of the following: AL32UTF8, AL16UTF16, ZHS16GBK, or UTF8.

  • For real-time synchronization, a single Oracle instance supports up to 500 GB of incremental data per day.

  • DataWorks supports real-time reads only from physical Oracle secondary databases. This method pulls archive logs, which causes higher latency than synchronizing from a primary database. The latency depends on the archive log generation speed.

  • While real-time synchronization does not support views, batch synchronization supports reading from them.

  • DataWorks uses the Oracle LogMiner utility for real-time data reads. To synchronize data from either a primary or secondary database, you must enable supplemental logging and archive logs on the primary database.

  • Index-organized tables are not supported.

  • Tables containing only LOB-type columns (BLOB, CLOB, or NCLOB) are not supported.

  • Oracle table names and column names that contain Chinese characters are not supported.

  • Oracle table names cannot exceed 30 bytes.

Supported data types

Type

Batch read (Oracle Reader)

Batch write (Oracle Writer)

Real-time read

NUMBER

Supported

Supported

Supported

BINARY FLOAT

Supported

Supported

Supported

BINARY DOUBLE

Supported

Supported

Supported

CHAR

Supported

Supported

Supported

NCHAR

Supported

Supported

Supported

VARCHAR2

Supported

Supported

Supported

NVARCHAR2

Supported

Supported

Supported

DATE

Supported

Supported

Supported

TIMESTAMP

Supported

Supported

Supported

TIMESTAMP WITH TIME ZONE

Supported

Supported

Not supported

TIMESTAMP WITH LOCAL TIME ZONE

Supported

Supported

Not supported

CLOB

Supported

Supported

Supported

BLOB

Supported

Supported

Supported

RAW

Supported

Supported

Supported

ROWID

Not supported

Not supported

Supported

UROWID

Not supported

Not supported

Supported

FLOAT

Supported

Supported

Supported

INTERVAL DAY TO SECOND

Not supported

Not supported

Supported

INTERVAL YEAR TO MONTH

Not supported

Not supported

Supported

BFILE

Not supported

Not supported

Not supported

LONG

Not supported

Not supported

Not supported

LONG RAW

Not supported

Not supported

Not supported

NCLOB

Supported

Supported

Not supported

STRUCT

Supported

Supported

Not supported

user-defined types

Not supported

Not supported

Not supported

AnyType

Not supported

Not supported

Not supported

AnyData

Not supported

Not supported

Not supported

AnyDataSet

Not supported

Not supported

Not supported

XmlType

Not supported

Not supported

Not supported

spatial types

Not supported

Not supported

Not supported

media types

Not supported

Not supported

Not supported

The following table categorizes the Oracle data types for Oracle Reader.

Category

Oracle data type

integer

NUMBER, RAWID, INTEGER, INT, and SMALLINT

floating point

NUMERIC, DECIMAL, FLOAT, DOUBLE PRECISION, and REAL

string

LONG, CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR2, CLOB, NCLOB, CHARACTER, CHARACTER VARYING, CHAR VARYING, NATIONAL CHARACTER, NATIONAL CHAR, NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, and NCHAR VARYING

date and time

TIMESTAMP and DATE

boolean

BIT and BOOL

binary

BLOB, BFILE, RAW, and LONG RAW

Prepare an Oracle environment

To synchronize data from Oracle using DataWorks, you must first prepare your environment. This topic outlines the required preparations to ensure that data synchronization tasks run as expected.

Check the Oracle database version

The supported Oracle database versions vary by synchronization scenario. Before synchronizing data, verify that your Oracle database version is supported.

  1. To check your Oracle database version, run either of the following statements.

    • Statement 1:

      select * from v$version;
    • Statement 2:

      select version from v$instance;
  2. If your Oracle database version is 12c or 18c, you need to use the following statement to confirm whether the database is a cdb database. DataWorks Data Integration real-time synchronization tasks do not currently support Oracle cdb databases of version 12c or 18c.

    select name,cdb,open_mode,con_id from v$database;

Create an account and grant privileges

Create a database account with the required privileges in Oracle.

  1. For details, see Create an Oracle account.

  2. Grant privileges.

    Run the following commands to grant the required privileges to the account. In the following statements, replace 'synchronization account' with the account you created.

    grant create session to 'synchronization account';  // Grants the privilege to log on to the database.
    grant connect to 'synchronization account';  // Grants the privilege to connect to the database.
    grant select on nls_database_parameters to 'synchronization account';  // Grants the privilege to query the nls_database_parameters system configuration.
    grant select on all_users to 'synchronization account';  // Grants the privilege to query all users in the database.
    grant select on all_objects to 'synchronization account';  // Grants the privilege to query all objects in the database.
    grant select on DBA_MVIEWS to 'synchronization account';  // Grants the privilege to view the materialized views of the database.
    grant select on DBA_MVIEW_LOGS to 'synchronization account';  // Grants the privilege to view the materialized view logs of the database.
    grant select on DBA_CONSTRAINTS to 'synchronization account';  // Grants the privilege to view constraint information for all tables in the database.
    grant select on DBA_CONS_COLUMNS to 'synchronization account';  // Grants the privilege to view information about columns in specified constraints for all tables in the database.
    grant select on all_tab_cols to 'synchronization account';  // Grants the privilege to view information about columns in tables, views, and clusters.
    grant select on sys.obj$ to 'synchronization account';  // Grants the privilege to view objects in the database. The sys.obj$ table is a base table in the Oracle data dictionary that stores all Oracle objects.
    grant select on SYS.COL$ to 'synchronization account';  // Grants the privilege to view column definitions in database tables. SYS.COL$ stores column definition information.
    grant select on sys.USER$ to 'synchronization account';  // Grants the privilege to view the sys.USER$ system table, which contains details about database users.
    grant select on sys.cdef$ to 'synchronization account';  // Grants the privilege to view a system table of the database.
    grant select on sys.con$ to 'synchronization account';  // Grants the privilege to view constraint information. sys.con$ records Oracle constraint information.
    grant select on all_indexes to 'synchronization account';  // Grants the privilege to view all indexes in the database.
    grant select on v_$database to 'synchronization account';  // Grants the privilege to view the v_$database view.
    grant select on V_$ARCHIVE_DEST to 'synchronization account';  // Grants the privilege to view the V_$ARCHIVE_DEST view.
    grant select on v_$log to 'synchronization account';  // Grants the privilege to view the v_$log view, which displays log file information from the control file.
    grant select on v_$logfile to 'synchronization account';  // Grants the privilege to view the v_$logfile view, which contains information about redo log files.
    grant select on v_$archived_log to 'synchronization account';  // Grants the privilege to view the v$archived_log view, which contains information about archived logs.
    grant select on V_$LOGMNR_CONTENTS to 'synchronization account';  // Grants the privilege to view the V_$LOGMNR_CONTENTS view.
    grant select on DUAL to 'synchronization account';   // Grants the privilege to view the DUAL table. DUAL is a virtual table used to conform to SELECT syntax rules. In Oracle, DUAL contains only one record.
    grant select on v_$parameter to 'synchronization account';  // Grants the privilege to view the v_$parameter view. v$parameter is a dynamic data dictionary table that stores database parameter settings.
    grant select any transaction to 'synchronization account';  // Grants the privilege to view any transaction in the database.
    grant execute on SYS.DBMS_LOGMNR to 'synchronization account';  // Grants the privilege to use the LogMiner tool, which helps you analyze transactions and recover lost data.
    grant alter session to 'synchronization account';  // Grants the privilege to modify the database session.
    grant select on dba_objects to 'synchronization account';  // Grants the privilege to view all objects in the database.
    grant select on v_$standby_log to 'synchronization account';  // Grants the privilege to view the v_$standby_log view, which contains information about archived logs of the standby database.
    grant select on v_$ARCHIVE_GAP to 'synchronization account';  // Grants the privilege to query for missing archived logs.
    grant select on sys.ICOL$ to 'synchronization account';  // Grants the privilege to view information about index-organized table columns.
    grant select on V_$THREAD to 'synchronization account';  // Grants the privilege to view instance thread information. The V_$THREAD view is a thread status table in Oracle dynamic performance views. It stores the thread status of each instance in a RAC cluster and is used during incremental parsing to determine the instance status and log source.

    To perform offline full synchronization, also run the following statement to grant the synchronization account the privilege to query all tables.

    grant select any table to 'synchronization account'; 

    For Oracle 12c and later versions, run the following statement to grant log mining privileges. This command is not required for versions earlier than Oracle 12c, as log mining is a built-in feature.

    grant LOGMINING TO 'synchronization account';

Enable logging and switch redo log files

  1. To enable archive logging, run the following SQL statements:

    shutdown immediate;
    startup mount;
    alter database archivelog;
    alter database open;
  2. Enable supplemental logging.

    Enable supplemental logging at the database or table level to meet your business requirements:

    // Enable database-level supplemental logging to capture all data changes for real-time synchronization.
    alter database add supplemental log data(primary key) columns; // Enables supplemental logging for the primary key columns of the database.
    alter database add supplemental log data(unique) columns; // Enables supplemental logging for the unique index columns of the database.
    
    // Alternatively, enable table-level supplemental logging:
    alter schema.table add supplemental log data(primary key) columns; // Enables supplemental logging for the primary key columns of a specified table.
    alter schema.table add supplemental log data(unique) columns; // Enables supplemental logging for the unique index columns of a specified table.
  3. Switch redo log files.

    After enabling supplemental logging, run the following command multiple times (five times is recommended) to switch the redo log files.

    alter system switch logfile;
    Note

    Running this command multiple times ensures that the new supplemental logging settings are applied to all redo log files, which is critical for complete data capture and recovery.

Check the database character encoding

Run the following command to check the character encoding of your database.

select * from v$nls_parameters where PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
  • The v$nls_parameters view stores database parameter settings.

  • NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET are the database character set and national character set, respectively. They define the encoding for character data in Oracle.

Note

DataWorks data synchronization supports only the UTF8, AL32UTF8, AL16UTF16, and ZHS16GBK character encodings. If your database uses an unsupported character encoding, change it before performing data synchronization.

Check table data types

Use a SELECT statement to query the data types of a table. The following example shows how to query the data types for a table named 'tablename'.

select COLUMN_NAME,DATA_TYPE from all_tab_columns where TABLE_NAME='tablename'; 
  • COLUMN_NAME: the name of a column in the table.

  • DATA_TYPE: the data type of the corresponding column.

  • all_tab_columns: a view that stores information about all columns in the tables of the database.

  • TABLE_NAME: the name of the table to query. When you run this statement, replace 'tablename' with the name of your table.

You can also execute select * from 'tablename'; to query all information from the target table and obtain the data types.

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.

Oracle data synchronization

For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.

Single-table batch synchronization

Single-table real-time synchronization

See Configure a real-time synchronization task in DataStudio.

Full-database synchronization

See Configure real-time full-database synchronization.

FAQ

  • Recurring task failures for Oracle, PolarDB, and MySQL

  • Data recovery issues during primary-secondary synchronization

    In a disaster recovery setup, a secondary database continuously restores data from the primary database by using binary logs. Due to the time lag inherent in synchronization, especially with network latency, the data on the secondary database can differ significantly from the primary database. This means the synchronized data is not a complete, real-time snapshot.

  • Consistency constraints

    Oracle is a relational database management system (RDBMS) that provides strong consistency for queries. For example, when a synchronization task runs, Oracle Reader reads from a database snapshot. Due to this snapshot isolation, Oracle Reader does not retrieve new data written to the database after the task has started.

    A single-threaded model maintains this consistency. If you configure Oracle Reader to extract data in parallel, strong consistency cannot be guaranteed.

    When Oracle Reader performs sharding based on the splitPk parameter, it launches multiple parallel tasks to complete data synchronization. These tasks are not part of the same read transaction and are separated by time intervals. As a result, the synchronized data is not a complete, consistent snapshot.

    Achieving a consistent snapshot across multiple threads is not technically feasible. The following engineering solutions offer trade-offs. Choose the one that best fits your needs.

    • Use single-threaded synchronization without sharding. This approach ensures consistency but is slower.

    • Ensure the data is static by preventing other writers from accessing it. For example, you can use a table lock or pause synchronization to the secondary database. However, this may impact your online services.

  • Database encoding

    Oracle Reader uses Java Database Connectivity (JDBC) to extract data. JDBC handles various character encodings and performs conversions automatically. Therefore, you do not need to specify an encoding.

  • How incremental data synchronization works

    Oracle Reader uses JDBC SELECT statements to extract data. You can perform incremental data synchronization using a SELECT…WHERE… clause in one of the following ways:

    • For online applications that populate a timestamp column for new, updated, or logical deletes, you can configure the WHERE clause to filter records based on the timestamp of the last synchronization.

    • For append-only data, you can use the WHERE clause to filter records with an auto-increment ID greater than the maximum ID from the previous synchronization.

    If your business logic does not provide a column to distinguish new or modified data, Oracle Reader cannot perform incremental data synchronization and must synchronize the full data instead.

  • SQL security

    The Oracle Reader querySql feature lets you define a custom SELECT statement for data extraction. Oracle Reader does not perform any security validation on the statement you provide in querySql.

Appendix: Script examples and parameters

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Script mode configuration. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Reader script example

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "oracle",
            "parameter": {
                "selectedDatabase": "AUTOTEST",
                "indexes": [],
                "datasource": "oracle_test",
                "envType": 0,
                "useSpecialSecret": true,
                "column": [
                    "id"
                ],
                "where": "",
                "splitPk": "id",
                "encoding": "UTF-8",
                "table": "AUTOTEST.table01"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "odps",
            "parameter": {
            },
            "name": "Writer",
            "category": "writer"
        },
        {
            "name": "Processor",
            "stepType": null,
            "category": "processor",
            "copies": 1,
            "parameter": {
                "nodes": [],
                "edges": [],
                "groups": [],
                "version": "2.0"
            }
        }
    ],
    "setting": {
        "executeMode": null,
        "errorLimit": {
            "record": ""
        },
        "speed": {
            "concurrent": 2,
            "throttle": false
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Reader parameters

Parameter

Description

Required

Default

datasource

The name of the data source. This name must match the name of the data source that you added in the code editor.

Yes

None

selectedDatabase

The schema of the database from which you want to read data.

Yes

None

table

The name of the table from which you want to read data. The table name must be in the schema.tableName format.

Note

For example, if selectedDatabase is set to AUTOTEST and the table name is table01, set table to AUTOTEST.table01.

Yes

None

column

The columns to read. To read all columns, specify ["*"].

  • You can read a subset of columns (column pruning).

  • You can specify columns in a different order than in the source table schema (column reordering).

  • You can configure constants in JSON format.

    ["id", "1", "'mingya.wmy'", "null", "to_char(a + 1)", "2.3" , "true"]
    • id is a normal column name.

    • 1 is an integer constant.

    • 'mingya.wmy' is a string constant. Note that the value must be enclosed in a pair of single quotation marks.

    • null is a null pointer.

    • to_char(a + 1) is an expression.

    • 2.3 is a floating-point number.

    • true is a Boolean value.

  • The column parameter must be specified and cannot be empty.

Yes

None

splitFactor

The sharding factor, which specifies the number of shards for data synchronization. If you configure parallel processing, the number of shards is calculated by using the following formula: Number of parallel threads × sharding factor. For example, if the number of parallel threads is 5 and the sharding factor is 5, the data is divided into 25 shards and processed by 5 parallel threads.

Note

A value from 1 to 100 is recommended. A value greater than 100 may cause an out-of-memory (OOM) error.

No

5

splitMode

The shard mode. Valid values:

  • averageInterval: average sampling. In this mode, the system finds the minimum and maximum values of the splitPK column and creates even intervals for data sharding.

  • randomSampling: random sampling. In this mode, the system randomly samples the data to determine the boundaries for each shard.

Note

The splitMode parameter must be used with the splitPk parameter.

  • If splitPk is set to a column of a numeric data type, splitMode must be set to averageInterval.

  • If splitPk is set to a column of a string data type, splitMode must be set to randomSampling.

No

randomSampling

splitPk

When Oracle Reader extracts data, if you specify splitPk, it means that you want to use the field represented by splitPk for data sharding. As a result, data synchronization starts parallel tasks, which can improve its efficiency.

  • We recommend that you set the splitPk parameter to the primary key of the table. Primary keys are usually evenly distributed, which helps prevent data hotspots in the shards.

  • The splitPk parameter supports only columns of numeric and string data types. It must be used with the splitMode parameter.

    • If splitPk is set to a column of a numeric data type, splitMode must be set to averageInterval.

    • If splitPk is set to a column of a string data type, splitMode must be set to randomSampling.

  • If you do not specify the splitPk parameter, Data Integration does not shard the table, and the Reader uses a single thread to perform a full table synchronization.

Note

If you are reading data from a view, you cannot set the splitPk parameter to a ROWID column.

No

None

where

The WHERE clause used to filter data. Reader constructs an SQL query based on the column, table, and where parameters to extract data. For example, you can specify row_number() for testing.

  • You can use the where clause to perform incremental synchronization.

  • If this parameter is not specified or is left empty, a full table synchronization is performed.

No

None

querySql (available only in the code editor)

In some scenarios, the where configuration is insufficient to describe the filter conditions. You can use this parameter to define a custom SQL query. When you configure this parameter, the data synchronization system ignores configurations such as table and column and directly uses the content of this parameter to filter the data. For example, to synchronize data from a multi-table join, use a query such as select a,b from table_a join table_b on table_a.id = table_b.id. When you configure querySql, Oracle Reader directly ignores the table, column, and where configurations.

No

None

fetchSize

The number of records to fetch from the database server in each batch. This parameter determines the number of network interactions between Data Integration and the database server and can significantly improve data extraction performance.

Note

If you set this parameter to a value greater than 2,048, an out-of-memory (OOM) error may occur during data synchronization.

No

1,024

Writer script example

{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        { 
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"oracle",// The plugin name.
            "parameter":{
                "postSql":[],// The SQL statement to execute after the task completes.
                "datasource":"",
                "session":[],// The parameters for the database connection session.
                "column":[// The destination columns.
                    "id",
                    "name"
                ],
                "encoding":"UTF-8",// The encoding format.
                "batchSize":1024,// The number of records to submit in a batch.
                "table":"",// The destination table name.
                "preSql":[]// The SQL statement to execute before the task starts.
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The number of error records.
        },
        "speed":{
            "throttle":true,// Enables or disables throttling. If false, the mbps parameter is ignored.
            "concurrent":1, // The maximum number of parallel threads.
            "mbps":"12"// The maximum rate for throttling. Unit: MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Writer parameters

Parameter

Description

Required

Default

datasource

The name of the data source. This name must match the name of the data source that you added in the code editor.

Yes

None

table

The name of the destination table. If the schema of the table is different from the username specified in the data source configuration, you must specify the table name in the schema.table format.

Yes

None

writeMode

The write mode. Only insert into is supported. If a primary key or unique index conflict occurs, Data Integration does not write the conflicting rows and records them as dirty data.

No

insert into

column

The destination columns to which you want to write data. Specify the columns as a JSON array. For example: "column": ["id","name","age"]. To write data to all columns in order, use an asterisk (*). For example: "column":["*"].

Yes

None

preSql

The SQL statement to execute before the data synchronization task starts. You can execute only one SQL statement in the codeless UI but multiple SQL statements in the code editor. For example, you can specify an SQL statement to clear existing data.

No

None

postSql

The SQL statement to execute after the data synchronization task is complete. You can execute only one SQL statement in the codeless UI but multiple SQL statements in the code editor. For example, you can specify an SQL statement to add a timestamp.

No

None

batchSize

The number of records to write in each batch. This parameter can significantly reduce the number of network interactions between Data Integration and Oracle and improve the overall throughput. An excessively large value may cause an OOM error.

No

1,024