This topic describes the data types and parameters supported by Oracle Reader and how to configure it by using the codeless user interface (UI) and code editor.

Oracle Reader allows you to read data from Oracle. Oracle Reader connects to a remote Oracle database and runs a SELECT statement to select and read data from the database.
Note Currently, Relational Database Service (RDS) and Distributed Relational Database Service (DRDS) do not support the Oracle storage engine.
Specifically, Oracle Reader connects to a remote Oracle database through Java Database Connectivity (JDBC), generates a SELECT statement based on your configurations, and then sends the statement to the database. The Oracle database runs the statement and returns the result. Then, Oracle Reader assembles the returned data to abstract datasets in custom data types supported by Data Integration, and sends the datasets to a writer.
  • Oracle Reader generates the SELECT statement based on the table, column, and where parameters that you have configured, and sends the generated SELECT statement to the Oracle database.
  • If you specify the querySql parameter, Oracle Reader directly sends the value of this parameter to the Oracle database.
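
For example, the following minimal sketch shows how a configuration that uses the table, column, and where parameters maps to the generated SELECT statement. The table name, column names, and filter condition are hypothetical and for illustration only.
    "table":"orders", // A hypothetical table name.
    "column":["id", "name"], // Hypothetical column names.
    "where":"gmt_create > to_date('20230101','yyyymmdd')" // A hypothetical filter condition.
    // Oracle Reader would generate a statement similar to:
    // SELECT id, name FROM orders WHERE gmt_create > to_date('20230101','yyyymmdd')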

Data types

Oracle Reader supports most Oracle data types. Before you configure a sync node, make sure that the data types of the columns to be synchronized are supported.

The following table lists the data types supported by Oracle Reader.
Category Oracle data type
Integer NUMBER, ROWID, INTEGER, INT, and SMALLINT
Floating point NUMERIC, DECIMAL, FLOAT, DOUBLE PRECISION, and REAL
String LONG, CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR2, CLOB, NCLOB, CHARACTER, CHARACTER VARYING, CHAR VARYING, NATIONAL CHARACTER, NATIONAL CHAR, NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, and NCHAR VARYING
Date and time TIMESTAMP and DATE
Boolean BIT and BOOLEAN
Binary BLOB, BFILE, RAW, and LONG RAW

Parameters

Parameter Description Required Default value
datasource The connection name. It must be identical to the name of the added connection. You can add connections in the code editor. Yes None
table The name of the table to be synchronized. Yes None
column The columns to be synchronized from the source table. The columns are described in a JSON array. The default value is [ * ], which indicates all columns.
  • Column pruning is supported. You can select and export specific columns.
  • Change of the column order is supported, which means that you can export the columns in an order different from that specified in the schema of the table.
  • Constants are supported. The column names must be arranged in JSON format.
    ["id", "1", "'mingya.wmy'", "null", "to_char(a + 1)", "2.3" , "true"]
    • id: a column name.
    • 1: an integer constant.
    • 'mingya.wmy': a string constant, which is enclosed in single quotation marks (' ').
    • null: a null value.
    • to_char(a + 1): a function expression.
    • 2.3: a floating-point constant.
    • true: a Boolean value.
  • The column parameter must be specified.
Yes None
splitPk The field used for data sharding when Oracle Reader extracts data. If you specify the splitPk parameter, the table is sharded based on the shard key specified by this parameter. Data Integration then runs concurrent threads to synchronize data. This improves efficiency.
  • We recommend that you set the splitPk parameter to the primary key of the table. The primary key is usually evenly distributed, so the shards created based on it are less likely to contain data hot spots.
  • The data types supported by the splitPk parameter include INTEGER, STRING, FLOAT, and DATE.
  • If you do not specify the splitPk parameter or leave it empty, Oracle Reader synchronizes data through a single thread.
No None
where The WHERE clause. Oracle Reader generates a SELECT statement based on the table, column, and where parameters that you have configured, and uses the generated SELECT statement to select and read data. For example, set this parameter to row_number().
  • You can use the WHERE clause to synchronize incremental data.
  • If you do not specify the where parameter or leave it empty, all data is synchronized.
No None
querySql (only available in the code editor) The SELECT statement used for refined data filtering. If you specify this parameter, Data Integration directly filters data based on this parameter. For example, if you want to join multiple tables for data synchronization, set this parameter to select a,b from table_a join table_b on table_a.id = table_b.id. If you specify the querySql parameter, Oracle Reader ignores the table, column, and where parameters that you have configured. No None
fetchSize The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the database and affects reading efficiency.
Note A value greater than 2048 may lead to an out of memory (OOM) error during data synchronization.
No 1024
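
The following minimal sketch shows how the splitPk and fetchSize parameters might be combined in the reader configuration so that the table is sharded and read by concurrent threads. The connection, table, and column names are hypothetical.
    "parameter":{
        "datasource":"oracle_source", // A hypothetical connection name.
        "table":"orders", // A hypothetical table name.
        "column":["id", "name", "gmt_create"],
        "splitPk":"id", // Shard by the primary key so that concurrent threads read different ranges.
        "where":"", // Leave empty to synchronize all data.
        "fetchSize":1024 // Read 1,024 records per interaction. Values greater than 2048 may cause OOM.
    }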

Configure Oracle Reader by using the codeless UI

  1. Configure the connections.
    Configure the source and destination connections for the sync node.
    Parameter Description
    Connection The datasource parameter in the preceding parameter description. Select a connection type, and enter the name of a connection that has been configured in DataWorks.
    Table The table parameter in the preceding parameter description.
    Filter The filter condition for the data to be synchronized. Currently, filtering based on the limit keyword is not supported. The SQL syntax is determined by the selected connection.
    Shard Key The shard key. You can use a column in the source table as the shard key. We recommend that you use the primary key or an indexed column. Only integer fields are supported.
    If data sharding is performed based on the configured shard key, data can be read concurrently to improve data synchronization efficiency.
    Note The Shard Key parameter appears only when you configure the source connection for a sync node.
  2. Configure field mapping, that is, the column parameter in the preceding parameter description.
    Fields in the source table on the left have a one-to-one mapping with fields in the destination table on the right. You can click Add to add a field, or move the pointer over a field and click the Delete icon to delete the field.
    Parameter Description
    Map Fields with the Same Name Click Map Fields with the Same Name to establish a mapping between fields with the same name. Note that the data types of the fields must match.
    Map Fields in the Same Line Click Map Fields in the Same Line to establish a mapping between fields in the same row. Note that the data types of the fields must match.
    Delete All Mappings Click Delete All Mappings to remove mappings that have been established.
    Auto Layout Click Auto Layout. The fields are automatically sorted based on specified rules.
    Change Fields Click the Change Fields icon. In the Change Fields dialog box that appears, you can manually edit fields in the source table. Each field occupies a row. The first and the last blank rows are included, whereas other blank rows are ignored.
    Add
    • Click Add to add a field. You can enter constants. Each constant must be enclosed in single quotation marks (' '), such as 'abc' and '123'.
    • You can use scheduling parameters, such as ${bizdate}.
    • You can enter functions supported by relational databases, such as now() and count(1).
    • Fields that cannot be parsed are indicated by Unidentified.
  3. Configure channel control policies.
    Parameter Description
    Expected Maximum Concurrency The maximum number of concurrent threads that the sync node uses to read data from the source and write data to the destination. You can configure the concurrency for the node on the codeless UI.
    Bandwidth Throttling Specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and set a maximum transmission rate to avoid heavy read workload of the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to a proper value.
    Dirty Data Records Allowed The maximum number of dirty data records allowed.
    Resource Group The resource group used for running the sync node. If a large number of nodes including this sync node are deployed on the default resource group, the sync node may need to wait for resources. We recommend that you purchase an exclusive resource group for data integration or add a custom resource group. For more information, see DataWorks exclusive resources and Add a custom resource group.

Configure Oracle Reader by using the code editor

In the following code, a node is configured to read data from an Oracle database.
{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"oracle",
            "parameter":{
                "fetchSize":1024, // The number of data records to read at a time.
                "datasource":"", // The connection name.
                "column":[ // The columns to be synchronized.
                    "id",
                    "name"
                ],
                "where":"", // The WHERE clause.
                "splitPk":"", // The shard key.
                "table":"" // The name of the table to be synchronized.
            },
            "name":"Reader",
            "category":"reader"
        },
        {// The following template is used to configure Stream Writer. For more information, see the corresponding topic.
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":false,// Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
            "concurrent":1,// The maximum number of concurrent threads.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
} "to":"Writer"
            }
        ]
    }
}
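
If you need refined data filtering, such as a join across multiple tables, you can use the querySql parameter instead of the table, column, and where parameters in the reader step. The following minimal sketch reuses the join example from the parameter description; the table names and join condition are for illustration only.
{
    "stepType":"oracle",
    "parameter":{
        "datasource":"", // The connection name.
        "querySql":"select a,b from table_a join table_b on table_a.id = table_b.id" // When querySql is specified, the table, column, and where parameters are ignored.
    },
    "name":"Reader",
    "category":"reader"
}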

Additional instructions

  • Data synchronization between primary and secondary databases

    A secondary Oracle database can be deployed for disaster recovery. The secondary database continuously replicates data from the primary database based on redo logs. Data latency between the primary and secondary databases is unavoidable, especially when network conditions are unfavorable, and can lead to data inconsistency between the two databases.

  • Concurrency control

    Oracle is a relational database management system (RDBMS), which supports strong consistency for data queries. A database snapshot is created before a sync node starts. Oracle Reader reads data from the database snapshot. Therefore, if new data is written to the database during data synchronization, Oracle Reader cannot obtain the new data.

    Data consistency cannot be guaranteed when you enable Oracle Reader to run concurrent threads on a single sync node.

    Oracle Reader shards the table based on the splitPk parameter and runs multiple concurrent threads to synchronize data. These concurrent threads belong to different transactions. They read data at different time points. This means that the concurrent threads observe different snapshots.

    Theoretically, the data inconsistency issue is unavoidable if a single sync node includes multiple threads. However, two workarounds are available:
    • Do not enable concurrent threads on a single sync node. That is, do not specify the splitPk parameter. This guarantees data consistency but lowers synchronization efficiency.
    • Prevent writes to the source during data synchronization. For example, lock the table or disable data synchronization between the primary and secondary databases. This keeps synchronization efficient, but your ongoing services may be interrupted.
  • Character encoding

    Oracle Reader uses JDBC, which can automatically convert the encoding of characters. Therefore, you do not need to specify the encoding format.

  • Incremental data synchronization
    Oracle Reader connects to a database through JDBC and uses a SELECT statement with a WHERE clause to read incremental data in the following ways:
    • For batch data, incremental add, update, and delete operations (including logical delete operations) are distinguished by a timestamp. Specify the WHERE clause based on this timestamp: it must be later than the latest timestamp of the previous synchronization.
    • For streaming data, specify the WHERE clause based on the data record ID: the ID must be greater than the maximum ID involved in the previous synchronization.

    If incremental data cannot be distinguished, Oracle Reader cannot perform incremental synchronization but can perform full synchronization only.
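
    For example, for the timestamp-based approach, you can combine the where parameter with a scheduling parameter such as ${bizdate}. In the following sketch, gmt_create is a hypothetical timestamp column:
      "where":"gmt_create >= to_date('${bizdate}','yyyymmdd') and gmt_create < to_date('${bizdate}','yyyymmdd') + 1" // Reads only the records generated on the business date.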

  • Syntax validation

    Oracle Reader allows you to specify custom SELECT statements by using the querySql parameter but does not verify the syntax of the custom SELECT statements.