edit-icon download-icon

Configure Oracle reader

Last Updated: Apr 17, 2018

he Oracle Reader plug-in provides the ability to read data from Oracle. At the underlying implementation level, Oracle Reader connects to a remote Oracle database through JDBC and runs the SELECT statements to extract data from the database.

On the public cloud, RDS, or DRDS does not provide the Oracle storage engine. Currently, Oracle Reader is mainly used for private cloud data migration and Data Integration projects.

In short, Oracle Reader connects to a remote Oracle database through the JDBC connector. The SELECT SQL query statements are generated and sent to the remote the remote Oracle database based on your configuration. Then, the SQL statements are run and the returned results are assembled into abstract datasets by using the custom data types of data synchronization. Datasets are passed to the downstream writer for processing.

  • Oracle Reader concatenates the table, column, and WHERE information you configured into SQL statements and sends them to the Oracle database.

  • Oracle directly sends the querySql information that you configured to the Oracle database.

Oracle Reader supports most data types in DB2. Check whether your data type is supported.

Oracle Reader converts Oracle data types as follows.

category Oracle data type
Integer NUMBER, RAWID, INTEGER, INT, SMALLINT
Floating point NUMERIC, DECIMAL, FLOAT, DOUBLE PRECISION, REAL
String LONG, CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR2, CLOB, NCLOB, CHARACTER, CHARACTER VARYING, CHAR VARYING, NATIONAL CHARACTER, NATIONAL CHAR, NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, NCHAR VARYING
Date and time TIMESTAMP, DATE
Boolean bit, bool
Binary BLOB, BFILE, RAW, LONG RAW

Parameter description

Parameter Description Required Default value
datasource Data source name. It must be identical to the data source name added. Adding data source is supported in script mode. Yes None
table The name of the table selected for synchronization. Yes None
column The column name set to be synchronized in the configured table. Field information is described with arrays in JSON. [ * ] indicates all columns by default.
- Column pruning is supported, which means you can select some columns to export.
- Change of column order is supported, which means you can export the columns in an order different from the schema order of the table.
- Constant configuration is supported. You must follow the JSON format, for example, ["id", "1", "'mingya.wmy'", "null", "to_char(a + 1)", "2.3" , "true"], where id refers to the ordinary column name, 1 is an integer constant, ‘mingya.wmy’ is a string constant (note that a pair of single quotation marks must be added), null refers to null pointer, , to_char(a + 1) is an expression, 2.3 is a floating number, and true is a Boolean value.
- Column is required and cannot be blank.
Yes None
splitPk If you specify the splitPk when using the Oracle Reader to extract data, it means that you want to use the fields represented by splitPk for data sharding. Then, the data synchronization system starts concurrent tasks to synchronize data, which greatly improves the efficiency of data synchronization.
- We recommend that splitPk users use the table primary key. The table primary key is usually more uniform, so the split slice is also not easy to show the data hot spot.
- The data types supported by splitPk include the integer, string, floating point, and date.
- If splitPk is left blank, it indicates that no table splitting is required and Oracle Reader synchronizes full data through a single channel.
No Null
where Filtering condition. Oracle Reader concatenates an SQL command based on specified column, table, and where conditions and extracts data according to the SQL. For example, you can set the where condition as row_number() during a test. In actual service scenarios, incremental synchronization typically synchronizes the data generated on the current day. You can specify the where condition as id > 2 and sex = 1.
- The where condition can be effectively used for incremental synchronization.
- If the where condition is not set or is left null, full table data synchronization is applied.
No None
querySql
(advanced configuration, which is not supported in wizard mode)
In some service scenarios, the where condition is insufficient for filtering. In such cases, you can customize a SQL filter using this parameter. When this item is configured, the data synchronization system filters data using this configuration item directly instead of such configuration items as table and column. For example, for data synchronization after multi-table join, use select a,b from table_a join table_b on table_a.id = table_b.id.
When querySql is configured, Oracle Reader directly ignores the configuration of table, column, and where conditions.
No None
fetchSize It defines the pieces of batch data that the plug-in and database server can fetch each time. The value determines the number of network interactions between the data synchronization system and the server, which can greatly improve the data extraction performance.
Note: A value greater than 2048 may lead to OOM for data synchronization.
No 1024

Development in wizard mode

wizardmodeOracle

Note:

When you enter a table name, a less complete table name entry covers a larger scope of fuzzy matches and takes longer time to load matched table names, which may result in timeout. By entering a more complete table name, the timeout issue can be effectively avoided.

  • Data Sources: datasource in the preceding parameter description. Select Oracle.

  • Table: table in the preceding parameter description. Select the table to be synchronized.

  • Data Filtering: where in the preceding parameter description. You can enter a WHERE filtering statement (with WHERE keyword not required) based on the corresponding SQL syntax. This filtering condition can be used as an incremental synchronization criterion.

  • the split button: splitPk in the preceding parameter description. Supports only integer fields. During data reading, the data is split based on the configured fields to achieve concurrent reading, improving data synchronization efficiency. The splitting key configuration item is displayed only when the synchronization task is for importing RDS or Oracle data to MaxCompute.

  • Field Mapping: column in the preceding parameter description.

Notes:

  • You can enter constants. The value must be embraced with a pair of single quotation marks, such as ‘abc’ or ‘123’.
  • Scheduling parameters can be used together, such as ${bdp.system.bizdate}.
  • You can enter functions supported by relational databases, such as now() and count(1).
  • If the value you entered cannot be parsed, the type is displayed as ‘-‘.

Development in script mode

Configure a job to synchronously extract data from an Oracle database.

  1. {
  2. "type": "job",
  3. "traceId": "You can enter a trace ID for your job here. We recommend you to organize your trace ID in the following format: service name + job ID",
  4. "version": "1.0",
  5. "configuration": {
  6. "reader": {
  7. "plugin": "oracle",
  8. "parameter": {
  9. "datasource": "datasourceName",
  10. "table": "tableName",
  11. "column": [
  12. "id",
  13. "name",
  14. "age"
  15. ],
  16. "fetchSize": 1024,
  17. "splitPk": "id",
  18. "where": "id > 100"
  19. }
  20. },
  21. "writer": {
  22. }
  23. }
  24. }

Configure a job to synchronize tasks from a database with custom SQLs to MaxCompute.

  1. {
  2. "type": "job",
  3. "traceId": "You can enter a trace ID for your job here. We recommend you to organize your trace ID in the following format: service name + job ID",
  4. "version": "1.0",
  5. "configuration": {
  6. "reader": {
  7. "plugin": "oracle",
  8. "parameter": {
  9. "datasource": "datasourceName",
  10. "querySql": "SELECT * from tableName",
  11. "fetchSize": 1024
  12. }
  13. },
  14. "writer": {
  15. }
  16. }
  17. }

Supplemental instructions

Data restoration of master/slave synchronization

Master/slave synchronization means that Oracle uses a master/slave disaster recovery mode in which the slave database continuously restores data from the master database through binlog. Due to the time difference in master/slave data synchronization, especially in some special situations such as network latency, the restored data in the slave database after synchronization are significantly different from the data of the master database, that is to say, the data synchronized in the slave database are not a full image of the master database at the current time.

To resolve this issue, the preSql function is provided, whose information is available later.

Consistency constraints

Oracle is an RDBMS system in terms of data storage, which can provide APIs for querying strong consistency data. For example, if another data writer writes data to the database during a synchronization task, Oracle Reader does not get the newly written data because of the snapshot features of the database. For the snapshot features of databases, see MVCC Wikipedia.

These are the features of data synchronization consistency in the single-threaded model of Oracle Reader. Because Oracle Reader uses concurrent data extraction according to your configuration information, robust data consistency cannot be guaranteed. After Oracle Reader completes data sharding based on splitPk, multiple concurrent tasks are successively enabled to synchronize data. Since multiple concurrent tasks do not belong to the same read transaction and time intervals exist between the concurrent tasks, the data is not complete and consistent data snapshot information.

Currently, the consistency snapshot demands in the multi-threaded model cannot be met technically, which can only be solved from the engineering point of view. The engineering approaches have both advantages and disadvantages. The following solutions are provided for your consideration:

  • Use single-threaded synchronization without data sharding. This is slow but can make sure robust data consistency.

  • Close other data writers to make sure the current data is static. For example, you can lock the table or close slave database synchronization. However, such operation may affect online services.

Database encoding

Oracle Reader extracts data using JDBC at the underlying level. JDBC is applicable to all types of encodings and can complete transcoding at the underlying level. Therefore, Oracle Reader can obtain the encoding and complete transcoding automatically without the need to specify the encoding.

Oracle Reader cannot identify the inconsistency between the encoding written to the underlying layer of the Oracle system and the configured encoding, nor provide a solution. Due to this issue, the exported codes may contain garbage codes.

Incremental data synchronization

Since Oracle Reader extracts data using JDBC SELECT statements, you can extract incremental data using the SELECT…WHERE…conditions in either of the following ways:

  • When database online applications write data into the database, the modify field is filled with the modification timestamp, including addition, update, and deletion (logical deletion). For this type of applications, Oracle Reader only requires the WHERE condition followed by the timestamp of the last synchronization phase.
  • For new streamline data, Oracle Reader requires the WHERE condition followed by the maximum auto-increment ID of the last synchronization phase.

In case that no field is provided for the business to identify the addition or modification of data, Oracle Reader cannot perform incremental data synchronization and can only perform full data synchronization.

SQL security

Oracle Reader provides querySql statements for you to SELECT data by yourself. Oracle Reader conducts no security verification on querySql. The security during use is ensured by the data synchronization users.

Thank you! We've received your feedback.