Before you synchronize data from Oracle to MaxCompute, configure the data sources as described in this topic. The configuration covers network environments, whitelists, and permissions. You must configure a source Oracle data source and a destination MaxCompute data source.

Prerequisites

Before you configure data sources, make sure that the following operations are performed:
  • Prepare data sources: An Oracle database and a MaxCompute project are prepared.
  • Plan and prepare resources: An exclusive resource group for Data Integration is purchased and configured. For more information, see Plan and configure resources.
  • Evaluate and plan the network environment: Before you perform data integration, connect data sources to exclusive resource groups for Data Integration based on your business requirements. After data sources and exclusive resource groups for Data Integration are connected, you can refer to the operations in this topic to configure access settings such as vSwitches and whitelists.
    • If data sources and exclusive resource groups for Data Integration reside in the same region and virtual private cloud (VPC), they are automatically connected.
    • If data sources and exclusive resource groups for Data Integration reside in different network environments, you must connect data sources and resource groups by using methods such as a VPN gateway.
  • Prepare the MaxCompute client: The MaxCompute client is installed. You need to use the MaxCompute client to configure attributes for the destination MaxCompute data source. For more information, see MaxCompute client.

Background information

Before you synchronize data from the source to the destination, make sure that the data sources and the exclusive resource group for Data Integration are connected. In addition, you must create an account and authorize the account to access the data sources. The Oracle database must use a database version, character encoding formats, and data types that are supported by Data Integration.
  • Configure whitelists for the data sources
    If the data sources and the exclusive resource group for Data Integration reside in the same VPC, you must add the CIDR block of the vSwitch that is bound to the exclusive resource group during network configuration to the whitelists of the data sources. This ensures that the exclusive resource group for Data Integration can access the data sources.
  • Create an account and grant permissions to the account

    You must create an account that can be used to access the data sources, read data from the source, and write data to the destination during the data synchronization process.

  • Check whether the database version is supported by real-time sync nodes of Data Integration in DataWorks.

    Data Integration synchronizes data from an Oracle database in real time by using Oracle LogMiner to analyze log files. Only data in the following versions of Oracle databases can be synchronized in real time: 10g, 11g, 12c non-CDB, 18c non-CDB, and 19c non-CDB. Data in databases of the 12c CDB, 18c CDB, or 19c CDB version cannot be synchronized in real time. A container database (CDB) is a feature that was introduced in Oracle 12c and later versions. A CDB can contain multiple pluggable databases (PDBs).

    1. To query the version of the Oracle database, execute one of the following statements:
      • Statement 1:
        select * from v$version;
      • Statement 2:
        select version from v$instance;
    2. If the version of the Oracle database is 12c, 18c, or 19c, you must execute the following statement to check whether the database is a CDB. Data Integration does not support real-time synchronization of data from an Oracle CDB.
      select name,cdb,open_mode,con_id from v$database;
    Note If the database version that you use is not supported by real-time sync nodes of Data Integration, change the database version at the earliest opportunity. Otherwise, the nodes cannot be executed.
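The support matrix above can be encoded in a short helper, shown here only as an illustration. The function name and inputs are assumptions: `version` is the major release obtained from `select version from v$instance;`, and `is_cdb` reflects the CDB column returned by `select name,cdb,open_mode,con_id from v$database;`.

```python
# Hypothetical helper that encodes the support matrix described above.
# Releases that are supported for real-time sync when the database is not a CDB.
SUPPORTED_RELEASES = {"10g", "11g", "12c", "18c", "19c"}

def supports_realtime_sync(version: str, is_cdb: bool) -> bool:
    """Return True if Data Integration can synchronize this database in real time."""
    if is_cdb:
        # 12c, 18c, and 19c CDB databases are not supported for real-time sync.
        return False
    return version in SUPPORTED_RELEASES

print(supports_realtime_sync("11g", False))  # True
print(supports_realtime_sync("19c", True))   # False
```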
  • Configure log permissions
    For the Oracle database to be configured as the source data source, you must enable database-level archived log files and supplemental logging. The related concepts are described below.
    • Archived log files: Oracle uses archived log files to save all the redo records. When the database fails, you can use the archived logs to restore the database.
    • Redo log files: Oracle uses redo log files to ensure that database transactions can be re-executed. This way, data can be recovered in the case of a failure such as power outage.
    • Supplemental logging: Supplemental logging is used to supplement the information recorded in redo log files. In Oracle, a redo log file is used to record the values of the fields that are modified. Supplemental logging is used to supplement the change history in the redo log file. This ensures that the redo log file contains complete information that describes data changes. If operations such as data recovery and data synchronization are performed, you can view complete statements and data updates. Some features of the Oracle database can be better implemented after supplemental logging is enabled. Therefore, you must enable supplemental logging for the database.

      For example, if you do not enable supplemental logging, after you execute the UPDATE statement, the redo log file records only the values of the fields that are modified when the UPDATE statement is executed. If you enable supplemental logging, the redo log file records the values of fields before and after a modification. The conditions that are used to modify destination fields are also recorded. When an exception such as power outage occurs in the database, you can recover data based on the modification details.

      We recommend that you enable supplemental logging for primary key columns or unique index columns.
      • After you enable supplemental logging for primary key columns, the columns that compose a primary key are recorded in logs if the database is updated.
      • After you enable supplemental logging for unique index columns, the columns that compose a unique key or bitmap index are recorded in logs if a column is modified.
    Before you use Data Integration to synchronize data from an Oracle database, make sure that you have enabled archived log files and supplemental logging for the database. To check whether database-level archived log files and supplemental logging are enabled for the database, execute the following SQL statement:
    select log_mode, supplemental_log_data_pk, supplemental_log_data_ui from v$database;
    • If the returned result of log_mode is ARCHIVELOG, archived log files are enabled for the database. If the returned result is not ARCHIVELOG, you must enable archived log files for the database.
    • If the returned results of supplemental_log_data_pk and supplemental_log_data_ui are YES, supplemental logging is enabled for the database. If a returned result is NO, you must enable supplemental logging for the database.
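As an illustration, the interpretation rules above can be written as a small helper. The function and its inputs are hypothetical; the three arguments correspond to the columns returned by `select log_mode, supplemental_log_data_pk, supplemental_log_data_ui from v$database;`.

```python
# Hypothetical helper that interprets the row returned by the check query.
def required_actions(log_mode, supp_pk, supp_ui):
    """Return the configuration steps that are still missing."""
    actions = []
    if log_mode.upper() != "ARCHIVELOG":
        actions.append("enable archived log files")
    if supp_pk.upper() != "YES":
        actions.append("enable supplemental logging for primary key columns")
    if supp_ui.upper() != "YES":
        actions.append("enable supplemental logging for unique index columns")
    return actions

print(required_actions("ARCHIVELOG", "YES", "YES"))  # [] -> the database is ready
```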
  • Check character encoding formats

    Make sure that the Oracle database contains only the character encoding formats that are supported by Data Integration to prevent data synchronization failures. The following character encoding formats are supported by Data Integration: UTF-8, AL32UTF8, AL16UTF16, and ZHS16GBK.

  • Check data types

    Make sure that the Oracle database contains only the data types that are supported by Data Integration to prevent data synchronization failures. The following data types are not supported by Data Integration for real-time data synchronization: LONG, BFILE, LONG RAW, and NCLOB.

Limits

  • You can configure the supplemental logging feature only in a primary Oracle database. After supplemental logging is enabled in the primary database, it takes effect for both the primary and secondary databases.
  • The following character encoding formats are supported by Data Integration: UTF-8, AL32UTF8, AL16UTF16, and ZHS16GBK.
  • The following data types are not supported by Data Integration for real-time data synchronization: LONG, BFILE, LONG RAW, and NCLOB.
  • Only data in the following versions of Oracle databases can be synchronized in real time: 10g, 11g, 12c non-CDB, 18c non-CDB, and 19c non-CDB. Data in databases of the 12c CDB, 18c CDB, or 19c CDB version cannot be synchronized in real time.

Usage notes

  • Real-time sync nodes of Data Integration can subscribe to online redo log files only in primary Oracle databases. In secondary Oracle databases, they can subscribe only to archived log files. If the timeliness of your real-time sync nodes is important, we recommend that you subscribe to real-time incremental change logs in primary Oracle databases. If you subscribe to archived log files in a secondary Oracle database, the minimum latency between the generation and the receiving of archived logs is determined by the interval at which Oracle switches between two archived log files. Therefore, the timeliness of real-time sync nodes cannot be ensured.
  • We recommend that you retain the archived log files of your Oracle database for at least three days. When a large amount of data is written to an Oracle database, real-time data synchronization may be slower than log generation. Retaining the archived log files provides sufficient time to trace data when issues occur during data synchronization: you can analyze the archived log files to troubleshoot issues and recover data.
  • Real-time sync nodes of Data Integration cannot process truncate operations on tables without a primary key in an Oracle database. A real-time sync node analyzes logs of a table without a primary key by using LogMiner, and in this process it uses rowid to locate rows. Truncate operations change the rowid values in a table, which causes real-time sync nodes to fail.
  • When you run real-time sync nodes on a DataWorks server with 24 vCPUs and 192 GiB of memory, the utilization of a single CPU core on the Oracle database server varies based on the composition of logs in the database. If most logs are not update logs and the sync nodes can process about 30,000 to 50,000 data records per second, the utilization of a single CPU core can reach 25% to 35%. If most logs are update logs, the DataWorks server becomes the performance bottleneck during real-time synchronization, and the utilization of a single CPU core on the Oracle database server is only 1% to 5%.

Configure a source Oracle data source

  1. Configure a whitelist for the Oracle database.
    Add the CIDR block of the VPC in which the exclusive resource group resides to the whitelist of the Oracle database.
    1. View and record the elastic IP address (EIP) and CIDR block of the exclusive resource group.
    2. Add the EIP and CIDR block of the exclusive resource group to the whitelist of the Oracle database.
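Before adding entries to the whitelist, you can validate them locally with Python's standard `ipaddress` module. The addresses below are placeholders, not real values from your environment:

```python
import ipaddress

# Hypothetical whitelist entries: the EIP and the vSwitch CIDR block of the
# exclusive resource group recorded in the previous step (placeholder values).
entries = ["203.0.113.10", "192.168.0.0/24"]

# Validate each entry before adding it to the Oracle database whitelist;
# ip_network() raises ValueError for a malformed address or CIDR block.
for entry in entries:
    network = ipaddress.ip_network(entry, strict=True)
    print(entry, "->", network.num_addresses, "address(es)")
```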
  2. Create an account and grant the required permissions to the account.
    You must create an account to log on to the database. The account must have the required permissions on the Oracle database.
    1. Create an account.
      For more information, see Create an Oracle database account.
    2. Grant permissions to the account.
      You can run the following commands to grant permissions to the account. Replace Account for data synchronization with the created account when you execute the statements.
      grant create session to 'Account for data synchronization';  // Authorize the account to access the database. 
      grant connect to 'Account for data synchronization';  // Authorize the account to connect to the database. 
      grant select on nls_database_parameters to 'Account for data synchronization';  // Authorize the account to query the settings of nls_database_parameters. 
      grant select on all_users to 'Account for data synchronization';  // Authorize the account to query all users in the database. 
      grant select on all_objects to 'Account for data synchronization';  // Authorize the account to query all objects in the database. 
      grant select on DBA_MVIEWS to 'Account for data synchronization';  // Authorize the account to check the materialized view of the database. 
      grant select on DBA_MVIEW_LOGS to 'Account for data synchronization';  // Authorize the account to view the materialized view logs of the database. 
      grant select on DBA_CONSTRAINTS to 'Account for data synchronization';  // Authorize the account to view the constraints on all tables of the database. 
      grant select on DBA_CONS_COLUMNS to 'Account for data synchronization';  // Authorize the account to view information about all columns under specified constraints on all tables of the database. 
      grant select on all_tab_cols to 'Account for data synchronization';  // Authorize the account to view information about columns in tables, views, and clusters of the database. 
      grant select on sys.obj$ to 'Account for data synchronization';  // Authorize the account to view objects in the database. sys.obj$ indicates an object table that is contained in the data dictionary table. The object table contains all objects. 
      grant select on SYS.COL$ to 'Account for data synchronization';  // Authorize the account to view definitions of columns in tables of the database. SYS.COL$ stores column definitions. 
      grant select on sys.USER$ to 'Account for data synchronization';  // Authorize the account to view the sys.USER$ system table. sys.USER$ stores information about users and roles. 
      grant select on sys.cdef$ to 'Account for data synchronization';  // Authorize the account to view the system table of the database. 
      grant select on sys.con$ to 'Account for data synchronization';  // Authorize the account to view the constraints of the database. sys.con$ records the constraints. 
      grant select on all_indexes to 'Account for data synchronization';  // Authorize the account to view all indexes of the database. 
      grant select on v_$database to 'Account for data synchronization';  // Authorize the account to check the v_$database view of the database. 
      grant select on V_$ARCHIVE_DEST to 'Account for data synchronization';  // Authorize the account to check the V_$ARCHIVE_DEST view of the database. 
      grant select on v_$log to 'Account for data synchronization';  // Authorize the account to check the v_$log view of the database. v_$log displays log information about control files. 
      grant select on v_$logfile to 'Account for data synchronization';  // Authorize the account to check the v_$logfile view of the database. v_$logfile contains information about redo log files. 
      grant select on v_$archived_log to 'Account for data synchronization';  // Authorize the account to check the v$archived_log view of the database. v$archived_log contains information about archived logs. 
      grant select on V_$LOGMNR_CONTENTS to 'Account for data synchronization';  // Authorize the account to check the V_$LOGMNR_CONTENTS view of the database. 
      grant select on DUAL to 'Account for data synchronization';   // Authorize the account to query the DUAL table. DUAL is a special one-row table that Oracle provides for selecting constant expressions. 
      grant select on v_$parameter to 'Account for data synchronization';  // Authorize the account to check the v_$parameter view of the database. v$parameter is a dynamic dictionary table that stores the values of parameters in the database. 
      grant select any transaction to 'Account for data synchronization';  // Authorize the account to view transactions of the database. 
      grant execute on SYS.DBMS_LOGMNR to 'Account for data synchronization';  // Authorize the account to use the LOGMNR tool. The LOGMNR tool helps you analyze transactions and retrieve lost data. 
      grant alter session to 'Account for data synchronization';  // Authorize the account to modify connection configurations of the database. 
      grant select on dba_objects to 'Account for data synchronization';  // Authorize the account to view all objects of the database. 
      grant select on v_$standby_log to 'Account for data synchronization';  // Authorize the account to check the v_$standby_log view of the database. v_$standby_log contains archived logs of the secondary database. 
      grant select on v_$ARCHIVE_GAP to 'Account for data synchronization';  // Authorize the account to query missing archived logs. 
      To synchronize full data in batch mode, you must also run the following command to grant the query permission on all tables to the account:
      grant select any table to 'Account for data synchronization'; 
      In Oracle 12c or a later version, you must run the following command to grant the log mining permission to the account. In versions earlier than 12c, the log mining permission is available by default, and you do not need to run this command.
      grant LOGMINING TO 'Account for data synchronization';
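Because the same account placeholder appears in every grant statement, you can generate the full grant script for your actual account with a short helper. This is illustrative only; the privilege list below is abbreviated and should be extended with the full list from this section:

```python
# Illustrative helper that fills your actual account name into the grant
# statements above. The privilege list here is abbreviated; extend it with
# the full list from this section before use.
PRIVILEGES = [
    "create session",
    "connect",
    "select on v_$database",
    "select any transaction",
    "execute on SYS.DBMS_LOGMNR",
    "alter session",
]

def grant_script(account):
    return "\n".join("grant {} to {};".format(p, account) for p in PRIVILEGES)

print(grant_script("sync_user"))
```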
  3. Enable archived log files and supplemental logging, and switch a redo log file.
    Log on to the primary database and perform the following steps:
    1. Enable archived log files. SQL statements:
      shutdown immediate;
      startup mount;
      alter database archivelog;
      alter database open;
    2. Enable supplemental logging.

      SQL statements:

      alter database add supplemental log data(primary key) columns; // Enable supplemental logging for primary key columns. 
      alter database add supplemental log data(unique) columns; // Enable supplemental logging for unique index columns. 
    3. Switch a redo log file.
      After you enable supplemental logging, you must run the following command multiple times to switch redo log files. We recommend that you run the command five times:
      alter system switch logfile;
      Note This ensures that data can be written to the next log file after the current log file is full. Data about historical operations will not be lost. This facilitates data recovery.
  4. Check character encoding formats of the database.
    Run the following command to check character encoding formats of the database:
    select * from v$nls_parameters where PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
    • v$nls_parameters stores values of parameters in the database.
    • NLS_CHARACTERSET indicates a database character set. NLS_NCHAR_CHARACTERSET indicates a national character set. These two sets are used to store data of the character type.
    The following character encoding formats are supported by Data Integration: UTF-8, AL32UTF8, AL16UTF16, and ZHS16GBK. If the database contains the character encoding formats that are not supported by Data Integration, change the formats before you synchronize data.
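As an illustration, the check above can be automated with a small helper. The function and the example values are hypothetical; the input dictionary maps the PARAMETER column to the VALUE column returned by the query on v$nls_parameters:

```python
# Hypothetical check: compare the character sets returned by the query above
# against the formats supported by Data Integration.
SUPPORTED_CHARSETS = {"UTF-8", "AL32UTF8", "AL16UTF16", "ZHS16GBK"}

def unsupported_charsets(parameters):
    """parameters maps PARAMETER to VALUE from v$nls_parameters."""
    return [
        "{}={}".format(name, value)
        for name, value in parameters.items()
        if value not in SUPPORTED_CHARSETS
    ]

# Example row values (hypothetical) from the query above.
result = {"NLS_CHARACTERSET": "AL32UTF8", "NLS_NCHAR_CHARACTERSET": "AL16UTF16"}
print(unsupported_charsets(result))  # [] -> both character sets are supported
```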
  5. Check the data types of tables in the database.
    You can execute the SELECT statement to query the data types of tables in the database. Sample statement that is executed to query the data types of the 'tablename' table:
    select COLUMN_NAME,DATA_TYPE from all_tab_columns where TABLE_NAME='tablename'; 
    • COLUMN_NAME: the name of the column.
    • DATA_TYPE: the data type of the column.
    • all_tab_columns: the view that stores information about all columns in tables of the database.
    • TABLE_NAME: the name of the table to query. When you execute the preceding statement, replace 'tablename' with the name of the table to query.
    You can also execute the select * from 'tablename'; statement to query the information about the table and obtain the data types of columns.
    The following data types are not supported by Data Integration for real-time data synchronization: LONG, BFILE, LONG RAW, and NCLOB. If a table contains one of these data types, remove the table from the real-time sync solution or change the data type before you synchronize data.
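The data type check above can likewise be sketched as a helper that flags offending columns. The function and the example rows are hypothetical; the input is a list of (COLUMN_NAME, DATA_TYPE) pairs as returned by the query on all_tab_columns:

```python
# Hypothetical helper that flags columns whose data types are not supported
# for real-time synchronization, based on rows returned by the query above.
UNSUPPORTED_TYPES = {"LONG", "BFILE", "LONG RAW", "NCLOB"}

def unsupported_columns(columns):
    """columns is a list of (COLUMN_NAME, DATA_TYPE) tuples."""
    return [name for name, dtype in columns if dtype.upper() in UNSUPPORTED_TYPES]

# Example rows (hypothetical) for a table being checked.
rows = [("ID", "NUMBER"), ("NOTES", "NCLOB"), ("NAME", "VARCHAR2")]
print(unsupported_columns(rows))  # ['NOTES']
```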

Configure the destination MaxCompute data source

  1. Log on to the MaxCompute client by using the account of a project owner.
    For more information, see MaxCompute client.
  2. Enable the atomicity, consistency, isolation, durability (ACID) property for the MaxCompute project.
    Run the following command on the MaxCompute client:
    setproject odps.sql.acid.table.enable=true;
  3. Optional: Enable the MaxCompute V2.0 data type edition.
    If you need to use the TIMESTAMP data type in MaxCompute V2.0, run the following command to enable the MaxCompute V2.0 data type edition:
    setproject odps.sql.type.system.odps2=true;
  4. Create an Alibaba Cloud account.
    This account is used to add a data source and access MaxCompute for data synchronization. For more information about how to create an Alibaba Cloud account, see Create an Alibaba Cloud account.

    After the Alibaba Cloud account is created, record the AccessKey ID and AccessKey secret of the account for future use.

What to do next

After data sources are configured, the source data source, destination data source, and exclusive resource group for Data Integration are connected. Then, the exclusive resource group for Data Integration can be used to access the data sources. You can add the source and destination data sources to DataWorks and associate them with a data synchronization solution when you create the solution.

For more information about how to add a data source, see Add a data source.