This topic describes how to use the data synchronization feature of DataWorks to migrate data from Oracle to MaxCompute.

Prerequisites

  1. The Elastic Compute Service (ECS) instance where the Oracle database resides is configured.

    In this example, the Oracle database is installed on an ECS instance. The following figure shows the configuration of the ECS instance. To enable communication between the ECS instance and the resource group that runs the data synchronization node, you must configure a public IP address for the ECS instance. In addition, you must configure a security group rule for the ECS instance to make sure that port 1521, the default listener port of the Oracle database, is accessible. For more information about how to configure a security group rule for the ECS instance, see Modify security group rules.

    As shown in the preceding figure, the type of the ECS instance is ecs.c5.xlarge. The ECS instance resides in a Virtual Private Cloud (VPC) in the China (Hangzhou) region.

  2. MaxCompute is activated and a MaxCompute project is created.

    In this example, a MaxCompute project is created in the China (Hangzhou) region, and relevant DataWorks services are activated. For more information, see Activate MaxCompute.

Background information

In this example, DataWorks Oracle Reader is used to read test data from the Oracle database.

A DataWorks workspace in basic mode is used in this example.

Procedure

  1. Prepare test data.
    1. Prepare test data in the Oracle database.
      1. In the Oracle database, create the DTSTEST.GOOD_SALE table that contains the CREATE_TIME, CATEGORY, BRAND, BUYER_ID, TRANS_NUM, TRANS_AMOUNT, and CLICK_CNT columns.
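        This topic does not specify the Oracle column data types, so the following table creation statement is only a sketch; the VARCHAR2 and NUMBER types are assumptions chosen so that the columns map to the MaxCompute data types used later in this topic.
        -- Sketch only: the data types below are assumptions.
        CREATE TABLE DTSTEST.GOOD_SALE (
            CREATE_TIME  VARCHAR2(32),
            CATEGORY     VARCHAR2(64),
            BRAND        VARCHAR2(64),
            BUYER_ID     VARCHAR2(64),
            TRANS_NUM    NUMBER(10),
            TRANS_AMOUNT NUMBER(12,2),
            CLICK_CNT    NUMBER(10)
        );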
      2. Insert test data into the DTSTEST.GOOD_SALE table. In this example, run the following statements to insert the test data:
        insert into good_sale values('28-December-19','Kitchenware','Brand B','hanmeimei','6','80.6','4');
        insert into good_sale values('21-December-19','Fresh food','Brand A','lilei','7','440.6','5');
        insert into good_sale values('29-December-19','Clothing','Brand C','lily','12','351.9','9');
        commit;
      3. Run the select * from good_sale; statement to check whether the test data exists in the Oracle table.
    2. Create a MaxCompute table.
      1. Log on to the DataWorks console. Click Workspaces in the left-side navigation pane. On the Workspaces page, find the target workspace and click Data Analytics in the Actions column.
      2. On the DataStudio page, right-click a workflow and choose Create > MaxCompute > Table.
      3. In the Create Table dialog box, set Table Name to good_sale and click Commit.
      4. On the tab of the created table, click DDL Statement.
      5. In the DDL Statement dialog box, enter the table creation statement, click Generate Table Schema, and then click OK in the Confirm dialog box that appears. In this topic, the following statement is used as an example:
        CREATE TABLE `good_sale` (
            `create_time` string,
            `category` string,
            `brand` string,
            `buyer_id` string,
            `trans_num` bigint,
            `trans_amount` double,
            `click_cnt` bigint
        );

        When you create the MaxCompute table, make sure that the data types of its columns map to those of the Oracle table. For more information about the data types supported by Oracle Reader, see the Oracle Reader documentation.

        You can also use the command-line tool odpscmd to create the MaxCompute table. For more information about how to install and configure the odpscmd tool, see Install and configure the odpscmd client.

        Note To enable MaxCompute to support more data types, we recommend that you run the following commands on the odpscmd tool:
        set odps.sql.type.system.odps2=true;
        set odps.sql.hive.compatible=true;
      6. Click Commit to Production Environment for the MaxCompute table to take effect.
  2. Synchronize data from the Oracle table to the MaxCompute table.
    1. Create a connection.

      After you create a DataWorks workspace, the MaxCompute connection odps_first is created by default. Therefore, you only need to create an Oracle connection.

      1. In the DataWorks console, click Workspaces in the left-side navigation pane. On the Workspaces page, find the target workspace and click Data Integration in the Actions column. On the Data Integration page that appears, click Connection in the left-side navigation pane. On the page that appears, click Add Connection.
      2. In the Add Connection dialog box, click Oracle.
      3. Set parameters for the Oracle connection. Make sure that the IP address and port number in the Java Database Connectivity (JDBC) URL are reachable.
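        The JDBC URL follows the standard Oracle thin-driver format. The following example is only illustrative: 198.51.100.10 is a placeholder for the public IP address of your ECS instance, and orcl is an assumed SID.
        jdbc:oracle:thin:@198.51.100.10:1521:orcl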
      4. Click Test Connection.
      5. After the connectivity test is passed, click Complete.
    2. Configure a data synchronization node.
      1. Go to the DataStudio page. Right-click a workflow and choose Create > Data Integration > Batch Synchronization.
      2. In the Create Node dialog box that appears, enter a name in Node Name and click Commit.
      3. On the tab of the created node, in the Source section of the Connections step, set Connection to the Oracle connection and Table to the Oracle table. In the Mappings step, click Map Fields with the Same Name.
      4. Use the default values for the other parameters and click Run. If a message appears indicating that the node has run successfully, the data is synchronized. If the node fails to run, check the logs to troubleshoot the failure.

Result

  1. Click Ad-Hoc Query in the navigation submenu.
  2. Move the pointer over the plus sign (+) in the upper-left corner and choose Create > ODPS SQL.
  3. In the Create Node dialog box that appears, set relevant parameters and click Commit to create an ad hoc query node. On the tab of the ad hoc query node, run the following SQL statement to view data synchronized to the good_sale table:
    -- Check whether data is synchronized to the MaxCompute table.
    select * from good_sale;

    You can also run the select * from good_sale; statement on the odpscmd tool to query data synchronized to the good_sale table.
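
    To further confirm that all rows were synchronized, you can compare row counts between the source and destination tables. Given the three test rows inserted in this example, the following query is expected to return 3:
    -- Expected to return 3 for the test data in this example.
    select count(*) from good_sale;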