This topic describes how to use an exclusive resource group for data integration to migrate data from a user-created MySQL database on an Elastic Compute Service (ECS) instance to MaxCompute.

Prerequisites

  • An ECS instance is purchased and resides in a Virtual Private Cloud (VPC), not the classic network. A MySQL database is deployed on the ECS instance, an account for connecting to the database is created, and the database contains test data. A sketch of the statements that can be used to create such an account is provided after this list. In this example, the following SQL statements are used to create a table in the MySQL database and insert test data into the table:
    CREATE TABLE IF NOT EXISTS good_sale(
       create_time timestamp,
       category varchar(20),
       brand varchar(20),
       buyer_id varchar(20),
       trans_num varchar(20),
       trans_amount DOUBLE,
       click_cnt varchar(20)
       );
    insert into good_sale values('2018-08-21','coat','brandA','lilei',3,500.6,7),
    ('2018-08-22','food','brandB','lilei',1,303,8),
    ('2018-08-22','coat','brandC','hanmeimei',2,510,2),
    ('2018-08-22','bath','brandA','hanmeimei',1,442.5,1),
    ('2018-08-22','food','brandD','hanmeimei',2,234,3),
    ('2018-08-23','coat','brandB','jimmy',9,2000,7),
    ('2018-08-23','food','brandA','jimmy',5,45.1,5),
    ('2018-08-23','coat','brandE','jimmy',5,100.2,4),
    ('2018-08-24','food','brandG','peiqi',10,5560,7),
    ('2018-08-24','bath','brandF','peiqi',1,445.6,2),
    ('2018-08-24','coat','brandA','ray',3,777,3),
    ('2018-08-24','bath','brandG','ray',3,122,3),
    ('2018-08-24','coat','brandC','ray',1,62,7) ;
  • The private IP address, VPC, and VSwitch of your ECS instance are recorded.
  • A security group rule is added for the ECS instance to allow access requests on the port used by the MySQL database. The MySQL database uses port 3306 by default. For more information, see Add security group rules. The name of the security group is recorded.
  • A DataWorks workspace that belongs to the same region as the ECS instance is created. In this example, create a workspace that is in the basic mode and uses MaxCompute as the compute engine. For more information, see Create a workspace.
  • An exclusive resource group for data integration is purchased and bound to the VPC where the ECS instance resides. The exclusive resource group and the ECS instance are in the same zone. For more information, see Use exclusive resource groups for data integration. After the exclusive resource group is bound to the VPC, you can view information about the exclusive resource group on the Resource Groups page.
  • The VPC, VSwitch, and security group of the exclusive resource group are the same as those of the ECS instance. To view the VPC, VSwitch, and security group of the exclusive resource group, follow these steps: Go to the Resource Groups page, find the exclusive resource group on the Exclusive Resource Groups tab, and then click Add VPC Binding in the Actions column. On the page that appears, you can view the VPC, VSwitch, and security group of the exclusive resource group.
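
The account mentioned in the first prerequisite can be created with statements similar to the following. This is a minimal sketch: the account name dataworks_test, the password, and the database name testdb are examples and must be replaced with your own values.
    -- Create an account that DataWorks uses to connect to the MySQL database.
    CREATE USER 'dataworks_test'@'%' IDENTIFIED BY 'YourPassword1!';
    -- Grant the account read permissions on the database that contains the test table.
    GRANT SELECT ON testdb.* TO 'dataworks_test'@'%';
    FLUSH PRIVILEGES;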

Background information

An exclusive resource group can transmit your data in a fast and stable manner. An exclusive resource group for data integration can properly access a data store only when the resource group resides in the same zone as the data store and in the same region as the corresponding workspace. In this example, the source data store is a user-created MySQL database on an ECS instance.

Procedure

  1. Create a connection to the MySQL database in the DataWorks console.
    1. Log on to the DataWorks console by using your Alibaba Cloud account.
    2. In the left-side navigation pane, click Workspaces. On the Workspaces page that appears, find the target workspace and click Data Integration in the Actions column.
    3. On the page that appears, click Connection in the left-side navigation pane.
    4. On the Data Source page that appears, click Add a Connection in the upper-right corner.
    5. In the Add Connection dialog box that appears, select MySQL in the Relational Database section.
    6. In the Add MySQL Connection dialog box that appears, set the parameters as required.
      In this example, set Connect To to Connection Mode and set JDBC URL to a JDBC URL in the jdbc:mysql://ServerIP:Port/Database format. Replace ServerIP and Port in the format with the private IP address of the ECS instance and the default port number 3306 of the MySQL database, respectively. Enter the username and password that you created for the MySQL database and click Complete.
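      For example, if the private IP address of the ECS instance is 192.168.0.10 and the database is named testdb (both values are examples for illustration only), the JDBC URL is similar to the following:
      jdbc:mysql://192.168.0.10:3306/testdb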
      Note Currently, DataWorks cannot test the connectivity of a user-created MySQL database in a VPC. Therefore, connectivity testing will fail, which is normal.
  2. Create a MaxCompute table.
    You need to create a MaxCompute table in the DataWorks console to receive test data from the MySQL database.
    1. Click the DataWorks icon in the upper-left corner and choose All Products > DataStudio.
    2. On the page that appears, create a workflow.
    3. Right-click the workflow that you just created and choose Create > MaxCompute > Table.
    4. In the Create Table dialog box that appears, select an engine type, set Table Name, and click Commit to create the table. In this example, set Table Name to good_sale, which is the same as the name of the table in the MySQL database. On the configuration tab of the table, click DDL Statement, enter the table creation statement, and then click Generate Table Schema.
      In this example, enter the following SQL statement. Pay attention to data type conversion: for example, the create_time column is of the timestamp type in MySQL but of the STRING type in MaxCompute, and the trans_num and click_cnt columns are converted from varchar to BIGINT.
      CREATE TABLE IF NOT EXISTS good_sale(
         create_time STRING,
         category STRING,
         brand STRING,
         buyer_id STRING,
         trans_num BIGINT,
         trans_amount DOUBLE,
         click_cnt BIGINT
         );
    5. Set Display Name and click Commit to Production Environment. The MaxCompute table named good_sale is committed.
  3. Configure a batch sync node.
    1. Right-click the workflow that you just created and choose Create > Data Integration > Batch Synchronization to create a batch sync node.
    2. On the configuration tab of the batch sync node, set Connection under Source to the MySQL connection that you created, and set Connection under Target to the ODPS connection odps_first. Then, click the Switch to Code Editor icon to switch to the code editor.
      If you cannot set Table under Source or an error is returned when you attempt to switch to the code editor, ignore the issue.
    3. Click Resource Group configuration in the right-side navigation pane. In the Resource Group configuration pane that appears, set Programme to Exclusive Resource Groups and select the exclusive resource group you purchased.
      If you do not change the resource group to the exclusive resource group for Data Integration that you purchased, the sync node fails to run.
    4. Enter the code of the batch sync node. In this example, enter the following code.
      {
          "type": "job",
          "steps": [
              {
                  "stepType": "mysql",
                  "parameter": {
                      "column": [// The columns in the source table.
                          "create_time",
                          "category",
                          "brand",
                          "buyer_id",
                          "trans_num",
                          "trans_amount",
                          "click_cnt"
                      ],
                      "connection": [
                          {
                              "datasource": "shuai",// The name of the source data store.
                              "table": [
                                  "good_sale"// The name of the table in the source data store. The name must be enclosed in brackets ([ ]).
                              ]
                          }
                      ],
                      "where": "",
                      "splitPk": "",
                      "encoding": "UTF-8"
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "odps",
                  "parameter": {
                      "partition": "",
                      "truncate": true,
                      "datasource": "odps_first",// The name of the destination data store.
                      "column": [// The columns in the destination table.
                          "create_time",
                          "category",
                          "brand",
                          "buyer_id",
                          "trans_num",
                          "trans_amount",
                          "click_cnt"
                      ],
                      "emptyAsNull": false,
                      "table": "good_sale"// The name of the destination table.
                  },
                  "name": "Writer",
                  "category": "writer"
              }
          ],
          "version": "2.0",
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          },
          "setting": {
              "errorLimit": {
                  "record": "0"
              },
              "speed": {
                  "throttle": false,
                  "concurrent": 2
              }
          }
      }
    5. Click the Run icon. You can view the operational logs of the batch sync node on the Runtime Log tab.

Result

To check whether the test data is synchronized to the MaxCompute table, create an ODPS SQL node. On the configuration tab of the ODPS SQL node, enter the statement select * from good_sale; and click the Run icon. If the test data appears, the data has been synchronized to the MaxCompute table.
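
The source table in this example contains 13 test records. The following statements are a minimal sketch that you can also run on the ODPS SQL node to check the number of synchronized records, assuming the destination table is the good_sale table created in this example:
    -- Count the synchronized records. The expected result is 13.
    SELECT COUNT(*) FROM good_sale;
    -- View the synchronized data.
    SELECT * FROM good_sale;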