This topic describes how to use an exclusive resource group for Data Integration to migrate data from a user-created MySQL database on an Elastic Compute Service (ECS) instance to MaxCompute.

Prerequisites

  • An ECS instance is purchased and bound to a virtual private cloud (VPC) rather than the classic network. A MySQL database that stores test data is deployed on the ECS instance, and an account used to connect to the database is created. In this example, use the following statements to create a table in the MySQL database and insert test data into the table:
    CREATE TABLE IF NOT EXISTS good_sale(
       create_time TIMESTAMP,
       category VARCHAR(20),
       brand VARCHAR(20),
       buyer_id VARCHAR(20),
       trans_num VARCHAR(20),
       trans_amount DOUBLE,
       click_cnt VARCHAR(20)
       );
    INSERT INTO good_sale VALUES('2018-08-21','coat','brandA','lilei',3,500.6,7),
    ('2018-08-22','food','brandB','lilei',1,303,8),
    ('2018-08-22','coat','brandC','hanmeimei',2,510,2),
    ('2018-08-22','bath','brandA','hanmeimei',1,442.5,1),
    ('2018-08-22','food','brandD','hanmeimei',2,234,3),
    ('2018-08-23','coat','brandB','jimmy',9,2000,7),
    ('2018-08-23','food','brandA','jimmy',5,45.1,5),
    ('2018-08-23','coat','brandE','jimmy',5,100.2,4),
    ('2018-08-24','food','brandG','peiqi',10,5560,7),
    ('2018-08-24','bath','brandF','peiqi',1,445.6,2),
    ('2018-08-24','coat','brandA','ray',3,777,3),
    ('2018-08-24','bath','brandG','ray',3,122,3),
    ('2018-08-24','coat','brandC','ray',1,62,7);
  • The private IP address, VPC, and vSwitch of your ECS instance are noted.
  • A security group rule is added for the ECS instance to allow access requests on the port used by the MySQL database. By default, the MySQL database uses port 3306. For more information, see Add a security group rule. The name of the security group is noted.
  • A DataWorks workspace is created. In this example, create a DataWorks workspace that is in basic mode and uses a MaxCompute compute engine. Make sure that the created DataWorks workspace belongs to the same region as the ECS instance. For more information about how to create a workspace, see Create a workspace.
  • An exclusive resource group for Data Integration is purchased and bound to the VPC where the ECS instance resides. The exclusive resource group and the ECS instance are in the same zone. For more information, see Create and use an exclusive resource group for Data Integration. After the exclusive resource group is bound to the VPC, you can view information about the exclusive resource group on the Resource Groups page.
  • The VPC, vSwitch, and security group settings of the exclusive resource group are the same as those of the ECS instance.
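The INSERT statement in the first prerequisite loads 13 rows of test data. As a quick sanity check that can later be compared against the synchronized MaxCompute table, the expected row count and distinct buyers can be computed offline. This is an illustrative sketch with the rows hard-coded from the statement above:

```python
# Test rows copied from the INSERT statement in the prerequisites:
# (create_time, category, brand, buyer_id, trans_num, trans_amount, click_cnt)
rows = [
    ("2018-08-21", "coat", "brandA", "lilei", 3, 500.6, 7),
    ("2018-08-22", "food", "brandB", "lilei", 1, 303, 8),
    ("2018-08-22", "coat", "brandC", "hanmeimei", 2, 510, 2),
    ("2018-08-22", "bath", "brandA", "hanmeimei", 1, 442.5, 1),
    ("2018-08-22", "food", "brandD", "hanmeimei", 2, 234, 3),
    ("2018-08-23", "coat", "brandB", "jimmy", 9, 2000, 7),
    ("2018-08-23", "food", "brandA", "jimmy", 5, 45.1, 5),
    ("2018-08-23", "coat", "brandE", "jimmy", 5, 100.2, 4),
    ("2018-08-24", "food", "brandG", "peiqi", 10, 5560, 7),
    ("2018-08-24", "bath", "brandF", "peiqi", 1, 445.6, 2),
    ("2018-08-24", "coat", "brandA", "ray", 3, 777, 3),
    ("2018-08-24", "bath", "brandG", "ray", 3, 122, 3),
    ("2018-08-24", "coat", "brandC", "ray", 1, 62, 7),
]

# After synchronization, a SELECT on the MaxCompute table should return
# the same number of rows and the same set of buyers.
row_count = len(rows)
buyers = sorted({r[3] for r in rows})
print(row_count)  # 13
print(buyers)
```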

Background information

An exclusive resource group provides fast and stable data transmission. Make sure that the exclusive resource group for Data Integration resides in the same zone and region as the data store to be accessed, and in the same region as the DataWorks workspace. In this example, the data store to be accessed is a user-created MySQL database on an ECS instance.

Procedure

  1. Create a connection to the MySQL database in the DataWorks console.
    1. Log on to the DataWorks console by using your Alibaba Cloud account.
    2. On the Workspaces page, find the required workspace and click Data Integration.
    3. In the left-side navigation pane, click Connection.
    4. On the Data Source page, click New data source.
    5. In the Add data source dialog box, select MySQL.
    6. In the Add MySQL data source dialog box, set the parameters. For more information, see Add a MySQL data source.
      For example, set the Data source type parameter to Connection string mode. Use the private IP address of the ECS instance and the default port number 3306 of the MySQL database when you specify the Java Database Connectivity (JDBC) URL.
      Note DataWorks cannot test the connectivity of a user-created MySQL database in a VPC. Therefore, it is normal that a connectivity test fails.
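The JDBC URL follows the standard MySQL format. A minimal sketch of how it is assembled is shown below; the IP address 192.168.0.1 and the database name testdb are placeholder assumptions, not values from this example:

```python
# Placeholder values; replace with the private IP address of your ECS
# instance and the name of your MySQL database.
private_ip = "192.168.0.1"  # assumed private IP, for illustration only
port = 3306                 # default MySQL port
database = "testdb"         # assumed database name

# Standard MySQL JDBC URL: jdbc:mysql://<host>:<port>/<database>
jdbc_url = f"jdbc:mysql://{private_ip}:{port}/{database}"
print(jdbc_url)  # jdbc:mysql://192.168.0.1:3306/testdb
```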
    7. Find the required resource group and click Test connectivity.
      During data synchronization, a sync node uses only one resource group. You must test the connectivity of all the resource groups for Data Integration on which your sync nodes will be run and make sure that the resource groups can connect to the data store. This ensures that your sync nodes can be run as expected. For more information, see Establish a network connection between a resource group and a data source.
    8. After the connection passes the connectivity test, click Complete.
  2. Create a MaxCompute table.
    You must create a table in DataWorks to receive test data from the MySQL database.
    1. Click the icon in the upper-left corner and choose All Products > DataStudio.
    2. Create a workflow. For more information, see Create an auto triggered workflow.
    3. Right-click the created workflow and choose Create Table > MaxCompute > Table.
    4. Enter a name for your MaxCompute table. In this example, set the Table Name parameter to good_sale, which is the same as the name of the table in the MySQL database. Click DDL, enter the table creation statement, and then click Generate Table Schema.
      In this example, enter the following table creation statement. Pay attention to the data type conversions from the columns of the MySQL table.
      CREATE TABLE IF NOT EXISTS good_sale(
         create_time STRING,
         category STRING,
         brand STRING,
         buyer_id STRING,
         trans_num BIGINT,
         trans_amount DOUBLE,
         click_cnt BIGINT
         );
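The DDL above converts the MySQL column types for MaxCompute: the varchar columns become STRING, the timestamp column is stored as STRING, and trans_num and click_cnt become BIGINT. The per-column conversion used in this example can be sketched as a simple lookup; the table below covers only the columns of good_sale, not a general conversion rule:

```python
# Per-column type conversion used in this example (MySQL -> MaxCompute).
# This covers only the columns of good_sale; it is not a general rule.
type_map = {
    "create_time":  ("TIMESTAMP",   "STRING"),
    "category":     ("VARCHAR(20)", "STRING"),
    "brand":        ("VARCHAR(20)", "STRING"),
    "buyer_id":     ("VARCHAR(20)", "STRING"),
    "trans_num":    ("VARCHAR(20)", "BIGINT"),
    "trans_amount": ("DOUBLE",      "DOUBLE"),
    "click_cnt":    ("VARCHAR(20)", "BIGINT"),
}

# Destination types, as used in the MaxCompute DDL above.
odps_types = {col: odps for col, (_, odps) in type_map.items()}
print(odps_types["trans_num"])  # BIGINT
```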
    5. Set the Display Name parameter and click Commit to Production Environment. The MaxCompute table named good_sale is created.
  3. Configure a data integration node.
    1. Right-click the workflow you just created and choose Create Node > Offline synchronization to create a data integration node.
    2. Set the Connection parameter under Source to the created MySQL connection and the Connection parameter under Target to odps_first. Click the Switch to Code Editor icon to switch to the code editor.
      If you cannot set the Table parameter under Source or an error is returned when you attempt to switch to the code editor, ignore the issue.
    3. Click the Resource Group configuration tab in the right-side navigation pane and select an exclusive resource group that you have purchased.
      If you do not select the exclusive resource group as the resource group for Data Integration of your node, the node may fail to be run.
    4. Enter the following code for the data integration node:
      {
          "type": "job",
          "steps": [
              {
                  "stepType": "mysql",
                  "parameter": {
                      "column": [// The columns in the source table.
                          "create_time",
                          "category",
                          "brand",
                          "buyer_id",
                          "trans_num",
                          "trans_amount",
                          "click_cnt"
                      ],
                      "connection": [
                          {
                              "datasource": "shuai",// The source connection.
                              "table": [
                                  "good_sale"// The name of the table in the source database. The name must be enclosed in brackets [].
                              ]
                          }
                      ],
                      "where": "",
                      "splitPk": "",
                      "encoding": "UTF-8"
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "odps",
                  "parameter": {
                      "partition": "",
                      "truncate": true,
                      "datasource": "odps_first",// The destination connection.
                      "column": [// The columns in the destination table.
                          "create_time",
                          "category",
                          "brand",
                          "buyer_id",
                          "trans_num",
                          "trans_amount",
                          "click_cnt"
                      ],
                      "emptyAsNull": false,
                      "table": "good_sale"// The name of the destination table.
                  },
                  "name": "Writer",
                  "category": "writer"
              }
          ],
          "version": "2.0",
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          },
          "setting": {
              "errorLimit": {
                  "record": "0"
              },
              "speed": {
                  "throttle": false,
                  "concurrent": 2
              }
          }
      }
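Because the code editor offers no form validation, a quick offline check that the reader and writer column lists line up can catch ordering mistakes before the node runs. Columns are matched by position, so the two lists must align one-to-one. A sketch, where the config string abridges the full script to only the fields being checked:

```python
import json

# Abridged copy of the sync-node script: only the fields checked here.
config = json.loads("""
{
    "steps": [
        {"category": "reader",
         "parameter": {"column": ["create_time", "category", "brand",
                                  "buyer_id", "trans_num", "trans_amount",
                                  "click_cnt"]}},
        {"category": "writer",
         "parameter": {"column": ["create_time", "category", "brand",
                                  "buyer_id", "trans_num", "trans_amount",
                                  "click_cnt"]}}
    ]
}
""")

# Columns are mapped by position, so reader and writer lists must match.
steps = {s["category"]: s["parameter"]["column"] for s in config["steps"]}
aligned = steps["reader"] == steps["writer"]
print(aligned)  # True
```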
    5. Click the Run icon. You can view the Runtime Log tab in the lower part of the page to check whether the test data is synchronized to MaxCompute.

Result

To query data in the MaxCompute table, create an ODPS SQL node. Enter the statement select * from good_sale; and click the Run icon. If the test data appears, the data has been synchronized to the MaxCompute table.