This topic describes how to create a synchronization node to export data from MaxCompute to a MySQL data source.

Prerequisites

  • An ApsaraDB RDS for MySQL instance is created. The ID of the ApsaraDB RDS for MySQL instance is obtained. A whitelist is configured for the instance in the ApsaraDB for RDS console. The address information about the resource group on which the synchronization node to create will be run is added to the whitelist. For more information, see Create an ApsaraDB RDS for MySQL instance.
    Note If you use a custom resource group to run the synchronization node, you must add the IP addresses of the servers in the custom resource group to the whitelist of the ApsaraDB RDS for MySQL instance.
  • The odps_result table is created in the ApsaraDB RDS for MySQL database to which the data is synchronized. You can create the table by executing the following statement:
    CREATE TABLE `ODPS_RESULT` (
    `education`  varchar(255) NULL ,
    `num`  int(10) NULL 
    );
    After the table is created, execute the desc odps_result; statement to view the table details.

Background information

You can use Data Integration to periodically synchronize the business data generated in a business system to a DataWorks workspace. You can create SQL nodes to compute the data and use Data Integration to periodically synchronize the computing results to your specified data source for further display or use. Workflow

Data Integration can import data from and export data to various data sources, such as ApsaraDB RDS, MySQL, SQL Server, PostgreSQL, MaxCompute, ApsaraDB for Memcache, Distribute Relational Database Service (DRDS), Object Storage Service (OSS), Oracle, FTP, Dameng, Hadoop Distributed File System (HDFS), and MongoDB. For more information about the data sources, see Supported data sources, readers, and writers.

Add a data source

Note Only the workspace administrator can add data sources. Members of other roles can only view data sources.
  1. Go to the Data Source page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. After you select the region where the required workspace resides, find the workspace and click Data Integration in the Actions column.
    4. In the left-side navigation pane of the Data Integration page, choose Data Source > Data Sources to go to the Data Source page.
  2. On the Data Source page, click Add data source in the upper-right corner.
  3. In the Add data source dialog box, click MySQL in the Relational Database section.
  4. In the Add MySQL data source dialog box, set the parameters as required.
    In this example, set the Data source type parameter to Alibaba Cloud instance mode. Then, set other parameters as described in the following table. Alibaba Cloud instance mode
    Parameter Description
    Data Source Type The mode in which the data source is added. Set this parameter to Alibaba Cloud Instance Mode.
    Data Source Name The name of the data source. The name can contain letters, digits, and underscores (_) and must start with a letter.
    Data Source Description The description of the data source. The description can be a maximum of 80 characters in length.
    Environment The environment in which the data source is used. Valid values: Development and Production.
    Note This parameter is displayed only when the workspace is in standard mode.
    Region The region where the data source resides.
    RDS Instance ID The ID of the ApsaraDB RDS for MySQL instance. You can log on to the ApsaraDB RDS console to obtain the ID.
    RDS Instance Account ID The ID of the Alibaba Cloud account that is used to purchase the ApsaraDB RDS for MySQL instance. You can use the Alibaba Cloud account to log on to the DataWorks console, move the pointer over the profile picture in the upper-right corner, and then view the ID of the account.
    Default Database Name
    The name of the default ApsaraDB RDS for MySQL database. The following descriptions provide instructions for you to configure a data synchronization node or solution that uses a MySQL data source:
    • When you configure a database-level real-time or batch data synchronization node or solution that uses a MySQL data source, you can select one or more databases on which you have access permissions in the ApsaraDB for MySQL instance.
    • If you select multiple databases when you configure a batch synchronization node, you must add a data source for each database.
    Username The username used to log on to the ApsaraDB RDS for MySQL database.
    Password The password that is used to log on to the ApsaraDB RDS for MySQL database. Do not use at signs (@) in the password.
  5. Test the connectivity between the data source and resource groups.
    Click the Data Integration tab next to the Resource Group connectivity parameter, find the resource group for Data Integration that you want to use, and then click Test connectivity in the Actions column. Click the Schedule tab and perform the same operations to test the connectivity between the data source and the resource group for scheduling that you want to use. If the connectivity status is Connected, the resource groups are connected to the data source.
    Note
    • A synchronization node uses only one resource group of a specific type.
    • To ensure that your synchronization nodes can be run as expected, you must test the connectivity between the data source and all types of resource groups on which your synchronization nodes will be run.
    • For more information, see Select a network connectivity solution.
    Testing connectivity
  6. After the data source passes the connectivity test, click Complete.

Create and configure a synchronization node

This section describes how to create and configure a synchronization node named write_result and use the node to synchronize data in the result_table table to your MySQL data source. Perform the following operations:

  1. Go to the DataStudio page and create a batch synchronization node named write_result.
     create a batch synchronization node
  2. On the DataStudio page, configure the insert_data node as the ancestor node of the write_result node.
    Dependency
  3. On the configuration page of the batch synchronization node, select ODPS from the Connection drop-down list, select the odps_first database from the next drop-down list, and then select result_table from the Table drop-down list. result_table is the source table from which the data is synchronized.
  4. Select the odps_result table that you create in the ApsaraDB RDS for MySQL database as the destination table.
  5. In the Mappings section, configure field mappings. Make sure that fields in the source table on the left have a one-to-one mapping with fields in the destination table on the right.
  6. In the Channel section, configure the maximum transmission rate and dirty data check rules.
    After you complete the preceding steps, you can configure channel control policies for the synchronization node. Channel section
    Parameter Description
    Expected Maximum Concurrency The maximum number of parallel threads that the synchronization node uses to read data from the source or write data to the destination. You can configure the parallelism for the synchronization node on the codeless UI.
    Bandwidth Throttling Specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and specify a maximum transmission rate to prevent heavy read workloads on the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to an appropriate value based on the configurations of the source.
    Dirty Data Records Allowed The maximum number of dirty data records allowed.
    Distributed Execution The distributed execution mode. In distributed mode, your node can be sliced to multiple ECS instances for parallel execution. This speeds up synchronization. If a large number of synchronization nodes are run in parallel, excessive access requests are sent to the data source. Evaluate the load on the data source before you use this mode. You can use this mode only when you use exclusive resource groups for Data Integration.
  7. Preview and save the configuration.
    After the node is configured, you can scroll up and down to view the node configuration. After you confirm that the configuration is correct, click the Save icon in the top toolbar.

Commit the synchronization node

Return to the workflow after you save the synchronization node. Click the submit icon in the top toolbar to commit the synchronization node to the scheduling system. The scheduling system automatically runs the node at the scheduled time from the next day based on your settings.

What to do next

Now you have learned how to create a synchronization node to export data to a specific data store. You can proceed with the next tutorial. In the next tutorial, you will learn how to configure properties and dependencies for a synchronization node. For more information, see Configure recurrence and dependencies for a node.