This topic describes how to create a batch synchronization node to export data from MaxCompute to a MySQL database.


Prerequisites

An ApsaraDB Relational Database Service (RDS) for MySQL instance is created, and the ID of the instance is obtained. A whitelist is configured for the instance in the ApsaraDB for RDS console. For more information, see Create an ApsaraDB RDS for MySQL instance.
Note If you use a custom resource group to run the batch synchronization node, you must also add the server IP addresses of the custom resource group to the whitelist of the ApsaraDB RDS for MySQL instance.

Background information

You can use Data Integration in DataWorks to periodically transfer the business data generated in your business system to a workspace. After the data is computed by SQL nodes, Data Integration periodically exports the computing results to a data store that you specify for further display or use.

Currently, Data Integration can import data from and export data to various data stores, such as RDS, MySQL, SQL Server, PostgreSQL, MaxCompute, Memcache, Distributed Relational Database Service (DRDS), Object Storage Service (OSS), Oracle, FTP, DM, Hadoop Distributed File System (HDFS), and MongoDB.

Add a connection

Note Only the workspace administrator can add connections, and members of other roles can only view the connections.
  1. Log on to the DataWorks console as the workspace administrator. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the target workspace and click Data Integration in the Actions column.
  2. On the page that appears, click Connection in the left-side navigation pane. The Data Source page appears.
  3. On the Data Source page, click Add Connection in the upper-right corner.
  4. In the Add Connection dialog box that appears, select MySQL.
  5. In the Add MySQL Connection dialog box that appears, set required parameters. In this example, set Connect To to ApsaraDB for RDS.
    The following descriptions explain each parameter.
    Connect To: The type of the connection. Set the value to ApsaraDB for RDS.
    Connection Name: The name of the connection. The name can contain letters, digits, and underscores (_) and must start with a letter.
    Description: The description of the connection. The description cannot exceed 80 characters in length.
    Applicable Environment: The environment in which the connection is used. Valid values: Development and Production.
    Note This parameter is available only when the workspace is in standard mode.
    Region: The region of the ApsaraDB RDS for MySQL instance.
    RDS Instance ID: The ID of the ApsaraDB RDS for MySQL instance. You can view the ID in the ApsaraDB for RDS console.
    RDS Instance Account ID: The ID of the Alibaba Cloud account used to purchase the ApsaraDB RDS for MySQL instance. You can view your account ID on the Security Settings page after you log on to the Alibaba Cloud console with your Alibaba Cloud account.
    Database Name: The name of the ApsaraDB RDS for MySQL database.
    Username: The username that is used to log on to the database.
    Password: The password that is used to log on to the database.
  6. Click Test Connection.
  7. If the connectivity test is successful, click Complete.

Verify that a table exists in the destination MySQL database

Use the following table creation statement to create the odps_result table in the MySQL database:

CREATE TABLE `odps_result` (
    `education` varchar(255) NULL,
    `num` int(10) NULL
);

After the table is created, run the desc odps_result; statement to view the table details.
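As an optional sanity check with illustrative values, you can insert a sample row and read it back to confirm that the table accepts the data types the synchronization node will write. The sample values below are hypothetical.

```sql
-- Optional smoke test with illustrative values: confirm that the table
-- accepts the data types the synchronization node will write.
INSERT INTO odps_result (education, num) VALUES ('bachelor', 100);
SELECT education, num FROM odps_result;
-- Remove the sample row so that it does not mix with synchronized data.
DELETE FROM odps_result WHERE education = 'bachelor';
```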

Create and configure a batch synchronization node

This section describes how to create and configure batch synchronization node write_result to export data in the result_table table to your MySQL database. The procedure is as follows:

  1. Go to the DataStudio page and create a batch synchronization node named write_result.
  2. Set the insert_data node as the ancestor node of the write_result node.
  3. In the Source section, set Connection to ODPS > odps_first and Table to result_table.
  4. In the Target section, set Connection to MySQL > odps_result.
  5. Configure the mapping between the fields in the source and destination tables.
    Fields in the source table on the left have a one-to-one mapping with fields in the destination table on the right. You can click Add to add a field, or move the pointer over a field and click the Delete icon to delete the field.
  6. In the Channel section, configure the synchronization rate limit and dirty data check rules.
    Configure the channel
    The following descriptions explain each parameter.
    Expected Maximum Concurrency: The maximum number of concurrent threads that the batch synchronization node uses to read data from or write data to data stores. You can configure the concurrency for the node on the codeless user interface (UI).
    Bandwidth Throttling: Specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and set a maximum transmission rate to prevent heavy read workloads on the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to an appropriate value.
    Dirty Data Records Allowed: The maximum number of dirty data records allowed.
    Resource Group: The servers on which the batch synchronization node runs. If a large number of nodes run on the default resource group, some nodes may be delayed because of insufficient resources. In this case, we recommend that you purchase exclusive resources for Data Integration or add a custom resource group. For more information, see DataWorks exclusive resources.
  7. Preview and save the configuration.
    After you complete the configuration, scroll up and down to view the node configuration. Verify that the configuration is correct and click the Save icon.
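The settings made on the codeless UI roughly correspond to a script-mode configuration in Data Integration. The following sketch is illustrative only: the field names follow the general shape of script mode, but the connection names (odps_first, odps_result), columns, and limits are assumptions taken from this tutorial and must be adapted to your environment.

```json
{
  "type": "job",
  "steps": [
    {
      "name": "Reader",
      "category": "reader",
      "stepType": "odps",
      "parameter": {
        "datasource": "odps_first",
        "table": "result_table",
        "column": ["education", "num"]
      }
    },
    {
      "name": "Writer",
      "category": "writer",
      "stepType": "mysql",
      "parameter": {
        "datasource": "odps_result",
        "table": ["odps_result"],
        "column": ["education", "num"],
        "writeMode": "insert"
      }
    }
  ],
  "setting": {
    "errorLimit": {"record": "0"},
    "speed": {"concurrent": 2, "throttle": true, "mbps": 1}
  },
  "order": {"hops": [{"from": "Reader", "to": "Writer"}]}
}
```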

Commit the batch synchronization node

Return to the workflow after you save the batch synchronization node. Click the Commit icon on the toolbar to commit the batch synchronization node to the scheduling system. The scheduling system automatically runs the node at the scheduled time from the next day based on your settings.
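After the node runs, you can verify the exported data in the destination MySQL database. The queries below assume the odps_result table created earlier; adjust the column names to your actual schema.

```sql
-- Check how many rows the synchronization node wrote,
-- and preview a few of them.
SELECT COUNT(*) FROM odps_result;
SELECT education, num FROM odps_result ORDER BY num DESC LIMIT 10;
```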

What to do next

Now you have learned how to create a batch synchronization node to export data to a specific data store. You can proceed to the next tutorial, in which you will learn how to configure recurrence and dependencies for a batch synchronization node. For more information, see Configure recurrence and dependencies for a node.