This topic describes how to create a sync node to export data from MaxCompute to a MySQL data store.

Prerequisites

An ApsaraDB RDS for MySQL instance is created. The ID of the ApsaraDB RDS for MySQL instance is obtained. A whitelist is configured for the instance in the ApsaraDB for RDS console. The address information about the resource group on which the sync node to create will be run is added to the whitelist. For more information, see Create an ApsaraDB RDS for MySQL instance.
Note If you use a custom resource group to run the sync node, you must add the IP addresses of the servers in the custom resource group to the whitelist of the ApsaraDB RDS for MySQL instance.

Background information

You can use Data Integration in DataWorks to periodically transfer the business data generated in a business system to a workspace. After the data is computed in SQL nodes, Data Integration periodically exports the computing results to your specified data store for further display or use.Flowchart

Data Integration can import data from and export data to various data stores, such as Relational Database Service (RDS), MySQL, SQL Server, PostgreSQL, MaxCompute, Memcache, Distribute Relational Database Service (DRDS), Object Storage Service (OSS), Oracle, FTP, Dameng, Hadoop Distributed File System (HDFS), and MongoDB. For more information about the data stores, see Supported data stores and plug-ins.

Add a connection

Note Only the workspace administrator can add connections, and members of other roles can only view the connections.
  1. Go to the Data Source page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where the target workspace resides. Find the target workspace and click Data Integration in the Actions column.
    4. On the page that appears, click Connection in the left-side navigation pane. The Data Source page appears.
  2. On the Data Source page, click Add a Connection in the upper-right corner.
  3. In the Add data source dialog box, click MySQL in the Relational Database section.
  4. In the Add MySQL data source dialog box, set the parameters as required.
    In this example, set Data source type to Alibaba Cloud instance mode. Then, set other parameters as described in the following table.Alibaba Cloud instance mode
  5. Click Test Connection in the Actions column of each resource group.
    A sync node only uses one resource group. Therefore, you must test the connectivity of all the resource groups for Data Integration that your sync nodes use to connect to the data store so that sync nodes can run properly. For more information, see Test data store connectivity.
  6. After the connection passes the connectivity test, click Complete.

Verify that a table exists in the destination MySQL data store

Run the following statement to create the odps_result table in the MySQL data store:
CREATE TABLE `ODPS_RESULT` (
`education`  varchar(255) NULL ,
`num`  int(10) NULL 
);

After the table is created, run the desc odps_result; statement to view the table details.

Create and configure a sync node

This section describes how to create and configure a sync node named write_result and use the node to export data in the result_table table to your MySQL data store. To create and configure a sync node, perform the following steps:

  1. Go to the DataStudio page and create a sync node named write_result.
  2. Configure the insert_data node as the parent node of the write_result node.
    Dependency
  3. In the Select data source section, set Data source and Table below Data source to ODPS > odps_first and result_table, respectively.
  4. Set Data source below Data destination to MySQL > odps_result.
  5. In the Filed Mapping section, configure field mappings. Make sure that fields in the source table on the left have a one-to-one mapping with fields in the destination table on the right.
  6. In the Channel control section, configure the maximum transmission rate and dirty data check rules.
    Channel control section
    Parameter Description
    Maximum number of concurrent tasks expected The maximum number of concurrent threads that the batch sync node uses to read data from the source data store and write data to the destination data store. You can configure the concurrency for the sync node on the codeless user interface (UI).
    Synchronization rate Specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and set a maximum transmission rate to avoid heavy read workload of the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to a proper value.
    The number of error records exceeds The maximum number of dirty data records allowed.
  7. Preview and save the configuration.
    After you complete the configuration, scroll up and down to view the node configuration. After you confirm that the configuration is correct, click the Save icon icon in the toolbar.

Commit the sync node

Return to the workflow after you save the sync node. Click the Submit icon icon in the toolbar to commit the sync node to the scheduling system. The scheduling system automatically runs the node at the scheduled time from the next day based on your settings.

What to do next

Now you have learned how to create a sync node to export data to a specific data store. You can proceed with the next tutorial. In the next tutorial, you will learn how to configure recurrence and dependencies for a sync node. For more information, see Configure recurrence and dependencies for a node.