This topic describes how to create a batch synchronization node to export data from MaxCompute to a MySQL database.
Currently, Data Integration can import data from and export data to various data stores, such as RDS, MySQL, SQL Server, PostgreSQL, MaxCompute, Memcache, Distributed Relational Database Service (DRDS), Object Storage Service (OSS), Oracle, FTP, DM, Hadoop Distributed File System (HDFS), and MongoDB.
Add a connection
- Log on to the DataWorks console as the workspace administrator. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the target workspace and click Data Integration in the Actions column.
- On the page that appears, click Connection in the left-side navigation pane. The Data Source page appears.
- On the Data Source page, click Add Connection in the upper-right corner.
- In the Add Connection dialog box that appears, select MySQL.
- In the Add MySQL Connection dialog box that appears, set the required parameters. In this example, set Connect To to ApsaraDB for RDS.
| Parameter | Description |
| --- | --- |
| Connect To | The type of the connection. In this example, the value is ApsaraDB for RDS. |
| Connection Name | The name of the connection. The name can contain letters, digits, and underscores (_) and must start with a letter. |
| Description | The description of the connection. The description cannot exceed 80 characters in length. |
| Applicable Environment | The environment in which the connection is used. Valid values: Development and Production. Note: This parameter is available only when the workspace is in standard mode. |
| Region | The region of the ApsaraDB RDS for MySQL instance. |
| RDS Instance ID | The ID of the ApsaraDB RDS for MySQL instance. You can view the ID in the ApsaraDB for RDS console. |
| RDS Instance Account ID | The ID of the Alibaba Cloud account used to purchase the ApsaraDB RDS for MySQL instance. You can view your account ID on the Security Settings page after logging on to the Alibaba Cloud console with your Alibaba Cloud account. |
| Database Name | The name of the ApsaraDB RDS for MySQL database. |
| Username | The username for logging on to the database. |
| Password | The password for logging on to the database. |
- Click Test Connection.
- If the connectivity test is successful, click Complete.
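Under the hood, a MySQL connection resolves to a JDBC-style connection string. The following sketch shows the general shape of such a URL; the host and database names are hypothetical placeholders, not values from this tutorial:

```python
# Minimal sketch of the jdbc:mysql URL that a MySQL connection resolves to.
# The host and database below are hypothetical placeholders.
def mysql_jdbc_url(host: str, database: str, port: int = 3306) -> str:
    """Build a JDBC-style MySQL connection URL."""
    return f"jdbc:mysql://{host}:{port}/{database}"

print(mysql_jdbc_url("rm-example.mysql.rds.aliyuncs.com", "workshop"))
# jdbc:mysql://rm-example.mysql.rds.aliyuncs.com:3306/workshop
```

This format matters mainly if you later define a connection in connection-string mode instead of by RDS instance ID.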
Create a table in the destination MySQL database
Execute the following statement to create the destination table in your MySQL database:
```sql
CREATE TABLE `ODPS_RESULT` (
  `education` varchar(255) NULL,
  `num`       int(10)      NULL
);
```
After the table is created, execute the `desc odps_result;` statement to view the table details.
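As a quick sanity check of the destination schema, the sketch below recreates it in an in-memory SQLite database; SQLite stands in for MySQL here purely for illustration, and on a real ApsaraDB RDS for MySQL instance you would run the `CREATE TABLE` and `desc` statements from a MySQL client instead:

```python
import sqlite3

# SQLite stands in for MySQL only to illustrate the destination schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE odps_result ("
    "  education varchar(255),"   # education level, as text
    "  num       int(10)"         # count for that level
    ")"
)

# PRAGMA table_info is SQLite's rough equivalent of MySQL's DESC.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(odps_result)")]
print(columns)  # [('education', 'varchar(255)'), ('num', 'int(10)')]
```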
Create and configure a batch synchronization node
This section describes how to create and configure batch synchronization node write_result to export data in the result_table table to your MySQL database. The procedure is as follows:
- Go to the DataStudio page and create a batch synchronization node named write_result.
- Set the insert_data node as the ancestor node of the write_result node.
- In the Source section, set Connection to the MaxCompute connection and Table to result_table.
- In the Target section, set Connection to MySQL > odps_result.
- Configure the mapping between the fields in the source and destination tables.
Fields in the source table on the left have a one-to-one mapping with fields in the destination table on the right. You can click Add to add a field, or move the pointer over a field and click the Delete icon to delete the field.
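Conceptually (this is a plain-Python sketch, not DataWorks code), the one-to-one field mapping projects each source record onto the destination columns it is paired with:

```python
# Hypothetical sketch of a one-to-one field mapping: source column -> destination column.
field_mapping = {"education": "education", "num": "num"}

def map_row(source_row: dict, mapping: dict) -> dict:
    """Project a source record onto the destination schema."""
    return {dst: source_row[src] for src, dst in mapping.items()}

row = {"education": "doctorate", "num": 413}  # hypothetical result_table record
print(map_row(row, field_mapping))  # {'education': 'doctorate', 'num': 413}
```

Deleting a field from either side simply removes that pair from the mapping, so the column is not synchronized.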
- In the Channel section, configure the synchronization rate limit and the dirty data check.
| Parameter | Description |
| --- | --- |
| Expected Maximum Concurrency | The maximum number of concurrent threads that the batch synchronization node uses to read data from and write data to data stores. You can configure the concurrency for a node on the codeless user interface (UI). |
| Bandwidth Throttling | Specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and set a maximum transmission rate to avoid a heavy read workload on the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to an appropriate value. |
| Dirty Data Records Allowed | The maximum number of dirty data records allowed. |
| Resource Group | The servers on which the batch synchronization node runs. If an excessively large number of nodes run on the default resource group, some nodes may be delayed due to insufficient resources. In this case, we recommend that you purchase exclusive resources for Data Integration or add a custom resource group. For more information, see DataWorks exclusive resources. |
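In script mode, these channel settings correspond to fields in the node's JSON configuration. The fragment below is an illustrative sketch only; the key names follow the open-source DataX convention (`speed.channel` for concurrency, `errorLimit.record` for allowed dirty records), and the actual DataWorks script-mode keys may differ:

```json
{
  "job": {
    "setting": {
      "speed":      { "channel": 2 },
      "errorLimit": { "record": 0 }
    },
    "content": [{
      "reader": { "name": "odpsreader",  "parameter": { "table": "result_table" } },
      "writer": { "name": "mysqlwriter", "parameter": { "table": "odps_result" } }
    }]
  }
}
```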
- Preview and save the configuration.
After completing the configuration, scroll through the node configuration, verify that it is correct, and then save it.
Commit the batch synchronization node
Return to the workflow after saving the batch synchronization node. Click the Commit icon on the toolbar to commit the batch synchronization node to the scheduling system. The scheduling system automatically runs the node at the scheduled time from the next day based on your settings.
What to do next
Now you have learned how to create a batch synchronization node to export data to a specific data store. In the next tutorial, you will learn how to configure recurrence and dependencies for a batch synchronization node. For more information, see Configure recurrence and dependencies for a node.