This topic describes how to use DataWorks to collect data to MaxCompute.
Background information
Note
- You can use the data stores that are prepared for you in this workshop. You can also use your own data stores.
- The prepared data stores reside in the China (Shanghai) region. We recommend that you use a workspace in the China (Shanghai) region so that the prepared data stores remain accessible when you create connections to them.
Create a connection to an OSS bucket from which you want to read data
Create a connection to an ApsaraDB RDS instance from which you want to read data
Create a workflow
Configure the workshop_start node
Create tables to which you want to write data
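For reference, you can create the destination tables by running DDL statements on the table creation tab or an ad hoc query tab. The following is a sketch that assumes the workshop's typical table names, ods_raw_log_d for the OSS log data and ods_user_info_d for the ApsaraDB RDS user data; treat the names and columns as assumptions and adjust them to your own setup.

CREATE TABLE IF NOT EXISTS ods_raw_log_d
(
    col STRING COMMENT 'Raw log record'
)
PARTITIONED BY
(
    dt STRING COMMENT 'Data timestamp'
);

CREATE TABLE IF NOT EXISTS ods_user_info_d
(
    uid       STRING COMMENT 'User ID',
    gender    STRING COMMENT 'Gender',
    age_range STRING COMMENT 'Age range',
    zodiac    STRING COMMENT 'Zodiac sign'
)
PARTITIONED BY
(
    dt STRING COMMENT 'Data timestamp'
);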
Configure batch sync nodes
Note In a workspace in standard mode, we recommend that you do not run batch sync nodes in the development environment. That is, do not run the nodes directly on their configuration tabs. Instead, deploy the nodes to the production environment and run them in test run mode so that you can obtain complete run logs.
After the nodes are deployed in the production environment, you can apply for the permissions to read data from and write data to the tables in the development environment.
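In DataWorks, you typically apply for these permissions in Security Center. If you instead administer the MaxCompute project directly, a comparable grant can be issued as an ACL statement. The following is only a hypothetical sketch; the table name and account are placeholders, not values from this workshop.

-- Hypothetical: grant read access on a destination table to a RAM user.
-- Replace the table name and the account with your own.
GRANT SELECT ON TABLE ods_user_info_d TO USER RAM$your_account:dev_user;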
- Configure the oss_synchronization node. A hedged script-mode sketch of this node's configuration follows this list.
- Configure the rds_synchronization node.
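For orientation, each batch sync node can also be viewed or edited in script mode as a JSON configuration. The following sketch outlines what the oss_synchronization node might look like. The connection name oss_workshop_log, the object user_log.txt, the field delimiter, and the destination table are assumptions based on a typical workshop setup, not required values; JSON does not allow comments, so all caveats are stated here.

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "oss",
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "datasource": "oss_workshop_log",
        "object": ["user_log.txt"],
        "fieldDelimiter": "|",
        "encoding": "UTF-8",
        "column": [{"index": 0, "type": "string"}]
      }
    },
    {
      "stepType": "odps",
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "datasource": "odps_first",
        "table": "ods_raw_log_d",
        "partition": "dt=${bizdate}",
        "column": ["col"],
        "truncate": true
      }
    }
  ],
  "order": {
    "hops": [{"from": "Reader", "to": "Writer"}]
  }
}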
Commit the workflow
- On the Data Analytics tab, double-click the new workflow. On the workflow configuration tab that appears, click the Commit icon in the top toolbar.
- In the Commit dialog box, select the nodes to be committed, enter your comments in the Change description field, and then select Ignore I/O Inconsistency Alerts.
- Click Commit. The Committed successfully message appears.
Run the workflow
Verify data synchronization to MaxCompute
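After the workflow runs, you can create an ad hoc query to check that the destination tables contain data in the expected partitions. A minimal sketch, assuming the table names used above; replace the partition value with the data timestamp of your run, for example 20240101.

-- Preview the synchronized partitions. <data_timestamp> is a placeholder.
SELECT * FROM ods_raw_log_d   WHERE dt = '<data_timestamp>' LIMIT 10;
SELECT * FROM ods_user_info_d WHERE dt = '<data_timestamp>' LIMIT 10;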
What to do next
You have learned how to collect and synchronize data. You can now proceed to the next tutorial, which describes how to compute and analyze the collected data. For more information, see Process data.