This topic describes how to export MaxCompute data to other data sources by using
the Data Integration service of DataWorks.
Background information
You can use one of the following methods to export data:
- Use the codeless user interface (UI). After you create a batch synchronization node in the DataWorks console, configure
a source, a destination, and field mappings on the codeless UI to export data.
- Use the code editor. After you create a batch synchronization node in the DataWorks console, switch to
the code editor. Then, write code to configure a source, a destination, and field
mappings to export data.
Limits
Each batch synchronization node can export data from only one table. If you want to
export data from multiple tables, you must create multiple batch synchronization nodes.
Add a MaxCompute data source to DataWorks
- Go to the Data Source page.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- In the top navigation bar, select the region where your workspace resides. Find your
workspace and click Data Integration in the Actions column.
- On the page that appears, click Connection in the left-side navigation pane. The Data Source page appears.
- On the Data Source page, click New data source in the upper-right corner.
- In the Add data source dialog box, click MaxCompute (ODPS) in the Big Data Storage section.
- In the Add MaxCompute (ODPS) data source dialog box, set the parameters as required.

Parameter | Description
Data Source Name | The name of the connection. The name can contain letters, digits, and underscores (_), and must start with a letter.
Description | The description of the connection. The description can be up to 80 characters in length.
Applicable environment | The environment in which the connection is used. Valid values: Development and Production. Note: This parameter is displayed only when the workspace is in standard mode.
ODPS Endpoint | The endpoint of the MaxCompute project. This parameter is read-only, and the value is automatically obtained from system configurations.
Tunnel Endpoint | The endpoint of the MaxCompute Tunnel service. For more information, see Configure endpoints.
ODPS project name | The name of the MaxCompute project.
AccessKey ID | The AccessKey ID of the account that you use to connect to the MaxCompute project. You can view the AccessKey ID on the Security Management page.
AccessKey Secret | The AccessKey secret of the account that you use to connect to the MaxCompute project.
- On the Data Integration tab, click Test connectivity in the Operation column of each resource group.
A sync node runs on only one resource group. To ensure that your sync nodes can run properly, test the connectivity of every resource group for Data Integration on which your sync nodes will run. To test the connectivity of multiple resource groups for Data Integration at a time, select the resource groups and click Batch test connectivity. For more information, see Select a network connectivity solution.
- After the connection passes the connectivity test, click Complete.
Add the destination to DataWorks
Add the destination to the data source list of DataWorks based on the data source
type. For more information about how to add a data source, see Configure data sources.
Create a workflow
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- After you select the region where the required workspace resides, find the workspace
and click Data Analytics.
- On the DataStudio page, move the pointer over the Create icon and select Workflow.
- In the Create Workflow dialog box, set the Workflow Name and Description parameters.
Notice The workflow name must be 1 to 128 characters in length and can contain letters, digits,
underscores (_), and periods (.). It is not case-sensitive.
- Click Create.
Create a batch sync node
- Click the workflow that you created in the previous step to show its content and right-click
Data Integration.
- Choose Create > Batch Synchronization.
- In the Create Node dialog box, set the Node Name and Location parameters.
Notice The node name must be 1 to 128 characters in length and can contain letters, digits,
underscores (_), and periods (.). It is not case-sensitive.
- Click Commit.
Configure and run the batch synchronization node by using the codeless UI
- Configure the source.
Select ODPS from the Connection drop-down list below Source and select the added MaxCompute data source from the drop-down list on the right side of Connection. Then, select the source table from the Table drop-down list. If the table is a partitioned table, specify Partition Key Column.

- Configure the destination.
Select the data source type of the destination from the Connection drop-down list below Target and select the name of the destination from the drop-down list on the right side of Connection. Then, select the destination table from the Table drop-down list.

- Configure field mappings.
Configure mappings between the fields in the source table and the fields in the destination
table.

- Configure channel control.
- Configure scheduling properties.
Configure scheduling properties on the Properties tab to filter data.
- In the top toolbar, click the Save icon to save the configurations. Then, click the Run icon to run the batch synchronization node.
Configure and run the batch synchronization node by using the code editor
- Import a template.
Select ODPS from the Source Connection Type drop-down list and select the added MaxCompute data source from the Connection drop-down list below Source Connection Type. Select the data source type of the destination from the Target Connection Type drop-down list and select the name of the destination from the Connection drop-down list below Target Connection Type. Then, click OK.

- Configure the source.
Configure the source and the source table.
{
    "stepType": "odps",
    "parameter": {
        "partition": [],
        "datasource": "odps_first",
        "envType": 0,
        "column": [
            "*"
        ],
        "table": ""
    },
    "name": "Reader",
    "category": "reader"
},
- stepType: the data source type of the source. Set this parameter to odps.
- partition: the partition information of the source table. You can run the
show partitions <table_name>;
command to view the partition information of the table. For more information, see
View partition information.
- datasource: the name of the MaxCompute data source.
- column: the names of the columns in the source table.
- table: the name of the source table. You can run the
show tables;
command to view the table name. For more information, see List tables and views in a project.
- name and category: Set name to Reader and category to reader. This way, the data source is configured as the source.
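As a sketch, a reader configuration with the parameters filled in might look like the following. The partition value pt=20230101 and the table name sale_detail are placeholders for illustration; replace them with the actual partition and table in your MaxCompute project.

```json
{
    "stepType": "odps",
    "parameter": {
        "partition": ["pt=20230101"],
        "datasource": "odps_first",
        "envType": 0,
        "column": ["*"],
        "table": "sale_detail"
    },
    "name": "Reader",
    "category": "reader"
}
```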
- Configure the destination.
Configure the destination and the destination table.
{
    "stepType": "mysql",
    "parameter": {
        "partition": "",
        "truncate": true,
        "datasource": "",
        "column": [
            "*"
        ],
        "table": ""
    },
    "name": "Writer",
    "category": "writer"
}
- stepType: the data source type of the destination.
- partition: the partition information of the destination table.
- datasource: the name of the destination.
- column: the names of the columns in the destination table. Make sure that the columns have a one-to-one mapping with the columns that are specified for the source.
- table: the name of the destination table.
- name and category: Set name to Writer and category to writer. This way, the data source is configured as the destination.
- Configure channel control.
"setting": {
"errorLimit": {
"record": "1024"
},
"speed": {
"throttle": false,
"concurrent": 1
}
},
- record: the maximum number of dirty data records allowed.
- throttle: specifies whether throttling is enabled.
- concurrent: the maximum number of parallel threads that the batch synchronization node uses
to read data from the source or write data to the destination.
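In the code editor, the reader, writer, and channel control settings are parts of a single job script. The following sketch shows how the pieces fit together; the overall steps, setting, and order structure follows the common DataWorks script-mode layout, and the data source names, table names, and partition value are placeholders for illustration.

```json
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "odps",
            "parameter": {
                "partition": ["pt=20230101"],
                "datasource": "odps_first",
                "envType": 0,
                "column": ["*"],
                "table": "sale_detail"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "mysql",
            "parameter": {
                "truncate": true,
                "datasource": "mysql_dest",
                "column": ["*"],
                "table": "sale_detail_copy"
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": { "record": "1024" },
        "speed": { "throttle": false, "concurrent": 1 }
    },
    "order": {
        "hops": [ { "from": "Reader", "to": "Writer" } ]
    }
}
```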
- Configure scheduling properties.
- In the top toolbar, click the Save icon to save the configurations. Then, click the Run icon to run the batch synchronization node.
Check the synchronization results
Go to the destination and check whether the data in the MaxCompute table has been exported to the destination table.
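For example, a simple check is to compare row counts between the source and the destination. The table names sale_detail and sale_detail_copy and the partition value below are placeholders, and the destination is assumed to be MySQL; adjust the statements for your own tables and data source type.

```sql
-- Run in MaxCompute against the source partition:
SELECT COUNT(*) FROM sale_detail WHERE pt = '20230101';

-- Run in the destination MySQL database; the counts should match:
SELECT COUNT(*) FROM sale_detail_copy;
```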