You can use the Data Integration service of DataWorks to export data from MaxCompute to other data sources in offline mode and then process the exported data. This topic describes the export procedure.
Background information
You can use one of the following methods to export data:
Use the codeless user interface (UI). After you create a batch synchronization node in the DataWorks console, configure a source, a destination, and field mappings on the codeless UI to export data.
Use the code editor. After you create a batch synchronization node in the DataWorks console, switch to the code editor. Then, write code to configure a source, a destination, and field mappings to export data.
Prerequisites
Make sure that the following requirements are met:
The MaxCompute table from which you want to export data is prepared.
For more information about how to create tables and write data to tables, see Table operations and Insert or update data into a table or a static partition (INSERT INTO and INSERT OVERWRITE).
The destination and destination table are prepared.
Limits
Each batch synchronization node can export data from only one table. If you want to export data from multiple tables, you must create multiple batch synchronization nodes.
Procedure
Perform the following steps to export MaxCompute data by using Data Integration:
Add a MaxCompute data source to DataWorks.
Add the destination to DataWorks.
Create a workflow in the DataWorks console. The workflow is required when you create a batch synchronization node.
Create a batch synchronization node based on the created workflow.
Configure and run the batch synchronization node by using the codeless UI or the code editor.
Check the synchronization results on the destination.
Add a MaxCompute data source to DataWorks
For more information, see Add a MaxCompute data source of the new version.
Add the destination to DataWorks
Add the destination to the data source list of DataWorks based on the data source type. For more information about how to add a data source, see Add a data source.
Create a workflow
Create a workflow in the DataWorks console. The workflow is required when you create a batch synchronization node.
Log on to the DataWorks console. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.
On the DataStudio page, move the pointer over the icon and select Create Workflow.
In the Create Workflow dialog box, configure Workflow Name and Description.
Click Create.
Create a batch synchronization node
Create a batch synchronization node based on the created workflow.
Click the newly created workflow and right-click Data Integration.
Choose .
In the Create Node dialog box, configure the Name parameter, and select a path from the Path drop-down list.
Important: The node name must be 1 to 128 characters in length, and can contain letters, digits, underscores (_), and periods (.).
Click Confirm.
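The naming rule above can be checked before you type a name into the dialog box. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that "letters" means ASCII letters, which may be stricter than what the console accepts.

```python
import re

# Assumption: "letters" is read as ASCII letters only; the console may accept more.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Return True if the name is 1 to 128 characters long and uses only
    letters, digits, underscores, and periods."""
    return NODE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Hypothetical node names used only for illustration.
    for candidate in ["export_orders.v1", "export orders", "a" * 129]:
        print(len(candidate), is_valid_node_name(candidate))
```

The sketch uses fullmatch so that a name with a valid prefix followed by an invalid character is rejected as a whole.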
Configure and run the batch synchronization node by using the codeless UI
Configure network connections and a resource group.
Select the source.
Select MaxCompute(ODPS) from the Source drop-down list and select the created MaxCompute data source from the Data Source Name drop-down list.
Select an exclusive resource group for Data Integration.
Select an existing exclusive resource group for Data Integration. For more information, see Create and use an exclusive resource group for Data Integration.
Select the destination.
Specify Destination and Data Destination Name.
Test connectivity.
Test the network connectivity between the resource group and the source and between the resource group and the destination. Make sure that the exclusive resource group for Data Integration can connect to both data sources. Then, click Next.
Configure a task.
Configure the source and destination.
In the Source and Destination sections, configure the table from which data is read, the table to which data is written, and the range of data to be synchronized. For more information, see Step 3: Configure the source and destination in the "Configure a batch synchronization node by using the codeless UI" topic.
Configure field mappings.
After the source and destination are configured, you must configure mappings between source fields and destination fields. After the mappings are configured, the batch synchronization node writes the values of the source fields to the destination fields of the same data type based on the mappings. For more information, see Step 4: Configure mappings between source fields and destination fields in the "Configure a batch synchronization node by using the codeless UI" topic.
Configure channel control.
You can configure channel control policies to define attributes for data synchronization. For more information, see Step 5: Configure channel control policies in the "Configure a batch synchronization node by using the codeless UI" topic.
Configure scheduling properties.
On the Properties tab, configure scheduling properties, such as the scheduling parameters that are used to filter the data to be synchronized.
In the top toolbar, click the icon to save the configurations and click the icon to run the batch synchronization node.
Configure and run the batch synchronization node by using the code editor
Configure network connections and a resource group.
Select the source, destination, and exclusive resource group for Data Integration, and then establish network connections between the resource group and the data sources. For more information, see Configure and run the batch synchronization node by using the codeless UI in this topic.
Switch to the code editor and import a template.
Click the Conversion script icon in the top toolbar. If no script is configured, you can click the icon in the top toolbar of the configuration tab to apply a script template.
Edit code in the code editor to configure the batch synchronization node.
Configure a reader for the synchronization node.
Configure the source and the source table.
{
    "stepType": "odps",
    "parameter": {
        "partition": [],
        "datasource": "odps_first",
        "envType": 0,
        "column": [
            "*"
        ],
        "table": ""
    },
    "name": "Reader",
    "category": "reader"
},
stepType: the data source type of the source. Set this parameter to odps.
partition: the partition information of the source table. You can run the show partitions <table_name>; command to view the partition information of the table. For more information, see Table operations.
datasource: the name of the MaxCompute data source.
column: the names of the source columns.
table: the name of the source table. You can run the show tables; command to view the table name.
name and category: Set name to Reader and category to reader. This way, the data source is configured as the source.
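The reader parameters described above can be filled in programmatically and then pasted into the code editor. The following Python sketch builds the reader step as a dictionary with the same keys as the sample and prints it as JSON. The function name, the table name sales_orders, the column names, and the partition value pt=20240101 are hypothetical, and the key=value partition format is an assumption based on typical show partitions output.

```python
import json


def build_reader(datasource, table, partitions=None, columns=None):
    """Build the reader step with the same keys as the sample reader."""
    return {
        "stepType": "odps",  # source type, as stated in this topic
        "parameter": {
            # Assumption: each entry looks like the output of "show partitions".
            "partition": list(partitions or []),
            "datasource": datasource,
            "envType": 0,  # copied unchanged from the sample
            # Defaults to ["*"], the value used in the sample.
            "column": list(columns or ["*"]),
            "table": table,
        },
        "name": "Reader",
        "category": "reader",
    }


# Hypothetical table, columns, and partition used only for illustration.
reader = build_reader(
    "odps_first",
    "sales_orders",
    partitions=["pt=20240101"],
    columns=["order_id", "amount"],
)

if __name__ == "__main__":
    print(json.dumps(reader, indent=4))
```

Generating the step from a function keeps the fixed values (stepType, name, and category) in one place when you prepare several nodes.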
Configure a writer for the batch synchronization node.
Configure the destination and the destination table.
{
    "stepType": "oss",
    "parameter": {
        "partition": "",
        "truncate": true,
        "datasource": "",
        "column": [
            "*"
        ],
        "table": ""
    },
    "name": "Writer",
    "category": "writer"
}
stepType: the data source type of the destination.
partition: the partition information of the destination table.
datasource: the name of the destination.
column: the names of the destination columns.
table: the name of the destination table.
name and category: Set name to Writer and category to writer. This way, the data source is configured as the destination.
Configure channel control policies, such as the maximum transmission rate and the maximum number of dirty data records allowed.
"setting": {
    "errorLimit": {
        "record": "1024"
    },
    "speed": {
        "throttle": false,
        "concurrent": 1
    }
},
record: the maximum number of dirty data records allowed.
throttle: specifies whether throttling is enabled.
concurrent: the maximum number of parallel threads that the batch synchronization node uses to read data from the source or write data to the destination.
For more information, see Step 4: Edit the script of the batch synchronization node to configure the node in the "Configure a batch synchronization node by using the code editor" topic.
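The reader, writer, and setting fragments above belong to one script. The following Python sketch assembles them into a single document and prints it as JSON. The wrapper keys type, version, steps, and order are assumptions about the overall script layout and are not described in this topic, so treat the template that the code editor generates as authoritative. The function name, the destination name oss_dest, and the table name sales_orders are hypothetical.

```python
import json


def build_job(reader, writer, error_limit=1024, concurrent=1, throttle=False):
    """Combine a reader step, a writer step, and channel control settings."""
    return {
        "type": "job",      # assumption: wrapper key, not described in this topic
        "version": "2.0",   # assumption: wrapper key, not described in this topic
        "steps": [reader, writer],
        "setting": {
            # The sample stores the dirty data limit as a string.
            "errorLimit": {"record": str(error_limit)},
            "speed": {"throttle": throttle, "concurrent": concurrent},
        },
        # Assumption: the order section links the reader to the writer by name.
        "order": {"hops": [{"from": reader["name"], "to": writer["name"]}]},
    }


# Reader and writer mirror the samples above; names are hypothetical.
reader = {
    "stepType": "odps",
    "parameter": {"partition": [], "datasource": "odps_first", "envType": 0,
                  "column": ["*"], "table": "sales_orders"},
    "name": "Reader",
    "category": "reader",
}
writer = {
    "stepType": "oss",
    "parameter": {"partition": "", "truncate": True, "datasource": "oss_dest",
                  "column": ["*"], "table": ""},
    "name": "Writer",
    "category": "writer",
}

job = build_job(reader, writer)

if __name__ == "__main__":
    print(json.dumps(job, indent=4))
```

Before you run the node, compare the printed result with the script that the imported template produces and keep any fields that the template adds.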
Configure the properties of the synchronization node. For more information, see Supported formats of scheduling parameters.
In the top toolbar, click the icon to save the configurations and click the icon to run the batch synchronization node.
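Scheduling parameters are typically referenced in the script as placeholders, for example in the partition value of the reader, and are replaced with concrete values when the node runs. The following Python sketch previews such a replacement locally. It assumes the ${name} placeholder style and a parameter named bizdate; both are illustrative, and the sketch does not reproduce how DataWorks itself resolves parameters.

```python
from string import Template


def render_partition(partition_template: str, **params: str) -> str:
    """Replace ${name} placeholders in a partition expression with the
    given values. Raises KeyError if a placeholder has no value."""
    return Template(partition_template).substitute(**params)


if __name__ == "__main__":
    # Hypothetical partition column "pt" and parameter "bizdate".
    print(render_partition("pt=${bizdate}", bizdate="20240101"))  # pt=20240101
```

A preview like this helps confirm which partition a scheduled run would read before you commit the node.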
Check the synchronization results
Go to the destination and check whether the data in the MaxCompute table is exported to the destination table:
If all data is exported, the synchronization is complete.
If no data is exported or only part of the data is exported, see Batch synchronization to troubleshoot the issue.