This topic describes how to export MaxCompute data to other data sources by using the Data Integration service of DataWorks.

Background information

You can use one of the following methods to export data:
  • Use the codeless user interface (UI). After you create a batch synchronization node in the DataWorks console, configure a source, a destination, and field mappings on the codeless UI to export data.
  • Use the code editor. After you create a batch synchronization node in the DataWorks console, switch to the code editor. Then, write code to configure a source, a destination, and field mappings to export data.

Limits

Each batch synchronization node can export data from only one table. If you want to export data from multiple tables, you must create multiple batch synchronization nodes.

Procedure

Perform the following steps to export data by using Data Integration:

  1. Add a MaxCompute data source to DataWorks.
  2. Add the destination data source to DataWorks.
  3. Create a workflow in the DataWorks console. The workflow is required when you create a batch synchronization node.
  4. Create a batch synchronization node in the workflow.
  5. Configure and run the batch synchronization node by using the codeless UI or the code editor.
  6. Check the synchronization results on the destination.

Add a MaxCompute data source to DataWorks

  1. Go to the Data Source page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides. Find your workspace and click Data Integration in the Actions column.
    4. On the page that appears, click Connection in the left-side navigation pane. The Data Source page appears.
  2. On the Data Source page, click New data source in the upper-right corner.
  3. In the Add data source dialog box, click MaxCompute (ODPS) in the Big Data Storage section.
  4. In the Add MaxCompute (ODPS) data source dialog box, set the parameters as required.
    • Data Source Name: the name of the data source. The name can contain letters, digits, and underscores (_), and must start with a letter.
    • Description: the description of the data source. The description can be up to 80 characters in length.
    • Applicable environment: the environment in which the data source is used. Valid values: Development and Production.
      Note This parameter is displayed only if the workspace is in standard mode.
    • ODPS Endpoint: the endpoint of the MaxCompute project. This parameter is read-only, and its value is automatically obtained from system configurations.
    • Tunnel Endpoint: the endpoint of the MaxCompute Tunnel service. For more information, see Configure endpoints.
    • ODPS project name: the name of the MaxCompute project.
    • AccessKey ID: the AccessKey ID of the account that is used to connect to the MaxCompute project. You can view the AccessKey ID on the Security Management page.
    • AccessKey Secret: the AccessKey secret of the account that is used to connect to the MaxCompute project.
  5. On the Data Integration tab, click Test connectivity in the Operation column of each resource group.
    A sync node uses only one resource group. To ensure that your sync nodes can be properly run, you must test the connectivity of all the resource groups for Data Integration on which your sync nodes will be run. If you need to test the connectivity of multiple resource groups for Data Integration at a time, select the resource groups and click Batch test connectivity. For more information, see Select a network connectivity solution.
  6. After the connection passes the connectivity test, click Complete.

Add the destination to DataWorks

Add the destination to the data source list of DataWorks based on the data source type. For more information about how to add a data source, see Configure data sources.

Create a workflow

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. After you select the region where the required workspace resides, find the workspace and click Data Analytics.
  4. On the DataStudio page, move the pointer over the Create icon and select Workflow.
  5. In the Create Workflow dialog box, set the Workflow Name and Description parameters.
    Notice The workflow name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
  6. Click Create.

Create a batch synchronization node

  1. Click the name of the workflow that you created in the previous step to expand it, and right-click Data Integration.
  2. Choose Create > Batch Synchronization.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Notice The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
  4. Click Commit.

Configure and run the batch synchronization node by using the codeless UI

  1. Configure the source.
    Select ODPS from the Connection drop-down list in the Source section and select the MaxCompute data source that you added. Then, select the source table from the Table drop-down list. If the table is a partitioned table, specify the Partition Key Column parameter.
  2. Configure the destination.
    Select the data source type of the destination from the Connection drop-down list in the Target section and select the name of the destination data source. Then, select the destination table from the Table drop-down list.
  3. Configure field mappings.
    Configure mappings between the fields in the source table and the fields in the destination table.
  4. Configure channel control.
    Set parameters such as the maximum transmission rate and the maximum number of dirty data records allowed based on your business requirements.
  5. Configure scheduling properties.
    On the Properties tab, configure scheduling properties, such as the scheduling parameters that are used to filter the data to be synchronized.
  6. In the top toolbar, click the Save icon to save the configurations and click the Run icon to run the batch synchronization node.

Configure and run the batch synchronization node by using the code editor

  1. Import a template.
    Select ODPS from the Source Connection Type drop-down list and select the added MaxCompute data source from the Connection drop-down list below it. Select the data source type of the destination from the Target Connection Type drop-down list and select the name of the destination data source from the Connection drop-down list below it. Then, click OK.
  2. Configure the source.
    Configure the source and the source table.
    {
        "stepType": "odps",
        "parameter": {
            "partition": [],
            "datasource": "odps_first",
            "envType": 0,
            "column": [
                "*"
            ],
            "table": ""
        },
        "name": "Reader",
        "category": "reader"
    },
    • stepType: the data source type of the source. Set this parameter to odps.
    • partition: the partition information of the source table. You can run the show partitions <table_name>; command to view the partition information of the table. For more information, see View partition information.
    • datasource: the name of the MaxCompute data source.
    • column: the names of the source columns from which you want to read data.
    • table: the name of the source table. You can run the show tables; command to view the table name. For more information, see List tables and views in a project.
    • name and category: Set name to Reader and category to reader. This way, the data source is configured as the source.
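    For example, assuming a daily-partitioned source table named sales_detail (the table name, column names, and partition key here are hypothetical), a filled-in reader fragment might look like the following sketch. ${bizdate} is a DataWorks scheduling parameter that is replaced with the data timestamp of the node at run time:

    {
        "stepType": "odps",
        "parameter": {
            "partition": ["pt=${bizdate}"],
            "datasource": "odps_first",
            "envType": 0,
            "column": ["order_id", "amount"],
            "table": "sales_detail"
        },
        "name": "Reader",
        "category": "reader"
    }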
  3. Configure the destination.
    Configure the destination and the destination table.
    {
        "stepType": "mysql",
        "parameter": {
            "partition": "",
            "truncate": true,
            "datasource": "",
            "column": [
                "*"
            ],
            "table": ""
        },
        "name": "Writer",
        "category": "writer"
    }
    • stepType: the data source type of the destination.
    • partition: the partition information of the destination table.
    • datasource: the name of the destination.
    • column: the names of the destination columns. Make sure that the destination columns map one-to-one to the source columns that are specified in Step 2.
    • table: the name of the destination table.
    • name and category: Set name to Writer and category to writer. This way, the data source is configured as the destination.
  4. Configure channel control.
    "setting": {
        "errorLimit": {
            "record": "1024"
        },
        "speed": {
            "throttle": false,
            "concurrent": 1
        }
    },
    • record: the maximum number of dirty data records allowed.
    • throttle: specifies whether throttling is enabled.
    • concurrent: the maximum number of parallel threads that the batch synchronization node uses to read data from the source or write data to the destination.
  5. Configure scheduling properties.
  6. In the top toolbar, click the Save icon to save the configurations and click the Run icon to run the batch synchronization node.
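
For reference, the fragments in the preceding steps assemble into one script. The following sketch shows the overall shape, assuming the common DataWorks script-mode layout in which the reader and writer appear under steps and are connected in the order section; the data source and table names are placeholders that you replace with your own values:

    {
        "type": "job",
        "version": "2.0",
        "steps": [
            {
                "stepType": "odps",
                "parameter": {
                    "partition": [],
                    "datasource": "odps_first",
                    "envType": 0,
                    "column": ["*"],
                    "table": "your_source_table"
                },
                "name": "Reader",
                "category": "reader"
            },
            {
                "stepType": "mysql",
                "parameter": {
                    "partition": "",
                    "truncate": true,
                    "datasource": "your_mysql_datasource",
                    "column": ["*"],
                    "table": "your_destination_table"
                },
                "name": "Writer",
                "category": "writer"
            }
        ],
        "setting": {
            "errorLimit": {
                "record": "1024"
            },
            "speed": {
                "throttle": false,
                "concurrent": 1
            }
        },
        "order": {
            "hops": [
                {
                    "from": "Reader",
                    "to": "Writer"
                }
            ]
        }
    }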

Check the synchronization results

Go to the destination and check whether the data in the MaxCompute table is exported to the destination table:
  • If all data is exported, the synchronization is complete.
  • If no data is exported or some data failed to be exported, see FAQ about batch synchronization.