
Tablestore:Export full data to MaxCompute

Last Updated: Nov 25, 2025

Use the data integration feature in DataWorks to export full data from Tablestore to MaxCompute for offline analysis and processing. DataWorks is a platform for big data development and governance.

Prerequisites

Before you export data, complete the following preparations:

Note

If your DataWorks workspace and Tablestore instance are in different regions, you must create a VPC peering connection to enable cross-region network connectivity.

Create a VPC peering connection for cross-region network connectivity

The following example assumes that the source Tablestore instance is in the China (Shanghai) region and the DataWorks workspace is in the China (Hangzhou) region.

  1. Attach a VPC to the Tablestore instance.

    1. Log on to the Tablestore console. In the top navigation bar, select the region where the Tablestore instance resides.

    2. Click the instance alias to navigate to the Instance Management page.

    3. On the Network Management tab, click Bind VPC. Select a VPC and vSwitch, enter a VPC name, and then click OK.

    4. Wait for the VPC to attach. The page automatically refreshes to display the VPC ID and VPC Address in the VPC list.

      Note

      When you add a Tablestore data source in the DataWorks console, you must use this VPC address.


  2. Obtain the VPC information for the DataWorks workspace resource group.

    1. Log on to the DataWorks console. In the top navigation bar, select the region where your workspace is located. In the navigation pane on the left, click Workspace to go to the Workspaces page.

    2. Click the workspace name to go to the Workspace Details page. In the left navigation pane, click Resource Group to view the resource groups attached to the workspace.

    3. To the right of the target resource group, click Network Settings. In the Data Scheduling & Data Integration section, view the VPC ID of the attached virtual private cloud.

  3. Create a VPC peering connection and configure routes.

    1. Log on to the VPC console. In the navigation pane on the left, click VPC Peering Connection and then click Create VPC Peering Connection.

    2. On the Create VPC Peering Connection page, enter a name for the peering connection and select the requester VPC instance, accepter account type, accepter region, and accepter VPC instance. Then, click OK.

    3. On the VPC Peering Connection page, find the VPC peering connection and click Configure route in both the Requester VPC and Accepter VPC columns.

      For the destination CIDR block, enter the CIDR block of the peer VPC. For example, when you configure a route entry for the requester VPC, enter the CIDR block of the accepter VPC. When you configure a route entry for the accepter VPC, enter the CIDR block of the requester VPC.
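
      For example, suppose the requester VPC in China (Shanghai) uses the hypothetical CIDR block 172.16.0.0/16 and the accepter VPC in China (Hangzhou) uses 192.168.0.0/16. You would add a route entry with destination 192.168.0.0/16 to the requester VPC's route table and a route entry with destination 172.16.0.0/16 to the accepter VPC's route table, each pointing to the peering connection as the next hop.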

Procedure

Follow these steps to configure a full data export from Tablestore to MaxCompute.

Step 1: Add a Tablestore data source

Configure a Tablestore data source in DataWorks to connect to the source data table.

  1. Log on to the DataWorks console and switch to the region where your workspace resides. In the navigation pane on the left, choose Data Integration > Data Integration. From the drop-down list, select the workspace and click Go to Data Integration.

  2. In the navigation pane on the left, click Data source.

  3. On the Data Sources page, click Add Data Source.

  4. In the Add Data Source dialog box, search for and select Tablestore as the data source type.

  5. In the Add OTS Data Source dialog box, configure the following parameters:

    • Data Source Name: The name must consist of letters, digits, and underscores (_), and cannot start with a digit or an underscore (_).

    • Data Source Description: A brief description of the data source, up to 80 characters in length.

    • Region: The region where the Tablestore instance resides.

    • Tablestore Instance Name: The name of the Tablestore instance.

    • Endpoint: The endpoint of the Tablestore instance. Use the VPC address from the VPC list on the Network Management tab.

    • AccessKey ID and AccessKey Secret: The AccessKey pair of the Alibaba Cloud account or RAM user.
  6. Test the resource group connectivity.

    When you create a data source, you must test the connectivity of the resource group to ensure that the resource group for the sync task can connect to the data source. Otherwise, the data sync task cannot run.

    1. In the Connection Configuration section, click Test Network Connectivity in the Connection Status column for the resource group.

    2. After the connectivity test passes, click Complete. The new data source appears in the data source list.

      If the connectivity test fails, use the Network Connectivity Diagnostic Tool to troubleshoot the issue.
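
You can optionally double-check connectivity outside the console by listing the instance's tables with the Tablestore Python SDK from a machine in the attached VPC. The following is a minimal sketch; the endpoint, instance name, and credentials are placeholders.

    # Optional check with the Tablestore Python SDK (pip install tablestore).
    # All values are placeholders; use your VPC endpoint and credentials.
    from tablestore import OTSClient

    client = OTSClient(
        'https://myinstance.cn-shanghai.vpc.tablestore.aliyuncs.com',  # VPC endpoint
        '<AccessKey ID>',
        '<AccessKey Secret>',
        'myinstance'  # instance name
    )
    print(client.list_table())  # returns the names of the tables in the instance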

Step 2: Add a MaxCompute data source

Configure a MaxCompute data source to use as the destination for the data export.

  1. Click Add Data Source again. Select MaxCompute as the data source type and configure the following parameters:

    • Data Source Name: The name must consist of letters, digits, and underscores (_), and cannot start with a digit or an underscore (_).

    • Data Source Description: A brief description of the data source, up to 80 characters in length.

    • Authentication Method: The default value is Alibaba Cloud Account And Alibaba Cloud RAM Role and cannot be changed.

    • Alibaba Cloud Account:

      • Current Alibaba Cloud Account: Select the MaxCompute Project Name and Default Access Identity for the current account in the specified region.

      • Other Alibaba Cloud Account: Enter the UID of Alibaba Cloud Account, the MaxCompute Project Name, and the RAM Role for the other account in the specified region.

    • Region: The region where the MaxCompute project resides.

    • Endpoint: The default value is Auto Fit. You can also select Custom Configuration as needed.

  2. After you configure the parameters and the connectivity test succeeds, click Complete to add the data source.
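
You can likewise verify MaxCompute access outside the console with PyODPS, the MaxCompute Python SDK. This is an optional sketch; the project name, endpoint, and credentials are placeholders.

    # Optional check with PyODPS (pip install pyodps).
    # All values are placeholders; use your own project, endpoint, and credentials.
    from odps import ODPS

    o = ODPS(
        '<AccessKey ID>',
        '<AccessKey Secret>',
        project='my_project',
        endpoint='https://service.cn-hangzhou.maxcompute.aliyun.com/api'
    )
    print(o.exist_table('target_table'))  # True if the destination table exists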

Step 3: Configure a batch synchronization task

Create a data sync task to define the data transfer rules and field mappings from Tablestore to MaxCompute.

Create a task node

  1. Go to the Data Development page.

    1. Log on to the DataWorks console.

    2. In the top navigation bar, select a resource group and a region.

    3. In the navigation pane on the left, click Data Development and O&M > Data Development.

    4. Select the target workspace and click Go to Data Studio.

  2. On the Data Studio console, click the create icon to the right of Workspace Directories, and then select Create Node > Data Integration > Batch Synchronization.

  3. In the Create Node dialog box, select a Path. Set the data source to Tablestore and the destination to MaxCompute(ODPS). Enter a Name and click OK.

Configure the sync task

In the Workspace Directories section, click the new batch synchronization node to open it. You can configure the sync task in the codeless UI or in the code editor.

Codeless UI (default)

Configure the following items in the codeless UI:

  • Data Source: Select the source and destination data sources.

  • Runtime Resource: Select a resource group. The system automatically tests the data source connectivity after you make a selection.

  • Source: Configure the following parameters for the Tablestore source.

    • Table: Select the source data table from the drop-down list.

    • Primary Key Range (Start): Specify the start of the primary key range from which to read data. The value must be in the JSON array format. inf_min represents an infinitely small value.

      For example, if the primary key consists of an int column named id and a string column named name, the sample configuration is as follows:

      Specific primary key range:

      [
        {
          "type": "int",
          "value": "000"
        },
        {
          "type": "string",
          "value": "aaa"
        }
      ]

      Full data:

      [
        {
          "type": "inf_min"
        },
        {
          "type": "inf_min"
        }
      ]
    • Primary Key Range (End): Specify the end of the primary key range from which to read data. The value must be in the JSON array format. inf_max represents an infinitely large value.

      For example, if the primary key consists of an int column named id and a string column named name, the sample configuration is as follows:

      Specific primary key range:

      [
        {
          "type": "int",
          "value": "999"
        },
        {
          "type": "string",
          "value": "zzz"
        }
      ]

      Full data:

      [
        {
          "type": "inf_max"
        },
        {
          "type": "inf_max"
        }
      ]
    • Splitting Configuration: Custom split configuration in the JSON array format. In most cases you do not need a custom split; leave this parameter set to [].

      If the data in Tablestore has hot spots and the automatic splitting policy of Tablestore Reader is ineffective, specify custom split points that fall within the start and end primary key range. You only need to provide values for the partition key (shard key), not for all primary key columns; see the example at the end of this section.

  • Destination: Configure the following items. Keep the default values for other parameters or modify them as needed.

    • Production Project Name: The name of the MaxCompute project that is associated with the destination data source. This parameter is automatically populated.

    • Tunnel Resource Group: The default value is Common transmission resources. This resource group uses the free quota of MaxCompute. Select a dedicated Tunnel resource group as needed.

    • Table: Select the destination table. Click Generate Target Table Schema to automatically create the destination table.

    • Partition: Specify the partition where the synchronized data is saved. This parameter can be used for daily incremental synchronization.

    • Write Mode: Select whether to clear existing data before writing or to append data to the table.

  • Destination Field Mapping: Configure the field mapping from the source table to the destination table. The system automatically maps fields based on the source table. Modify the mapping as needed.

After you complete the configuration, click Save at the top of the page.
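
If you do configure a custom split, the value uses the same point format as the primary key range, with entries only for the partition key. The following sketch assumes the int partition key id from the earlier examples and splits the full range into three slices at the hypothetical values 300 and 600:

    [
      {
        "type": "int",
        "value": "300"
      },
      {
        "type": "int",
        "value": "600"
      }
    ]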

Code editor

Click Code Editor at the top of the page to edit the script in the editor.

The following sample script is for a source table whose primary key consists of an int column named id and a string column named name, with an int attribute column named age. In your script, replace the datasource and table values for both the reader and the writer with your actual values.
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "ots",
            "parameter": {
                "datasource": "source_data",
                "column": [
                    {
                        "name": "id",
                        "type": "INTEGER"
                    },
                    {
                        "name": "name",
                        "type": "STRING"
                    },
                    {
                        "name": "age",
                        "type": "INTEGER"
                    }
                ],
                "range": {
                    "begin": [
                        {
                            "type": "inf_min"
                        },
                        {
                            "type": "inf_min"
                        }
                    ],
                    "end": [
                        {
                            "type": "inf_max"
                        },
                        {
                            "type": "inf_max"
                        }
                    ],
                    "split": []
                },
                "table": "source_table",
                "newVersion": "true"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "odps",
            "parameter": {
                "partition": "pt=${bizdate}",
                "truncate": true,
                "datasource": "target_data",
                "tunnelQuota": "default",
                "column": [
                    "id",
                    "name",
                    "age"
                ],
                "emptyAsNull": false,
                "guid": null,
                "table": "source_table",
                "consistencyCommit": false
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "concurrent": 2,
            "throttle": false
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

After you finish editing the script, click Save at the top of the page.
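
For reference, the begin and end settings above with inf_min and inf_max describe an unbounded scan over the entire table. The following optional sketch shows the equivalent read with the Tablestore Python SDK; client is the placeholder OTSClient from the Step 1 sketch, and a recent SDK version is assumed.

    # Equivalent full-range scan with the Tablestore Python SDK.
    # 'client' is the placeholder OTSClient from the Step 1 sketch.
    from tablestore import INF_MIN, INF_MAX, Direction

    start_pk = [('id', INF_MIN), ('name', INF_MIN)]  # begin: [inf_min, inf_min]
    end_pk = [('id', INF_MAX), ('name', INF_MAX)]    # end: [inf_max, inf_max]
    next_pk = start_pk
    while next_pk is not None:
        consumed, next_pk, rows, _ = client.get_range(
            'source_table', Direction.FORWARD, next_pk, end_pk, limit=100
        )
        for row in rows:
            print(row.primary_key, row.attribute_columns)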

Run the sync task

  1. Click Debug Configuration on the right side of the page. Select a resource group to run the task and add Script Parameters.

    • bizdate: The value of the ${bizdate} variable referenced in the partition configuration, which determines the destination partition of the MaxCompute table. For example, 20251120.

  2. Click Run at the top of the page to start the sync task.

Step 4: View the synchronization result

After the sync task is complete, view its execution status in the logs and check the synchronized data in the DataWorks console.

  1. View the task running status and result at the bottom of the page. The following log information indicates that the sync task ran successfully.

    2025-11-18 11:16:23 INFO Shell run successfully!
    2025-11-18 11:16:23 INFO Current task status: FINISH
    2025-11-18 11:16:23 INFO Cost time is: 77.208s
  2. View the synchronized data in the destination table.

    Go to the DataWorks console. In the navigation pane on the left, click Data Governance > Data Map. Then, click Go to Data Map to view the destination table and its data.
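
    You can also spot-check the partition with an SQL query through PyODPS. A minimal sketch, reusing the placeholder ODPS client o from the Step 2 sketch and assuming the destination table target_table and the partition written above:

    # Spot-check the synchronized partition with PyODPS.
    # 'o' is the placeholder ODPS client from the Step 2 sketch.
    with o.execute_sql(
        "SELECT COUNT(*) FROM target_table WHERE pt = '20251120'"
    ).open_reader() as reader:
        for record in reader:
            print(record)  # row count in the synchronized partition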
