Tablestore:Data integration

Last Updated:May 12, 2026

If your business workloads demand high concurrency, scalability, and availability, or require complex searches and big data analysis, and your current database architecture is inadequate or too costly to upgrade, use DataWorks Data Integration to migrate data from your existing databases to Tablestore. DataWorks Data Integration can also migrate Tablestore data across different instances and accounts, or to OSS and MaxCompute for backup and analysis.

Use cases

DataWorks Data Integration is a stable, efficient, and scalable data synchronization platform. It supports data migration and data synchronization between various heterogeneous data sources, such as MySQL, Oracle, MaxCompute, and Tablestore.

You can use DataWorks Data Integration for various Tablestore data migration scenarios. These include migrating data from databases to Tablestore, synchronizing Tablestore data across instances or accounts, and migrating Tablestore data to OSS or MaxCompute.

Database to Tablestore

DataWorks Data Integration lets you migrate data from various heterogeneous data sources to Tablestore.

Across instances or accounts

You can copy data from Tablestore data tables or time series tables by configuring the appropriate Reader and Writer plug-ins in DataWorks. The following table describes the relevant plug-ins.

| Plug-in | Description |
| --- | --- |
| OTSReader | Reads data from Tablestore tables. You can also specify a data range for incremental data extraction. |
| OTSStreamReader | Incrementally exports data from Tablestore tables. |
| OTSWriter | Writes data to Tablestore. |
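To make the Reader and Writer pairing more concrete, the following Python sketch assembles a job description that reads a whole table with OTSReader and writes it with OTSWriter. The key names inside each "parameter" block are assumptions modeled on the open-source DataX otsreader and otswriter plug-ins, and the endpoints, instance names, and column names are placeholders. Check them against the plug-in documentation for your DataWorks version before use.

```python
import json


def build_copy_job(source, destination, table, primary_key, attribute_columns):
    """Return a DataX-style job dict that copies one Tablestore table to another.

    `source` and `destination` are dicts with the hypothetical keys
    "endpoint" and "instance". The parameter key names are assumptions
    modeled on the open-source DataX plug-ins.
    """
    pk_names = [pk["name"] for pk in primary_key]
    reader = {
        "name": "otsreader",
        "parameter": {
            "endpoint": source["endpoint"],
            "instanceName": source["instance"],
            "accessId": "<source-access-key-id>",
            "accessKey": "<source-access-key-secret>",
            "table": table,
            # Read the primary key columns first, then the attribute columns.
            "column": [{"name": n} for n in pk_names]
            + [{"name": c["name"]} for c in attribute_columns],
            # Full-table range: one bound per primary key column.
            "range": {
                "begin": [{"type": "INF_MIN"} for _ in pk_names],
                "end": [{"type": "INF_MAX"} for _ in pk_names],
            },
        },
    }
    writer = {
        "name": "otswriter",
        "parameter": {
            "endpoint": destination["endpoint"],
            "instanceName": destination["instance"],
            "accessId": "<destination-access-key-id>",
            "accessKey": "<destination-access-key-secret>",
            "table": table,
            "primaryKey": primary_key,
            "column": attribute_columns,
            "writeMode": "PutRow",
        },
    }
    return {"job": {"content": [{"reader": reader, "writer": writer}]}}


job = build_copy_job(
    source={"endpoint": "<source-endpoint>", "instance": "src_instance"},
    destination={"endpoint": "<destination-endpoint>", "instance": "dst_instance"},
    table="orders",
    primary_key=[{"name": "order_id", "type": "string"}],
    attribute_columns=[{"name": "amount", "type": "double"}],
)
print(json.dumps(job, indent=2))
```

Keeping the two accounts' credentials in separate reader and writer blocks is what allows the same job to copy data across instances or accounts.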

To OSS or MaxCompute

You can migrate data from Tablestore to OSS or MaxCompute.

  • MaxCompute is a fast and fully managed data warehouse service that processes data at the terabyte or petabyte scale. You can use MaxCompute to back up data from Tablestore or migrate data to MaxCompute for processing.

  • OSS is a highly secure, cost-effective, and reliable cloud storage service that can store massive amounts of data. You can use OSS to back up data from Tablestore or synchronize data to OSS and then download the data as files to a local computer.

Migration solutions

Use DataWorks Data Integration to migrate data between Tablestore and other data sources.

  • Data import solutions allow you to synchronize data from sources like MySQL, Oracle, Kafka, HBase, MaxCompute, and PolarDB-X 2.0 to Tablestore. You can also synchronize data between Tablestore data tables or time series tables.

  • Data export solutions allow you to synchronize data from Tablestore to MaxCompute and OSS.

Import data

Migration solution

Description

Synchronize MySQL data to Tablestore

You can migrate data from MySQL databases to Tablestore data tables only.

This process uses the MySQL Reader and Tablestore Writer scripts. Configure the data sources as follows:

Synchronize Oracle data to Tablestore

You can migrate data from Oracle databases to Tablestore data tables only.

This process uses the Oracle Reader and Tablestore Writer scripts. Configure the data sources as follows:

Synchronize Kafka data to Tablestore

You can migrate data from Kafka to Tablestore data tables or time series tables.

Important

This process uses the Kafka Reader and Tablestore Writer scripts. Configure the data sources as follows:

Synchronize HBase data to Tablestore

You can migrate data from HBase databases to Tablestore data tables only.

This process uses the HBase Reader and Tablestore Writer scripts. Configure the data sources as follows:

Synchronize MaxCompute data to Tablestore

You can migrate data from MaxCompute to Tablestore data tables only.

This process uses the MaxCompute Reader and Tablestore Writer scripts. Configure the data sources as follows:

Synchronize PolarDB-X 2.0 data to Tablestore

You can migrate data from PolarDB-X 2.0 to Tablestore data tables only.

This process uses the PolarDB-X 2.0 Reader and Tablestore Writer scripts. Configure the data sources as follows:

Synchronize data between Tablestore data tables

You can migrate data between Tablestore data tables.

This process uses the Tablestore Reader and Writer scripts. For details about data source configuration, see Tablestore data source. When configuring the scripts, refer to the instructions for reading and writing wide table data.

Synchronize data between Tablestore time series tables

You can migrate data between Tablestore time series tables.

This process uses the Tablestore Reader and Writer scripts. For details about data source configuration, see Tablestore data source. When configuring the scripts, refer to the instructions for reading and writing time series data.

Export data

Migration solution

Description

Synchronize Tablestore data to MaxCompute

You can use MaxCompute to back up Tablestore data or migrate the data to MaxCompute.

This process uses the Tablestore Reader and MaxCompute Writer scripts. Configure the data sources as follows:

Synchronize Tablestore data to OSS

You can download files synchronized to OSS at any time or store them in OSS as backups.

This process uses the Tablestore Reader and OSS Writer scripts. Configure the data sources as follows:

Prerequisites

After selecting a migration solution, complete the following prerequisites:

  • Ensure network connectivity between DataWorks and both the source and destination.

  • For the source, confirm its version, prepare the required account, and configure the necessary permissions and service-specific settings. For more information, see the source documentation.

  • For the destination, activate the service and create the required resources. For more information, see the destination documentation.

Usage notes

Important

If you encounter any issues, submit a ticket.

  • Ensure that DataWorks Data Integration supports data migration for your specific product version.

  • The data types in the source and destination data sources must match. Otherwise, the migration will result in dirty data.

  • After you select a migration solution, carefully review the limits and usage notes for your source and destination data sources.

  • Before you migrate Kafka data, choose the Tablestore data model that best fits your business scenario.

Configuration process

Based on your migration solution, follow this process to migrate data using DataWorks Data Integration.

The following table describes the steps in the process.

No.

Step

Description

1

Add source and destination data sources

Create the required data sources based on your migration solution.

  • For data import operations, the destination data source is Tablestore. The source data source can be MySQL, MaxCompute, or another Tablestore data source.

  • For data export operations, the source data source is Tablestore. The destination data source can be MaxCompute or OSS.

2

Configure using the codeless UI

DataWorks Data Integration provides a codeless UI with step-by-step guidance. You enter settings in a visual interface and follow the prompts to configure a batch synchronization task. The codeless UI is easy to learn but lacks some advanced features.

3

Verify the migration results

View the data in the destination data source based on your migration solution.

  • After importing data, view the data in the Tablestore console.

  • After exporting data, view the exported data in the MaxCompute console or the OSS console.

Configuration example

Data import

Use DataWorks Data Integration to synchronize data from databases like MySQL, Oracle, and MaxCompute to Tablestore data tables, or to synchronize Tablestore data across accounts or instances. Examples include synchronizing data from one data table to another.

The following example shows how to use the codeless UI to synchronize data from MaxCompute to a Tablestore data table.

Prerequisites

  • Obtain the information about the MaxCompute project and the source table.

  • Activate Tablestore, and create an instance and a destination data table. Obtain the instance name, instance endpoint, and region ID.

    Note

    You do not need to pre-define attribute columns for the data table; they are dynamically specified when data is written.

  • Create an AccessKey for your Alibaba Cloud account or for a RAM user with permissions on Tablestore and MaxCompute.

  • Activate DataWorks and create a workspace in the same region as your MaxCompute or Tablestore instance.

  • Create a serverless resource group and bind it to the workspace.

Important

If your MaxCompute and Tablestore instances are in different regions, create a VPC peering connection to establish cross-region network connectivity.

Create a VPC peering connection for cross-region connectivity

This example assumes the DataWorks workspace and MaxCompute instance are in the China (Hangzhou) region, and the Tablestore instance is in the China (Shanghai) region.

  1. Bind a VPC to the Tablestore instance.

    1. Log on to the Tablestore console and select a region in the top navigation bar.

    2. Click the instance alias to go to the Instance Management page.

    3. Go to the Network Management tab, click Bind VPC, select a VPC and a vSwitch, enter a name for the VPC, and then click Yes.

    4. After the VPC is bound, the page refreshes automatically. You can then view the VPC ID and VPC endpoint in the VPC list.

      Note

      Use this VPC endpoint when you add a Tablestore data source in the DataWorks console.

  2. Obtain the VPC information of the DataWorks workspace resource group.

    1. Log on to the DataWorks console, select the region of the workspace from the top navigation bar, and then click Workspaces in the navigation pane on the left to go to the Workspace list page.

    2. Click the workspace name to go to the Workspace Details page. In the navigation pane on the left, click Resource Group to view the list of resource groups bound to the workspace.

    3. Click Network Settings to the right of the target resource group. In the Resource Scheduling & Data Integration area, view the VPC ID of the bound VPC.

  3. Create a VPC peering connection and configure routes.

    1. Log on to the VPC console. In the navigation pane on the left, click VPC. Select the regions of the Tablestore instance and the DataWorks workspace, and record the CIDR block for each VPC.

    2. In the navigation pane on the left, click VPC Peering Connection. On the VPC Peering Connection page, click Create VPC Peering Connection.

    3. On the Create VPC Peering Connection page, enter a name for the peering connection, select the requester VPC, accepter account type, accepter region, and accepter VPC, and then click OK.

    4. On the VPC Peering Connection page, find the VPC peering connection that you created. In the Requester VPC and Accepter VPC columns, click Configure route.

      The destination CIDR block must be the CIDR block of the peer VPC. When you configure a route for the requester VPC, enter the CIDR block of the accepter VPC. When you configure a route for the accepter VPC, enter the CIDR block of the requester VPC.
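The route rule above can be sketched in a few lines of Python using the standard ipaddress module. The function name and the sample CIDR blocks are hypothetical, and the overlap check is an added precaution rather than a requirement stated in this topic.

```python
import ipaddress


def peering_routes(requester_cidr, accepter_cidr):
    """Return the destination CIDR block to enter on each side of the peering."""
    requester = ipaddress.ip_network(requester_cidr)
    accepter = ipaddress.ip_network(accepter_cidr)
    # Overlapping address ranges make the routes ambiguous, so fail early.
    if requester.overlaps(accepter):
        raise ValueError(f"{requester} overlaps {accepter}")
    return {
        # The route in the requester VPC points at the accepter's CIDR block.
        "requester_vpc_route": str(accepter),
        # The route in the accepter VPC points at the requester's CIDR block.
        "accepter_vpc_route": str(requester),
    }


# Example: DataWorks resource group VPC (requester) and Tablestore VPC (accepter).
print(peering_routes("192.168.0.0/16", "172.16.0.0/12"))
```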

Step 1: Add a Tablestore data source and a MaxCompute data source

This topic describes how to add a data source, using a Tablestore data source as an example.

Note

To add a MaxCompute data source, simply search for and select MaxCompute as the data source type in the Add Data Source dialog box. Then, configure the data source parameters.

  1. Go to the Data Integration page.

    Log on to the DataWorks console. After switching to the destination region, click Data Integration > Data Integration in the navigation pane on the left. In the drop-down list, select the corresponding workspace and click Go to Data Integration.

  2. In the navigation pane on the left, click Data Source.

  3. On the Data Sources page, click Add Data Source.

  4. In the Add Data Source dialog box, select Tablestore as the data source type.

  5. In the Add Tablestore Data Source dialog box, set the data source parameters as outlined in the table below.

    | Parameter | Description |
    | --- | --- |
    | Data Source Name | The name of the data source. The name can contain only letters, digits, and underscores (_), and must start with a letter. |
    | Data Source Description | The description of the data source. The description cannot exceed 80 characters in length. |
    | Region | Select the region where the Tablestore instance is located. |
    | Tablestore Instance Name | The name of the Tablestore instance. |
    | Endpoint | The endpoint of the Tablestore instance. We recommend that you use the VPC address. |
    | AccessKey ID | The AccessKey ID of an Alibaba Cloud account or a Resource Access Management (RAM) user. |
    | AccessKey Secret | The AccessKey secret of an Alibaba Cloud account or a Resource Access Management (RAM) user. |

  6. Test the connectivity of the resource group. You must perform this test when you create a data source to ensure that the resource group used by the sync task can connect to the data source. Otherwise, the data synchronization task cannot run properly.

    1. In the Connection Configuration section, for the target resource group, click Test Network Connectivity in the Connection Status column.

    2. After the connectivity test passes, the Connectivity Status changes to Connected. Click Complete. The new data source then appears in the data source list.

      Note

      If the connectivity test fails and the status is Failed, use the connectivity diagnostic tool to troubleshoot the issue. If the resource group still cannot connect to the data source, submit a ticket.
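The naming rules for Data Source Name and Data Source Description in the parameter table above can be checked before you open the console. This is a minimal sketch: the function name is hypothetical, and it assumes that "letters" means ASCII letters, which the table does not state explicitly.

```python
import re

# Starts with a letter, then letters, digits, or underscores only.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
MAX_DESCRIPTION_LENGTH = 80


def validate_data_source(name, description=""):
    """Return a list of rule violations; an empty list means the input is valid."""
    errors = []
    if not NAME_PATTERN.fullmatch(name):
        errors.append(
            "name must start with a letter and contain only letters, digits, and underscores"
        )
    if len(description) > MAX_DESCRIPTION_LENGTH:
        errors.append(f"description exceeds {MAX_DESCRIPTION_LENGTH} characters")
    return errors


print(validate_data_source("tablestore_dst", "Destination instance"))
print(validate_data_source("1st-source"))
```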

Step 2: Configure batch synchronization

DataStudio (old version)

1. Create a task node
  1. Go to the Data Development page.

    1. Log on to the DataWorks console.

    2. In the top navigation bar, select a resource group and region.

    3. In the navigation pane on the left, click Data Development and O&M > Data Development.

    4. On the Data Development page, select the target workspace from the drop-down list and click the Go to Data Development button.

  2. On the Data Development page of the Data Studio console, under the Business Flow node, click the target business flow.

    For more information, see Create a business flow.

  3. Right-click the Data Integration node and choose Create Node > Batch Synchronization.

  4. In the Create Node dialog box, select a path, enter a name, and click OK.

    The new batch synchronization node is displayed under the Data Integration node.

2. Configure the synchronization task
  1. Under the Data Integration node, double-click the new batch synchronization task node.

  2. Configure the network and resources.

    Select the data source, data destination, and the resource group for the batch synchronization task, and then test the connectivity.

    1. In the Network and Resource Configuration step, set Data Source to MaxCompute(ODPS) and select the new MaxCompute data source for Data Source Name.

    2. Select a resource group.

      After you select a resource group, the system displays information such as the region and specifications of the resource group. The system also automatically tests the connectivity between the resource group and the selected data source.

      Note

      Serverless resource groups let you specify an upper limit for the number of CUs that a synchronization task can use. If an out-of-memory (OOM) error occurs because of insufficient resources, increase the CU limit for the resource group.

    3. Set Destination to Tablestore and select the new Tablestore data source for Data Source Name.

      The system automatically tests the connectivity between the resource group and the selected data source.

    4. After the connectivity test is passed, click Next.

  3. Configure and save the task.

    1. In the Configure Task step, configure the data source and destination in the Configure Source and Destination section as needed.

      Data source

      Parameter

      Description

      Data source

      The MaxCompute data source that you selected in the previous step.

      Tunnel resource group

      This is the Tunnel Quota. By default, Public Transport Resources is selected, which is the free quota for MaxCompute.

      This is the data transport resource for MaxCompute. For more information, see Purchase and use a dedicated resource group for data transport.

      Note

      If a dedicated Tunnel quota is unavailable because of an overdue payment or expiration, the task automatically switches to use public transport resources during runtime.

      Table

      The source table.

      Filter method

      The logic to filter the data that you want to synchronize. The following methods are supported:

      • Partition Filter: Use a partition expression to specify the range of source data to synchronize. If you select this method, you must also configure the Partition Information and If a Partition Does Not Exist parameters.

      • Data Filtering: Use a WHERE clause to specify the range of source data to synchronize. You do not need to enter the WHERE keyword.

      Partition information

      Note

      This parameter is required when you set Filter Method to Partition Filter.

      Specify the value of the partition column.

      • The value can be a fixed value, such as ds=20220101.

      • The value can be a scheduling parameter, such as ds=${bizdate}. The scheduling parameter is automatically replaced with its actual value at runtime.

      If a partition does not exist

      Note

      This parameter is required when you set Filter Method to Partition Filter.

      The policy for the synchronization task if a partition does not exist.

      • Report an error.

      • Ignore the missing partitions and run the task normally.

      Data destination

      Parameter

      Description

      Data source

      The Tablestore data source that you selected in the previous step.

      Table

      The destination data table.

      Primary key information

      The primary key information of the destination data table.

      Write mode

      The mode to write data to Tablestore. The following modes are supported:

      • PutRow: This mode corresponds to the PutRow API of Tablestore. It inserts data into a specified row. If the row does not exist, a new row is created. If the row exists, the original row is overwritten.

      • UpdateRow: This mode corresponds to the UpdateRow API of Tablestore. It updates the data in a specified row. If the row does not exist, a new row is created. If the row exists, the values of specified columns in the row are added, modified, or deleted based on the request.

    2. Configure field mapping.

      After you configure the data source and destination, you must specify the mapping between Source Field and Destination Field. The task writes data from the source fields to the destination fields of the corresponding data types based on the mapping. For more information, see 4. Configure field mapping.

      Important
      • You must specify the primary key information in Source Field to read the primary key data.

      • Because you already configured the primary key of the destination table in the Primary Key Information section in the previous step, do not configure primary key information again in Destination Field.

      • If a field has the INTEGER data type, you must configure it as INT. DataWorks automatically converts it to the INTEGER type. If you directly configure the data type as INTEGER, an error is reported in the logs and the task fails.

    3. Configure channel control.

      Use channel control to manage the properties of the data synchronization process. For more information about the parameters, see Relationship between concurrency and rate limiting for batch synchronization.

    4. Click the save icon to save the configuration.
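The Partition Information and missing-partition settings described in the source table above can be illustrated with a short Python sketch. It assumes that ${bizdate} resolves to the day before the scheduled run date in yyyymmdd format. The function names and the "error" and "ignore" policy values are hypothetical and only mirror the two options listed above.

```python
from datetime import date, timedelta


def render_partition(expression, run_date):
    """Replace ${bizdate} with the day before run_date, formatted as yyyymmdd."""
    bizdate = (run_date - timedelta(days=1)).strftime("%Y%m%d")
    return expression.replace("${bizdate}", bizdate)


def resolve_partition(expression, run_date, existing_partitions, on_missing="error"):
    """Return the partition to read, or None when a missing partition is ignored."""
    partition = render_partition(expression, run_date)
    if partition in existing_partitions:
        return partition
    if on_missing == "error":
        raise LookupError(f"partition {partition} does not exist")
    return None  # "ignore": nothing to read, the task still finishes normally


existing = {"ds=20220101", "ds=20220102"}
print(resolve_partition("ds=${bizdate}", date(2022, 1, 2), existing))  # ds=20220101
print(resolve_partition("ds=${bizdate}", date(2022, 1, 5), existing, on_missing="ignore"))  # None
```

A fixed value such as ds=20220101 passes through render_partition unchanged, which matches the first option for the partition value.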

3. Run the synchronization task
  1. Click the run icon.

  2. In the Parameters dialog box, select a resource group for the run.

  3. Click Run.
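The three field mapping rules in the Important note of the field mapping step can be expressed as a small pre-flight check. This sketch is illustrative only: the field dictionaries with "name" and "type" keys are a hypothetical representation and not the format that DataWorks stores.

```python
def check_field_mapping(primary_keys, source_fields, destination_fields):
    """Return a list of problems found in a source-to-destination field mapping."""
    problems = []
    source_names = {field["name"] for field in source_fields}

    # Rule 1: the source fields must include every primary key column.
    for pk in primary_keys:
        if pk not in source_names:
            problems.append(f"primary key '{pk}' is missing from Source Field")

    for field in destination_fields:
        # Rule 2: primary keys are configured separately, not in Destination Field.
        if field["name"] in primary_keys:
            problems.append(f"primary key '{field['name']}' must not appear in Destination Field")
        # Rule 3: integer columns must be declared as INT, not INTEGER.
        if field.get("type", "").upper() == "INTEGER":
            problems.append(f"field '{field['name']}' must use type INT instead of INTEGER")
    return problems


problems = check_field_mapping(
    primary_keys=["order_id"],
    source_fields=[{"name": "amount", "type": "DOUBLE"}, {"name": "qty", "type": "BIGINT"}],
    destination_fields=[{"name": "amount", "type": "DOUBLE"}, {"name": "qty", "type": "INTEGER"}],
)
for problem in problems:
    print(problem)
```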

New version of DataStudio

1. Create a task node
  1. Go to the Data Development page.

    1. Log on to the DataWorks console.

    2. In the top navigation bar, select a resource group and region.

    3. In the navigation pane on the left, click Data Development and O&M > Data Development.

    4. On the Data Development page, select your target workspace from the drop-down list and click Go to Data Studio.

  2. On the Data Development page of the Data Studio console, click the icon to the right of Workspace Directories, and select Create Node > Data Integration > Batch Synchronization.

    Note

    If this is your first time using Workspace Directories, you can also click the Create Node button.

  3. In the Create Node dialog box, select a path, enter a name, and click OK.

    The new batch synchronization node appears in the Workspace Directories.

2. Configure the synchronization task
  1. In Workspace Directories, click the new batch synchronization task node to open it.

  2. Configure the network and resources.

    Select the data source, data destination, and the resource group for the batch synchronization task, and then test the connectivity.

    1. In the Network and Resource Configuration step, set Data Source to MaxCompute(ODPS) and select the new MaxCompute data source for Data Source Name.

    2. Select a resource group.

      After you select a resource group, the system displays information such as the region and specifications of the resource group. The system also automatically tests the connectivity between the resource group and the selected data source.

      Note

      Serverless resource groups let you specify an upper limit for the number of CUs that a synchronization task can use. If an out-of-memory (OOM) error occurs because of insufficient resources, increase the CU limit for the resource group.

    3. Set Destination to Tablestore and select the new Tablestore data source for Data Source Name.

      The system automatically tests the connectivity between the resource group and the selected data source.

    4. After the connectivity test is passed, click Next.

  3. Configure and save the task.

    1. In the Configure Task step, configure the data source and destination in the Configure Source and Destination section as needed.

      Data source

      Parameter

      Description

      Data source

      The MaxCompute data source that you selected in the previous step.

      Tunnel resource group

      This is the Tunnel Quota. By default, Public Transport Resources is selected, which is the free quota for MaxCompute.

      This is the data transport resource for MaxCompute. For more information, see Purchase and use a dedicated resource group for data transport.

      Note

      If a dedicated Tunnel quota is unavailable because of an overdue payment or expiration, the task automatically switches to use public transport resources during runtime.

      Table

      The source table.

      Filter method

      The logic to filter the data that you want to synchronize. The following methods are supported:

      • Partition Filter: Use a partition expression to specify the range of source data to synchronize. If you select this method, you must also configure the Partition Information and If a Partition Does Not Exist parameters.

      • Data Filtering: Use a WHERE clause to specify the range of source data to synchronize. You do not need to enter the WHERE keyword.

      Partition information

      Note

      This parameter is required when you set Filter Method to Partition Filter.

      Specify the value of the partition column.

      • The value can be a fixed value, such as ds=20220101.

      • The value can be a scheduling parameter, such as ds=${bizdate}. The scheduling parameter is automatically replaced with its actual value at runtime.

      If a partition does not exist

      Note

      This parameter is required when you set Filter Method to Partition Filter.

      The policy for the synchronization task if a partition does not exist.

      • Report an error.

      • Ignore the missing partitions and run the task normally.

      Data destination

      Parameter

      Description

      Data source

      The Tablestore data source that you selected in the previous step.

      Table

      The destination data table.

      Primary key information

      The primary key information of the destination data table.

      Write mode

      The mode to write data to Tablestore. The following modes are supported:

      • PutRow: This mode corresponds to the PutRow API of Tablestore. It inserts data into a specified row. If the row does not exist, a new row is created. If the row exists, the original row is overwritten.

      • UpdateRow: This mode corresponds to the UpdateRow API of Tablestore. It updates the data in a specified row. If the row does not exist, a new row is created. If the row exists, the values of specified columns in the row are added, modified, or deleted based on the request.

    2. Configure field mapping.

      After you configure the data source and destination, you must specify the mapping between Source Field and Destination Field. The task writes data from the source fields to the destination fields of the corresponding data types based on the mapping. For more information, see 4. Configure field mapping.

      Important
      • You must specify the primary key information in Source Field to read the primary key data.

      • Because you already configured the primary key of the destination table in the Primary Key Information section in the previous step, do not configure primary key information again in Destination Field.

      • If a field has the INTEGER data type, you must configure it as INT. DataWorks automatically converts it to the INTEGER type. If you directly configure the data type as INTEGER, an error is reported in the logs and the task fails.

    3. Configure channel control.

      Use channel control to manage the properties of the data synchronization process. For more information about the parameters, see Relationship between concurrency and rate limiting for batch synchronization.

    4. Click Save to save the configuration.
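The difference between the PutRow and UpdateRow write modes described in the destination table above is easiest to see on a toy example. The sketch below models a table as an in-memory Python dictionary. It does not call the Tablestore API, and the function and parameter names are hypothetical.

```python
def put_row(table, primary_key, columns):
    """PutRow semantics: create the row, or overwrite the whole existing row."""
    table[primary_key] = dict(columns)


def update_row(table, primary_key, put=None, delete=None):
    """UpdateRow semantics: create the row if needed, then change only named columns."""
    row = table.setdefault(primary_key, {})
    row.update(put or {})          # add new columns or modify existing ones
    for column in delete or ():    # remove the listed columns
        row.pop(column, None)


table = {}
put_row(table, ("user_1",), {"name": "Ann", "city": "Hangzhou"})

# UpdateRow keeps untouched columns: "name" survives, "city" changes, "age" is added.
update_row(table, ("user_1",), put={"city": "Shanghai", "age": 30})
print(table[("user_1",)])

# PutRow replaces the row: only "name" remains afterwards.
put_row(table, ("user_1",), {"name": "Ann"})
print(table[("user_1",)])
```

In practice, this means PutRow suits full reloads where the source row is the complete truth, while UpdateRow suits tasks that write only a subset of columns and must preserve the rest.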

3. Run the synchronization task
  1. To the right of the task, click Debugging Configuration and select a resource group for running the task.

  2. Click Run.

Step 3: View the synchronization results

After running the synchronization task, view the task status in the logs and check the results in the destination data table in the Tablestore console.

  1. View the task status.

    1. On the Result tab of the synchronization task, check the value of Current task status.

      A status of FINISH for Current task status means the task is complete.

    2. To view detailed logs, click the link for Detail log url.

  2. View the results in the destination data table.

    1. Go to the Instance Management page.

      1. Log on to the Tablestore console.

      2. At the top of the page, select the resource group and region.

      3. On the Overview page, click the instance alias or click Instance Management in the Actions column of the instance.

    2. On the Instance Details tab, click the Tables tab.

    3. On the Tables tab, click the name of the destination data table.

    4. On the Query Data tab, view the data synchronized to the data table.

Data export

You can use DataWorks Data Integration to export data from Tablestore to MaxCompute or OSS.

Billing

  • Using a migration tool to access Tablestore incurs charges for data reads and writes. Once the data is written, Tablestore also charges storage fees based on the data volume. For more information about billing, see billing overview.

  • The DataWorks billing model consists of software fees and resource fees. For more information, see billing introduction.

Other solutions

You can download Tablestore data to a local file.

You can also use migration tools such as DataX, Tunnel Service, and Data Transmission Service (DTS) to import data.

| Migration tool | Description |
| --- | --- |
| DataX | DataX abstracts data synchronization from various sources by using a Reader plugin to read from the source and a Writer plugin to write to the destination. |
| Tunnel Service | Tunnel Service is an integrated service for consuming full and incremental data that is built on the Tablestore data API. By creating a data channel for a data table, you can easily process historical and new data from the table. This service is ideal for migrating and synchronizing data from a Tablestore data table. For more information, see Synchronize data from one data table to another. |
| Data Transmission Service (DTS) | Data Transmission Service (DTS) is a real-time data streaming service provided by Alibaba Cloud. It supports data interaction between data sources such as relational databases (RDBMS), NoSQL databases, and online analytical processing (OLAP) systems. DTS integrates data synchronization, migration, subscription, integration, and processing to help you build a secure, scalable, and highly available data architecture. For more information, see Synchronize PolarDB-X 2.0 data to Tablestore and Migrate PolarDB-X 2.0 data to Tablestore. |

Field type mappings

This appendix lists the field type mappings between common services and Tablestore. Use these mappings to configure field mappings.

MaxCompute and Tablestore

| MaxCompute type | Tablestore type |
| --- | --- |
| STRING | STRING |
| BIGINT | INTEGER |
| DOUBLE | DOUBLE |
| BOOLEAN | BOOLEAN |
| BINARY | BINARY |

MySQL and Tablestore

| MySQL type | Tablestore type |
| --- | --- |
| STRING | STRING |
| INT, INTEGER | INTEGER |
| DOUBLE, FLOAT, DECIMAL | DOUBLE |
| BOOL, BOOLEAN | BOOLEAN |
| BINARY | BINARY |

Kafka and Tablestore

| Kafka type | Tablestore type |
| --- | --- |
| STRING | STRING |
| INT8, INT16, INT32, INT64 | INTEGER |
| FLOAT32, FLOAT64 | DOUBLE |
| BOOLEAN | BOOLEAN |
| BYTES | BINARY |
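When you prepare field mappings for many columns, the mappings in this appendix can be kept in a small lookup structure. The following Python sketch transcribes the three tables above. The dictionary layout and the function name are hypothetical, and any source type that is not listed in this appendix is rejected rather than guessed.

```python
# Source type -> Tablestore type, transcribed from the tables in this appendix.
TYPE_MAPPINGS = {
    "maxcompute": {
        "STRING": "STRING", "BIGINT": "INTEGER", "DOUBLE": "DOUBLE",
        "BOOLEAN": "BOOLEAN", "BINARY": "BINARY",
    },
    "mysql": {
        "STRING": "STRING", "INT": "INTEGER", "INTEGER": "INTEGER",
        "DOUBLE": "DOUBLE", "FLOAT": "DOUBLE", "DECIMAL": "DOUBLE",
        "BOOL": "BOOLEAN", "BOOLEAN": "BOOLEAN", "BINARY": "BINARY",
    },
    "kafka": {
        "STRING": "STRING", "INT8": "INTEGER", "INT16": "INTEGER",
        "INT32": "INTEGER", "INT64": "INTEGER", "FLOAT32": "DOUBLE",
        "FLOAT64": "DOUBLE", "BOOLEAN": "BOOLEAN", "BYTES": "BINARY",
    },
}


def tablestore_type(source, source_type):
    """Look up the Tablestore type for a source column type (case-insensitive)."""
    try:
        return TYPE_MAPPINGS[source.lower()][source_type.upper()]
    except KeyError:
        # Unlisted types would risk dirty data, so refuse instead of guessing.
        raise ValueError(f"no mapping listed for {source} type {source_type}") from None


print(tablestore_type("MaxCompute", "bigint"))
print(tablestore_type("Kafka", "FLOAT32"))
```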