
Tablestore: Synchronize data between data tables

Last Updated: Mar 18, 2025

This topic describes how to migrate or synchronize data between data tables in Tablestore by using Tunnel Service, DataWorks, DataX, or the Tablestore CLI.

Prerequisites

  • The following information is obtained: the names, endpoints, and region IDs of the instances to which the source and destination data tables belong.

  • The destination data table is created. The number and types of primary key columns of the destination data table are the same as the number and types of primary key columns of the source data table. For more information, see Create a data table.

  • An AccessKey pair of an Alibaba Cloud account or a Resource Access Management (RAM) user is obtained. For more information, see How do I obtain an AccessKey pair?

    Important

    For security purposes, we recommend that you use Tablestore features as a RAM user. You can create a RAM user, attach the AliyunOTSFullAccess policy to the RAM user to grant the RAM user the permissions to manage Tablestore, and then create an AccessKey pair for the RAM user. For more information, see Use the AccessKey pair of a RAM user to access Tablestore.
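
    The Java sample later in this topic reads the AccessKey pairs from environment variables. The following is a minimal sketch of how you might set these variables in a Linux or macOS shell; the variable names match the sample code, and the values are placeholders that you must replace with your own AccessKey pairs.

    export SOURCE_TABLESTORE_ACCESS_KEY_ID=<yourSourceAccessKeyId>
    export SOURCE_TABLESTORE_ACCESS_KEY_SECRET=<yourSourceAccessKeySecret>
    export TARGET_TABLESTORE_ACCESS_KEY_ID=<yourTargetAccessKeyId>
    export TARGET_TABLESTORE_ACCESS_KEY_SECRET=<yourTargetAccessKeySecret>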

Use Tunnel Service to synchronize data

After you create a tunnel for the source data table, you can use a Tablestore SDK to synchronize data. During synchronization, you can add custom business logic to process the data.

  1. Create a tunnel for the source data table and record the tunnel ID. For more information, see Create a tunnel.

  2. Use a Tablestore SDK to synchronize data.

    Sample code in Java:

    import com.alicloud.openservices.tablestore.SyncClient;
    import com.alicloud.openservices.tablestore.TunnelClient;
    import com.alicloud.openservices.tablestore.model.StreamRecord;
    import com.alicloud.openservices.tablestore.tunnel.worker.IChannelProcessor;
    import com.alicloud.openservices.tablestore.tunnel.worker.ProcessRecordsInput;
    import com.alicloud.openservices.tablestore.tunnel.worker.TunnelWorker;
    import com.alicloud.openservices.tablestore.tunnel.worker.TunnelWorkerConfig;
    
    import java.util.List;
    
    public class TunnelSample {
    
        public static void main(String[] args) {
            // Specify the name of the instance to which the source data table belongs.
            final String sourceInstanceName = "sourceInstanceName";
            // Specify the endpoint of the instance to which the source data table belongs.
            final String sourceEndpoint = "sourceEndpoint";
            // Obtain the AccessKey ID and AccessKey secret from the environment variables. You can use the AccessKey ID and AccessKey secret to access the instance to which the source data table belongs.
            final String sourceAccessKeyId = System.getenv("SOURCE_TABLESTORE_ACCESS_KEY_ID");
            final String sourceKeySecret = System.getenv("SOURCE_TABLESTORE_ACCESS_KEY_SECRET");
    
            // Initialize a TunnelClient.
            TunnelClient tunnelClient = new TunnelClient(sourceEndpoint, sourceAccessKeyId, sourceKeySecret, sourceInstanceName);
    
            // Configure the TunnelWorkerConfig.
            TunnelWorkerConfig config = new TunnelWorkerConfig(new SimpleProcessor());
    
            // Configure TunnelWorker and start automatic data processing. 
            // Specify the tunnel ID. You can query the tunnel ID on the Tunnels tab of the source data table in the Tablestore console. You can also query the tunnel ID by calling the ListTunnel or DescribeTunnel operation. 
            TunnelWorker worker = new TunnelWorker("tunnelId", tunnelClient, config);
            try {
                worker.connectAndWorking();
            } catch (Exception e) {
                e.printStackTrace();
                worker.shutdown();
                tunnelClient.shutdown();
            }
        }
    
        public static class SimpleProcessor implements IChannelProcessor {
            // Specify the name of the instance to which the destination data table belongs.
            final String targetInstanceName = "targetInstanceName";
            // Specify the endpoint of the instance to which the destination data table belongs.
            final String targetEndpoint = "targetEndpoint";
            // Obtain the AccessKey ID and AccessKey secret from the environment variables. You can use the AccessKey ID and AccessKey secret to access the instance to which the destination data table belongs.
            final String targetAccessKeyId = System.getenv("TARGET_TABLESTORE_ACCESS_KEY_ID");
            final String targetKeySecret = System.getenv("TARGET_TABLESTORE_ACCESS_KEY_SECRET");
    
            // Initialize a Tablestore client to interact with the instance to which the destination data table belongs.
            SyncClient client = new SyncClient(targetEndpoint, targetAccessKeyId, targetKeySecret, targetInstanceName);
    
            @Override
            public void process(ProcessRecordsInput processRecordsInput) {
                // Incremental or full data is returned in ProcessRecordsInput. 
                List<StreamRecord> list = processRecordsInput.getRecords();
                for (StreamRecord streamRecord : list) {
                    // Specify a custom business processing logic. 
                    switch (streamRecord.getRecordType()) {
                        case PUT:
                            // putRow
                            break;
                        case UPDATE:
                            // updateRow
                            break;
                        case DELETE:
                            // deleteRow
                            break;
                    }
                    System.out.println(streamRecord.toString());
                }
            }
    
            @Override
            public void shutdown() {
    
            }
        }
    }
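
    The process method in the preceding sample only prints each record; the putRow, updateRow, and deleteRow placeholders are where you write the data to the destination table. The following is a minimal sketch of how the PUT branch might replay a record to the destination table by using the destination SyncClient. The table name destTableName is a placeholder, and handling for UPDATE and DELETE records is omitted; treat this as an illustration rather than a complete migration implementation.

    import com.alicloud.openservices.tablestore.SyncClient;
    import com.alicloud.openservices.tablestore.model.PutRowRequest;
    import com.alicloud.openservices.tablestore.model.RecordColumn;
    import com.alicloud.openservices.tablestore.model.RowPutChange;
    import com.alicloud.openservices.tablestore.model.StreamRecord;

    public class PutRecordReplayer {

        // The name of the destination data table. Replace this placeholder with your table name.
        private static final String DEST_TABLE_NAME = "destTableName";

        // Replay a PUT record from the tunnel to the destination table.
        // Call this method from the PUT branch in SimpleProcessor.process().
        public static void replayPut(SyncClient targetClient, StreamRecord record) {
            // Reuse the primary key of the source row for the destination row.
            RowPutChange rowPutChange = new RowPutChange(DEST_TABLE_NAME, record.getPrimaryKey());
            // Copy every attribute column that the stream record carries.
            for (RecordColumn recordColumn : record.getColumns()) {
                rowPutChange.addColumn(recordColumn.getColumn());
            }
            // Write the row to the destination table.
            targetClient.putRow(new PutRowRequest(rowPutChange));
        }
    }

    For example, the PUT case in SimpleProcessor.process() could call PutRecordReplayer.replayPut(client, streamRecord);.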

Use DataWorks or DataX to synchronize data

You can use DataWorks or DataX to synchronize data between data tables in Tablestore. In this example, DataWorks is used.

Prerequisites

DataWorks is activated and a workspace is created. For more information, see Activate DataWorks and Create a workspace.

Step 1: Add Tablestore data sources

Add Tablestore data sources for the instances to which the source and destination data tables belong.

  1. Go to the Data Integration page.

    Log on to the DataWorks console, select a region in the upper-left corner, choose Data Development and Governance > Data Integration, select a workspace from the drop-down list, and then click Go to Data Integration.

  2. In the left-side navigation pane, click Data Source.

  3. On the Data Source page, click Add Data Source.

  4. In the Add Data Source dialog box, click Tablestore.

  5. In the Add OTS data source dialog box, configure the following parameters.

    • Data Source Name: The name of the data source. The name can contain letters, digits, and underscores (_), and must start with a letter.

    • Data Source Description: The description of the data source. The description cannot exceed 80 characters in length.

    • Endpoint: The endpoint of the Tablestore instance. For more information, see Endpoints. If the Tablestore instance and the resources of the destination data source are in the same region, enter a virtual private cloud (VPC) endpoint. Otherwise, enter a public endpoint.

    • Table Store instance name: The name of the Tablestore instance. For more information, see Instance.

    • AccessKey ID and AccessKey Secret: The AccessKey ID and AccessKey secret of your Alibaba Cloud account or RAM user. For more information about how to create an AccessKey pair, see Create an AccessKey pair.

  6. Test the network connectivity between the data source and the resource group that you select.

    To ensure that your synchronization nodes run as expected, you need to test the connectivity between the data source and all types of resource groups on which your synchronization nodes will run.

    Important

    A synchronization task can use only one type of resource group. By default, only shared resource groups for Data Integration are displayed in the resource group list. To ensure the stability and performance of data synchronization, we recommend that you use an exclusive resource group for Data Integration.

    1. Click Purchase to create a new resource group or click Associate Purchased Resource Group to associate an existing resource group. For more information, see Create and use an exclusive resource group for Data Integration.

    2. After the resource group is started, click Test Network Connectivity in the Connection Status (Production Environment) column of the resource group.

      If Connected is displayed, the connectivity test is passed.

  7. If the data source passes the network connectivity test, click Complete.

    The newly created data source is displayed in the data source list.

Step 2: Create a synchronization node

  1. Go to the DataStudio console.

    Log on to the DataWorks console, select a region in the upper-left corner, choose Data Development and Governance > Data Development, select a workspace from the drop-down list, and then click Go to DataStudio.

  2. On the Scheduled Workflow page of the DataStudio console, click Business Flow and select a business flow.

    For information about how to create a workflow, see Create a workflow.

  3. Right-click the Data Integration node and choose Create Node > Offline synchronization.

  4. In the Create Node dialog box, select a path and enter a node name.

  5. Click Confirm.

    The newly created offline synchronization node will be displayed under the Data Integration node.

Step 3: Configure and run a batch synchronization task

  1. Double-click the new synchronization node under Data Integration.

  2. Configure network connections and a resource group.

    Select the source and destination for the batch synchronization task, and the resource group that is used to run the batch synchronization task. Establish network connections between the resource group and data sources, and test the connectivity.

    Important

    Data synchronization nodes must be run by using resource groups. Select a resource group and make sure that network connections between the resource group and the source and destination are established.

    1. In the Configure Network Connections and Resource Group step, select Tablestore from the Source drop-down list and set the Data Source Name parameter to the source that you created.

    2. Select a resource group.

      After you select a resource group, the system displays the region and specifications of the resource group. The system automatically tests the connectivity between the resource group and the source.

      Important

      Make sure that you select the same resource group that you used when you added the data source.

    3. Select Tablestore from the Destination drop-down list and set the Data Source Name parameter to the destination that you created.

      The system automatically tests the connectivity between the resource group and the destination.

    4. Click Next.

  3. Configure and save the task.

    Use the code editor

    1. In the Configure tasks step, click the icon in the toolbar to switch to the code editor. In the message that appears, click OK.

    2. In the code editor, configure the script. For information about how to configure the script, see Tablestore data source. An illustrative script skeleton is provided after these steps.

      • Configure Tablestore Reader

        Tablestore Reader reads data from Tablestore. You can specify a data range to extract incremental data from Tablestore. For more information, see Appendix: Code and parameters.

      • Configure Tablestore Writer

        By using Tablestore SDK for Java, Tablestore Writer connects to the Tablestore server and writes data to the Tablestore server. Tablestore Writer provides features that you can use to optimize the write process, such as retries upon write timeouts, retries upon write exceptions, and batch submission. For more information, see Appendix: Code and parameters.

    3. Click the save icon in the toolbar to save the configurations.
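
    The following is an illustrative skeleton of a script that synchronizes data between two Tablestore tables in the code editor. The data source names (source_datasource and target_datasource), table names, primary key column, and attribute columns are placeholders, and the supported parameters are described in Tablestore data source and Appendix: Code and parameters. Treat this as a sketch to adapt, not a ready-to-run configuration.

    {
        "type": "job",
        "version": "2.0",
        "steps": [
            {
                "stepType": "ots",
                "category": "reader",
                "name": "Reader",
                "parameter": {
                    "datasource": "source_datasource",
                    "table": "source_table",
                    "column": [
                        {"name": "col1"},
                        {"name": "col2"}
                    ],
                    "range": {
                        "begin": [{"type": "INF_MIN"}],
                        "end": [{"type": "INF_MAX"}],
                        "split": []
                    }
                }
            },
            {
                "stepType": "ots",
                "category": "writer",
                "name": "Writer",
                "parameter": {
                    "datasource": "target_datasource",
                    "table": "target_table",
                    "primaryKey": [
                        {"name": "pk1", "type": "string"}
                    ],
                    "column": [
                        {"name": "col1", "type": "string"},
                        {"name": "col2", "type": "int"}
                    ],
                    "writeMode": "PutRow"
                }
            }
        ],
        "setting": {
            "errorLimit": {"record": "0"},
            "speed": {"throttle": false, "concurrent": 2}
        },
        "order": {
            "hops": [{"from": "Reader", "to": "Writer"}]
        }
    }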

    Use the codeless UI

    1. In the Configure Source and Destination section of the Configure tasks step, configure the source and destination based on your business requirements.

      Source

      • Table: The name of the source data table.

      • Range of Primary Key(begin) and Range of Primary Key(end): The start primary key and the end primary key that specify the range of data to read. Each value must be a JSON array. The start primary key and the end primary key must be valid primary keys or virtual points that consist of values of the INF_MIN and INF_MAX types. The number of columns in a virtual point must be the same as the number of primary key columns. INF_MIN specifies an infinitely small value; all values of other types are greater than INF_MIN. INF_MAX specifies an infinitely large value; all values of other types are smaller than INF_MAX. The rows in a data table are sorted in ascending order by primary key. The range of data to read is a left-closed, right-open interval: all rows whose primary keys are greater than or equal to the start primary key and less than the end primary key are returned. For example values, see the sketch after these steps.

      • Split configuration information: The custom rule that is used to split data. We recommend that you do not configure this parameter in common scenarios. If data is unevenly distributed in a Tablestore table and the automatic splitting feature of Tablestore Reader fails, you can specify a custom rule to split data. You can configure a split key within the range between the start primary key and the end primary key. You do not need to specify all primary keys. The value is a JSON array.

      Destination

      • Table: The name of the destination data table.

      • primaryKeyInfo: The primary key information of the destination data table.

      • WriteMode: The mode in which data is written to Tablestore. Valid values:

        • PutRow: inserts data into the specified row. This mode corresponds to the Tablestore PutRow API operation. If the specified row does not exist, a new row is added. If the specified row exists, the row is overwritten.

        • UpdateRow: updates data in the specified row. This mode corresponds to the Tablestore UpdateRow API operation. If the specified row does not exist, a new row is added. If the specified row exists, the values of the specified columns in the row are added, modified, or removed based on the content of the request.

    2. Configure mappings between source fields and destination fields.

      After you configure the source and destination, you must configure mappings between source fields and destination fields. The batch synchronization task then writes the values of the source fields to the mapped destination fields based on these mappings.

    3. Configure channel control policies.

      You can configure channel control policies to define attributes for data synchronization. For more information, see Channel control settings for batch synchronization.

    4. Click the save icon in the toolbar to save the configurations.
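
    As an example of the Range of Primary Key(begin) and Range of Primary Key(end) values, assume that the source table has two primary key columns: uid of type STRING and ts of type INTEGER. These column names and values are placeholders for illustration only. To read the entire table, you can use virtual points for both columns:

    Range of Primary Key(begin):

      [{"type": "INF_MIN"}, {"type": "INF_MIN"}]

    Range of Primary Key(end):

      [{"type": "INF_MAX"}, {"type": "INF_MAX"}]

    To read only the rows whose uid value is greater than or equal to "user_0001", you can combine a concrete value with a virtual point in the begin range, for example:

      [{"type": "STRING", "value": "user_0001"}, {"type": "INF_MIN"}]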

  4. Run the synchronization task.

    Note

    In most cases, you need to perform full data synchronization only once. You do not need to configure scheduling properties.

    1. Click the Run icon in the toolbar.

    2. In the Parameters dialog box, select the name of the resource group from the drop-down list.

    3. Click Run.

      After the synchronization task is complete, click the URL of the run log on the Result tab to go to the details page of the run log. On the details page of the run log, check the value of Current task status.

      If the value of Current task status is FINISH, the task is complete.

Use the Tablestore CLI to migrate data

Important

When you use the Tablestore CLI to migrate data, you must manually export the data of the source table as a local JSON file and then import the data in the file to the destination table. This method is suitable for scenarios in which you want to migrate a small amount of data. If you want to migrate a large amount of data, we recommend that you do not use this method.

Prerequisites

The Tablestore CLI is downloaded. For more information, see Download the Tablestore CLI.

Step 1: Export data from the source table

  1. Start the Tablestore CLI and configure the access information of the instance where the source table resides. For more information, see Start the Tablestore CLI and configure access information.

    Run the config command to configure access information.

    Before you run the command, replace the endpoint, instance name, AccessKey ID, and AccessKey secret in the command with the actual values for the instance where the source table resides.
    config --endpoint https://myinstance.cn-hangzhou.ots.aliyuncs.com --instance myinstance --id NTSVL******************** --key 7NR2****************************************
  2. Export data.

    1. Run the use command to use the source table. In this example, the source table is named source_table.

      use --wc -t source_table
    2. Export data from the source table to a local JSON file. For more information, see Export data.

      scan -o /tmp/sourceData.json

Step 2: Import data to the destination table

  1. Configure the access information of the instance where the destination table resides.

    Run the config command to configure access information.

    Before you run the command, replace the endpoint, instance name, AccessKey ID, and AccessKey secret in the command with the actual values for the instance where the destination table resides.
    config --endpoint https://myinstance.cn-hangzhou.ots.aliyuncs.com --instance myinstance --id NTSVL******************** --key 7NR2****************************************
  2. Import data.

    1. Run the use command to use the destination table. In this example, the destination table is named target_table.

      use --wc -t target_table
    2. Import data from the local JSON file to the destination table. For more information, see Import data.

      import -i /tmp/sourceData.json 
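
    To spot-check the result, you can export the destination table in the same way as in Step 1 and compare the exported file with the source file. This is only a quick sanity check; for large tables, compare row counts or sample rows instead of diffing the full files.

      scan -o /tmp/targetData.json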

References

If you want to migrate data across accounts or regions, use DataX to connect to a virtual private cloud (VPC) over the Internet or Cloud Enterprise Network (CEN). For information about how to use CEN, see Scenario-based networking for VPC connections.