
Tablestore: Overview

Last Updated: Feb 28, 2024

You can use DataWorks Data Integration to synchronize full and incremental data from Tablestore to Object Storage Service (OSS). This backs up your Tablestore data and makes it available for use in OSS.

How it works

The offline synchronization feature of DataWorks Data Integration abstracts synchronization between a data source and a destination into two plug-ins: a Reader plug-in that reads data from the source and a Writer plug-in that writes data to the destination. You define the source and the destination, and combine them with DataWorks scheduling parameters to synchronize full or incremental data from the source to the destination.
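To make the Reader/Writer split concrete, the following Python sketch models it with two interfaces. `Reader`, `Writer`, `InMemoryReader`, `InMemoryWriter`, and `run_sync` are hypothetical names invented for this illustration and are not DataWorks APIs. The point is only that any Reader can be paired with any Writer without either one knowing about the other.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List

# Hypothetical record model for this sketch: column name -> value.
# The real plug-ins use their own internal record types.
Record = Dict[str, object]


class Reader(ABC):
    """Hypothetical interface: reads records from a data source."""

    @abstractmethod
    def read(self) -> Iterator[Record]:
        ...


class Writer(ABC):
    """Hypothetical interface: writes records to a destination."""

    @abstractmethod
    def write(self, records: Iterable[Record]) -> int:
        ...


class InMemoryReader(Reader):
    """Stand-in for a source such as a Tablestore table."""

    def __init__(self, rows: List[Record]) -> None:
        self.rows = rows

    def read(self) -> Iterator[Record]:
        yield from self.rows


class InMemoryWriter(Writer):
    """Stand-in for a destination such as an OSS object; stores CSV-like lines."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns
        self.lines: List[str] = []

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.lines.append(",".join(str(record.get(c, "")) for c in self.columns))
            count += 1
        return count


def run_sync(reader: Reader, writer: Writer) -> int:
    """Pipe every record from the reader into the writer; return the count written."""
    return writer.write(reader.read())


reader = InMemoryReader([{"pk": 1, "name": "a"}, {"pk": 2, "name": "b"}])
writer = InMemoryWriter(["pk", "name"])
print(run_sync(reader, writer))  # 2
print(writer.lines)  # ['1,a', '2,b']
```

Swapping `InMemoryReader` for a different source leaves `run_sync` and the writer unchanged, which is the property the plug-in design relies on.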

To synchronize Tablestore data to OSS, you must configure a Tablestore-related Reader plug-in and the OSS Writer plug-in for the offline synchronization task. The following items describe how each plug-in is used.

  • Tablestore-related Reader plug-ins

    The Reader plug-in that you need depends on the synchronization mode:

    ◦ Full export: Tablestore Reader. This plug-in reads data from Tablestore tables. You can also specify the range of data to extract, which allows you to perform incremental extraction. For more information, see Tablestore data source.

    ◦ Incremental synchronization: OTSStream Reader. This plug-in exports data from Tablestore tables in incremental mode. For more information, see OTSStream data source.

  • OSS-related Writer plug-in

    DataWorks uses the OSS Writer plug-in to write data to OSS, regardless of whether the full export or incremental synchronization mode is used. For more information, see OSS data source.
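The plug-in choice described above reduces to a small lookup. The following Python sketch encodes it; `SyncMode` and `plugins_for` are hypothetical names used only for illustration, and the plug-in names are the ones listed above.

```python
from enum import Enum
from typing import Tuple


class SyncMode(Enum):
    FULL_EXPORT = "full export"
    INCREMENTAL = "incremental synchronization"


# Reader plug-in per synchronization mode, as listed above.
_READERS = {
    SyncMode.FULL_EXPORT: "Tablestore Reader",
    SyncMode.INCREMENTAL: "OTSStream Reader",
}


def plugins_for(mode: SyncMode) -> Tuple[str, str]:
    """Return (Reader, Writer); the Writer is OSS Writer in both modes."""
    return _READERS[mode], "OSS Writer"


print(plugins_for(SyncMode.INCREMENTAL))  # ('OTSStream Reader', 'OSS Writer')
```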

Synchronization modes

Data filters and scheduling parameters in an offline synchronization task determine whether full or incremental data is synchronized. The following synchronization modes are available:

  • Full export: All data in Tablestore is exported to OSS in a single run. You need to run the offline synchronization task only once, and you do not need to configure scheduling parameters for it.

  • Incremental synchronization: New and modified data in Tablestore is periodically synchronized to OSS. You must configure scheduling parameters for the offline synchronization task so that incremental data is synchronized on a schedule.
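For incremental synchronization, each scheduled run typically covers a fixed time window. The following sketch assumes a daily schedule, a business date in `yyyymmdd` format (the format of the DataWorks `${bizdate}` scheduling parameter), and a UTC+8 time zone. It converts the business date into start and end timestamps in milliseconds and a per-day OSS object prefix. The function names and the `tablestore-backup` prefix are hypothetical; how the window is actually passed to OTSStream Reader is described in OTSStream data source.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumption for this sketch: the task runs once a day in a UTC+8 region.
TZ = timezone(timedelta(hours=8))


def incremental_window(bizdate: str, tz: timezone = TZ) -> Tuple[int, int]:
    """Return the [start, end) window in epoch milliseconds for one business day."""
    start = datetime.strptime(bizdate, "%Y%m%d").replace(tzinfo=tz)
    end = start + timedelta(days=1)
    # Whole-second timestamps, so integer conversion is exact.
    return int(start.timestamp()) * 1000, int(end.timestamp()) * 1000


def object_prefix(bizdate: str, base: str = "tablestore-backup") -> str:
    """Hypothetical OSS prefix so that each daily run writes to its own directory."""
    return f"{base}/{bizdate}/"


start_ms, end_ms = incremental_window("20240227")
print(end_ms - start_ms)  # 86400000
print(object_prefix("20240227"))  # tablestore-backup/20240227/
```

Using a half-open window means consecutive runs neither overlap nor leave gaps, as long as each run receives the next business date.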

Scenarios

Use this solution when you want to back up Tablestore data at a lower cost or export Tablestore data as files that you can download to local devices.

Procedure

The procedure depends on the synchronization mode that you use. For detailed steps, see Export full data from Tablestore to OSS and Synchronize incremental data to OSS.

Full export procedure


The following steps are required in full export mode:

1. Add a data source: Specify the instance information of the table from which you want to synchronize data. The data source is Tablestore.

2. Add a destination: Specify the OSS bucket to which you want to synchronize data. The destination is OSS.

3. Create an offline task node: Offline synchronization runs on offline task nodes. Create one offline task node for each synchronization operation.

4. Configure and start an offline synchronization task: DataWorks Data Integration lets you configure offline synchronization tasks in wizard mode or script mode. Choose the mode that fits your requirements.

   • Wizard mode: Configure the task in a graphical user interface. This mode is easy to use but supports only a limited set of features.

   • Script mode: Define the task in a JSON script. This mode supports advanced features for flexible, fine-grained configuration, but it has a steeper learning curve and is intended for advanced users.

5. Verify migration results: After the export is complete, view the exported data in the OSS console.
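In script mode, the task is a JSON document that connects a Reader step to a Writer step. The following Python sketch assembles such a document for a full export. The overall layout (`steps`, `setting`, `order.hops`) and every key under `parameter` are assumptions modeled on typical DataWorks script-mode jobs, and the data source names, table name, and object path are placeholders. Check the exact keys against Tablestore data source and OSS data source before using a script like this.

```python
import json
from typing import Dict, List


def build_full_export_script(
    ots_datasource: str,
    table: str,
    columns: List[str],
    oss_datasource: str,
    object_path: str,
) -> Dict:
    """Assemble a script-mode job with one Tablestore Reader and one OSS Writer.

    Assumption: the key names below mirror the usual script-mode layout;
    verify each one against the data source documentation.
    """
    reader = {
        "stepType": "ots",
        "category": "reader",
        "name": "Reader",
        "parameter": {
            "datasource": ots_datasource,
            "table": table,
            "column": [{"name": c} for c in columns],
        },
    }
    writer = {
        "stepType": "oss",
        "category": "writer",
        "name": "Writer",
        "parameter": {
            "datasource": oss_datasource,
            "object": object_path,
            "fileFormat": "csv",
            "fieldDelimiter": ",",
            "writeMode": "truncate",
        },
    }
    return {
        "type": "job",
        "version": "2.0",
        "steps": [reader, writer],
        "setting": {"errorLimit": {"record": "0"}},
        # The hop wires the Reader's output to the Writer's input.
        "order": {"hops": [{"from": reader["name"], "to": writer["name"]}]},
    }


# Placeholder names; replace them with your own data sources, table, and path.
script = build_full_export_script(
    "my_ots_source", "my_table", ["pk", "col1"], "my_oss_source", "backup/my_table"
)
print(json.dumps(script, indent=2))
```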

Incremental synchronization procedure


The following steps are required in incremental synchronization mode:

1. Add a data source: Specify the instance information of the table from which you want to synchronize data. The data source is Tablestore. If an existing Tablestore data source meets your requirements, skip this step.

2. Add a destination: Specify the OSS bucket to which you want to synchronize data. The destination is OSS. If an existing OSS data source meets your requirements, skip this step.

3. Create an offline task node: Offline synchronization runs on offline task nodes. Create one offline task node for each synchronization operation.

4. Configure and start an offline synchronization task: DataWorks Data Integration lets you configure offline synchronization tasks in wizard mode or script mode. Choose the mode that fits your requirements.

   • Wizard mode: Configure the task in a graphical user interface. This mode is easy to use but supports only a limited set of features.

   • Script mode: Define the task in a JSON script. This mode supports advanced features for flexible, fine-grained configuration, but it has a steeper learning curve and is intended for advanced users.

5. Configure scheduling parameters: Configure the execution time, rerun properties, and scheduling dependencies of the task so that it runs periodically.

6. Debug code and submit the task: After debugging succeeds, submit the offline synchronization task to the server so that it runs periodically based on its scheduling properties.

7. View task execution results: View the task running status in the DataWorks console and the synchronized data in the OSS console.
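Besides the OSS console, you can check a run's output programmatically. The following sketch uses the OSS Python SDK (`oss2`) to list objects under an output prefix and report how many data files were written and their total size. The endpoint, bucket, prefix, and credentials are placeholders, and the assumption that each run writes under its own prefix depends on how you configured the OSS Writer object path. `summarize` has no SDK dependency, so it can be tried with stand-in objects.

```python
from typing import Iterable, NamedTuple, Tuple


class ObjectInfo(NamedTuple):
    """Stand-in exposing the two attributes this check reads (key, size)."""
    key: str
    size: int


def summarize(objects: Iterable) -> Tuple[int, int]:
    """Return (file count, total bytes), skipping zero-byte "directory" markers."""
    count = total = 0
    for obj in objects:
        if obj.key.endswith("/") and obj.size == 0:
            continue
        count += 1
        total += obj.size
    return count, total


def list_exported(endpoint: str, bucket_name: str, prefix: str,
                  key_id: str, key_secret: str) -> Tuple[int, int]:
    """List objects under the output prefix with oss2 and summarize them."""
    import oss2  # imported lazily so summarize() works without the SDK installed

    bucket = oss2.Bucket(oss2.Auth(key_id, key_secret), endpoint, bucket_name)
    return summarize(oss2.ObjectIterator(bucket, prefix=prefix))


print(summarize([ObjectInfo("backup/20240227/", 0),
                 ObjectInfo("backup/20240227/part-0", 1024)]))  # (1, 1024)
```

A run that reports zero files or an unexpectedly small total is a signal to check the task log in the DataWorks console.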

Billing

  • When you synchronize data from Tablestore to OSS, Tablestore charges you for reading data based on the number of read capacity units (CUs) consumed. Metered read CUs and reserved read CUs are billed separately, and which type is consumed depends on the type of the instance that you access. For more information, see Billing overview.

    Note

    For more information about instance types and CUs, see Instance and Read and write throughput.

  • After data is synchronized to OSS, OSS charges you for storing the data files based on storage usage and duration. If you download objects from OSS to local devices, OSS also charges you for the number of GET requests and the amount of outbound traffic over the Internet. For more information, see Billing overview.
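The billing items above can be combined into a rough estimate. The following sketch assumes that one read CU is consumed per started 4 KB of data read and treats the whole export as a single volume of data, although real consumption depends on how the reads are issued. It covers metered read CUs only and takes all unit prices as inputs because actual prices vary by region and instance. `UnitPrices`, `read_cus`, and `estimate_cost` are hypothetical names; see Billing overview and Read and write throughput for the real rules.

```python
from dataclasses import dataclass


@dataclass
class UnitPrices:
    """Hypothetical unit prices; look up real values on the billing pages."""
    read_cu: float           # price per metered read CU
    storage_gb_month: float  # OSS storage price per GB per month
    get_per_10k: float       # OSS price per 10,000 GET requests
    egress_gb: float         # OSS price per GB of outbound Internet traffic


# Assumption: 1 read CU per started 4 KB of data read.
READ_CU_BYTES = 4 * 1024


def read_cus(bytes_read: int) -> int:
    """Round up to whole read CUs using integer ceiling division."""
    return -(-bytes_read // READ_CU_BYTES)


def estimate_cost(bytes_read: int, stored_gb: float, months: float,
                  get_requests: int, egress_gb: float, p: UnitPrices) -> dict:
    """Return each cost component plus their total."""
    parts = {
        "tablestore_read": read_cus(bytes_read) * p.read_cu,
        "oss_storage": stored_gb * months * p.storage_gb_month,
        "oss_get": get_requests / 10_000 * p.get_per_10k,
        "oss_egress": egress_gb * p.egress_gb,
    }
    parts["total"] = sum(parts.values())
    return parts


# Prices of 1.0 are placeholders that keep the arithmetic easy to follow.
result = estimate_cost(10 * 1024, 2.0, 3.0, 20_000, 5.0, UnitPrices(1.0, 1.0, 1.0, 1.0))
print(result["total"])  # 16.0
```

Here 10 KB read rounds up to 3 read CUs, 2 GB stored for 3 months gives 6 GB-months, and 20,000 GET requests count as 2 units of 10,000.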