You can use the Data Integration feature of DataWorks to synchronize incremental and full data from Tablestore to MaxCompute.
Working principles
DataWorks Data Integration can be used to synchronize large amounts of offline data. Data Integration facilitates data transmission between diverse structured and semi-structured data sources. It provides readers and writers for the supported data sources and defines a data transmission channel between the sources and destinations based on simplified data types.
When you synchronize Tablestore data to MaxCompute, you must configure a Tablestore-related Reader plug-in and the MaxCompute-related Writer plug-in for the offline synchronization task. The plug-ins are described in the following sections.
Tablestore-related Reader plug-ins
The Tablestore-related Reader plug-in that is required varies based on the data synchronization mode that you use. The following table describes the mappings between data synchronization modes and Tablestore-related Reader plug-ins.
| Synchronization mode | Tablestore-related Reader plug-in | Plug-in description |
| --- | --- | --- |
| Full export | Tablestore Reader | The plug-in is used to read data from Tablestore tables. You can specify the range of data that you want to extract, which also allows you to perform incremental extraction. For more information, see Tablestore data source. |
| Incremental synchronization | OTSStream Reader | The plug-in is used to export data in Tablestore tables in incremental mode. For more information, see Tablestore Stream data source. |
MaxCompute-related Writer plug-in
DataWorks uses the MaxCompute-related Writer plug-in to write data to MaxCompute, regardless of whether the full export or incremental synchronization mode is used. For more information, see MaxCompute data source.
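The pairing described above (one Reader chosen by synchronization mode, always the same Writer) can be sketched as a small helper that assembles a job description. This is a minimal sketch, not the DataWorks API: the function `build_job`, the mode labels `"full"` and `"incremental"`, the `stepType` strings, the field layout, and all data source and table names are assumptions made for illustration. Check the Tablestore data source, Tablestore Stream data source, and MaxCompute data source topics for the actual parameters.

```python
import json

# Plug-in names come from the mapping above. The stepType strings are
# assumptions for illustration and must be verified against the plug-in docs.
READERS = {
    "full": {"plugin": "Tablestore Reader", "stepType": "ots"},
    "incremental": {"plugin": "OTSStream Reader", "stepType": "otsstream"},
}


def build_job(mode, reader_params, writer_params):
    """Return a dict describing one Reader step and one MaxCompute Writer step."""
    if mode not in READERS:
        raise ValueError(f"unknown synchronization mode: {mode!r}")
    return {
        "type": "job",
        "steps": [
            {
                "stepType": READERS[mode]["stepType"],
                "parameter": reader_params,
                "name": "Reader",
                "category": "reader",
            },
            {
                # The same Writer is used for both synchronization modes.
                "stepType": "odps",  # assumed identifier for the MaxCompute Writer
                "parameter": writer_params,
                "name": "Writer",
                "category": "writer",
            },
        ],
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
    }


# Hypothetical data source and table names.
job = build_job(
    "full",
    {"datasource": "my_tablestore_source", "table": "my_tablestore_table"},
    {"datasource": "my_maxcompute_source", "table": "my_maxcompute_table"},
)
print(json.dumps(job, indent=2))
```

Only the Reader step changes between the two modes. The Writer step stays the same, which mirrors the statement that the MaxCompute-related Writer plug-in is used regardless of the synchronization mode.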
Synchronization modes
You can export the full data of a Tablestore table to MaxCompute in a single operation. For more information, see Export full data from Tablestore to MaxCompute.
Prerequisites
The information about the Tablestore instance and table that you want to synchronize to MaxCompute is confirmed and recorded.
DataWorks is activated and a workspace is created. For more information, see Activate DataWorks and Create a workspace.
A MaxCompute table is created. For more information, see Create and manage MaxCompute tables.
A Resource Access Management (RAM) user is created, and an AccessKey pair is created for the RAM user. The AliyunOTSFullAccess policy is attached to the RAM user to grant the permissions to manage Tablestore, and the AliyunDataWorksFullAccess policy is attached to the RAM user to grant the permissions to manage DataWorks. For more information, see Use the AccessKey pair of a RAM user to access Tablestore.
Important: To prevent security risks caused by the leakage of the AccessKey pair of your Alibaba Cloud account, we recommend that you use the AccessKey pair of a RAM user.
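If you also use the AccessKey pair of the RAM user in your own scripts, one way to reduce the risk of leakage is to read the pair from environment variables instead of hard-coding it. The following is a minimal sketch under that assumption. The helper `load_ram_access_key` is hypothetical, and the variable names `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET` are assumed here; use whatever names your environment defines.

```python
import os
from typing import Mapping, Tuple


def load_ram_access_key(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the RAM user's AccessKey pair from environment variables.

    The variable names are assumptions for this sketch; adjust them to
    match your own environment.
    """
    key_id = env.get("ALIBABA_CLOUD_ACCESS_KEY_ID", "")
    key_secret = env.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET", "")
    if not key_id or not key_secret:
        # Fail early rather than continuing with missing credentials.
        raise RuntimeError("AccessKey pair of the RAM user is not set in the environment")
    return key_id, key_secret
```

Call `load_ram_access_key()` once at startup and pass the returned values to the code that needs them, so that the AccessKey pair never appears in source files or version control.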