This topic describes how to use the DataWorks console to synchronize full data from Tablestore to Object Storage Service (OSS). After the synchronization is complete, you can download the objects from OSS or retain them in OSS as a backup of the Tablestore data.

Step 1: Add a Tablestore data source

To add a Tablestore data source, perform the following steps:

  1. Go to the Data Integration homepage.
    1. Log on to the DataWorks console as a project administrator.
      Note Only members that are assigned the project administrator role can add data sources. Members that assume other roles can only view data sources.
    2. Select a region. In the left-side navigation pane, click Workspaces.
    3. On the Workspaces page, click Data Integration in the Actions column that corresponds to the required workspace.
  2. Add a data source.
    1. In the Data Integration console, choose Data Source > Data Sources.
    2. On the Data Source page, click Add data source in the upper-right corner.
    3. In the Add data source dialog box, click OTS in the NoSQL section.
    4. In the Add OTS data source dialog box, configure the parameters.
      Configure the following parameters:
      • Data source: the name of the data source.
      • Description: the description of the data source.
      • Endpoint: the endpoint of the Tablestore instance. For more information, see Endpoint.
        • If the Tablestore instance and the OSS bucket are in the same region, enter the classic network endpoint of the instance.
        • If the Tablestore instance and the OSS bucket are in different regions, enter the public endpoint of the instance.
        • Do not enter a virtual private cloud (VPC) endpoint. Example endpoints are provided after this procedure.
      • Table Store instance name: the name of the Tablestore instance.
      • AccessKey ID and AccessKey Secret: the AccessKey pair of your logon account. For more information about how to obtain an AccessKey pair, see Create an AccessKey pair for a RAM user.
    5. Click Test connectivity to verify that the data source is reachable.
  3. Click Complete.
    On the Data Source page, information about the data source appears.
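
The following lines show the endpoint formats for reference, assuming a hypothetical instance named myinstance in the China (Hangzhou) region. The instance name and region are placeholders; replace them with your own values.

  https://myinstance.cn-hangzhou.ots-internal.aliyuncs.com    # Classic network endpoint. Use this type of endpoint if the instance and the OSS bucket are in the same region. 
  https://myinstance.cn-hangzhou.ots.aliyuncs.com             # Public endpoint. Use this type of endpoint if the instance and the OSS bucket are in different regions. 
  https://myinstance.cn-hangzhou.vpc.tablestore.aliyuncs.com  # VPC endpoint. Do not use this type of endpoint for this synchronization task. 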

Step 2: Add an OSS data source

The operations are similar to those in Step 1. However, in this step, you must click OSS in the Semi-structured storage section. In this example, the OSS data source is named OTS2OSS.
Note When you configure the OSS data source, make sure that the endpoint does not contain the bucket name, as shown in the following example.
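
For example, assuming a hypothetical bucket named mybucket in the China (Hangzhou) region:

  http://oss-cn-hangzhou.aliyuncs.com           # Correct: the endpoint does not contain the bucket name. 
  http://mybucket.oss-cn-hangzhou.aliyuncs.com  # Incorrect: the endpoint contains the bucket name. 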

Step 3: Create a synchronization task

To create and configure a task to synchronize data from Tablestore to OSS, perform the following steps:

  1. Go to Data Analytics.
    1. Log on to the DataWorks console as a project administrator.
    2. Select a region. In the left-side navigation pane, click Workspaces.
    3. On the Workspaces page, click Data Analytics in the Actions column that corresponds to the workspace.
  2. On the Data Analytics page of the DataStudio console, click Business Flow and select a business flow.

    For more information about how to create a business flow, see Create a workflow.

  3. Create a synchronization task node.
    You must create a node for each synchronization task.
    1. Right-click Data Integration and then choose Create > Batch synchronization.
      You can also move the pointer over the create icon, and then choose Data Integration > Batch synchronization to create a node.
    2. In the Create Node dialog box, configure Node Name and Location.
    3. Click Commit.
  4. Configure the Tablestore data source.
    1. In the hierarchy tree, click Data Integration. Double-click the name of the node for the data synchronization task.
    2. On the edit page of the synchronization task node, configure Source and Target in the Connections section.
      • Configure Source.

        Set Connection to OTS for Source.

      • Configure Target.

        Set Connection to OSS for Target, and then configure the data source.

    3. Click the script icon or Switch to the code editor to configure the script.

      Tablestore supports only the script mode for configuring the connection. In the script, you must configure the Tablestore Reader and OSS Writer plug-ins. For more information, see Tablestore Reader and OSS Writer.

      On the script configuration page, configure the parameters based on the following template (a filled-in example is provided after this procedure):

      {
        "type": "job",       # Do not change the value. 
        "version": "1.0",    # Do not change the value. 
        "configuration": {
          "setting": {
            "errorLimit": {
              "record": "0"  # The import task fails when the number of error records exceeds this value. 
            },
            "speed": {
              "mbps": "1",       # The maximum rate at which data is imported. Unit: MB/s. 
              "concurrent": "1"  # The number of concurrent threads. 
            }
          },
          "reader": {
            "plugin": "ots",     # Do not change the value. 
            "parameter": {
              "datasource": "",  # The name of the Tablestore data source. Configure the data source before data is integrated. We recommend that you use a data source instead of entering authentication information such as the AccessKey ID in plaintext. 
              "table": "",       # The name of the data table in Tablestore. 
              "column": [        # Required. The names of the columns that you want to export to OSS. 
                {
                  "name": "column1"  # The name of a Tablestore column to export to OSS. 
                },
                {
                  "name": "column2"  # The name of a Tablestore column to export to OSS. 
                }
              ],
              "range": {
                "begin": [
                  {
                    "type": "INF_MIN"  # The start position of the first primary key column. To export full data, set this parameter to INF_MIN. To export only part of the data, set this parameter based on your requirements. If the table contains multiple primary key columns, configure an entry in begin for each primary key column. 
                  }
                ],
                "end": [
                  {
                    "type": "INF_MAX"  # The end position of the first primary key column. To export full data, set this parameter to INF_MAX. To export only part of the data, set this parameter based on your requirements. If the table contains multiple primary key columns, configure an entry in end for each primary key column. 
                  }
                ],
                "split": [  # The partition information of the Tablestore data table, which can be used to accelerate the export. In later versions, this parameter will be configured automatically. 
                ]
              }
            }
          },
          "writer": {
            "plugin": "oss",
            "parameter": {
              "datasource": "",        # The name of the OSS data source. 
              "object": "",            # The prefix of the object name. The prefix does not include the bucket name. Example: tablestore/20171111/. To perform a scheduled export, use variables in the prefix, such as tablestore/${date}, and specify the value of ${date} when you configure the scheduling parameters. 
              "writeMode": "truncate", # The operation that the system performs when objects of the same name exist. To export full data, use truncate. Valid values: truncate, append, and nonConflict. truncate: existing objects of the same name are cleared. append: data is appended to existing objects of the same name. nonConflict: an error is reported if objects of the same name exist. 
              "fileFormat": "csv",     # The format of the file. Valid values: csv, txt, and parquet. 
              "encoding": "UTF-8",     # The encoding type. 
              "nullFormat": "null",    # The string that represents a null value. The value can be an empty string. 
              "dateFormat": "yyyy-MM-dd HH:mm:ss",  # The time format. 
              "fieldDelimiter": ","    # The delimiter that separates columns. 
            }
          }
        }
      }
    4. Click the save icon to save the data source configurations.
    Note
    • A full export runs once and exports all data. Therefore, you do not need to configure scheduling parameters. For information about how to configure scheduling parameters, see Synchronize incremental data.
    • If the script configurations contain variables such as ${date}, set the variable to a specific value when you run the task to synchronize data.
  5. Run the synchronization task.
    1. Click the start icon.
    2. In the Arguments dialog box, select the resource group for scheduling.
    3. Click OK to run the task.
      After the task is complete, you can check on the Runtime Log tab whether the task succeeded and how many rows of data were exported.
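
The following script is a filled-in sketch of the preceding template. It assumes a hypothetical Tablestore data source named ots_source and a data table named mytable that has two primary key columns, uid (STRING) and ts (INTEGER), and two attribute columns, name and phone. The writer uses the OTS2OSS data source from Step 2. All other names and values are placeholders for illustration; replace them with your own values.

      {
        "type": "job",
        "version": "1.0",
        "configuration": {
          "setting": {
            "errorLimit": {
              "record": "0"
            },
            "speed": {
              "mbps": "1",
              "concurrent": "1"
            }
          },
          "reader": {
            "plugin": "ots",
            "parameter": {
              "datasource": "ots_source",  # The hypothetical Tablestore data source. 
              "table": "mytable",          # The hypothetical data table. 
              "column": [
                {
                  "name": "name"           # A hypothetical attribute column to export. 
                },
                {
                  "name": "phone"          # A hypothetical attribute column to export. 
                }
              ],
              "range": {
                "begin": [                 # One entry per primary key column: uid first, then ts. 
                  {
                    "type": "INF_MIN"      # Start from the smallest uid value. 
                  },
                  {
                    "type": "INF_MIN"      # Start from the smallest ts value. 
                  }
                ],
                "end": [
                  {
                    "type": "INF_MAX"      # End at the largest uid value. 
                  },
                  {
                    "type": "INF_MAX"      # End at the largest ts value. 
                  }
                ],
                "split": []
              }
            }
          },
          "writer": {
            "plugin": "oss",
            "parameter": {
              "datasource": "OTS2OSS",           # The OSS data source added in Step 2. 
              "object": "tablestore/20171111/",  # Exported objects are written under this prefix. 
              "writeMode": "truncate",
              "fileFormat": "csv",
              "encoding": "UTF-8",
              "nullFormat": "null",
              "dateFormat": "yyyy-MM-dd HH:mm:ss",
              "fieldDelimiter": ","
            }
          }
        }
      }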

Step 4: View the data exported to OSS

  1. Log on to the OSS console.
  2. Select the bucket to which the data was exported and find the object. Download the object and check whether it contains the expected content.