This topic describes how to use the DataWorks console to export full data from Tablestore to MaxCompute.

Step 1: Add a Tablestore data source

To add Tablestore as a data source, perform the following steps.

  1. Go to the Data Integration homepage.
    1. Log on to the DataWorks console as a project administrator.
      Note Only the project administrator role can be used to add data sources. Members who assume other roles can only view data sources.
    2. Select a region. In the left-side navigation pane, click Workspaces.
    3. On the Workspaces page, click Data Integration in the Actions column that corresponds to the required workspace.
  2. Add a data source.
    1. In the Data Integration console, choose Data Source > Data Sources.
    2. On the Data Source page, click Add data source in the upper-right corner.
    3. In the Add data source dialog box, click OTS in the NoSQL section.
    4. In the Add OTS data source dialog box, configure the parameters.
      fig_otssource
      The following list describes the parameters.
      • Data source: The name of the data source. Example: gps_data.
      • Description: The description of the data source.
      • Endpoint: The endpoint of the Tablestore instance. For more information, see Endpoint.
        • If the Tablestore instance is in the same region as the MaxCompute project, enter the classic network endpoint of the instance.
        • If the Tablestore instance is not in the same region as the MaxCompute project, enter the public endpoint of the instance.
        • Do not enter a virtual private cloud (VPC) endpoint.
      • Table Store instance name: The name of the Tablestore instance.
      • AccessKey ID and AccessKey Secret: The AccessKey ID and AccessKey secret of your logon account. For more information about how to obtain an AccessKey pair, see Create an AccessKey pair for a RAM user.
    5. Click Test connectivity to test the connectivity of the data source.
  3. Click Complete.
    On the Data Source page, information about the data source appears.

Step 2: Add a MaxCompute data source

The procedure is similar to that in Step 1, except that in the Add data source dialog box, you must click MaxCompute in the Big Data Storage section.

In this example, the MaxCompute data source is named OTS2ODPS, as shown in the following figure.

fig_odps_001

Step 3: Configure a synchronization task

To create and configure a task to synchronize data from Tablestore to MaxCompute, perform the following steps:

  1. Go to Data Analytics.
    1. Log on to the DataWorks console as a project administrator.
    2. Select a region. In the left-side navigation pane, click Workspaces.
    3. On the Workspaces page, click Data Analytics in the Actions column that corresponds to the workspace.
  2. On the Data Analytics page of the DataStudio console, click Business Flow and select a business flow.

    For more information about how to create a business flow, see Create a workflow.

  3. Create a synchronization task node.
    You must create a node for each synchronization task.
    1. Right-click Data Integration and then choose Create > Batch synchronization.
      You can also move the pointer over the fig_addnode icon, and then choose Data Integration > Batch synchronization to create a node.
    2. In the Create Node dialog box, set Node Name and Location.
      fig_newtask
    3. Click Commit.
  4. Configure the Tablestore data source.
    1. Click Data Integration. Double-click the name of the node for the data synchronization task.
    2. On the edit page of the synchronization task node, configure Source and Target in the Connections section.
      • Configure Source.

        In the Source section, set Connection to OTS.

      • Configure Target.

        In the Target section, select ODPS from the drop-down list next to Connection, and set Table.

      fig_ots2odps
    3. Click the script icon to switch to the code editor and configure the script.

      Tablestore supports only the script mode for configuring synchronization tasks. In the script, you must configure the Tablestore Reader and MaxCompute Writer plug-ins. For more information, see Tablestore Reader and MaxCompute Writer.

      On the configuration page of the script, configure the parameters based on the following example:
      {
      "type": "job",
      "version": "1.0",
      "configuration": {
      "setting": {
        "errorLimit": {
          "record": "0"    # The maximum allowable number of errors that occur. 
        },
        "speed": {
          "mbps": "1",   # The maximum amount of traffic. Unit: MB. 
          "concurrent": "1"  # The number of concurrent threads. 
        }
      },
      "reader": {
        "plugin": "ots",  # The name of the plug-in used to read data. 
        "parameter": {
          "datasource": "",  # The name of the data source. 
          "table": "",  # The name of the data table. 
          "column": [  # The names of the columns in Tablestore that need to export to MaxCompute. 
            {
              "name": "column1"
            },
            {
              "name": "column2"
            },
            {
              "name": "column3"
            },
            {
              "name": "column4"
            },
            {
              "name": "column5"
            }
          ],
          "range": "range": {  # The range of data to export. In the full export mode, the range is from INF_MIN to INF_MAX. 
            "begin": [ # The range of data to export. In full export mode, the range is from INF_MIN to INF_MAX. The number of configuration items in begin must be the same as the number of primary key columns in the data table in Tablestore. 
              {
                "type": "INF_MIN"
              },
              {
                "type": "INF_MIN"
              },
              {
                "type": "STRING",  # The position from which to export data in the third column starts from begin1. 
                "value": "begin1"
              },
              {
                "type": "type": "INT",  # The position from which to export data in the fourth column starts from 0. 
                "value": "0"
              }
            ],
            "end": [  # The position at which data export ends. 
              {
                "type": "INF_MAX"
              },
              {
                "type": "INF_MAX"
              },
              {
                "type": "STRING",
                "value": "end1"
              },
              {
                "type": "INT",
                "value": "100"
              }
            ],
            "split": [  # Specify the partition range. Typically, this parameter can be left empty. If the read performance is poor, submit a ticket or join the DingTalk group 23307953 to contact Tablestore technical support. 
              {
                "type": "STRING",
                "value": "splitPoint1"
              },
              {
                "type": "STRING",
                "value": "splitPoint2"
              },
              {
                "type": "STRING",
                "value": "splitPoint3"
              }
            ]
          }
        }
      },
      "writer": {
        "plugin": "odps",  # The name of the plug-in used to write data to MaxCompute. 
        "parameter": {
          "datasource": "",  # The name of the data source of MaxCompute. 
          "column": [],  # The names of the columns in MaxCompute. The column names are sorted in the same order as in Tablestore. 
          "table": "",  # The name of the table in MaxCompute. The table must be created before you run the task. Otherwise, the task may fail. 
          "partition": "",  # This parameter is required if the MaxCompute table is partitioned. Do not specify this parameter if the table is not partitioned. The partition to which data is written. The last-level partition must be specified. 
          "truncate": false  #  Specify whether to delete all previous data. 
        }
      }
      }
      }
      You can use the begin and end parameters to specify the range of data to export. For example, assume that a data table contains two primary key columns: pk1 of the STRING type and pk2 of the INTEGER type.
      • To export full data from the data table, configure the following parameters:
        "begin": [ # The position from which data export starts. 
          {
            "type": "INF_MIN"
          },
          {
            "type": "INF_MIN"
          }
        ],
        "end": [  # The position at which data export ends. 
          {
            "type": "INF_MAX"
          },
          {
            "type": "INF_MAX"
          }
        ],
      • To export data from the rows where the value of pk1 is "tablestore", configure the following parameters:
        "begin": [ # The position from which data export starts. 
          {
            "type": "STRING",
            "value": "tablestore"
          },
          {
            "type": "INF_MIN"
          }
        ],
        "end": [  # The position at which data export ends. 
          {
            "type": "STRING",
            "value": "tablestore"
          },
          {
            "type": "INF_MAX"
          }
        ],
    4. Click the save icon to save the configurations of the synchronization task.
  5. Run the synchronization task.
    1. Click the start icon.
    2. In the Arguments dialog box, select the resource group for scheduling.
    3. Click OK to run the task.
      After the task is run, you can check whether the task succeeded and view the number of rows of exported data on the Runtime Log tab.
  6. Configure the scheduling parameters.
    You can configure the running time, rerun properties, and scheduling dependencies of the synchronization task in Properties.
    1. In the hierarchy tree, click Data Integration. Double-click the name of the synchronization task node.
    2. On the right side of the edit page of the synchronization task node, click Properties to configure the scheduling parameters. For more information, see Configure recurrence and dependencies for a node.
  7. Submit the synchronization task.
    1. On the edit page of the synchronization task node, click the submit icon.
    2. In the Commit Node dialog box, enter your comments in the Change description field.
    3. Click OK.
      After the synchronization task is submitted to the scheduling system, the scheduling system runs the synchronization task at the scheduled time based on the configured scheduling parameters.
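
The following snippet shows what a filled-in writer section of the script might look like when the destination MaxCompute table is partitioned. This is a minimal sketch, not the exact configuration for your project: the data source name OTS2ODPS comes from Step 2 of this example, while the table name ots_gps_data, the column names, and the partition value pt=20240101 are placeholder assumptions that you must replace with your own values.

  "writer": {
    "plugin": "odps",  # The name of the plug-in used to write data to MaxCompute. 
    "parameter": {
      "datasource": "OTS2ODPS",  # The MaxCompute data source that is added in Step 2. 
      "column": ["col1", "col2", "col3", "col4", "col5"],  # Placeholder column names, sorted in the same order as the columns that are exported from Tablestore. 
      "table": "ots_gps_data",  # Placeholder name of an existing partitioned MaxCompute table. 
      "partition": "pt=20240101",  # Placeholder last-level partition to which data is written. 
      "truncate": true  # Delete the existing data in the partition before the export. 
    }
  }

If each run must write data to a different partition, you can typically reference a scheduling parameter that you define in the node properties, instead of a literal date, in the partition value. Verify the parameter syntax in the DataWorks scheduling documentation.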

Step 4: View the synchronization task

  1. Go to Operation Center.
    Note You can also click Operation Center in the upper-right corner of the DataStudio console to go to Operation Center.
    1. Log on to the DataWorks console as a project administrator.
    2. Select a region. In the left-side navigation pane, click Workspaces.
    3. On the Workspaces page, click Operation Center in the Actions column that corresponds to the required workspace.
  2. In the left-side navigation pane of the Operation Center console, choose Cycle Task Maintenance > Cycle Task.
  3. On the Cycle Task page, view the details about the submitted synchronization task.
    • In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Instance to view the task that is scheduled to run on the current date. Click the instance name to view the task running details.
    • You can view logs while a task is running or after the task is completed.

Step 5: View the data imported to MaxCompute

  1. Go to the DataMap console.
    1. Log on to the DataWorks console as a project administrator.
    2. Select a region. In the left-side navigation pane, click Workspaces.
    3. On the Workspaces page, click Data Map in the Actions column that corresponds to a workspace.
  2. In the top navigation bar of the DataMap console, choose My Data > Managed by Me.
  3. On the Managed by Me tab, click the name of the imported table.
  4. On the table details page, click the Data Preview tab to view the imported data.