Data Integration supports data synchronization in wizard mode and script mode. Wizard mode is simpler, while script mode is more flexible.

This chapter describes how to use Data Integration in wizard mode to synchronize incremental data (generated by the Put, Update, and Delete actions) from Table Store to MaxCompute in near real time.
Note: Because offline synchronization is used, expect a latency of about 10 minutes.

Step 1. Create a Table Store data source

  1. Log on to the Data IDE.
  2. If you are using Data Integration for the first time, you must first create a Data Integration project.
  3. On the Data Sources page, click New Source.
  4. Select Table Store as the data source.
  5. Set the parameters and click Test Connectivity.

    The parameters are described as follows.
    Parameter: Description
    Name: Name of the Table Store data source. This example uses gps_data.
    Description: Description of the data source.
    Endpoint: The instance address shown on the Table Store instance page.
      • If the Table Store instance is in the same region as the MaxCompute instance, enter the private network address.
      • If the Table Store instance is not in the same region as the MaxCompute instance, enter the public network address.
      Note: You cannot enter the VPC address.
    Table Store ID: Name of the Table Store instance.
    Access ID: AccessKey ID of the logon account.
    Access Key: AccessKey Secret that corresponds to the AccessKey ID of the logon account.
    Note: If the connectivity test fails, check whether the endpoint and instance name are correct. If the problem persists, open a ticket.
  6. Click Complete. Information about the Table Store data source is displayed on the Data Sources page.
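
The endpoint rule from the parameter description can be written as a small decision function. This is only an illustrative sketch: the function name, the region names, and the example addresses are hypothetical placeholders, not real Table Store endpoints.

```python
def choose_endpoint(tablestore_region: str,
                    maxcompute_region: str,
                    private_address: str,
                    public_address: str) -> str:
    """Pick the address to enter in the Endpoint field.

    Same region    -> private network address.
    Different ones -> public network address.
    The VPC address is never an option, so it is not a parameter here.
    """
    if tablestore_region == maxcompute_region:
        return private_address
    return public_address


# Hypothetical placeholder values, for illustration only.
PRIVATE = "https://private.example.invalid"
PUBLIC = "https://public.example.invalid"

print(choose_endpoint("region-a", "region-a", PRIVATE, PUBLIC))
print(choose_endpoint("region-a", "region-b", PRIVATE, PUBLIC))
```

The first call prints the private placeholder address and the second prints the public one.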

Step 2. Create a MaxCompute data source

Follow the same procedure as in Step 1, but select MaxCompute as the data source.

In this example, the MaxCompute data source is named OTS2ODPS.

Step 3. Create an incremental real-time data tunnel

  1. On the Data IDE page, click Sync Tasks.
  2. On the right side of the page, click Create a synchronization task.
  3. Select Wizard mode.
  4. Select the Table Store data source created in Step 1.

    The parameters are described as follows.

    Parameter: Description
    Data sources: The Table Store data source you created. In this example, gps_data is selected.
    Table: Data Integration automatically obtains the latest data table from Table Store.
      Stream must be activated for the selected table. If Stream is not activated, click Activate Stream in One Click on the right side to activate it.
      The incremental data is valid for up to 24 hours.
    Start time: Start time of incremental export.
      For a periodic task, a variable value is required. The default value is ${start_time}.
    End time: End time of incremental export.
      For a periodic task, a variable value is required. The default value is ${end_time}.
    Status table: Table used to store status values during incremental export. The default value is recommended.
    Maximum number of retries: Maximum number of retries to perform when the network is unstable. The default value is 30. You can set the value as needed.
    Export time series information: Whether the exported data contains time information. It is not selected by default.
  5. On the Select Target page, select the MaxCompute data source created in Step 2.

    The parameters are described as follows.

    Parameter: Description
    Data sources: The MaxCompute data source you created. In this example, OTS2ODPS is selected.
    Table: Select a table in this data source. If no table is available, click Create New Target Table on the right side to create one. In the dialog box that appears, replace your_table_name with the name of the table to be created, for example, ots_gps_data. (Because timestamp is a reserved field in MaxCompute, it cannot be used in this box; use ts to represent the timestamp if necessary.)
    Partition information: The default value is ${bdp.system.bizdate}, indicating that data in MaxCompute is partitioned by date.
    Cleaning rule: Select Clean Existing Data Before Writing (Insert Overwrite).
  6. On the Field Mapping page, make sure that the fields of the Table Store table map to the fields of the MaxCompute table.
  7. On the Channel Control page, set the parameters.

    The parameters are described as follows.

    Parameter: Description
    Job speed limit: 1 MB/s to 20 MB/s. To request a higher job speed limit, open a ticket.
    Number of concurrent jobs: The maximum value is 10. Maximum rate of a job = Job speed limit / Number of concurrent jobs.
    Number of error records: The task fails when the number of error records exceeds this value. The default value is 0.
  8. On the Preview page, check the configurations.
  9. Click Save. In this example, the task is saved as OTStoODPS.
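
As a quick check of the Channel Control settings, the maximum rate of a single job is the job speed limit divided by the number of concurrent jobs. The sketch below applies that formula with the limits stated in the table (1 to 20 MB/s, at most 10 concurrent jobs). The function name is hypothetical, and the lower bound of 1 concurrent job is an assumption, since the table gives only the maximum.

```python
def max_rate_per_job(speed_limit_mb_s: float, concurrent_jobs: int) -> float:
    """Return the maximum rate of one job in MB/s.

    Formula from the Channel Control table:
        job speed limit / number of concurrent jobs
    """
    if not 1 <= speed_limit_mb_s <= 20:
        raise ValueError("job speed limit must be between 1 and 20 MB/s")
    # Lower bound of 1 is an assumption; the table states only the maximum (10).
    if not 1 <= concurrent_jobs <= 10:
        raise ValueError("number of concurrent jobs must be between 1 and 10")
    return speed_limit_mb_s / concurrent_jobs


print(max_rate_per_job(10, 5))   # 2.0
print(max_rate_per_job(20, 10))  # 2.0
```

Raising the number of concurrent jobs without raising the speed limit therefore lowers the rate available to each job.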

Step 4. Set scheduling parameters

  1. At the top of the page, click Data Development.
  2. On the Task Development tab, double-click the created task OTStoODPS.

  3. Click Scheduling configuration to set the scheduling parameters.

    To set the task to start running on the next day, configure the parameters as follows.

    Parameter: Description
    Scheduling status: Indicates the running status of the task. It is not selected by default.
    Error retry: We recommend that you select this parameter so that the system retries the task if an error occurs.
    Start date: The default value is recommended.
    Scheduling cycle: Minute is used in this example.
    Start time: Set to 00:00 in this example.
    Scheduling interval: Set to 5 minutes in this example.
    End time: Set to 23:59 in this example.
    Dependency attributes: Set this field based on your business needs, or retain the default value.
    Cross-cycle dependency: Set this field based on your business needs, or retain the default value.
  4. Click Parameter Configuration.

    The parameters are described as follows.

    Parameter: Description
    ${bdp.system.bizdate}: Does not need to be configured.
    startTime: The Start Time variable set in Scheduling Configuration. In this example, it is set to $[yyyymmddhh24miss-10/24/60], which is the scheduled start time of the task minus 10 minutes.
    endTime: The End Time variable set in Scheduling Configuration. In this example, it is set to $[yyyymmddhh24miss-5/24/60], which is the scheduled start time of the task minus 5 minutes.
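
To see how these settings fit together, the following sketch reproduces the arithmetic outside the scheduler. It assumes, as described above, that startTime and endTime resolve to the scheduled start time minus 10 and minus 5 minutes, formatted as yyyymmddhh24miss, and that the task runs every 5 minutes from 00:00 to 23:59. The function names and the sample date are illustrative only; the scheduler evaluates the real expressions itself.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

FMT = "%Y%m%d%H%M%S"  # Python equivalent of yyyymmddhh24miss


def export_window(scheduled: datetime) -> Tuple[str, str]:
    """Return (startTime, endTime) for one run, per the example expressions."""
    start = scheduled - timedelta(minutes=10)  # $[yyyymmddhh24miss-10/24/60]
    end = scheduled - timedelta(minutes=5)     # $[yyyymmddhh24miss-5/24/60]
    return start.strftime(FMT), end.strftime(FMT)


def daily_run_times(day: datetime, interval_minutes: int = 5) -> List[datetime]:
    """List the scheduled run times between 00:00 and 23:59 of the given day."""
    current = day.replace(hour=0, minute=0, second=0, microsecond=0)
    last = day.replace(hour=23, minute=59, second=0, microsecond=0)
    runs = []
    while current <= last:
        runs.append(current)
        current += timedelta(minutes=interval_minutes)
    return runs


runs = daily_run_times(datetime(2024, 1, 2))  # arbitrary sample date
windows = [export_window(t) for t in runs]

print(len(runs))    # 288
print(windows[0])   # ('20240101235000', '20240101235500')
print(windows[1])   # ('20240101235500', '20240102000000')

# Each window ends exactly where the next one starts, so no gap or overlap.
print(all(a[1] == b[0] for a, b in zip(windows, windows[1:])))  # True
```

Because the window length (5 minutes) equals the scheduling interval, consecutive runs export adjacent time ranges. If you change the scheduling interval, the two offsets need to change with it to keep the ranges contiguous.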

Step 5. Submit the task

  1. At the top of the page, click Submit.

  2. In the dialog box, click Confirm Submission.

After the task is submitted, the system displays the message "The current file is read-only."

Step 6. Check the task

  1. At the top of the page, click Operation Center.

  2. In the left-side navigation pane, click Task List > Cycle Task to view the created task OTStoODPS.
  3. The task starts running at 00:00 on the next day.
    • In the left-side navigation pane, click Task O&M > Cycle Instance to view scheduling tasks to be executed on the day. Click the instance name to view the details.

    • You can view the log when a task is running or after it is completed.

Step 7. View the data that has been imported to MaxCompute

  1. At the top of the page, click Data Management.

  2. In the left-side navigation pane, click Query Data. All the tables in MaxCompute are listed.
  3. Find the table (ots_gps_data) to which the data is imported, and click the table to go to the table details page.
  4. Click Data Preview to view the imported data.