Incremental synchronization (wizard mode)

Last Updated: Mar 20, 2018

Data Integration supports data synchronization in wizard mode and script mode. Wizard mode is simpler whereas script mode is more flexible.

This chapter describes how to synchronize incremental data (generated by Put, Update, and Delete operations) from Table Store to MaxCompute through the Table Store feature in near real time.

Note: Because the offline synchronization mode is used, a latency of about 10 minutes exists.

Step 1. Create a Table Store data source

  1. Log on to the Data IDE.

  2. If you are using Data Integration for the first time, you must first create a Data Integration project.

  3. On the Data Sources page, click New Source.

  4. Select Table Store as the data source.

  5. Set the parameters and click Test Connectivity.

    The parameters are described as follows.

    | Parameter | Description |
    | --- | --- |
    | Name | Name of the Table Store data source. This example uses gps_data. |
    | Description | Description of the data source. |
    | Endpoint | Enter the instance address on the Table Store instance page. If the Table Store instance is in the same region as the MaxCompute instance, enter the private network address. If the Table Store instance is not in the same region as the MaxCompute instance, enter the public network address. You cannot enter the VPC address. |
    | Table Store ID | Name of the Table Store instance. |
    | Access ID | AccessKeyID of the logon account. |
    | Access Key | AccessKeySecret corresponding to the AccessKeyID of the logon account. |

    Note: If the connectivity test fails, check whether the endpoint and instance name are correct. If the problem persists, open a ticket.

  6. Click Complete. Information about the Table Store data source is displayed on the Data Sources page.
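
The rule for the Endpoint parameter above can be summarized as a small decision function. The following Python sketch is illustrative only: the function name `choose_endpoint_type` is hypothetical and not part of any Table Store or Data Integration API, and the region strings are example values.

```python
def choose_endpoint_type(table_store_region: str, maxcompute_region: str) -> str:
    """Return which Table Store address to enter in the Endpoint field.

    Hypothetical helper that mirrors the rule in the parameter table:
    same region -> private network address, otherwise public network address.
    The VPC address is never returned because it cannot be entered.
    """
    if table_store_region == maxcompute_region:
        return "private"
    return "public"


if __name__ == "__main__":
    # Example region values; replace them with the regions of your own instances.
    print(choose_endpoint_type("cn-hangzhou", "cn-hangzhou"))  # private
    print(choose_endpoint_type("cn-hangzhou", "cn-shanghai"))  # public
```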

Step 2. Create a MaxCompute data source

This operation is similar to Step 1. You only need to select MaxCompute as the data source.

In this example, the MaxCompute data source is named OTS2ODPS.

Step 3. Create an incremental real-time data tunnel

  1. On the Data IDE page, click Sync Tasks.

  2. Select Wizard mode.

  3. Select the Table Store data source created in Step 1.

    The parameters are described as follows.

    | Parameter | Description |
    | --- | --- |
    | Data sources | The Table Store data source you created. In this example, gps_data is selected. |
    | Table | Data Integration automatically obtains the latest data table from Table Store. Stream must be activated for the selected table. If Stream is not activated, click Activate Stream in One Click on the right to activate it. The incremental data is valid for up to 24 hours. |
    | Start time | Start time of incremental export. For a periodic task, the variable value is required. The default value is ${start_time}. |
    | End time | End time of incremental export. For a periodic task, the variable value is required. The default value is ${end_time}. |
    | Status table | Stores status values during incremental export. The default value is recommended. |
    | Maximum number of retries | Maximum number of retries to perform when the network is unstable. The default value is 30. You can set the value as needed. |
    | Export time series information | Indicates whether the exported data contains the time information. It is not selected by default. |

  4. On the Select Target page, select the MaxCompute data source created in Step 2.

    The parameters are described as follows.

    | Parameter | Description |
    | --- | --- |
    | Data sources | The MaxCompute data source you created. In this example, OTS2ODPS is selected. |
    | Table | Select a table in this data source. If no table is available, click Create New Target Table on the right to create one. In the dialog box that appears, replace your_table_name with the name of the table to be created, for example, ots_gps_data. Because timestamp is a reserved field in MaxCompute, it cannot be used in this box. Use ts to represent timestamp if necessary. |
    | Partition information | The default value is ${bdp.system.bizdate}, indicating that data in MaxCompute is partitioned by date. |
    | Cleaning rule | Select the first option. |

  5. On the Field Mapping page, make sure that the fields of the Table Store table are mapped to the fields of the MaxCompute table.

  6. On the Channel Control page, set the parameters.

    The parameters are described as follows.

    | Parameter | Description |
    | --- | --- |
    | Job speed limit | Range: 1 MB/s to 20 MB/s. To request a higher job speed limit, open a ticket. |
    | Number of concurrent jobs | The maximum value is 10. Maximum rate of a job = Job speed limit / Number of concurrent jobs. |
    | Number of error records | The task fails when the number of error records exceeds this value. The default value is 0. |

  7. On the Preview page, check the configurations.

  8. Click Save. In this example, the task name that is saved is OTStoODPS.
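
The Channel Control formula (maximum rate of a job = job speed limit / number of concurrent jobs) can be checked with a few lines of arithmetic. The Python sketch below is a hypothetical helper, not part of Data Integration. The range checks follow the limits stated above; the lower bound of 1 for the number of concurrent jobs is an assumption.

```python
def max_rate_per_job(job_speed_limit_mb_s: float, concurrent_jobs: int) -> float:
    """Return the maximum rate of a single job in MB/s.

    Hypothetical helper implementing:
        maximum rate of a job = job speed limit / number of concurrent jobs
    """
    # The documented range of the job speed limit is 1 MB/s to 20 MB/s.
    if not 1 <= job_speed_limit_mb_s <= 20:
        raise ValueError("job speed limit must be between 1 and 20 MB/s")
    # The documented maximum is 10; a minimum of 1 is assumed here.
    if not 1 <= concurrent_jobs <= 10:
        raise ValueError("number of concurrent jobs must be between 1 and 10")
    return job_speed_limit_mb_s / concurrent_jobs


if __name__ == "__main__":
    print(max_rate_per_job(10, 5))   # 2.0
    print(max_rate_per_job(20, 10))  # 2.0
```

Raising the number of concurrent jobs without raising the job speed limit therefore lowers the rate available to each job.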

Step 4. Set scheduling parameters

  1. At the top of the page, click Data Development.

  2. On the Task Development tab, double-click the created task OTStoODPS.

  3. Click Scheduling configuration to set the scheduling parameters.

    To set the task to run on the next day, configure the parameters as shown in the following figure.

    ![Scheduling configuration](http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/pic/61034/reseller_en/1514883931054/61034-9-en.png)

    The parameters are described as follows.

    | Parameter | Description |
    | --- | --- |
    | Scheduling status | Indicates the running status of the task. It is not selected by default. |
    | Error retry | We recommend that you select this parameter so that the system can retry if an error occurs. |
    | Start date | The default value is recommended. |
    | Scheduling cycle | Minute is used in this example. |
    | Start time | It is set to 00:00 in this example. |
    | Scheduling interval | It is set to 5 minutes in this example. |
    | End time | It is set to 23:59 in this example. |
    | Dependency attributes | Set the Dependency Attribute field based on your business needs, or retain the default value. |
    | Cross-cycle dependency | Set the Cross-Cycle Dependency field based on your business needs, or retain the default value. |

  4. Click Parameter Configuration.

    The parameters are described as follows.

    | Parameter | Description |
    | --- | --- |
    | ${bdp.system.bizdate} | It does not need to be configured. |
    | startTime | It is the Start Time variable set in Scheduling Configuration. In this example, it is set to $[yyyymmddhh24miss-10/24/60], indicating a time equal to the scheduling task start time minus 10 minutes. |
    | endTime | It is the End Time variable set in Scheduling Configuration. In this example, it is set to $[yyyymmddhh24miss-5/24/60], indicating a time equal to the scheduling task start time minus 5 minutes. |
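
The offsets in startTime and endTime are fractions of a day: 10/24/60 of a day is 10 minutes, and 5/24/60 of a day is 5 minutes. The following Python sketch reproduces that arithmetic for a few consecutive runs to show which window each run would export. It is an illustration only: `export_window` is a hypothetical helper, the run times are example values, and `scheduled_time` stands for the scheduling task start time described above.

```python
from datetime import datetime, timedelta
from typing import Tuple

# strftime equivalent of the yyyymmddhh24miss pattern
TIME_FORMAT = "%Y%m%d%H%M%S"


def export_window(scheduled_time: datetime) -> Tuple[str, str]:
    """Return (startTime, endTime) for one run of the task.

    startTime = scheduled time - 10 minutes
    endTime   = scheduled time - 5 minutes
    """
    start = scheduled_time - timedelta(minutes=10)
    end = scheduled_time - timedelta(minutes=5)
    return start.strftime(TIME_FORMAT), end.strftime(TIME_FORMAT)


if __name__ == "__main__":
    first_run = datetime(2018, 3, 20, 0, 10)  # example value
    for i in range(3):
        # Runs are 5 minutes apart, matching the scheduling interval.
        run = first_run + timedelta(minutes=5 * i)
        print(run.strftime(TIME_FORMAT), export_window(run))
```

With a scheduling interval of 5 minutes, the end time of one run equals the start time of the next, so under these assumptions the windows neither overlap nor leave gaps. Each window ends 5 minutes before its run starts, which is in line with the latency of about 10 minutes noted at the beginning of this chapter.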

Step 5. Submit the task

  1. At the top of the page, click Submit.

  2. In the dialog box, click Confirm Submission.

After the task is submitted, the system displays the message "The current file is read-only."

Step 6. Check the task

  1. At the top of the page, click Operation Center.

  2. In the left-side navigation pane, click Task List > Cycle Task to view the created task OTStoODPS.

  3. The task starts running at 00:00 on the next day.

    • In the left-side navigation pane, click Task O&M > Cycle Instance to view scheduling tasks to be executed on the day. Click the instance name to view the details.

    • You can view the log when a task is running or after it is completed.

Step 7. View the data that has been imported to MaxCompute

  1. At the top of the page, click Data Management.

  2. In the left-side navigation pane, click Query Data. All the tables in MaxCompute are listed.

  3. Find the table (ots_gps_data) to which the data is imported, and click the table to go to the table details page.

  4. Click Data Preview to view the imported data.
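
If you prefer to check the imported data with a SQL statement instead of Data Preview, a query of the following shape could be used. The Python sketch below only builds the query text and does not connect to MaxCompute. The helper name `build_preview_query` is hypothetical, the partition column name `pt` is an assumption (use the partition column of your own target table), and the date is assumed to be in yyyymmdd format.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_preview_query(table: str, partition_column: str,
                        bizdate: str, limit: int = 10) -> str:
    """Build a SELECT statement that previews one date partition.

    Hypothetical helper: it validates its inputs and returns SQL text only.
    """
    for name in (table, partition_column):
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    # Assumes the partition value is a date in yyyymmdd format.
    if not re.fullmatch(r"\d{8}", bizdate):
        raise ValueError("bizdate must be in yyyymmdd format")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (f"SELECT * FROM {table} "
            f"WHERE {partition_column} = '{bizdate}' LIMIT {limit};")


if __name__ == "__main__":
    # ots_gps_data is the example table name used in this chapter.
    print(build_preview_query("ots_gps_data", "pt", "20180320"))
```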
