
Step 3: Configure data sync task

Last Updated: May 04, 2018

After completing the preceding operations, you have configured the RDS data source. This document describes how to configure a data synchronization task to synchronize data from the RDS data source to MaxCompute.

Procedure

  1. Log on to the DataWorks console as a developer and click Enter Workspace in the Actions column of the relevant project.

  2. Click Data Integration from the upper menu and navigate to the Data Synchronization page.

  3. Click Wizard Mode to create a synchronization task.

  4. Select a source.

    Select the MySQL data source and the source table, and then click Next, as shown in the following figure:

    (Figure: source table selection)

    The source table is fully updated every day, so the task only needs to collect the complete data of the previous day. The data filter is therefore set to select records whose datetime value is the previous day's date. You can use the system parameter ${bdp.system.bizdate} as the filter value so that it is automatically replaced with the business date each time the task is scheduled. For more information, see System scheduling parameters.
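    For example, the data filter for the source table might be written as follows (a sketch; the exact field name comes from your source table, and the condition is typically entered without the WHERE keyword):

    ```sql
    -- Hypothetical filter condition: collect only the rows whose datetime
    -- equals the business date of the scheduled run.
    datetime = '${bdp.system.bizdate}'
    ```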

  5. Select a target.

This example shows how to import data into a MaxCompute (formerly ODPS) project. Select the default data source odps_first (odps) as the target. If the target table has not been created yet, you can click Create New Target Table. For more ways to create tables, see Create a table.

![2](http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/pic/59099/intl_en/1519813830289/b.png)

The quick table creation dialog box displays the MaxCompute table creation statement that is automatically generated based on the source table structure:

```sql
CREATE TABLE IF NOT EXISTS your_table_name (
    house_id STRING COMMENT '*',
    house_city STRING COMMENT '*',
    house_total_price DOUBLE COMMENT '*',
    house_unit_price DOUBLE COMMENT '*',
    house_type STRING COMMENT '*',
    house_floor STRING COMMENT '*',
    house_direction STRING COMMENT '*',
    house_deckoration STRING COMMENT '*',
    house_area DOUBLE COMMENT '*',
    house_community_name STRING COMMENT '*',
    house_region STRING COMMENT '*',
    proj_name STRING COMMENT '*',
    proj_addr STRING COMMENT '*',
    period BIGINT COMMENT '*',
    property STRING COMMENT '*',
    greening_rate STRING COMMENT '*',
    property_costs STRING COMMENT '*',
    datetime STRING COMMENT '*'
)
COMMENT '*'
PARTITIONED BY (pt STRING);
```

Note:

  • In the automatically generated statement, change the table name to the actual target table name, which can be the same as the source table name, house_basic_info.

  • In the automatically generated statement, the varchar type in the source table maps to the string type, and the int type maps to the bigint type. Currently, MaxCompute supports only six data types (BIGINT, DOUBLE, DECIMAL, STRING, DATETIME, and BOOLEAN), which differ from the data types of common databases. A type-mapping sketch is shown after these notes.

  • In the automatically generated statement, you cannot specify a default value for a field, a NOT NULL constraint (NULL is allowed by default), or a field length (the maximum length of each field is 8 MB by default).

  • The automatically generated statement creates a partitioned table with a partition column named pt. The MySQL source table has no partitions, whereas MaxCompute supports partitions similar to those in Hadoop. For more information, see Partition. The target table in this example is partitioned by time.

  • Because a time partition is configured, the datetime field of the source table does not need to be synchronized to the target table, and the target table does not need to contain this field at all.

  • For more differences between the SQL of common databases and that of MaxCompute, see Differences with mainstream SQLs.
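To illustrate the type mapping, the following is a sketch of a hypothetical fragment of the MySQL source table definition alongside the MaxCompute types that the generated statement uses for it (the MySQL column types shown are assumptions for illustration only):

```sql
-- Hypothetical MySQL column definitions and the MaxCompute types they map to.
CREATE TABLE house_basic_info (
    house_id   VARCHAR(64),  -- maps to STRING in MaxCompute
    period     INT,          -- maps to BIGINT in MaxCompute
    house_area DOUBLE        -- remains DOUBLE in MaxCompute
);
```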

Modify the table creation statement in the dialog box according to the preceding notes, and click Submit. The statement used to create the table in MaxCompute is as follows:

```sql
CREATE TABLE IF NOT EXISTS house_basic_info (
    house_id STRING COMMENT '*',
    house_city STRING COMMENT '*',
    house_total_price DOUBLE COMMENT '*',
    house_unit_price DOUBLE COMMENT '*',
    house_type STRING COMMENT '*',
    house_floor STRING COMMENT '*',
    house_direction STRING COMMENT '*',
    house_deckoration STRING COMMENT '*',
    house_area DOUBLE COMMENT '*',
    house_community_name STRING COMMENT '*',
    house_region STRING COMMENT '*',
    proj_name STRING COMMENT '*',
    proj_addr STRING COMMENT '*',
    period BIGINT COMMENT '*',
    property STRING COMMENT '*',
    greening_rate STRING COMMENT '*',
    property_costs STRING COMMENT '*'
)
COMMENT '*'
PARTITIONED BY (pt STRING);
```

Configure the target as follows:

(Figure: target configuration)

  • Leave the partition value at its default, ${bdp.system.bizdate}, which corresponds to the datetime value used in the source table filter. This means that the data stored in this partition is the data in the source table whose datetime equals ${bdp.system.bizdate}.

  • Leave the cleaning rule at its default option, which clears existing data before writing. For partitioned tables, only the data in the current partition is cleared (if any exists).
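After the first run of the task, you can check in MaxCompute that the daily partition was created and populated. The following queries are a sketch; the partition value 20180504 is only an example of what ${bdp.system.bizdate} resolves to for a given business date:

```sql
-- List the partitions of the target table.
SHOW PARTITIONS house_basic_info;

-- Preview a few rows in one daily partition (example partition value).
SELECT * FROM house_basic_info WHERE pt = '20180504' LIMIT 10;
```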

  6. Field mapping.

    Keep the default configuration. Fields of the source table and the target table with the same names are matched automatically. Because the datetime field of the source table has no corresponding field in the target table, no action is needed for it.

  7. Channel control.

    Leave the default configurations in this example. For more information about the configurations for channel control, see Set channel control parameters for data synchronization.

  8. Save and submit.

    • When saving a task, you can create a separate directory for the task. The target table name is used as the task name in this example.

    • After the task is submitted to the scheduling system, it runs automatically as scheduled. The scheduling configuration in this example keeps the default values, with the scheduling cycle set to Day.

What to do next

Now you have learned how to configure a data synchronization task. The next tutorial explains how to run the task to import RDS data into MaxCompute. For more information, see Perform data import tasks.
