edit-icon download-icon

Create a data sync job

Last Updated: Sep 14, 2018

The data integration function allows to periodically import business data generated in your system to the workspace and periodically export the flow computing results to the data source you specify for further display or operation.

DataWorsStep3

Currently, data from the following data sources can be imported to or exported from the workspace through the data integration function: RDS, MySQL, SQL Server, PostgreSQL, MaxCompute, ApsaraDB for Memcache, DRDS, OSS, Oracle, FTP, DM, Hdfs, MongoDB, and so on. For more information, see Supported data source types.

This section uses MySQL as an example to show how to export data in MaxCompute to MySQL through the data integration function.

Prerequisites

If your database is a self-built database on ECS or a RDS/MongoDB data source, you must add the data synchronization machine IP address whitelist to your ECS security group or RDS/MongoDB whitelist. For more information, see Add whitelist and security group.

Note:

If you use a custom resource group to schedule RDS data synchronization tasks, you must add the machine IP address of the custom resource group to the RDS whitelist.

Procedure

Add a data source

Note:

Only the project administrator can create a data source. Other roles can only view the data source.

  1. Log on to the DataWorks console as an administrator and click Enter Project in the operations column of the relevant project in the Project List.

  2. Click Data Integration from the upper menu, and click Data Sources in the left-side navigation pane.

  3. Click New Source in the upper-right corner, as shown in the following figure.

    NewSoucePage

  4. Enter the configuration items in the create data source dialog box, as shown in the following figure.

    AddNewSource

    • Type: The network type of data sources.

    • Name: The name must contain letters, numbers, and underscores (), but cannot begin with a number or an underscore (), for example, abc_123.

    • Description: The description cannot exceed 80 characters.

    • JDBC URL: jdbc:mysql://host:port/database

    • User name/Password: The user name and password are used to connect to the database.

    For configurations of different types of data sources, see the articles under Data Source Config.

  5. Click Test Connectivity.

  6. If the connectivity test is successful, click Complete.

Note: Make sure that the target MySQL database contains tables.

Create the table odps_result in the MySQL database. The statements used for table creation are as follows.

  1. CREATE TABLE `ODPS_RESULT` (
  2. `education` varchar(255) NULL ,
  3. `num` int(10) NULL
  4. )

After the table is created, you can run desc odps_result; to view the table details.

Create and configure a synchronization node

This section shows how to create and configure the synchronization node write_result, and write data from result_table to the MySQL database. The specific steps are as follows.

  1. Create the node write_result, as shown in the following figure.

    19

    20

  2. Select the source.

    Select the MaxCompute data source and the source table result_table and click Next, as shown in the following figure.

    21

  3. Select the target.

    Select the MySQL data source and the target table odps_result and click Next, as shown in the following figure.

    22

  4. Map the fields.

    Select the mapping between fields. You must configure the field mapping relationships. The Source Table Fields on the left correspond (one-to-one) with the Target Table Fields on the right.

    FieldMapping

  5. Control the channel.

    Click Next to configure the maximum job rate and dirty data check rules, as shown in the following figure.

    24

  6. Preview and store.

    After configuration, you can scroll up or down to view the task configurations. If no errors are found, click Save.

    25

Submit a data synchronization task

Once you save a synchronization task click Submit, and the synchronization task is submitted to the scheduling system. The scheduling system automatically and periodically runs the task from the second day according to the configuration attributes.

Subsequent steps

Now, you know how to create a synchronization task and export data to data sources of different types. Continue to the next tutorial for further study. This tutorial shows you how to set the scheduling attribute and dependency for a synchronization task. For more information, see Set task scheduling attribute and dependency.

Thank you! We've received your feedback.