This topic describes how to create a Data Integration task.

  • Data Integration is a reliable, secure, cost-effective, and elastically scalable data synchronization platform provided by Alibaba Group. It can be used across heterogeneous data storage systems and provides full or incremental data access channels in different network environments for a variety of data sources.
  • At the underlying layer, a reader plug-in connects to a remote database and runs SQL statements to select data from it.
  • At the underlying layer, a writer plug-in connects to a remote database and runs SQL statements to write data into it.
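
The reader and writer behavior described above can be illustrated with a minimal sketch. It is not the Data Integration implementation: it assumes two in-memory SQLite databases stand in for the remote source and destination, and the table name `users` and the helper names `read_rows`, `write_rows`, and `demo_sync` are hypothetical.

```python
import sqlite3


def read_rows(conn, table, columns):
    """Reader side: run a SELECT on the source and yield the rows."""
    query = f"SELECT {', '.join(columns)} FROM {table}"
    yield from conn.execute(query)


def write_rows(conn, table, columns, rows):
    """Writer side: run INSERT statements on the destination."""
    placeholders = ", ".join("?" for _ in columns)
    statement = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    )
    with conn:  # commit on success, roll back on error
        cursor = conn.executemany(statement, rows)
    return cursor.rowcount


def demo_sync():
    """Copy two rows from a source database to a destination database."""
    # Assumption: in-memory SQLite databases replace the remote databases.
    source = sqlite3.connect(":memory:")
    destination = sqlite3.connect(":memory:")
    for conn in (source, destination):
        conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    with source:
        source.executemany(
            "INSERT INTO users (id, name) VALUES (?, ?)",
            [(1, "alice"), (2, "bob")],
        )

    columns = ["id", "name"]
    write_rows(destination, "users", columns, read_rows(source, "users", columns))
    return destination.execute("SELECT id, name FROM users ORDER BY id").fetchall()
```

The reader needs only the read permission on the source table, while the writer needs permission to modify the destination table.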

Preparations

Create an Alibaba Cloud account
  1. Activate an Alibaba Cloud account, and create the AccessKeys for this account.
  2. Activate MaxCompute to automatically generate a default MaxCompute data source, and log on to DataWorks using the Alibaba Cloud account.
  3. Create a workspace. A workspace is required before you can use DataWorks. In the workspace, you can collaborate on workflows and maintain data and tasks.
Note You can grant RAM accounts the permissions to create Data Integration tasks. For more information, see Create a RAM account.
Create source and destination databases and tables
  1. Create the tables by running SQL statements or directly in the data source client. For more information about how to create databases and tables for a specific data source type, see the official documentation of that data source.
  2. Grant read and write permissions on the databases and tables.
Note Generally, a reader plug-in requires at least the read permission, while a writer plug-in requires the add, delete, and modify permissions. We recommend that you grant sufficient permissions on the databases and tables in advance.

Procedure

Create a data source
  1. Obtain data source information about a database.
  2. Configure the data source on the GUI.
Note
  • Not all data sources can be configured on the GUI. If you cannot find the configuration page for a data source, you can configure it in script mode by writing data source information in a JSON script.
  • For more information about the data sources that are supported, see Supported data sources.
(Optional) Create a custom resource group
  1. Create a resource group.
  2. Add a server.
  3. Install the agent.
  4. Test the connectivity.
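
As a rough illustration of the connectivity test, the following sketch checks whether a TCP connection to a database endpoint can be opened from the server. It is an assumption about what a basic check might look like, not the test that the agent performs, and the helper name `can_connect` is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A successful TCP connection shows only that the endpoint is reachable over the network. It does not verify the database credentials or permissions.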
Configure a Data Integration task
  1. Configure the reader of the Data Integration task. For more information about how to configure a reader, see Configure a reader plug-in.
  2. Configure the writer of the Data Integration task. For more information about how to configure a writer, see Configure a writer plug-in.
  3. Configure the mapping between the reader and writer.
  4. Configure channel control. You can switch to a custom resource group in this step.
Note
  • A task can be configured in wizard or script mode.
  • When configuring a task, you can optimize the task speed. For more information, see Optimizing configuration.
  • You can switch from the wizard mode to script mode, but not from the script mode to wizard mode. We have provided templates for all plug-ins.
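
The four configuration steps above can be sketched as a script that assembles a JSON job description. This is a sketch under stated assumptions: the key names (`steps`, `stepType`, `parameter`, `setting`, `order`) and the helper `build_job` are illustrative and may not match the actual plug-in templates, so use the template of each plug-in as the reference for the real fields.

```python
import json


def build_job(reader_type, writer_type, source_columns, dest_columns, concurrent=1):
    """Assemble a job description: reader, writer, mapping, channel control."""
    # The mapping pairs columns by position, so both lists must be the same length.
    if len(source_columns) != len(dest_columns):
        raise ValueError("source and destination column counts differ")
    return {
        "type": "job",
        "steps": [
            {   # Step 1: reader (hypothetical key names)
                "category": "reader",
                "name": "Reader",
                "stepType": reader_type,
                "parameter": {"column": list(source_columns)},
            },
            {   # Step 2: writer (hypothetical key names)
                "category": "writer",
                "name": "Writer",
                "stepType": writer_type,
                "parameter": {"column": list(dest_columns)},
            },
        ],
        # Step 4: channel control, reduced here to a concurrency setting
        "setting": {"speed": {"concurrent": concurrent}},
        # Step 3: the reader output feeds the writer
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
    }


def job_to_script(job):
    """Serialize the job description to JSON text."""
    return json.dumps(job, indent=2)
```

For example, `job_to_script(build_job("mysql", "odps", ["id", "name"], ["id", "name"]))` returns JSON text with one reader step and one writer step whose columns are paired by position.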
Run the Data Integration task
  1. Run the Data Integration task directly on the GUI. Logs of runs started this way are not saved.
  2. To run the task on a schedule, configure scheduling before you submit the task. Generally, an instance is generated on the day after submission.
Note When configuring the task, you can set scheduling parameters.
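
To illustrate how a scheduling parameter might be resolved when an instance runs, the following sketch replaces a placeholder in a filter condition with a date. The placeholder name `bizdate`, the `yyyymmdd` format, and the convention that it resolves to the day before the run date are assumptions made for this example, and the helper `render_filter` is hypothetical.

```python
from datetime import date, timedelta
from string import Template


def render_filter(condition, run_date):
    """Replace ${bizdate} with the day before run_date, formatted as yyyymmdd."""
    # Assumption: the business date is one day earlier than the run date.
    bizdate = (run_date - timedelta(days=1)).strftime("%Y%m%d")
    return Template(condition).substitute(bizdate=bizdate)
```

With this sketch, a condition such as `ds = '${bizdate}'` in an instance that runs on 2024-01-02 becomes `ds = '20240101'`, so each scheduled instance can read a different slice of the source data.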

View run logs

You can view run logs of your task in O&M.

Note

In O&M, find the DAG, right-click it, and select Run Log to view the run logs.