This topic describes how to create a Data Integration task.
Background information
- Data Integration is a reliable, secure, cost-effective, and elastically scalable data synchronization platform provided by Alibaba Group. It works across heterogeneous data storage systems and provides full and incremental data synchronization channels for a wide variety of data sources in different network environments.
- A reader plug-in reads data from a database. At the underlying layer, it connects to the remote database and runs SQL statements to select data from it.
- A writer plug-in writes data to a database. At the underlying layer, it connects to the remote database and runs SQL statements to write data to it.
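The SQL round trip that reader and writer plug-ins perform can be sketched in a few lines of Python. This sketch is illustrative only. Two in-memory SQLite databases stand in for the remote source and destination. The helpers `read_rows` and `write_rows` are hypothetical and are not part of any Data Integration API. The reader side runs a `SELECT`, and the writer side runs a parameterized `INSERT` with the rows it receives.

```python
import sqlite3

# Illustrative only: in-memory SQLite databases stand in for the remote
# source and destination that real reader and writer plug-ins connect to.


def read_rows(conn, table, columns):
    """Reader side (hypothetical helper): SELECT rows from the source table."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    return conn.execute(sql).fetchall()


def write_rows(conn, table, columns, rows):
    """Writer side (hypothetical helper): INSERT rows into the destination."""
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    dst = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    dst.execute("CREATE TABLE orders_copy (id INTEGER, amount REAL)")

    cols = ["id", "amount"]
    rows = read_rows(src, "orders", cols)
    print(write_rows(dst, "orders_copy", cols, rows))  # prints 2
```

Table and column names are interpolated into the SQL text, so in this sketch they must come from trusted task configuration. Row values go through `?` placeholders instead.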
Prerequisites
- Create an Alibaba Cloud account and create an AccessKey pair for it.
- Activate MaxCompute, which automatically generates a default MaxCompute data source. Then log on to DataWorks with the Alibaba Cloud account.
- Create a workspace. You must create a workspace before you can use DataWorks. In a workspace, members can collaborate on workflows and maintain data and tasks.
- Create the source and destination tables, either by running SQL statements or directly in the client of each data source. For details about how to create databases and tables for a given data source type, see the official documentation of that data source.
- Grant the required read and write permissions on the databases and tables.
- Obtain the connection information of the database that you want to add as a data source.
Configure a data source
- Configure the data source on the GUI.
- Not all data sources can be configured on the GUI. If you cannot find a configuration page for your data source, configure it in script mode instead by writing its connection information into the JSON script.
- For more information about the supported data sources, see Supported data sources.
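When a data source has no GUI configuration page, its connection details go directly into the JSON script. The Python sketch below builds such a reader entry and serializes it to JSON. It is modeled on the reader layout of DataX, the open-source version of Data Integration: a `mysqlreader` with `username`, `password`, `column`, and a `connection` list that holds `jdbcUrl` and `table`. The exact keys that DataWorks expects for your data source may differ, so treat the script template that DataWorks generates as authoritative. The helper `inline_mysql_reader`, the host, the database, and the credentials are all hypothetical placeholders.

```python
import json


def inline_mysql_reader(host, port, database, table, columns, username, password):
    """Build a DataX-style mysqlreader entry whose connection info is inline.

    Hypothetical helper: key names follow DataX's mysqlreader and may not
    match the DataWorks script template exactly.
    """
    jdbc_url = f"jdbc:mysql://{host}:{port}/{database}"
    return {
        "name": "mysqlreader",
        "parameter": {
            "username": username,
            "password": password,  # avoid committing real credentials to scripts
            "column": list(columns),
            "connection": [
                {"table": [table], "jdbcUrl": [jdbc_url]},
            ],
        },
    }


if __name__ == "__main__":
    # All values below are placeholders.
    reader = inline_mysql_reader(
        host="192.0.2.10", port=3306, database="sales",
        table="orders", columns=["id", "amount"],
        username="sync_user", password="<password>",
    )
    print(json.dumps(reader, indent=2))
```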
Create a custom resource group (optional)
- If the data source is in a private network or the resources provided by DataWorks do not meet your requirements, you can create a custom resource group.
- We recommend that you set the network type of the custom resource group to VPC, regardless of whether the server is in the classic network or a VPC.
- To configure a custom resource group, create a resource group, add a server, install the agent, and then test the connectivity.
- For more information about how to configure a custom resource group, see Add scheduling resources.
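Before you add a server to a custom resource group, it can help to confirm that the server can reach the data source over the network. The sketch below is a generic TCP reachability check in Python. It assumes that the data source listens on a known host and port, and `can_reach` is a hypothetical helper. It does not replace the connectivity test that DataWorks runs, because it neither logs in to the database nor checks permissions.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Network reachability only: no authentication or permission check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False


if __name__ == "__main__":
    # Placeholder endpoint: replace with the host and port of your data source.
    print(can_reach("192.0.2.10", 3306, timeout=2.0))
```

Run it on the server itself, because reachability from your workstation says nothing about reachability from the resource group.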
Configure the task
- Configure the reader of the Data Integration task. For more information, see Configure a reader plug-in.
- Configure the writer of the Data Integration task. For more information, see Configure a writer plug-in.
- Configure the field mapping between the reader and the writer.
- Configure channel control. You can switch to a custom resource group in this step.
- You can configure a task in wizard mode or script mode. You can switch a task from wizard mode to script mode, but not from script mode back to wizard mode. Script templates are provided for all plug-ins.
- When you configure a task, you can tune its settings to improve synchronization speed. For more information, see Optimizing configuration.
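In script mode, the reader, the writer, the field mapping, and channel control all live in one JSON document. The Python sketch below assembles such a document. It assumes a top-level layout (`type`, `version`, `steps`, `setting`, `order`) that mirrors the script templates DataWorks generates. The plug-in parameters, the data source names `my_mysql` and `my_maxcompute`, and the helpers `build_sync_script` and `check_mapping` are hypothetical. Compare the output with the template for your plug-ins before you rely on it.

```python
import json

# Hypothetical helpers; the top-level keys are assumed to mirror DataWorks
# script-mode templates, and every plug-in parameter is a placeholder.


def check_mapping(reader_columns, writer_columns):
    """Pair reader and writer columns by position and reject count mismatches."""
    if len(reader_columns) != len(writer_columns):
        raise ValueError("reader and writer column counts differ")
    return list(zip(reader_columns, writer_columns))


def build_sync_script(reader_type, reader_params, writer_type, writer_params,
                      concurrent=2, error_limit=0):
    """Assemble a single reader -> writer job as a JSON-serializable dict."""
    check_mapping(reader_params["column"], writer_params["column"])
    return {
        "type": "job",
        "version": "2.0",
        "steps": [
            {"stepType": reader_type, "parameter": reader_params,
             "name": "Reader", "category": "reader"},
            {"stepType": writer_type, "parameter": writer_params,
             "name": "Writer", "category": "writer"},
        ],
        "setting": {
            # Channel control: tolerated dirty records and parallel channels.
            "errorLimit": {"record": str(error_limit)},
            "speed": {"throttle": False, "concurrent": concurrent},
        },
        # Data flows from the step named "Reader" to the step named "Writer".
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
    }


if __name__ == "__main__":
    script = build_sync_script(
        "mysql", {"datasource": "my_mysql", "table": ["orders"],
                  "column": ["id", "amount"]},
        "odps", {"datasource": "my_maxcompute", "table": "orders_copy",
                 "column": ["order_id", "order_amount"]},
        concurrent=2,
    )
    print(json.dumps(script, indent=2))
```

In this sketch the mapping is positional: the n-th reader column feeds the n-th writer column, so `check_mapping` rejects column lists of different lengths. `concurrent` is the sketch's channel-control setting for speed. Whether more channels actually help depends on the source, the destination, and the resource group.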
Run and submit the task
- You can run the Data Integration task directly on the GUI. However, the logs of these runs are not saved.
- Before you submit the task, configure its scheduling properties. In most cases, an instance of the task is generated on the day after you submit it.
View run logs
You can view the run logs of your task in O&M.
In O&M, find the DAG of your task, right-click it, and select Run Log.