This topic describes how to import data from a specific data source to MaxCompute offline or in real time by using the Data Integration service of DataWorks.

Background information

You can use Data Integration to import data offline or in real time:
  • Import data offline

    You can use one of the following methods to import data offline:

    • Use the codeless user interface (UI). After you create a batch synchronization node in the DataWorks console, configure a source, a destination, and field mappings on the codeless UI to import data.
    • Use the code editor. After you create a batch synchronization node in the DataWorks console, switch to the code editor. Then, write code to configure a source, a destination, and field mappings to import data.
  • Import data in real time

    You can use one of the following methods to import data in real time:

    • Create a real-time synchronization node to import data in a single table.
    • Create a real-time synchronization node to import data in an entire database.
    • Configure a sync solution to import real-time data with one click.

Prerequisites

  • The data sources and tables from which you want to import data to MaxCompute are prepared.
  • A MaxCompute project is created.

    For more information about how to create a MaxCompute project, see Create a MaxCompute project.

Limits

In the offline import scenario, each batch synchronization node can import data in one or more tables to only one table in MaxCompute.

Import data offline

  1. Add a MaxCompute data source.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. Select the region where the required workspace resides, find the workspace, and then click Data Integration in the Actions column.
    4. In the left-side navigation pane, choose Data Source > Data Sources.
    5. On the Data Source page, click Add data source in the upper-right corner.
    6. In the Add data source dialog box, click MaxCompute.
    7. In the Add MaxCompute data source dialog box, configure the parameters.
      • Data Source Name: the name of the data source. The name can contain letters, digits, and underscores (_) and must start with a letter.
      • Data source description: the description of the data source. The description can be a maximum of 80 characters in length.
      • Environment: the environment in which the data source is used. Valid values: Development and Production.
        Note: This parameter is displayed only if the workspace is in standard mode.
      • ODPS Endpoint: the endpoint of the MaxCompute project. The endpoint is automatically generated by MaxCompute based on system configurations.
      • Tunnel Endpoint: the endpoint of the MaxCompute Tunnel service. For more information, see Endpoints.
      • ODPS project name: the name of the MaxCompute project.
      • AccessKey ID: the AccessKey ID of the account that you use to connect to the MaxCompute project. You can view the AccessKey ID on the Security Management page.
      • AccessKey Secret: the AccessKey secret that corresponds to the AccessKey ID. The AccessKey secret is equivalent to a logon password.
    8. Select Data Integration for Resource Group connectivity.
    9. In the resource group list, find the required resource groups and click Test connectivity.
      During data synchronization, one synchronization node uses only one resource group. To ensure that your synchronization node can be properly run, we recommend that you test the connectivity of all types of resource groups for Data Integration. If you want to test the connectivity of multiple types of resource groups for Data Integration at the same time, select the resource groups and click Batch test connectivity. For more information, see Select a network connectivity solution.
      Note
      • By default, only exclusive resource groups for Data Integration are displayed. To ensure the stability and performance of data synchronization, we recommend that you use exclusive resource groups for Data Integration.
      • If you want to test the connectivity of the shared resource group for Data Integration or custom resource groups, click Advanced below the resource group table. In the Warning message, click Confirm. Then, the shared resource group for Data Integration and custom resource groups are displayed.
    10. After the connectivity test succeeds, click Complete.
  2. Add the data source from which you want to import data to MaxCompute.
    Add the data source from which you want to import data to MaxCompute based on the data source type. For more information about how to add a data source, see Configure data sources.
  3. Create a workflow.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. Select the region where the required workspace resides, find the workspace, and then click Data Analytics in the Actions column.
    4. On the DataStudio page, move the pointer over the Create icon and select Workflow.
    5. In the Create Workflow dialog box, specify the Workflow Name and Description parameters.
      Notice The workflow name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
    6. Click Create.
  4. Create a batch synchronization node.
    1. Click the newly created workflow and right-click Data Integration.
    2. Choose Create > Batch Synchronization.
    3. In the Create Node dialog box, specify the Node Name and Location parameters.
      Notice The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
    4. Click Commit.
  5. Configure and run the batch synchronization node.
    • If you want to configure and run the batch synchronization node by using the codeless UI, go to Step 6.
    • If you want to configure and run the batch synchronization node by using the code editor, go to Step 7.
  6. Configure and run the batch synchronization node by using the codeless UI.
    1. Configure the source.
      Select the data source type of the source from the Connection drop-down list below Source and select the name of the source from the drop-down list on the right side of Connection. Then, select the source table from the Table drop-down list.
    2. Configure the destination.
      Select ODPS from the Connection drop-down list below Target and select the added MaxCompute data source from the drop-down list on the right side of Connection. Then, select the destination table from the Table drop-down list.
    3. Configure field mappings.
      Configure mappings between the fields in the source table and the fields in the destination table.
    4. Configure channel control.
      Configure channel control settings, such as the maximum number of parallel threads and the maximum number of dirty data records allowed.
    5. Configure scheduling properties.
      On the Properties tab, configure the scheduling properties of the node.
    6. In the top toolbar, click the Save icon to save the configurations and click the Run icon to run the batch synchronization node.
  7. Configure and run the batch synchronization node by using the code editor.
    1. Import a template.
      Select the data source type of the source from the Source Connection Type drop-down list and select the name of the source from the Connection drop-down list below Source Connection Type. Select ODPS from the Target Connection Type drop-down list and select the added MaxCompute data source from the Connection drop-down list below Target Connection Type. Then, click OK.
    2. Configure the source.
      Configure the source and the source table.
      {
          "stepType": "mysql",
          "parameter": {
              "partition": [],
              "datasource": "",
              "envType": 0,
              "column": [
                  "*"
              ],
              "table": ""
          },
          "name": "Reader",
          "category": "reader"
      },
      • stepType: the data source type of the source.
      • partition: the partition information of the source table.
      • datasource: the name of the source.
      • column: the names of the source columns. Make sure that the source columns have one-to-one mappings with the destination columns in the MaxCompute table.
      • table: the name of the source table.
      • name and category: Set name to Reader and category to reader. This way, the data source is configured as the source.
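      As a concrete sketch, a reader for a hypothetical MySQL source might look like the following. The datasource, table, and column values are example placeholders, not values from your workspace:
      {
          "stepType": "mysql",
          "parameter": {
              "partition": [],
              "datasource": "my_mysql_source",
              "envType": 0,
              "column": [
                  "id",
                  "name",
                  "gmt_create"
              ],
              "table": "orders"
          },
          "name": "Reader",
          "category": "reader"
      },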
    3. Configure the destination.
      Configure the destination and the destination table.
      {
          "stepType": "odps",
          "parameter": {
              "partition": "",
              "truncate": true,
              "datasource": "odps_first",
              "column": [
                  "*"
              ],
              "table": ""
          },
          "name": "Writer",
          "category": "writer"
      }
      • stepType: the data source type of the destination. Set this parameter to odps.
      • partition: the partition information of the destination table. You can run the show partitions <table_name>; command to view the partition information of the table. For more information, see View partition information.
      • datasource: the name of the MaxCompute data source.
      • column: the name of the destination column.
      • table: the name of the destination table. You can run the show tables; command to view the table name. For more information, see List tables and views in a project.
      • name and category: Set name to Writer and category to writer. This way, the data source is configured as the destination.
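      As a sketch for a partitioned destination, the partition parameter can reference a scheduling parameter so that each run writes to the matching partition. The table and column names below are example placeholders:
      {
          "stepType": "odps",
          "parameter": {
              "partition": "pt=${bizdate}",
              "truncate": true,
              "datasource": "odps_first",
              "column": [
                  "id",
                  "name",
                  "gmt_create"
              ],
              "table": "ods_orders"
          },
          "name": "Writer",
          "category": "writer"
      }
      In this sketch, ${bizdate} is replaced with the data timestamp of the scheduled instance at run time, and truncate is set to true so that existing data in the destination partition is cleared before each write.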
    4. Configure channel control.
      "setting": {
          "errorLimit": {
              "record": "1024"
          },
          "speed": {
              "throttle": false,
              "concurrent": 1
          }
      },
      • record: the maximum number of dirty data records allowed.
      • throttle: specifies whether throttling is enabled.
      • concurrent: the maximum number of parallel threads that the batch synchronization node uses to read data from the source or write data to the destination.
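      For example, to cap throughput instead of running unthrottled, set throttle to true and specify a rate limit. The mbps field below is an assumption for illustration; verify the exact rate-limit parameter against the channel control reference for your DataWorks version:
      "setting": {
          "errorLimit": {
              "record": "0"
          },
          "speed": {
              "throttle": true,
              "mbps": 1,
              "concurrent": 2
          }
      },
      With record set to "0", the node fails as soon as any dirty data record is detected.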
    5. Configure scheduling properties.
    6. In the top toolbar, click the Save icon to save the configurations and click the Run icon to run the batch synchronization node.
  8. Go to the MaxCompute data source and check whether the required data is imported to the MaxCompute table.
    • If all data is imported, the synchronization is complete.
    • If no data is imported or some data failed to be imported, see FAQ about batch synchronization.

Import data in a single table in real time

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. Select the region where the required workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Move the pointer over the Create icon and choose Data Integration > Real-time synchronization.
    Alternatively, you can click the required workflow, right-click Data Integration, and then choose Create > Real-time synchronization.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Notice The node name must be 1 to 128 characters in length. It can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
  5. On the configuration tab of the real-time sync node, drag MaxCompute in the Output section to the canvas on the right. Connect the MaxCompute node to the configured reader or conversion node.
  6. Click the MaxCompute node. In the panel that appears, configure the following parameters:
    • Data source: the MaxCompute data source that you configured. You can select only a MaxCompute data source.
      If no data source is available, click New data source on the right to add a data source on the Data Source page. For more information, see Add a MaxCompute data source.
    • Table: the name of the MaxCompute table to which you want to write data.
      You can click Create Table on the right to create a table, or click Data preview to preview the selected table.
      Notice: Before you create a table, connect the MaxCompute node to a reader node and make sure that the output fields are specified for the reader node.
    • Mode: the mode in which data is written to the destination partitions of the MaxCompute table. Valid values: Partitioning by Time and Dynamic Partitioning by Field Value.
      • Partitioning by Time: Data is written to the destination partitions of the MaxCompute table based on the value of the _execute_time_ field. For more information, see Fields used for real-time synchronization.
      • Dynamic Partitioning by Field Value: Data is dynamically written to the destination partitions of the MaxCompute table based on the value of a specified field in the source table, after you define the mapping between that field and the partition field in the MaxCompute table.
    • Partition message: the partition information of the partitioned MaxCompute table.
    • Field Mapping: the field mappings between the source and the destination. Click Field Mapping to configure field mappings. The real-time sync node synchronizes data based on the field mappings.
    If you want to create a table, click Create Table next to Table. In the New data table dialog box, configure the following parameters:
    • Table name: the name of the MaxCompute table to which you want to write data in real time.
    • Life cycle: the lifecycle of the MaxCompute table to which you want to write data in real time. For more information, see Lifecycle.
    • Data field structure: the schema of the MaxCompute table to which you want to write data in real time. To add a field, click New field.
    • Partition settings: the partition information of the MaxCompute table to which you want to write data in real time. You can use one of the following modes:
      • Partitioning by Time: Data is written to the destination partitions of the MaxCompute table based on the value of the _execute_time_ field. For more information, see Fields used for real-time synchronization.
        Notice
        • You must configure at least two levels of partitions, which are yearly and monthly partitions. You can configure a maximum of five levels of partitions, which are yearly, monthly, daily, hourly, and minutely partitions.
        • For more information about MaxCompute tables, see Partition.
      • Dynamic Partitioning by Field Value: Data is dynamically written to the destination partitions of the MaxCompute table based on the value of a specified field in the source table, after you define the mapping between that field and the partition field in the MaxCompute table. For example, if Field A in the source table is mapped to the partition field, a record whose Field A value is aa is written to the aa partition, and a record whose Field A value is bb is written to the bb partition.
  7. Click the Save icon in the top toolbar.

Import data in an entire database in real time

  1. Create a real-time data sync node.
  2. Commit and deploy the real-time data sync node.
  3. Start the real-time data sync node.

Import real-time data with one click

  1. Configure a sync solution.
  2. Synchronize data to MaxCompute in real time.