You can use the Data Integration feature of DataWorks to import data from other data sources into MaxCompute in batch or real-time mode. You can also import some types of local files. This topic describes how to use DataWorks to import data into MaxCompute.
Procedure
Create a MaxCompute project and a table. The table stores the data that you synchronize to MaxCompute.
Create a DataWorks workspace and attach a MaxCompute compute resource.
Import data.
Import local files into MaxCompute
Log on to the DataWorks console and select a region in the upper-left corner.
In the left navigation pane, choose .
In the left navigation pane, click the upload icon and then click Upload Data. Follow the on-screen instructions to upload the target data.
You can import CSV, XLS, XLSX, and JSON files into MaxCompute using the Local File or OSS import option:
Local File: The maximum file size is 5 GB for CSV files and 100 MB for other file types.
OSS: You can upload data only from a bucket in the same region as the current MaxCompute project.
For more information, see Upload data.
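The file type and size limits above can be checked locally before you start an upload. The following is a minimal sketch, not part of DataWorks: the function name `check_local_upload` and the table `MAX_BYTES` are hypothetical, and it assumes binary units (1 GB = 1024^3 bytes), which the text above does not specify.

```python
from pathlib import Path
from typing import Tuple

GB = 1024 ** 3  # assumption: binary units; the documentation does not say which unit applies
MB = 1024 ** 2

# Limits for the Local File option as stated above:
# 5 GB for CSV files, 100 MB for the other supported types.
MAX_BYTES = {
    ".csv": 5 * GB,
    ".xls": 100 * MB,
    ".xlsx": 100 * MB,
    ".json": 100 * MB,
}


def check_local_upload(filename: str, size_bytes: int) -> Tuple[bool, str]:
    """Return (ok, reason) for a file that is about to be uploaded as a local file."""
    ext = Path(filename).suffix.lower()
    limit = MAX_BYTES.get(ext)
    if limit is None:
        return False, f"unsupported file type: {ext or '(none)'}"
    if size_bytes > limit:
        return False, f"{ext} file is larger than the {limit}-byte limit"
    return True, "ok"
```

For a real file, you could pass `Path(p).stat().st_size` as `size_bytes`. The check covers only the Local File option; the OSS option is limited by bucket region instead.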
For earlier versions of DataWorks workspaces, you can upload local CSV or custom text files to a MaxCompute table. For more information, see Upload data.
Import data from other data sources into MaxCompute
Log on to the DataWorks console and select a region in the upper-left corner.
In the left navigation pane, choose .
In the Select Workspace section, click Go To Data Studio.
In the left pane of DataStudio, click the icon and select a batch synchronization node or Real-time Synchronization.
Batch synchronization node: Set the data destination to MaxCompute and the data source to another data source.
Real-time synchronization node: Set the output to MaxCompute and the input to another data source.
For more information, see Configure a node in the codeless UI, Configure a node in the code editor, and Configure a real-time synchronization task in DataStudio.
Go back to the DataWorks console.
In the left navigation pane, choose .
In the left navigation pane, select Sync Task, click Create Synchronization Task, and then configure the data source information in the Create Sync Task dialog box.
For more information, see Configure a real-time synchronization task for an entire database.
Data Integration synchronization features
DataWorks Data Integration supports synchronizing data from other data sources into MaxCompute. For example, you can synchronize data from databases such as ApsaraDB RDS into MaxCompute. The synchronization principles and supported features vary depending on the scenario.
Batch synchronization provides Reader and Writer plug-ins to read data from and write data to data sources.
In a batch import scenario, each batch synchronization node can import data from one or more tables into a single MaxCompute table.
Real-time synchronization supports combining various input and output data sources to create a synchronization link. This link can perform real-time incremental synchronization for a single table or an entire database.
Data Integration also provides synchronization solutions that span different data sources. These solutions cover scenarios such as batch synchronization for an entire database and full and incremental real-time synchronization.
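The batch model described above, in which a Reader pulls rows from one or more source tables and a Writer loads them into a single destination table, can be illustrated with a small in-memory sketch. This is a conceptual illustration only: `InMemoryReader`, `InMemoryWriter`, and `run_batch_sync` are hypothetical names, not DataWorks plug-ins or APIs, and the sample table names and rows are made up.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


class InMemoryReader:
    """Hypothetical stand-in for a Reader plug-in: yields rows from one or more source tables."""

    def __init__(self, tables: Dict[str, List[Row]], table_names: Iterable[str]):
        self._tables = tables
        self._table_names = list(table_names)

    def read(self) -> Iterator[Row]:
        # Rows from all selected source tables form one stream.
        for name in self._table_names:
            yield from self._tables[name]


class InMemoryWriter:
    """Hypothetical stand-in for a Writer plug-in: appends rows to a single destination table."""

    def __init__(self, columns: List[str]):
        self.columns = columns
        self.rows: List[tuple] = []

    def write(self, rows: Iterable[Row]) -> int:
        count = 0
        for row in rows:
            # Project each source row onto the destination column order.
            self.rows.append(tuple(row.get(c) for c in self.columns))
            count += 1
        return count


def run_batch_sync(reader: InMemoryReader, writer: InMemoryWriter) -> int:
    """One batch node: many source tables in, one destination table out."""
    return writer.write(reader.read())


# Made-up sample data: two source tables merged into one destination table.
source = {
    "orders_2023": [{"id": 1, "amount": 10.0}],
    "orders_2024": [{"id": 2, "amount": 25.5}, {"id": 3, "amount": 7.0}],
}
reader = InMemoryReader(source, ["orders_2023", "orders_2024"])
writer = InMemoryWriter(columns=["id", "amount"])
synced = run_batch_sync(reader, writer)
```

Keeping the reader and writer separate mirrors the plug-in pairing: either side can be swapped for a different data source without changing the other.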
The following table describes the data synchronization features that are supported for MaxCompute.
Offline synchronization: read from single table | Offline synchronization: write to single table | Real-time synchronization: read incremental data from single table | Real-time synchronization: write incremental data to single table | Real-time synchronization: read incremental data from entire database | Real-time synchronization: write incremental data to entire database | Synchronization solutions: read from entire database (batch) | Synchronization solutions: write to entire database (batch) | Synchronization solutions: read full and incremental data from single table/entire database (real-time) | Synchronization solutions: write full and incremental data to single table/entire database (real-time) |
|
| - |
| - |
| - |
| - |
|
For more information about the data synchronization features that DataWorks Data Integration provides for MaxCompute, see MaxCompute data source.
Billing
Data synchronization with DataWorks Data Integration requires both a Data Integration resource group and a scheduling resource group. You can use shared or exclusive resource groups based on your requirements. If data is transferred over the Internet, you may also be charged for Internet traffic.
For more information about the billing of Data Integration resource groups, see Billing of exclusive resource groups for Data Integration: Subscription and Billing of shared resource groups for Data Integration (debugging): Pay-as-you-go.
For more information about data transfer costs, see Internet traffic billing.
For more information about the billing of scheduling resource groups, see Billing of exclusive resource groups for scheduling: Subscription and Billing for shared resource groups for scheduling.