The DataWorks Data Upload feature lets you import data from sources such as a local file, a Workbook from Data Analysis, an Object Storage Service (OSS) file, or an HTTP file. You can load this data into engines like MaxCompute, EMR Hive, Hologres, and StarRocks for analysis and management. This topic describes how to upload data by using this feature.
Before you begin
If your task involves a cross-border data operation (for example, transferring data from the Chinese mainland to a location outside the Chinese mainland, or between different countries or regions), you must read and understand the compliance statement. Failure to do so can lead to upload failures and legal liability.
Use English column headers in your source file. Chinese headers can cause parsing failures and upload errors.
Limitations
Resource group limitations: The Data Upload feature requires you to specify a resource group for scheduling and a resource group for Data Integration.
You can only use a Serverless resource group (recommended), an exclusive resource group for scheduling, or an exclusive resource group for Data Integration. You must configure these resource groups for the corresponding engine in the System Management settings of Data Analysis.
You must bind the selected resource group to the DataWorks workspace where the destination table is located. You must also ensure that the selected resource group can connect to the data source for the upload task.
Note: To configure the resource group used by an engine in Data Analysis, see System management.
To establish network connectivity between a data source and a resource group, see Network connection solutions.
To bind an exclusive resource group to a workspace, see Use an exclusive resource group for scheduling and Use an exclusive resource group for Data Integration.
Table permissions required for data upload:
MaxCompute tables (ODPS tables): You must be the owner of the table. No specific roles in the DataWorks workspace are required.
Non-MaxCompute tables (such as Hologres, EMR Hive, and StarRocks): You must be the table owner and hold a specific role in the DataWorks workspace.
Tables in the development environment: You must have the Developer role.
Tables in the production environment: You must have the O&M role.
Table type limitations: You can upload data only to an internal table or a table in the default catalog (for StarRocks).
Billing
Data uploads may incur the following fees:
Data transfer fees.
If you create a new table, computing and storage fees are incurred.
These fees are charged by the respective compute engines. For detailed pricing information, see the billing documentation for each engine: MaxCompute billing, Hologres billing, E-MapReduce billing, and EMR Serverless StarRocks product billing.
Go to the Data Upload page
Go to the Upload and Download page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, click Go to Data Upload and Download.
In the left-side navigation pane, click the upload icon to go to the Upload Data page. Then click Upload Data and follow the instructions to upload your data.
Select the source file
You can upload data from a local file, a workbook, Object Storage Service (OSS), or an HTTP file. Select a data source based on your business needs.
When you upload a file, you can choose whether to filter out dirty data.
Yes: If dirty data is found, the platform automatically ignores it and continues the upload.
No: If dirty data is found, the upload stops.
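The dirty-data option can be pictured with the following sketch (a hypothetical illustration, not DataWorks code): a row is treated as dirty when its column count does not match the destination schema or a value cannot be converted to the expected column type.

```python
def split_dirty_rows(rows, expected_types):
    """Split rows into (clean, dirty) against a destination schema.

    expected_types is a list of casting callables, e.g. [int, str].
    A row is dirty if its column count differs from the schema or a
    value cannot be cast to the expected type.
    """
    clean, dirty = [], []
    for row in rows:
        if len(row) != len(expected_types):
            dirty.append(row)
            continue
        try:
            clean.append([cast(v) for cast, v in zip(expected_types, row)])
        except (ValueError, TypeError):
            dirty.append(row)
    return clean, dirty
```

With the filter set to Yes, the upload would proceed with `clean` and ignore `dirty`; with the filter set to No, any non-empty `dirty` stops the upload.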
Local file
Use this option for data stored in a local file.
Set Data Source to Local File.
Under Specify Data to Be Uploaded, drag your local file to the Select File area.
Note:
Supported formats are CSV, XLS, XLSX, and JSON. The maximum file size is 5 GB for CSV files and 100 MB for other file types.
By default, only the first sheet of a file is uploaded. To upload multiple sheets, you must create a separate table for each sheet and make that sheet the first one in the file.
Uploading SQL files is not currently supported.
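A pre-upload check equivalent to these limits could look like the following sketch (the limits come from this section; the function itself is hypothetical):

```python
# Documented limits for local-file uploads: CSV up to 5 GB,
# XLS/XLSX/JSON up to 100 MB; SQL files are not supported.
ALLOWED = {
    "csv": 5 * 1024**3,
    "xls": 100 * 1024**2,
    "xlsx": 100 * 1024**2,
    "json": 100 * 1024**2,
}

def validate_local_file(filename, size_bytes):
    """Return None if the file is acceptable, else a reason string."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED:
        return f"unsupported format: .{ext or '?'}"
    if size_bytes > ALLOWED[ext]:
        return f".{ext} exceeds the {ALLOWED[ext]} byte limit"
    return None
```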
Workbook
Select this option if the data you want to upload is in a DataWorks Data Analysis workbook.
Set Data Source to Workbook.
Under Specify Data to Be Uploaded:
From the dropdown list next to Select File, select the Workbook you want to upload.
If the Workbook does not exist, click the Create button to create one. You can also create a Workbook and import data in the Data Analysis module.
OSS
Select this option if the data you want to upload is stored in Object Storage Service (OSS).
Prerequisites:
You have created an OSS bucket and stored the data file to be uploaded in it.
To avoid permission issues, ensure the Alibaba Cloud account for the upload has access to the destination bucket. For more information, see Permissions and access control overview.
Steps:
Set Data Source to OSS.
Under Specify Data to Be Uploaded:
From the Select Bucket dropdown list, select the OSS bucket that contains the data to be uploaded.
Note: You can only upload data from a bucket that is in the same region as your DataWorks workspace.
In the Select File area, choose the data file you want to upload.
Note: Only CSV, XLS, XLSX, and JSON file formats are supported.
HTTP file
Select this option if the data you want to upload is an HTTP file.
Set Data Source to HTTP File.
Configure the parameters under Specify Data to Be Uploaded:
| Parameter | Description |
| --- | --- |
| File URL | The URL of the data file. Note: Both HTTP and HTTPS URLs are supported. |
| File Type | The system automatically detects the file type. Supported file types are CSV, XLS, and XLSX. The maximum file size is 5 GB for CSV files and 50 MB for other file types. |
| Request Method | Supported methods are GET, POST, and PUT. GET is recommended for retrieving data, but the required method depends on your server's configuration. |
| Advanced Parameters | Set the Request Header and Request Body as needed. |
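The request that these parameters describe can be sketched with the standard library as follows (a hypothetical illustration of the documented constraints, not the service's actual client):

```python
from urllib.request import Request

def build_upload_request(url, method="GET", headers=None, body=None):
    """Build an HTTP request matching the documented constraints.

    Only http/https URLs and the GET, POST, and PUT methods are
    accepted, mirroring the parameter table above.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("only HTTP and HTTPS URLs are supported")
    if method not in ("GET", "POST", "PUT"):
        raise ValueError("supported methods are GET, POST, and PUT")
    data = body.encode("utf-8") if isinstance(body, str) else body
    return Request(url, data=data, headers=headers or {}, method=method)
```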
Configure the destination table
In the Configure Destination Table section, select the Target Engine for the data upload and configure the relevant parameters.
When configuring the destination table, you must distinguish between the production (PROD) and development (DEV) environments for the data source. If you select the wrong environment, data will be uploaded to an unintended location.
MaxCompute
If you need to upload data to an internal table in MaxCompute, configure the parameters as described in the following table.
| Parameter | Description |
| --- | --- |
| MaxCompute project name | Select a MaxCompute data source bound to the current region. If the required data source is not found, you can bind a MaxCompute compute resource to the current workspace to generate a data source with the same name. |
| Destination table | You can choose an Existing Table or Create Table. |
| Select destination table | The table where the uploaded data is stored. You can search for the table by keyword. Note: You can upload data only to tables that you own. For more information, see Limitations. |
| Upload mode | Select how to add data to the destination table. |
| Table name | Enter a custom name for the new table. Note: When you create a table in the MaxCompute engine, the system uses the configured MaxCompute account information from the DataWorks compute resources to create the table in the corresponding MaxCompute project. |
| Table type | Select Non-partitioned Table or Partitioned Table as needed. If you choose a partitioned table, you must specify the partition columns and their values. |
| Lifecycle | Specify the table's retention period. The table is deleted when this period expires. For more information about table lifecycles, see Lifecycle and Lifecycle operations. |
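For a new partitioned table, these settings correspond to MaxCompute DDL of the following shape. The builder below is a sketch; the table and column names in the usage example are hypothetical.

```python
def maxcompute_create_ddl(table, columns, partition_cols=None, lifecycle=None):
    """Build a CREATE TABLE statement matching the Data Upload settings.

    columns and partition_cols are (name, type) pairs; lifecycle is the
    retention period in days.
    """
    cols = ", ".join(f"{n} {t}" for n, t in columns)
    ddl = f"CREATE TABLE IF NOT EXISTS {table} ({cols})"
    if partition_cols:
        parts = ", ".join(f"{n} {t}" for n, t in partition_cols)
        ddl += f" PARTITIONED BY ({parts})"
    if lifecycle is not None:
        ddl += f" LIFECYCLE {lifecycle}"
    return ddl + ";"
```

For example, a partitioned table with a 30-day lifecycle would yield `CREATE TABLE IF NOT EXISTS sales (id BIGINT, name STRING) PARTITIONED BY (pt STRING) LIFECYCLE 30;`.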
EMR Hive
If you need to upload data to an internal table in EMR Hive, configure the parameters as described in the following table.
| Parameter | Description |
| --- | --- |
| Data source | Select an EMR Hive data source (Alibaba Cloud instance mode) in your region that is bound to the current workspace. |
| Destination table | You can only upload data to an Existing Table. |
| Select destination table | The table where the uploaded data is stored. You can search for the table by keyword. |
| Upload mode | Select how to add data to the destination table. |
Hologres
If you need to upload data to an internal table in Hologres, configure the parameters as described in the following table.
| Parameter | Description |
| --- | --- |
| Data source | Select the Hologres data source in your region that is bound to the current workspace. If the required data source is not found, you can bind a Hologres compute resource to the current workspace to generate a data source with the same name. |
| Destination table | You can only upload data to an Existing Table. |
| Select destination table | The table where the uploaded data is stored. You can search for the table by keyword. |
| Upload mode | Select how to add data to the destination table. |
| Primary key conflict strategy | Select a strategy to handle primary key conflicts in the destination table. |
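Conceptually, a primary key conflict strategy decides what happens when an uploaded row has the same key as an existing row. The sketch below illustrates two common strategies; the names "replace" and "ignore" are illustrative, not the exact UI labels.

```python
def apply_with_conflict_strategy(existing, incoming, key, strategy):
    """Merge incoming rows (lists of dicts) into existing rows by `key`.

    strategy "replace": an incoming row overwrites the existing row
    with the same primary key; "ignore": the existing row is kept.
    """
    table = {row[key]: row for row in existing}
    for row in incoming:
        if row[key] in table and strategy == "ignore":
            continue
        table[row[key]] = row
    return list(table.values())
```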
StarRocks
If you need to upload data to a table in the StarRocks default catalog, configure the parameters as described in the following table.
| Parameter | Description |
| --- | --- |
| Data source | Select a StarRocks data source in your region that is bound to the current workspace. |
| Destination table | You can only upload data to an Existing Table. |
| Select destination table | The table where the uploaded data is stored. You can search for the table by keyword. |
| Upload mode | Select how to add data to the destination table. |
| Advanced parameters | Configure Stream Load request parameters. |
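StarRocks ingests uploads through its Stream Load HTTP interface, which is what the advanced parameters tune. The sketch below shows the general shape of such a request; the host, default FE port 8030, and header values are assumptions about a typical cluster, not values DataWorks guarantees.

```python
def stream_load_request(fe_host, db, table, headers=None):
    """Describe a StarRocks Stream Load request as (method, url, headers).

    Stream Load is an HTTP PUT to a frontend (FE) node; headers such as
    column_separator and format control how the payload is parsed.
    """
    url = f"http://{fe_host}:8030/api/{db}/{table}/_stream_load"
    base = {"Expect": "100-continue", "format": "csv", "column_separator": ","}
    base.update(headers or {})  # advanced parameters override the defaults
    return ("PUT", url, base)
```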
Preview data and configure mappings
After you configure the destination table, you can preview the data and adjust the file encoding and data mapping as needed.
Currently, you can preview only the first 20 rows of data.
File Encoding Format: If the preview shows garbled characters, switch the encoding. Supported formats include UTF-8, GB18030, Big5, UTF-16LE, and UTF-16BE.
Preview data and configure destination table columns:
Upload data to an existing table: You must configure the mapping between the columns in the source file and the columns in the destination table. Supported mapping methods include Mapping by Column Name and Mapping by Order. After mapping, you can customize the destination table column names.
Note:
If a source column is not mapped, its data is grayed out and will not be uploaded.
Duplicate mappings between source and destination columns are not allowed.
The column name and column type cannot be empty. Otherwise, the data upload fails.
Upload data to a new table: You can use Intelligent Field Generation to automatically populate column information, or you can manually modify the column information.
Note: The column name and column type cannot be empty. Otherwise, the data upload fails.
The EMR Hive, Hologres, and StarRocks engines do not support creating a new table during data upload.
Ignore First Row: Specify whether to upload the first row of the data file (typically the column names) to the destination table.
Selected: If the first row contains column names, it is not uploaded to the destination table.
Cleared: If the first row contains data, it is uploaded to the destination table.
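The two mapping methods described above can be pictured with this simplified sketch (a hypothetical illustration; the real UI also lets you edit the result by hand):

```python
def map_columns(source_cols, dest_cols, method="by_name"):
    """Return {source_column: destination_column} for matched pairs.

    "by_name" pairs columns with identical names (case-insensitive);
    "by_order" pairs them positionally. Unmatched source columns are
    omitted, i.e. their data would not be uploaded.
    """
    if method == "by_order":
        return dict(zip(source_cols, dest_cols))
    dest_by_name = {c.lower(): c for c in dest_cols}
    return {s: dest_by_name[s.lower()] for s in source_cols
            if s.lower() in dest_by_name}
```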
Upload data
After configuring the settings, click Upload Data in the lower-left corner to start the upload.
Next steps
After the upload succeeds, you can click the upload icon in the left-side navigation pane to open the Upload Data page. Find the Data Upload task that you created and perform the following operations as needed:
Continue Upload: In the Actions column, click Continue Upload to upload the data again.
Query Data: In the Actions column, click Query Data to query and analyze the data.
View uploaded data details: Click a destination Table Name to open Data Map and view that table's details. For more information, see Metadata retrieval.
Appendix: Compliance statement for cross-border uploads
If your task involves a cross-border data operation, such as transferring data from the Chinese mainland to a location outside the Chinese mainland, or between different countries or regions, you must read and understand this compliance statement in advance. Failure to do so may cause the upload to fail and result in legal liability.
A cross-border data operation transfers your cloud business data to the region or product deployment area that you select. You must ensure that such operations comply with the following requirements:
You have the necessary permissions to process the relevant cloud business data.
You have implemented sufficient data security protection technologies and policies.
The data transfer complies with all applicable laws and regulations. For example, the transferred data must not contain any content that is restricted or prohibited from being transferred or disclosed by applicable law.
If your data upload involves a cross-border data operation, consult legal or compliance professionals before proceeding. You must ensure that the cross-border data transfer complies with all applicable laws, regulations, and regulatory policies. This includes, but is not limited to, obtaining valid consent from personal information subjects, completing the signing and filing of relevant contract terms, and completing relevant security assessments and other legal obligations.
You are legally responsible for any cross-border data operations that violate this compliance statement. Additionally, you will be liable for any losses incurred by Alibaba Cloud and its affiliates.
Related documents
DataStudio (Data Development) also supports uploading local CSV or text files to MaxCompute tables. For more information, see Upload data.
For more information about MaxCompute table operations, see Create and use MaxCompute tables.
For more information about Hologres table operations, see Create a Hologres table.
For more information about EMR table operations, see Create an EMR table.
FAQ
Resource group configuration issues.
Error message: A resource group must be configured for the source file or destination engine. Contact the workspace administrator to configure it.
Solution: To configure the resource group used by an engine in Data Analysis, see System management.
Resource group binding issues.
Error message: The global data upload resource group configured in your current workspace is not bound to the workspace where the destination table is located. Please contact the workspace administrator to bind it.
Solution: You can bind the resource group that you configured in System Management to the workspace.