
DataWorks: Upload data

Last Updated: Mar 23, 2026

The DataWorks Data Upload feature lets you import data from sources such as a local file, a Workbook from Data Analysis, an Object Storage Service (OSS) file, or an HTTP file. You can load this data into engines like MaxCompute, EMR Hive, Hologres, and StarRocks for analysis and management. This topic describes how to upload data by using this feature.

Before you begin

  • If your task involves a cross-border data operation (for example, transferring data from the Chinese mainland to a location outside the Chinese mainland, or between different countries or regions), you must read and understand the compliance statement in the appendix of this topic. Failure to do so can lead to upload failures and legal liability.

  • Use English column headers in your source file. Chinese headers can cause parsing failures and upload errors.
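The header requirement can be checked before upload. The following is a minimal sketch; the `headers_are_ascii` helper is hypothetical and not part of DataWorks, and it only verifies that headers are non-empty ASCII text:

```python
import csv
import io

def headers_are_ascii(csv_text: str) -> bool:
    """Return True if every column header in the first row is non-empty ASCII."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader, [])
    return bool(headers) and all(h.isascii() and h.strip() for h in headers)

print(headers_are_ascii("id,name,city\n1,Alice,Beijing"))  # True
print(headers_are_ascii("编号,name\n1,Alice"))              # False: non-ASCII header
```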

Limitations

  • Resource group limitations: The Data Upload feature requires you to specify a resource group for scheduling and a resource group for Data Integration.

  • Table permissions required for data upload:

    • MaxCompute tables (ODPS tables): You must be the owner of the table. No specific roles in the DataWorks workspace are required.

    • Non-MaxCompute tables (such as Hologres, EMR Hive, and StarRocks): You must be the table owner and hold a specific role in the DataWorks workspace.

      • Tables in the development environment: You must have the Developer role.

      • Tables in the production environment: You must have the O&M role.

  • Table type limitations: You can upload data only to an internal table or a table in the default catalog (for StarRocks).

Billing

Data uploads may incur the following fees:

  • Data transfer fees.

  • If you create a new table, computing and storage fees are incurred.

These fees are charged by the respective compute engines. For detailed pricing information, see the billing documentation for each engine: MaxCompute billing, Hologres billing, E-MapReduce billing, and EMR Serverless StarRocks product billing.

Go to the Data Upload page

  1. Go to the Upload and Download page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Integration > Data Upload and Download. On the page that appears, click Go to Data Upload and Download.

  2. In the left-side navigation pane, click the upload icon to go to the Upload Data page.

  3. Click Upload Data and follow the instructions to upload your data.

Select the source file

You can upload data from a local file, a workbook, Object Storage Service (OSS), or an HTTP file. Select a data source based on your business needs.

Note

When you upload a file, you can choose whether to filter out dirty data.

  • Yes: If dirty data is found, the platform automatically ignores it and continues the upload.

  • No: If dirty data is found, the upload stops.
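The two options behave roughly as sketched below. This is an illustration only: here "dirty" is assumed to mean a row that fails type conversion for the destination column, but the exact definition depends on the destination table's schema.

```python
def load_rows(rows, skip_dirty):
    """Simulate the dirty-data option: rows whose 'age' field is not an
    integer are treated as dirty."""
    clean = []
    for row in rows:
        try:
            clean.append({"name": row["name"], "age": int(row["age"])})
        except (KeyError, ValueError):
            if skip_dirty:
                continue  # Yes: ignore the dirty row and continue the upload
            raise ValueError(f"dirty row found, upload stops: {row!r}")
    return clean

rows = [{"name": "a", "age": "30"}, {"name": "b", "age": "oops"}]
print(load_rows(rows, skip_dirty=True))  # [{'name': 'a', 'age': 30}]
```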

Local file

Use this option for data stored in a local file.

  1. Set Data Source to Local File.

  2. Under Specify Data to Be Uploaded, drag your local file to the Select File area.

    Note
    • Supported formats include CSV, XLS, XLSX, and JSON. The maximum file size is 5 GB for CSV files and 100 MB for other file types.

    • By default, only the first sheet of a file is uploaded. To upload multiple sheets, upload each sheet separately: move the sheet to the first position in the file, then upload it to its own table.

    • Uploading SQL files is not currently supported.
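The format and size limits above can be pre-checked locally before you start an upload. This is a sketch with a hypothetical `check_local_file` helper; the size is passed in explicitly so the example is self-contained:

```python
import os

ALLOWED = {".csv", ".xls", ".xlsx", ".json"}
LIMITS = {".csv": 5 * 1024**3}   # 5 GB for CSV files
DEFAULT_LIMIT = 100 * 1024**2    # 100 MB for other supported types

def check_local_file(path: str, size_bytes: int) -> bool:
    """Return True if the file extension and size satisfy the upload limits."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED:
        return False  # e.g. SQL files are not supported
    return size_bytes <= LIMITS.get(ext, DEFAULT_LIMIT)

print(check_local_file("sales.csv", 2 * 1024**3))    # True: 2 GB CSV is within 5 GB
print(check_local_file("sales.json", 200 * 1024**2)) # False: 200 MB exceeds 100 MB
```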

Workbook

Select this option if the data you want to upload is in a DataWorks Data Analysis workbook.

  1. Set Data Source to Workbook.

  2. Under Specify Data to Be Uploaded:

    1. From the dropdown list next to Select File, select the Workbook you want to upload.

    2. If the Workbook does not exist, click the Create button to create one. You can also create a Workbook and import data in the Data Analysis module.

OSS

Select this option if the data you want to upload is stored in Object Storage Service (OSS).

Steps:

  1. Set Data Source to OSS.

  2. Under Specify Data to Be Uploaded:

    1. From the Select Bucket dropdown list, select the OSS bucket that contains the data to be uploaded.

      Note

      You can only upload data from a bucket that is in the same region as your DataWorks workspace.

    2. In the Select File area, choose the data file you want to upload.

      Note

      Only CSV, XLS, XLSX, and JSON file formats are supported.

HTTP file

Select this option if the data you want to upload is an HTTP file.

  1. Set Data Source to HTTP File.

  2. Configure the parameters under Specify Data to Be Uploaded:

    • File URL: The URL of the data file. Both HTTP and HTTPS URLs are supported.

    • File Type: The system automatically detects the file type. Supported file types are CSV, XLS, and XLSX. The maximum file size is 5 GB for CSV files and 50 MB for other file types.

    • Request Method: Supported methods are GET, POST, and PUT. GET is recommended for retrieving data, but the required method depends on your server's configuration.

    • Advanced Parameters: Set the Request Header and Request Body as needed.
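A file URL can be sanity-checked before you configure it. The sketch below, with a hypothetical `validate_file_url` helper, only verifies the scheme and file extension; it does not contact the server:

```python
from urllib.parse import urlparse

SUPPORTED_EXTS = (".csv", ".xls", ".xlsx")

def validate_file_url(url: str) -> bool:
    """Accept only HTTP/HTTPS URLs whose path ends in a supported file type."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False
    return parts.path.lower().endswith(SUPPORTED_EXTS)

print(validate_file_url("https://example.com/data/sales.csv"))  # True
print(validate_file_url("ftp://example.com/sales.csv"))         # False: not HTTP(S)
```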

Configure the destination table

In the Configure Destination Table section, select the Target Engine for the data upload and configure the relevant parameters.

Important

When configuring the destination table, you must distinguish between the production (PROD) and development (DEV) environments for the data source. If you select the wrong environment, data will be uploaded to an unintended location.

MaxCompute

If you need to upload data to an internal table in MaxCompute, configure the parameters as described in the following table.

  • MaxCompute project name: Select a MaxCompute data source bound to the current region. If the required data source is not found, you can bind a MaxCompute compute resource to the current workspace to generate a data source with the same name.

  • Destination table: Choose Existing Table or Create Table.

  • If you choose Existing Table:

    • Select destination table: The table where the uploaded data will be stored. You can search for the table by keyword.

      Note

      You can upload data only to tables that you own. For more information, see Limitations.

    • Upload mode: Select how to add data to the destination table.

      • Clear Table Data First: Clears all existing data from the destination table before performing a full import into the mapped columns.

      • Append: Appends the new data to the corresponding mapped columns in the destination table.

  • If you choose Create Table:

    • Table name: Enter a custom name for the new table.

      Note

      When creating a table in the MaxCompute engine, the system uses the configured MaxCompute account information from the DataWorks compute resources to create the table in the corresponding MaxCompute project.

    • Table type: Select Non-partitioned Table or Partitioned Table as needed. If you choose a partitioned table, you must specify the partition columns and their values.

    • Lifecycle: Specify the table's retention period. The table is deleted when this period expires. For more information about table lifecycles, see lifecycle and lifecycle operations.
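The difference between the two upload modes can be sketched on an in-memory "table". This is an illustration of the semantics only, not DataWorks code; in SQL terms the behavior is broadly analogous to INSERT OVERWRITE versus INSERT INTO:

```python
def upload(table_rows, new_rows, mode):
    """Illustrate the two upload modes on an in-memory list of rows."""
    if mode == "clear_first":
        table_rows = []             # Clear Table Data First: drop existing rows
    return table_rows + new_rows    # then import the new data

existing = [("a", 1)]
print(upload(existing, [("b", 2)], "clear_first"))  # [('b', 2)]
print(upload(existing, [("b", 2)], "append"))       # [('a', 1), ('b', 2)]
```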

EMR Hive

If you need to upload data to an internal table in EMR Hive, configure the parameters as described in the following table.

  • Data source: Select an EMR Hive data source (Alibaba Cloud instance mode) in your region that is bound to the current workspace.

  • Destination table: You can only upload data to an Existing Table.

  • Select destination table: The table where the uploaded data will be stored. You can search for the table by keyword.

    Note
    • If the destination table does not exist, follow the on-screen prompts to create one in Table Management in DataStudio (Data Development).

    • You can upload data only to tables that you own. For more information, see Limitations.

  • Upload mode: Select how to add data to the destination table.

    • Clear Table Data First: Clears all existing data from the destination table before performing a full import into the mapped columns.

    • Append: Appends the new data to the corresponding mapped columns in the destination table.

Hologres

If you need to upload data to an internal table in Hologres, configure the parameters as described in the following table.

  • Data source: Select the Hologres data source in your region that is bound to the current workspace. If the required data source is not found, you can bind a Hologres compute resource to the current workspace to generate a data source with the same name.

  • Destination table: You can only upload data to an Existing Table.

  • Select destination table: The table where the uploaded data will be stored. You can search for the table by keyword.

    Note
    • If the destination table does not exist, follow the on-screen prompts to create one in the Hologres console.

    • You can upload data only to tables that you own. For more information, see Limitations.

  • Upload mode: Select how to add data to the destination table.

    • Clear Table Data First: Clears all existing data from the destination table before performing a full import into the mapped columns.

    • Append: Appends the new data to the corresponding mapped columns in the destination table.

  • Primary key conflict strategy: Select a strategy to handle primary key conflicts in the destination table.

    • Ignore: The uploaded data is ignored, and the data in the destination table is not updated.

    • Update (replace): Replaces the entire existing row with the new data. Unmapped columns are set to NULL.

    • Update: Updates only the mapped columns in the existing row.
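The three conflict strategies can be sketched on an in-memory table keyed by primary key. This is an illustration of the semantics only; the `apply_row` helper, the `id` key, and the column names are hypothetical:

```python
def apply_row(table, new_row, columns, mapped, strategy):
    """Simulate primary-key conflict handling; 'id' is the primary key.
    table: dict keyed by id. mapped: columns present in the source file."""
    key = new_row["id"]
    if key not in table:
        table[key] = {c: new_row.get(c) for c in columns}  # no conflict: insert
        return table
    if strategy == "ignore":
        return table                              # keep the existing row as is
    if strategy == "replace":
        table[key] = {c: (new_row[c] if c in mapped else None)
                      for c in columns}           # unmapped columns become NULL
        return table
    if strategy == "update":
        for c in mapped:
            table[key][c] = new_row[c]            # touch only mapped columns
        return table
    raise ValueError(f"unknown strategy: {strategy}")

cols = ["id", "name", "city"]
t = {1: {"id": 1, "name": "old", "city": "BJ"}}
print(apply_row(dict(t), {"id": 1, "name": "new"}, cols, ["id", "name"], "replace"))
# {1: {'id': 1, 'name': 'new', 'city': None}}
```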

StarRocks

If you need to upload data to a table in the StarRocks default catalog, configure the parameters as described in the following table.

  • Data source: Select a StarRocks data source in your region that is bound to the current workspace.

  • Destination table: You can only upload data to an Existing Table.

  • Select destination table: The table where the uploaded data will be stored. You can search for the table by keyword.

    Note
    • If the destination table does not exist, follow the on-screen prompts to create one on the EMR Serverless StarRocks instance page.

    • You can upload data only to tables that you own. For more information, see Limitations.

  • Upload mode: Select how to add data to the destination table.

    • Clear Table Data First: Clears all existing data from the destination table before performing a full import into the mapped columns.

    • Append: Appends the new data to the corresponding mapped columns in the destination table.

  • Advanced parameters: Configure Stream Load request parameters as needed.
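For reference, a CSV Stream Load request typically carries parameters as HTTP headers. The sketch below builds a plain header dictionary; the parameter names (`label`, `format`, `column_separator`, `max_filter_ratio`) follow the public StarRocks Stream Load documentation, but you should confirm which parameters DataWorks exposes and which your EMR Serverless StarRocks version accepts:

```python
def stream_load_headers(label, sep=",", max_filter_ratio=0.0):
    """Build a hypothetical header set for a CSV Stream Load request."""
    return {
        "label": label,                             # unique job id, deduplicates retries
        "format": "csv",                            # source file format
        "column_separator": sep,                    # field delimiter in the file
        "max_filter_ratio": str(max_filter_ratio),  # tolerated ratio of filtered rows
        "Expect": "100-continue",                   # required for HTTP body uploads
    }

print(stream_load_headers("upload_20250101_001"))
```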

Preview data and configure mappings

After you configure the destination table, you can preview the data and adjust the file encoding and data mapping as needed.

Note

Currently, you can preview only the first 20 rows of data.

  • File Encoding Format: If the preview shows garbled characters, switch the encoding. Supported formats include UTF-8, GB18030, Big5, UTF-16LE, and UTF-16BE.

  • Preview data and configure destination table columns:

    • Upload data to an existing table: You must configure the mapping between the columns in the source file and the columns in the destination table. Supported mapping methods include Mapping by Column Name and Mapping by Order. After mapping, you can customize the destination table column names.

      Note
      • If a source column is not mapped, its data is grayed out and will not be uploaded.

      • Duplicate mappings between source and destination columns are not allowed.

      • The column name and column type cannot be empty. Otherwise, the data upload fails.

    • Upload data to a new table: You can use Intelligent Field Generation to automatically populate column information, or you can manually modify the column information.

      Note
      • The column name and column type cannot be empty. Otherwise, the data upload fails.

      • The EMR Hive, Hologres, and StarRocks engines do not support creating a new table during data upload.

  • Ignore First Row: Specify whether to upload the first row of the data file (typically the column names) to the destination table.

    • Selected: If the first row contains column names, it is not uploaded to the destination table.

    • Cleared: If the first row contains data, it is uploaded to the destination table.
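If the preview shows garbled characters, you can also narrow down the right encoding locally. This sketch, with a hypothetical `guess_encoding` helper, tries the encodings offered in the preview in order and returns the first one that decodes cleanly (a heuristic, not a guarantee of correctness):

```python
CANDIDATES = ["utf-8", "gb18030", "big5", "utf-16le", "utf-16be"]

def guess_encoding(raw: bytes) -> str:
    """Return the first candidate encoding that decodes the bytes without error."""
    for enc in CANDIDATES:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")

print(guess_encoding("编号,名称".encode("gb18030")))  # gb18030
```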

Upload data

After configuring the settings, click Upload Data in the lower-left corner to start the upload.

Next steps

After the upload succeeds, you can click the upload icon in the left-side navigation pane to open the Upload Data page. Find the Data Upload task you created and perform the following operations as needed:

  • Continue Upload: In the Actions column, click Continue Upload to upload the data again.

  • Query Data: In the Actions column, click Query Data to query and analyze the data.

  • View uploaded data details: Click a destination Table Name to open Data Map and view that table's details. For more information, see Metadata retrieval.

Appendix: Compliance statement for cross-border uploads

Important

If your task involves a cross-border data operation, such as transferring data from the Chinese mainland to a location outside the Chinese mainland, or between different countries or regions, you must read and understand this compliance statement in advance. Failure to do so may cause the upload to fail and result in legal liability.

A cross-border data operation transfers your cloud business data to the region or product deployment area that you select. You must ensure that such operations comply with the following requirements:

  • You have the necessary permissions to process the relevant cloud business data.

  • You have implemented sufficient data security protection technologies and policies.

  • The data transfer complies with all applicable laws and regulations. For example, the transferred data must not contain any content that is restricted or prohibited from being transferred or disclosed by applicable law.

If your data upload involves a cross-border data operation, consult legal or compliance professionals before proceeding. You must ensure that the cross-border data transfer complies with all applicable laws, regulations, and regulatory policies. This includes, but is not limited to, obtaining valid consent from personal information subjects, completing the signing and filing of relevant contract terms, and completing relevant security assessments and other legal obligations.

You are legally responsible for any cross-border data operations that violate this compliance statement. Additionally, you will be liable for any losses incurred by Alibaba Cloud and its affiliates.

FAQ

  1. Resource group configuration issues.

    Error message: A resource group must be configured for the source file or destination engine. Contact the workspace administrator to configure it.

    Solution: To configure the resource group used by an engine in Data Analysis, see System management.

  2. Resource group binding issues.

    Error message: The global data upload resource group configured in your current workspace is not bound to the workspace where the destination table is located. Please contact the workspace administrator to bind it.

    Solution: You can bind the resource group that you configured in System Management to the workspace.