The DataWorks Data Upload feature lets you import data from sources such as a local file, a Workbook from Data Analysis, an Object Storage Service (OSS) file, or an HTTP file. You can load this data into engines like MaxCompute, EMR Hive, Hologres, and StarRocks for analysis and management. This topic describes how to upload data by using this feature.
Before you begin
If your task involves a cross-border data operation (for example, transferring data from the Chinese mainland to a location outside the Chinese mainland, or between different countries or regions), you must read and understand the compliance statement. Failure to do so can lead to upload failures and legal liability.
Use English column headers in your source file. Chinese headers can cause parsing failures and upload errors.
Limitations
Resource group limitations: The Data Upload feature requires you to specify a resource group for scheduling and a resource group for Data Integration.
You can only use a Serverless resource group (recommended), an exclusive resource group for scheduling, or an exclusive resource group for Data Integration. You must configure these resource groups for the corresponding engine in the System Management settings of Data Analysis.
You must bind the selected resource group to the DataWorks workspace where the destination table is located. You must also ensure that the selected resource group can connect to the data source for the upload task.
Note: To configure the resource group used by an engine in Data Analysis, see System management.
To establish network connectivity between a data source and a resource group, see Network connection solutions.
To bind an exclusive resource group to a workspace, see Use an exclusive resource group for scheduling and Use an exclusive resource group for Data Integration.
Table permissions required for data upload:
MaxCompute tables (ODPS tables): You must be the owner of the table. No specific roles in the DataWorks workspace are required.
Non-MaxCompute tables (such as Hologres, EMR Hive, and StarRocks): You must be the table owner and hold a specific role in the DataWorks workspace.
Tables in the development environment: You must have the Developer role.
Tables in the production environment: You must have the O&M role.
Table type limitations: You can upload data only to an internal table or a table in the default catalog (for StarRocks).
Billing
Data uploads may incur the following fees:
Data transfer fees.
If you create a new table, computing and storage fees are incurred.
These fees are charged by the respective compute engines. For detailed pricing information, see the billing documentation for each engine: MaxCompute billing, Hologres billing, E-MapReduce billing, and EMR Serverless StarRocks product billing.
Go to the Data Upload page
Go to the Upload and Download page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, click Go to Data Upload and Download.
In the left-side navigation pane, click the upload icon to go to the Upload Data page. Then click Upload Data and follow the instructions to upload your data.
Select the source file
You can upload data from a local file, a workbook, Object Storage Service (OSS), or an HTTP file. Select a data source based on your business needs.
When you upload a file, you can choose whether to filter out dirty data.
Yes: If dirty data is found, the platform automatically ignores it and continues the upload.
No: If dirty data is found, the upload stops.
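The dirty-data option can be pictured with the following sketch (a hypothetical illustration, not DataWorks code): a row is treated as dirty when its column count does not match the destination schema or a value cannot be converted to the expected column type.

```python
def split_dirty_rows(rows, expected_types):
    """Split rows into (clean, dirty) against a destination schema.

    expected_types is a list of casting callables, e.g. [int, str].
    A row is dirty if its column count differs from the schema or a
    value cannot be cast to the expected type.
    """
    clean, dirty = [], []
    for row in rows:
        if len(row) != len(expected_types):
            dirty.append(row)
            continue
        try:
            clean.append([cast(v) for cast, v in zip(expected_types, row)])
        except (ValueError, TypeError):
            dirty.append(row)
    return clean, dirty
```

With the filter set to Yes, the upload would proceed with `clean` and ignore `dirty`; with the filter set to No, any non-empty `dirty` stops the upload.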
Local file
Use this option for data stored in a local file.
Set Data Source to Local File.
Under Specify Data to Be Uploaded, drag your local file to the Select File area.
Note:
Supported formats are CSV, XLS, XLSX, and JSON. The maximum file size is 5 GB for CSV files and 100 MB for other file types.
By default, only the first sheet of a file is uploaded. To upload multiple sheets, you must create a separate table for each sheet and make that sheet the first one in the file.
Uploading SQL files is not currently supported.
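A pre-upload check equivalent to these limits could look like the following sketch (the limits come from this section; the function itself is hypothetical):

```python
# Documented limits for local-file uploads: CSV up to 5 GB,
# XLS/XLSX/JSON up to 100 MB; SQL files are not supported.
ALLOWED = {
    "csv": 5 * 1024**3,
    "xls": 100 * 1024**2,
    "xlsx": 100 * 1024**2,
    "json": 100 * 1024**2,
}

def validate_local_file(filename, size_bytes):
    """Return None if the file is acceptable, else a reason string."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED:
        return f"unsupported format: .{ext or '?'}"
    if size_bytes > ALLOWED[ext]:
        return f".{ext} exceeds the {ALLOWED[ext]} byte limit"
    return None
```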
Workbook
Select this option if the data you want to upload is in a DataWorks Data Analysis workbook.
Set Data Source to Workbook.
Under Specify Data to Be Uploaded:
From the dropdown list next to Select File, select the Workbook you want to upload.
If the Workbook does not exist, click the Create button to create one. You can also create a Workbook and import data in the Data Analysis module.
OSS
Select this option if the data you want to upload is stored in Object Storage Service (OSS).
Prerequisites:
You have created an OSS bucket and stored the data file to be uploaded in it.
To avoid permission issues, ensure the Alibaba Cloud account for the upload has access to the destination bucket. For more information, see Permissions and access control overview.
Steps:
Set Data Source to OSS.
Under Specify Data to Be Uploaded:
From the Select Bucket dropdown list, select the OSS bucket that contains the data to be uploaded.
Note: You can only upload data from a bucket that is in the same region as your DataWorks workspace.
In the Select File area, choose the data file you want to upload.
Note: Only CSV, XLS, XLSX, and JSON file formats are supported.
HTTP file
Select this option if the data you want to upload is an HTTP file.
Set Data Source to HTTP File.
Configure the parameters under Specify Data to Be Uploaded:
| Parameter | Description |
| --- | --- |
| File URL | The URL of the data file. Note: Both HTTP and HTTPS URLs are supported. |
| File Type | The system automatically detects the file type. Supported file types are CSV, XLS, and XLSX. The maximum file size is 5 GB for CSV files and 50 MB for other file types. |
| Request Method | Supported methods are GET, POST, and PUT. GET is recommended for retrieving data, but the required method depends on your server's configuration. |
| Advanced Parameters | Set the Request Header and Request Body as needed. |
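The request that these parameters describe can be sketched with the standard library as follows (a hypothetical illustration of the documented constraints, not the service's actual client):

```python
from urllib.request import Request

def build_upload_request(url, method="GET", headers=None, body=None):
    """Build an HTTP request matching the documented constraints.

    Only http/https URLs and the GET, POST, and PUT methods are
    accepted, mirroring the parameter table above.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("only HTTP and HTTPS URLs are supported")
    if method not in ("GET", "POST", "PUT"):
        raise ValueError("supported methods are GET, POST, and PUT")
    data = body.encode("utf-8") if isinstance(body, str) else body
    return Request(url, data=data, headers=headers or {}, method=method)
```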
Configure the destination table
In the Configure Destination Table section, select the Target Engine for the data upload and configure the relevant parameters.
When configuring the destination table, you must distinguish between the production (PROD) and development (DEV) environments for the data source. If you select the wrong environment, data will be uploaded to an unintended location.
MaxCompute
If you need to upload data to an internal table in MaxCompute, configure the parameters as described in the following table.
| Parameter | Description |
| --- | --- |
| MaxCompute project name | Select a MaxCompute data source bound to the current region. If the required data source is not found, you can bind a MaxCompute compute resource to the current workspace to generate a data source with the same name. |
| Destination table | You can choose an Existing Table or Create Table. |
| Select destination table | The table where the uploaded data is stored. You can search for the table by keyword. Note: You can upload data only to tables that you own. For more information, see Limitations. |
| Upload mode | Select how to add data to the destination table. |
| Table name | Enter a custom name for the new table. Note: When you create a table in the MaxCompute engine, the system uses the configured MaxCompute account information from the DataWorks compute resources to create the table in the corresponding MaxCompute project. |
| Table type | Select Non-partitioned Table or Partitioned Table as needed. If you choose a partitioned table, you must specify the partition columns and their values. |
| Lifecycle | Specify the table's retention period. The table is deleted when this period expires. For more information about table lifecycles, see Lifecycle and Lifecycle operations. |
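For a new partitioned table, these settings correspond to MaxCompute DDL of the following shape. The builder below is a sketch; the table and column names in the usage example are hypothetical.

```python
def maxcompute_create_ddl(table, columns, partition_cols=None, lifecycle=None):
    """Build a CREATE TABLE statement matching the Data Upload settings.

    columns and partition_cols are (name, type) pairs; lifecycle is the
    retention period in days.
    """
    cols = ", ".join(f"{n} {t}" for n, t in columns)
    ddl = f"CREATE TABLE IF NOT EXISTS {table} ({cols})"
    if partition_cols:
        parts = ", ".join(f"{n} {t}" for n, t in partition_cols)
        ddl += f" PARTITIONED BY ({parts})"
    if lifecycle is not None:
        ddl += f" LIFECYCLE {lifecycle}"
    return ddl + ";"
```

For example, a partitioned table with a 30-day lifecycle would yield `CREATE TABLE IF NOT EXISTS sales (id BIGINT, name STRING) PARTITIONED BY (pt STRING) LIFECYCLE 30;`.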
EMR Hive
If you need to upload data to an internal table in EMR Hive, configure the parameters as described in the following table.
| Parameter | Description |
| --- | --- |
| Data source | Select an EMR Hive data source (Alibaba Cloud instance mode) in your region that is bound to the current workspace. |
| Destination table | You can only upload data to an Existing Table. |
| Select destination table | The table where the uploaded data is stored. You can search for the table by keyword. |
| Upload mode | Select how to add data to the destination table. |
Hologres
If you need to upload data to an internal table in Hologres, configure the parameters as described in the following table.
| Parameter | Description |
| --- | --- |
| Data source | Select the Hologres data source in your region that is bound to the current workspace. If the required data source is not found, you can bind a Hologres compute resource to the current workspace to generate a data source with the same name. |
| Destination table | You can only upload data to an Existing Table. |
| Select destination table | The table where the uploaded data is stored. You can search for the table by keyword. |
| Upload mode | Select how to add data to the destination table. |
| Primary key conflict strategy | Select a strategy to handle primary key conflicts in the destination table. |
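Conceptually, a primary key conflict strategy decides what happens when an uploaded row has the same key as an existing row. The sketch below illustrates two common strategies; the names "replace" and "ignore" are illustrative, not the exact UI labels.

```python
def apply_with_conflict_strategy(existing, incoming, key, strategy):
    """Merge incoming rows (lists of dicts) into existing rows by `key`.

    strategy "replace": an incoming row overwrites the existing row
    with the same primary key; "ignore": the existing row is kept.
    """
    table = {row[key]: row for row in existing}
    for row in incoming:
        if row[key] in table and strategy == "ignore":
            continue
        table[row[key]] = row
    return list(table.values())
```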
StarRocks
If you need to upload data to a table in the StarRocks default catalog, configure the parameters as described in the following table.
| Parameter | Description |
| --- | --- |
| Data source | Select a StarRocks data source in your region that is bound to the current workspace. |
| Destination table | You can only upload data to an Existing Table. |
| Select destination table | The table where the uploaded data is stored. You can search for the table by keyword. |
| Upload mode | Select how to add data to the destination table. |
| Advanced parameters | Configure Stream Load request parameters. |
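StarRocks ingests uploads through its Stream Load HTTP interface, which is what the advanced parameters tune. The sketch below shows the general shape of such a request; the host, default FE port 8030, and header values are assumptions about a typical cluster, not values DataWorks guarantees.

```python
def stream_load_request(fe_host, db, table, headers=None):
    """Describe a StarRocks Stream Load request as (method, url, headers).

    Stream Load is an HTTP PUT to a frontend (FE) node; headers such as
    column_separator and format control how the payload is parsed.
    """
    url = f"http://{fe_host}:8030/api/{db}/{table}/_stream_load"
    base = {"Expect": "100-continue", "format": "csv", "column_separator": ","}
    base.update(headers or {})  # advanced parameters override the defaults
    return ("PUT", url, base)
```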
Preview data and configure mappings
After you configure the destination table, you can preview the data and adjust the file encoding and data mapping as needed.
Currently, you can preview only the first 20 rows of data.
File Encoding Format: If the preview shows garbled characters, switch the encoding. Supported formats include UTF-8, GB18030, Big5, UTF-16LE, and UTF-16BE.
Preview data and configure destination table columns:
Upload data to an existing table: You must configure the mapping between the columns in the source file and the columns in the destination table. Supported mapping methods include Mapping by Column Name and Mapping by Order. After mapping, you can customize the destination table column names.
Note:
If a source column is not mapped, its data is grayed out and will not be uploaded.
Duplicate mappings between source and destination columns are not allowed.
The column name and column type cannot be empty. Otherwise, the data upload fails.
Upload data to a new table: You can use Intelligent Field Generation to automatically populate column information, or you can manually modify the column information.
Note: The column name and column type cannot be empty. Otherwise, the data upload fails.
The EMR Hive, Hologres, and StarRocks engines do not support creating a new table during data upload.
Ignore First Row: Specify whether to upload the first row of the data file (typically the column names) to the destination table.
Selected: If the first row contains column names, it is not uploaded to the destination table.
Cleared: If the first row contains data, it is uploaded to the destination table.
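The two mapping methods described above can be pictured with this simplified sketch (a hypothetical illustration; the real UI also lets you edit the result by hand):

```python
def map_columns(source_cols, dest_cols, method="by_name"):
    """Return {source_column: destination_column} for matched pairs.

    "by_name" pairs columns with identical names (case-insensitive);
    "by_order" pairs them positionally. Unmatched source columns are
    omitted, i.e. their data would not be uploaded.
    """
    if method == "by_order":
        return dict(zip(source_cols, dest_cols))
    dest_by_name = {c.lower(): c for c in dest_cols}
    return {s: dest_by_name[s.lower()] for s in source_cols
            if s.lower() in dest_by_name}
```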
Upload data
After configuring the settings, click Upload Data in the lower-left corner to start the upload.
Next steps
After the upload succeeds, you can click the upload icon in the left-side navigation pane to open the Upload Data page. Find the Data Upload task that you created and perform the following operations as needed:
Continue Upload: In the Actions column, click Continue Upload to upload the data again.
Query Data: In the Actions column, click Query Data to query and analyze the data.
View uploaded data details: Click a destination Table Name to open Data Map and view that table's details. For more information, see Metadata retrieval.
Appendix: Compliance statement for cross-border uploads
If your task involves a cross-border data operation, such as transferring data from the Chinese mainland to a location outside the Chinese mainland, or between different countries or regions, you must read and understand this compliance statement in advance. Failure to do so may cause the upload to fail and result in legal liability.
A cross-border data operation transfers your cloud business data to the region or product deployment area that you select. You must ensure that such operations comply with the following requirements:
You have the necessary permissions to process the relevant cloud business data.
You have implemented sufficient data security protection technologies and policies.
The data transfer complies with all applicable laws and regulations. For example, the transferred data must not contain any content that is restricted or prohibited from being transferred or disclosed by applicable law.
If your data upload involves a cross-border data operation, consult legal or compliance professionals before proceeding. You must ensure that the cross-border data transfer complies with all applicable laws, regulations, and regulatory policies. This includes, but is not limited to, obtaining valid consent from personal information subjects, completing the signing and filing of relevant contract terms, and completing relevant security assessments and other legal obligations.
You are legally responsible for any cross-border data operations that violate this compliance statement. Additionally, you will be liable for any losses incurred by Alibaba Cloud and its affiliates.
Related documents
DataStudio (Data Development) also supports uploading local CSV or text files to MaxCompute tables. For more information, see Upload data.
For more information about MaxCompute table operations, see Create and use MaxCompute tables.
For more information about Hologres table operations, see Create a Hologres table.
For more information about EMR table operations, see Create an EMR table.
FAQ
Resource group configuration issues.
Error message: A resource group must be configured for the source file or destination engine. Contact the workspace administrator to configure it.
Solution: To configure the resource group used by an engine in Data Analysis, see System management.
Resource group binding issues.
Error message: The global data upload resource group configured in your current workspace is not bound to the workspace where the destination table is located. Please contact the workspace administrator to bind it.
Solution: You can bind the resource group that you configured in System Management to the workspace.