The MaxCompute console offers the Data Upload feature for offline (non-real-time) data uploads from local files or Alibaba Cloud OSS to MaxCompute. This feature lets you analyze, process, and manage your data within the MaxCompute environment.
Limitations
You can upload data only from a local file or Alibaba Cloud OSS.
Local file: You can upload data from CSV or XLSX files.
CSV format: The maximum file size is 5 GB.
XLSX format: The maximum file size is 100 MB.
Alibaba Cloud OSS: You can upload data only from CSV files. The maximum file size is 5 GB. The source bucket must be in the same region as your MaxCompute project.
You cannot upload data to tables (both existing and new) that have a custom schema. For more information about custom schemas, see Schema operations.
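Because a CSV file above the size limit cannot be uploaded in one piece, it has to be split locally first. The following is a minimal sketch of one way to do that, not part of the console feature. The function names, the part-file naming scheme, and the reading of "5 GB" as 5 * 1024**3 bytes are assumptions made for illustration.

```python
import csv
import io
import os

# Assumption: "5 GB" is read as 5 * 1024**3 bytes; pass a lower value for headroom.
CSV_LIMIT_BYTES = 5 * 1024**3


def needs_split(path, limit_bytes=CSV_LIMIT_BYTES):
    """Return True if the file is larger than the given limit."""
    return os.path.getsize(path) > limit_bytes


def _encode_row(row, encoding):
    """Serialize one CSV row to bytes so its size can be counted exactly."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue().encode(encoding)


def split_csv(path, out_dir, max_bytes=CSV_LIMIT_BYTES, encoding="utf-8"):
    """Split a CSV file into parts, repeating the header row in each part.

    A new part is started whenever the next row would push the current part
    past max_bytes. A single row larger than max_bytes still gets its own
    part, which would then exceed the limit. Returns the part paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(path))[0]
    parts = []
    out = None
    written = 0
    try:
        with open(path, newline="", encoding=encoding) as src:
            reader = csv.reader(src)
            header = next(reader, None)
            if header is None:  # empty file: nothing to split
                return parts
            header_bytes = _encode_row(header, encoding)
            for row in reader:
                data = _encode_row(row, encoding)
                if out is None or written + len(data) > max_bytes:
                    if out is not None:
                        out.close()
                    part_path = os.path.join(
                        out_dir, f"{base}_part{len(parts) + 1:03d}.csv"
                    )
                    out = open(part_path, "wb")
                    parts.append(part_path)
                    out.write(header_bytes)
                    written = len(header_bytes)
                out.write(data)
                written += len(data)
    finally:
        if out is not None:
            out.close()
    return parts
```

Each part repeats the header row, so the Ignore first row option can be applied the same way to every part when it is uploaded.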
Prerequisites
You have created a MaxCompute project to store the uploaded data and have the required permissions. For example:
To upload data to an existing table, you need write permissions on that table.
To upload data to a new table, you need permissions to create tables in the project.
For more information about how to create a MaxCompute project, see Project management. For more information about how to grant permissions, see Permissions overview.
If you upload data from Alibaba Cloud OSS, the following requirements apply:
You have activated Alibaba Cloud OSS, created a bucket, and uploaded your data to the bucket. For more information, see Create buckets and Upload files.
The Alibaba Cloud account used for the upload has the required permissions to access the source bucket. For more information, see Overview of permissions and access control.
Procedure
- Log in to the MaxCompute console and select a region in the upper-left corner.
- In the left-side navigation pane, choose Data Transfer > Data Upload. The Data Upload page appears.
- On the Data Upload page, configure the parameters as described in Table 1.
Table 1. Data upload parameters
Section
Parameter
Description
Data source
Local file
Upload data from a file on your local machine.
You can upload a single CSV or XLSX file at a time.
CSV file: The maximum file size is 5 GB. Data must be separated by commas (,).
XLSX file: The maximum file size is 100 MB. By default, only data from the first sheet is uploaded. Data in other sheets is ignored.
Alibaba Cloud OSS
Upload data from a CSV file stored in an Alibaba Cloud OSS bucket.
You can select only a CSV file in a bucket that is in the current region. The maximum file size for a single upload is 5 GB. If no accessible bucket is available, you must create one. For more information, see Create buckets.
Note: If your data exceeds 5 GB, split it into smaller files before uploading. Otherwise, the upload will fail.
Specify data to be uploaded
Select bucket
If you set Data Source to Alibaba Cloud OSS, select the path of the OSS bucket where your source file is located.
Select file
Select the source CSV or XLSX file.
Remove dirty data
Specifies whether to filter out dirty data during the upload. Dirty data refers to records where the data type does not match the data type of the corresponding column in the destination MaxCompute table.
Note: For example, a record is considered dirty data if a source column contains a string with letters but the corresponding destination table column is of the BIGINT type. If you choose to remove dirty data, this record is not uploaded.
Yes: Removes records with data type mismatches.
No: Uploads all data from the file. Data type mismatches may cause the upload to fail.
Configure destination table
MaxCompute project name
The destination MaxCompute project.
Destination table
Select whether to upload data to an existing table or a new table.
Table 2. Parameters for an existing table
Parameter
Description
Select destination table
Select the destination MaxCompute table from the drop-down list or search for it by keyword.
Upload method
Specifies how to add data to the destination table based on the configured column mappings.
Clear Table Data First: Clears all existing data from the destination table and then inserts the new data.
Append: Adds the new data to the destination table while preserving existing data.
Note: For information about how to configure column mappings, see Step 4.
Table 3. Parameters for a new table
Parameter
Description
Table name
Specify a name for the new table.
Table type
Select Non-partitioned Table or Partitioned Table. If you select Partitioned Table, you must specify one or more partition key columns and their values.
Lifecycle
Specifies the lifecycle of the table in days. The table is automatically deleted after this period expires. For more information, see Lifecycle and Lifecycle operations.
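The Remove dirty data option in Table 1 hinges on whether each source value fits the data type of its destination column. As a rough local check before uploading, the sketch below scans a CSV file for values that would not fit a BIGINT column. The function names are hypothetical, and two behaviors are assumptions rather than documented console behavior: BIGINT is treated as a signed 64-bit integer written in plain decimal digits, and empty cells are skipped rather than flagged.

```python
import csv
import re

_INT_PATTERN = re.compile(r"[+-]?[0-9]+")


def is_bigint(text):
    """Return True if text is a decimal integer within the signed 64-bit range."""
    text = text.strip()
    if not _INT_PATTERN.fullmatch(text):
        return False
    return -(2**63) <= int(text) <= 2**63 - 1


def find_dirty_rows(path, bigint_columns, encoding="utf-8"):
    """Return (row_number, column, value) for cells that do not fit BIGINT.

    Rows are numbered from 1, counting data rows only (the header is excluded).
    Assumption: empty cells are skipped, not reported.
    """
    dirty = []
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)
        for row_number, row in enumerate(reader, start=1):
            for column in bigint_columns:
                value = row.get(column) or ""
                if value and not is_bigint(value):
                    dirty.append((row_number, column, value))
    return dirty
```

Calling `find_dirty_rows("orders.csv", ["amount"])` (both names are placeholders) lists the records that would be dropped with Remove dirty data set to Yes, or that may cause the upload to fail with it set to No.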
- Preview the source data and configure column mappings.
After you select the source file and destination table, preview the data and configure the column mappings. Data cannot be uploaded until the mappings are correctly configured.
Note: You can preview only the first 20 rows of data.
Section
Parameter
Description
Upload file data preview
File encoding
If garbled characters appear in the preview, switch to a different encoding. Supported encodings are UTF-8, GB18030, and Big5.
Map by column name
Maps source columns to destination columns based on matching column names.
Map by order
Maps source columns to destination columns based on their order in the file.
Ignore first row
Specifies whether to ignore the first row of the source file, which typically contains column headers.
Selected: The first row is not uploaded.
Cleared: The first row is uploaded as a data record.
Click Upload data to start the upload process.
Important: If a source column is not mapped to a destination column, its data is grayed out in the preview and is not uploaded.
Each source column can be mapped to only one destination column. Duplicate mappings are not allowed.
When you create a new table, you must specify a name and data type for each column.
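To illustrate how the two mapping modes in this step differ, here is a small sketch that models them on plain lists of column names. It is only an illustration of the idea, not the console's implementation: the function names are hypothetical, and exact, case-sensitive name matching is an assumption.

```python
def map_by_name(source_columns, dest_columns):
    """Pair each source column with the destination column of the same name.

    Assumption: names must match exactly (case-sensitive).
    """
    dest = set(dest_columns)
    return {col: col for col in source_columns if col in dest}


def map_by_order(source_columns, dest_columns):
    """Pair source and destination columns by position.

    Extra columns on either side are left unmapped.
    """
    return dict(zip(source_columns, dest_columns))


def unmapped_columns(source_columns, mapping):
    """Return the source columns that have no mapping and would not be uploaded."""
    return [col for col in source_columns if col not in mapping]


def has_duplicate_targets(mapping):
    """Return True if two source columns point at the same destination column."""
    targets = list(mapping.values())
    return len(targets) != len(set(targets))
```

For a source file with columns `id, name, extra` and a destination table with columns `name, id`, mapping by name pairs `id` with `id` and `name` with `name`, while mapping by order pairs `id` with `name` and `name` with `id`. In both cases `extra` stays unmapped.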
Viewing upload records
The upload may take some time for large datasets. You can leave the page and check the upload status later by clicking View Upload Records in the upper-right corner of the Data Upload page.
Clicking View Upload Records also shows records generated by the Data Upload operation in DataWorks.
Next steps
After the data is successfully uploaded, you can use a connection tool to query the data in the destination MaxCompute table.
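As one possible way to check the result programmatically, the sketch below previews the first rows of the destination table with PyODPS, the MaxCompute SDK for Python. Using PyODPS here is an assumption, since this page does not name a specific connection tool. The helper names are hypothetical, and the credentials, project, endpoint, and table name are placeholders you would supply yourself.

```python
import re


def build_preview_query(table, limit=20):
    """Build a simple SELECT statement for previewing a table.

    The table name is restricted to letters, digits, and underscores so that
    arbitrary text cannot be injected into the statement.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    return f"SELECT * FROM {table} LIMIT {int(limit)}"


def preview_table(access_id, secret_access_key, project, endpoint, table, limit=20):
    """Run the preview query and return the rows as lists of values.

    Requires the pyodps package (pip install pyodps). The import is kept
    inside the function so that this module loads without it.
    """
    from odps import ODPS

    o = ODPS(access_id, secret_access_key, project, endpoint=endpoint)
    instance = o.execute_sql(build_preview_query(table, limit))
    with instance.open_reader() as reader:
        return [list(record.values) for record in reader]
```

Comparing the returned rows with the first rows of the source file is a quick way to confirm that the column mappings behaved as intended.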