Multipart upload splits a large object into smaller parts that are uploaded independently and then combined into a single object. Use multipart upload for objects larger than 5 GB or when network interruptions and program crashes may disrupt a single upload.
Prerequisites
Before you begin, ensure that you have:
An OSS bucket
The oss:PutObject permission attached to your RAM user. This permission covers the full multipart upload workflow: InitiateMultipartUpload, UploadPart, and CompleteMultipartUpload. For details, see Attach a custom policy to a RAM user.
The OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables configured. For more information, see Configure access credentials using OSS SDK for Python 1.0.
An OSS endpoint. If you want to access OSS from other Alibaba Cloud services in the same region as OSS, use an internal endpoint. For more information about OSS regions and endpoints, see Regions and endpoints.
OSS SDK for Python 1.0 installed. For alternative configurations, such as using a custom domain or authenticating with credentials from Security Token Service (STS), see Initialization.
How it works
A multipart upload has three stages:
Initiate -- Call bucket.init_multipart_upload() to obtain a unique upload ID from OSS.
Upload parts -- Call bucket.upload_part() for each part. Parts can be uploaded in parallel.
Complete -- Call bucket.complete_multipart_upload() to combine all parts into a single object.
Each part number identifies a part's position in the final object. Uploading a new part with an existing part number overwrites the previous part.
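Because parts are independent of each other, the upload stage can be spread across worker threads. The following is a minimal sketch rather than a mechanism provided by the SDK: upload_parts_in_parallel and the upload_one callback are hypothetical names introduced for illustration. upload_one receives a part number and the bytes of that part, and returns the part's ETag. Each part is read fully into memory, so the sketch assumes a part size that fits comfortably in RAM.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def upload_parts_in_parallel(filename, part_size, upload_one, max_workers=4):
    """Upload every part of `filename` concurrently.

    `upload_one(part_number, data)` is a hypothetical callback that uploads
    one part and returns its ETag. Returns (part_number, etag) tuples sorted
    by part number.
    """
    total_size = os.path.getsize(filename)

    def worker(part_number, offset):
        # Each worker opens its own file handle so that seeks do not interfere.
        with open(filename, 'rb') as f:
            f.seek(offset)
            data = f.read(min(part_size, total_size - offset))
        return part_number, upload_one(part_number, data)

    # Part numbers start at 1; offsets advance by part_size.
    jobs = [(index + 1, offset)
            for index, offset in enumerate(range(0, total_size, part_size))]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda job: worker(*job), jobs))
    return sorted(results)
```

With oss2, upload_one could be a function that calls bucket.upload_part(key, upload_id, part_number, data) and returns the etag attribute of the result. The returned tuples can then be converted to PartInfo objects before the upload is completed.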
Data integrity verification
OSS includes the MD5 hash of each uploaded part in the ETag header of the response. When the SDK sends a Content-MD5 header with the upload request, OSS calculates the MD5 hash of the received data and compares it against the Content-MD5 value. If the hashes do not match, OSS returns the InvalidDigest error code.
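The two checks described above can be reproduced locally. This is a sketch under stated assumptions: content_md5 and etag_matches are hypothetical helper names, and etag_matches assumes that the ETag of a part is the hexadecimal MD5 of the part data, possibly quoted and in either letter case. The Content-MD5 value is the Base64 encoding of the raw 16-byte digest, not of the hexadecimal string.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Return a Content-MD5 header value: Base64 of the raw 16-byte MD5 digest."""
    return base64.b64encode(hashlib.md5(data).digest()).decode('ascii')


def etag_matches(data: bytes, etag: str) -> bool:
    """Compare the local MD5 of a part with the ETag returned for that part.

    Assumption: the ETag is the hexadecimal MD5 of the part, optionally
    wrapped in double quotes, in upper or lower case.
    """
    return hashlib.md5(data).hexdigest().lower() == etag.strip('"').lower()
```

One possible use is to pass {'Content-MD5': content_md5(data)} as the headers argument of bucket.upload_part() and then check etag_matches(data, result.etag) before recording the part.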
Part size constraints
| Constraint | Value |
|---|---|
| Minimum part size | 100 KB |
| Maximum part size | 5 GB |
| Last part | Can be smaller than 100 KB |
Increase part size on stable networks to reduce the number of API calls. Decrease part size on unreliable networks to reduce the cost of retransmitting failed parts.
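To see how a chosen part size translates into parts, the constraints in the table can be checked before any request is sent. The helper below is an illustrative sketch, and plan_parts is a hypothetical name, not part of oss2. It validates the part size against the limits above and returns the part number, byte offset, and length of each part.

```python
MIN_PART_SIZE = 100 * 1024     # 100 KB, minimum from the table above
MAX_PART_SIZE = 5 * 1024 ** 3  # 5 GB, maximum from the table above


def plan_parts(total_size, part_size):
    """Return (part_number, offset, length) for each part of an object.

    Raises ValueError if part_size is outside the allowed range. Only the
    last part may be shorter than part_size.
    """
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError('part_size must be between 100 KB and 5 GB')
    return [(number, offset, min(part_size, total_size - offset))
            for number, offset in enumerate(range(0, total_size, part_size), start=1)]


# A 250 KB object with 100 KB parts yields two full parts and a 50 KB last part.
print(plan_parts(250 * 1024, 100 * 1024))
# [(1, 0, 102400), (2, 102400, 102400), (3, 204800, 51200)]
```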
Upload an object using multipart upload
The following example uploads a local file using multipart upload. The determine_part_size() helper calculates an appropriate part size based on the total file size.
# -*- coding: utf-8 -*-
import os
from oss2 import SizedFileAdapter, determine_part_size
from oss2.models import PartInfo
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"
# Specify the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# Specify the full path of the object. Do not include the bucket name in the full path. Example: exampledir/exampleobject.txt.
key = 'exampledir/exampleobject.txt'
# Specify the full path of the local file that you want to upload. Example: D:\\localpath\\examplefile.txt.
filename = 'D:\\localpath\\examplefile.txt'
total_size = os.path.getsize(filename)
# Use the determine_part_size method to determine the part size. The minimum part size is 100 KB and the maximum is 5 GB. The last part can be smaller than 100 KB. In this example, the part size is set to 1 MB.
part_size = determine_part_size(total_size, preferred_size=1 * 1024 * 1024)
# Initiate a multipart upload task.
upload_id = bucket.init_multipart_upload(key).upload_id
parts = []
# Upload the parts.
with open(filename, 'rb') as fileobj:
    part_number = 1
    offset = 0
    while offset < total_size:
        num_to_upload = min(part_size, total_size - offset)
        # SizedFileAdapter(fileobj, size) wraps the file object so that at most size bytes are read, starting from the current file position.
        result = bucket.upload_part(key, upload_id, part_number,
                                    SizedFileAdapter(fileobj, num_to_upload))
        parts.append(PartInfo(part_number, result.etag))
        offset += num_to_upload
        part_number += 1
# Complete the multipart upload task.
bucket.complete_multipart_upload(key, upload_id, parts)
Set optional headers
Set headers when initiating a multipart upload to control object metadata, encryption, and storage class:
headers = dict()
# Caching behavior
# headers['Cache-Control'] = 'no-cache'
# Download filename
# headers['Content-Disposition'] = 'oss_MultipartUpload.txt'
# Validity period (milliseconds)
# headers['Expires'] = '1000'
# Prevent overwriting an existing object with the same name (true = forbid overwrite)
# headers['x-oss-forbid-overwrite'] = 'true'
# Server-side encryption method
# headers[OSS_SERVER_SIDE_ENCRYPTION] = SERVER_SIDE_ENCRYPTION_KMS
# Encryption algorithm (default: AES-256)
# headers[OSS_SERVER_SIDE_DATA_ENCRYPTION] = SERVER_SIDE_ENCRYPTION_KMS
# KMS Customer Master Key (CMK) ID
# headers[OSS_SERVER_SIDE_ENCRYPTION_KEY_ID] = '9468da86-3509-4f8d-a61e-6eab1eac****'
# Storage class
# headers['x-oss-storage-class'] = oss2.BUCKET_STORAGE_CLASS_STANDARD
# Object tags
# headers[OSS_OBJECT_TAGGING] = 'k1=v1&k2=v2&k3=v3'
upload_id = bucket.init_multipart_upload(key, headers=headers).upload_id
Set the object access control list (ACL) when completing the upload:
headers = dict()
# Set object ACL to private
# headers["x-oss-object-acl"] = oss2.OBJECT_ACL_PRIVATE
bucket.complete_multipart_upload(key, upload_id, parts, headers=headers)
Complete an upload by listing parts on the server
Instead of tracking parts locally, direct OSS to list all uploaded parts and assemble them automatically. Set the x-oss-complete-all header to yes and pass None for the parts parameter.
When x-oss-complete-all is set to yes, the request must not include a body. OSS lists all parts uploaded with the given upload ID, sorts them by part number, and completes the upload.
# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"
# Specify the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# Specify the full path of the object. Do not include the bucket name in the full path. Example: exampledir/exampleobject.txt.
key = 'exampledir/exampleobject.txt'
# Specify the upload ID. You can obtain the upload ID from the response to the InitiateMultipartUpload operation.
upload_id = '0004B9894A22E5B1888A1E29F823****'
headers = dict()
# headers["x-oss-object-acl"] = oss2.OBJECT_ACL_PRIVATE
headers["x-oss-complete-all"] = 'yes'
bucket.complete_multipart_upload(key, upload_id, None, headers=headers)
Cancel a multipart upload
Call bucket.abort_multipart_upload() to cancel an in-progress multipart upload. After cancellation, the upload ID becomes invalid and all uploaded parts are deleted.
# -*- coding: utf-8 -*-
import os
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"
# Specify the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# Specify the full path of the object. Do not include the bucket name in the full path. Example: exampledir/exampleobject.txt.
key = 'exampledir/exampleobject.txt'
# Specify the upload ID. You can obtain the upload ID from the response to the InitiateMultipartUpload operation.
upload_id = 'yourUploadId'
# Cancel the multipart upload task with the specified upload ID. The uploaded parts are deleted.
bucket.abort_multipart_upload(key, upload_id)
List uploaded parts
Use oss2.PartIterator to list all parts uploaded for a given upload ID. Each part includes its part number, ETag, and size.
# -*- coding: utf-8 -*-
import os
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"
# Specify the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# Specify the full path of the object. Do not include the bucket name in the full path. Example: exampledir/exampleobject.txt.
key = 'exampledir/exampleobject.txt'
# Specify the upload ID. You can obtain the upload ID from the response to the InitiateMultipartUpload operation. You must obtain the upload ID before you call the CompleteMultipartUpload operation to complete the multipart upload task.
upload_id = 'yourUploadId'
# List the uploaded parts that use the specified upload ID.
for part_info in oss2.PartIterator(bucket, key, upload_id):
    print('part_number:', part_info.part_number)
    print('etag:', part_info.etag)
    print('size:', part_info.size)
List multipart upload tasks
List tasks for a specific object
Each call to init_multipart_upload() for the same object generates a distinct upload ID. Use oss2.ObjectUploadIterator to list all active multipart upload tasks for an object.
# -*- coding: utf-8 -*-
import os
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"
# Specify the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# Specify the full path of the object. Do not include the bucket name in the full path. Example: exampledir/exampleobject.txt.
key = 'exampledir/exampleobject.txt'
# List all multipart upload tasks of the object.
for upload_info in oss2.ObjectUploadIterator(bucket, key):
    print('key:', upload_info.key)
    print('upload_id:', upload_info.upload_id)
List all tasks in a bucket
Use oss2.MultipartUploadIterator to list all active multipart upload tasks in a bucket.
# -*- coding: utf-8 -*-
import os
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"
# Specify the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# List all multipart upload tasks in the bucket.
for upload_info in oss2.MultipartUploadIterator(bucket):
    print('key:', upload_info.key)
    print('upload_id:', upload_info.upload_id)
List tasks by object name prefix
Pass a prefix parameter to filter multipart upload tasks to objects whose names start with the given prefix.
# -*- coding: utf-8 -*-
import os
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Specify the ID of the region that maps to the endpoint. Example: cn-hangzhou. This parameter is required if you use the signature algorithm V4.
region = "cn-hangzhou"
# Specify the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# List the multipart upload tasks of objects in the bucket whose names start with the prefix test.
for upload_info in oss2.MultipartUploadIterator(bucket, prefix='test'):
    print('key:', upload_info.key)
    print('upload_id:', upload_info.upload_id)
Clean up incomplete multipart uploads
If a multipart upload is interrupted and AbortMultipartUpload is not called, the uploaded parts remain in the bucket and incur storage costs. Remove them using one of these methods:
Manually delete parts -- See Delete parts.
Configure lifecycle rules -- Automatically delete expired parts. See Configuration examples.
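Because cancelling a multipart upload deletes its uploaded parts, stale tasks can also be cleaned up from a script. The sketch below is illustrative only: abort_stale_uploads is a hypothetical helper, and it assumes that each listed task exposes key, upload_id, and an initiation_date attribute holding a Unix timestamp in seconds. Verify that assumption against your SDK version before relying on it.

```python
import time


def abort_stale_uploads(bucket, uploads, max_age_seconds, now=None):
    """Abort every multipart upload initiated more than max_age_seconds ago.

    `uploads` is any iterable of objects with `key`, `upload_id`, and
    `initiation_date` attributes. `initiation_date` is assumed to be a Unix
    timestamp in seconds. Returns the upload IDs that were aborted.
    """
    now = time.time() if now is None else now
    aborted = []
    for info in uploads:
        if now - info.initiation_date > max_age_seconds:
            # Aborting invalidates the upload ID and deletes its parts.
            bucket.abort_multipart_upload(info.key, info.upload_id)
            aborted.append(info.upload_id)
    return aborted
```

For example, abort_stale_uploads(bucket, list(oss2.MultipartUploadIterator(bucket)), 7 * 24 * 3600) would cancel tasks older than seven days. Materializing the listing with list() first keeps the aborts from running while the listing is still being paged.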
References
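A multipart upload involves three API operations:
InitiateMultipartUpload -- Initiate a multipart upload
UploadPart -- Upload a part
CompleteMultipartUpload -- Complete a multipart upload
Related operations: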
AbortMultipartUpload -- Cancel a multipart upload
ListParts -- List uploaded parts
ListMultipartUploads -- List all ongoing multipart upload tasks