When uploading large files (larger than 5 GB) to OSS, network interruptions or program crashes may cause failures. Multipart upload splits large files into smaller parts for concurrent upload, improving speed and resilience. After all parts are uploaded, call the CompleteMultipartUpload operation to combine them into a complete object.
Usage notes
-
In this topic, the public endpoint of the China (Hangzhou) region is used. If you want to access OSS from other Alibaba Cloud services in the same region as OSS, use an internal endpoint. For more information about OSS regions and endpoints, see Regions and Endpoints.
-
In this topic, access credentials are obtained from environment variables. For more information about how to configure access credentials, see Configure access credentials using OSS SDK for Python 1.0.
-
This topic demonstrates creating an OSSClient instance with an OSS endpoint. For alternative configurations, such as using a custom domain or authenticating with credentials from Security Token Service (STS), see Initialization.
-
The multipart upload process (InitiateMultipartUpload, UploadPart, and CompleteMultipartUpload) requires the oss:PutObject permission. For more information, see Grant custom permissions to a RAM user.
Multipart upload process
A multipart upload has three steps:
-
Initialize a multipart upload event.
Call bucket.init_multipart_upload to obtain a globally unique uploadId.
-
Upload parts.
Call bucket.upload_part to upload each part.
Note:
-
For a given uploadId, the part number identifies a part's position in the file. Uploading with the same part number overwrites the existing data.
-
OSS returns the MD5 hash of the received part data in the ETag response header.
-
OSS compares the MD5 hash of the uploaded data with the MD5 hash calculated by the SDK. A mismatch returns the InvalidDigest error code.
-
Complete the multipart upload.
After all parts are uploaded, call bucket.complete_multipart_upload to combine them into a complete object.
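The part boundaries used in step 2 can be sketched without any SDK calls. The helper below is illustrative only: iter_part_ranges is a hypothetical name, not part of oss2, and it simply yields the part number, byte offset, and length of each part for a given file size and part size.

```python
def iter_part_ranges(total_size, part_size):
    """Yield (part_number, offset, length) for each part of a file.

    Hypothetical helper for illustration; not part of the oss2 SDK.
    Part numbers start at 1 and the last part may be shorter than part_size.
    """
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    part_number = 1
    offset = 0
    while offset < total_size:
        length = min(part_size, total_size - offset)
        yield part_number, offset, length
        offset += length
        part_number += 1


# Example: a 2,500,000-byte file split into 1 MB parts gives three parts.
ranges = list(iter_part_ranges(total_size=2_500_000, part_size=1024 * 1024))
for part_number, offset, length in ranges:
    print(part_number, offset, length)
```

Each tuple corresponds to one bucket.upload_part call, and the part number determines where the part ends up in the final object.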
Complete multipart upload examples
You can combine uploaded parts into a complete object in two ways:
-
Combine parts by passing part information in the request body
# -*- coding: utf-8 -*-
import os
from oss2 import SizedFileAdapter, determine_part_size
from oss2.models import PartInfo
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before running this code, make sure you have set the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Set Endpoint to the endpoint of the region where the bucket is located. For example, for a bucket in the China (Hangzhou) region, set Endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Set region to the region ID that corresponds to the endpoint, for example, cn-hangzhou. Note: This parameter is required for SignatureV4.
region = "cn-hangzhou"
# Set yourBucketName to the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# Set key to the full path of the object. The path cannot include the bucket name. Example: exampledir/exampleobject.txt.
key = 'exampledir/exampleobject.txt'
# Set filename to the full path of the local file. Example: D:\\localpath\\examplefile.txt.
filename = 'D:\\localpath\\examplefile.txt'
total_size = os.path.getsize(filename)
# The determine_part_size method determines the part size. The minimum part size is 100 KB, and the maximum is 5 GB. The last part can be smaller than 100 KB. This example sets the part size to 1 MB.
part_size = determine_part_size(total_size, preferred_size=1 * 1024 * 1024)
# Initialize the multipart upload.
# To set headers when you initialize the multipart upload, set the relevant headers in init_multipart_upload as shown below.
# headers = dict()
# Specify the web page caching behavior for the object.
# headers['Cache-Control'] = 'no-cache'
# Specify the name of the object when it is downloaded.
# headers['Content-Disposition'] = 'oss_MultipartUpload.txt'
# Specify the expiration time in milliseconds.
# headers['Expires'] = '1000'
# Specify whether to overwrite an object that has the same name when you initialize the multipart upload. Here, it is set to true, which prohibits overwriting.
# headers['x-oss-forbid-overwrite'] = 'true'
# Specify the server-side encryption method for each part of the object.
# headers[OSS_SERVER_SIDE_ENCRYPTION] = SERVER_SIDE_ENCRYPTION_KMS
# Specify the encryption algorithm for the object. If this is not specified, AES256 is used.
# headers[OSS_SERVER_SIDE_DATA_ENCRYPTION] = SERVER_SIDE_ENCRYPTION_KMS
# The customer master key (CMK) managed by KMS.
# headers[OSS_SERVER_SIDE_ENCRYPTION_KEY_ID] = '9468da86-3509-4f8d-a61e-6eab1eac****'
# Specify the storage class of the object.
# headers['x-oss-storage-class'] = oss2.BUCKET_STORAGE_CLASS_STANDARD
# Specify object tags. You can set multiple tags.
# headers[OSS_OBJECT_TAGGING] = 'k1=v1&k2=v2&k3=v3'
# upload_id = bucket.init_multipart_upload(key, headers=headers).upload_id
upload_id = bucket.init_multipart_upload(key).upload_id
# Use the upload_id to cancel the multipart upload event or list uploaded parts.
# To cancel a multipart upload event by uploadId, get the uploadId after you call InitiateMultipartUpload.
# To list uploaded parts by uploadId, get the uploadId after you call InitiateMultipartUpload and before you call CompleteMultipartUpload.
# print("UploadID:", upload_id)
parts = []
# Upload parts one by one.
with open(filename, 'rb') as fileobj:
    part_number = 1
    offset = 0
    while offset < total_size:
        num_to_upload = min(part_size, total_size - offset)
        # The SizedFileAdapter(fileobj, size) method generates a new file object and recalculates the starting position for appending.
        result = bucket.upload_part(key, upload_id, part_number, SizedFileAdapter(fileobj, num_to_upload))
        parts.append(PartInfo(part_number, result.etag))
        offset += num_to_upload
        part_number += 1
# Complete the multipart upload.
# To set headers when you complete the multipart upload, see the following sample code.
headers = dict()
# Set the access control list (ACL) for the file. Here, it is set to OBJECT_ACL_PRIVATE, which means private.
# headers["x-oss-object-acl"] = oss2.OBJECT_ACL_PRIVATE
bucket.complete_multipart_upload(key, upload_id, parts, headers=headers)
# bucket.complete_multipart_upload(key, upload_id, parts)
Important: If network conditions are good, increase the part size. Otherwise, decrease the part size.
-
Combine parts by listing part data from the server
Note: Before using this method, make sure that the parts have been uploaded with the upload_id specified in the following code.
# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before running this code, make sure you have set the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Set Endpoint to the endpoint of the region where the bucket is located. For example, for a bucket in the China (Hangzhou) region, set Endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Set region to the region ID that corresponds to the endpoint, for example, cn-hangzhou. Note: This parameter is required for SignatureV4.
region = "cn-hangzhou"
# Set yourBucketName to the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# Set key to the full path of the object. The path cannot include the bucket name. Example: exampledir/exampleobject.txt.
key = 'exampledir/exampleobject.txt'
# Set filename to the full path of the local file. Example: D:\\localpath\\examplefile.txt.
filename = 'D:\\localpath\\examplefile.txt'
# Set upload_id. Get the upload_id after you call InitiateMultipartUpload and before you call CompleteMultipartUpload.
upload_id = '0004B9894A22E5B1888A1E29F823****'
# Complete the multipart upload.
# To set the file ACL when you complete the multipart upload, set the relevant headers in the complete_multipart_upload function as shown below.
headers = dict()
# headers["x-oss-object-acl"] = oss2.OBJECT_ACL_PRIVATE
# If you set x-oss-complete-all to yes, OSS lists all parts that have been uploaded with the current uploadId, sorts them by part number, and then runs the CompleteMultipartUpload operation.
# If you set x-oss-complete-all to yes, you cannot specify a body. Otherwise, an error is returned.
headers["x-oss-complete-all"] = 'yes'
bucket.complete_multipart_upload(key, upload_id, None, headers=headers)
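The first example above uploads parts one at a time. Because parts are independent of each other, they can also be uploaded concurrently. The following is a minimal sketch of that idea using a thread pool from the Python standard library. The names upload_parts_concurrently and upload_one are hypothetical and not part of oss2. The upload function here is a stand-in so that the sketch runs without a bucket.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_parts_concurrently(upload_one, ranges, max_workers=4):
    """Call upload_one(part_number, offset, length) for every range in a thread pool.

    upload_one is a caller-supplied function (hypothetical here) that uploads
    one part and returns its ETag. The result is a list of
    (part_number, etag) tuples sorted by part number.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            part_number: pool.submit(upload_one, part_number, offset, length)
            for part_number, offset, length in ranges
        }
        # result() re-raises any exception raised inside a worker thread.
        return [(number, futures[number].result()) for number in sorted(futures)]


# Stand-in for a real upload so that the sketch runs without OSS access.
def fake_upload(part_number, offset, length):
    return "etag-%d-%d-%d" % (part_number, offset, length)


parts = upload_parts_concurrently(fake_upload, [(2, 10, 5), (1, 0, 10)])
print(parts)
```

In a real upload, upload_one would presumably open its own file handle, seek to offset, and call bucket.upload_part with SizedFileAdapter(fileobj, length), and each returned tuple would be wrapped in PartInfo before calling bucket.complete_multipart_upload. The SDK also provides oss2.resumable_upload, which may be a simpler choice when you do not need control over individual parts.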
Cancel a multipart upload event
Call bucket.abort_multipart_upload to cancel a multipart upload. After cancellation, the uploadId becomes invalid and uploaded parts are deleted.
# -*- coding: utf-8 -*-
import os
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before running this code, make sure you have set the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Set Endpoint to the endpoint of the region where the bucket is located. For example, for a bucket in the China (Hangzhou) region, set Endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Set region to the region ID that corresponds to the endpoint, for example, cn-hangzhou. Note: This parameter is required for SignatureV4.
region = "cn-hangzhou"
# Set yourBucketName to the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# Set key to the full path of the object. The path cannot include the bucket name. Example: exampledir/exampleobject.txt.
key = 'exampledir/exampleobject.txt'
# Set upload_id. The upload_id is returned after you call InitiateMultipartUpload.
upload_id = 'yourUploadId'
# Cancel the multipart upload event for the specified upload_id. The uploaded parts will be deleted.
bucket.abort_multipart_upload(key, upload_id)
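A common use of abort_multipart_upload is cleaning up after a failed upload so that orphaned parts do not accumulate. The following sketch shows one way to structure this. It assumes a bucket object that exposes the init_multipart_upload, complete_multipart_upload, and abort_multipart_upload methods used in this topic. The names upload_with_cleanup and upload_parts are hypothetical.

```python
def upload_with_cleanup(bucket, key, upload_parts):
    """Run a multipart upload and abort it if any step fails.

    upload_parts is a caller-supplied function (hypothetical here) that takes
    the upload_id, uploads every part, and returns the list of parts to pass
    to complete_multipart_upload.
    """
    upload_id = bucket.init_multipart_upload(key).upload_id
    try:
        parts = upload_parts(upload_id)
        return bucket.complete_multipart_upload(key, upload_id, parts)
    except Exception:
        # Cancel the upload event so that the uploaded parts are deleted,
        # then re-raise the original error for the caller to handle.
        bucket.abort_multipart_upload(key, upload_id)
        raise
```

Aborting unconditionally discards all uploaded parts. If you want to resume an interrupted upload instead, keep the upload_id and skip the abort call.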
List uploaded parts
The following code lists uploaded parts:
# -*- coding: utf-8 -*-
import os
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before running this code, make sure you have set the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Set Endpoint to the endpoint of the region where the bucket is located. For example, for a bucket in the China (Hangzhou) region, set Endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Set region to the region ID that corresponds to the endpoint, for example, cn-hangzhou. Note: This parameter is required for SignatureV4.
region = "cn-hangzhou"
# Set yourBucketName to the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# Set key to the full path of the object. The path cannot include the bucket name. Example: exampledir/exampleobject.txt.
key = 'exampledir/exampleobject.txt'
# Set upload_id. Get the upload_id after you call InitiateMultipartUpload and before you call CompleteMultipartUpload.
upload_id = 'yourUploadId'
# List information about the parts uploaded with the specified upload_id.
for part_info in oss2.PartIterator(bucket, key, upload_id):
    print('part_number:', part_info.part_number)
    print('etag:', part_info.etag)
    print('size:', part_info.size)
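Listing uploaded parts is useful for resuming an interrupted upload, because it shows which part numbers still need to be sent. The following sketch assumes that you already know the total number of parts. The function name missing_part_numbers is hypothetical and not part of oss2.

```python
def missing_part_numbers(uploaded_part_numbers, total_parts):
    """Return the part numbers in 1..total_parts that have not been uploaded.

    Hypothetical helper for illustration; uploaded_part_numbers can be any
    iterable of integers.
    """
    uploaded = set(uploaded_part_numbers)
    return [n for n in range(1, total_parts + 1) if n not in uploaded]


# Example: parts 1 and 3 of 4 are already on the server.
to_upload = missing_part_numbers([1, 3], total_parts=4)
print(to_upload)
```

With the iterator shown above, the first argument could be built as (p.part_number for p in oss2.PartIterator(bucket, key, upload_id)).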
List multipart upload events
-
List multipart upload events for a specific object
The following code lists the multipart upload events for a specific object:
# -*- coding: utf-8 -*-
import os
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before running this code, make sure you have set the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Set Endpoint to the endpoint of the region where the bucket is located. For example, for a bucket in the China (Hangzhou) region, set Endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Set region to the region ID that corresponds to the endpoint, for example, cn-hangzhou. Note: This parameter is required for SignatureV4.
region = "cn-hangzhou"
# Set yourBucketName to the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# Set key to the full path of the object. The path cannot include the bucket name. Example: exampledir/exampleobject.txt.
key = 'exampledir/exampleobject.txt'
# List all multipart upload events for the object. Each call to init_multipart_upload for the same object returns a different upload_id.
# Each upload_id corresponds to one multipart upload event.
for upload_info in oss2.ObjectUploadIterator(bucket, key):
    print('key:', upload_info.key)
    print('upload_id:', upload_info.upload_id)
-
List all multipart upload events in a bucket
The following code lists all multipart upload events in a bucket:
# -*- coding: utf-8 -*-
import os
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# An AccessKey pair of an Alibaba Cloud account has permissions on all API operations. This poses a high security risk. We strongly recommend that you create and use a RAM user for API access or routine O&M. To create a RAM user, log on to the RAM console.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Set Endpoint to the endpoint of the region where the bucket is located. For example, for a bucket in the China (Hangzhou) region, set Endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Set region to the region ID that corresponds to the endpoint, for example, cn-hangzhou. Note: This parameter is required for SignatureV4.
region = "cn-hangzhou"
# Set yourBucketName to the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# List all multipart upload events in the bucket.
for upload_info in oss2.MultipartUploadIterator(bucket):
    print('key:', upload_info.key)
    print('upload_id:', upload_info.upload_id)
-
List multipart upload events by prefix
The following code lists multipart upload events for objects with a specific prefix:
# -*- coding: utf-8 -*-
import os
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# An AccessKey pair of an Alibaba Cloud account has permissions on all API operations. This poses a high security risk. We strongly recommend that you create and use a RAM user for API access or routine O&M. To create a RAM user, log on to the RAM console.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Set Endpoint to the endpoint of the region where the bucket is located. For example, for a bucket in the China (Hangzhou) region, set Endpoint to https://oss-cn-hangzhou.aliyuncs.com.
endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
# Set region to the region ID that corresponds to the endpoint, for example, cn-hangzhou. Note: This parameter is required for SignatureV4.
region = "cn-hangzhou"
# Set yourBucketName to the name of your bucket.
bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
# List multipart upload events for objects in the bucket that have the 'test' prefix.
for upload_info in oss2.MultipartUploadIterator(bucket, prefix='test'):
    print('key:', upload_info.key)
    print('upload_id:', upload_info.upload_id)
FAQ
How do I delete parts?
If a multipart upload is interrupted without calling AbortMultipartUpload, the uploaded parts remain in the bucket and incur storage fees. Delete them in one of the following ways:
-
Delete parts manually. For more information, see Delete parts.
-
Delete parts automatically with lifecycle rules. For more information, see Lifecycle configuration examples.
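For manual cleanup, one possible approach is to list the multipart upload events in a bucket, select the ones that started before a cutoff time, and cancel each of them. The sketch below covers only the selection step. It assumes that each listed upload exposes key, upload_id, and an initiation_time attribute holding a Unix timestamp. Verify this against the SDK version you use. The name select_stale_uploads and the sample records are hypothetical.

```python
import time
from collections import namedtuple


def select_stale_uploads(uploads, max_age_seconds, now=None):
    """Return (key, upload_id) pairs for uploads older than max_age_seconds.

    Assumes each item has key, upload_id, and initiation_time (Unix
    timestamp) attributes. Hypothetical helper for illustration.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    return [(u.key, u.upload_id) for u in uploads if u.initiation_time < cutoff]


# Stand-in records so that the sketch runs without OSS access.
Upload = namedtuple("Upload", ["key", "upload_id", "initiation_time"])
sample = [
    Upload("a.bin", "id-old", 1_000),
    Upload("b.bin", "id-new", 9_500),
]
stale = select_stale_uploads(sample, max_age_seconds=3_600, now=10_000)
print(stale)
```

Each selected pair could then be passed to bucket.abort_multipart_upload(key, upload_id). Lifecycle rules achieve the same result without client code and are usually easier to maintain.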
References
-
A multipart upload involves three API operations:
-
Initialize a multipart upload: InitiateMultipartUpload.
-
Upload a part: UploadPart.
-
Complete a multipart upload: CompleteMultipartUpload.
-
Cancel a multipart upload: AbortMultipartUpload.
-
List uploaded parts: ListParts.
-
List in-progress multipart uploads: ListMultipartUploads.