Use OSS SDK for Python to upload large objects by performing multipart upload - Object Storage Service

Uploading a large object (for example, one larger than 5 GB) to Object Storage Service (OSS) in a single request can fail because of network interruptions or program crashes. If the upload still fails after multiple retries, use multipart upload instead: split the object into multiple parts and upload the parts in parallel to speed up the upload. After all parts are uploaded, call the CompleteMultipartUpload operation to combine the parts into a complete object.

Usage notes

In this topic, the public endpoint of the China (Hangzhou) region is used. If you want to access OSS by using other Alibaba Cloud services in the same region as OSS, use an internal endpoint. For more information about the regions and endpoints supported by OSS, see Regions and endpoints.
In this topic, access credentials are obtained from environment variables. For more information about how to configure access credentials, see Configure access credentials.
In this topic, an OSSClient instance is created by using an OSS endpoint. If you want to create an OSSClient instance by using custom domain names or Security Token Service (STS), see Initialization.
To perform multipart upload, you must have the oss:PutObject permission. For more information, see Attach a custom policy to a RAM user.

Process

To implement multipart upload, perform the following steps:

Initiate a multipart upload task.
Use the bucket.init_multipart_upload method to obtain a unique upload ID in OSS.
Upload parts.
Use the bucket.upload_part method to upload the parts.
Note
- In a multipart upload task, part numbers identify the relative positions of the parts in the object. If you upload a new part with a part number that is already in use, the new part overwrites the existing part.
- OSS includes the MD5 hash of each uploaded part in the ETag header of the response.
- OSS calculates the MD5 hash of each uploaded part and compares it with the MD5 hash that is calculated by OSS SDK for Python. If the two hashes are different, OSS returns the InvalidDigest error code.
Complete the multipart upload task.
After you upload all the parts, use the bucket.complete_multipart_upload method to combine the parts into a complete object.
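
The notes in the preceding steps can be modeled locally without contacting OSS. The following sketch is an illustration only: FakePartStore is a hypothetical in-memory stand-in and is not part of oss2, and the sketch assumes that the ETag of a part is the uppercase hexadecimal MD5 hash of the part content.

```python
import hashlib


class FakePartStore:
    """Hypothetical in-memory model of the parts of one multipart upload task."""

    def __init__(self):
        self._parts = {}  # part number -> part content

    def upload_part(self, part_number, data):
        # Reusing a part number replaces the part stored under that number.
        self._parts[part_number] = data
        # Assumption: the ETag is the uppercase hex MD5 hash of the part content.
        return hashlib.md5(data).hexdigest().upper()

    def complete(self):
        # Parts are combined in ascending part-number order,
        # regardless of the order in which they were uploaded.
        return b"".join(self._parts[n] for n in sorted(self._parts))


store = FakePartStore()
store.upload_part(2, b"world")
store.upload_part(1, b"hello ")
store.upload_part(2, b"OSS")  # overwrites the earlier part 2
combined = store.complete()
print(combined)
```

In this model, the upload order does not matter and the second upload of part 2 replaces the first one, which mirrors the behavior described in the note.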

Examples

After all parts are uploaded, you can combine all parts into a complete object by using one of the following methods:

Combine all parts into a complete object by including part information in the request body

# -*- coding: utf-8 -*-
import os
from oss2 import SizedFileAdapter, determine_part_size
from oss2.models import PartInfo
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider

# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())
# In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint. 
# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'examplebucket')
# Specify the full path of the object. Do not include the bucket name in the full path. Example: exampledir/exampleobject.txt. 
key = 'exampledir/exampleobject.txt'
# Specify the full path of the local file that you want to upload. Example: D:\\localpath\\examplefile.txt. 
filename = 'D:\\localpath\\examplefile.txt'

total_size = os.path.getsize(filename)
# Use the determine_part_size method to determine the part size. 
part_size = determine_part_size(total_size, preferred_size=100 * 1024)

# Initiate a multipart upload task. 
# If you want to specify the storage class of the object when you initiate the multipart upload task, configure the related headers when you use the init_multipart_upload method. 
# headers = dict()
# Specify the caching behavior of the web page for the object. 
# headers['Cache-Control'] = 'no-cache'
# Specify the name of the object when it is downloaded. 
# headers['Content-Disposition'] = 'oss_MultipartUpload.txt'
# Specify the content encoding format of the object. 
# headers['Content-Encoding'] = 'utf-8'
# Specify the validity period. Unit: milliseconds. 
# headers['Expires'] = '1000'
# Specify whether the object that is uploaded by performing multipart upload overwrites the existing object that has the same name when the multipart upload task is initiated. In this example, this parameter is set to true, which indicates that the object with the same name cannot be overwritten. 
# headers['x-oss-forbid-overwrite'] = 'true'
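# Specify the server-side encryption method that you want to use to encrypt each part. 
# The OSS_* and SERVER_SIDE_* names used in the following lines are constants defined in oss2.headers. Import them before you uncomment these lines. 
# headers[OSS_SERVER_SIDE_ENCRYPTION] = SERVER_SIDE_ENCRYPTION_KMS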
# Specify the algorithm that you want to use to encrypt the object. If you do not configure this parameter, the object is encrypted by using AES-256. 
# headers[OSS_SERVER_SIDE_DATA_ENCRYPTION] = SERVER_SIDE_ENCRYPTION_KMS
# Specify the ID of the Customer Master Key (CMK) that is managed by Key Management Service (KMS). 
# headers[OSS_SERVER_SIDE_ENCRYPTION_KEY_ID] = '9468da86-3509-4f8d-a61e-6eab1eac****'
# Specify the storage class of the object. 
# headers['x-oss-storage-class'] = oss2.BUCKET_STORAGE_CLASS_STANDARD
# Specify tags for the object. You can specify multiple tags for the object at the same time. 
# headers[OSS_OBJECT_TAGGING] = 'k1=v1&k2=v2&k3=v3'
# upload_id = bucket.init_multipart_upload(key, headers=headers).upload_id
upload_id = bucket.init_multipart_upload(key).upload_id
# The upload ID is required to cancel the multipart upload task or to list the uploaded parts. 
# The upload ID is available after the InitiateMultipartUpload operation returns. 
# To list the uploaded parts, use the upload ID before you call the CompleteMultipartUpload operation to complete the multipart upload task. 
# print("UploadID:", upload_id)
parts = []

# Upload the parts. 
with open(filename, 'rb') as fileobj:
    part_number = 1
    offset = 0
    while offset < total_size:
        num_to_upload = min(part_size, total_size - offset)
        # SizedFileAdapter(fileobj, size) wraps the file object so that at most size bytes are read, starting at the current position. The next part continues from where this part ends. 
        result = bucket.upload_part(key, upload_id, part_number,
                                    SizedFileAdapter(fileobj, num_to_upload))
        parts.append(PartInfo(part_number, result.etag))

        offset += num_to_upload
        part_number += 1

# Complete the multipart upload task. 
# Configure headers (if you want to) when you complete the multipart upload task. 
headers = dict()
# Specify the access control list (ACL) of the object. In this example, the ACL is set to OBJECT_ACL_PRIVATE, which indicates that the ACL of the object is private. 
# headers["x-oss-object-acl"] = oss2.OBJECT_ACL_PRIVATE
bucket.complete_multipart_upload(key, upload_id, parts, headers=headers)
# bucket.complete_multipart_upload(key, upload_id, parts)
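
The preceding sample uploads the parts one after another. The following sketch shows one possible way to upload parts in parallel with a thread pool. It is a minimal illustration under stated assumptions: plan_parts, upload_parts_in_parallel, and the upload_one callable are hypothetical names introduced here, and the demonstration uses a stand-in uploader that only hashes the data instead of calling OSS.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def plan_parts(total_size, part_size):
    """Return (part_number, offset, size) tuples that cover the whole file."""
    plan = []
    offset = 0
    part_number = 1
    while offset < total_size:
        size = min(part_size, total_size - offset)
        plan.append((part_number, offset, size))
        offset += size
        part_number += 1
    return plan


def upload_parts_in_parallel(path, part_size, upload_one, max_workers=4):
    """Upload every part of the file at `path` concurrently.

    `upload_one(part_number, data)` is a hypothetical callable that uploads
    one part and returns its ETag.
    """
    def task(item):
        part_number, offset, size = item
        # Each task opens its own file handle so that seeks do not interfere.
        with open(path, "rb") as fileobj:
            fileobj.seek(offset)
            data = fileobj.read(size)
        return part_number, upload_one(part_number, data)

    plan = plan_parts(os.path.getsize(path), part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in plan order, that is, by ascending part number.
        return list(pool.map(task, plan))


def fake_upload(part_number, data):
    """Stand-in uploader for the demonstration: returns a hash, uploads nothing."""
    return hashlib.md5(data).hexdigest().upper()


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 250)
demo_parts = upload_parts_in_parallel(tmp.name, 100, fake_upload)
os.remove(tmp.name)
print([number for number, _ in demo_parts])
```

With oss2, upload_one could wrap bucket.upload_part(key, upload_id, part_number, data) and return result.etag, and each returned pair could be converted to PartInfo(part_number, etag) before complete_multipart_upload is called. Each task holds one part in memory, so the part size and the number of workers together determine peak memory usage.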

Important

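We recommend that you increase the part size when network conditions are stable, because larger parts require fewer requests. When the network is unstable, decrease the part size so that less data must be uploaded again if a part fails.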
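
To see how the part size affects the number of parts, the following sketch computes both for a 10 GiB object. It is not the implementation of oss2.determine_part_size: choose_part_size and part_count are hypothetical helpers, and the limits of 100 KB per part and 10,000 parts per task are assumptions that you should verify against the current OSS limits.

```python
MIN_PART_SIZE = 100 * 1024   # assumed minimum part size (100 KB)
MAX_PART_COUNT = 10000       # assumed maximum number of parts per task


def choose_part_size(total_size, preferred_size):
    """Start from the preferred size and double it until the parts fit."""
    part_size = max(preferred_size, MIN_PART_SIZE)
    while part_size * MAX_PART_COUNT < total_size:
        part_size *= 2
    return part_size


def part_count(total_size, part_size):
    """Number of parts needed, rounded up."""
    return -(-total_size // part_size)


total = 10 * 1024 ** 3  # a 10 GiB object
for preferred in (100 * 1024, 10 * 1024 ** 2, 100 * 1024 ** 2):
    size = choose_part_size(total, preferred)
    print("preferred:", preferred, "part size:", size, "parts:", part_count(total, size))
```

A small preferred size is raised until the object fits into the assumed part limit, whereas a large preferred size is used as is and results in far fewer requests.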

Combine parts into a complete object by listing the parts on the server

Note

If you want to combine parts into a complete object by listing the parts on the server, make sure that multiple parts have been uploaded by using the upload ID specified in the following sample code.

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())
# In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint. 
# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'examplebucket')
# Specify the full path of the object. Do not include the bucket name in the full path. Example: exampledir/exampleobject.txt. 
key = 'exampledir/exampleobject.txt'
# Specify the upload ID. You can obtain the upload ID after you call the InitiateMultipartUpload operation to initiate the multipart upload task but before you call the CompleteMultipartUpload operation to complete the multipart upload task. 
upload_id = '0004B9894A22E5B1888A1E29F823****'

# Complete the multipart upload task. 
# If you want to specify the ACL of the object when you complete the multipart upload task, configure the related headers in the complete_multipart_upload function. 
headers = dict()
# headers["x-oss-object-acl"] = oss2.OBJECT_ACL_PRIVATE
# If you specify x-oss-complete-all:yes in the request, OSS lists all parts that are uploaded by using the current upload ID, sorts the parts by part number, and then performs the CompleteMultipartUpload operation. 
# If x-oss-complete-all:yes is specified in the request, the request body cannot be specified. Otherwise, an error occurs. 
headers["x-oss-complete-all"] = 'yes'
bucket.complete_multipart_upload(key, upload_id, None, headers=headers)
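
Because this mode passes None instead of a part list, it can help to check the shape of the call without contacting OSS. The following sketch wraps the call in a small helper and exercises it against a stub. Both complete_with_server_side_listing and RecordingBucket are hypothetical names introduced for illustration. The helper only assumes that the bucket argument offers the complete_multipart_upload method used in the preceding sample.

```python
def complete_with_server_side_listing(bucket, key, upload_id, extra_headers=None):
    """Complete a multipart upload task and let OSS collect the parts itself."""
    headers = dict(extra_headers or {})
    headers["x-oss-complete-all"] = "yes"
    # No part list is passed, because the request body must not be specified.
    return bucket.complete_multipart_upload(key, upload_id, None, headers=headers)


class RecordingBucket:
    """Hypothetical stub that records the call instead of contacting OSS."""

    def __init__(self):
        self.calls = []

    def complete_multipart_upload(self, key, upload_id, parts, headers=None):
        self.calls.append((key, upload_id, parts, dict(headers or {})))
        return "ok"


stub = RecordingBucket()
complete_with_server_side_listing(
    stub,
    "exampledir/exampleobject.txt",
    "demo-upload-id",                      # placeholder, not a real upload ID
    {"x-oss-object-acl": "private"},
)
print(stub.calls[0])
```

In real use, the first argument would be the oss2.Bucket instance from the preceding sample.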

Cancel a multipart upload task

You can use the bucket.abort_multipart_upload method to cancel a multipart upload task. If a multipart upload task is canceled, the upload ID cannot be used to upload parts. In addition, the uploaded parts are deleted.

The following sample code provides an example on how to cancel a multipart upload task:

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider

# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())
# In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint. 
# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'examplebucket')
# Specify the full path of the object. Do not include the bucket name in the full path. Example: exampledir/exampleobject.txt. 
key = 'exampledir/exampleobject.txt'
# Specify the upload ID. You can obtain the upload ID from the response to the InitiateMultipartUpload operation. 
upload_id = 'yourUploadId'

# Cancel the multipart upload task with the specified upload ID. The uploaded parts are deleted. 
bucket.abort_multipart_upload(key, upload_id)
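
A common pattern is to cancel the task automatically when the upload fails, so that no parts are left behind. The following sketch shows one way to do this. run_with_abort_on_failure, upload_all_parts, and StubBucket are hypothetical names, and the stub replaces the real bucket so that the example runs without network access.

```python
def run_with_abort_on_failure(bucket, key, upload_id, upload_all_parts):
    """Run `upload_all_parts()` and cancel the task if it raises.

    `upload_all_parts` is a hypothetical zero-argument callable that uploads
    the parts and completes the task.
    """
    try:
        return upload_all_parts()
    except Exception:
        # Canceling the task deletes the parts that were already uploaded.
        bucket.abort_multipart_upload(key, upload_id)
        raise


class StubBucket:
    """Hypothetical stub that records cancellations instead of contacting OSS."""

    def __init__(self):
        self.aborted = []

    def abort_multipart_upload(self, key, upload_id):
        self.aborted.append((key, upload_id))


def failing_upload():
    raise IOError("simulated network interruption")


stub = StubBucket()
try:
    run_with_abort_on_failure(stub, "exampledir/exampleobject.txt",
                              "demo-upload-id", failing_upload)
except IOError as exc:
    print("upload failed:", exc)
print(stub.aborted)
```

Canceling the task also removes the option to retry only the failed parts later, so this pattern fits cases in which a failed upload is restarted from the beginning.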

List the uploaded parts

The following sample code provides an example on how to list the uploaded parts:

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider

# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())
# In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint. 
# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'examplebucket')
# Specify the full path of the object. Do not include the bucket name in the full path. Example: exampledir/exampleobject.txt. 
key = 'exampledir/exampleobject.txt'
# Specify the upload ID. You can obtain the upload ID from the response to the InitiateMultipartUpload operation. You must obtain the upload ID before you call the CompleteMultipartUpload operation to complete the multipart upload task. 
upload_id = 'yourUploadId'

# List the uploaded parts that use the specified upload ID. 
for part_info in oss2.PartIterator(bucket, key, upload_id):
    print('part_number:', part_info.part_number)
    print('etag:', part_info.etag)
    print('size:', part_info.size)
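
A listing of uploaded parts can be used to decide which parts still need to be uploaded after an interruption. The following sketch works on any sequence of objects that expose the part_number and size attributes printed in the preceding sample. missing_part_numbers and uploaded_bytes are hypothetical helpers, and the listing is simulated with SimpleNamespace objects instead of a real response.

```python
from types import SimpleNamespace


def missing_part_numbers(listed_parts, expected_count):
    """Return the part numbers in 1..expected_count that are not listed yet."""
    uploaded = {part.part_number for part in listed_parts}
    return [n for n in range(1, expected_count + 1) if n not in uploaded]


def uploaded_bytes(listed_parts):
    """Return the total size of the listed parts in bytes."""
    return sum(part.size for part in listed_parts)


# Simulated listing: parts 1 and 3 are uploaded, part 2 is missing.
listing = [
    SimpleNamespace(part_number=1, etag="ETAG-1", size=100),
    SimpleNamespace(part_number=3, etag="ETAG-3", size=50),
]
print(missing_part_numbers(listing, 3))
print(uploaded_bytes(listing))
```

To apply the helpers to a real task, the iterator from the preceding sample could first be materialized, for example with list(oss2.PartIterator(bucket, key, upload_id)), because an iterator can be consumed only once.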

List multipart upload tasks

List the multipart upload tasks of a specific object

The following sample code provides an example on how to list the multipart upload tasks of a specific object:

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider

# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())
# In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint. 
# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'examplebucket')
# Specify the full path of the object. Do not include the bucket name in the full path. Example: exampledir/exampleobject.txt. 
key = 'exampledir/exampleobject.txt'

# List all multipart upload tasks of the object. Each time the init_multipart_upload method is used for the same object, a different upload ID is returned. 
# An upload ID uniquely identifies a multipart upload task. 
for upload_info in oss2.ObjectUploadIterator(bucket, key):
    print('key:', upload_info.key)
    print('upload_id:', upload_info.upload_id)

List all multipart upload tasks in a bucket

The following sample code provides an example on how to list all multipart upload tasks in a bucket:

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider

# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
# The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. We recommend that you use the AccessKey pair of a RAM user to call API operations or perform routine O&M. 
auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())
# In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint. 
# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'examplebucket')

# List all multipart upload tasks in the bucket. 
for upload_info in oss2.MultipartUploadIterator(bucket):
    print('key:', upload_info.key)
    print('upload_id:', upload_info.upload_id)
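
Because one object can have several unfinished tasks, it is often easier to read the listing grouped by object name. The following sketch groups upload IDs by key. group_upload_ids_by_key is a hypothetical helper, and the listing is simulated with SimpleNamespace objects that carry the same key and upload_id attributes as the items printed in the preceding sample.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_upload_ids_by_key(upload_infos):
    """Map each object name to the upload IDs of its unfinished tasks."""
    grouped = defaultdict(list)
    for info in upload_infos:
        grouped[info.key].append(info.upload_id)
    return dict(grouped)


# Simulated listing: two tasks for one object and one task for another.
uploads = [
    SimpleNamespace(key="exampledir/a.bin", upload_id="id-1"),
    SimpleNamespace(key="exampledir/b.bin", upload_id="id-2"),
    SimpleNamespace(key="exampledir/a.bin", upload_id="id-3"),
]
grouped = group_upload_ids_by_key(uploads)
for key, upload_ids in grouped.items():
    print(key, upload_ids)
```

An object with more than one upload ID usually indicates tasks that were initiated repeatedly and never completed or canceled.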

List the multipart upload tasks of objects whose names start with a specific prefix in a bucket

The following sample code provides an example on how to list the multipart upload tasks of objects whose names start with a specific prefix in a bucket:

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
# The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. We recommend that you use the AccessKey pair of a RAM user to call API operations or perform routine O&M. 
auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())
# In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint. 
# Specify the name of the bucket. Example: examplebucket. 
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'examplebucket')

# List the multipart upload tasks of objects whose names start with the test prefix in the bucket. 
for upload_info in oss2.MultipartUploadIterator(bucket, prefix='test'):
    print('key:', upload_info.key)
    print('upload_id:', upload_info.upload_id)

FAQ

How do I delete parts?

You can use one of the following methods to delete parts:

Automatic deletion
You can configure lifecycle rules to automatically delete parts at a specific time. For more information, see Configure lifecycle rules to delete expired parts.
Manual deletion
You can call the AbortMultipartUpload operation to cancel a multipart upload task and delete the parts. For more information, see AbortMultipartUpload.
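
For manual deletion, the listing and cancel operations from the earlier sections can be combined. The following sketch cancels every task in a listing and defaults to a dry run, because canceling a task deletes its parts. abort_listed_uploads and StubBucket are hypothetical names, and the listing is simulated so that the example runs without contacting OSS.

```python
from types import SimpleNamespace


def abort_listed_uploads(bucket, upload_infos, dry_run=True):
    """Cancel every listed task and return the (key, upload_id) pairs.

    With dry_run=True, the pairs are only collected and nothing is canceled.
    """
    targets = [(info.key, info.upload_id) for info in upload_infos]
    if not dry_run:
        for key, upload_id in targets:
            bucket.abort_multipart_upload(key, upload_id)
    return targets


class StubBucket:
    """Hypothetical stub that records cancellations instead of contacting OSS."""

    def __init__(self):
        self.aborted = []

    def abort_multipart_upload(self, key, upload_id):
        self.aborted.append((key, upload_id))


uploads = [
    SimpleNamespace(key="test/a.bin", upload_id="id-1"),
    SimpleNamespace(key="test/b.bin", upload_id="id-2"),
]
stub = StubBucket()
planned = abort_listed_uploads(stub, uploads)            # dry run, nothing is canceled
aborted_before_real_run = list(stub.aborted)
done = abort_listed_uploads(stub, uploads, dry_run=False)
print(planned)
print(stub.aborted)
```

In real use, the upload_infos argument could be the items produced by oss2.MultipartUploadIterator, and reviewing the dry-run output first reduces the risk of canceling a task that is still in progress.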

References

A multipart upload involves three API operations: InitiateMultipartUpload, UploadPart, and CompleteMultipartUpload. For more information about related operations, see the following topics:
For more information about the API operation that you can call to cancel a multipart upload task, see AbortMultipartUpload.
For more information about the API operation that you can call to list uploaded parts, see ListParts.
For more information about the API operation that you can call to list all ongoing multipart upload tasks (initiated but not completed or canceled tasks), see ListMultipartUploads.