How to use OSS Python SDK V2 for multipart upload - Object Storage Service

The multipart upload feature lets you split a large object into multiple parts. After these parts are uploaded, you can call the CompleteMultipartUpload operation to combine the parts into a complete object.

Notes

By default, the sample code in this topic uses the public endpoint of the region. To access OSS from other Alibaba Cloud services in the same region, use the corresponding internal endpoint instead. For more information about the regions and endpoints that OSS supports, see Regions and endpoints.
Multipart upload requires the oss:PutObject permission. For more information, see Grant custom permissions to a Resource Access Management (RAM) user.
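
For most regions, the public and internal endpoints mentioned above follow a regular naming pattern. The following sketch builds either form from a region ID such as cn-hangzhou. The helper name oss_endpoint is hypothetical, and the host-name pattern is an assumption that you should confirm against Regions and endpoints.

```python
def oss_endpoint(region: str, internal: bool = False) -> str:
    """Return an OSS endpoint host name for a region ID such as 'cn-hangzhou'.

    Assumes the pattern oss-<region>.aliyuncs.com for public access and
    oss-<region>-internal.aliyuncs.com for access from the same region.
    """
    if not region:
        raise ValueError("region must not be empty")
    suffix = "-internal" if internal else ""
    return f"oss-{region}{suffix}.aliyuncs.com"


if __name__ == "__main__":
    print(oss_endpoint("cn-hangzhou"))                 # public endpoint
    print(oss_endpoint("cn-hangzhou", internal=True))  # internal endpoint
```

The internal value could be passed to the --endpoint parameter of the samples when the code runs on another Alibaba Cloud service in the same region.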

Multipart upload process

A multipart upload consists of the following three steps:

1. Initiate a multipart upload.
   Call the Client.initiate_multipart_upload method to obtain an upload ID that uniquely identifies the upload in OSS.
2. Upload the parts.
   Call the Client.upload_part method to upload the data of each part.
   Note:
   - For a given upload ID, the part number identifies each part and its position in the final object. If you upload a new part with an existing part number, the original part is overwritten.
   - OSS returns the MD5 hash of the part data in the ETag header of the response.
   - OSS calculates the MD5 hash of the uploaded data and compares it with the hash calculated by the SDK. If the two hashes differ, OSS returns the InvalidDigest error code.
3. Complete the multipart upload.
   Call the Client.complete_multipart_upload method to combine the parts into a complete object.
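
The note on part numbers and ETags implies some client-side bookkeeping. The following sketch shows one way to model it, assuming that the ETag is the quoted MD5 hex digest of the part data. etag_matches and PartTracker are hypothetical helpers, not part of alibabacloud_oss_v2.

```python
import hashlib
from typing import Dict, List, Tuple


def etag_matches(data: bytes, etag: str) -> bool:
    """Check whether an UploadPart ETag equals the MD5 hex digest of the part data.

    Quotes around the ETag are stripped and case is ignored, because the exact
    ETag formatting is an assumption of this sketch.
    """
    return etag.strip('"').lower() == hashlib.md5(data).hexdigest()


class PartTracker:
    """Hypothetical client-side record of the parts uploaded under one upload ID."""

    def __init__(self) -> None:
        self._etags: Dict[int, str] = {}

    def record(self, part_number: int, etag: str) -> None:
        # OSS accepts part numbers from 1 to 10,000.
        if not 1 <= part_number <= 10_000:
            raise ValueError(f"invalid part number: {part_number}")
        # Recording the same part number again replaces the earlier ETag,
        # mirroring how OSS overwrites a part that is uploaded again.
        self._etags[part_number] = etag

    def ordered(self) -> List[Tuple[int, str]]:
        # Return (part_number, etag) pairs in ascending part-number order,
        # the order in which the parts are listed when the upload is completed.
        return sorted(self._etags.items())
```

In the samples that follow, each (part_number, etag) pair returned by ordered() corresponds to one oss.UploadPart(part_number=..., etag=...) entry.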

Sample code

The following sample code demonstrates how to split a large local file into multiple parts, upload the parts to a bucket, and then combine these parts into a complete object:

import os
import argparse
import alibabacloud_oss_v2 as oss

# Create a command-line parameter parser for the multipart upload sample.
parser = argparse.ArgumentParser(description="multipart upload sample")

# The required --region parameter, which specifies the region in which the bucket is located.
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)

# The required --bucket parameter, which specifies the name of the bucket.
parser.add_argument('--bucket', help='The name of the bucket.', required=True)

# The optional --endpoint parameter, which specifies the endpoint used to access OSS. If it is not specified, the public endpoint of the region is used.
parser.add_argument('--endpoint', help='The endpoint used to access OSS.')

# The required --key parameter, which specifies the name of the object.
parser.add_argument('--key', help='The name of the object.', required=True)

# The required --file_path parameter, which specifies the path of the local file to upload.
parser.add_argument('--file_path', help='The path of the local file to upload.', required=True)


def main():
    # Parse the command-line parameters.
    args = parser.parse_args()

    # Obtain access credentials from environment variables for authentication.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Use the default configuration of the SDK and set the credentials provider.
    cfg = oss.config.load_default()
    cfg.credentials_provider = credentials_provider

    # Specify the region in which the bucket is located.
    cfg.region = args.region

    # If an endpoint is provided, specify the endpoint in the configuration object.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Use the configuration to create an OSSClient instance.
    client = oss.Client(cfg)

    # Initiate a multipart upload and get the upload ID.
    result = client.initiate_multipart_upload(oss.InitiateMultipartUploadRequest(
        bucket=args.bucket,
        key=args.key,
    ))

    # Define the size of each part as 5MB.
    part_size = 5 * 1024 * 1024

    # Obtain the total size of the file to upload.
    data_size = os.path.getsize(args.file_path)

    # Initialize the part number, starting from 1.
    part_number = 1

    # Store the uploaded part information.
    upload_parts = []

    # Open the file in binary mode for reading.
    with open(args.file_path, 'rb') as f:
        # Traverse the file and upload it in parts based on part_size.
        for start in range(0, data_size, part_size):
            n = part_size
            if start + n > data_size:  # Handle the case where the last part may be smaller than part_size.
                n = data_size - start

            # Create a SectionReader to read a specific portion of the file.
            reader = oss.io_utils.SectionReader(oss.io_utils.ReadAtReader(f), start, n)

            # Upload parts.
            up_result = client.upload_part(oss.UploadPartRequest(
                bucket=args.bucket,
                key=args.key,
                upload_id=result.upload_id,
                part_number=part_number,
                body=reader
            ))

            # Output the upload result of each part.
            print(f'status code: {up_result.status_code},'
                  f' request id: {up_result.request_id},'
                  f' part number: {part_number},'
                  f' content md5: {up_result.content_md5},'
                  f' etag: {up_result.etag},'
                  f' hash crc64: {up_result.hash_crc64},'
                  )

            # Save the part upload result to the list.
            upload_parts.append(oss.UploadPart(part_number=part_number, etag=up_result.etag))

            # Increment the part number.
            part_number += 1

    # Sort the uploaded parts by part number.
    parts = sorted(upload_parts, key=lambda p: p.part_number)

    # Send a request to complete the multipart upload and combine all parts into a complete object.
    result = client.complete_multipart_upload(oss.CompleteMultipartUploadRequest(
        bucket=args.bucket,
        key=args.key,
        upload_id=result.upload_id,
        complete_multipart_upload=oss.CompleteMultipartUpload(
            parts=parts
        )
    ))

    # Alternatively, set complete_all='yes' instead of passing complete_multipart_upload. OSS then lists all
    # parts uploaded under this upload ID, sorts them by part number, and combines them into the object.
    # This is useful if you do not keep the part list on the client, but make sure that every part has been
    # uploaded first, because OSS combines whichever parts exist when the request is received.
    # result = client.complete_multipart_upload(oss.CompleteMultipartUploadRequest(
    #     bucket=args.bucket,
    #     key=args.key,
    #     upload_id=result.upload_id,
    #     complete_all='yes'
    # ))

    # Output the result information of the completed multipart upload.
    print(f'status code: {result.status_code},'
          f' request id: {result.request_id},'
          f' bucket: {result.bucket},'
          f' key: {result.key},'
          f' location: {result.location},'
          f' etag: {result.etag},'
          f' encoding type: {result.encoding_type},'
          f' hash crc64: {result.hash_crc64},'
          f' version id: {result.version_id},'
    )

if __name__ == "__main__":
    main()  # The entry point of the script. When the script is directly run, the main function is called.
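
The offset arithmetic in the loop above can be moved into a pure function, which makes it easy to check before any request is sent. The following sketch assumes the OSS limit of 10,000 parts per upload. plan_parts, choose_part_size, and PartRange are hypothetical names.

```python
from typing import List, NamedTuple

MAX_PARTS = 10_000  # maximum number of parts in one multipart upload


class PartRange(NamedTuple):
    part_number: int  # 1-based part number passed to UploadPart
    offset: int       # byte offset of the part within the file
    size: int         # number of bytes in the part


def choose_part_size(data_size: int, preferred: int = 5 * 1024 * 1024) -> int:
    """Return the preferred part size, enlarged if needed so the file fits in MAX_PARTS parts."""
    return max(preferred, -(-data_size // MAX_PARTS))  # -(-a // b) is ceil(a / b)


def plan_parts(data_size: int, part_size: int) -> List[PartRange]:
    """Split data_size bytes into consecutive parts of at most part_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    parts = [
        PartRange(number, start, min(part_size, data_size - start))
        for number, start in enumerate(range(0, data_size, part_size), start=1)
    ]
    if len(parts) > MAX_PARTS:
        raise ValueError(f"{len(parts)} parts exceed the limit of {MAX_PARTS}")
    return parts


if __name__ == "__main__":
    for part in plan_parts(12 * 1024 * 1024, 5 * 1024 * 1024):
        print(part)
```

Inside the with block of the sample, each PartRange p would supply SectionReader(ReadAtReader(f), p.offset, p.size) and the part_number of UploadPartRequest. An empty file yields no parts, so such a file is probably simpler to upload with a single, non-multipart request.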

Common scenarios

Configure upload callbacks for multipart upload tasks

To notify your application server after the parts are combined into an object, specify the callback and callback_var parameters in the CompleteMultipartUpload request, as shown in the following sample code.

import os
import argparse
import base64
import alibabacloud_oss_v2 as oss

# Create a command-line argument parser for the multipart upload sample.
parser = argparse.ArgumentParser(description="multipart upload sample")

# The required --region parameter, which specifies the region in which the bucket is located.
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)

# The required --bucket parameter, which specifies the name of the bucket. 
parser.add_argument('--bucket', help='The name of the bucket.', required=True)

# The optional --endpoint parameter, which specifies the endpoint used to access OSS. If it is not specified, the public endpoint of the region is used.
parser.add_argument('--endpoint', help='The endpoint used to access OSS.')

# The required --key parameter, which specifies the name of the object.
parser.add_argument('--key', help='The name of the object.', required=True)

# The required --file_path parameter, which specifies the path of the local file to upload.
parser.add_argument('--file_path', help='The path of the local file to upload.', required=True)


def main():
    # Parse the command line parameters.
    args = parser.parse_args()

    # Obtain access credentials from environment variables for authentication.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Use the default configurations of the SDK and set the credentials provider.
    cfg = oss.config.load_default()
    cfg.credentials_provider = credentials_provider

    # Specify the region in which the bucket is located.
    cfg.region = args.region

    # Specify the endpoint if one is provided.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Use the configuration to create an OSSClient instance.
    client = oss.Client(cfg)

    # Initiate a multipart upload and get the upload ID.
    result = client.initiate_multipart_upload(oss.InitiateMultipartUploadRequest(
        bucket=args.bucket,
        key=args.key,
    ))

    # Define the size of each part as 1MB.
    part_size = 1 * 1024 * 1024

    # Obtain the total size of the file to upload.
    data_size = os.path.getsize(args.file_path)

    # Initialize the part number, starting from 1.
    part_number = 1

    # Store the results of each part upload.
    upload_parts = []

    # Open the file in binary mode for reading.
    with open(args.file_path, 'rb') as f:
        # Traverse the file and upload it in parts based on part_size.
        for start in range(0, data_size, part_size):
            n = part_size
            if start + n > data_size:  # Handle the case where the last part may be smaller than part_size.
                n = data_size - start

            # Create a SectionReader to read a specific portion of the file.
            reader = oss.io_utils.SectionReader(oss.io_utils.ReadAtReader(f), start, n)

            # Upload parts.
            up_result = client.upload_part(oss.UploadPartRequest(
                bucket=args.bucket,
                key=args.key,
                upload_id=result.upload_id,
                part_number=part_number,
                body=reader
            ))

            # Output the result information for each part upload.
            print(f'status code: {up_result.status_code},'
                  f' request id: {up_result.request_id},'
                  f' part number: {part_number},'
                  f' content md5: {up_result.content_md5},'
                  f' etag: {up_result.etag},'
                  f' hash crc64: {up_result.hash_crc64},'
                  )

            # Save the part upload result to the list.
            upload_parts.append(oss.UploadPart(part_number=part_number, etag=up_result.etag))

            # Increment the part number.
            part_number += 1

    # Sort the uploaded parts by part number.
    parts = sorted(upload_parts, key=lambda p: p.part_number)

    # Define the URL of the application server that receives the callback request.
    call_back_url = "http://www.example.com/callback"
    # Construct the callback parameter: a JSON object that specifies the callback URL and the callback request body, encoded in Base64.
    callback = base64.b64encode(('{"callbackUrl":"' + call_back_url + '","callbackBody":"bucket=${bucket}&object=${object}&my_var_1=${x:var1}&my_var_2=${x:var2}"}').encode()).decode()
    # Construct the custom variables referenced in callbackBody (${x:var1} and ${x:var2}) as a JSON object, encoded in Base64.
    callback_var = base64.b64encode('{"x:var1":"value1","x:var2":"value2"}'.encode()).decode()

    # Send a request to complete the multipart upload and combine all parts into a complete object.
    result = client.complete_multipart_upload(oss.CompleteMultipartUploadRequest(
        bucket=args.bucket,
        key=args.key,
        upload_id=result.upload_id,
        complete_multipart_upload=oss.CompleteMultipartUpload(
            parts=parts
        ),
        callback=callback,
        callback_var=callback_var
    ))

    # Alternatively, set complete_all='yes' instead of passing complete_multipart_upload. OSS then lists all
    # parts uploaded under this upload ID, sorts them by part number, and combines them into the object.
    # This is useful if you do not keep the part list on the client, but make sure that every part has been
    # uploaded first, because OSS combines whichever parts exist when the request is received.
    # result = client.complete_multipart_upload(oss.CompleteMultipartUploadRequest(
    #     bucket=args.bucket,
    #     key=args.key,
    #     upload_id=result.upload_id,
    #     complete_all='yes'
    # ))

    # Output the result information of the completed multipart upload.
    print(f'status code: {result.status_code},'
          f' request id: {result.request_id},'
          f' bucket: {result.bucket},'
          f' key: {result.key},'
          f' location: {result.location},'
          f' etag: {result.etag},'
          f' encoding type: {result.encoding_type},'
          f' hash crc64: {result.hash_crc64},'
          f' version id: {result.version_id},'
    )

if __name__ == "__main__":
    main()  # The entry point of the script. When the script is directly run, the main function is called.
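
Building the callback JSON by string concatenation, as in the sample above, breaks if a value contains a double quotation mark or a backslash. The following sketch lets json.dumps handle the escaping instead. build_callback and build_callback_var are hypothetical helpers; the check that custom variable names start with x: follows the naming used in the sample.

```python
import base64
import json
from typing import Dict


def _b64_json(value: object) -> str:
    # Serialize to JSON, then Base64-encode the UTF-8 bytes.
    return base64.b64encode(json.dumps(value).encode("utf-8")).decode("ascii")


def build_callback(callback_url: str, callback_body: str) -> str:
    """Return a Base64-encoded JSON value for the callback parameter."""
    return _b64_json({"callbackUrl": callback_url, "callbackBody": callback_body})


def build_callback_var(variables: Dict[str, str]) -> str:
    """Return a Base64-encoded JSON value for the callback_var parameter."""
    for name in variables:
        if not name.startswith("x:"):
            raise ValueError(f"custom variable {name!r} must start with 'x:'")
    return _b64_json(variables)


if __name__ == "__main__":
    callback = build_callback(
        "http://www.example.com/callback",
        "bucket=${bucket}&object=${object}&my_var_1=${x:var1}&my_var_2=${x:var2}",
    )
    callback_var = build_callback_var({"x:var1": "value1", "x:var2": "value2"})
    print(callback)
    print(callback_var)
```

The two returned strings could be passed unchanged as the callback and callback_var arguments of oss.CompleteMultipartUploadRequest.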

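Display the upload progress of a multipart upload

To display the upload progress, pass a progress callback function to each UploadPartRequest through the progress_fn parameter, as shown in the following sample code.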

import os
import argparse
import alibabacloud_oss_v2 as oss

# Create a command-line argument parser for the multipart upload sample.
parser = argparse.ArgumentParser(description="multipart upload sample")

# The required --region parameter, which specifies the region in which the bucket is located.
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)

# The required --bucket parameter, which specifies the name of the bucket. 
parser.add_argument('--bucket', help='The name of the bucket.', required=True)

# The optional --endpoint parameter, which specifies the endpoint used to access OSS. If it is not specified, the public endpoint of the region is used.
parser.add_argument('--endpoint', help='The endpoint used to access OSS.')

# The required --key parameter, which specifies the name of the object.
parser.add_argument('--key', help='The name of the object.', required=True)

# The required --file_path parameter, which specifies the path of the local file to upload.
parser.add_argument('--file_path', help='The path of the local file to upload.', required=True)

def main():
    # Parse the command-line parameters.
    args = parser.parse_args()

    # Obtain access credentials from environment variables for authentication.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Use the default configurations of the SDK and set the credentials provider.
    cfg = oss.config.load_default()
    cfg.credentials_provider = credentials_provider

    # Specify the region in which the bucket is located.
    cfg.region = args.region

    # If the endpoint parameter is provided, specify the endpoint.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Use the configurations to create an OSSClient instance.
    client = oss.Client(cfg)

    # Use a dictionary to accumulate the number of bytes sent across all parts. A mutable
    # dictionary lets the nested function update the value without a global variable.
    progress_state = {'saved': 0}

    def _progress_fn(n, written, total):
        # n is the number of bytes sent since the previous call.
        progress_state['saved'] += n

        # For an UploadPart request, written and total are assumed to count the bytes of the
        # current part, so this percentage shows the progress of the part being uploaded.
        rate = int(100 * written / total) if total else 100

        # \r returns the cursor to the start of the line and end='' suppresses the line break,
        # so each call overwrites the previous progress output.
        print(f'\rUpload progress: {rate}% (total bytes sent: {progress_state["saved"]})', end='')

    # Initiate a multipart upload and get the upload ID.
    result = client.initiate_multipart_upload(oss.InitiateMultipartUploadRequest(
        bucket=args.bucket,
        key=args.key,
    ))

    # Define the size of each part as 5MB.
    part_size = 5 * 1024 * 1024

    # Obtain the total size of the file to upload.
    data_size = os.path.getsize(args.file_path)

    # Initialize the part number, starting from 1.
    part_number = 1

    # Store the results of each part upload.
    upload_parts = []

    # Open the file in binary mode for reading.
    with open(args.file_path, 'rb') as f:
        # Traverse the file and upload it in parts based on part_size.
        for start in range(0, data_size, part_size):
            n = part_size
            if start + n > data_size:  # Handle the case where the last part may be smaller than part_size.
                n = data_size - start

            # Create a SectionReader to read a specific portion of the file.
            reader = oss.io_utils.SectionReader(oss.io_utils.ReadAtReader(f), start, n)

            # Upload parts.
            up_result = client.upload_part(oss.UploadPartRequest(
                bucket=args.bucket,
                key=args.key,
                upload_id=result.upload_id,
                part_number=part_number,
                body=reader,
                progress_fn=_progress_fn
            ))

            # Start a new line after the progress output, then output the result of each part upload.
            print(f'\nstatus code: {up_result.status_code},'
                  f' request id: {up_result.request_id},'
                  f' part number: {part_number},'
                  f' content md5: {up_result.content_md5},'
                  f' etag: {up_result.etag},'
                  f' hash crc64: {up_result.hash_crc64},'
                  )

            # Save the part upload result to the list.
            upload_parts.append(oss.UploadPart(part_number=part_number, etag=up_result.etag))

            # Increment the part number.
            part_number += 1

    # Sort the uploaded parts by part number.
    parts = sorted(upload_parts, key=lambda p: p.part_number)

    # Send a request to complete the multipart upload and combine all parts into a complete object.
    result = client.complete_multipart_upload(oss.CompleteMultipartUploadRequest(
        bucket=args.bucket,
        key=args.key,
        upload_id=result.upload_id,
        complete_multipart_upload=oss.CompleteMultipartUpload(
            parts=parts
        )
    ))

    # Alternatively, set complete_all='yes' instead of passing complete_multipart_upload. OSS then lists all
    # parts uploaded under this upload ID, sorts them by part number, and combines them into the object.
    # This is useful if you do not keep the part list on the client, but make sure that every part has been
    # uploaded first, because OSS combines whichever parts exist when the request is received.
    # result = client.complete_multipart_upload(oss.CompleteMultipartUploadRequest(
    #     bucket=args.bucket,
    #     key=args.key,
    #     upload_id=result.upload_id,
    #     complete_all='yes'
    # ))

    # Output the result information of the completed multipart upload.
    print(f'status code: {result.status_code},'
          f' request id: {result.request_id},'
          f' bucket: {result.bucket},'
          f' key: {result.key},'
          f' location: {result.location},'
          f' etag: {result.etag},'
          f' encoding type: {result.encoding_type},'
          f' hash crc64: {result.hash_crc64},'
          f' version id: {result.version_id},'
    )

if __name__ == "__main__":
    main()  # The entry point of the script. When the script is directly run, the main function is called.
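
The percentage printed by the sample above covers only the part that is currently being uploaded. To show progress for the whole file, the per-part callbacks can be aggregated. The following sketch assumes that the SDK calls progress_fn as fn(n, written, total), where n is the number of bytes sent since the previous call, as in the sample's _progress_fn. MultipartProgress is a hypothetical helper.

```python
class MultipartProgress:
    """Hypothetical callable that turns per-part progress callbacks into whole-file progress."""

    def __init__(self, total_size: int) -> None:
        self.total_size = total_size
        self.sent = 0

    def __call__(self, n: int, written: int, total: int) -> None:
        # Accumulate increments across all parts. written and total are ignored
        # because they are assumed to describe a single UploadPart request.
        self.sent += n
        print(f"\rUpload progress: {self.percent()}% ", end="", flush=True)

    def percent(self) -> int:
        if self.total_size <= 0:
            return 100
        # Cap at 100 in case retried requests report some bytes twice.
        return min(100, self.sent * 100 // self.total_size)
```

Create one instance per file, for example progress = MultipartProgress(data_size), and pass progress_fn=progress to every UploadPartRequest so that all parts report into the same counter.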

Reference

For the complete sample code for multipart upload, see complete_multipart_upload.py.