The multipart upload feature lets you split a large object into multiple parts. After these parts are uploaded, you can call the CompleteMultipartUpload operation to combine the parts into a complete object.
Notes
The public endpoint for the region is used by default in this topic. To access OSS from other Alibaba Cloud services in the same region, use the corresponding internal endpoint. For more information about the mapping between regions and endpoints of OSS, see Regions and endpoints.
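As a rough illustration of the difference between the two endpoint types, the sketch below builds both host names from a region ID. The helper name `oss_endpoint` is hypothetical, and the naming pattern is an assumption based on the common `oss-<region>` convention; confirm the actual value for your region in Regions and endpoints.

```python
def oss_endpoint(region: str, internal: bool = False) -> str:
    """Build an OSS endpoint URL for a region ID such as 'cn-hangzhou'.

    Assumption: the region follows the common
    'oss-<region>[-internal].aliyuncs.com' naming pattern.
    """
    suffix = "-internal" if internal else ""
    return f"https://oss-{region}{suffix}.aliyuncs.com"


# Public endpoint, which this topic uses by default.
print(oss_endpoint("cn-hangzhou"))
# Internal endpoint, for access from other Alibaba Cloud services in the same region.
print(oss_endpoint("cn-hangzhou", internal=True))
```

A value built this way could be passed to the sample below through its optional `--endpoint` parameter.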
Multipart upload requires the oss:PutObject permission. For more information, see Grant custom permissions to a Resource Access Management (RAM) user.
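For orientation, a RAM policy that grants this permission might look like the following sketch. The helper name `put_object_policy` and the bucket name `examplebucket` are placeholders, and the policy is a minimal assumption rather than a recommended production policy; adapt the resource scope to your own requirements.

```python
import json


def put_object_policy(bucket: str) -> dict:
    """Return a RAM policy document that allows oss:PutObject on objects in one bucket."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["oss:PutObject"],
                # Matches every object key in the given bucket.
                "Resource": [f"acs:oss:*:*:{bucket}/*"],
            }
        ],
    }


# Print the policy as JSON so that it can be pasted into a policy editor.
print(json.dumps(put_object_policy("examplebucket"), indent=2))
```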
Multipart upload process
A multipart upload consists of the following three steps:
Initialize a multipart upload.
Call the Client.InitiateMultipartUpload method to obtain an upload ID that is unique to OSS.
Upload parts.
Call the Client.UploadPart method to upload part data.
Note: For a given upload ID, the part number identifies each part and its relative position in the final object. If you upload a new part using an existing part number, the original part is overwritten.
OSS includes the MD5 hash of the part data in the ETag header of the response.
OSS calculates the MD5 hash of the uploaded data and compares it with the hash calculated by the SDK.
Complete the multipart upload.
Call the Client.CompleteMultipartUpload method to combine these parts into a complete object.
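The three steps can be modeled with a small in-memory stand-in. This is a toy sketch, not the OSS API: the class `ToyMultipartStore` and its methods are hypothetical, and it assumes that a part's ETag is the uppercase hexadecimal MD5 digest of the part data. It illustrates two behaviors described above: a part uploaded with an existing part number overwrites the earlier part, and the final object is assembled in part-number order regardless of upload order.

```python
import hashlib
import uuid


class ToyMultipartStore:
    """In-memory stand-in for the three multipart upload steps (not the OSS API)."""

    def __init__(self):
        self._uploads = {}  # upload_id -> {"key": str, "parts": {part_number: bytes}}
        self.objects = {}   # key -> bytes of the completed object

    def initiate(self, key: str) -> str:
        # Step 1: create a unique upload ID for this upload.
        upload_id = uuid.uuid4().hex
        self._uploads[upload_id] = {"key": key, "parts": {}}
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, data: bytes) -> str:
        # Step 2: store the part; an existing part number is overwritten.
        self._uploads[upload_id]["parts"][part_number] = data
        # Assumed ETag format: uppercase hex MD5 of the part data.
        return hashlib.md5(data).hexdigest().upper()

    def complete(self, upload_id: str, parts: list) -> bytes:
        # Step 3: verify each (part_number, etag) pair and join parts in order.
        upload = self._uploads.pop(upload_id)
        body = b""
        for part_number, etag in sorted(parts):
            data = upload["parts"][part_number]
            if hashlib.md5(data).hexdigest().upper() != etag:
                raise ValueError(f"ETag mismatch for part {part_number}")
            body += data
        self.objects[upload["key"]] = body
        return body


store = ToyMultipartStore()
upload_id = store.initiate("example.bin")
etag2 = store.upload_part(upload_id, 2, b"world")   # parts may arrive out of order
store.upload_part(upload_id, 1, b"HELLO ")          # first attempt at part 1
etag1 = store.upload_part(upload_id, 1, b"hello ")  # overwrites part 1
result = store.complete(upload_id, [(2, etag2), (1, etag1)])
print(result)  # b'hello world'
```

In the real SDK, the list passed to the completion step plays the same role as the `(part_number, etag)` pairs here.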
Sample code
The following sample code demonstrates how to split a large local file into multiple parts, upload the parts to a bucket, and then combine these parts into a complete object:
import os
import argparse
import alibabacloud_oss_v2 as oss
# Create a command-line parameter parser for the multipart upload sample.
parser = argparse.ArgumentParser(description="multipart upload sample")
# The required --region parameter, which specifies the region in which the bucket is located.
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)
# The required --bucket parameter, which specifies the name of the bucket.
parser.add_argument('--bucket', help='The name of the bucket.', required=True)
# The optional --endpoint parameter, which specifies the endpoint that other services can use to access OSS.
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')
# The required --key parameter, which specifies the name of the object.
parser.add_argument('--key', help='The name of the object.', required=True)
# The required --file_path parameter, which specifies the path of the file to upload.
parser.add_argument('--file_path', help='The path of the file to upload.', required=True)
def main():
    # Parse the command-line parameters.
    args = parser.parse_args()

    # Obtain access credentials from environment variables for authentication.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Use the default configuration of the SDK and set the credentials provider.
    cfg = oss.config.load_default()
    cfg.credentials_provider = credentials_provider

    # Specify the region in which the bucket is located.
    cfg.region = args.region

    # If an endpoint is provided, specify the endpoint in the configuration object.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Use the configuration to create an OSSClient instance.
    client = oss.Client(cfg)

    # Initiate a multipart upload and get the upload ID.
    result = client.initiate_multipart_upload(oss.InitiateMultipartUploadRequest(
        bucket=args.bucket,
        key=args.key,
    ))

    # Define the size of each part as 5MB.
    part_size = 5 * 1024 * 1024

    # Obtain the total size of the object to upload.
    data_size = os.path.getsize(args.file_path)

    # Initialize the part number, starting from 1.
    part_number = 1

    # Store the uploaded part information.
    upload_parts = []

    # Open the file in binary mode for reading.
    with open(args.file_path, 'rb') as f:
        # Traverse the file and upload it in parts based on part_size.
        for start in range(0, data_size, part_size):
            n = part_size
            if start + n > data_size:  # Handle the case where the last part may be smaller than part_size.
                n = data_size - start

            # Create a SectionReader to read a specific portion of the file.
            reader = oss.io_utils.SectionReader(oss.io_utils.ReadAtReader(f), start, n)

            # Upload parts.
            up_result = client.upload_part(oss.UploadPartRequest(
                bucket=args.bucket,
                key=args.key,
                upload_id=result.upload_id,
                part_number=part_number,
                body=reader
            ))

            # Output the upload result of each part.
            print(f'status code: {up_result.status_code},'
                  f' request id: {up_result.request_id},'
                  f' part number: {part_number},'
                  f' content md5: {up_result.content_md5},'
                  f' etag: {up_result.etag},'
                  f' hash crc64: {up_result.hash_crc64},'
            )

            # Save the part upload result to the list.
            upload_parts.append(oss.UploadPart(part_number=part_number, etag=up_result.etag))

            # Increment the part number.
            part_number += 1

    # Sort the uploaded parts by part number.
    parts = sorted(upload_parts, key=lambda p: p.part_number)

    # Send a request to complete the multipart upload and combine all parts into a complete object.
    result = client.complete_multipart_upload(oss.CompleteMultipartUploadRequest(
        bucket=args.bucket,
        key=args.key,
        upload_id=result.upload_id,
        complete_multipart_upload=oss.CompleteMultipartUpload(
            parts=parts
        )
    ))

    # The following code is another approach that uses the server-side list method to merge all part data into a complete object.
    # This method is suitable when you are not sure if all parts have been successfully uploaded.
    # Merge fragmented data into a complete Object through the server-side List method
    # result = client.complete_multipart_upload(oss.CompleteMultipartUploadRequest(
    #     bucket=args.bucket,
    #     key=args.key,
    #     upload_id=result.upload_id,
    #     complete_all='yes'
    # ))

    # Output the result information of the completed multipart upload.
    print(f'status code: {result.status_code},'
          f' request id: {result.request_id},'
          f' bucket: {result.bucket},'
          f' key: {result.key},'
          f' location: {result.location},'
          f' etag: {result.etag},'
          f' encoding type: {result.encoding_type},'
          f' hash crc64: {result.hash_crc64},'
          f' version id: {result.version_id},'
    )

if __name__ == "__main__":
    main()  # The entry point of the script. When the script is directly run, the main function is called.
Common scenarios
Reference
For the complete sample code for multipart upload, see complete_multipart_upload.py.