Upload objects

Last Updated: Oct 10, 2017

OSS provides multiple upload modes, each with a different size limit. Simple upload (PutObject) and append upload (AppendObject) can be used for files of up to 5 GB. In multipart upload mode, each part can be up to 5 GB, and the whole file can be up to 48.8 TB.

We start with the simple upload mode. This section details the various ways of providing data, that is, the values accepted by the data parameter of the method. Other upload interfaces accept similar data parameters, so they are not described again there.

Simple upload

You can use the Bucket.put_object method to upload a common file.

String upload

The following code uploads a string from memory:

  # -*- coding: utf-8 -*-
  import oss2

  auth = oss2.Auth('Your AccessKeyID', 'Your AccessKeySecret')
  bucket = oss2.Bucket(auth, 'Your endpoint', 'your bucket name')

  bucket.put_object('remote.txt', 'content of object')

You can specify the bytes to be uploaded:

  bucket.put_object('remote.txt', b'content of object')

Or you can specify a unicode string to be uploaded:

  bucket.put_object('remote.txt', u'content of object')

In fact, the second parameter of oss2.Bucket.put_object (named data) supports two types of strings:

  • bytes: The string will be directly uploaded.
  • unicode: The string will be uploaded after being automatically converted into UTF-8 bytes.

Upload a local file

  with open('local.txt', 'rb') as fileobj:
      bucket.put_object('remote.txt', fileobj)

Compared with the string upload code, you can see that the data parameter can be either a string or a file object.

Note: The file must be opened in binary mode because the number of bytes to upload must be determined before the upload starts.

The Python SDK also provides the following method to easily upload a local file:

  bucket.put_object_from_file('remote.txt', 'local.txt')

Upload a network stream

  import requests

  input = requests.get('http://www.aliyun.com')
  bucket.put_object('aliyun.txt', input)

requests.get returns an iterable object, and the Python SDK uploads such a network stream using chunked transfer encoding.
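More generally, any iterable that yields bytes can be passed as data. The following is a minimal sketch (the generator and object key below are illustrative, not from the SDK); because the total length is unknown, the SDK also uses chunked transfer encoding here:

  def data_generator():
      # yields the payload piece by piece; the total length is never computed
      for i in range(10):
          yield ('chunk-{0}\n'.format(i)).encode('utf-8')

  bucket.put_object('generated.txt', data_generator())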

Returned value

  result = bucket.put_object('remote.txt', 'content of object')

  print('http status: {0}'.format(result.status))
  print('request_id: {0}'.format(result.request_id))
  print('ETag: {0}'.format(result.etag))
  print('date: {0}'.format(result.headers['date']))

Every response returned by the OSS server has the following attributes:

  • status: HTTP return code
  • request_id: request ID
  • headers: header of the HTTP response

etag is an attribute specific to the return value of put_object.

Note: The request ID uniquely identifies a request. We recommend that you make it a part of the project log.
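For example, a minimal sketch of recording the request ID with the standard logging module (the logger name is only an illustrative assumption):

  import logging

  logger = logging.getLogger('oss-upload')

  result = bucket.put_object('remote.txt', 'content of object')
  logger.info('put_object done: status=%s, request_id=%s, etag=%s',
              result.status, result.request_id, result.etag)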

Summary

As shown in the preceding examples, the upload methods of the Python SDK accept multiple types of input sources, largely thanks to the third-party requests library the SDK is built on.

In summary, the following types of input data (which can be specified by the data parameter) are supported:

  • bytes string
  • unicode string: The string will be uploaded after being automatically converted into UTF-8 bytes.
  • File object: It must be opened in a binary way, for example, in the “rb” mode.
  • Iterable object: It is uploaded with the Chunked Encoding method.

Note: If the file object supports seek and tell, the upload starts from its current position and continues to the end of the file.
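A minimal sketch of this behavior (assuming local.txt is larger than 100 bytes; the key remote-tail.txt is only an example):

  with open('local.txt', 'rb') as fileobj:
      fileobj.seek(100)  # skip the first 100 bytes
      # only the bytes from position 100 to the end of the file are uploaded
      bucket.put_object('remote-tail.txt', fileobj)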

Resumable upload

When the local file to be uploaded is large or the network condition is poor, the upload may be interrupted. Re-uploading the data that has already been uploaded wastes time and network resources. The Python SDK provides an easy-to-use interface, oss2.resumable_upload, for resuming the upload of a local file:

  oss2.resumable_upload(bucket, 'remote.txt', 'local.txt')

The principle is as follows: when the file length is greater than or equal to the optional parameter multipart_threshold, multipart upload is enabled. A directory named ‘.py-oss-upload’ is then created under the HOME directory, and the current progress is stored in a file under it. You can also use the optional store parameter to specify the directory in which the progress is stored.

A fully customized example is shown below:

  oss2.resumable_upload(bucket, 'remote.txt', 'local.txt',
                        store=oss2.ResumableStore(root='/tmp'),
                        multipart_threshold=100*1024,
                        part_size=100*1024,
                        num_threads=4)

The parameters are described as follows:

  • store: an oss2.ResumableStore instance that stores the progress under the ‘/tmp/.py-oss-upload’ directory.
  • multipart_threshold: use multipart upload when the file length is greater than or equal to 100 KB.
  • part_size: the suggested part size is 100 KB; when the file is very large, the actual part size may be larger than 100 KB.
  • num_threads: the number of concurrent upload threads, four in this example.

Note:

  • Set oss2.defaults.connection_pool_size to a value greater than or equal to the number of threads (see the sketch below).
  • Python SDK 2.1.0 or later is required.
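For example, a minimal sketch that follows both notes (the part size and thread count are only illustrative values):

  import oss2

  # make the connection pool at least as large as the number of upload threads
  oss2.defaults.connection_pool_size = 4

  oss2.resumable_upload(bucket, 'remote.txt', 'local.txt',
                        part_size=100 * 1024,
                        num_threads=4)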

Multipart upload

Multipart upload gives you more fine-grained control over an upload task. It is suitable for scenarios such as uploading data of unknown size, uploading parts concurrently, and implementing custom resumable upload.

One multipart upload task can be performed in three steps:

  1. Initialization (Bucket.init_multipart_upload): Get an Upload ID.
  2. Uploading parts (Bucket.upload_part): Multiple parts can be uploaded concurrently.
  3. Completing multipart upload (Bucket.complete_multipart_upload): The parts are merged to generate an OSS object.

The specific example is as follows:

  import os

  from oss2 import SizedFileAdapter, determine_part_size
  from oss2.models import PartInfo

  key = 'remote.txt'
  filename = 'local.txt'

  total_size = os.path.getsize(filename)
  part_size = determine_part_size(total_size, preferred_size=100 * 1024)

  # Initialize the multipart upload
  upload_id = bucket.init_multipart_upload(key).upload_id
  parts = []

  # Upload parts one by one
  with open(filename, 'rb') as fileobj:
      part_number = 1
      offset = 0
      while offset < total_size:
          num_to_upload = min(part_size, total_size - offset)
          result = bucket.upload_part(key, upload_id, part_number,
                                      SizedFileAdapter(fileobj, num_to_upload))
          parts.append(PartInfo(part_number, result.etag))

          offset += num_to_upload
          part_number += 1

  # Complete the multipart upload
  bucket.complete_multipart_upload(key, upload_id, parts)

  # Verify the upload
  with open(filename, 'rb') as fileobj:
      assert bucket.get_object(key).read() == fileobj.read()

Specifically:

  • determine_part_size is a helper function to determine the part size.
  • SizedFileAdapter(fileobj, size) generates a new file object that starts at the current offset of the original one, but only allows reading the specified number of bytes.

Note: The object names (keys) in the three steps must be the same. The Upload IDs during part uploading and upload completion steps must be the same.
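For the unknown-size scenario mentioned at the beginning of this section, the following is a minimal sketch (upload_stream, stream, and PART_SIZE are illustrative names, not part of the SDK) that reads fixed-size chunks from a binary stream and uploads each chunk as one part:

  import oss2
  from oss2.models import PartInfo

  PART_SIZE = 1024 * 1024  # every part except the last must be at least 100 KB

  def upload_stream(bucket, key, stream):
      upload_id = bucket.init_multipart_upload(key).upload_id
      parts = []
      part_number = 1
      while True:
          chunk = stream.read(PART_SIZE)
          if not chunk:
              break
          result = bucket.upload_part(key, upload_id, part_number, chunk)
          parts.append(PartInfo(part_number, result.etag))
          part_number += 1
      bucket.complete_multipart_upload(key, upload_id, parts)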

Append upload

You can use the Bucket.append_object method to enable append upload:

  result = bucket.append_object('append.txt', 0, 'content of first append')
  bucket.append_object('append.txt', result.next_position, 'content of second append')

The offset (specified by the position parameter) of the first upload is set to 0. If the object already exists and

  • is not appendable, the ObjectNotAppendable exception will be thrown.
  • is appendable, the PositionNotEqualToLength exception will be thrown when the input offset is not equal to the current file length.

If it is not the first upload, you can use the Bucket.head_object method or the next_position attribute of the returned value of the last append operation to get the position parameter.
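A minimal sketch of this, assuming append.txt already exists; the two exceptions named above live in oss2.exceptions:

  import oss2

  try:
      position = bucket.head_object('append.txt').content_length
      bucket.append_object('append.txt', position, 'content of another append')
  except oss2.exceptions.PositionNotEqualToLength:
      # another writer appended in the meantime; re-query the length and retry
      pass
  except oss2.exceptions.ObjectNotAppendable:
      # the object exists but is not an appendable object
      pass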

Set the HTTP header

During file upload, you can use the headers parameter to set the HTTP header supported by the OSS, such as Content-Type:

  bucket.put_object('a.json', '{"age": 1}', headers={'Content-Type': 'application/json; charset=utf-8'})

For the list of supported HTTP standard headers, see PutObject section in the API documentation.

Set custom object metadata

The following code passes an HTTP header prefixed with x-oss-meta- to set custom object metadata for the object:

  bucket.put_object('story.txt', 'a novel', headers={'x-oss-meta-author': 'O. Henry'})
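To confirm the result, a minimal sketch that reads the metadata back with Bucket.head_object (the header name matches the one set above):

  result = bucket.head_object('story.txt')
  print(result.headers.get('x-oss-meta-author'))  # prints: O. Henry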

Progress bar

The progress bar can be enabled using the optional parameter progress_callback of each upload interface.

The following code uses Bucket.put_object as an example to implement a simple command-line progress bar:

  # -*- coding: utf-8 -*-
  from __future__ import print_function
  import os, sys

  import oss2

  auth = oss2.Auth('Your AccessKeyID', 'Your AccessKeySecret')
  bucket = oss2.Bucket(auth, 'Your endpoint', 'your bucket name')

  def percentage(consumed_bytes, total_bytes):
      if total_bytes:
          rate = int(100 * (float(consumed_bytes) / float(total_bytes)))
          print('\r{0}% '.format(rate), end='')
          sys.stdout.flush()

  bucket.put_object('story.txt', 'a'*1024*1024, progress_callback=percentage)

Note:

  • When the length of the data to be uploaded cannot be determined, the second parameter (total_bytes) of “progress_callback” is “None”.
  • Complete example code for the progress bar can be found at GitHub.

Upload callback

The put_object, put_object_from_file and complete_multipart_upload methods provide upload callback features.

The following code uses Bucket.put_object to implement a simple upload callback feature:

  # -*- coding: utf-8 -*-
  import json
  import base64
  import os

  import oss2

  auth = oss2.Auth('Your AccessKeyID', 'Your AccessKeySecret')
  bucket = oss2.Bucket(auth, 'Your endpoint', 'your bucket name')

  # Prepare the callback parameters
  callback_dict = {}
  callback_dict['callbackUrl'] = 'http://oss-demo.aliyuncs.com:23450'
  callback_dict['callbackHost'] = 'oss-cn-hangzhou.aliyuncs.com'
  callback_dict['callbackBody'] = 'filename=${object}&size=${size}&mimeType=${mimeType}'
  callback_dict['callbackBodyType'] = 'application/x-www-form-urlencoded'

  # The callback parameters are JSON-encoded and then Base64-encoded
  callback_param = json.dumps(callback_dict).strip()
  base64_callback_body = base64.b64encode(callback_param)

  # The encoded callback parameters are placed in the header and passed to OSS
  headers = {'x-oss-callback': base64_callback_body}

  # Upload with callback
  result = bucket.put_object('story.txt', 'a'*1024*1024, headers)

Note:

  • For detailed descriptions of the upload callback, see Upload Callback.
  • Complete example code of the upload callback can be found at GitHub.

Learn more

Object Management: Information about listing and deleting objects, viewing and changing object HTTP headers, and customizing metadata.
