Download objects

Last Updated: Oct 24, 2017

The Python SDK provides two basic download interfaces:

  • Bucket.get_object: The value returned by this interface is an iterable and file-like object.

  • Bucket.get_object_to_file: This interface directly downloads the object to a local file.

An easy-to-use, higher-level interface is also provided:

  • oss2.resumable_download: Supports resumable and concurrent downloads.

Stream download

The following code reads an entire OSS object into memory and prints its content:

    # -*- coding: utf-8 -*-
    import oss2

    auth = oss2.Auth('Your AccessKeyID', 'Your AccessKeySecret')
    bucket = oss2.Bucket(auth, 'Your endpoint', 'your bucket name')

    remote_stream = bucket.get_object('remote.txt')
    print(remote_stream.read())

Because the returned value is a file-like object, it works with standard library functions that accept file objects. For example, you can copy the object to a local file:

    import shutil

    remote_stream = bucket.get_object('remote.txt')
    with open('local-backup.txt', 'wb') as local_fileobj:
        shutil.copyfileobj(remote_stream, local_fileobj)
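Because the stream supports read(), it can also be consumed in fixed-size chunks, for example to compute a checksum without holding the whole object in memory. The following is a minimal sketch rather than part of the SDK: stream_md5 is a hypothetical helper, and io.BytesIO stands in for the value returned by bucket.get_object so that the example runs without an OSS account. It assumes that read(n) on the SDK stream returns an empty bytes object at the end of the data, as standard file objects do.

```python
import hashlib
import io


def stream_md5(stream, chunk_size=64 * 1024):
    """Return the hex MD5 of a file-like object, reading it chunk by chunk."""
    digest = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object marks the end of the stream
            break
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for bucket.get_object('remote.txt'); any object with read() works.
fake_stream = io.BytesIO(b'hello oss' * 1000)
checksum = stream_md5(fake_stream, chunk_size=256)
print(checksum)
```

With a real bucket, the same helper could presumably be called as stream_md5(bucket.get_object('remote.txt')).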

Because the returned value is also iterable, you can pass it directly to an upload call to copy the content to another object in stream mode:

    remote_stream = bucket.get_object('remote.txt')
    bucket.put_object('remote-backup.txt', remote_stream)

Download an object to a local file

The following code downloads the OSS object remote.txt to the file local-backup.txt in the current directory:

    bucket.get_object_to_file('remote.txt', 'local-backup.txt')

Specify the downloaded range

You can use the optional byte_range parameter to download only part of an object. byte_range is a tuple that gives the starting and ending byte offsets of the range. The following code downloads the first 100 bytes:

    remote_stream = bucket.get_object('remote.txt', byte_range=(0, 99))

Note: byte_range is a closed interval of byte offsets, counted from 0. For example, (0, 99) covers byte 0 through byte 99 inclusive, for a total of 100 bytes.
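To make the closed-interval convention concrete, the sketch below splits an object of a known size into consecutive (start, end) tuples in the form that byte_range expects. split_byte_ranges is a hypothetical helper written for illustration, not an SDK function, and the sizes are arbitrary.

```python
def split_byte_ranges(total_size, part_size):
    """Split the offsets 0..total_size-1 into closed (start, end) ranges."""
    if total_size < 0 or part_size <= 0:
        raise ValueError('total_size must be >= 0 and part_size must be > 0')
    ranges = []
    start = 0
    while start < total_size:
        # The end offset is inclusive, so subtract 1 from the exclusive bound.
        end = min(start + part_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


ranges = split_byte_ranges(250, 100)
print(ranges)  # [(0, 99), (100, 199), (200, 249)]
```

Each tuple could then be passed as byte_range to bucket.get_object to fetch that part of the object.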

Progress bar

The download interfaces accept an optional progress_callback parameter that you can use to implement a progress bar. The following code, saved as a Python source file, displays a simple command-line progress indicator:

    # -*- coding: utf-8 -*-
    from __future__ import print_function
    import os, sys
    import oss2

    auth = oss2.Auth('Your AccessKeyID', 'Your AccessKeySecret')
    bucket = oss2.Bucket(auth, 'Your endpoint', 'your bucket name')

    def percentage(consumed_bytes, total_bytes):
        if total_bytes:
            rate = int(100 * (float(consumed_bytes) / float(total_bytes)))
            print('\r{0}% '.format(rate), end='')
            sys.stdout.flush()

    bucket.get_object_to_file('remote.txt', 'local-backup.txt', progress_callback=percentage)

Note:

  • When the HTTP response contains no Content-Length header, the second parameter (total_bytes) of progress_callback is None.

  • The complete example code for the progress bar is available on GitHub.
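Because total_bytes can be None, a callback may want a fallback display instead of printing nothing. The following sketch is one possible approach, not the example from GitHub: format_progress and progress are hypothetical names, and the fallback simply shows the number of bytes received so far.

```python
import sys


def format_progress(consumed_bytes, total_bytes):
    """Return a percentage when the total is known, else a raw byte count."""
    if total_bytes:
        rate = int(100 * (float(consumed_bytes) / float(total_bytes)))
        return '{0}%'.format(rate)
    # No Content-Length header: the total is unknown, so report bytes only.
    return '{0} bytes'.format(consumed_bytes)


def progress(consumed_bytes, total_bytes):
    """Callback with the (consumed_bytes, total_bytes) signature."""
    sys.stdout.write('\r' + format_progress(consumed_bytes, total_bytes) + ' ')
    sys.stdout.flush()


# Simulated calls; the SDK would invoke the callback during a download.
progress(50, 200)
progress(1024, None)
```

The progress function could then be passed as progress_callback=progress to a download call.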

Resumable download

When the object to be downloaded is large, or the network conditions are poor, the download may be interrupted. Restarting the download from the beginning wastes time and bandwidth. For this reason, the Python SDK provides an easy-to-use interface named oss2.resumable_download, which resumes an interrupted download.

The following code downloads the OSS object remote.txt to the current local directory and saves it as local.txt:

    oss2.resumable_download(bucket, 'remote.txt', 'local.txt')

The resumable download process is generally as follows:

  1. Create a local temporary file whose name consists of the original file name plus a random suffix.

  2. Set the Range header of the HTTP request to read the OSS object by range, and write the data to the corresponding position in the temporary file.

  3. Once the download completes, rename the temporary file to the target file.

During this process, the checkpoint information (that is, which ranges have already been downloaded) is saved on the local disk. If the download is interrupted and you retry it later, the checkpoint information is read and only the missing parts are downloaded.
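The process above can be illustrated with a simplified, single-threaded sketch. This is not the SDK implementation: resumable_copy, the fixed '.tmp' suffix, and the JSON checkpoint format are all assumptions made for illustration, and fetch_range is a stand-in for a ranged read such as bucket.get_object(key, byte_range=(start, end)).read(). An in-memory payload replaces the OSS object so that the example runs offline.

```python
import json
import os
import tempfile


def resumable_copy(fetch_range, total_size, target, part_size, checkpoint_path):
    """Download by range into a temporary file, recording finished ranges."""
    tmp_path = target + '.tmp'  # fixed suffix for simplicity (assumption)
    done = []
    if os.path.exists(checkpoint_path) and os.path.exists(tmp_path):
        with open(checkpoint_path) as f:
            done = [tuple(r) for r in json.load(f)]
    # Open for update without truncating data written by an earlier attempt.
    mode = 'r+b' if os.path.exists(tmp_path) else 'w+b'
    with open(tmp_path, mode) as tmp:
        for start in range(0, total_size, part_size):
            end = min(start + part_size, total_size) - 1  # inclusive offset
            if (start, end) in done:
                continue  # this range was downloaded before the interruption
            data = fetch_range(start, end)
            tmp.seek(start)
            tmp.write(data)
            tmp.flush()
            done.append((start, end))
            with open(checkpoint_path, 'w') as f:
                json.dump(done, f)  # persist progress after every part
    os.replace(tmp_path, target)  # rename only once everything is complete
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)


payload = bytes(range(256)) * 10  # 2560 bytes standing in for the object
calls = []


def make_fetcher(fail_at=None):
    def fetch(start, end):
        if fail_at is not None and start >= fail_at:
            raise IOError('simulated network failure')
        calls.append((start, end))
        return payload[start:end + 1]
    return fetch


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'local.txt')
checkpoint = os.path.join(workdir, 'local.txt.checkpoint')

try:
    resumable_copy(make_fetcher(fail_at=1000), len(payload), target, 500, checkpoint)
except IOError:
    pass  # the first attempt is interrupted after two parts

first_attempt_calls = list(calls)
# The retry reads the checkpoint and fetches only the missing ranges.
resumable_copy(make_fetcher(), len(payload), target, 500, checkpoint)
```

In this sketch the retry skips the two ranges recorded in the checkpoint and requests only the remaining four, which mirrors the behavior described above.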

The following example sets all of the optional parameters explicitly:

    oss2.resumable_download(bucket, 'remote.txt', 'local.txt',
                            store=oss2.ResumableDownloadStore(root='/tmp'),
                            multiget_threshold=20*1024*1024,
                            part_size=10*1024*1024,
                            num_threads=3)

The parameters are described as follows:

  • store: The ResumableDownloadStore saves the checkpoint information in the /tmp/.py-oss-download directory.

  • multiget_threshold: By-range download is used when the object size is at least 20 MB.

  • part_size: The recommended amount of data to download in each range request, 10 MB in this example. If the object is too large, the actual value is greater than the specified value.

  • num_threads: The number of concurrent download threads, three in this example.

Consider the following details when you use this function:

  • Avoid calling this function from multiple programs or threads at the same time for the same source object and target file, because the checkpoint information on disk may be overwritten or the temporary file names may conflict.

  • Avoid parts that are too small, which results in a large number of range requests. We recommend setting part_size to a value greater than or equal to oss2.defaults.multiget_part_size.

  • If the target file already exists, this function overwrites it.

Note:

  • Set the oss2.defaults.connection_pool_size to a value greater than or equal to the number of threads.

  • Python SDK version 2.1.0 or later is required.
