Download objects

Last Updated: Jun 01, 2017

The Python SDK provides two basic download interfaces:

  • Bucket.get_object: The value returned by this interface is a file-like object and also an iterable object.
  • Bucket.get_object_to_file: This interface directly downloads the object to a local file.

An easy-to-use interface is also provided:

  • oss2.resumable_download: This interface supports resumable and concurrent downloads.

Stream download

The following code reads an OSS object all at once and prints the data:

  # -*- coding: utf-8 -*-
  import oss2
  auth = oss2.Auth('Your AccessKeyID', 'Your AccessKeySecret')
  bucket = oss2.Bucket(auth, 'Your endpoint', 'your bucket name')
  # Read the whole object into memory and print it.
  remote_stream = bucket.get_object('remote.txt')
  print(remote_stream.read())

Because the returned value is a file-like object, you can pass it directly to standard library functions. For example, you can download the object to a local file:

  import shutil
  remote_stream = bucket.get_object('remote.txt')
  with open('local-backup.txt', 'wb') as local_fileobj:
      # Copy the stream to the local file chunk by chunk.
      shutil.copyfileobj(remote_stream, local_fileobj)

Because the returned value is also an iterable object, you can stream its content to another OSS object:

  remote_stream = bucket.get_object('remote.txt')
  bucket.put_object('remote-backup.txt', remote_stream)
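
Because the returned value supports read() like a file, a large object can also be processed chunk by chunk instead of being read all at once. The following is a minimal sketch, not part of the SDK: digest_stream is a hypothetical helper, and io.BytesIO stands in for the value returned by bucket.get_object so that the snippet runs without an OSS account.

```python
import hashlib
import io


def digest_stream(stream, chunk_size=64 * 1024):
    """Return (SHA-256 hex digest, byte count) of a file-like object, read in chunks."""
    sha = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals the end of the stream
            break
        sha.update(chunk)
        total += len(chunk)
    return sha.hexdigest(), total


# Stand-in for: remote_stream = bucket.get_object('remote.txt')
remote_stream = io.BytesIO(b'hello oss' * 1000)
digest, size = digest_stream(remote_stream, chunk_size=1024)
print(size, digest)
```

Only one chunk is held in memory at a time, so the same approach should work for objects larger than the available memory.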

Download an object to a local file

The following code downloads the OSS object remote.txt to the file local-backup.txt in the current directory.

  bucket.get_object_to_file('remote.txt', 'local-backup.txt')

Specify the downloaded range

You can use the optional parameter byte_range to specify the range to download. byte_range is a tuple that gives the starting and ending byte offsets of the range. The following code downloads the first 100 bytes:

  remote_stream = bucket.get_object('remote.txt', byte_range=(0, 99))

Note: byte_range indicates the closed interval of the byte offset. The byte offset is counted beginning at 0. For example, (0, 99) indicates that the downloaded range is from the 0th byte to the 99th byte (inclusive). 100 bytes of data will be downloaded in total.
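
The closed-interval convention is easy to get wrong by one byte when an object is split into several ranges. The sketch below shows one way to compute such ranges; split_byte_ranges is a hypothetical helper written for this illustration, not an SDK function.

```python
def split_byte_ranges(total_size, part_size):
    """Split total_size bytes into closed (start, end) intervals of at most part_size bytes."""
    if total_size < 0 or part_size <= 0:
        raise ValueError('total_size must be >= 0 and part_size must be > 0')
    ranges = []
    start = 0
    while start < total_size:
        end = min(start + part_size, total_size) - 1  # inclusive end offset
        ranges.append((start, end))
        start = end + 1
    return ranges


ranges = split_byte_ranges(250, 100)
print(ranges)  # [(0, 99), (100, 199), (200, 249)]
```

Each tuple has the same form as the byte_range value described above, so it could be passed as byte_range in a separate get_object call.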

Progress bar

The download interfaces provide the optional parameter progress_callback to help implement a progress bar. The following code, saved as a Python source file, displays a simple progress percentage on the command line:

  # -*- coding: utf-8 -*-
  from __future__ import print_function
  import sys
  import oss2
  auth = oss2.Auth('Your AccessKeyID', 'Your AccessKeySecret')
  bucket = oss2.Bucket(auth, 'Your endpoint', 'your bucket name')
  def percentage(consumed_bytes, total_bytes):
      # total_bytes is None when the response has no Content-Length header.
      if total_bytes:
          rate = int(100 * (float(consumed_bytes) / float(total_bytes)))
          print('\r{0}% '.format(rate), end='')
          sys.stdout.flush()
  bucket.get_object_to_file('remote.txt', 'local-backup.txt', progress_callback=percentage)

Note

  • When the HTTP response contains no Content-Length header, the second parameter (total_bytes) passed to progress_callback is None.
  • Complete example code of the progress bar can be found at: GitHub.
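
A callback that only prints a percentage shows nothing when total_bytes is None. The sketch below is one possible way to handle that case by falling back to a plain byte count; format_progress and progress are hypothetical names chosen for this example, and the calls at the end only simulate what a download would pass in.

```python
import sys


def format_progress(consumed_bytes, total_bytes):
    """Build a progress string; fall back to a byte count when the total is unknown."""
    if total_bytes:
        rate = int(100 * consumed_bytes / float(total_bytes))
        return '{0}%'.format(rate)
    return '{0} bytes'.format(consumed_bytes)


def progress(consumed_bytes, total_bytes):
    # Takes the same two arguments as the progress_callback described above.
    sys.stdout.write('\r' + format_progress(consumed_bytes, total_bytes))
    sys.stdout.flush()


# Simulated calls; in real use, pass progress_callback=progress to the download call.
progress(512, 2048)
progress(512, None)
print()
```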

Resumable download

When the object to be downloaded is large or the network conditions are poor, the download may be interrupted. Restarting the download from the beginning wastes time and bandwidth. For this reason, the Python SDK provides an easy-to-use interface named oss2.resumable_download that resumes an interrupted download.

The code below downloads the OSS object remote.txt to the file local.txt in the current directory.

  oss2.resumable_download(bucket, 'remote.txt', 'local.txt')

The resumable download process is generally as follows:

  1. Create a local temporary file whose name is the original file name plus a random suffix.
  2. Set the Range header of the HTTP request to read the OSS object by range, and write the data to the corresponding position in the temporary file.
  3. After the download is completed, rename the temporary file to the target file.

During the above process, the checkpoint information, that is, the downloaded range information, is saved on the local disk. If the download is interrupted for some reason and you retry it later, the checkpoint information is read and only the missing parts are downloaded.
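
The process above can be illustrated with a small simulation. This is a sketch under stated assumptions and not the SDK's implementation: resumable_copy and flaky_fetch are hypothetical names, the remote object is an in-memory byte string, the checkpoint is a JSON file, and the temporary file uses a fixed .tmp suffix instead of a random one.

```python
import json
import os
import tempfile


def resumable_copy(fetch_range, total_size, target, checkpoint_path, part_size=4):
    """Fetch an object by range into a temporary file, recording finished parts.

    fetch_range(start, end) is a hypothetical callable that returns the bytes
    from offset start to offset end, inclusive.
    """
    tmp_path = target + '.tmp'  # simplification: fixed suffix, not random
    finished = set()
    if os.path.exists(checkpoint_path) and os.path.exists(tmp_path):
        with open(checkpoint_path) as cp:
            finished = {tuple(r) for r in json.load(cp)}  # parts already written
    else:
        with open(tmp_path, 'wb') as f:
            f.truncate(total_size)  # step 1: create the temporary file

    with open(tmp_path, 'r+b') as f:
        for start in range(0, total_size, part_size):
            end = min(start + part_size, total_size) - 1
            if (start, end) in finished:
                continue  # downloaded before the interruption
            f.seek(start)  # step 2: write the range at its own position
            f.write(fetch_range(start, end))
            f.flush()
            finished.add((start, end))
            with open(checkpoint_path, 'w') as cp:
                json.dump(sorted(finished), cp)

    os.replace(tmp_path, target)  # step 3: rename to the target file
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)


data = b'0123456789abcdef'  # stand-in for the remote object
calls = []


def flaky_fetch(start, end):
    if len(calls) == 2:  # fail once, on the third request
        calls.append('fail')
        raise IOError('simulated network failure')
    calls.append((start, end))
    return data[start:end + 1]


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'local.txt')
checkpoint = os.path.join(workdir, 'local.txt.checkpoint')

try:
    resumable_copy(flaky_fetch, len(data), target, checkpoint)
except IOError:
    pass  # the first attempt is interrupted after two parts
resumable_copy(flaky_fetch, len(data), target, checkpoint)

with open(target, 'rb') as f:
    result = f.read()
print(result == data, calls)
```

In this simulation the second attempt requests only the two parts that were missing, which is the behavior the checkpoint is meant to provide.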

A fully customized example is shown below:

  oss2.resumable_download(bucket, 'remote.txt', 'local.txt',
                          store=oss2.ResumableDownloadStore(root='/tmp'),
                          multiget_threshold=20*1024*1024,
                          part_size=10*1024*1024,
                          num_threads=3)

The parameters are described as follows:

  • ResumableDownloadStore specifies that the checkpoint information is saved to the /tmp/.py-oss-download directory.
  • multiget_threshold specifies that the by-range download is used when the object length is not less than 20 MB.
  • part_size suggests that 10 MB of data be downloaded in each part. If the object is too large, the actual value will be greater than the specified value.
  • num_threads sets the number of concurrent download threads to three.
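
The interaction between multiget_threshold and part_size can be sketched as follows. This is an illustration only: plan_download is a hypothetical helper, and the max_parts cap and the rounding rule are assumptions made for the example, not values taken from the SDK. It shows why the effective part size may exceed the requested one for a very large object.

```python
MB = 1024 * 1024


def plan_download(object_size, multiget_threshold, part_size, max_parts=100):
    """Return the list of part sizes a download might be split into.

    max_parts is an illustrative cap, not a value taken from the SDK.
    """
    if object_size < multiget_threshold:
        return [object_size]  # below the threshold: one ordinary download
    if part_size * max_parts < object_size:
        # Too many parts would be needed: grow the part size (ceiling division).
        part_size = -(-object_size // max_parts)
    full, rest = divmod(object_size, part_size)
    return [part_size] * full + ([rest] if rest else [])


small = plan_download(5 * MB, 20 * MB, 10 * MB)
medium = plan_download(25 * MB, 20 * MB, 10 * MB)
large = plan_download(2000 * MB, 20 * MB, 10 * MB)
print(len(small), len(medium), len(large), large[0] // MB)
```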

Note the following details when using this function:

  • Avoid calling this function from multiple programs (or threads) at the same time for the same source object and target file, because the checkpoint information on the disk may be overwritten or the temporary file names may conflict.
  • Avoid setting part_size too low, which splits the object into too many small ranges. It is recommended that you set the value to greater than or equal to the oss2.defaults.multiget_part_size value.
  • If the target file already exists, this function overwrites it.

Note:

  • Set the oss2.defaults.connection_pool_size to a value greater than or equal to the number of threads.
  • Version 2.1.0 or later of the SDK is required.