Read-only file-like access (Python SDK V2) - Object Storage Service

This topic describes how to use the File-Like method provided by Object Storage Service (OSS) SDK for Python 2.0 to access objects in a bucket.

Usage notes

The sample code in this topic uses the China (Hangzhou) region (cn-hangzhou) as an example. The public endpoint is used by default. If you access OSS from other Alibaba Cloud services in the same region, use an internal endpoint. For more information about OSS regions and their corresponding endpoints, see OSS regions and endpoints.
In this topic, access credentials obtained from environment variables are used. For more information about how to configure access credentials, see Configure access credentials.
To download a file, you must have the oss:GetObject permission. For more information, see Grant custom permissions to a RAM user.

Method

The File-Like method provided by OSS SDK for Python 2.0 lets you access objects in a bucket through the ReadOnlyFile class. You obtain a ReadOnlyFile instance by calling the open_file method of the client.

ReadOnlyFile provides the single stream and prefetch modes. In the prefetch mode, you can change the number of chunks that are prefetched in parallel to improve the read speed.
ReadOnlyFile includes a built-in reconnection mechanism to handle connection drops, which improves robustness in complex network environments.

class ReadOnlyFile:
    ...


def open_file(self, bucket: str, key: str, version_id: Optional[str] = None, request_payer: Optional[str] = None, **kwargs) -> ReadOnlyFile:
    ...
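
The reconnection behavior described above can be pictured as "reopen the stream at the current offset and keep reading". The following is only an illustrative sketch of that idea and is not the SDK's implementation. `FlakySource`, `_Stream`, and `read_all_with_resume` are hypothetical names, and the connection drop is simulated locally.

```python
import io


class FlakySource:
    """Hypothetical stand-in for a remote object whose first stream drops once."""

    def __init__(self, data, fail_at):
        self._data = data
        self._fail_at = fail_at
        self.opens = 0

    def open_at(self, offset):
        """Open a new stream that starts at the given offset."""
        self.opens += 1
        raw = io.BytesIO(self._data[offset:])
        # Only the first stream is configured to fail, after fail_at bytes in total.
        fail_after = self._fail_at - offset if self.opens == 1 else None
        return _Stream(raw, fail_after)


class _Stream:
    """Serves bytes and raises ConnectionError once fail_after bytes were served."""

    def __init__(self, raw, fail_after):
        self._raw = raw
        self._fail_after = fail_after
        self._served = 0

    def read(self, n):
        if self._fail_after is not None:
            if self._served >= self._fail_after:
                raise ConnectionError("simulated connection drop")
            n = min(n, self._fail_after - self._served)
        chunk = self._raw.read(n)
        self._served += len(chunk)
        return chunk


def read_all_with_resume(source, chunk_size=4, max_retries=3):
    """Read all data, reopening the stream at the current offset after a drop."""
    out = bytearray()
    retries = 0
    stream = source.open_at(0)
    while True:
        try:
            chunk = stream.read(chunk_size)
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
            # Resume from the number of bytes already received.
            stream = source.open_at(len(out))
            continue
        if not chunk:
            return bytes(out)
        out += chunk


source = FlakySource(b"abcdefghijkl", fail_at=6)
result = read_all_with_resume(source)
print(result, source.opens)
```

With ReadOnlyFile you do not need to write such a loop yourself. The sketch only shows why a read can continue without data loss after a dropped connection.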

Request parameters

Parameter	Type	Description
bucket	str	The name of the bucket.
key	str	The name of the object.
version_id	str	The version ID of the object. This parameter takes effect only if multiple versions of the object exist.
request_payer	str	The payer of the request. If pay-by-requester is enabled for the bucket, set this parameter to requester.
**kwargs	Any	Optional. Additional options, passed as keyword arguments. For more information, see the following table.

Options of kwargs

Option	Type	Description
enable_prefetch	bool	Specifies whether to enable the prefetch mode. By default, the prefetch mode is disabled.
prefetch_num	int	The number of prefetched chunks. Default value: 3. This option is valid when the prefetch mode is enabled.
chunk_size	int	The size of each prefetched chunk. Default value: 6. Unit: MiB. This option is valid when the prefetch mode is enabled.
prefetch_threshold	int	The number of bytes that must be read sequentially before prefetching starts. Default value: 20. Unit: MiB. This option is valid when the prefetch mode is enabled.
block_size	int	The size of a chunk. Default value: None.
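
The prefetch options above imply an approximate buffer budget of prefetch_num × chunk_size, which is 3 × 6 MiB = 18 MiB with the default values. The following minimal sketch builds the options and computes that figure. `build_prefetch_options` and `prefetch_buffer_bytes` are hypothetical helper names, and the sketch assumes that sizes are passed to the SDK in bytes, which you should confirm against the SDK reference.

```python
MIB = 1024 * 1024


def build_prefetch_options(prefetch_num=3, chunk_size_mib=6, prefetch_threshold_mib=20):
    """Return keyword options for open_file with the prefetch mode enabled.

    Assumption: the SDK expects chunk_size and prefetch_threshold in bytes.
    """
    if prefetch_num < 1 or chunk_size_mib < 1:
        raise ValueError("prefetch_num and chunk_size_mib must be positive")
    return {
        "enable_prefetch": True,
        "prefetch_num": prefetch_num,
        "chunk_size": chunk_size_mib * MIB,
        "prefetch_threshold": prefetch_threshold_mib * MIB,
    }


def prefetch_buffer_bytes(options):
    """Approximate number of bytes held in prefetched chunks at one time."""
    return options["prefetch_num"] * options["chunk_size"]


options = build_prefetch_options()
print(prefetch_buffer_bytes(options) // MIB)  # 18
```

The resulting dictionary could then be passed as `client.open_file(bucket, key, **options)`.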

Response parameters

Parameter	Type	Description
file	ReadOnlyFile	The ReadOnlyFile instance.

Common methods of ReadOnlyFile

Method	Description
close(self)	Closes the file handles to release resources, such as memory and active sockets.
read(self, n=None)	Reads up to n bytes from the object and returns them as a bytes object. If n is None, all remaining data is read.
seek(self, pos, whence=0)	Specifies the offset for the next read. Valid values of whence: 0: relative to the head of the object. 1: relative to the current offset. 2: relative to the tail of the object.
Stat() (os.FileInfo, error)	Queries object information, including the object size, last modified time, and metadata.
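
The read and seek semantics in the table can be tried locally without an OSS connection. The following sketch uses io.BytesIO as a stand-in and assumes that ReadOnlyFile follows the same standard file-object conventions for n and whence. It does not use the SDK.

```python
import io
import os

# Local stand-in for an opened object. This is not an OSS ReadOnlyFile.
f = io.BytesIO(b"hello world")

first = f.read(5)          # Read up to 5 bytes starting at offset 0.
f.seek(1, os.SEEK_CUR)     # whence=1: skip 1 byte relative to the current offset.
rest = f.read()            # n omitted: read all remaining data.
f.seek(-5, os.SEEK_END)    # whence=2: position 5 bytes before the tail.
tail = f.read()
f.seek(0, os.SEEK_SET)     # whence=0: return to the head.
head = f.read(1)
f.close()                  # Release the resources held by the file object.

print(first, rest, tail, head)
```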

Important

If the prefetch mode is enabled and multiple out-of-order reads occur, ReadOnlyFile automatically falls back to the single stream mode.

Examples

Read the entire object using the single stream mode

import argparse
import alibabacloud_oss_v2 as oss

# Create a command-line parameter parser for parsing arguments from the command line.
parser = argparse.ArgumentParser(description="open file sample")

# (Required) Specify the --region parameter, which specifies the region in which the bucket is located.
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)

# (Required) Specify the --bucket parameter, which specifies the name of the bucket.
parser.add_argument('--bucket', help='The name of the bucket.', required=True)

# (Optional) Specify the --endpoint parameter, which specifies the endpoint that other services can use to access OSS.
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')

# (Required) Specify the --key parameter, which specifies the name of the object.
parser.add_argument('--key', help='The name of the object.', required=True)


def main():
    # Parse the command-line parameters.
    args = parser.parse_args()

    # Obtain access credentials (AccessKey ID and AccessKey secret) from environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Load the default configurations of the SDK.
    cfg = oss.config.load_default()

    # Specify the credential provider.
    cfg.credentials_provider = credentials_provider

    # Specify the region in which the bucket is located.
    cfg.region = args.region

    # If a custom endpoint is provided, modify the endpoint parameter.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Use the preceding configurations to initialize the OSSClient instance.
    client = oss.Client(cfg)

    # Use the open_file method to open the object in the bucket.
    result = client.open_file(
        bucket=args.bucket,           # The name of the bucket.
        key=args.key,                # The name of the object.
    )

    # Read the entire object, decode the data to a string, and display it.
    print(f'content: {result.read().decode()}')

    # Close the file to release resources.
    result.close()


if __name__ == "__main__":
    main()  # Run main() only when the script is executed directly.
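
Calling read() without arguments loads the whole object into memory. For large objects, reading in fixed-size chunks is often preferable. The following sketch shows this pattern. `iter_chunks` and `sha256_of` are hypothetical helpers that work on any object with a read(n) method, and io.BytesIO stands in for the object returned by open_file. The sketch assumes that read(n) returns empty bytes at the end of the data, as standard file objects do.

```python
import hashlib
import io


def iter_chunks(fileobj, chunk_size=1024 * 1024):
    """Yield successive chunks until read() returns empty bytes."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def sha256_of(fileobj, chunk_size=1024 * 1024):
    """Compute a SHA-256 digest without holding all data in memory."""
    digest = hashlib.sha256()
    for chunk in iter_chunks(fileobj, chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


# io.BytesIO stands in for the result of client.open_file(...).
data = b"x" * 2500
sizes = [len(c) for c in iter_chunks(io.BytesIO(data), chunk_size=1024)]
print(sizes)  # [1024, 1024, 452]
```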

Read the entire object using the prefetch mode

import argparse
import alibabacloud_oss_v2 as oss

# Create a command-line parameter parser for parsing arguments from the command line.
parser = argparse.ArgumentParser(description="open file sample")

# (Required) Specify the --region parameter, which specifies the region in which the bucket is located.
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)

# (Required) Specify the --bucket parameter, which specifies the name of the bucket.
parser.add_argument('--bucket', help='The name of the bucket.', required=True)

# (Optional) Specify the --endpoint parameter, which specifies the endpoint that other services can use to access OSS.
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')

# (Required) Specify the --key parameter, which specifies the name of the object.
parser.add_argument('--key', help='The name of the object.', required=True)


def main():
    # Parse the command-line parameters.
    args = parser.parse_args()

    # Obtain access credentials (AccessKey ID and AccessKey secret) from environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Load the default configurations of the SDK.
    cfg = oss.config.load_default()

    # Specify the credential provider.
    cfg.credentials_provider = credentials_provider

    # Specify the region in which the bucket is located.
    cfg.region = args.region

    # If a custom endpoint is provided, modify the endpoint parameter.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Use the preceding configurations to initialize the OSSClient instance.
    client = oss.Client(cfg)

    # Use the open_file method to open the object in the bucket.
    result = client.open_file(
        bucket=args.bucket,           # The name of the bucket.
        key=args.key,                # The name of the object.
        enable_prefetch=True,        # Enable the prefetch mode. Default value: False.
    )

    # Read the entire object, decode the data to a string, and display it.
    print(f'content: {result.read().decode()}')

    # Close the file to release resources.
    result.close()


if __name__ == "__main__":
    main()  # Run main() only when the script is executed directly.

Read remaining data from a specific position using the seek method

import argparse
import os
import io
import alibabacloud_oss_v2 as oss

# Create a command-line parameter parser for parsing arguments from the command line.
parser = argparse.ArgumentParser(description="open file sample")

# (Required) Specify the --region parameter, which specifies the region in which the bucket is located.
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)

# (Required) Specify the --bucket parameter, which specifies the name of the bucket.
parser.add_argument('--bucket', help='The name of the bucket.', required=True)

# (Optional) Specify the --endpoint parameter, which specifies the endpoint that other services can use to access OSS.
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')

# (Required) Specify the --key parameter, which specifies the name of the object.
parser.add_argument('--key', help='The name of the object.', required=True)


def main():
    # Parse the command-line parameters.
    args = parser.parse_args()

    # Obtain access credentials (AccessKey ID and AccessKey secret) from environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Load the default configurations of the SDK.
    cfg = oss.config.load_default()

    # Specify the credential provider.
    cfg.credentials_provider = credentials_provider

    # Specify the region in which the bucket is located.
    cfg.region = args.region

    # If a custom endpoint is provided, modify the endpoint parameter.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Use the preceding configurations to initialize the OSSClient instance.
    client = oss.Client(cfg)

    # Declare a variable that will reference the oss.ReadOnlyFile instance.
    rf: oss.ReadOnlyFile = None

    # Use the with statement to open the object so that the resources are automatically released after the read operation is complete.
    with client.open_file(args.bucket, args.key) as f:
        rf = f # Assign the object to the rf variable.

        # Move the file pointer to the specified position. In this example, the pointer is moved to an offset of 1 byte from the beginning of the object.
        f.seek(1, os.SEEK_SET)

        # Read the remaining content of the object into a byte stream (BytesIO) in memory.
        copied_stream = io.BytesIO(rf.read())

        # Display the length of the data written to the byte stream.
        print(f'written: {len(copied_stream.getvalue())}')

        # Display the read content as raw bytes. The data is not decoded.
        print(f'read: {copied_stream.getvalue()}')


if __name__ == "__main__":
    main()  # Run main() only when the script is executed directly.
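
The seek-then-read pattern in the preceding example can be wrapped in a small helper that reads a bounded range instead of all remaining data. The following is a sketch under assumptions: `read_range` is a hypothetical helper, and io.BytesIO stands in for the object returned by open_file.

```python
import io
import os


def read_range(fileobj, offset, length):
    """Seek to an absolute offset and read at most `length` bytes."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    fileobj.seek(offset, os.SEEK_SET)
    return fileobj.read(length)


# io.BytesIO stands in for the result of client.open_file(...).
with io.BytesIO(b"0123456789") as f:
    middle = read_range(f, 3, 4)    # Bytes at offsets 3 to 6.
    clipped = read_range(f, 8, 10)  # Fewer bytes are returned near the tail.

print(middle, clipped)
```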

References

For more information about File-Like, visit File-Like.