How to use OSS SDK for Python V2 to copy files - Object Storage Service

This topic describes how to use the Copier module (the copy manager) of OSS SDK for Python V2 to copy objects, including large objects that require multipart copy.

Notes

The sample code in this topic uses the region ID cn-hangzhou for the China (Hangzhou) region. A public endpoint is used by default. If you access OSS from other Alibaba Cloud services in the same region, use an internal endpoint. For more information about the mappings between OSS regions and endpoints, see Regions and endpoints.
To copy an object, you must have read permission on the source object and read and write permissions on the destination bucket.
Cross-region copy is not supported. For example, you cannot copy an object from a bucket in the China (Hangzhou) region to a bucket in the China (Qingdao) region.
When you copy an object, make sure that no retention policy is configured for the source or destination bucket. Otherwise, the error "The object you specified is immutable." is returned.

Method definition

Introduction to the copy manager

To copy an object from one bucket to another or to modify the properties of an object, you can use the copy operation or the multipart copy operation. Each operation is suitable for a different scenario:

The copy operation (CopyObject) is suitable only for copying an object smaller than 5 GiB.
The multipart copy operation (UploadPartCopy) supports copying an object that is larger than 5 GiB. However, this operation does not support the x-oss-metadata-directive or x-oss-tagging-directive parameter. When you use this operation, you must explicitly specify the metadata and tags to be copied.
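The size rule above can be sketched as a small decision function. This is a minimal illustration, not the Copier's actual implementation: the helper name choose_copy_operation is hypothetical, and treating an object of exactly 5 GiB as a multipart copy is an assumption drawn from the "smaller than 5 GiB" wording.

```python
# Hypothetical helper for illustration only; it is not part of the OSS SDK.
GIB = 1024 ** 3
COPY_OBJECT_LIMIT = 5 * GIB  # CopyObject is suitable only for objects smaller than 5 GiB.


def choose_copy_operation(object_size: int) -> str:
    """Return the name of the API operation that can copy an object of this size."""
    if object_size < 0:
        raise ValueError("object_size must be non-negative")
    if object_size < COPY_OBJECT_LIMIT:
        return "CopyObject"
    # Assumption: anything at or above the limit needs multipart copy.
    return "UploadPartCopy"


print(choose_copy_operation(100 * 1024 * 1024))
print(choose_copy_operation(6 * GIB))
```

In practice the Copier makes this choice for you, based on the request and its configuration options, so a helper like this is useful only for reasoning about which operation a copy is likely to use.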

The copy manager Copier is a new feature in OSS SDK for Python V2 that provides a universal copy interface. This interface abstracts the underlying implementation details and automatically selects an appropriate copy operation based on the request parameters. The following code shows the common methods of the Copier:

class CopyError(exceptions.BaseError):
  ...

def copier(self, **kwargs) -> Copier:
  ...

def copy(self, request: models.CopyObjectRequest, **kwargs: Any) -> CopyResult:
  ...

Request parameters

Parameter	Type	Description
request	CopyObjectRequest	The request parameters of the operation. For more information, see CopyObjectRequest.
**kwargs	Any	(Optional) Additional configuration options for this copy call, passed as keyword arguments.

The following table describes the common parameters of CopyObjectRequest.

Parameter	Type	Description
bucket	str	The name of the destination bucket.
key	str	The name of the destination object.
source_bucket	str	The name of the source bucket.
source_key	str	The name of the source object.
forbid_overwrite	str	Specifies whether to overwrite a destination object that has the same name during the CopyObject operation.
tagging	str	The tags of the object. You can specify multiple tags at a time. Example: TagA=A&TagB=B.
tagging_directive	str	Specifies how to set tags for the destination object. Valid values: Copy (default): copies the tags of the source object to the destination object. Replace: ignores the tags of the source object and uses the tags specified in the request for the destination object.
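The tagging value is a query-style string such as TagA=A&TagB=B. The following sketch shows one way to build that string from a dictionary by using only the standard library. The helper name build_tagging is hypothetical, and percent-encoding special characters in tag keys and values is an assumption, not a documented requirement of the SDK.

```python
from urllib.parse import quote, urlencode


def build_tagging(tags: dict) -> str:
    """Encode a dict of tags as a query-style string such as TagA=A&TagB=B.

    Hypothetical helper; special characters are percent-encoded (assumption).
    """
    return urlencode(tags, quote_via=quote)


tagging = build_tagging({"TagA": "A", "TagB": "B"})
print(tagging)  # TagA=A&TagB=B
```

The resulting string can be passed as the tagging parameter of CopyObjectRequest. For the tags to take effect on the destination object, tagging_directive would be set to Replace, as described in the table above.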

You can customize the copy behavior of objects by specifying configuration options when you initialize a copy manager instance using client.copier. You can also specify configuration options for each copy call to customize the behavior for a specific object.

Set the configuration parameters for the Copier

copier = client.copier(
    part_size=100 * 1024 * 1024,
)

Set the configuration parameters for each copy request

result = copier.copy(oss.CopyObjectRequest(
        bucket="example_bucket",
        key="example_key",
        source_bucket="example_source_bucket",
        source_key="example_source_key",
    ),
    part_size=100 * 1024 * 1024,
)

The following table describes the common configuration options.

Parameter	Type	Description
part_size	int	The part size. The default value is 64 MiB.
parallel_num	int	The number of concurrent copy tasks. Default value: 3. This parameter specifies the concurrency limit for a single call, not the global concurrency limit.
multipart_copy_threshold	int	The threshold for multipart copy. The default value is 200 MiB.
leave_parts_on_error	bool	Specifies whether to retain the copied parts if the copy fails. By default, the copied parts are not retained.
disable_shallow_copy	bool	Specifies whether to disable shallow copy. By default, shallow copy is used.
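To see how part_size affects a multipart copy, the following sketch splits an object of a given size into byte ranges. It is only an illustration of the arithmetic, assuming fixed-size parts with a shorter final part. The helper name split_into_parts is hypothetical, and the Copier's internal logic may differ.

```python
MIB = 1024 * 1024


def split_into_parts(total_size: int, part_size: int = 64 * MIB):
    """Return (start, end) byte ranges, end exclusive, that cover total_size.

    Hypothetical helper for illustration; not part of the OSS SDK.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [
        (start, min(start + part_size, total_size))
        for start in range(0, total_size, part_size)
    ]


# A 300 MiB object copied with 100 MiB parts is split into 3 parts.
parts = split_into_parts(300 * MIB, part_size=100 * MIB)
print(len(parts))
```

With the default part size of 64 MiB, the same 300 MiB object would be split into 5 parts, the last one shorter than the others. The parallel_num option then limits how many of these parts are copied at the same time within one call.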

Sample code

The following sample code shows how to copy an object from a source bucket to a destination bucket.

import argparse
import alibabacloud_oss_v2 as oss

# Create a command-line argument parser.
parser = argparse.ArgumentParser(description="copier sample")

# Add the command-line parameter: region (required), which specifies the region where the bucket is located.
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)

# Add the command-line parameter: bucket (required), which specifies the name of the destination bucket.
parser.add_argument('--bucket', help='The name of the bucket.', required=True)

# Add the command-line parameter: endpoint (optional), which specifies the endpoint for accessing OSS.
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')

# Add the command-line parameter: key (required), which specifies the name of the destination object.
parser.add_argument('--key', help='The name of the object.', required=True)

# Add the command-line parameter: source_key (required), which specifies the name of the source object.
parser.add_argument('--source_key', help='The name of the source address for object.', required=True)

# Add the command-line parameter: source_bucket (required), which specifies the name of the source bucket.
parser.add_argument('--source_bucket', help='The name of the source address for bucket.', required=True)


def main():
    # Parse the command-line arguments.
    args = parser.parse_args()

    # Load credentials from environment variables.
    # Use EnvironmentVariableCredentialsProvider to read the AccessKey ID and AccessKey secret from environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Use the default configurations of the SDK.
    cfg = oss.config.load_default()
    cfg.credentials_provider = credentials_provider  # Set the credential provider.
    cfg.region = args.region  # Set the region where the bucket is located.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint  # If an endpoint is provided, set a custom endpoint.

    # Create an OSS client instance.
    client = oss.Client(cfg)

    # Create a Copier instance with the default configuration options.
    copier = client.copier()

    # Perform the object copy operation.
    result = copier.copy(
        oss.CopyObjectRequest(
            bucket=args.bucket,          # The name of the destination bucket.
            key=args.key,                # The name of the destination object.
            source_bucket=args.source_bucket,  # The name of the source bucket.
            source_key=args.source_key   # The name of the source object.
        )
    )

    # Print the copy result.
    # Use vars(result) to convert the result object to the dictionary format and print the result.
    print(vars(result))


if __name__ == "__main__":
    main()

Scenarios

Use the copy manager to set the part size and concurrency

The following sample code shows how to set the part size and concurrency for a single copy call by passing configuration options to copier.copy.

import argparse
import alibabacloud_oss_v2 as oss

# Create a command-line argument parser.
parser = argparse.ArgumentParser(description="copier sample")

# Add the command-line parameter: region (required), which specifies the region where the bucket is located.
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)

# Add the command-line parameter: bucket (required), which specifies the name of the destination bucket.
parser.add_argument('--bucket', help='The name of the bucket.', required=True)

# Add the command-line parameter: endpoint (optional), which specifies the endpoint for accessing OSS.
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')

# Add the command-line parameter: key (required), which specifies the name of the destination object.
parser.add_argument('--key', help='The name of the object.', required=True)

# Add the command-line parameter: source_key (required), which specifies the name of the source object.
parser.add_argument('--source_key', help='The name of the source address for object.', required=True)

# Add the command-line parameter: source_bucket (required), which specifies the name of the source bucket.
parser.add_argument('--source_bucket', help='The name of the source address for bucket.', required=True)


def main():
    # Parse the command-line arguments.
    args = parser.parse_args()

    # Load credentials from environment variables.
    # Use EnvironmentVariableCredentialsProvider to read the AccessKey ID and AccessKey secret from environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Use the default configurations of the SDK.
    cfg = oss.config.load_default()
    cfg.credentials_provider = credentials_provider  # Set the credential provider.
    cfg.region = args.region  # Set the region where the bucket is located.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint  # If an endpoint is provided, set a custom endpoint.

    # Create an OSS client instance.
    client = oss.Client(cfg)

    # Create a Copier instance with the default configuration options.
    copier = client.copier()

    # Perform the object copy operation.
    result = copier.copy(
        oss.CopyObjectRequest(
            bucket=args.bucket,          # The name of the destination bucket.
            key=args.key,                # The name of the destination object.
            source_bucket=args.source_bucket,  # The name of the source bucket.
            source_key=args.source_key   # The name of the source object.
        ),
        part_size=1 * 1024 * 1024,      # The part size in bytes. In this example, the part size is set to 1 MiB.
        parallel_num=5,                 # The number of concurrent tasks. This parameter specifies the number of parts that can be copied at the same time.
        leave_parts_on_error=True,      # If the copy fails, retain the copied parts.
    )

    # Print the copy result.
    # Use vars(result) to convert the result object to the dictionary format and print the result.
    print(vars(result))


if __name__ == "__main__":
    main()

Use the copy manager to display the copy progress

The following sample code shows how to use a progress callback function to display the copy progress as a percentage.

import argparse
import alibabacloud_oss_v2 as oss

# Create a command-line argument parser.
parser = argparse.ArgumentParser(description="copier sample")

# Add the command-line parameter: region (required), which specifies the region where the bucket is located.
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)

# Add the command-line parameter: bucket (required), which specifies the name of the destination bucket.
parser.add_argument('--bucket', help='The name of the bucket.', required=True)

# Add the command-line parameter: endpoint (optional), which specifies the endpoint for accessing OSS.
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')

# Add the command-line parameter: key (required), which specifies the name of the destination object.
parser.add_argument('--key', help='The name of the object.', required=True)

# Add the command-line parameter: source_key (required), which specifies the name of the source object.
parser.add_argument('--source_key', help='The name of the source address for object.', required=True)

# Add the command-line parameter: source_bucket (required), which specifies the name of the source bucket.
parser.add_argument('--source_bucket', help='The name of the source address for bucket.', required=True)


def main():
    # Parse the command-line arguments.
    args = parser.parse_args()

    # Load credentials from environment variables.
    # Use EnvironmentVariableCredentialsProvider to read the AccessKey ID and AccessKey secret from environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Use the default configurations of the SDK.
    cfg = oss.config.load_default()
    cfg.credentials_provider = credentials_provider  # Set the credential provider.
    cfg.region = args.region  # Set the region where the bucket is located.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint  # If an endpoint is provided, set a custom endpoint.

    # Create an OSS client instance.
    client = oss.Client(cfg)

    # Define a dictionary variable named progress_state to save the copy progress. The initial value is 0.
    progress_state = {'saved': 0}
    def _progress_fn(n, written, total):
        # Use the dictionary to store the accumulated number of written bytes to avoid using global variables.
        progress_state['saved'] += n

        # Calculate the current copy percentage. The value is obtained by dividing the number of written bytes by the total number of bytes and rounding down the result.
        rate = int(100 * (float(written) / float(total)))

        # Print the current copy progress. \r indicates returning to the beginning of the line to implement real-time refresh in the command line.
        # end='' indicates no line break, which allows the next print to overwrite the current line.
        print(f'\rCopy progress: {rate}% ', end='')

    # Create a Copier instance with the default configuration options.
    copier = client.copier()

    # Perform the object copy operation.
    result = copier.copy(
        oss.CopyObjectRequest(
            bucket=args.bucket,          # The name of the destination bucket.
            key=args.key,                # The name of the destination object.
            source_bucket=args.source_bucket,  # The name of the source bucket.
            source_key=args.source_key,   # The name of the source object.
            progress_fn=_progress_fn,   # Set the progress callback function.
        )
    )

    # Print the copy result.
    # Use vars(result) to convert the result object to the dictionary format and print the result.
    print(vars(result))


if __name__ == "__main__":
    main()
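The progress callback in the sample above can also be exercised without an OSS client. The following sketch wraps it in a factory and simulates three progress events. The (n, written, total) signature is taken from the sample. The factory name make_progress_fn, the guard against a zero total, and the suppression of repeated percentages are assumptions added for illustration.

```python
def make_progress_fn(label: str = "Copy progress"):
    """Build a callback with the (n, written, total) signature used in the sample.

    Hypothetical helper for illustration; not part of the OSS SDK.
    """
    state = {"saved": 0, "last_rate": -1}

    def _progress_fn(n, written, total):
        # Accumulate the number of bytes reported by each event.
        state["saved"] += n
        # Guard against a zero or missing total to avoid division by zero.
        rate = int(100 * written / total) if total else 0
        # Reprint the line only when the percentage changes.
        if rate != state["last_rate"]:
            state["last_rate"] = rate
            print(f"\r{label}: {rate}% ", end="", flush=True)

    return _progress_fn, state


# Simulate three progress events for a 300-byte object.
progress_fn, state = make_progress_fn()
for written in (100, 200, 300):
    progress_fn(100, written, 300)
print()  # End the progress line so that later output starts on a new line.
```

A callback built this way could be passed as progress_fn in the same way as _progress_fn in the sample. Printing a final newline after the copy keeps the result output from being appended to the progress line.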

References

For more information about the copy manager, see Developer Guide.
For the complete sample code for the copy manager, see copier.py.