Perform scalar search by using Python SDK V2

Scalar search is an OSS indexing feature that is based on object metadata. You specify custom conditions to quickly filter objects and retrieve the matching object list. This helps you manage and understand your data and supports subsequent queries, statistical analysis, and object management.

Usage notes

The sample code in this topic uses the region ID cn-hangzhou for the China (Hangzhou) region as an example. By default, a public endpoint is used. If you want to access OSS from other Alibaba Cloud services in the same region, you must use an internal endpoint. For more information about OSS regions and endpoints, see Regions and endpoints.
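
As a rough illustration of the difference between the two endpoint types, the following sketch builds the value that could be passed to the --endpoint parameter of the samples in this topic. The helper name oss_endpoint and the host name pattern are assumptions made for illustration; confirm the exact endpoint for your region in Regions and endpoints.

```python
def oss_endpoint(region: str, internal: bool = False) -> str:
    """Build an OSS endpoint host name for a region ID such as 'cn-hangzhou'.

    Hypothetical helper: it assumes the common 'oss-<region>[-internal].aliyuncs.com'
    naming pattern, which may not hold for every region.
    """
    suffix = "-internal" if internal else ""
    return f"oss-{region}{suffix}.aliyuncs.com"


if __name__ == "__main__":
    # Public endpoint, used for access over the Internet.
    print(oss_endpoint("cn-hangzhou"))
    # Internal endpoint, used by other Alibaba Cloud services in the same region.
    print(oss_endpoint("cn-hangzhou", internal=True))
```

If --endpoint is omitted, the samples leave cfg.endpoint unset and rely on the region alone.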

Sample code

Enable the metadata management feature

The following sample code shows how to enable the metadata management feature for a specified bucket. When you enable this feature for a bucket, OSS creates a metadata index library for the bucket and indexes the metadata of all objects in it. After the metadata index library is created, OSS performs near-real-time incremental scans on new objects in the bucket and creates metadata indexes for them.

import argparse
import alibabacloud_oss_v2 as oss

# Create a command-line parameter parser and add a description.
parser = argparse.ArgumentParser(description="open meta query sample")
# Add the required --region command-line parameter, which specifies the region where the bucket is located.
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)
# Add the required --bucket command-line parameter, which specifies the name of the bucket to operate on.
parser.add_argument('--bucket', help='The name of the bucket.', required=True)
# Add the optional --endpoint command-line parameter, which specifies the domain name to use when you access OSS.
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')

def main():
    # Parse command-line arguments.
    args = parser.parse_args()

    # Load authentication information from environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Use the default configurations provided by the SDK.
    cfg = oss.config.load_default()
    # Set the authentication information provider.
    cfg.credentials_provider = credentials_provider
    # Set the region based on command-line arguments.
    cfg.region = args.region
    # If an endpoint is provided, update the endpoint in the configuration.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Create an OSS client.
    client = oss.Client(cfg)

    # Initiate a request to enable meta query.
    result = client.open_meta_query(oss.OpenMetaQueryRequest(
            bucket=args.bucket,
    ))

    # Print the status code and request ID of the request.
    print(f'status code: {result.status_code},'
          f' request id: {result.request_id},'
          )

# Call the main function when run as the main program.
if __name__ == "__main__":
    main()

Obtain metadata index library information

The following sample code shows how to retrieve information about the metadata index library of a specified bucket.

import argparse
import alibabacloud_oss_v2 as oss

# Create a command-line parameter parser and add the necessary parameters.
parser = argparse.ArgumentParser(description="get meta query status sample")
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)
parser.add_argument('--bucket', help='The name of the bucket.', required=True)
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')

def main():
    # Parse command-line arguments.
    args = parser.parse_args()

    # Load credential information from environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Use the default configurations of the SDK.
    cfg = oss.config.load_default()
    # Set the credential provider to the one obtained from environment variables.
    cfg.credentials_provider = credentials_provider
    # Set the region where the OSS service is located.
    cfg.region = args.region
    # If an endpoint is provided, set a custom endpoint.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Create an OSS client.
    client = oss.Client(cfg)

    # Call the get_meta_query_status method to obtain the metadata query status of the specified bucket.
    result = client.get_meta_query_status(oss.GetMetaQueryStatusRequest(
            bucket=args.bucket,
    ))

    # Print relevant information from the result, including the status code, request ID, creation time, update time, state, and phase.
    print(f'status code: {result.status_code},'
          f' request id: {result.request_id},'
          f' create time: {result.meta_query_status.create_time},'
          f' update time: {result.meta_query_status.update_time},'
          f' state: {result.meta_query_status.state},'
          f' phase: {result.meta_query_status.phase},'
          )

# Call the main function when the script is run directly.
if __name__ == "__main__":
    main()
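
Because the index library is built asynchronously after the feature is enabled, it can be useful to poll the state returned above before you run queries. The following is a minimal sketch, not part of the SDK: wait_for_state is a hypothetical helper, and the target state name 'Running' is an assumption that you should check against the state values your bucket actually reports.

```python
import time
from typing import Callable, Iterable


def wait_for_state(
    fetch_state: Callable[[], str],
    target_states: Iterable[str] = ("Running",),  # Assumed state name.
    timeout_seconds: float = 600.0,
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_state repeatedly until it returns one of target_states.

    Raises TimeoutError if no target state is seen before the deadline.
    """
    targets = set(target_states)
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = fetch_state()
        if state in targets:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index library still in state {state!r}")
        sleep(interval_seconds)


# Possible usage with the client from the sample above:
# wait_for_state(lambda: client.get_meta_query_status(
#     oss.GetMetaQueryStatusRequest(bucket=args.bucket)).meta_query_status.state)
```

Passing the state lookup as a callable keeps the polling logic independent of the SDK, so it can be exercised without network access.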

Query objects that meet specified conditions

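The following sample code shows how to use the scalar search feature to query objects that meet specified conditions and sort the results by a specified field and order. In this example, the query matches objects larger than 1,048,576 bytes (1 MiB), sorts them by Size in descending order, and requests the sum and the maximum of Size as aggregations.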

import argparse
import alibabacloud_oss_v2 as oss

# Create a command-line parameter parser to accept user input parameters.
parser = argparse.ArgumentParser(description="do meta query sample")
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)
parser.add_argument('--bucket', help='The name of the bucket.', required=True)
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')

def main():
    # Parse command-line arguments.
    args = parser.parse_args()

    # Load authentication information from environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Use the default configurations of the SDK.
    cfg = oss.config.load_default()
    # Set the authentication provider.
    cfg.credentials_provider = credentials_provider
    # Set the region based on the input parameters.
    cfg.region = args.region
    # If an endpoint is provided, update the endpoint in the configuration.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Create an OSS client.
    client = oss.Client(cfg)

    # Execute the meta query operation.
    result = client.do_meta_query(oss.DoMetaQueryRequest(
            bucket=args.bucket,  # Specify the bucket to query.
            meta_query=oss.MetaQuery(  # Define the specific content of the query.
                aggregations=oss.MetaQueryAggregations(  # Define the aggregate operation.
                    aggregations=[  # Aggregation list.
                        oss.MetaQueryAggregation(  # First aggregation: calculate the total size.
                            field='Size',
                            operation='sum',
                        ),
                        oss.MetaQueryAggregation(  # Second aggregation: find the maximum value.
                            field='Size',
                            operation='max',
                        )
                    ],
                ),
                next_token='',  # Pagination token.
                max_results=100,  # Maximum number of results to return. The DoMetaQuery API accepts values from 0 to 100.
                query='{"Field": "Size","Value": "1048576","Operation": "gt"}',  # Query condition.
                sort='Size',  # Sort field.
                order=oss.MetaQueryOrderType.DESC,  # Sort order.
            ),
    ))

    # Output the basic information of the query result.
    print(f'status code: {result.status_code},'
          f' request id: {result.request_id},'
          # The following commented out sections can be enabled as needed to obtain more detailed information.
          # f' files: {result.files},'
          # f' file: {result.files.file},'
          # f' file modified time: {result.files.file.file_modified_time},'
          # f' etag: {result.files.file.etag},'
          # f' server side encryption: {result.files.file.server_side_encryption},'
          # f' oss tagging count: {result.files.file.oss_tagging_count},'
          # f' oss tagging: {result.files.file.oss_tagging},'
          # f' key: {result.files.file.oss_tagging.taggings[0].key},'
          # f' value: {result.files.file.oss_tagging.taggings[0].value},'
          # f' key: {result.files.file.oss_tagging.taggings[1].key},'
          # f' value: {result.files.file.oss_tagging.taggings[1].value},'
          # f' oss user meta: {result.files.file.oss_user_meta},'
          # f' key: {result.files.file.oss_user_meta.user_metas[0].key},'
          # f' value: {result.files.file.oss_user_meta.user_metas[0].value},'
          # f' key: {result.files.file.oss_user_meta.user_metas[1].key},'
          # f' value: {result.files.file.oss_user_meta.user_metas[1].value},'
          # f' filename: {result.files.file.filename},'
          # f' size: {result.files.file.size},'
          # f' oss object type: {result.files.file.oss_object_type},'
          # f' oss storage class: {result.files.file.oss_storage_class},'
          # f' object acl: {result.files.file.object_acl},'
          # f' oss crc64: {result.files.file.oss_crc64},'
          # f' server side encryption customer algorithm: {result.files.file.server_side_encryption_customer_algorithm},'
          # f' aggregations: {result.aggregations},'
          f' field: {result.aggregations.aggregations[0].field},'
          f' operation: {result.aggregations.aggregations[0].operation},'
          f' field: {result.aggregations.aggregations[1].field},'
          f' operation: {result.aggregations.aggregations[1].operation},'
          f' next token: {result.next_token},'
    )

    # If file information exists, print the tags and user-defined metadata.
    if result.files:
        if result.files.file.oss_tagging.taggings:
            for r in result.files.file.oss_tagging.taggings:
                print(f'result: key: {r.key}, value: {r.value}')
        if result.files.file.oss_user_meta.user_metas:
            for r in result.files.file.oss_user_meta.user_metas:
                print(f'result: key: {r.key}, value: {r.value}')
    # Print the results of all aggregations.
    if result.aggregations.aggregations:
        for r in result.aggregations.aggregations:
            print(f'result: field: {r.field}, operation: {r.operation}')

if __name__ == "__main__":
    main()
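
The query parameter in the sample above is a JSON string, which is easy to mistype when written by hand. The following sketch builds the same string with json.dumps. The helper names condition, combine, and to_query are hypothetical, and the nested form with SubQueries and an "and" operation is an assumption about the query syntax that you should verify against the DoMetaQuery API reference.

```python
import json
from typing import Any, Dict, List


def condition(field: str, value: str, operation: str) -> Dict[str, Any]:
    """Build a single condition such as Size gt 1048576."""
    return {"Field": field, "Value": value, "Operation": operation}


def combine(operation: str, sub_queries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine conditions with a logical operation (assumed syntax)."""
    return {"SubQueries": sub_queries, "Operation": operation}


def to_query(cond: Dict[str, Any]) -> str:
    """Serialize a condition to the compact JSON string passed as query."""
    return json.dumps(cond, separators=(",", ":"))


if __name__ == "__main__":
    # Same condition as in the sample above: objects larger than 1 MiB.
    print(to_query(condition("Size", "1048576", "gt")))
    # Assumed nested form: larger than 1 MiB and smaller than 1 GiB.
    print(to_query(combine("and", [
        condition("Size", "1048576", "gt"),
        condition("Size", "1073741824", "lt"),
    ])))
```

The resulting string can be passed as the query argument of oss.MetaQuery in place of the hand-written literal.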

Disable the metadata management feature

The following sample code shows how to disable the metadata management feature for a specified bucket.

import argparse
import alibabacloud_oss_v2 as oss

# Create an ArgumentParser object to process command-line arguments.
parser = argparse.ArgumentParser(description="close meta query sample")
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)
parser.add_argument('--bucket', help='The name of the bucket.', required=True)
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')

def main():
    # Parse command-line arguments.
    args = parser.parse_args()

    # Load credential information from environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Use the default configurations of the SDK.
    cfg = oss.config.load_default()
    # Set the credential provider to the one obtained from environment variables.
    cfg.credentials_provider = credentials_provider
    # Set the region information in the configuration.
    cfg.region = args.region
    # If an endpoint is provided, set the endpoint in the configuration.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Create an OSS client.
    client = oss.Client(cfg)

    # Call the close_meta_query method to disable the meta query feature for the specified bucket.
    result = client.close_meta_query(oss.CloseMetaQueryRequest(
            bucket=args.bucket,
    ))

    # Print the status code and request ID of the response.
    print(f'status code: {result.status_code}, request id: {result.request_id}')

# Call the main function when this script is run directly.
if __name__ == "__main__":
    main()

References

For the complete sample code for enabling the metadata management feature, see open_meta_query.py.
For the complete sample code for retrieving metadata index library information, see get_meta_query_status.py.
For the complete sample code for querying objects that meet specified conditions and listing object information by a specified field and sorting method, see do_meta_query.py.
For the complete sample code for disabling the metadata management feature, see close_meta_query.py.