With OSS AISearch, you can quickly locate objects among a massive number of objects. You can search by semantic content, OSS metadata, multimedia metadata, object ETags, tags, and custom metadata, which improves retrieval efficiency. This topic describes how to use OSS SDK for Python V2 to perform vector (semantic) retrieval with AISearch.
Notes
The sample code in this topic uses the China (Hangzhou) region (cn-hangzhou) as an example. By default, the public endpoint is used. To access OSS from other Alibaba Cloud products in the same region, use the internal endpoint, which you can pass to the samples with the --endpoint argument. For more information about the mappings between OSS-supported regions and endpoints, see OSS regions and endpoints.
The examples in this topic read access credentials from the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables. For more information about how to configure access credentials, see Configure access credentials.
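Before you run the samples, it can help to check that the credential variables are set and to know what value to pass as --endpoint. The following is a minimal sketch that relies on two assumptions. First, it assumes the common OSS host name pattern oss-<region>.aliyuncs.com for public endpoints and oss-<region>-internal.aliyuncs.com for internal endpoints; confirm the exact value in OSS regions and endpoints. Second, the helper names oss_endpoint and missing_credentials are hypothetical and are not part of the SDK.

```python
import os
from typing import List, Mapping


def oss_endpoint(region: str, internal: bool = False) -> str:
    """Build an OSS endpoint host name from a region ID.

    Assumes the oss-<region>[-internal].aliyuncs.com naming pattern.
    """
    suffix = "-internal" if internal else ""
    return f"oss-{region}{suffix}.aliyuncs.com"


# Environment variables the samples in this topic read credentials from.
REQUIRED_ENV_VARS = ("OSS_ACCESS_KEY_ID", "OSS_ACCESS_KEY_SECRET")


def missing_credentials(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required credential variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


if __name__ == "__main__":
    print(oss_endpoint("cn-hangzhou"))                 # oss-cn-hangzhou.aliyuncs.com
    print(oss_endpoint("cn-hangzhou", internal=True))  # oss-cn-hangzhou-internal.aliyuncs.com
    missing = missing_credentials()
    if missing:
        print("Set these environment variables first:", ", ".join(missing))
```

For example, the internal host name could be passed as --endpoint oss-cn-hangzhou-internal.aliyuncs.com when the script runs on an Alibaba Cloud instance in the same region.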
Sample code
Enable the AISearch feature
The following code shows how to enable the AISearch feature for a specified bucket.
import argparse
import alibabacloud_oss_v2 as oss
# Create a command-line argument parser and add a description.
parser = argparse.ArgumentParser(description="open meta query sample")
# Add the required command-line argument --region to specify the region where the bucket is located.
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)
# Add the required command-line argument --bucket to specify the name of the bucket to operate on.
parser.add_argument('--bucket', help='The name of the bucket.', required=True)
# Add the optional command-line argument --endpoint to specify the domain name used to access OSS.
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')
def main():
    # Parse command-line arguments.
    args = parser.parse_args()
    # Load authentication information from environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()
    # Use the default configurations provided by the SDK.
    cfg = oss.config.load_default()
    # Set the authentication information provider.
    cfg.credentials_provider = credentials_provider
    # Set the region based on command-line arguments.
    cfg.region = args.region
    # If an endpoint is provided, update the endpoint in the configuration.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint
    # Create an OSS client.
    client = oss.Client(cfg)
    # Build an OpenMetaQueryRequest to enable the AISearch feature for the bucket.
    result = client.open_meta_query(oss.OpenMetaQueryRequest(
        bucket=args.bucket,
        mode='semantic',  # Set to "semantic" to select AISearch.
    ))
    # Print the status code and request ID of the request.
    print(f'status code: {result.status_code},'
          f' request id: {result.request_id}')
# Call the main function when running as the main program.
if __name__ == "__main__":
    main()
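The sample above enables AISearch by setting mode='semantic' on the OpenMetaQueryRequest. The following is a minimal sketch that validates the mode before the request is built. It assumes two accepted values: "basic" for scalar retrieval and "semantic" for AISearch. Check the OpenMetaQuery API reference for the authoritative list. The helper open_meta_query_kwargs is hypothetical and is not an SDK function.

```python
from typing import Dict

# Assumed set of index modes; verify against the OpenMetaQuery API reference.
SUPPORTED_MODES = ("basic", "semantic")


def open_meta_query_kwargs(bucket: str, mode: str = "semantic") -> Dict[str, str]:
    """Return keyword arguments for an OpenMetaQueryRequest after basic validation."""
    if not bucket:
        raise ValueError("bucket must be a non-empty string")
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mode {mode!r}; expected one of {SUPPORTED_MODES}")
    return {"bucket": bucket, "mode": mode}


if __name__ == "__main__":
    print(open_meta_query_kwargs("examplebucket"))  # {'bucket': 'examplebucket', 'mode': 'semantic'}
```

In the sample, the request could then be built as oss.OpenMetaQueryRequest(**open_meta_query_kwargs(args.bucket)). A typo such as mode='semantics' would then fail locally instead of in a round trip to OSS.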
Obtain metadata index information
The following code shows how to obtain the metadata index information for a specified bucket.
import argparse
import alibabacloud_oss_v2 as oss
# Create a command-line argument parser and add the necessary arguments.
parser = argparse.ArgumentParser(description="get meta query status sample")
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)
parser.add_argument('--bucket', help='The name of the bucket.', required=True)
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')
def main():
    # Parse command-line arguments.
    args = parser.parse_args()
    # Load credential information from environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()
    # Use the default configurations of the SDK.
    cfg = oss.config.load_default()
    # Set the credential provider to the credentials obtained from environment variables.
    cfg.credentials_provider = credentials_provider
    # Set the region where the OSS service is located.
    cfg.region = args.region
    # If an endpoint is provided, set a custom endpoint.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint
    # Create an OSS client.
    client = oss.Client(cfg)
    # Call the get_meta_query_status method to obtain the metadata index information of the specified bucket.
    result = client.get_meta_query_status(oss.GetMetaQueryStatusRequest(
        bucket=args.bucket,
    ))
    # Print the relevant information in the returned result.
    print(f'status code: {result.status_code},'
          f' request id: {result.request_id},'
          f' create time: {result.meta_query_status.create_time},'
          f' update time: {result.meta_query_status.update_time},'
          f' state: {result.meta_query_status.state},'
          f' phase: {result.meta_query_status.phase}')
# Call the main function when the script is run directly.
if __name__ == "__main__":
    main()
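Building the index after AISearch is enabled takes time, so a script may need to poll the state and phase values shown above before querying. The following is a minimal polling sketch. It assumes the state values described for GetMetaQueryStatus: "Running" when the index is usable, and "Failed" or "Deleted" when polling should stop. Verify these values against the API reference. If you need the full scan to finish, you could also wait for the phase to change from "FullScanning" to "IncrementalScanning". The fetch_status callable and the wait_for_index helper are hypothetical.

```python
import time
from typing import Callable, Tuple

Status = Tuple[str, str]  # (state, phase)

# Assumed state values; verify against the GetMetaQueryStatus API reference.
READY_STATE = "Running"
FAILURE_STATES = frozenset({"Failed", "Deleted"})


def wait_for_index(fetch_status: Callable[[], Status],
                   timeout: float = 600.0,
                   interval: float = 10.0,
                   sleep: Callable[[float], None] = time.sleep) -> Status:
    """Poll fetch_status() until the metadata index reaches the ready state.

    Raises RuntimeError on a failure state and TimeoutError after `timeout` seconds.
    """
    waited = 0.0
    while True:
        state, phase = fetch_status()
        if state == READY_STATE:
            return state, phase
        if state in FAILURE_STATES:
            raise RuntimeError(f"metadata index entered state {state!r} (phase {phase!r})")
        if waited >= timeout:
            raise TimeoutError(f"metadata index still in state {state!r} after {waited:.0f} s")
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    # Simulated responses standing in for real get_meta_query_status calls.
    responses = iter([("Ready", "FullScanning"), ("Running", "FullScanning")])
    print(wait_for_index(lambda: next(responses), interval=0, sleep=lambda s: None))
    # ('Running', 'FullScanning')
```

With the sample above, fetch_status could be a small function that calls client.get_meta_query_status(oss.GetMetaQueryStatusRequest(bucket=args.bucket)) and returns (result.meta_query_status.state, result.meta_query_status.phase). Passing sleep as a parameter keeps the loop testable without real delays.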
Query objects that meet specified conditions
The following code shows how to use the AISearch feature to query image objects that match a natural-language description and whose Size value is greater than 30.
import argparse
import alibabacloud_oss_v2 as oss
# Create a command-line argument parser to process command-line input.
parser = argparse.ArgumentParser(description="do meta query semantic sample")
# Add the command-line arguments.
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)  # The region where the bucket is located.
parser.add_argument('--bucket', help='The name of the bucket.', required=True)  # The name of the bucket.
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')  # The OSS endpoint, which is optional.
def main():
    # Parse command-line arguments.
    args = parser.parse_args()
    # Load access credentials from environment variables.
    # Before running, set the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()
    # Load the default SDK configurations.
    cfg = oss.config.load_default()
    # Set the credential provider.
    cfg.credentials_provider = credentials_provider
    # Set the region.
    cfg.region = args.region
    # If an endpoint is provided, update the endpoint in the configuration.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint
    # Create an OSS client instance.
    client = oss.Client(cfg)
    # Initiate a metadata query request in AISearch (semantic) mode.
    result = client.do_meta_query(oss.DoMetaQueryRequest(
        bucket=args.bucket,
        mode='semantic',
        meta_query=oss.MetaQuery(
            max_results=1000,  # Maximum number of objects to return.
            query='An aerial view of a snow-covered forest',  # Natural-language description to match.
            order='desc',  # Sort order of the results.
            media_types=oss.MetaQueryMediaTypes(
                media_type=['image']  # Search image objects only.
            ),
            simple_query='{"Operation":"gt", "Field": "Size", "Value": "30"}',  # Additional filter: Size greater than 30.
        ),
    ))
    # Print the retrieval results.
    print(vars(result))
if __name__ == "__main__":
    main()
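The simple_query argument in the sample above is a JSON string written by hand. Generating it with json.dumps avoids quoting mistakes when filters grow. The following is a minimal sketch. The Field, Operation, and Value keys mirror the sample. The "and" operation with a SubQueries list, the Filename field, and the prefix operation are assumptions based on the DoMetaQuery scalar query syntax, so verify them, and whether semantic mode accepts them, against the API reference. The helpers condition, all_of, and to_simple_query are hypothetical.

```python
import json
from typing import Any, Dict


def condition(field: str, operation: str, value: Any) -> Dict[str, str]:
    """A single comparison such as Size greater than 30; values are sent as strings."""
    return {"Field": field, "Operation": operation, "Value": str(value)}


def all_of(*subqueries: Dict[str, Any]) -> Dict[str, Any]:
    """Combine conditions with a logical AND (assumed SubQueries syntax)."""
    return {"Operation": "and", "SubQueries": list(subqueries)}


def to_simple_query(query: Dict[str, Any]) -> str:
    """Serialize a query dict to the compact JSON string passed as simple_query."""
    return json.dumps(query, separators=(",", ":"))


if __name__ == "__main__":
    # Same filter as the sample: Size greater than 30.
    print(to_simple_query(condition("Size", "gt", 30)))
    # {"Field":"Size","Operation":"gt","Value":"30"}

    # Hypothetical combined filter: Size > 30 AND name starting with "photos/".
    print(to_simple_query(all_of(condition("Size", "gt", 30),
                                 condition("Filename", "prefix", "photos/"))))
```

In the sample, the filter could then be written as simple_query=to_simple_query(condition("Size", "gt", 30)).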
Disable the AISearch feature
The following code shows how to disable the AISearch feature for a specified bucket.
import argparse
import alibabacloud_oss_v2 as oss
# Create a command-line argument parser to process command-line arguments.
parser = argparse.ArgumentParser(description="close meta query sample")
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)
parser.add_argument('--bucket', help='The name of the bucket.', required=True)
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')
def main():
    # Parse command-line arguments.
    args = parser.parse_args()
    # Load credential information from environment variables.
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()
    # Use the default configurations of the SDK.
    cfg = oss.config.load_default()
    # Set the credential provider to the credentials obtained from environment variables.
    cfg.credentials_provider = credentials_provider
    # Set the region information in the configuration.
    cfg.region = args.region
    # If an endpoint is provided, set the endpoint in the configuration.
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint
    # Create an OSS client.
    client = oss.Client(cfg)
    # Call the close_meta_query method to disable the retrieval feature for the bucket.
    result = client.close_meta_query(oss.CloseMetaQueryRequest(
        bucket=args.bucket,
    ))
    # Print the status code and request ID of the response.
    print(f'status code: {result.status_code}, request id: {result.request_id}')
# Execute the main function when this script is run directly.
if __name__ == "__main__":
    main()
References
For more information about AISearch, see AISearch.
For more information about the API operations for data indexing, see Data Indexing.
For the complete sample code for AISearch, see the GitHub example.