This topic describes how to build an image search service by using OpenSearch Vector Search Edition if no vector data is available.
To implement image search capabilities such as searching by image or by text, you can import the image source data into OpenSearch, which then performs operations such as image vectorization and vector search.
Architecture

You can use one of the following three service portfolios to upload images and build an image search engine:
OSS + MaxCompute + OpenSearch Vector Search Edition: You can upload images to an Object Storage Service (OSS) bucket and store the business table data and the image URL corresponding to each data entry in MaxCompute. The image URL refers to the URL of each image in the OSS bucket. Example: /image/1.jpg.
MaxCompute + OpenSearch Vector Search Edition: You can store Base64-encoded images and the corresponding table data in MaxCompute.
API + OpenSearch: You can call an operation of OpenSearch Vector Search Edition to push Base64-encoded images and the corresponding table data to an OpenSearch Vector Search Edition instance.
In this example, the first service portfolio is used to build an image search engine.
Preparations
1. Create an AccessKey pair
When you create an Alibaba Cloud account and log on to the console for the first time, the system prompts you to create an AccessKey pair before you perform subsequent operations.
You must specify an AccessKey pair for your Alibaba Cloud account because the AccessKey pair is required when you create and use an OpenSearch application.
After you create an AccessKey pair for your Alibaba Cloud account, you can create an AccessKey pair for a Resource Access Management (RAM) user. This way, you can access the OpenSearch application as the RAM user. For more information about how to grant permissions to RAM users, see Create and authorize RAM users.
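When you later supply the AccessKey pair to data source or SDK configurations, avoid hardcoding it in source files. A minimal sketch that reads the pair from environment variables (the variable names are a common convention, not mandated by OpenSearch):

```python
import os


def load_access_key_pair():
    """Read the AccessKey pair from environment variables so that
    credentials are never hardcoded in source files.

    The variable names below are a widely used convention for Alibaba
    Cloud SDKs; adjust them to match your own deployment."""
    access_key_id = os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"]
    access_key_secret = os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"]
    return access_key_id, access_key_secret
```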
2. Create an OSS bucket

In this example, 1,000 images are uploaded to the OSS bucket.

The following figure shows some of the uploaded images.

Purchase an OpenSearch Vector Search Edition instance
Log on to the OpenSearch console. In the upper-left corner, switch to OpenSearch Vector Search Edition.

In the left-side navigation pane, click Instance Management. On the Instance Management page, click Create Instance.

On the buy page, select Vector Search Edition as Service Edition and configure the following parameters for the instance: Region and Zone, Query Node Quantity, Query Node Type, Data Node Quantity, Data Node Type, Total Storage Space of Single Searcher Worker, VPC, vSwitch, Username, and Password. Then, click Buy Now. The username and password are used for permission verification in queries. We recommend that you do not use the username and password of your Alibaba Cloud account.

Specify the numbers and specifications of Query Result Searcher (QRS) workers and Searcher workers that you want to purchase based on your business requirements. After you specify the specifications, the actual fee is automatically displayed on the buy page.
You must specify the same VPC and vSwitch as those of the Elastic Compute Service (ECS) instance that you use to access the OpenSearch Vector Search Edition instance. Otherwise, the error {'errors':{'code':'403','message':'Forbidden'}} is returned when you access the OpenSearch Vector Search Edition instance.
A free quota of storage space is provided for each Searcher worker. You can purchase more storage in increments of 50 GB. If the total storage space exceeds the free quota, you are charged for the excess storage space.
On the Confirm Order page, check the configurations and agree to the service agreement, and then click Activate Now.

After you purchase the instance, click Console. On the Instances page, you can view the purchased instance.

By default, the name of the instance is automatically set. To modify the name of the instance, click Manage in the Actions column to go to the details page of the instance.

In the Basic Information section of the Instance Details page, click the Modify icon next to the instance name. In the dialog box that appears, modify the instance name as prompted, and click Confirm.

Configure the cluster
On the details page of the purchased instance, you can view that the instance is in the Pending Configuration state and that an empty cluster is automatically deployed for the instance. The numbers and specifications of QRS workers and Searcher workers in the cluster are those that you specified when you purchased the instance. You must configure a data source, configure an index schema, and rebuild indexes for the cluster before you can use the search service.

1. Configure a data source
Add a data source. MaxCompute and API data sources are supported. In this example, a MaxCompute data source is used. To configure a MaxCompute data source for the cluster, perform the following operations: In the Configure Data Source step, click Add Data Source. In the Add Data Source panel, set Data Source Type to MaxCompute. Configure the Project, AccessKey ID, AccessKey Secret, Table, and Partition Key parameters. Select Yes or No for Automatic Reindexing based on your business requirements.

After the verification is passed, click OK to add the data source.

2. Configure an index schema
After the data source is configured, click Next to configure the index schema.

Click Create Index Table.

Configure an index table. In the Select Template section, select Vector: Image Search and set the Data parameter to Need to Convert Raw Data to Vector Data.

Configure fields. The following sections describe how to configure fields if you use images stored in an OSS bucket or Base64-encoded images.
Images stored in an OSS bucket
If you select the Vector: Image Search template, OpenSearch automatically generates the following preset fields: id, cate_id, vector, and vector_source_image. The id field specifies the primary key. The cate_id field specifies the category ID. The vector field specifies the vector. The vector_source_image field specifies the path of the stored image. After you configure a MaxCompute data source, the fields that are synchronized from the data source are displayed below the preset fields.
4.1 Configure the vector_source_image field. The field must be of the STRING type.
You can modify the name of the preset field based on the corresponding business table field. Make sure that the advanced settings of the field are correct.

You can configure the following parameters:

Data Type: Set the value to image(path).
Source Content Type: Set the value to oss.
OSS Bucket: Specify the OSS bucket that the current account can access and where the images to be vectorized are stored.
For example, the OSS path of an image is /Test image/image/10031.png. The vector_source_image field must be set to /Test image/image/10031.png.
The following figure shows the OSS path.

The following figure shows the field value in MaxCompute.
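For illustration, the leading-slash path can be derived from an OSS object key with a small helper. The function and its normalization rule are illustrative assumptions; OpenSearch only requires that the stored value matches the image's path in the bucket:

```python
def to_vector_source_image(object_key: str) -> str:
    """Convert an OSS object key such as 'Test image/image/10031.png'
    into the leading-slash path stored in the vector_source_image
    field, for example '/Test image/image/10031.png'.

    This helper is an illustrative sketch, not part of OpenSearch."""
    return object_key if object_key.startswith("/") else "/" + object_key
```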

4.2 Configure the vector field. The field must be of the FLOAT type.
After the vector_source_image field is configured, the engine can access the corresponding image in OSS. If you want to convert raw data to vector data, you must configure the vector field. You can modify the name of the preset field based on the corresponding business table field. Make sure that the advanced settings of the field are correct.

You can configure the following parameters:
{
"vector_model": "clip",
"vector_modal": "image",
"vector_source_field": "vector_source_image"
}
vector_model: the vector model. Valid values:
clip: a model that converts general images to vectors.
clip_ecom: a model that converts e-commerce images to vectors.
vector_modal: Set the value to image.
vector_source_field: the OSS path of the image. The default value of this parameter is the value of the vector_source_image field.
4.3 Schema example:
"fields": [
{
"field_name": "id",
"field_type": "INT64",
"compress_type": "equal"
},
{
"user_defined_param": {
"content_type": "oss",
"oss_endpoint": "",
"oss_bucket": "The name of the OSS bucket",
"oss_secret": "The AccessKey secret of the account that you use to access OSS",
"oss_access_key": "The AccessKey ID of the account that you use to access OSS"
},
"field_name": "vector_source_image",
"field_type": "STRING",
"compress_type": "uniq"
},
{
"field_name": "cate_id",
"field_type": "INT64",
"compress_type": "equal"
},
{
"user_defined_param": {
"vector_model": "clip",
"vector_modal": "image",
"vector_source_field": "vector_source_image"
},
"field_name": "vector",
"field_type": "FLOAT",
"multi_value": true
}
]
Base64-encoded images
If you select the Vector: Image Search template, OpenSearch automatically generates the following preset fields: id, cate_id, vector, and vector_source_image. The id field specifies the primary key. The cate_id field specifies the category ID. The vector field specifies the vector. The vector_source_image field stores the Base64-encoded image. After you configure a MaxCompute data source, the fields that are synchronized from the data source are displayed below the preset fields.
4.1 Configure the vector_source_image field. The field must be of the STRING type.
You can modify the name of the preset field based on the corresponding business table field.

Note: You need to manually delete the advanced settings of the vector_source_image field and retain only the braces ({}).
4.2 Configure the vector field.
After the vector_source_image field is configured, the engine can read the corresponding Base64-encoded image. If you want to convert raw data to vector data, you must configure the vector field. You can modify the name of the preset field based on the corresponding business table field. Make sure that the advanced settings of the field are correct.

You can configure the following parameters:
{
"vector_model": "clip",
"vector_modal": "image",
"vector_source_field": "vector_source_image"
}
vector_model: Set the value to clip.
vector_modal: Set the value to image.
vector_source_field: the field that stores the Base64-encoded image. The default value of this parameter is the value of the vector_source_image field.
4.3 Schema example:
"fields": [
{
"field_name": "id",
"field_type": "INT64",
"compress_type": "equal"
},
{
"field_name": "vector_source_image",
"field_type": "STRING",
"compress_type": "uniq"
},
{
"field_name": "cate_id",
"field_type": "INT64",
"compress_type": "equal"
},
{
"user_defined_param": {
"vector_model": "clip",
"vector_modal": "image",
"vector_source_field": "vector_source_image"
},
"field_name": "vector",
"field_type": "FLOAT",
"multi_value": true
}
]
If you select the Vector: Image Search template, OpenSearch automatically generates the following indexes: the primary key index named id and the vector index named vector. After the advanced settings of the vector_source_image and vector fields are configured, the advanced settings of the vector index are automatically generated. The vector index must be of the CUSTOMIZED type.

You must modify the fields of the vector index based on the field configurations.

The default dimension of the vector that is generated from the image is 512 and cannot be modified.
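The vector index in this template measures similarity with the SquaredEuclidean distance type. The engine computes this internally, often over quantized vectors; the helper below is only a sketch of the metric for intuition:

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance: the sum of squared per-dimension
    differences, with no final square root. Smaller values mean more
    similar vectors. In production the 512-dimensional CLIP embeddings
    are compared by the engine itself; this function is illustrative."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


print(squared_euclidean([1.0, 2.0, 2.0], [0.0, 0.0, 0.0]))  # 9.0
```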
Schema example:
"indexs": [
{
"index_name": "id",
"index_type": "PRIMARYKEY64",
"index_fields": "id",
"has_primary_key_attribute": true,
"is_primary_key_sorted": false
},
{
"index_name": "vector",
"index_type": "CUSTOMIZED",
"index_fields": [
{
"field_name": "id",
"boost": 1
},
{
"field_name": "vector",
"boost": 1
}
],
"parameters": {
"dimension": "512",
"distance_type": "SquaredEuclidean",
"vector_index_type": "Qc",
"build_index_params": "{\"proxima.qc.builder.quantizer_class\":\"Int8QuantizerConverter\",\"proxima.qc.builder.quantize_by_centroid\":true,\"proxima.qc.builder.optimizer_class\":\"BruteForceBuilder\",\"proxima.qc.builder.thread_count\":10,\"proxima.qc.builder.optimizer_params\":{\"proxima.linear.builder.column_major_order\":true},\"proxima.qc.builder.store_original_features\":false,\"proxima.qc.builder.train_sample_count\":3000000,\"proxima.qc.builder.train_sample_ratio\":0.5}",
"search_index_params": "{\"proxima.qc.searcher.scan_ratio\":0.01}",
"embedding_delimiter": ",",
"major_order": "col",
"linear_build_threshold": "5000",
"min_scan_doc_cnt": "20000",
"enable_recall_report": "false",
"is_embedding_saved": "false",
"enable_rt_build": "false",
"builder_name": "QcBuilder",
"searcher_name": "QcSearcher"
},
"indexer": "aitheta2_indexer"
}
]
3. Rebuild an index
After the configuration is complete, click Save Version. In the dialog box that appears, enter the description and click Publish. The description is optional.

After the index is published, click Next to rebuild the index.

Rebuild the index. Configure the parameters based on your index rebuilding requirements and click Next.
The following figures show the index rebuilding configurations for an API data source and for a MaxCompute data source.
In the left-side pane, choose O&M Center > Change History. On the page that appears, click the Data Source Changes tab. On the Data Source Changes tab, you can view the progress of reindexing. After the reindexing is complete, you can perform a query test.

Perform a test
Syntax
query=image_index:'The Base64-encoded text used for image search&modal=text&n=10&search_params={}'
modal: specifies the modal type. To search for images based on specified text, set the value to text. To search for images similar to a specific image, set the value to image.
n: specifies the top n vectors to return.
The text must be encoded in the Base64 format.
Search for an image based on specified text
Perform a query test on the Query Test page in the console.

vector:'5pGp5omY6L2mJuWktOeblA==&modal=text&n=10&search_params={}'&&kvpairs=formula:proxima_score(vector)&&sort=+RANK
The query returns the 2042.png image stored in OSS.
If the search text contains special characters, the text must be encoded in the Base64 format. For example, if the search content is motorcycle&helmet, the following result is returned after Base64 encoding: 5pGp5omY6L2mJuWktOeblA==.
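The encoded value can be reproduced with a few lines of Python; the sample text is the Chinese phrase for motorcycle&helmet:

```python
import base64

# The query text must be Base64-encoded before it is placed in the
# query clause. The sample below encodes the Chinese phrase for
# "motorcycle&helmet".
encoded = base64.b64encode("摩托车&头盔".encode("utf-8")).decode("ascii")
print(encoded)  # 5pGp5omY6L2mJuWktOeblA==
```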
Search for an image similar to a specific image
The image size is large after Base64 encoding. Therefore, you cannot search for an image similar to a specific image on the Query Test page in the OpenSearch console. You can use an SDK to search for an image similar to a specific image.
The following sample code provides an example on how to search for an image similar to a specific image:
vector:'The Base64-encoded image&modal=image&n=10&search_params={}'&&kvpairs=formula:proxima_score(vector)&&sort=+RANK
Search for an image by using an SDK
Run the following command to add a dependency:
pip install alibabacloud-ha3engine
Run the following sample code to search for the image:
# -*- coding: utf-8 -*-
from alibabacloud_ha3engine import models, client
from alibabacloud_tea_util import models as util_models
from Tea.exceptions import TeaException, RetryError


def search():
    config = models.Config(
        endpoint="The API endpoint. You can view the API endpoint in the API Endpoint section of the Instance Details page.",
        instance_id="",
        protocol="http",
        access_user_name="The username that you specify when you purchase the instance.",
        access_pass_word="The password that you specify when you purchase the instance."
    )
    # If the request takes an extended period of time to complete, you can
    # configure this parameter to increase the amount of wait time for the
    # request. Unit: milliseconds.
    # This parameter can be used in the search_with_options method.
    runtime = util_models.RuntimeOptions(
        connect_timeout=5000,
        read_timeout=10000,
        autoretry=False,
        ignore_ssl=False,
        max_idle_conns=50
    )
    # Initialize the OpenSearch Vector Search Edition client.
    ha3EngineClient = client.Client(config)
    optionsHeaders = {}
    try:
        # Example 1: Use a query string in OpenSearch Vector Search Edition to query data.
        # =====================================================
        query_str = "config=hit:4,format:json,fetch_summary_type:pk,qrs_chain:search&&query=image_index:'The text used for image search&modal=text&n=10&search_params={}'&&cluster=general"
        haSearchQuery = models.SearchQuery(query=query_str)
        haSearchRequestModel = models.SearchRequestModel(optionsHeaders, haSearchQuery)
        hastrSearchResponseModel = ha3EngineClient.search(haSearchRequestModel)
        print(hastrSearchResponseModel)
    except TeaException as e:
        print(f"send request with TeaException : {e}")
    except RetryError as e:
        print(f"send request with Connection Exception : {e}")

For more information about how to search for images by using other SDKs, see Developer guide.
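To adapt the sample for image-to-image search, you can Base64-encode the image bytes and set modal=image in the query clause. A minimal sketch, assuming the same query-string format as the text-search sample (the helper name is illustrative, and the index name, n, and cluster name are placeholders to adjust for your schema):

```python
import base64


def build_image_query(image_bytes: bytes, index_name: str = "vector", n: int = 10) -> str:
    """Build a query string for image-to-image search.

    The image bytes are Base64-encoded and passed with modal=image.
    The clause mirrors the text-search query string in the SDK sample;
    index_name, n, and the cluster name 'general' are assumptions that
    you may need to change for your own schema."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return (
        f"config=hit:{n},format:json,fetch_summary_type:pk,qrs_chain:search"
        f"&&query={index_name}:'{encoded}&modal=image&n={n}&search_params={{}}'"
        f"&&cluster=general"
    )
```

The returned string can be passed to models.SearchQuery(query=...) in place of query_str in the sample above.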
Usage notes
If you want to minimize the amount of time consumed by vector search, we recommend that you lock vector indexes into the memory. For more information, see Lock policy for vector indexes.
The field that specifies the OSS path of an image or the path of a Base64-encoded image must be of the STRING type.
The vector index must be of the CUSTOMIZED type.
The HA syntax and the RESTful API are supported. SQL is not supported.