Solution for image search

Last Updated: Mar 31, 2025

This topic describes how to build an image search service by using OpenSearch Vector Search Edition if no vector data is available.

To support searching for images by image or by text, import the source image data to OpenSearch. OpenSearch then vectorizes the images and performs vector search on them.

Architecture

image_409b3ae0cc76

You can use one of the following three service portfolios to upload images and build an image search engine:

  • OSS + MaxCompute + OpenSearch Vector Search Edition: You can upload images to an Object Storage Service (OSS) bucket and store the business table data and the image URL corresponding to each data entry in MaxCompute. The image URL refers to the URL of each image in the OSS bucket. Example: /image/1.jpg.

  • MaxCompute + OpenSearch Vector Search Edition: You can store Base64-encoded images and the corresponding table data in MaxCompute.

  • API + OpenSearch Vector Search Edition: You can call an API operation of OpenSearch Vector Search Edition to push Base64-encoded images and the corresponding table data to an OpenSearch Vector Search Edition instance.

In this example, the first service portfolio is used to build an image search engine.

Preparations

1. Create an AccessKey pair

When you create an Alibaba Cloud account and log on to the console for the first time, the system prompts you to create an AccessKey pair before you perform subsequent operations.

  • You must specify an AccessKey pair for your Alibaba Cloud account because the AccessKey pair is required when you create and use an OpenSearch application.

  • After you create an AccessKey pair for your Alibaba Cloud account, you can create an AccessKey pair for a Resource Access Management (RAM) user. This way, you can access the OpenSearch application as the RAM user. For more information about how to grant permissions to RAM users, see Create and authorize RAM users.

2. Create an OSS bucket

image_409ce896ccx9

  1. Activate OSS.

  2. Create a bucket.

  3. Upload an object.

In this example, 1,000 images are uploaded to the OSS bucket.

image_409d36b4ccr4

The following figure shows some of the uploaded images.

image_409d84d0cc9c

Purchase an OpenSearch Vector Search Edition instance

  1. Log on to the OpenSearch console. In the upper-left corner, switch to OpenSearch Vector Search Edition.

image_409dfa00cc4q

  2. In the left-side navigation pane, click Instance Management. On the Instance Management page, click Create Instance.

image_409e2114cc9c

  3. On the buy page, select Vector Search Edition as Service Edition and configure the following parameters for the instance: Region and Zone, Query Node Quantity, Query Node Type, Data Node Quantity, Data Node Type, Total Storage Space of Single Searcher Worker, VPC, vSwitch, Username, and Password. Then, click Buy Now. The username and password are used for permission verification in queries. We recommend that you do not reuse the username and password of your Alibaba Cloud account.

image_409e6f32ccq2

Note
  • Specify the numbers and specifications of Query Result Searcher (QRS) workers and Searcher workers that you want to purchase based on your business requirements. After you specify the specifications, the actual fee is automatically displayed on the buy page.

  • You must specify the same VPC and vSwitch as those of the Elastic Compute Service (ECS) instance that you use to access the OpenSearch Vector Search Edition instance. Otherwise, the error {'errors':{'code':'403','message':'Forbidden'}} is returned when you access the OpenSearch Vector Search Edition instance.

  • A free quota of storage space is provided for each Searcher worker. You can purchase more storage in increments of 50 GB. If the total storage space exceeds the free quota, you are charged for the excess storage space.
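The 403 error described in the note above can be recognized on the client side. The following is a minimal sketch, assuming the error body arrives as a string in exactly the single-quoted form shown in the note. The helper name explain_error and the hint text are hypothetical and are not part of any OpenSearch SDK.

```python
import ast
from typing import Optional

# Hypothetical hint text; adapt it to your own runbook.
VPC_HINT = (
    "403 Forbidden: check that the client (for example, an ECS instance) uses "
    "the same VPC and vSwitch as the OpenSearch Vector Search Edition instance."
)


def explain_error(raw: str) -> Optional[str]:
    """Return a hint if raw matches the documented 403 payload, else None."""
    try:
        # The documented payload uses single quotes, so it is parsed as a
        # Python literal rather than as strict JSON.
        payload = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return None
    if not isinstance(payload, dict):
        return None
    errors = payload.get("errors")
    if isinstance(errors, dict) and str(errors.get("code")) == "403":
        return VPC_HINT
    return None


print(explain_error("{'errors':{'code':'403','message':'Forbidden'}}"))
```

Other causes of a 403 response are possible, so treat the hint as a first thing to check rather than a diagnosis.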

  4. On the Confirm Order page, check the configurations and agree to the service agreement, and then click Activate Now.

image_79267af0ccvl

  5. After you purchase the instance, click Console. On the Instances page, you can view the purchased instance.

image_792828a1ccyh

  6. By default, the name of the instance is automatically set. To modify the name of the instance, click Manage in the Actions column to go to the details page of the instance.

image_792876c0ccdd

In the Basic Information section of the Instance Details page, click the Modify icon next to the instance name. In the dialog box that appears, modify the instance name as prompted, and click Confirm.

image_79289dd0ccfc

Configure the cluster

On the details page of the purchased instance, you can view that the instance is in the Pending Configuration state and a cluster that contains no instances is automatically deployed for the instance. The numbers and specifications of QRS workers and Searcher workers in the cluster are those you specify when you purchase the instance. You must configure a data source and an index schema and rebuild indexes for the cluster before you can use the search service.

image_7f9bc520ccuv

1. Configure a data source

  1. Add a data source. Data from a MaxCompute data source or an API data source is supported. In this example, a MaxCompute data source is used. To configure a MaxCompute data source for the cluster, perform the following operations: In the Configure Data Source step, click Add Data Source. In the Add Data Source panel, select MaxCompute as Data Source Type. Configure the Project, AccessKey ID, AccessKey Secret, Table, and Partition Key parameters. Select Yes or No for Automatic Reindexing based on your business requirements.

image_7f9d4bc1cc3v

After the verification is passed, click OK to add the data source.

image_7f9d72d0ccza

2. Configure an index schema

  1. After the data source is configured, click Next to configure the index schema.

image_c1ffecc2ccrr

  2. Click Create Index Table.

image_c20061f1ccgn

  3. Configure an index table. In the Select Template section, select Vector: Image Search and set the Data parameter to Need to Convert Raw Data to Vector Data.

image_c2008903ccfv

  4. Configure fields. The following sections describe how to configure fields if you use images stored in an OSS bucket or Base64-encoded images.

Images stored in an OSS bucket

If you select the Vector: Image Search template, OpenSearch automatically generates the following preset fields: id, cate_id, vector, and vector_source_image. The id field specifies the primary key. The cate_id field specifies the category ID. The vector field specifies the vector. The vector_source_image field specifies the path of the stored image. After you configure a MaxCompute data source, the fields that are synchronized from the data source are displayed below the preset fields.

4.1 Configure the vector_source_image field. The field must be of the STRING type.

You can modify the name of the preset field based on the corresponding business table field. Make sure that the advanced settings of the field are correct.

image.png

You can configure the following parameters:

image.png

  • Data Type: Set the value to image(path).

  • Source Content Type: Set the value to oss.

  • OSS Bucket: Specify the OSS bucket that the current account can access and where the images to be vectorized are stored.

Important

For example, the OSS path of an image is /Test image/image/10031.png. The vector_source_image field must be set to /Test image/image/10031.png.

The following figure shows the OSS path.

image_d3884c30ccrm

The following figure shows the field value in MaxCompute.

image_d389abc0ccqd
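The mapping from an OSS object to the vector_source_image value can be sketched as follows. This is a minimal example, assuming that you start from object keys without a leading slash and that the table uses the preset field names id, cate_id, and vector_source_image. The helper names, the sequential IDs, and the placeholder category ID are hypothetical.

```python
from typing import Dict, Iterable, List


def to_field_value(object_key: str) -> str:
    """Return the path form used in the example above: a single leading slash."""
    return "/" + object_key.lstrip("/")


def build_rows(object_keys: Iterable[str], cate_id: int = 0) -> List[Dict[str, object]]:
    """Build one table row per image. IDs are assigned sequentially for illustration."""
    rows = []
    for doc_id, key in enumerate(object_keys, start=1):
        rows.append(
            {
                "id": doc_id,
                "cate_id": cate_id,  # placeholder category ID
                "vector_source_image": to_field_value(key),
            }
        )
    return rows


rows = build_rows(["Test image/image/10031.png"])
print(rows[0]["vector_source_image"])
```

The resulting rows would still need to be written to the MaxCompute table with the tool of your choice.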

4.2 Configure the vector field. The field must be of the FLOAT type.

After the vector_source_image field is configured, the engine can access the corresponding image in OSS. If you want to convert raw data to vector data, you must configure the vector field. You can modify the name of the preset field based on the corresponding business table field. Make sure that the advanced settings of the field are correct.

image

You can configure the following parameters:

{
  "vector_model": "clip",
  "vector_modal": "image",
  "vector_source_field": "vector_source_image"
}
  • vector_model: the vector model. Valid values:

    • clip: a model that converts a general image to a vector.

    • clip_ecom: a model that converts an e-commerce image to a vector.

  • vector_modal: Set the value to image.

  • vector_source_field: the field that contains the OSS path of the image. Default value: vector_source_image.

4.3 Schema example:

"fields": [
    {
      "field_name": "id",
      "field_type": "INT64",
      "compress_type": "equal"
    },
    {
      "user_defined_param": {
        "content_type": "oss",
        "oss_endpoint": "",
        "oss_bucket": "The name of the OSS bucket",
        "oss_secret": "The AccessKey secret of the account that you use to access OSS",
        "oss_access_key": "The AccessKey ID of the account that you use to access OSS"
      },
      "field_name": "vector_source_image",
      "field_type": "STRING",
      "compress_type": "uniq"
    },
    {
      "field_name": "cate_id",
      "field_type": "INT64",
      "compress_type": "equal"
    },
    {
      "user_defined_param": {
        "vector_model": "clip",
        "vector_modal": "image",
        "vector_source_field": "vector_source_image"
      },
      "field_name": "vector",
      "field_type": "FLOAT",
      "multi_value": true
    }
  ]

Base64-encoded images

If you select the Vector: Image Search template, OpenSearch automatically generates the following preset fields: id, cate_id, vector, and vector_source_image. The id field specifies the primary key. The cate_id field specifies the category ID. The vector field specifies the vector. The vector_source_image field specifies the path of the Base64-encoded image. After you configure a MaxCompute data source, the fields that are synchronized from the data source are displayed below the preset fields.
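Preparing a Base64-encoded image for the vector_source_image field can be sketched as follows. This is a minimal example, assuming that the field stores the standard Base64 encoding of the raw image bytes. The helper names and the row layout are hypothetical.

```python
import base64
from pathlib import Path


def encode_image_bytes(data: bytes) -> str:
    """Encode raw image bytes as a standard Base64 ASCII string."""
    return base64.b64encode(data).decode("ascii")


def build_row(doc_id: int, cate_id: int, image_path: str) -> dict:
    """Read an image file and build a row that uses the preset field names."""
    data = Path(image_path).read_bytes()
    return {
        "id": doc_id,
        "cate_id": cate_id,
        "vector_source_image": encode_image_bytes(data),
    }


# The first four bytes of the PNG signature, used here as stand-in image data.
print(encode_image_bytes(b"\x89PNG"))
```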

4.1 Configure the vector_source_image field. The field must be of the STRING type.

You can modify the name of the preset field based on the corresponding business table field.

image

Note: You need to manually delete the advanced settings of the vector_source_image field and retain only the braces ({}).

4.2 Configure the vector field. The field must be of the FLOAT type.

After the vector_source_image field is configured, the engine can read the Base64-encoded image from that field. If you want to convert raw data to vector data, you must configure the vector field. You can modify the name of the preset field based on the corresponding business table field. Make sure that the advanced settings of the field are correct.

image

You can configure the following parameters:

{
  "vector_model": "clip",
  "vector_modal": "image",
  "vector_source_field": "vector_source_image"
}
  • vector_model: Set the value to clip.

  • vector_modal: Set the value to image.

  • vector_source_field: the field that contains the Base64-encoded image. Default value: vector_source_image.

4.3 Schema example:

 "fields": [
    {
      "field_name": "id",
      "field_type": "INT64",
      "compress_type": "equal"
    },
    {
      "field_name": "vector_source_image",
      "field_type": "STRING",
      "compress_type": "uniq"
    },
    {
      "field_name": "cate_id",
      "field_type": "INT64",
      "compress_type": "equal"
    },
    {
      "user_defined_param": {
        "vector_model": "clip",
        "vector_modal": "image",
        "vector_source_field": "vector_source_image"
      },
      "field_name": "vector",
      "field_type": "FLOAT",
      "multi_value": true
    }
  ]

  5. Configure indexes. If you select the Vector: Image Search template, OpenSearch automatically generates the following indexes: the primary key index named id and the vector index named vector. After the advanced settings of the vector_source_image and vector fields are configured, the advanced settings of the vector index are automatically generated. The vector index must be of the CUSTOMIZED type.

image_fb435942cc5k

You must modify the fields of the vector index based on the field configurations.

image_fb44b8d0ccwq

Note

The default dimension of the vector that is generated from the image is 512 and cannot be modified.

Schema example:

"indexs": [
    {
      "index_name": "id",
      "index_type": "PRIMARYKEY64",
      "index_fields": "id",
      "has_primary_key_attribute": true,
      "is_primary_key_sorted": false
    },
    {
      "index_name": "vector",
      "index_type": "CUSTOMIZED",
      "index_fields": [
        {
          "field_name": "id",
          "boost": 1
        },
        {
          "field_name": "vector",
          "boost": 1
        }
      ],
      "parameters": {
        "dimension": "512",
        "distance_type": "SquaredEuclidean",
        "vector_index_type": "Qc",
        "build_index_params": "{\"proxima.qc.builder.quantizer_class\":\"Int8QuantizerConverter\",\"proxima.qc.builder.quantize_by_centroid\":true,\"proxima.qc.builder.optimizer_class\":\"BruteForceBuilder\",\"proxima.qc.builder.thread_count\":10,\"proxima.qc.builder.optimizer_params\":{\"proxima.linear.builder.column_major_order\":true},\"proxima.qc.builder.store_original_features\":false,\"proxima.qc.builder.train_sample_count\":3000000,\"proxima.qc.builder.train_sample_ratio\":0.5}",
        "search_index_params": "{\"proxima.qc.searcher.scan_ratio\":0.01}",
        "embedding_delimiter": ",",
        "major_order": "col",
        "linear_build_threshold": "5000",
        "min_scan_doc_cnt": "20000",
        "enable_recall_report": "false",
        "is_embedding_saved": "false",
        "enable_rt_build": "false",
        "builder_name": "QcBuilder",
        "searcher_name": "QcSearcher"
      },
      "indexer": "aitheta2_indexer"
    }
  ]
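In the schema example above, build_index_params and search_index_params are JSON documents stored as escaped strings inside the outer JSON. If you edit these values by hand, the escaping is easy to get wrong. The following minimal sketch shows one way to generate the strings with the standard json module. The values are copied from the example, and the helper name as_embedded_json is hypothetical.

```python
import json

# Values copied from the schema example above.
build_index_params = {
    "proxima.qc.builder.quantizer_class": "Int8QuantizerConverter",
    "proxima.qc.builder.quantize_by_centroid": True,
    "proxima.qc.builder.optimizer_class": "BruteForceBuilder",
    "proxima.qc.builder.thread_count": 10,
    "proxima.qc.builder.optimizer_params": {
        "proxima.linear.builder.column_major_order": True
    },
    "proxima.qc.builder.store_original_features": False,
    "proxima.qc.builder.train_sample_count": 3000000,
    "proxima.qc.builder.train_sample_ratio": 0.5,
}
search_index_params = {"proxima.qc.searcher.scan_ratio": 0.01}


def as_embedded_json(params: dict) -> str:
    """Serialize params compactly so that they can be stored as a string value."""
    return json.dumps(params, separators=(",", ":"))


parameters = {
    "dimension": "512",
    "build_index_params": as_embedded_json(build_index_params),
    "search_index_params": as_embedded_json(search_index_params),
}

# The outer dump escapes the inner quotation marks automatically.
print(json.dumps(parameters, indent=2))
```

Only three of the parameters are shown. The remaining keys of the example can be added to the parameters dictionary in the same way.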

3. Rebuild an index

  1. After the configuration is complete, click Save Version. In the dialog box that appears, enter the description and click Publish. The description is optional.

image_0b9b00e3cc1h

After the index is published, click Next to rebuild the index.

image_0b9c6070ccv1

  2. Rebuild the index. Configure the parameters based on your index rebuilding requirements and click Next.

  • API data source

image_0b9c8782ccxe

  • MaxCompute data source

image_0b9cae92cc63

  3. In the left-side pane, choose O&M Center > Change History. On the page that appears, click the Data Source Changes tab. On the Data Source Changes tab, you can view the progress of reindexing. After the reindexing is complete, you can perform a query test.

image_0b9cd5a4ccs7

Perform a test

Syntax

query=image_index: 'The Base64-encoded text used for image search&modal=text&n=10&search_params={}'
  • modal: the modality of the query input. To search for an image based on specified text, set the value to text. To search for an image similar to a specific image, set the value to image.

  • n: the number of top-ranked vectors to return.

  • The text must be encoded in the Base64 format.
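The syntax above can be assembled programmatically. The following is a minimal sketch, assuming that the payload is already Base64-encoded and that search_params is left empty. The function name build_query_clause and its validation rules are hypothetical.

```python
VALID_MODALS = ("text", "image")


def build_query_clause(index_name: str, payload_b64: str, modal: str, n: int = 10) -> str:
    """Build the index clause of a vector query for text or image input."""
    if modal not in VALID_MODALS:
        raise ValueError(f"modal must be one of {VALID_MODALS}, got {modal!r}")
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # The doubled braces produce the literal empty object {} for search_params.
    return f"{index_name}:'{payload_b64}&modal={modal}&n={n}&search_params={{}}'"


print(build_query_clause("vector", "5pGp5omY6L2mJuWktOeblA==", "text"))
```

Replace the index name with the name of your own vector index, such as vector in the schema example of this topic.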

Search for an image based on specified text

Perform a query test on the Query Test page in the console.

image_1722f800cc62

vector:'5pGp5omY6L2mJuWktOeblA==&modal=text&n=10&search_params={}'&&kvpairs=formula:proxima_score(vector)&&sort=+RANK

Query the 2042.png image in OSS.

image_1d338cf0ccyb

Note

If the search text contains special characters, the text must be encoded in the Base64 format. For example, if the search content is 摩托车&头盔 (motorcycle&helmet), the Base64 encoding of the UTF-8 text is 5pGp5omY6L2mJuWktOeblA==.
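The encoding in the note can be reproduced with the Python standard library. The string 5pGp5omY6L2mJuWktOeblA== is the Base64 form of the UTF-8 bytes of the Chinese text 摩托车&头盔, which means motorcycle&helmet. The following minimal sketch assumes UTF-8 input, and the helper names are hypothetical.

```python
import base64


def encode_query_text(text: str) -> str:
    """Encode search text as Base64 over its UTF-8 bytes."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_query_text(encoded: str) -> str:
    """Reverse encode_query_text, which is useful when you debug a query."""
    return base64.b64decode(encoded).decode("utf-8")


print(encode_query_text("摩托车&头盔"))
print(decode_query_text("5pGp5omY6L2mJuWktOeblA=="))
```

Encoding the text also keeps the ampersand in the search text from being read as a separator of the modal, n, and search_params parameters.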

Search for an image similar to a specific image

The image size is large after Base64 encoding. Therefore, you cannot search for an image similar to a specific image on the Query Test page in the OpenSearch console. You can use an SDK to search for an image similar to a specific image.
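To see why an encoded image quickly becomes too large to paste into a query box, note that standard Base64 maps every 3 input bytes to 4 output characters, so the payload grows by about one third. The following minimal sketch estimates the encoded length. The 300,000-byte image is a made-up example, not a documented limit.

```python
import base64


def base64_length(n_bytes: int) -> int:
    """Length of the padded standard Base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


raw = bytes(300_000)  # stand-in for the bytes of an image file
encoded = base64.b64encode(raw)

# The estimate matches the length of the actual encoding.
assert len(encoded) == base64_length(len(raw))
print(len(raw), "bytes become", len(encoded), "Base64 characters")
```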

The following sample code provides an example on how to search for an image similar to a specific image:

vector: 'Base64-encoded image&modal=image&n=10&search_params={}'&&kvpairs=formula:proxima_score(vector)&&sort=+RANK

Search for an image by using an SDK

Run the following command to add a dependency:

pip install alibabacloud-ha3engine

Run the following sample code to search for the image:

# -*- coding: utf-8 -*-

from alibabacloud_ha3engine import models, client
from alibabacloud_tea_util import models as util_models
from Tea.exceptions import TeaException, RetryError


def search():
    # Connection settings of the OpenSearch Vector Search Edition instance.
    config = models.Config(
        endpoint="The API endpoint. You can view the API endpoint in the API Endpoint section of the Instance Details page",
        instance_id="",
        protocol="http",
        access_user_name="The username that you specify when you purchase the instance.",
        access_pass_word="The password that you specify when you purchase the instance."
    )

    # If the request takes an extended period of time to complete, you can configure these options to increase the amount of wait time for the request. Unit: millisecond.
    # These options are not used by the search call below. They can be used in the search_with_options method.
    runtime = util_models.RuntimeOptions(
        connect_timeout=5000,
        read_timeout=10000,
        autoretry=False,
        ignore_ssl=False,
        max_idle_conns=50
    )

    # Initialize the OpenSearch Vector Search Edition client.
    ha3_engine_client = client.Client(config)

    options_headers = {}

    try:
        # Example 1: Use a query string in OpenSearch Vector Search Edition to query data.
        # Replace image_index with the name of your vector index and the placeholder text with the Base64-encoded search text.
        query_str = "config=hit:4,format:json,fetch_summary_type:pk,qrs_chain:search&&query=image_index: 'The Base64-encoded text used for image search&modal=text&n=10&search_params={}'&&cluster=general"
        search_query = models.SearchQuery(query=query_str)
        search_request = models.SearchRequestModel(options_headers, search_query)
        search_response = ha3_engine_client.search(search_request)
        print(search_response)
    except TeaException as e:
        print(f"send request with TeaException : {e}")
    except RetryError as e:
        print(f"send request with Connection Exception  : {e}")


if __name__ == "__main__":
    search()

Note

For examples of how to search for an image by using other SDKs, see Developer guide.

Usage notes

  • If you want to minimize the amount of time consumed by vector search, we recommend that you lock vector indexes into the memory. For more information, see Lock policy for vector indexes.

  • The field that specifies the OSS path of an image or the path of a Base64-encoded image must be of the STRING type.

  • The vector index must be of the CUSTOMIZED type.

  • The HA syntax and the RESTful API are supported. SQL is not supported.
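The type constraints in the notes above can be checked before you publish a schema. The following is a minimal sketch, assuming that the schema is available as Python lists shaped like the fields and indexs examples in this topic. The function name validate_schema and the messages are hypothetical. The dimension check reflects the fixed dimension of 512 stated earlier in this topic.

```python
from typing import Dict, List


def validate_schema(
    fields: List[Dict],
    indexes: List[Dict],
    source_field: str = "vector_source_image",
    vector_field: str = "vector",
    vector_index: str = "vector",
) -> List[str]:
    """Return a list of problems. An empty list means that all checks passed."""
    problems: List[str] = []
    by_name = {f.get("field_name"): f for f in fields}

    source = by_name.get(source_field)
    if source is None:
        problems.append(f"missing field: {source_field}")
    elif source.get("field_type") != "STRING":
        problems.append(f"{source_field} must be of the STRING type")

    vector = by_name.get(vector_field)
    if vector is None:
        problems.append(f"missing field: {vector_field}")
    else:
        if vector.get("field_type") != "FLOAT":
            problems.append(f"{vector_field} must be of the FLOAT type")
        params = vector.get("user_defined_param", {})
        if params.get("vector_source_field") != source_field:
            problems.append(f"{vector_field} must reference {source_field}")

    matches = [i for i in indexes if i.get("index_name") == vector_index]
    if not matches:
        problems.append(f"missing index: {vector_index}")
    for index in matches:
        if index.get("index_type") != "CUSTOMIZED":
            problems.append("the vector index must be of the CUSTOMIZED type")
        if str(index.get("parameters", {}).get("dimension")) != "512":
            problems.append("the vector dimension must be 512")
    return problems


fields = [
    {"field_name": "vector_source_image", "field_type": "STRING"},
    {
        "field_name": "vector",
        "field_type": "FLOAT",
        "multi_value": True,
        "user_defined_param": {"vector_source_field": "vector_source_image"},
    },
]
indexes = [
    {"index_name": "vector", "index_type": "CUSTOMIZED", "parameters": {"dimension": "512"}}
]
print(validate_schema(fields, indexes))
```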