OpenSearch: Best practices for vectorized image retrieval

Last Updated: Apr 01, 2026

OpenSearch Retrieval Engine Edition handles image vectorization internally — upload your images, configure the schema, and the service converts them to vectors automatically. This guide walks through building an end-to-end image search engine that supports both text-to-image and image-to-image queries.

Constraints

Review these constraints before you start:

| Constraint | Details |
| --- | --- |
| Vector index type | Must be CUSTOMIZED |
| Vector dimensions | Fixed at 512 (cannot be changed) |
| Image field type | Must be STRING (both OSS path and Base64-encoded image fields) |
| Supported query syntax | HA syntax and RESTful API |
| Unsupported query syntax | SQL |

For low-latency retrieval, consider the mmap index loading strategy.

Choose an architecture

Three architecture patterns are available:

| Pattern | How images are stored | Best for |
| --- | --- | --- |
| OSS + MaxCompute + OpenSearch | OSS paths (e.g., /image/1.jpg) stored in MaxCompute | Large image datasets already in OSS |
| MaxCompute + OpenSearch | Base64-encoded images stored in MaxCompute | Moderate datasets without an OSS setup |
| API + OpenSearch | Base64-encoded images pushed via data push API | Real-time or streaming ingestion |

This guide uses the OSS + MaxCompute + OpenSearch pattern.

Prerequisites

Before you begin, make sure you have:

Step 1: Configure tables

A newly purchased instance shows a status of Pending Configuration. An empty cluster matching your purchased node count and specifications is automatically deployed. Complete the following configuration to enable search.

Configure table basic information

Set the Table Name, Number of Shards, and Number of Data Update Resources.

The default number of free data update resources is 2. Resources beyond the default are charged based on n - 2, where n is the total number of data update resources for a single table.
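
The charging rule above is simple arithmetic. The following is a minimal sketch of it; the function name is hypothetical and not part of any OpenSearch API, and clamping to zero for tables that use two or fewer resources is an assumption based on the "free" wording.

```python
FREE_UPDATE_RESOURCES = 2  # default free quota per table, as stated above


def billable_update_resources(n: int) -> int:
    """Return how many data update resources are charged for a single table.

    n is the total number of data update resources configured for the table.
    Assumption: totals at or below the free quota are not charged.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return max(n - FREE_UPDATE_RESOURCES, 0)


print(billable_update_resources(2))  # within the free quota
print(billable_update_resources(5))  # 5 - 2 resources are charged
```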

Configure data synchronization

Add a full data source. This guide uses MaxCompute:

  1. Click Add Data Source and select MaxCompute as the data source type.

  2. Fill in the project, accessKeyId, accessKeySecret, Table, and partition key fields.

  3. (Optional) Enable Automatic Index Rebuild.

Available data source types: MaxCompute, API push, and Object Storage Service (OSS).

Configure the index schema

After the data source is connected, field mappings from MaxCompute are auto-populated. Configure three fields:

Choose a vectorization model

Two models are available. Select your model before proceeding — it cannot be changed without rebuilding the index.

| Model | Use case |
| --- | --- |
| clip | General image vectorization (recommended for most use cases) |
| clip_ecom | E-commerce product image vectorization |

Field 1: Primary key

Set the field type to STRING or integer, and mark it as the primary key.

Field 2: vector_source_image

This field stores the OSS image path (e.g., /test/images/10031.png). Set the field type to STRING with the following advanced configuration:

{
  "content_type": "oss",
  "oss_endpoint": "oss-cn-hangzhou-internal.aliyuncs.com",
  "oss_bucket": "test-image-vector",
  "crop": "true",
  "oss_use_slr": "true",
  "uid": "<your-alibaba-cloud-uid>"
}

| Parameter | Description |
| --- | --- |
| content_type | Fixed as oss for OSS image sources |
| oss_endpoint | The internal endpoint for your OSS bucket's region |
| oss_bucket | The OSS bucket name containing your images |
| crop | Must be "true" (string) when vectorizing images from OSS |
| oss_use_slr | Must be "true" (string) to use a service-linked role for OSS access |
| uid | Your Alibaba Cloud account UID |
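
If you generate this configuration from a script, it is easy to emit crop and oss_use_slr as JSON booleans by mistake. The sketch below assembles the same object in Python and keeps both values as strings. The helper name is hypothetical; only the keys and fixed values come from the table above.

```python
import json


def build_oss_image_field_params(oss_endpoint: str, oss_bucket: str, uid: str) -> dict:
    """Assemble the advanced configuration for an OSS-backed image field."""
    return {
        "content_type": "oss",       # fixed value for OSS image sources
        "oss_endpoint": oss_endpoint,
        "oss_bucket": oss_bucket,
        "crop": "true",              # the string "true", not a boolean
        "oss_use_slr": "true",       # the string "true", not a boolean
        "uid": uid,
    }


params = build_oss_image_field_params(
    oss_endpoint="oss-cn-hangzhou-internal.aliyuncs.com",
    oss_bucket="test-image-vector",
    uid="<your-alibaba-cloud-uid>",
)
print(json.dumps(params, indent=2))
```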

Field 3: vector

This field stores the generated vector. Set the field type to FLOAT and enable multi-value. Advanced configuration:

{
  "vector_model": "clip",
  "vector_modal": "image",
  "vector_source_field": "vector_source_image"
}

| Parameter | Description |
| --- | --- |
| vector_model | Vectorization model: clip or clip_ecom |
| vector_modal | Fixed as image |
| vector_source_field | The name of the field storing the image path; here, vector_source_image |

Index settings

Configure two indexes:

  • Primary key index

  • Vector index — set to CUSTOMIZED type with 512 dimensions (fixed)

Example schema

"fields": [
  {
    "field_name": "id",
    "field_type": "INT64",
    "compress_type": "equal"
  },
  {
    "user_defined_param": {
      "oss_endpoint": "oss-cn-hangzhou-internal.aliyuncs.com",
      "oss_bucket": "/opensearch",
      "crop": "true",
      "content_type": "oss",
      "oss_use_slr": "true",
      "uid": "xxx"
    },
    "field_name": "vector_source_image",
    "field_type": "STRING",
    "compress_type": "uniq"
  },
  {
    "field_name": "cate_id",
    "field_type": "INT64",
    "compress_type": "equal"
  },
  {
    "user_defined_param": {
      "vector_model": "clip",
      "vector_modal": "image",
      "vector_source_field": "vector_source_image"
    },
    "field_name": "vector",
    "field_type": "FLOAT",
    "multi_value": true
  }
]
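
Because vector_source_field must name an existing STRING field, a quick local check before submitting the schema can catch mismatches. The following is a sketch only: the function and its rules are derived from the field descriptions in this guide, not from an official validator, and the sample field list is illustrative.

```python
ALLOWED_MODELS = {"clip", "clip_ecom"}


def check_image_vector_schema(fields):
    """Return a list of problems found; an empty list means none were found."""
    by_name = {f["field_name"]: f for f in fields}
    problems = []
    for field in fields:
        params = field.get("user_defined_param", {})
        if "vector_source_field" not in params:
            continue  # not a vector field
        name = field["field_name"]
        if field.get("field_type") != "FLOAT" or not field.get("multi_value"):
            problems.append(f"{name}: vector field must be a multi-value FLOAT")
        if params.get("vector_model") not in ALLOWED_MODELS:
            problems.append(f"{name}: unknown vector_model {params.get('vector_model')!r}")
        if params.get("vector_modal") != "image":
            problems.append(f"{name}: vector_modal must be 'image'")
        source_name = params["vector_source_field"]
        source = by_name.get(source_name)
        if source is None:
            problems.append(f"{name}: vector_source_field {source_name!r} is not a defined field")
        elif source.get("field_type") != "STRING":
            problems.append(f"{name}: source field {source_name!r} must be STRING")
    return problems


# Illustrative field list mirroring the three fields described in this guide.
fields = [
    {"field_name": "id", "field_type": "INT64"},
    {"field_name": "vector_source_image", "field_type": "STRING"},
    {
        "field_name": "vector",
        "field_type": "FLOAT",
        "multi_value": True,
        "user_defined_param": {
            "vector_model": "clip",
            "vector_modal": "image",
            "vector_source_field": "vector_source_image",
        },
    },
]
print(check_image_vector_schema(fields))
```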

Step 2: Rebuild the index

Click Confirm to create the configuration. Monitor progress in Function Extension > Change History. Once complete, the instance is ready for queries.

Step 3: Run search queries

Query syntax

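All queries use the following HA syntax pattern. The name before the colon (image_index here) is the name of your vector index; the console examples later in this section use an index named vector.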

query=image_index:'<search-content>&modal=<text|image>&n=<top-n>&search_params={}'&&kvpairs=formula:proxima_score(vector)&&sort=+RANK

| Parameter | Description |
| --- | --- |
| modal | Search mode: text for text-to-image search, image for image-to-image search |
| n | Number of top results to return from the vector search |
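
To avoid hand-assembling this string, the pattern can be wrapped in a small helper. This is a sketch under assumptions: the function name and its validation are hypothetical, and the kvpairs and sort clauses are copied verbatim from the pattern above rather than parameterized.

```python
def build_vector_query(index_name: str, content: str, modal: str, top_n: int) -> str:
    """Build an HA query string following the pattern shown above."""
    if modal not in ("text", "image"):
        raise ValueError("modal must be 'text' or 'image'")
    if top_n <= 0:
        raise ValueError("top_n must be a positive integer")
    return (
        f"query={index_name}:'{content}&modal={modal}&n={top_n}&search_params={{}}'"
        # The clauses below are taken as-is from the documented pattern.
        "&&kvpairs=formula:proxima_score(vector)&&sort=+RANK"
    )


print(build_vector_query("image_index", "motorcycle helmet", "text", 10))
```

The helper does not escape special characters in content; text containing characters such as & still needs the Base64 handling described in this guide.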

Search by text

Run the following HA query on the query test page:

vector:'motorcycle helmet&modal=text&n=10&search_params={}'&&kvpairs=formula:proxima_score(vector)&&sort=+RANK

This returns the top 10 images matching the query. In this example, the result includes 2042.png in OSS.

If the search text contains special characters (for example, &), encode the entire string as Base64 before submitting. For example, the UTF-8 string motorcycle&helmet encodes to bW90b3JjeWNsZSZoZWxtZXQ=.
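
The encoding step can be done with Python's standard base64 module. The sketch below assumes the service expects standard Base64 over the UTF-8 bytes of the text; the helper name is hypothetical.

```python
import base64


def encode_search_text(text: str) -> str:
    """Base64-encode search text so that characters such as & survive in the query."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


encoded = encode_search_text("motorcycle&helmet")
print(encoded)

# Decoding restores the original text, which is a quick way to verify the value.
print(base64.b64decode(encoded).decode("utf-8"))
```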

Search by image

The console query test page does not support image search because Base64-encoded images exceed the input length limit. Use the SDK instead.

Query syntax for image search:

vector:'<base64-encoded-image>&modal=image&n=10&search_params={}'&&kvpairs=formula:proxima_score(vector)&&sort=+RANK

To convert a local image to a Base64 string in Python:

import base64

def image_to_base64(file_path):
    with open(file_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

base64_image = image_to_base64("/path/to/your/image.png")
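
Putting the two pieces together, the sketch below encodes a file and places the result into the image-search query syntax shown above. The build_image_query helper and the index name vector are illustrative assumptions, and the temporary file holds only a PNG signature as a stand-in for a real image.

```python
import base64
import os
import tempfile


def image_to_base64(file_path):
    """Read a file and return its contents as a Base64 string."""
    with open(file_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def build_image_query(index_name, file_path, top_n=10):
    """Build an image-to-image query clause from a local file."""
    encoded = image_to_base64(file_path)
    return (
        f"{index_name}:'{encoded}&modal=image&n={top_n}&search_params={{}}'"
        "&&kvpairs=formula:proxima_score(vector)&&sort=+RANK"
    )


# Demo with a throwaway file; replace it with the path to a real image.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"\x89PNG\r\n\x1a\n")  # PNG signature bytes only
    tmp_path = tmp.name
try:
    query = build_image_query("vector", tmp_path)
finally:
    os.remove(tmp_path)

print(query)
```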

Step 4: Search with the SDK

Install the SDK:

pip install alibabacloud-ha3engine

The following example submits a text-based image search request:

# -*- coding: utf-8 -*-

from alibabacloud_ha3engine import models, client
from alibabacloud_tea_util import models as util_models
from Tea.exceptions import TeaException, RetryError

def search():
    config = models.Config(
        endpoint="<api-endpoint>",        # From the API entry section on the instance details page
        instance_id="",
        protocol="http",
        access_user_name="<username>",    # Set when purchasing the instance
        access_pass_word="<password>"     # Set when purchasing the instance
    )

    # Timeout settings for long-running requests (in milliseconds).
    # Defined here for reference; this example does not pass them to the client.
    runtime = util_models.RuntimeOptions(
        connect_timeout=5000,
        read_timeout=10000,
        autoretry=False,
        ignore_ssl=False,
        max_idle_conns=50
    )

    ha3_client = client.Client(config)

    try:
        query_str = (
            "config=hit:4,format:json,fetch_summary_type:pk,qrs_chain:search"
            "&&query=image_index:'motorcycle helmet&modal=text&n=10&search_params={}'"
            "&&cluster=general"
        )
        search_query = models.SearchQuery(query=query_str)
        request = models.SearchRequestModel({}, search_query)
        response = ha3_client.search(request)
        print(response)
    except TeaException as e:
        print(f"Request failed with TeaException: {e}")
    except RetryError as e:
        print(f"Request failed with connection error: {e}")

if __name__ == "__main__":
    search()

Replace the following placeholders:

| Placeholder | Description |
| --- | --- |
| <api-endpoint> | The API domain name from the API entry section on the instance details page |
| <username> | The username set when purchasing the instance |
| <password> | The password set when purchasing the instance |

For more SDK examples, see the Developer Guide.
