OpenSearch Retrieval Engine Edition handles image vectorization internally — upload your images, configure the schema, and the service converts them to vectors automatically. This guide walks through building an end-to-end image search engine that supports both text-to-image and image-to-image queries.
Constraints
Review these constraints before you start:
| Constraint | Details |
|---|---|
| Vector index type | Must be CUSTOMIZED |
| Vector dimensions | Fixed at 512 — cannot be changed |
| Image field type | Must be STRING (both OSS path and Base64-encoded image fields) |
| Supported query syntax | HA syntax and RESTful API |
| Unsupported query syntax | SQL is not supported |
For low-latency retrieval, consider the mmap index loading strategy.
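The constraints above can be checked before you submit a configuration. The following is a minimal sketch, not part of the OpenSearch SDK: the function name `check_constraints` and its arguments are hypothetical, and the expected values are copied from the table above.

```python
# Hypothetical pre-flight check; the constants mirror the constraints table.
REQUIRED_INDEX_TYPE = "CUSTOMIZED"
REQUIRED_DIMENSION = 512
REQUIRED_IMAGE_FIELD_TYPE = "STRING"


def check_constraints(index_type, dimension, image_field_type):
    """Return a list of constraint violations (empty when everything matches)."""
    problems = []
    if index_type != REQUIRED_INDEX_TYPE:
        problems.append(
            f"vector index type must be {REQUIRED_INDEX_TYPE}, got {index_type}"
        )
    if dimension != REQUIRED_DIMENSION:
        problems.append(
            f"vector dimension must be {REQUIRED_DIMENSION}, got {dimension}"
        )
    if image_field_type != REQUIRED_IMAGE_FIELD_TYPE:
        problems.append(
            f"image field type must be {REQUIRED_IMAGE_FIELD_TYPE}, got {image_field_type}"
        )
    return problems


for problem in check_constraints("CUSTOMIZED", 256, "STRING"):
    print(problem)
```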
Choose an architecture
Three architecture patterns are available:
| Pattern | How images are stored | Best for |
|---|---|---|
| OSS + MaxCompute + OpenSearch | OSS paths (e.g., /image/1.jpg) stored in MaxCompute | Large image datasets already in OSS |
| MaxCompute + OpenSearch | Base64-encoded images stored in MaxCompute | Moderate datasets without an OSS setup |
| API + OpenSearch | Base64-encoded images pushed via data push API | Real-time or streaming ingestion |
This guide uses the OSS + MaxCompute + OpenSearch pattern.
Prerequisites
Before you begin, make sure you have:
An Alibaba Cloud account with an AccessKey ID and AccessKey Secret. For RAM user access, see RAM user creation and authorization.
An Object Storage Service (OSS) bucket with images uploaded. This guide uses 1,000 images uploaded to /test/images/ in a bucket named test-image-vector.
An OpenSearch Retrieval Engine Edition instance. See Purchase an OpenSearch Retrieval Engine Edition instance.
Step 1: Configure tables
A newly purchased instance shows a status of Pending Configuration. An empty cluster matching your purchased node count and specifications is automatically deployed. Complete the following configuration to enable search.
Configure table basic information
Set the Table Name, Number of Shards, and Number of Data Update Resources.
The default number of free data update resources is 2. Resources beyond the default are charged based on n - 2, where n is the total number of data update resources for a single table.
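As a quick illustration of the n - 2 rule, the sketch below computes how many data update resources are billable for one table. The function name is hypothetical, and treating any value of n at or below 2 as zero billable resources is an assumption drawn from the "free" wording rather than a documented rule.

```python
FREE_DATA_UPDATE_RESOURCES = 2  # default free quota stated in this guide


def billable_data_update_resources(n):
    """Return the number of charged data update resources for a single table."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Assumption: nothing is charged while n stays within the free quota.
    return max(0, n - FREE_DATA_UPDATE_RESOURCES)


print(billable_data_update_resources(5))
```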
Configure data synchronization
Add a full data source. This guide uses MaxCompute:
Click Add Data Source and select MaxCompute as the data source type.
Fill in the project, accessKeyId, accessKeySecret, Table, and partition key fields.
(Optional) Enable Automatic Index Rebuild.
Available data source types: MaxCompute, API push, and Object Storage Service (OSS).
Configure the index schema
After the data source is connected, field mappings from MaxCompute are auto-populated. Configure three fields:
Choose a vectorization model
Two models are available. Select your model before proceeding — it cannot be changed without rebuilding the index.
| Model | Use case |
|---|---|
| clip | General image vectorization (recommended for most use cases) |
| clip_ecom | E-commerce product image vectorization |
Field 1: Primary key
Set the field type to STRING or integer, and mark it as the primary key.
Field 2: vector_source_image
This field stores the OSS image path (e.g., /test/images/10031.png). Set the field type to STRING with the following advanced configuration:
{
  "content_type": "oss",
  "oss_endpoint": "oss-cn-hangzhou-internal.aliyuncs.com",
  "oss_bucket": "test-image-vector",
  "crop": "true",
  "oss_use_slr": "true",
  "uid": "<your-alibaba-cloud-uid>"
}
| Parameter | Description |
|---|---|
| content_type | Fixed as oss for OSS image sources |
| oss_endpoint | The internal endpoint for your OSS bucket's region |
| oss_bucket | The OSS bucket name containing your images |
| crop | Must be "true" (string) when vectorizing images from OSS |
| oss_use_slr | Must be "true" (string) to use a service-linked role for OSS access |
| uid | Your Alibaba Cloud account UID |
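If you generate this configuration from a script, the detail that is easy to get wrong is that crop and oss_use_slr are the string "true", not a JSON boolean. The following sketch builds the same object in Python; the helper name `build_oss_image_field_config` is hypothetical and exists only for illustration.

```python
import json


def build_oss_image_field_config(oss_endpoint, oss_bucket, uid):
    """Build the advanced configuration for an OSS-backed image field."""
    return {
        "content_type": "oss",        # fixed value for OSS image sources
        "oss_endpoint": oss_endpoint,  # internal endpoint of the bucket's region
        "oss_bucket": oss_bucket,
        "crop": "true",                # string, not the boolean True
        "oss_use_slr": "true",         # string, not the boolean True
        "uid": uid,
    }


config = build_oss_image_field_config(
    "oss-cn-hangzhou-internal.aliyuncs.com",
    "test-image-vector",
    "<your-alibaba-cloud-uid>",
)
print(json.dumps(config, indent=2))
```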
Field 3: vector
This field stores the generated vector. Set the field type to FLOAT and enable multi-value. Advanced configuration:
{
  "vector_model": "clip",
  "vector_modal": "image",
  "vector_source_field": "vector_source_image"
}
| Parameter | Description |
|---|---|
| vector_model | Vectorization model: clip or clip_ecom |
| vector_modal | Fixed as image |
| vector_source_field | The name of the field storing the image path; here, vector_source_image |
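Because vector_source_field refers to another field by name, a typo there is easy to miss. The sketch below checks that every vector field points at a field that exists and is of type STRING. It assumes the schema is available as a list of dictionaries using the field_name, field_type, and user_defined_param keys that appear in this guide's example schema; the function name is hypothetical.

```python
def find_vector_source_problems(fields):
    """Return messages for vector fields whose source field is missing or not STRING."""
    by_name = {field["field_name"]: field for field in fields}
    problems = []
    for field in fields:
        source = field.get("user_defined_param", {}).get("vector_source_field")
        if source is None:
            continue  # not a vector field
        if source not in by_name:
            problems.append(
                f"{field['field_name']}: source field '{source}' is not defined"
            )
        elif by_name[source].get("field_type") != "STRING":
            problems.append(
                f"{field['field_name']}: source field '{source}' must be STRING"
            )
    return problems


example_fields = [
    {"field_name": "id", "field_type": "INT64"},
    {"field_name": "vector_source_image", "field_type": "STRING"},
    {
        "field_name": "vector",
        "field_type": "FLOAT",
        "multi_value": True,
        "user_defined_param": {
            "vector_model": "clip",
            "vector_modal": "image",
            "vector_source_field": "vector_source_image",
        },
    },
]
print(find_vector_source_problems(example_fields))
```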
Index settings
Configure two indexes:
Primary key index
Vector index — set to CUSTOMIZED type with 512 dimensions (fixed)
Example schema
"fields": [
  {
    "field_name": "id",
    "field_type": "INT64",
    "compress_type": "equal"
  },
  {
    "user_defined_param": {
      "oss_endpoint": "oss-cn-hangzhou-internal.aliyuncs.com",
      "oss_bucket": "/opensearch",
      "crop": "true",
      "content_type": "oss",
      "oss_use_slr": "true",
      "uid": "xxx"
    },
    "field_name": "vector_source_image",
    "field_type": "STRING",
    "compress_type": "uniq"
  },
  {
    "field_name": "cate_id",
    "field_type": "INT64",
    "compress_type": "equal"
  },
  {
    "user_defined_param": {
      "vector_model": "clip",
      "vector_modal": "image",
      "vector_source_field": "vector_source_image"
    },
    "field_name": "vector",
    "field_type": "FLOAT",
    "multi_value": true
  }
]
Step 2: Rebuild the index
Click Confirm to create the configuration. Monitor progress in Function Extension > Change History. Once complete, the instance is ready for queries.
Step 3: Run search queries
Query syntax
All queries use the following HA syntax pattern:
query=image_index:'<search-content>&modal=<text|image>&n=<top-n>&search_params={}'&&kvpairs=formula:proxima_score(vector)&&sort=+RANK
| Parameter | Description |
|---|---|
| modal | Search mode: text for text-to-image search, image for image-to-image search |
| n | Number of top results to return from the vector search |
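The pattern above is plain string assembly, so it can be produced by a small helper. The following is a sketch only: `build_vector_query` is a hypothetical name, and the index name and score field are parameters because they depend on your own schema.

```python
def build_vector_query(index_name, content, modal, top_n, score_field="vector"):
    """Assemble the HA query, kvpairs, and sort clauses for a vector search."""
    if modal not in ("text", "image"):
        raise ValueError("modal must be 'text' or 'image'")
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    return (
        # Doubled braces in an f-string produce the literal "{}".
        f"query={index_name}:'{content}&modal={modal}&n={top_n}&search_params={{}}'"
        f"&&kvpairs=formula:proxima_score({score_field})"
        "&&sort=+RANK"
    )


print(build_vector_query("image_index", "motorcycle helmet", "text", 10))
```

Validating modal and n up front gives a clear local error instead of a rejected request.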
Search by text
Run the following HA query on the query test page:
vector:'motorcycle helmet&modal=text&n=10&search_params={}'&&kvpairs=formula:proxima_score(vector)&&sort=+RANK
This returns the top 10 images matching the query. In this example, the results include 2042.png in OSS.
If the search text contains special characters (for example, &), encode the entire string as Base64 before submitting. For example, motorcycle&helmet encodes to bW90b3JjeWNsZSZoZWxtZXQ=.
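A minimal sketch of that encoding step in Python follows. The helper name `encode_search_text` is hypothetical, and encoding the text as UTF-8 before applying Base64 is an assumption; the guide does not state the expected character encoding.

```python
import base64


def encode_search_text(text):
    """Base64-encode search text so characters such as '&' do not break the query."""
    # Assumption: the service expects the UTF-8 bytes of the text.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


encoded = encode_search_text("motorcycle&helmet")
print(encoded)

# Round-trip check: decoding yields the original text.
print(base64.b64decode(encoded).decode("utf-8"))
```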
Search by image
The console query test page does not support image search because Base64-encoded images exceed the input length limit. Use the SDK instead.
Query syntax for image search:
vector:'<base64-encoded-image>&modal=image&n=10&search_params={}'&&kvpairs=formula:proxima_score(vector)&&sort=+RANK
To convert a local image to a Base64 string in Python:
import base64

def image_to_base64(file_path):
    """Read an image file and return its contents as a Base64 string."""
    with open(file_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

base64_image = image_to_base64("/path/to/your/image.png")
Step 4: Search with the SDK
Install the SDK:
pip install alibabacloud-ha3engine
The following example submits a text-based image search request:
# -*- coding: utf-8 -*-
from alibabacloud_ha3engine import models, client
from alibabacloud_tea_util import models as util_models
from Tea.exceptions import TeaException, RetryError

def search():
    config = models.Config(
        endpoint="<api-endpoint>",      # From the API entry section on the instance details page
        instance_id="",
        protocol="http",
        access_user_name="<username>",  # Set when purchasing the instance
        access_pass_word="<password>"   # Set when purchasing the instance
    )
    # Runtime options with timeouts in milliseconds. Increase them for long-running requests.
    # Note: this example defines the options but does not pass them to the request.
    runtime = util_models.RuntimeOptions(
        connect_timeout=5000,
        read_timeout=10000,
        autoretry=False,
        ignore_ssl=False,
        max_idle_conns=50
    )
    ha3_client = client.Client(config)
    try:
        # Adjacent string literals are joined into one query string.
        query_str = (
            "config=hit:4,format:json,fetch_summary_type:pk,qrs_chain:search"
            "&&query=image_index:'motorcycle helmet&modal=text&n=10&search_params={}'"
            "&&cluster=general"
        )
        search_query = models.SearchQuery(query=query_str)
        request = models.SearchRequestModel({}, search_query)
        response = ha3_client.search(request)
        print(response)
    except TeaException as e:
        print(f"Request failed with TeaException: {e}")
    except RetryError as e:
        print(f"Request failed with connection error: {e}")

if __name__ == "__main__":
    search()
Replace the following placeholders:
| Placeholder | Description |
|---|---|
| <api-endpoint> | The API domain name from the API entry section on the instance details page |
| <username> | The username set when purchasing the instance |
| <password> | The password set when purchasing the instance |
For more SDK examples, see the Developer Guide.
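For image-to-image search through the SDK, the query string can be assembled the same way as in the text example, with the Base64-encoded image as the search content and modal=image. The sketch below only builds the string. The function name is hypothetical, and reusing the config and cluster clauses from the text example unchanged is an assumption; the resulting string would be passed to models.SearchQuery(query=...) as in the example above.

```python
import base64


def build_image_search_query(image_bytes, index_name="image_index",
                             top_n=10, cluster="general"):
    """Build an SDK query string that searches by a Base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return (
        # Assumption: same config clause as the text-search example.
        "config=hit:4,format:json,fetch_summary_type:pk,qrs_chain:search"
        f"&&query={index_name}:'{encoded}&modal=image&n={top_n}&search_params={{}}'"
        f"&&cluster={cluster}"
    )


# To search with a real file, read it in binary mode first:
#   with open("/path/to/your/image.png", "rb") as f:
#       query_str = build_image_search_query(f.read())
query_str = build_image_search_query(b"abc")  # placeholder bytes, not a real image
print(query_str)
```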