
OpenSearch:Multimodal embedding

Last Updated:Feb 28, 2026

The multimodal embedding API converts text, images, or a combination of both into dense vector representations. Use the resulting embeddings for cross-modal retrieval (text-to-image, image-to-text) and similarity search.

The ops-gme-qwen2-vl-2b-instruct model is trained on the Qwen2-VL multimodal large language model (MLLM) and supports both single-modal and multimodal input combinations. The M2-Encoder models are trained on BM-6B and accept one modality per input entry.

Available models

| Model | Service ID | Dimensions | Language | Description |
| --- | --- | --- | --- | --- |
| M2-Encoder Multimodal Vector Model | ops-m2-encoder | 768 | Chinese-English bilingual | Trained on 6 billion image-text pairs (3 billion Chinese, 3 billion English) based on BM-6B. Supports cross-modal retrieval between text and images, and image classification. |
| M2-Encoder-Large Multimodal Vector Model | ops-m2-encoder-large | 1024 | Chinese-English bilingual | Larger model with 1 billion parameters. Provides stronger expression capabilities and higher performance in multimodal tasks compared to ops-m2-encoder. |
| GME Multimodal Vector-Qwen2-VL-2B | ops-gme-qwen2-vl-2b-instruct | 1536 | - | Trained on the Qwen2-VL MLLM. Supports single-modal and multimodal input combinations, processing text, images, and combined data types. |
Note

For ops-m2-encoder and ops-m2-encoder-large, text and an image cannot be combined in the same input entry (ContentObject).

Rate limits

The queries per second (QPS) limits apply per Alibaba Cloud account, including all RAM users under that account.

| Model | QPS |
| --- | --- |
| ops-m2-encoder | 10 |

To request a higher QPS limit, submit a ticket.
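To stay under the per-account limit on the client side, you can space out requests yourself. The following is a minimal sketch of a client-side throttle (not part of the OpenSearch SDK; the class name and approach are our own):

```python
import time

class QpsThrottle:
    """Enforce a minimum interval between calls to stay under a QPS limit."""

    def __init__(self, qps: float):
        self.min_interval = 1.0 / qps
        self._last = 0.0

    def wait(self):
        # Sleep just long enough so that calls are at least min_interval apart.
        now = time.monotonic()
        delay = self._last + self.min_interval - now
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()

# Matches the 10 QPS limit for ops-m2-encoder.
throttle = QpsThrottle(qps=10)
```

Call `throttle.wait()` before each API request; note the limit applies across all RAM users in the account, so a single-process throttle is only a partial safeguard.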

Prerequisites

Before you begin, make sure you have a workspace and an API key for authentication.

API reference

Endpoint

POST {host}/v3/openapi/workspaces/{workspace_name}/multi-modal-embedding/{service_id}

Replace the path parameters with actual values:

| Parameter | Description | Example |
| --- | --- | --- |
| host | Service endpoint. Supports Internet and VPC access. | http://ops-cn-hangzhou.opensearch.aliyuncs.com |
| workspace_name | Workspace name. | default |
| service_id | ID of the embedding model. | ops-m2-encoder |

Request

The request body must not exceed 8 MB.
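A simple way to respect the 8 MB limit is to check the serialized body size before sending. A minimal sketch (the helper name is ours, not part of any SDK):

```python
import json

MAX_BODY_BYTES = 8 * 1024 * 1024  # 8 MB request-body limit

def body_within_limit(payload: dict) -> bool:
    """Return True if the JSON-serialized request body fits the 8 MB limit."""
    return len(json.dumps(payload).encode("utf-8")) <= MAX_BODY_BYTES

payload = {"input": [{"text": "a cat sitting on a windowsill"}]}
```

Base64-encoded images are the usual reason a body grows large; checking before sending avoids a round trip that would fail anyway.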

Headers

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Content-Type | String | Yes | Request content type. | application/json |
| Authorization | String | Yes | API key for authentication. | Bearer OS-d1**2a |

Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| input | List&lt;ContentObject&gt; | Yes | List of inputs. Maximum 32 entries per request. |

ContentObject fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| text | String | No | Text to embed. |
| image | String | No | Image to embed. Accepts a URL or Base64-encoded data. |

For ops-m2-encoder and ops-m2-encoder-large, each ContentObject must contain either a text field or an image field. Providing both fields causes the image to be ignored.

For ops-gme-qwen2-vl-2b-instruct, a ContentObject can contain both text and image fields for combined multimodal embedding.
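The two rules above translate into two request-body shapes. A sketch of both, with field names taken from the Body table (the URLs are placeholders):

```python
# ops-m2-encoder / ops-m2-encoder-large: one modality per ContentObject.
m2_payload = {
    "input": [
        {"text": "a red bicycle"},
        {"image": "http://example.com/photo.jpg"},
    ]
}

# ops-gme-qwen2-vl-2b-instruct: text and image may be combined in one entry.
gme_payload = {
    "input": [
        {"text": "a red bicycle", "image": "http://example.com/photo.jpg"},
    ]
}
```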

Image input formats:

  • URL -- Must be accessible.

      {
        "image": "http://example.com/photo.jpg"
      }
  • Base64 -- Use the format data:image/{format};base64,{base64_image}, where {format} is the actual image type (e.g., jpeg, png) and {base64_image} is the encoded data.

      {
        "image": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAoHCB..."
      }
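Building the Base64 form from raw image bytes can be sketched as follows (the helper name is ours; the placeholder bytes stand in for a real image file):

```python
import base64

def image_to_data_uri(image_bytes: bytes, image_format: str = "jpeg") -> str:
    """Encode raw image bytes as a data:image/{format};base64,... string."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{image_format};base64,{encoded}"

# In practice, read the bytes from a file: open("photo.jpg", "rb").read()
content = {"image": image_to_data_uri(b"\xff\xd8\xff", "jpeg")}
```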

Response

A successful response includes the following fields:

| Field | Type | Description |
| --- | --- | --- |
| request_id | String | Unique request identifier. |
| latency | Int | Processing time in milliseconds. |
| usage.image | Int | Number of images processed. |
| usage.token_count | Int | Number of tokens processed. |
| result.embeddings | List | Array of embedding results. Each element corresponds to one input entry. |
| result.embeddings[].index | Int | Position of the input in the request array (zero-based). |
| result.embeddings[].embedding | List&lt;Double&gt; | The embedding vector. |
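For cross-modal retrieval, the returned vectors are typically compared by cosine similarity. A minimal sketch of extracting vectors from a parsed response and scoring them (helper names are ours):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def embeddings_by_index(response: dict) -> dict:
    """Map each input's zero-based index to its embedding vector."""
    return {e["index"]: e["embedding"] for e in response["result"]["embeddings"]}
```

For example, embed a query text and a set of images in one request, then rank the image vectors by their similarity to the text vector.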

Error responses include code and message fields describing the error:

| Field | Type | Description |
| --- | --- | --- |
| request_id | String | Unique request identifier. |
| latency | Int | 0 for error responses. |
| code | String | Error code. |
| message | String | Error description. |

For a full list of error codes, see Status codes.

Examples

Generate an image embedding

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  "http://<your-endpoint>/v3/openapi/workspaces/default/multi-modal-embedding/ops-m2-encoder" \
  -d '{
    "input": [
      {
        "image": "http://example.com/photo.jpg"
      }
    ]
  }'

Replace the following placeholders with actual values:

| Placeholder | Description | Example |
| --- | --- | --- |
| &lt;your-api-key&gt; | API key for authentication | OS-d1xxxxx2a |
| &lt;your-endpoint&gt; | Service endpoint | ops-cn-hangzhou.opensearch.aliyuncs.com |
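The same request can be built in Python with the standard library. A sketch using `urllib.request` (the endpoint and key below are placeholders to replace with your own values):

```python
import json
import urllib.request

endpoint = "ops-cn-hangzhou.opensearch.aliyuncs.com"  # placeholder endpoint
api_key = "<your-api-key>"                            # placeholder API key

url = (f"http://{endpoint}/v3/openapi/workspaces/default"
       f"/multi-modal-embedding/ops-m2-encoder")
payload = {"input": [{"image": "http://example.com/photo.jpg"}]}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment to send the request
```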

Sample success response

{
    "request_id": "B4AB89C8-B135-****-A6F8-2BAB801A2CE4",
    "latency": 38,
    "usage": {
        "image": 1,
        "token_count": 28
    },
    "result": {
        "embeddings": [
            {
                "index": 0,
                "embedding": [
                    -0.033447265625,
                    0.10577392578125,
                    -0.0015211105346679688,
                    -0.044189453125,
                    "...",
                    0.004688262939453125,
                    -4.5239925384521484E-5
                ]
            }
        ]
    }
}

Sample error response

{
    "request_id": "651B3087-8A07-****-B931-9C4E7B60F52D",
    "latency": 0,
    "code": "InvalidParameter",
    "message": "JSON parse error: Cannot deserialize value of type `InputType` from String \"xxx\""
}