
OpenSearch: Multimodal embedding

Last Updated: Nov 12, 2025

Multimodal embedding is a service trained on the Qwen2-VL multimodal large language model (MLLM). It supports single-modal and multimodal input combinations, efficiently processing text, images, and combined data types.

Services

M2-Encoder-Multimodal Vector Model

  • service_id: ops-m2-encoder

  • Dimensions: 768

  • Service description: A Chinese-English bilingual multimodal service based on BM-6B, trained on 6 billion image-text pairs (3 billion Chinese records and 3 billion English records). It supports cross-modal retrieval between text and images (text-to-image and image-to-text search) as well as image classification tasks.

    Note: Text and images cannot be entered in the same doc.

  • QPS limit for API calls (Alibaba Cloud account and RAM users): 10. To apply for a higher QPS, submit a ticket.

M2-Encoder-Large-Multimodal Vector Model

  • service_id: ops-m2-encoder-large

  • Dimensions: 1024

  • Service description: A Chinese-English bilingual multimodal service. Compared with the m2-encoder model, it has more parameters (1 billion), providing stronger representation capabilities and higher performance on multimodal tasks.

    Note: Text and images cannot be entered in the same doc.

GME Multimodal Vector-Qwen2-VL-2B

  • service_id: ops-gme-qwen2-vl-2b-instruct

  • Dimensions: 1536

  • Service description: A multimodal embedding service trained on the Qwen2-VL MLLM. It supports single-modal and multimodal input combinations, efficiently processing text, images, and combined data types.

Prerequisites

  • The authentication information is obtained.

    When you call an AI Search Open Platform service by using an API, you need to authenticate the caller's identity.

  • The service access address is obtained.

    You can call a service over the Internet or a virtual private cloud (VPC). For more information, see Obtain a service endpoint.

Request description

Common description

The request body cannot exceed 8 MB in size.

Request method

POST

URL

{host}/v3/openapi/workspaces/{workspace_name}/multi-modal-embedding/{service_id} 

  • host: the address for calling the service. You can call the API service over the Internet or a VPC. For more information, see Obtain a service endpoint.


  • workspace_name: the name of the workspace, such as default.

  • service_id: the ID of the built-in service, such as ops-m2-encoder.

Request parameters

Header parameters

API key authentication

  • Content-Type (String, required): The request type: application/json. Example: application/json

  • Authorization (String, required): The API key. Example: Bearer OS-d1**2a

Body parameters

  • input (List[ContentObject], required): The content to vectorize. Supports multiple inputs, with a maximum of 32 entries per request. Example:

    [
      {
        "text":"Science and technology are the primary productive forces"
      },
      {
        "image":"http://***/a.jpg"
      }
    ]
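Because input accepts at most 32 entries per request, larger workloads need to be split across multiple calls. A minimal batching sketch in Python (the helper name is illustrative, not part of the API):

```python
def chunk_inputs(items, batch_size=32):
    """Split a list of input entries into batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

docs = [{"text": f"document {i}"} for i in range(70)]
batches = chunk_inputs(docs)
# 70 entries are split into batches of 32, 32, and 6
```

Each batch then becomes the input field of one request.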

ContentObject

  • text (String, optional): The text content. Example:

    {
      "text":"Text input"
    }

  • image (String, optional): The image, passed as a URL or as Base64-encoded data.

      • If the image is passed as a URL, the URL must be accessible.

      • If the image is passed as Base64-encoded data, use the data:image/{format};base64,{base64_image} format, where:

        image/{format}: the format of the local image. Use the actual image format. For example, if the image is in the JPG format, set it to image/jpeg.

        base64_image: the Base64-encoded data of the image.

    Example:

    {
      "image":"http://xxxxx/a.jpg"
    }

    or

    {
      "image":"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAoHCB..."
    }
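The Base64 data URI described above can be built from raw image bytes with the Python standard library; a minimal sketch (the byte string below stands in for real JPEG data read from a local file):

```python
import base64

def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes into the data:{mime};base64,{data} form for the image parameter."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# In practice, read the bytes from a local file: to_data_uri(open("a.jpg", "rb").read())
uri = to_data_uri(b"\xff\xd8\xff\xe0placeholder-jpeg-bytes")
```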

Response parameters

  • result.embeddings (List): The output of the request. This parameter is an array; each element corresponds to one entry in the input. Example:

    [
        {
          "index": 0,
          "embedding": [0.003143,0.009750,omitted,-0.017395]
        },
        {}
    ]

  • result.embeddings[].index (Int): The position of the corresponding entry in the input array. Example: 0

  • result.embeddings[].embedding (List[Double]): The vectorization result. Example: [0.003143,0.009750,omitted,-0.017395]

cURL request example

curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer Your API key" \
"http://****-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/multi-modal-embedding/ops-m2-encoder" \
-d '{
"input":[
  {
    "image":"http://***/a.jpg"
  }
]
}'
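The same call can be assembled in Python. The sketch below only builds the URL, headers, and body according to the specification above; the host, workspace name, and API key are placeholders, and sending the request is a single HTTP POST (for example, with requests.post):

```python
import json

def build_embedding_request(host, workspace_name, service_id, inputs, api_key):
    """Assemble the URL, headers, and JSON body for a multimodal embedding call."""
    url = f"{host}/v3/openapi/workspaces/{workspace_name}/multi-modal-embedding/{service_id}"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps({"input": inputs})
    return url, headers, body

url, headers, body = build_embedding_request(
    host="http://example.opensearch.aliyuncs.com",  # placeholder endpoint
    workspace_name="default",
    service_id="ops-m2-encoder",
    inputs=[{"text": "Science and technology are the primary productive forces"}],
    api_key="YOUR_API_KEY",
)
# Send with, e.g.: requests.post(url, headers=headers, data=body)
```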

Response examples

Sample success response

{
    "request_id": "B4AB89C8-B135-****-A6F8-2BAB801A2CE4",
    "latency": 38,
    "usage": {
        "image":1,
        "token_count":28
    },
    "result": {
        "embeddings": [
            {
                "index": 0,
                "embedding": [
                   -0.033447265625,
                   0.10577392578125,
                   -0.0015211105346679688,
                   -0.044189453125,
                    ...
                   0.004688262939453125,
                   -4.5239925384521484E-5
                ]
            }
        ]
    }
}
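Once a success response like the one above is parsed, the vectors in result.embeddings can be compared directly, for example with cosine similarity for cross-modal retrieval. A minimal sketch over a hypothetical, heavily truncated two-entry response:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical parsed response with two input entries (vectors shortened for illustration).
response = {
    "result": {
        "embeddings": [
            {"index": 0, "embedding": [0.1, 0.2, 0.3]},
            {"index": 1, "embedding": [0.1, 0.2, 0.25]},
        ]
    }
}

vectors = {e["index"]: e["embedding"] for e in response["result"]["embeddings"]}
score = cosine_similarity(vectors[0], vectors[1])
```

The index field maps each vector back to its position in the request's input array, so entries can be matched even if an element is empty.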

Sample error response

If a request error occurs, the code and message fields in the output result will describe the error cause.

{
    "request_id": "651B3087-8A07-****-B931-9C4E7B60F52D",
    "latency": 0,
    "code": "InvalidParameter",
    "message": "JSON parse error: Cannot deserialize value of type `InputType` from String \"xxx\""
}

Status codes

For more information, see Status codes of AI Search Open Platform.