Multimodal embedding is a service trained on the Qwen2-VL multimodal large language model (MLLM). It supports single-modal and multimodal input combinations, efficiently processing text, images, and combined data types.
Service | service_id | Dimension | Service description | QPS limit for API calls (Alibaba Cloud account and RAM users) |
M2-Encoder-Multimodal Vector Model | ops-m2-encoder | 768 dimensions | A Chinese-English bilingual multimodal service trained on BM-6B, a dataset of 6 billion image-text pairs (3 billion Chinese and 3 billion English). This model supports cross-modal retrieval between text and images (text-to-image search and image-to-text search), along with image classification tasks. Note: Text and images cannot be entered in the same doc. | 10. Note: To apply for a higher QPS limit, submit a ticket. |
M2-Encoder-Large-Multimodal Vector Model | ops-m2-encoder-large | 1024 dimensions | A Chinese-English bilingual multimodal service. Compared with the m2-encoder model, this model has more parameters (1 billion), which gives it stronger representation capability and better performance on multimodal tasks. Note: Text and images cannot be entered in the same doc. | |
GME Multimodal Vector-Qwen2-VL-2B | ops-gme-qwen2-vl-2b-instruct | 1536 dimensions | A multimodal embedding service trained on the Qwen2-VL MLLM. It supports single-modal and multimodal input combinations, efficiently processing text, images, and combined data types. | |
Prerequisites
The authentication information is obtained.
When you call an AI Search Open Platform service by using an API, you need to authenticate the caller's identity.
The service access address is obtained.
You can call a service over the Internet or a virtual private cloud (VPC). For more information, see Get service registration address.
Request description
Common description
The request body cannot exceed 8 MB in size.
Request method
POST
URL
{host}/v3/openapi/workspaces/{workspace_name}/multi-modal-embedding/{service_id}
host: the address for calling the service. You can call the API service over the Internet or a VPC. For more information, see Obtain a service endpoint.

workspace_name: the name of the workspace, such as default.
service_id: the ID of the built-in service, such as ops-m2-encoder.
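The URL template above can be filled in programmatically. The following is a minimal sketch, not an official SDK call: the function name build_embedding_url and the host value http://example-host are hypothetical placeholders, and percent-encoding the path segments is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote


def build_embedding_url(host: str, workspace_name: str, service_id: str) -> str:
    """Fill the documented URL template (hypothetical helper, not part of any SDK)."""
    base = host.rstrip("/")  # avoid a double slash if the host ends with "/"
    return (
        f"{base}/v3/openapi/workspaces/{quote(workspace_name, safe='')}"
        f"/multi-modal-embedding/{quote(service_id, safe='')}"
    )


# "http://example-host" is a placeholder; use your own service endpoint.
print(build_embedding_url("http://example-host", "default", "ops-m2-encoder"))
```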
Request parameters
Header parameters
API key authentication
Parameter | Type | Required | Description | Example |
Content-Type | String | Yes | Request type: application/json | application/json |
Authorization | String | Yes | API key | Bearer OS-d1**2a |
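As an illustration of the two headers above, the following sketch builds a POST request with Python's standard urllib module. The helper name build_request is hypothetical, and the url, api_key, and body arguments are assumed to be supplied by the caller; the block only constructs the request and does not send it.

```python
import json
import urllib.request


def build_request(url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build a POST request carrying the two documented headers (hypothetical helper)."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # API key sent as a Bearer token
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and decode the JSON response.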
Body parameters
Parameter | Type | Required | Description | Example |
input | List[ContentObject] | Yes | The list of inputs to embed. A request can contain up to 32 entries. | |
ContentObject
Parameter | Type | Required | Description | Example |
text | String | No | The text. | |
image | String | No | The image, specified as a URL or as Base64-encoded data. | |
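The limits described in this section (at most 32 entries, a request body of at most 8 MB, and no mixing of text and image in one entry for the M2-Encoder services) can be checked on the client before a request is sent. The sketch below is one way to do that. The helper name build_body is hypothetical, and two points are assumptions rather than documented behavior: that 8 MB means 8 × 1024 × 1024 bytes, and that an entry with neither text nor image is invalid.

```python
import json

MAX_INPUTS = 32                      # documented: at most 32 entries per request
MAX_BODY_BYTES = 8 * 1024 * 1024     # assumption: "8 MB" read as 8 * 1024 * 1024 bytes
# Services whose description says text and images cannot share one entry.
SINGLE_MODALITY_SERVICES = {"ops-m2-encoder", "ops-m2-encoder-large"}


def build_body(entries: list, service_id: str) -> bytes:
    """Validate the entries and return the encoded JSON body (hypothetical helper)."""
    if len(entries) > MAX_INPUTS:
        raise ValueError(f"too many inputs: {len(entries)} > {MAX_INPUTS}")
    for i, entry in enumerate(entries):
        has_text = bool(entry.get("text"))
        has_image = bool(entry.get("image"))
        if not (has_text or has_image):
            # Assumption: an entry needs at least one of the two fields.
            raise ValueError(f"input[{i}] has neither text nor image")
        if has_text and has_image and service_id in SINGLE_MODALITY_SERVICES:
            raise ValueError(f"input[{i}] mixes text and image, which {service_id} does not accept")
    body = json.dumps({"input": entries}).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"request body is {len(body)} bytes, above the 8 MB limit")
    return body
```

Checking locally avoids a round trip for requests that the service would reject anyway; the service remains the authority on what is valid.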
Response parameters
Parameter | Type | Description | Example |
result.embeddings | List | The output of the request. This parameter is an array. Each element of the array corresponds to one entry in input. | |
result.embeddings[].index | Int | The position in input of the entry that this element corresponds to. | 0 |
result.embeddings[].embedding | List[Double] | The vectorization result. | [0.003143,0.009750,omitted,-0.017395] |
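Because each element carries an index that ties it back to an input entry, a client can restore input order explicitly instead of relying on array order. A minimal sketch, assuming the response has already been decoded from JSON into a dict; the function name embeddings_in_input_order is hypothetical.

```python
def embeddings_in_input_order(response: dict) -> list:
    """Return the embedding vectors sorted by their index field (hypothetical helper)."""
    items = response["result"]["embeddings"]
    ordered = sorted(items, key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]


# Shortened, made-up vectors in the documented response shape.
sample = {
    "result": {
        "embeddings": [
            {"index": 1, "embedding": [0.5, 0.25]},
            {"index": 0, "embedding": [0.125, -0.75]},
        ]
    }
}
vectors = embeddings_in_input_order(sample)
```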
Curl request example
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer Your API key" \
"http://****-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/multi-modal-embedding/ops-m2-encoder" \
-d '{
"input":[
{
"image":"http://***/a.jpg"
}
]
}'
Response examples
Sample success response
{
"request_id": "B4AB89C8-B135-****-A6F8-2BAB801A2CE4",
"latency": 38,
"usage": {
"image":1,
"token_count":28
},
"result": {
"embeddings": [
{
"index": 0,
"embedding": [
-0.033447265625,
0.10577392578125,
-0.0015211105346679688,
-0.044189453125,
...
0.004688262939453125,
-4.5239925384521484E-5
]
}
]
}
}
Sample error response
If a request error occurs, the code and message fields in the output result will describe the error cause.
{
"request_id": "651B3087-8A07-****-B931-9C4E7B60F52D",
"latency": 0,
"code": "InvalidParameter",
"message": "JSON parse error: Cannot deserialize value of type `InputType` from String \"xxx\""
}
Status codes
For more information, see Status codes of AI Search Open Platform.
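A client can separate the success and error shapes shown in the samples above by checking for the code field. The sketch below assumes, based only on those samples, that a success response has no code field; the EmbeddingError class and the raise_for_error function are hypothetical names, not part of any SDK.

```python
class EmbeddingError(Exception):
    """Hypothetical exception carrying the code, message, and request_id of an error response."""

    def __init__(self, code, message, request_id):
        super().__init__(f"{code}: {message} (request_id={request_id})")
        self.code = code
        self.message = message
        self.request_id = request_id


def raise_for_error(response: dict) -> dict:
    """Raise EmbeddingError if the decoded response looks like an error; otherwise return it."""
    # Assumption: only error responses contain a "code" field, as in the samples.
    if "code" in response:
        raise EmbeddingError(
            response.get("code"), response.get("message"), response.get("request_id")
        )
    return response
```

Keeping request_id on the exception makes it easier to reference a failed call when submitting a ticket.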