The multimodal embedding API converts text, images, or a combination of both into dense vector representations. Use the resulting embeddings for cross-modal retrieval (text-to-image, image-to-text) and similarity search.
The GME model in this service is built on the Qwen2-VL multimodal large language model (MLLM) and supports both single-modal and multimodal input combinations; the M2-Encoder models accept a single modality (text or image) per input entry.
Available models
| Model | Service ID | Dimensions | Language | Description |
|---|---|---|---|---|
| M2-Encoder-Multimodal Vector Model | ops-m2-encoder | 768 | Chinese-English bilingual | Trained on 6 billion image-text pairs (3 billion Chinese, 3 billion English) based on BM-6B. Supports cross-modal retrieval between text and images, and image classification. |
| M2-Encoder-Large-Multimodal Vector Model | ops-m2-encoder-large | 1024 | Chinese-English bilingual | Larger model with 1 billion parameters. Provides stronger expression capabilities and higher performance in multimodal tasks compared to ops-m2-encoder. |
| GME Multimodal Vector-Qwen2-VL-2B | ops-gme-qwen2-vl-2b-instruct | 1536 | - | Trained on the Qwen2-VL MLLM. Supports single-modal and multimodal input combinations, processing text, images, and combined data types. |
For ops-m2-encoder and ops-m2-encoder-large, text and an image cannot be combined in the same input entry.
Rate limits
The queries per second (QPS) limits apply per Alibaba Cloud account, including all RAM users under that account.
| Model | QPS |
|---|---|
| ops-m2-encoder | 10 |
To request a higher QPS limit, submit a ticket.
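To avoid hitting the server-side limit, a client can space its requests by at least 1/QPS seconds. A minimal sketch of such a throttle, assuming a single-threaded caller (the class and its injectable clock are illustrative, not part of the API):

```python
import time

class QpsThrottle:
    """Blocks so that successive acquire() calls run at most `qps` per second."""

    def __init__(self, qps, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / qps
        self.clock = clock      # injectable for testing
        self.sleep = sleep
        self._next_ok = 0.0     # earliest time the next call may proceed

    def acquire(self):
        now = self.clock()
        if now < self._next_ok:
            self.sleep(self._next_ok - now)
            now = self._next_ok
        self._next_ok = now + self.interval

# Call throttle.acquire() before each API request.
throttle = QpsThrottle(qps=10)
```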
Prerequisites
Before you begin, make sure you have:
An API key and the service endpoint for the AI Search Open Platform. For details, see Obtain a service endpoint.
API reference
Endpoint
POST {host}/v3/openapi/workspaces/{workspace_name}/multi-modal-embedding/{service_id}

Replace the path parameters with actual values:
| Parameter | Description | Example |
|---|---|---|
| host | Service endpoint. Supports Internet and VPC access. | http://ops-cn-hangzhou.opensearch.aliyuncs.com |
| workspace_name | Workspace name. | default |
| service_id | ID of the embedding model. | ops-m2-encoder |
Request
The request body must not exceed 8 MB.
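Batches of Base64-encoded images can exceed the 8 MB cap quickly, so a client-side size check before sending is a reasonable safeguard. A sketch (the helper name is illustrative, not part of the API):

```python
import json

MAX_BODY_BYTES = 8 * 1024 * 1024  # 8 MB request-body limit

def body_within_limit(payload: dict) -> bool:
    """Return True if the JSON-serialized payload fits the 8 MB cap."""
    return len(json.dumps(payload).encode("utf-8")) <= MAX_BODY_BYTES

payload = {"input": [{"image": "http://example.com/photo.jpg"}]}
```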
Headers
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| Content-Type | String | Yes | Request content type. | application/json |
| Authorization | String | Yes | API key for authentication. | Bearer OS-d1**2a |
Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| input | List<ContentObject> | Yes | List of inputs. Maximum 32 entries per request. |
ContentObject fields:
| Field | Type | Required | Description |
|---|---|---|---|
| text | String | No | Text to embed. |
| image | String | No | Image to embed. Accepts a URL or Base64-encoded data. |
For ops-m2-encoder and ops-m2-encoder-large, each ContentObject must contain either a text field or an image field. Providing both fields causes the image to be ignored.
For ops-gme-qwen2-vl-2b-instruct, a ContentObject can contain both text and image fields for combined multimodal embedding.
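The difference between the two model families shows up in how the input list is assembled. A sketch of both payload shapes, following the body parameters above (no request is sent; the example text and URL are placeholders):

```python
# ops-m2-encoder / ops-m2-encoder-large: one modality per entry.
m2_payload = {
    "input": [
        {"text": "a red bicycle leaning against a wall"},  # text-only entry
        {"image": "http://example.com/photo.jpg"},         # image-only entry
    ]
}

# ops-gme-qwen2-vl-2b-instruct: text and image may share one entry.
gme_payload = {
    "input": [
        {
            "text": "product photo of a red bicycle",
            "image": "http://example.com/photo.jpg",
        }
    ]
}

# The API accepts at most 32 entries per request.
assert len(m2_payload["input"]) <= 32
```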
Image input formats:

URL -- Must be accessible.

{ "image": "http://example.com/photo.jpg" }

Base64 -- Use the format data:image/{format};base64,{base64_image}, where {format} is the actual image type (for example, jpeg or png) and {base64_image} is the encoded data.

{ "image": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAoHCB..." }
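Building the Base64 form from raw image bytes can be sketched as follows (the helper name is illustrative, not part of the API):

```python
import base64

def to_data_uri(image_bytes: bytes, image_format: str = "jpeg") -> str:
    """Encode raw image bytes as a data:image/{format};base64,... string."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{image_format};base64,{b64}"

# Usage with a file on disk:
# with open("photo.jpg", "rb") as f:
#     entry = {"image": to_data_uri(f.read(), "jpeg")}
```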
Response
A successful response includes the following fields:
| Field | Type | Description |
|---|---|---|
| request_id | String | Unique request identifier. |
| latency | Int | Processing time in milliseconds. |
| usage.image | Int | Number of images processed. |
| usage.token_count | Int | Number of tokens processed. |
| result.embeddings | List | Array of embedding results. Each element corresponds to one input entry. |
| result.embeddings[].index | Int | Position of the input in the request array (zero-based). |
| result.embeddings[].embedding | List<Double> | The embedding vector. |
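Each returned embedding maps back to its input through the index field, so results can be matched to inputs regardless of ordering; for cross-modal retrieval, the vectors are then typically compared by cosine similarity. A sketch against the response shape above (the sample vectors here are made up for illustration):

```python
import math

def embeddings_by_index(response: dict) -> dict:
    """Map each input's position to its embedding vector."""
    return {e["index"]: e["embedding"] for e in response["result"]["embeddings"]}

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sample = {"result": {"embeddings": [
    {"index": 1, "embedding": [0.0, 1.0]},   # e.g. an image entry
    {"index": 0, "embedding": [1.0, 0.0]},   # e.g. a text entry
]}}
vecs = embeddings_by_index(sample)
```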
Error responses include code and message fields describing the error:
| Field | Type | Description |
|---|---|---|
| request_id | String | Unique request identifier. |
| latency | Int | 0 for error responses. |
| code | String | Error code. |
| message | String | Error description. |
For a full list of error codes, see Status codes.
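Because error responses carry a code field that success responses lack, a simple presence check distinguishes the two cases. A sketch (the helper name and exception type are illustrative):

```python
def raise_for_api_error(response: dict) -> dict:
    """Raise if the response body is an error; otherwise return it unchanged."""
    if "code" in response:
        raise RuntimeError(f"{response['code']}: {response.get('message', '')}")
    return response
```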
Examples
Generate an image embedding
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <your-api-key>" \
"http://<your-endpoint>/v3/openapi/workspaces/default/multi-modal-embedding/ops-m2-encoder" \
-d '{
"input": [
{
"image": "http://example.com/photo.jpg"
}
]
}'

Replace the following placeholders with actual values:
| Placeholder | Description | Example |
|---|---|---|
| <your-api-key> | API key for authentication | OS-d1xxxxx2a |
| <your-endpoint> | Service endpoint | ops-cn-hangzhou.opensearch.aliyuncs.com |
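The same call can be assembled in Python with the standard library; building the request object without sending it shows how the endpoint, headers, and body fit together (the endpoint and key remain placeholders):

```python
import json
import urllib.request

def build_embedding_request(endpoint, workspace, service_id, api_key, payload):
    """Assemble the POST request for the multimodal embedding endpoint (not sent here)."""
    url = (f"http://{endpoint}/v3/openapi/workspaces/{workspace}"
           f"/multi-modal-embedding/{service_id}")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_embedding_request(
    "ops-cn-hangzhou.opensearch.aliyuncs.com", "default", "ops-m2-encoder",
    "<your-api-key>", {"input": [{"image": "http://example.com/photo.jpg"}]},
)
# To execute: urllib.request.urlopen(req)  # requires a valid key and endpoint
```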
Sample success response
{
"request_id": "B4AB89C8-B135-****-A6F8-2BAB801A2CE4",
"latency": 38,
"usage": {
"image": 1,
"token_count": 28
},
"result": {
"embeddings": [
{
"index": 0,
"embedding": [
-0.033447265625,
0.10577392578125,
-0.0015211105346679688,
-0.044189453125,
"...",
0.004688262939453125,
-4.5239925384521484E-5
]
}
]
}
}

Sample error response
{
"request_id": "651B3087-8A07-****-B931-9C4E7B60F52D",
"latency": 0,
"code": "InvalidParameter",
"message": "JSON parse error: Cannot deserialize value of type `InputType` from String \"xxx\""
}