General rerank model API usage details - Alibaba Cloud Model Studio

To keep retrieval fast, the initial retrieval phase may return results that are not precise enough. A rerank model then re-sorts the retrieved documents more accurately so that the most relevant results appear at the top.

Model overview

Singapore

Model	Max number of documents	Max input tokens per item	Max input tokens per request	Supported languages	Price (per 1M tokens)	Free quota	Scenarios
qwen3-rerank	500	4,000	120,000	Over 100 major languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian	$0.1	1 million tokens, valid for 90 days after activating Model Studio	Text semantic retrieval, RAG applications

Beijing

Model	Max number of documents	Max input tokens per item	Max input tokens per request	Supported languages	Price (per 1M tokens)	Free quota	Scenarios
qwen3-vl-rerank	100	8,000	800,000	33 major languages, such as Chinese, English, Japanese, Korean, French, and German	Image: $0.258; Text: $0.1	No free quota	Image clustering, cross-modal search, image retrieval, video retrieval
gte-rerank-v2	500	4,000	30,000	Over 50 languages, such as Chinese, English, Japanese, Korean, Thai, Spanish, French, Portuguese, German, Indonesian, and Arabic	$0.115	No free quota	Text semantic retrieval, RAG applications

Max input tokens per item: The maximum number of tokens allowed for each query or document. If the input exceeds this limit, it is truncated. The API computes results based on the truncated content, which may lead to inaccurate ranking.
Max number of documents: The maximum number of documents permitted in a single request.
Max input tokens per request: Calculated using the formula Query Tokens × Number of documents + Total document tokens. This total must not exceed the maximum input tokens allowed per request.
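As a rough illustration of the per-request formula above, the following sketch estimates the input size of a request before it is sent. The function names and the sample token counts are hypothetical; this page does not say how tokens are counted, so the counts are assumed to come from a tokenizer that matches the model.

```python
from typing import Sequence


def estimate_request_tokens(query_tokens: int, document_tokens: Sequence[int]) -> int:
    """Query tokens x number of documents + total document tokens."""
    return query_tokens * len(document_tokens) + sum(document_tokens)


def fits_request_limit(
    query_tokens: int,
    document_tokens: Sequence[int],
    max_documents: int,
    max_tokens_per_request: int,
) -> bool:
    """Check a request against the document-count and per-request token limits."""
    if len(document_tokens) > max_documents:
        return False
    return estimate_request_tokens(query_tokens, document_tokens) <= max_tokens_per_request


# Hypothetical counts: a 10-token query and three documents.
total = estimate_request_tokens(10, [200, 350, 120])
print(total)  # 10 * 3 + (200 + 350 + 120) = 700

# Limits for gte-rerank-v2 taken from the model overview: 500 documents, 30,000 tokens.
print(fits_request_limit(10, [200, 350, 120], 500, 30_000))
```

Because the query is counted once per document, a long query combined with many documents can exceed the per-request limit even when every individual item is within the per-item limit.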

Input limitations

Model	Image	Video
qwen3-vl-rerank	JPEG, PNG, WEBP, BMP, TIFF, ICO, DIB, ICNS, and SGI (URL or Base64 supported)	MP4, AVI, and MOV (URL only)
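For Base64 image input, the request body description gives the Data URI format `data:image/{format};base64,{data}`. A minimal sketch of building such a value with the Python standard library follows; the helper name `image_to_data_uri` and the sample bytes are illustrative assumptions, not part of the API.

```python
import base64


def image_to_data_uri(image_bytes: bytes, image_format: str) -> str:
    """Encode raw image bytes as data:image/{format};base64,{data}."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{image_format};base64,{encoded}"


# Placeholder bytes (the 8-byte PNG signature); in practice, read a real image file.
sample_bytes = b"\x89PNG\r\n\x1a\n"
uri = image_to_data_uri(sample_bytes, "png")

# The resulting string can be used as the value of an "image" document.
document = {"image": uri}
print(document["image"][:22])
```

Videos are listed as URL only, so this encoding applies to images alone.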

Prerequisites

Get an API key and set the API key as an environment variable. To use the SDK, install the DashScope SDK.

HTTP

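POST https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank (used by the qwen3-vl-rerank and gte-rerank-v2 examples)
POST https://dashscope-intl.aliyuncs.com/compatible-api/v1/reranks (used by the qwen3-rerank example)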

Request

qwen3-rerank

curl --request POST \
  --url https://dashscope-intl.aliyuncs.com/compatible-api/v1/reranks \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "qwen3-rerank",
    "documents": [
        "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance.",
        "Quantum computing is a cutting-edge field of computer science.",
        "The development of pre-trained language models has brought new advancements to rerank models."
    ],
    "query": "What is a rerank model?",
    "top_n": 2,
    "instruct": "Given a web search query, retrieve relevant passages that answer the query."
}'

qwen3-vl-rerank

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3-vl-rerank",
    "input": {
        "query": "What is a rerank model?",
        "documents": [
            {"text": "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance."},
            {"image": "https://img.alicdn.com/imgextra/i3/O1CN01rdstgY1uiZWt8gqSL_!!6000000006071-0-tps-1970-356.jpg"},
            {"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4"}
        ]
    },
    "parameters": {
        "return_documents": true,
        "top_n": 2,
        "fps": 1.0
    }
}'

gte-rerank-v2

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gte-rerank-v2",
    "input": {
        "query": "What is a rerank model?",
        "documents": [
            "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance.",
            "Quantum computing is a cutting-edge field of computer science.",
            "The development of pre-trained language models has brought new advancements to rerank models."
        ]
    },
    "parameters": {
        "return_documents": true,
        "top_n": 2
    }
}'

Request headers

Content-Type `string` (Required) The content type of the request. Must be `application/json`.
Authorization `string` (Required) The authentication credentials, using a Model Studio API key. Example: `Bearer sk-xxxx`

Request body

model `string` (Required) The model name. Supported models include qwen3-rerank, gte-rerank-v2, and qwen3-vl-rerank.

input `object` (Required) The input content. When you use `qwen3-rerank`, the `input` object is not required. In this case, `query` and `documents` must be at the same level as the `model` parameter.
Properties
    query `string` (Required) The query text. The maximum length is 4,000 tokens.
    documents `array` (Required) A list of candidate documents to sort. Each element is a string. When you use the `qwen3-vl-rerank` model, each element is a dictionary or string that specifies the content type and value. The format is {"modality type": "an input string, or an image or video URL"}. The supported modality types are `text`, `image`, and `video`.
        `text`: The value is a string. You can also pass the string directly without using a dictionary.
        `image`: The value can be a publicly accessible URL or a Base64-encoded Data URI. The Base64 format is `data:image/{format};base64,{data}`, where `{format}` is the image format, such as `jpeg` or `png`, and `{data}` is the Base64-encoded string.
        `video`: The value must be a publicly accessible URL.

parameters `object` (Optional) Optional parameters. When you use `qwen3-rerank`, the `parameters` object is not required. In this case, `top_n` and `instruct` must be at the same level as the `model` parameter.
Properties
    top_n `int` (Optional) The number of top-ranked documents to return. By default, all documents are returned. If the specified value exceeds the total number of documents, all documents are returned.
    return_documents `bool` (Optional) Specifies whether to return the original text of the documents in the sorting results. The default value is `false` to reduce network overhead. Supported models: `gte-rerank-v2` and `qwen3-vl-rerank`.
    instruct `string` (Optional) A custom instruction for the sorting task. This parameter applies only when you use `qwen3-rerank` or `qwen3-vl-rerank`. Use it to guide the model to apply different sorting policies. Write the instruction in English. If you do not specify this parameter, the model performs a Q&A retrieval task by default. For more task instructions, see the examples in the model repository.
        Examples:
        Q&A retrieval task (default): `"Given a web search query, retrieve relevant passages that answer the query."`
            Focus: Find answers to questions. The model prioritizes evaluating whether a document answers the question in the query.
            Example: For the query "How to prevent a cold?", the document "Washing hands frequently is an effective way to prevent colds" receives a high score. The document "A cold is a common illness", although topically relevant, receives a significantly lower score because it does not provide an answer.
        Semantic similarity sorting task: `"Retrieve semantically similar text."`
            Focus: Determine semantic equivalence. The model evaluates whether the core meanings of the query and the document are consistent, regardless of specific wording or sentence structure.
            Example: In a frequently asked questions (FAQ) scenario, the user query "How do I change my password?" and the candidate question "What if I forget my password?" are semantically similar and should receive a high score. The model focuses on whether both reflect the same user intent.
    fps `float` (Optional) Supported only by `qwen3-vl-rerank`. Controls the number of frames extracted from a video. A smaller value means fewer frames are extracted. The value ranges from 0 to 1. The default value is 1.0.
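The difference between the flat `qwen3-rerank` body and the nested `input`/`parameters` body used by the other models can be sketched as a small helper. The function `build_rerank_payload` is hypothetical and only assembles the JSON body described above; it does not send the request, and it does not check which optional parameters a given model supports.

```python
import json
from typing import Any, Dict, List, Optional


def build_rerank_payload(
    model: str,
    query: str,
    documents: List[Any],
    top_n: Optional[int] = None,
    instruct: Optional[str] = None,
    return_documents: Optional[bool] = None,
) -> Dict[str, Any]:
    """Assemble a request body in the shape the given model expects."""
    # Collect only the optional parameters that were actually provided.
    options: Dict[str, Any] = {}
    if top_n is not None:
        options["top_n"] = top_n
    if instruct is not None:
        options["instruct"] = instruct
    if return_documents is not None:
        options["return_documents"] = return_documents

    if model == "qwen3-rerank":
        # Flat layout: query, documents, and options sit next to "model".
        return {"model": model, "query": query, "documents": documents, **options}

    # Nested layout: query and documents under "input", options under "parameters".
    payload: Dict[str, Any] = {
        "model": model,
        "input": {"query": query, "documents": documents},
    }
    if options:
        payload["parameters"] = options
    return payload


docs = ["Rerank models sort candidate documents.", "Quantum computing is a field."]
flat = build_rerank_payload("qwen3-rerank", "What is a rerank model?", docs, top_n=1)
nested = build_rerank_payload(
    "gte-rerank-v2", "What is a rerank model?", docs, top_n=1, return_documents=True
)
print(json.dumps(flat, indent=2))
print(json.dumps(nested, indent=2))
```

The serialized result can be passed as the `--data` value of a curl call or as the body of any HTTP client request.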

Response

Successful response

{
    "output": {
        "results": [
            {
                "document": {
                    "text": "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance."
                },
                "index": 0,
                "relevance_score": 0.9334521178273196
            },
            {
                "document": {
                    "text": "The development of pre-trained language models has brought new advancements to rerank models."
                },
                "index": 2,
                "relevance_score": 0.34100082626411193
            }
        ]
    },
    "usage": {
        "total_tokens": 79
    },
    "request_id": "85ba5752-1900-47d2-8896-23f99b13f6e1"
}

Failed response

If a request fails, the `code` and `message` fields in the response indicate the cause of the error.

{
    "code": "InvalidApiKey",
    "message": "Invalid API-key provided.",
    "request_id": "fb53c4ec-1c12-4fc4-a580-cdb7c3261fc1"
}

request_id `string` Unique identifier for the request. Use it for tracing and troubleshooting.

output `object` The task output.
Properties
    results `array` A list of sorting results, sorted by `relevance_score` in descending order.
    Properties
        document `dict` The original document object. This is returned only when the `return_documents` request parameter is `true`. The structure is `{"text": "Original document text"}`.
        index `int` The original index of the corresponding document in the input `documents` list.
        relevance_score `double` The semantic relevance score between the document and the query. The value ranges from 0.0 to 1.0. A higher score indicates stronger relevance.
            Note: This score is a relative value within the current request and is used primarily for sorting documents within this request. It cannot be used as an absolute value for comparison across different requests.

usage `object` Token usage statistics.
Properties
    total_tokens `int` The total number of tokens consumed by the request.

code `string` The error code. Returned only when the request fails. See error codes for details.
message `string` Detailed error message. Returned only when the request fails. See error codes for details.
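Because each result carries the `index` of the document in the original `documents` list, the ranking can be mapped back to the input even when `return_documents` is left at `false`. The following is a minimal sketch under that reading of the response fields; the helper name `ranked_documents` is hypothetical, and the response is treated as an already-parsed JSON dictionary.

```python
from typing import Any, Dict, List, Sequence, Tuple


def ranked_documents(
    response: Dict[str, Any], documents: Sequence[Any]
) -> List[Tuple[Any, float]]:
    """Pair each ranked result with its original document via the index field."""
    # A non-empty "code" field signals a failed request.
    if response.get("code"):
        raise RuntimeError(f"{response['code']}: {response.get('message', '')}")
    return [
        (documents[item["index"]], item["relevance_score"])
        for item in response["output"]["results"]
    ]


documents = [
    "Rerank models are widely used in search engines and recommendation systems.",
    "Quantum computing is a cutting-edge field of computer science.",
    "The development of pre-trained language models has brought new advancements to rerank models.",
]

# Response shaped like the successful example, without the optional document field.
response = {
    "output": {
        "results": [
            {"index": 0, "relevance_score": 0.9334521178273196},
            {"index": 2, "relevance_score": 0.34100082626411193},
        ]
    },
    "usage": {"total_tokens": 79},
    "request_id": "85ba5752-1900-47d2-8896-23f99b13f6e1",
}

for text, score in ranked_documents(response, documents):
    print(f"{score:.4f}  {text}")
```

The scores are only meaningful relative to each other within one request, so the sketch keeps them paired with their documents rather than applying a fixed cutoff.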

Use the SDK

Example

The following example shows how to call the rerank model API.

The parameter names in the SDK are mostly consistent with those in the HTTP API, but the parameter structure is encapsulated. For example, the HTTP API uses nested input and parameters structures, while the SDK uses a flat structure. Note this difference during development.

Python

import dashscope

# The API key is expected in the DASHSCOPE_API_KEY environment variable (see Prerequisites).


def text_rerank():
    # The SDK takes query, documents, and options as flat keyword arguments.
    resp = dashscope.TextReRank.call(
        model="gte-rerank-v2",
        query="What is a rerank model?",
        documents=[
            "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance.",
            "Quantum computing is a cutting-edge field of computer science.",
            "The development of pre-trained language models has brought new advancements to rerank models."
        ],
        top_n=2,  # return only the two highest-scoring documents
        return_documents=True  # include the original text in each result
    )
    print(resp)


if __name__ == '__main__':
    text_rerank()

Sample output

Note

The SDK encapsulates the original HTTP response. For a successful request, the SDK always returns the code and message fields with empty strings as their values.

{
    "status_code": 200,
    "request_id": "4b0805c0-6b36-490d-8bc1-4365f4c89905",
    "code": "",
    "message": "",
    "output": {
        "results": [
            {
                "index": 0,
                "relevance_score": 0.9334521178273196,
                "document": {
                    "text": "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance."
                }
            },
            {
                "index": 2,
                "relevance_score": 0.34100082626411193,
                "document": {
                    "text": "The development of pre-trained language models has brought new advancements to rerank models."
                }
            }
        ]
    },
    "usage": {
        "total_tokens": 79
    }
}

Error Codes

If the model call fails and returns an error message, see Error messages for resolution.
