Alibaba Cloud Model Studio: Text rerank

Last Updated: Feb 11, 2026

To stay fast, the initial retrieval phase of a retrieval system often returns results that are not precisely ordered. A rerank model re-scores the retrieved documents against the query so that the most relevant results appear at the top.

Model overview

Singapore

| Model | Max number of documents | Max input tokens per item | Max input tokens per request | Supported languages | Price (per 1M tokens) | Free quota | Scenarios |
|---|---|---|---|---|---|---|---|
| qwen3-rerank | 500 | 4,000 | 120,000 | Over 100 major languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian | $0.1 | 1 million tokens, valid for 90 days after activating Model Studio | Text semantic retrieval; RAG applications |

Beijing

| Model | Max number of documents | Max input tokens per item | Max input tokens per request | Supported languages | Price (per 1M tokens) | Free quota | Scenarios |
|---|---|---|---|---|---|---|---|
| qwen3-vl-rerank | 100 | 8,000 | 800,000 | 33 major languages, such as Chinese, English, Japanese, Korean, French, and German | Image: $0.258; Text: $0.1 | No free quota | Image clustering; Cross-modal search; Image retrieval; Video retrieval |
| gte-rerank-v2 | 500 | 4,000 | 30,000 | Over 50 languages, such as Chinese, English, Japanese, Korean, Thai, Spanish, French, Portuguese, German, Indonesian, and Arabic | $0.115 | No free quota | Text semantic retrieval; RAG applications |

  • Max input tokens per item: The maximum number of tokens allowed for each query or document. If the input exceeds this limit, it is truncated. The API computes results based on the truncated content, which may lead to inaccurate ranking.

  • Max number of documents: The maximum number of documents permitted in a single request.

  • Max input tokens per request: Calculated using the formula Query Tokens × Number of documents + Total document tokens. This total must not exceed the maximum input tokens allowed per request.
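As a rough illustration of the per-request formula above, the following sketch estimates the token total of a request and checks it against a model's limits. The names `LIMITS`, `estimate_request_tokens`, and `check_request` are hypothetical helpers, not part of the API, and the sketch assumes you already have token counts from a tokenizer of your choice. The limit values are copied from the model overview tables.

```python
# Limits copied from the model overview tables:
# (max documents, max input tokens per item, max input tokens per request).
LIMITS = {
    "qwen3-rerank": (500, 4_000, 120_000),
    "qwen3-vl-rerank": (100, 8_000, 800_000),
    "gte-rerank-v2": (500, 4_000, 30_000),
}


def estimate_request_tokens(query_tokens: int, document_tokens: list[int]) -> int:
    """Query tokens x number of documents + total document tokens."""
    return query_tokens * len(document_tokens) + sum(document_tokens)


def check_request(model: str, query_tokens: int, document_tokens: list[int]) -> list[str]:
    """Return a list of limit violations for the request (empty if none)."""
    max_docs, max_item, max_request = LIMITS[model]
    problems = []
    if len(document_tokens) > max_docs:
        problems.append(f"{len(document_tokens)} documents exceeds the limit of {max_docs}")
    if query_tokens > max_item or any(t > max_item for t in document_tokens):
        problems.append(f"at least one item exceeds {max_item} tokens and would be truncated")
    total = estimate_request_tokens(query_tokens, document_tokens)
    if total > max_request:
        problems.append(f"estimated {total} tokens exceeds the request limit of {max_request}")
    return problems
```

For example, a 20-token query with 100 documents of 300 tokens each gives 20 × 100 + 30,000 = 32,000 tokens, which is over the 30,000-token request limit of gte-rerank-v2 but within the 120,000-token limit of qwen3-rerank.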

Input limitations

| Model | Image | Video |
|---|---|---|
| qwen3-vl-rerank | JPEG, PNG, WEBP, BMP, TIFF, ICO, DIB, ICNS, and SGI (URL or Base64 supported) | MP4, AVI, and MOV (URL only) |

Prerequisites

Get an API key and set the API key as an environment variable. To use the SDK, install the DashScope SDK.

HTTP

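POST https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank

The qwen3-rerank request example in this section uses a different endpoint with a flat request body: POST https://dashscope-intl.aliyuncs.com/compatible-api/v1/reranks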

Request

qwen3-rerank

curl --request POST \
  --url https://dashscope-intl.aliyuncs.com/compatible-api/v1/reranks \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
        "model": "qwen3-rerank",
        "documents": [
                "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance.",
                "Quantum computing is a cutting-edge field of computer science.",
                "The development of pre-trained language models has brought new advancements to rerank models."
        ],
        "query": "What is a rerank model?",
        "top_n": 2,
        "instruct": "Given a web search query, retrieve relevant passages that answer the query."
}'

qwen3-vl-rerank

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen3-vl-rerank",
    "input":{
         "query": "What is a rerank model?",
         "documents": [
            {"text": "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance."},
            {"image": "https://img.alicdn.com/imgextra/i3/O1CN01rdstgY1uiZWt8gqSL_!!6000000006071-0-tps-1970-356.jpg"},
            {"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4"}
         ]
    },
    "parameters": {
        "return_documents": true,
        "top_n": 2,
        "fps": 1.0
    }
}'

gte-rerank-v2

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "gte-rerank-v2",
    "input":{
         "query": "What is a rerank model?",
         "documents": [
         "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance.",
         "Quantum computing is a cutting-edge field of computer science.",
         "The development of pre-trained language models has brought new advancements to rerank models."
         ]
    },
    "parameters": {
        "return_documents": true,
        "top_n": 2
    }
}'

Request headers

Content-Type string (Required)

The content type of the request. Must be application/json.

Authorization string (Required)

The authentication credentials using a Model Studio API key.

Example: Bearer sk-xxxx

Request body

model string (Required)

The model name. Supported models include qwen3-rerank, gte-rerank-v2, and qwen3-vl-rerank.

input object (Required)

The input content.

When you use qwen3-rerank, the input object is not required. In this case, the query and documents must be at the same level as the model parameter.

Properties

query string (Required)

The query text. The maximum length is 4,000 tokens.

documents array (Required)

A list of candidate documents to sort. Each element is a string.

When you use the qwen3-vl-rerank model, each element is a dictionary or string that specifies the content type and value. The dictionary format is {"modality type": "an input string, or an image or video URL"}. The supported modality types are text, image, and video.

  • text: The value is a string. You can also pass the string directly without using a dictionary.

  • image: The value can be a publicly accessible URL or a Base64-encoded Data URI. The Base64 format is data:image/{format};base64,{data}, where {format} is the image format, such as jpeg or png, and {data} is the Base64-encoded string.

  • video: The value must be a publicly accessible URL.
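The document element formats above can be sketched with two small helpers. This is a minimal illustration using only the Python standard library; `image_bytes_to_data_uri` and `make_document` are hypothetical names, not SDK functions.

```python
import base64

SUPPORTED_MODALITIES = {"text", "image", "video"}


def image_bytes_to_data_uri(image_bytes: bytes, image_format: str) -> str:
    """Encode raw image bytes as data:image/{format};base64,{data}."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{image_format};base64,{encoded}"


def make_document(modality: str, value: str) -> dict[str, str]:
    """Build one element of the documents array, e.g. {"image": "<URL or Data URI>"}."""
    if modality not in SUPPORTED_MODALITIES:
        raise ValueError(f"unsupported modality type: {modality!r}")
    return {modality: value}
```

A local image could then be passed as `make_document("image", image_bytes_to_data_uri(data, "png"))`, where `data` holds the file's bytes. Videos must still be passed as publicly accessible URLs.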

parameters object (Optional)

Optional parameters.

When you use qwen3-rerank, the parameters object is not required. In this case, the top_n and instruct parameters must be at the same level as the model parameter.

Properties

top_n int (Optional)

The number of top-ranked documents to return. By default, all documents are returned. If the specified value exceeds the total number of documents, all documents are returned.

return_documents bool (Optional)

Specifies whether to return the original text of the documents in the sorting results. The default value is false to reduce network overhead. Supported models: gte-rerank-v2 and qwen3-vl-rerank.

instruct string (Optional)

A custom instruction for the sorting task. This parameter applies only when you use qwen3-rerank or qwen3-vl-rerank. You can use this parameter to guide the model to apply different sorting policies. Examples:

  • Q&A retrieval task (default): "Given a web search query, retrieve relevant passages that answer the query."

    • Focus: Find answers to questions. The model prioritizes evaluating whether a document answers the question in the query.

    • Example: For the query "How to prevent a cold?", the document "Washing hands frequently is an effective way to prevent colds" receives a high score. The document "A cold is a common illness", although topically relevant, receives a significantly lower score because it does not provide an answer.

  • Semantic similarity sorting task: "Retrieve semantically similar text."

    • Focus: Determine semantic equivalence. The model evaluates whether the core meanings of the query and the document are consistent, regardless of specific wording or sentence structure.

    • Example: In a frequently asked questions (FAQ) scenario, the user query "How do I change my password?" and the candidate question "What if I forget my password?" are semantically similar and should receive a high score. The model focuses on whether both reflect the same user intent.

Write the instruction in English. If you do not specify this parameter, the model performs a Q&A retrieval task by default. For more task instructions, see the examples in the model repository.

fps float (Optional)

Supported only by qwen3-vl-rerank. Controls how many frames are extracted from a video; a smaller value extracts fewer frames. The value ranges from 0 to 1, and the default is 1.0.
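The placement rules for input and parameters described above can be sketched as a request-body builder. This is an illustration that mirrors the curl examples, not an official client: `build_rerank_body` and `FLAT_LAYOUT_MODELS` are hypothetical names, and the sketch does not check which optional parameters a given model supports.

```python
from typing import Any, Optional, Union

Document = Union[str, dict[str, str]]

# Assumption based on the curl examples: only qwen3-rerank uses the flat layout.
FLAT_LAYOUT_MODELS = {"qwen3-rerank"}


def build_rerank_body(
    model: str,
    query: str,
    documents: list[Document],
    top_n: Optional[int] = None,
    return_documents: Optional[bool] = None,
    instruct: Optional[str] = None,
    fps: Optional[float] = None,
) -> dict[str, Any]:
    """Build the JSON body, flat for qwen3-rerank and nested for other models."""
    # Keep only the optional parameters the caller actually set.
    candidates = {
        "top_n": top_n,
        "return_documents": return_documents,
        "instruct": instruct,
        "fps": fps,
    }
    options = {name: value for name, value in candidates.items() if value is not None}

    if model in FLAT_LAYOUT_MODELS:
        # query, documents, and options sit at the same level as model.
        return {"model": model, "query": query, "documents": documents, **options}

    # Other models nest query/documents under input and options under parameters.
    body: dict[str, Any] = {"model": model, "input": {"query": query, "documents": documents}}
    if options:
        body["parameters"] = options
    return body
```

The returned dictionary can be serialized with `json.dumps` and sent as the body of the POST request with the headers listed under Request headers.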

Response

Successful response

{
    "output": {
        "results": [
            {
                "document": {
                    "text": "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance."
                },
                "index": 0,
                "relevance_score": 0.9334521178273196
            },
            {
                "document": {
                    "text": "The development of pre-trained language models has brought new advancements to rerank models."
                },
                "index": 2,
                "relevance_score": 0.34100082626411193
            }
        ]
    },
    "usage": {
        "total_tokens": 79
    },
    "request_id": "85ba5752-1900-47d2-8896-23f99b13f6e1"
}

Failed response

If a request fails, the code and message fields in the response indicate the cause of the error.

{
    "code":"InvalidApiKey",
    "message":"Invalid API-key provided.",
    "request_id":"fb53c4ec-1c12-4fc4-a580-cdb7c3261fc1"
}

request_id string

The unique identifier of the request. Use it for tracing and troubleshooting.

output object

The task output.

Properties

results array

A list of sorting results, sorted by relevance_score in descending order.

Properties

document dict

The original document object. This is returned only when the return_documents request parameter is true. The structure is {"text": "Original document text"}.

index int

The original index of the corresponding document in the input documents list.

relevance_score double

The semantic relevance score between the document and the query. The value ranges from 0.0 to 1.0. A higher score indicates stronger relevance.

Note

This score is a relative value within the current request and is used primarily for sorting documents within this request. It cannot be used as an absolute value for comparison across different requests.
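When return_documents is false, the index field is the only link back to the input. The sketch below pairs each result with its original document. `attach_documents` is a hypothetical helper, and the response is assumed to be the parsed JSON of a successful HTTP response with the structure shown above.

```python
def attach_documents(documents: list[str], response: dict) -> list[tuple[float, str]]:
    """Pair each result's relevance_score with the input document it refers to."""
    ranked = []
    for item in response["output"]["results"]:
        # index is the position of the document in the original documents list.
        ranked.append((item["relevance_score"], documents[item["index"]]))
    return ranked


documents = [
    "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance.",
    "Quantum computing is a cutting-edge field of computer science.",
    "The development of pre-trained language models has brought new advancements to rerank models.",
]

# Same shape as the successful response above, without the optional document field.
sample_response = {
    "output": {
        "results": [
            {"index": 0, "relevance_score": 0.9334521178273196},
            {"index": 2, "relevance_score": 0.34100082626411193},
        ]
    },
    "usage": {"total_tokens": 79},
    "request_id": "85ba5752-1900-47d2-8896-23f99b13f6e1",
}

ranked = attach_documents(documents, sample_response)
```

Because the scores are relative to a single request, the sketch keeps them only for ordering within that request and does not merge results from different requests.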

usage object

Token usage statistics for the request.

Properties

total_tokens int

The total number of tokens consumed by the request.

code string

The error code. Returned only when the request fails. See error codes for details.

message string

Detailed error message. Returned only when the request fails. See error codes for details.
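One way to act on the code and message fields is to turn a failed response into an exception. This is a minimal sketch over the parsed JSON response: `RerankError` and `raise_for_error` are hypothetical names, and a missing or empty code is assumed to mean success.

```python
class RerankError(RuntimeError):
    """Raised when a rerank response carries an error code."""

    def __init__(self, code: str, message: str, request_id: str):
        super().__init__(f"{code}: {message} (request_id={request_id})")
        self.code = code
        self.message = message
        self.request_id = request_id


def raise_for_error(response: dict) -> dict:
    """Return the response unchanged on success; raise RerankError on failure."""
    code = response.get("code")
    if code:
        raise RerankError(code, response.get("message", ""), response.get("request_id", ""))
    return response
```

Keeping request_id on the exception makes it easy to include in logs when troubleshooting a failed call.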

Use the SDK

Example

The following example shows how to call the rerank model API.

The parameter names in the SDK are mostly consistent with those in the HTTP API, but the parameter structure is encapsulated. For example, the HTTP API uses nested input and parameters structures, while the SDK uses a flat structure. Note this difference during development.

import dashscope

def text_rerank():
    resp = dashscope.TextReRank.call(
        model="gte-rerank-v2",
        query="What is a rerank model?",
        documents=[
            "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance.",
            "Quantum computing is a cutting-edge field of computer science.",
            "The development of pre-trained language models has brought new advancements to rerank models."
        ],
        top_n=2,
        return_documents=True
    )
    print(resp)

if __name__ == '__main__':
    text_rerank()

Sample output

Note

The SDK encapsulates the original HTTP response. For a successful request, the SDK always returns the code and message fields with empty strings as their values.

{
    "status_code": 200,
    "request_id": "4b0805c0-6b36-490d-8bc1-4365f4c89905",
    "code": "",
    "message": "",
    "output": {
        "results": [
            {
                "index": 0,
                "relevance_score": 0.9334521178273196,
                "document": {
                    "text": "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance."
                }
            },
            {
                "index": 2,
                "relevance_score": 0.34100082626411193,
                "document": {
                    "text": "The development of pre-trained language models has brought new advancements to rerank models."
                }
            }
        ]
    },
    "usage": {
        "total_tokens": 79
    }
}

Error Codes

If the model call fails and returns an error message, see Error messages for resolution.