A rerank model re-scores documents returned by initial retrieval, surfacing the most relevant results at the top.
Model overview
The gte-rerank model will be discontinued on May 30, 2026. Switch to qwen3-rerank.
Singapore
| Model | Max documents | Max input tokens per item | Max input tokens per request | Supported languages | Scenarios |
| --- | --- | --- | --- | --- | --- |
| qwen3-rerank | 500 | 4,000 | 120,000 | Over 100 major languages, such as Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian | |
Beijing
| Model | Max documents | Max input tokens per item | Max input tokens per request | Supported languages | Scenarios |
| --- | --- | --- | --- | --- | --- |
| qwen3-vl-rerank | Text: 100; Image: 40; Video: 4 | 8,000 | 120,000 | 33 major languages, such as Chinese, English, Japanese, Korean, French, and German | |
| gte-rerank-v2 | 500 | 4,000 | 30,000 | Over 50 languages, such as Chinese, English, Japanese, Korean, Thai, Spanish, French, Portuguese, German, Indonesian, and Arabic | |
- Max input tokens per item: Maximum tokens per query or document. Exceeding this limit triggers truncation, which may reduce ranking accuracy.
- Max documents: Maximum documents per request. For qwen3-vl-rerank, the limit varies by document type (text, image, video, or mixed).
- Max input tokens per request: Calculated as (query tokens × number of documents) + total document tokens. The result must not exceed the per-request limit.
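The per-request formula above can be sketched as a small pre-flight check. This is a minimal illustration, not part of the API: the function names are hypothetical, and the token counts are assumed to come from whatever tokenizer you use to estimate input size.

```python
def request_token_count(query_tokens: int, document_tokens: list[int]) -> int:
    """Apply the formula: query tokens x number of documents + total document tokens."""
    return query_tokens * len(document_tokens) + sum(document_tokens)


def fits_request_limit(query_tokens: int, document_tokens: list[int],
                       max_request_tokens: int) -> bool:
    """Return True if the estimated request size stays within the per-request limit."""
    return request_token_count(query_tokens, document_tokens) <= max_request_tokens


# A 20-token query against three documents of 300, 450, and 1,200 tokens:
# 20 * 3 + (300 + 450 + 1200) = 2010
print(request_token_count(20, [300, 450, 1200]))

# Checked against the 30,000-token limit listed for gte-rerank-v2.
print(fits_request_limit(20, [300, 450, 1200], 30_000))
```

Because the query is counted once per document, a long query grows the total quickly as the number of documents increases.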
Input limitations
| Model | Image | Video |
| --- | --- | --- |
| qwen3-vl-rerank | JPEG, PNG, WEBP, BMP, TIFF, ICO, DIB, ICNS, and SGI (URL or Base64 supported) | MP4, AVI, and MOV (URL only) |
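A client-side check against these limits might look like the sketch below. It is only a heuristic under stated assumptions: the helper name `check_media` is hypothetical, the mapping from format names to file extensions is my own, and any image value that is not an http(s) URL is assumed to be Base64 and is not inspected further.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed extension mapping for the formats listed in the table above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tif",
                    ".tiff", ".ico", ".dib", ".icns", ".sgi"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov"}


def is_url(value: str) -> bool:
    """Treat only http(s) values as URLs."""
    return urlparse(value).scheme in ("http", "https")


def check_media(kind: str, value: str):
    """Return a problem description, or None if no problem was detected."""
    if kind == "video":
        if not is_url(value):
            return "video input must be a URL"
        allowed = VIDEO_EXTENSIONS
    elif kind == "image":
        if not is_url(value):
            return None  # assumed to be Base64; the format cannot be checked here
        allowed = IMAGE_EXTENSIONS
    else:
        return f"unknown media kind: {kind}"
    extension = PurePosixPath(urlparse(value).path).suffix.lower()
    if extension not in allowed:
        return f"unsupported {kind} extension: {extension or '(none)'}"
    return None


print(check_media("video", "https://example.com/clip.mkv"))
```

URLs without a file extension are flagged by this check even though the service may accept them, so treat a reported problem as a warning rather than a hard failure.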
Prerequisites
Create an API key and set it as an environment variable. To use the SDK, install the DashScope SDK.
HTTP
Each model uses a different endpoint:
- qwen3-rerank: POST https://dashscope.aliyuncs.com/compatible-api/v1/reranks
- qwen3-vl-rerank / gte-rerank-v2: POST https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank
The two APIs differ in request body structure and response format. See the request and response examples for each model.
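Since the endpoint depends on the model, client code can keep the mapping in one place. The sketch below uses the two URLs listed above; the names `ENDPOINTS` and `endpoint_for` are hypothetical helpers, not part of any SDK.

```python
COMPATIBLE_ENDPOINT = "https://dashscope.aliyuncs.com/compatible-api/v1/reranks"
NATIVE_ENDPOINT = (
    "https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank"
)

# Model name -> endpoint, as listed in this section.
ENDPOINTS = {
    "qwen3-rerank": COMPATIBLE_ENDPOINT,
    "qwen3-vl-rerank": NATIVE_ENDPOINT,
    "gte-rerank-v2": NATIVE_ENDPOINT,
}


def endpoint_for(model: str) -> str:
    """Return the POST endpoint for a rerank model, or raise for unknown names."""
    try:
        return ENDPOINTS[model]
    except KeyError:
        raise ValueError(f"unsupported rerank model: {model}") from None


print(endpoint_for("qwen3-rerank"))
```

Failing fast on an unknown model name avoids sending a request body in the wrong format to the wrong endpoint.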
Request
Request headers
- Content-Type: The content type of the request. Must be
- Authorization: Authenticates the request with a Model Studio API key. Example: Bearer sk-xxxx.
Request body
- model: The model name. Supported values: qwen3-rerank, gte-rerank-v2, qwen3-vl-rerank.
- input: Input content. For
- parameters: Optional parameters. For
Response
Successful response: qwen3-rerank; qwen3-vl-rerank / gte-rerank-v2
Failed response: If a request fails,
- request_id: Unique request identifier for tracing and troubleshooting.
- output: Task output. For
- usage: Token usage statistics.
- code: Error code. Returned only for failed requests. See Error codes.
- message: Detailed error message. Returned only for failed requests. See Error codes.
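As a rough illustration of the request described above, the sketch below builds a POST request for the qwen3-vl-rerank / gte-rerank-v2 endpoint with Python's standard library. Several details are assumptions rather than documented facts: the `Content-Type` value `application/json`, the placement of `query` and `documents` inside `input`, and the placement of `top_n` and `return_documents` inside `parameters`. The helper names are hypothetical. The qwen3-rerank endpoint uses a different body structure that is not covered here.

```python
import json
import urllib.request

NATIVE_ENDPOINT = (
    "https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank"
)


def build_rerank_request(model, query, documents, api_key,
                         top_n=None, return_documents=None):
    """Build (but do not send) a rerank request with nested input/parameters."""
    body = {"model": model, "input": {"query": query, "documents": documents}}
    parameters = {}
    if top_n is not None:
        parameters["top_n"] = top_n
    if return_documents is not None:
        parameters["return_documents"] = return_documents
    if parameters:
        body["parameters"] = parameters
    return urllib.request.Request(
        NATIVE_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",  # assumed value
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To try it, read the API key from the environment variable you configured in Prerequisites, pass it as `api_key`, and call `send()` on the built request.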
Use the SDK
Example
Call the rerank model API.
SDK parameter names match the HTTP API, but the structure differs. HTTP uses nested input and parameters objects; the SDK uses a flat structure.
import dashscope


def text_rerank():
    # Rerank three candidate documents against a text query.
    resp = dashscope.TextReRank.call(
        model="gte-rerank-v2",
        query="What is a rerank model?",
        documents=[
            "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance.",
            "Quantum computing is a cutting-edge field of computer science.",
            "The development of pre-trained language models has brought new advancements to rerank models."
        ],
        top_n=2,                # return only the two highest-scoring documents
        return_documents=True   # include the document text in each result
    )
    print(resp)


if __name__ == '__main__':
    text_rerank()

Use qwen3-vl-rerank for multimodal reranking with an image query.
import dashscope
from http import HTTPStatus
import json


def vl_rerank():
    # Rerank a mix of text, image, and video documents against an image query.
    resp = dashscope.TextReRank.call(
        model="qwen3-vl-rerank",
        query={"image": "https://img.alicdn.com/imgextra/i3/O1CN01rdstgY1uiZWt8gqSL_!!6000000006071-0-tps-1970-356.jpg"},
        documents=[
            {"text": "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance."},
            {"image": "https://img.alicdn.com/imgextra/i3/O1CN01rdstgY1uiZWt8gqSL_!!6000000006071-0-tps-1970-356.jpg"},
            {"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4"}
        ],
        top_n=2,
        return_documents=True
    )
    if resp.status_code == HTTPStatus.OK:
        # Pretty-print the successful response.
        print(json.dumps(resp, default=str, ensure_ascii=False, indent=4))
    else:
        # Print the raw response, which carries the error code and message.
        print(resp)


if __name__ == '__main__':
    vl_rerank()

Sample output
The SDK wraps the HTTP response. For successful requests, code and message are always empty strings.
{
    "status_code": 200,
    "request_id": "4b0805c0-6b36-490d-8bc1-4365f4c89905",
    "code": "",
    "message": "",
    "output": {
        "results": [
            {
                "index": 0,
                "relevance_score": 0.9334521178273196,
                "document": {
                    "text": "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance."
                }
            },
            {
                "index": 2,
                "relevance_score": 0.34100082626411193,
                "document": {
                    "text": "The development of pre-trained language models has brought new advancements to rerank models."
                }
            }
        ]
    },
    "usage": {
        "total_tokens": 79
    }
}
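In the sample output, each result's index matches the position of the document in the submitted list (index 2 is the third document). A minimal sketch of mapping results back to the original documents follows. It assumes the response is available as a plain dict shaped like the sample; `ranked_documents` and its `min_score` cutoff are hypothetical helpers, not SDK features.

```python
def ranked_documents(response, documents, min_score=0.0):
    """Return (score, document) pairs, best first, from a rerank response dict."""
    # A failed call carries a non-empty code and message instead of results.
    if response.get("code"):
        raise RuntimeError(f"{response['code']}: {response.get('message', '')}")
    pairs = [
        (item["relevance_score"], documents[item["index"]])
        for item in response["output"]["results"]
        if item["relevance_score"] >= min_score
    ]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


documents = [
    "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance.",
    "Quantum computing is a cutting-edge field of computer science.",
    "The development of pre-trained language models has brought new advancements to rerank models.",
]

# Trimmed copy of the sample output above.
sample = {
    "code": "",
    "message": "",
    "output": {
        "results": [
            {"index": 0, "relevance_score": 0.9334521178273196},
            {"index": 2, "relevance_score": 0.34100082626411193},
        ]
    },
}

for score, text in ranked_documents(sample, documents, min_score=0.5):
    print(f"{score:.3f}  {text}")
```

Looking documents up by index means the call works even when return_documents is left off, which keeps responses smaller for long documents.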
Error codes
If the model call fails and returns an error message, see Error codes for resolution.