Alibaba Cloud Model Studio: Embedding and rerank

Last Updated: Jun 15, 2026

Find the right models for semantic search, Retrieval-Augmented Generation (RAG), cross-modal matching, and reranking.

Text embedding

For plain text search, RAG, or clustering, use text-embedding-v4. If you maintain indexes that were built with text-embedding-v3, use text-embedding-v3 so that new vectors stay dimension-compatible with the ones already stored.

How to choose a dimension

  • For large-scale search with limited storage: choose 256 or 512 dimensions.

  • For general-purpose use cases: choose 1024 dimensions (the default, which balances storage cost and retrieval accuracy).

  • For high retrieval accuracy: choose 1536 or 2048 dimensions.
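
The storage side of this trade-off can be estimated with simple arithmetic. The sketch below assumes vectors are stored as uncompressed float32 values (4 bytes per dimension) and ignores any index overhead. The function name and the 10-million-vector corpus are illustrative and do not come from the service.

```python
def index_size_bytes(num_vectors: int, dimension: int, bytes_per_value: int = 4) -> int:
    """Raw size of a dense vector index.

    Assumption: each dimension is stored as one float32 (4 bytes) and the
    index adds no overhead. Real vector stores usually add some.
    """
    return num_vectors * dimension * bytes_per_value


if __name__ == "__main__":
    corpus_size = 10_000_000  # hypothetical corpus of 10 million chunks
    for dim in (256, 512, 1024, 1536, 2048):
        gib = index_size_bytes(corpus_size, dim) / 2**30
        print(f"{dim:>4} dimensions: {gib:.1f} GiB")
```

Under these assumptions, a 10-million-vector index takes roughly 9.5 GiB at 256 dimensions, 38.1 GiB at 1024, and 76.3 GiB at 2048, so each doubling of the dimension doubles the raw storage.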

Multimodal embedding

For cross-modal retrieval, such as text-to-image or text-to-video search, choose between fused and independent vectors.

Fused vs. independent vectors

  • Fused vector: Fuses text and images into a single vector for mixed text-and-image retrieval. Use qwen3-vl-embedding.

  • Independent vector: Generates a separate vector for each modality. Suitable for searches such as text-to-image and image-to-image. Use tongyi-embedding-vision-plus.
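
With independent vectors, a text query vector is compared directly against stored image vectors, which is possible because both live in the same vector space. The following is a minimal sketch of that comparison using cosine similarity. The three-dimensional vectors are hand-written toy values, not model output, and the names `image_index` and `search` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=3):
    """Return the top_k (item_id, score) pairs, highest similarity first."""
    scored = [(item_id, cosine_similarity(query_vector, vec))
              for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy image vectors; a real index would hold vectors returned by the
# multimodal embedding model for each image.
image_index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Toy vector standing in for the embedding of a text query.
text_query_vector = [0.8, 0.2, 0.1]

results = search(text_query_vector, image_index, top_k=2)
```

Here `results` ranks `cat.jpg` first and `dog.jpg` second. A fused vector would instead represent a text-and-image pair as a single entry in the index.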

Working with text-only data?

Use text-embedding-v4. It is faster, more cost-effective, and offers more dimension options. Multimodal embedding is intended for cross-modal retrieval such as text-to-image and text-to-video search.

Rerank

After embedding-based retrieval, use a rerank model to reorder the top-N results. The rerank model scores each query-result pair with cross-attention, which yields a more accurate ranking than vector similarity alone.

  • Plain text rerank: Use qwen3-rerank, which supports more than 100 languages and up to 500 documents.

  • Multimodal rerank: Use qwen3-vl-rerank to rerank a mix of text, images, and videos.
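
The second stage of a retrieve-then-rerank pipeline can be sketched as follows. This is an illustration under stated assumptions, not the service's API: `rerank` and `word_overlap` are hypothetical names, and `word_overlap` is a toy scorer standing in for the relevance score a rerank model would return. The only value taken from the text above is the 500-document limit for qwen3-rerank.

```python
MAX_RERANK_DOCS = 500  # document limit stated above for qwen3-rerank


def rerank(query, documents, score_fn, top_n=5):
    """Reorder candidate documents by score_fn(query, document), best first.

    score_fn is a placeholder for a call to a rerank model.
    Ties keep the original retrieval order.
    """
    if len(documents) > MAX_RERANK_DOCS:
        raise ValueError(
            f"got {len(documents)} documents, limit is {MAX_RERANK_DOCS}"
        )
    scored = [(score_fn(query, doc), i, doc) for i, doc in enumerate(documents)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc for _, _, doc in scored[:top_n]]


def word_overlap(query, document):
    """Toy relevance score: fraction of query words found in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


# Candidates as they might come back from embedding-based retrieval.
candidates = [
    "Embedding models map text to vectors",
    "How to rerank search results",
    "Rerank models reorder search results for a query",
]

top = rerank("rerank models for search results", candidates, word_overlap, top_n=2)
```

In this toy run the third candidate moves to first place because it contains every query word, and the first candidate is dropped. In a real pipeline the embedding stage would typically return more candidates than are shown to the user, and the rerank model would select the final top N.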

All models

| Model ID | Type | Dimension | Max tokens | Use case |
| --- | --- | --- | --- | --- |
| text-embedding-v4 | Text embedding | 64–2048 (default: 1024) | 8,192 | Text search, RAG, clustering |
| text-embedding-v3 | Text embedding | 512–1024 (default: 1024) | 8,192 | Migrating existing v3 indexes |
| tongyi-embedding-vision-plus | Multimodal embedding | 64–1152 (default: 1152) | 1,024 | Cross-modal search (independent vector only) |
| tongyi-embedding-vision-flash | Multimodal embedding | 64–768 (default: 768) | 1,024 | Cost-sensitive cross-modal search (independent vector only) |
| qwen3-rerank | Rerank | - | 4,000 per item | Reranking text search results, RAG |
| qwen3-vl-embedding | Multimodal embedding | 256–2560 (default: 2560) | 32,000 | Mixed text-and-image retrieval (fused vector + independent vector) |
| qwen3-vl-rerank | Rerank | - | 8,000 per item | Reranking multimodal search results |
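
The values in this table can be encoded as a small lookup so that a requested dimension is checked before a request is sent. This is a minimal sketch: `ModelSpec`, `MODELS`, and `resolve_dimension` are hypothetical names, and the check assumes any integer inside the listed range is acceptable. The service may accept only specific values within each range, so treat the range check as a first filter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    kind: str
    min_dim: Optional[int]      # None for rerank models (no output vector)
    max_dim: Optional[int]
    default_dim: Optional[int]
    max_tokens: int             # per item for rerank models


# Values copied from the table above.
MODELS = {
    "text-embedding-v4": ModelSpec("text embedding", 64, 2048, 1024, 8192),
    "text-embedding-v3": ModelSpec("text embedding", 512, 1024, 1024, 8192),
    "tongyi-embedding-vision-plus": ModelSpec("multimodal embedding", 64, 1152, 1152, 1024),
    "tongyi-embedding-vision-flash": ModelSpec("multimodal embedding", 64, 768, 768, 1024),
    "qwen3-vl-embedding": ModelSpec("multimodal embedding", 256, 2560, 2560, 32000),
    "qwen3-rerank": ModelSpec("rerank", None, None, None, 4000),
    "qwen3-vl-rerank": ModelSpec("rerank", None, None, None, 8000),
}


def resolve_dimension(model_id: str, dimension: Optional[int] = None) -> int:
    """Return the dimension to request, falling back to the model default."""
    spec = MODELS[model_id]
    if spec.default_dim is None:
        raise ValueError(f"{model_id} is a rerank model and has no output dimension")
    if dimension is None:
        return spec.default_dim
    if not spec.min_dim <= dimension <= spec.max_dim:
        raise ValueError(
            f"{model_id} lists dimensions {spec.min_dim} to {spec.max_dim}, got {dimension}"
        )
    return dimension
```

For example, `resolve_dimension("text-embedding-v4")` falls back to the default of 1024, while `resolve_dimension("text-embedding-v3", 256)` raises an error because 256 is below the range listed for v3.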