Find the right models for semantic search, Retrieval-Augmented Generation (RAG), cross-modal matching, and reranking.
Text embedding
For plain text search, RAG, or clustering, use text-embedding-v4. If you need to migrate existing v3 indexes, use text-embedding-v3 for dimension compatibility.
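As a minimal sketch of requesting text-embedding-v4 vectors, the snippet below assumes the model is reached through an OpenAI-compatible endpoint with the `openai` Python SDK. The base URL, the `DASHSCOPE_API_KEY` environment variable, and the helper names `build_embedding_request` and `embed` are assumptions for illustration and are not stated on this page; check them against your account's API reference.

```python
import os


def build_embedding_request(texts, dimensions=1024):
    """Assemble keyword arguments for an embeddings call (hypothetical helper)."""
    if not texts:
        raise ValueError("texts must contain at least one string")
    return {
        "model": "text-embedding-v4",
        "input": list(texts),
        "dimensions": dimensions,   # 1024 is the documented default
        "encoding_format": "float",
    }


def embed(texts, dimensions=1024):
    """Send the request and return one vector per input text."""
    # Imported lazily so the request builder works without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["DASHSCOPE_API_KEY"],  # assumed variable name
        # Assumed OpenAI-compatible endpoint; it may differ by region.
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
    response = client.embeddings.create(**build_embedding_request(texts, dimensions))
    return [item.embedding for item in response.data]
```

Calling `embed(["how do I reset my password?"], dimensions=512)` would return a list containing one 512-value vector, provided the endpoint and key are valid.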
How to choose a dimension
For large-scale search with limited storage: choose 256 or 512 dimensions.
For general-purpose use cases: choose 1024 dimensions (the default, offering a good balance).
For high retrieval accuracy: choose 1536 or 2048 dimensions.
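The storage side of this trade-off is simple arithmetic: raw index size grows linearly with the dimension. The sketch below assumes vectors are stored as 4-byte float32 values with no compression or index overhead; the function name and the 10-million-vector corpus are illustrative only.

```python
def index_size_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of a dense vector index, ignoring metadata and index overhead."""
    return num_vectors * dimensions * bytes_per_value


# Compare the dimension options above for a hypothetical 10-million-vector corpus.
for dims in (256, 512, 1024, 1536, 2048):
    gib = index_size_bytes(10_000_000, dims) / 2**30
    print(f"{dims:>4} dimensions: {gib:.1f} GiB")
```

Halving the dimension halves the raw storage, which is why 256 or 512 dimensions suit very large indexes.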
Multimodal embedding
For cross-modal retrieval, such as text-to-image or text-to-video search, choose between fused and independent vectors.
Fused vs. independent vectors
Fused vector: Fuses text and images into a single vector for mixed text-and-image retrieval. Use qwen3-vl-embedding.
Independent vector: Generates a separate vector for each modality. Suitable for cross-modal searches (text-to-image, image-to-image). Use tongyi-embedding-vision-plus.
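The difference between the two layouts can be sketched with toy data. The three-value vectors, item IDs, and function names below are made up for illustration and are not model output; real vectors would come from the embedding models named above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Independent vectors: each item keeps one vector per modality in a shared space.
independent_index = {
    "item-1": {"text": [0.9, 0.1, 0.0], "image": [0.8, 0.2, 0.1]},
    "item-2": {"text": [0.1, 0.9, 0.1], "image": [0.0, 0.7, 0.6]},
}

# Fused vectors: text and image are combined into a single vector per item.
fused_index = {
    "item-1": [0.85, 0.15, 0.05],
    "item-2": [0.05, 0.80, 0.35],
}


def search_independent(query_vec, modality):
    """Return the item whose vector for the chosen modality best matches the query."""
    return max(
        independent_index,
        key=lambda item: cosine(query_vec, independent_index[item][modality]),
    )


def search_fused(query_vec):
    """Return the item whose single fused vector best matches the query."""
    return max(fused_index, key=lambda item: cosine(query_vec, fused_index[item]))
```

With independent vectors, a text query vector can be matched against the `image` entries (text-to-image search). With fused vectors there is only one vector per item to compare against.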
Working with text-only data?
Use text-embedding-v4. It is faster and more cost-effective than the multimodal models, and it offers more dimension options. Multimodal embedding is intended for cross-modal retrieval such as text-to-image and text-to-video search.
Rerank
After embedding-based retrieval, use a rerank model to reorder the top-N results with cross-attention for higher accuracy.
Plain text rerank: Uses qwen3-rerank, with support for over 100 languages and up to 500 documents.
Multimodal reranking: Uses qwen3-vl-rerank to rerank a mix of text, images, and videos.
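The two-stage flow described here (embedding retrieval first, reranking second) can be sketched as follows. The word-overlap scorer is only a stand-in for a rerank model such as qwen3-rerank, and the documents, vectors, and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve(query_vec, doc_vecs, top_n):
    """Stage 1: fast candidate selection by embedding similarity."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:top_n]


def rerank(query, candidates, texts, score_fn, top_k):
    """Stage 2: reorder the candidates with a slower, more precise scorer."""
    ranked = sorted(candidates, key=lambda d: score_fn(query, texts[d]), reverse=True)
    return ranked[:top_k]


def toy_overlap_score(query, document):
    """Placeholder scorer (fraction of query words found in the document).
    A real pipeline would call the rerank model here instead."""
    q_words = set(query.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


texts = {
    "a": "reset your account password",
    "b": "password policy for new accounts",
    "c": "shipping rates and delivery times",
}
doc_vecs = {"a": [0.7, 0.3], "b": [0.8, 0.2], "c": [0.1, 0.9]}  # toy vectors

candidates = retrieve([1.0, 0.0], doc_vecs, top_n=2)
final = rerank("reset password", candidates, texts, toy_overlap_score, top_k=2)
```

In this toy run the embedding stage places document `b` ahead of `a`, and the rerank stage swaps them because `a` matches the query more closely. Only the top-N candidates reach the second stage, which keeps the more expensive scoring affordable.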
All models
| Model ID | Type | Dimension | Max tokens | Use case |
| text-embedding-v4 | Text embedding | 64–2048 (default: 1024) | 8,192 | Text search, RAG, clustering |
| text-embedding-v3 | Text embedding | 512–1024 (default: 1024) | 8,192 | Migrating existing v3 indexes |
| | Multimodal embedding | 64–1152 (default: 1152) | 1,024 | Cross-modal search (independent vector only) |
| | Multimodal embedding | 64–768 (default: 768) | 1,024 | Cost-sensitive cross-modal search (independent vector only) |
| qwen3-rerank | Rerank | - | 4,000 per item | Reranking text search results, RAG |
| qwen3-vl-embedding | Multimodal embedding | 256–2560 (default: 2560) | 32,000 | Mixed text-and-image retrieval (fused vector + independent vector) |
| qwen3-vl-rerank | Rerank | - | 8,000 per item | Reranking multimodal search results |