
Alibaba Cloud Model Studio: Embedding

Last Updated: Mar 25, 2026

Embedding models convert data such as text, images, and videos into vectors for downstream tasks, including semantic search, recommendation, clustering, classification, and anomaly detection.

Preparations

Get an API key and export the API key as an environment variable. If you use the OpenAI SDK or DashScope SDK to make calls, install the SDK.
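For example, on Linux or macOS you can export the key and install the SDKs as follows (the key below is a placeholder; replace it with your own):

```shell
# Export your API key as an environment variable (replace sk-xxx with your own key).
export DASHSCOPE_API_KEY="sk-xxx"

# Install the SDKs if you call the models from Python.
pip install -U openai dashscope
```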

Get embeddings

Text embedding

To call the API, specify the text to embed and the model in the request.

OpenAI compatible interface

Python

import os
from openai import OpenAI

input_text = "The quality of the clothes is excellent"

client = OpenAI(
    # API keys differ by Region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If an environment variable is not configured, replace this with your API key.
    # This is the URL for the Singapore Region. To use a model in the China (Beijing) Region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.embeddings.create(
    model="text-embedding-v4",
    input=input_text
)

print(completion.model_dump_json())

Node.js

const OpenAI = require("openai");

// Initialize the OpenAI client
const openai = new OpenAI({
    // If an environment variable is not configured, replace this with your API key.
    // API keys differ by Region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    apiKey: process.env.DASHSCOPE_API_KEY, 
    // This is the URL for the Singapore Region. To use a model in the China (Beijing) Region, replace the baseURL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});

async function getEmbedding() {
    try {
        const inputTexts = "The quality of the clothes is excellent";
        const completion = await openai.embeddings.create({
            model: "text-embedding-v4",
            input: inputTexts,
            dimensions: 1024 // Specifies the vector dimension. This parameter is supported only by text-embedding-v3 and text-embedding-v4.
        });

        console.log(JSON.stringify(completion, null, 2));
    } catch (error) {
        console.error('Error:', error);
    }
}

getEmbedding();

curl

# For the China (Beijing) Region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/embeddings
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/embeddings' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "text-embedding-v4",
    "input": "The quality of the clothes is excellent"
}'
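The endpoint returns JSON in the standard OpenAI embeddings response shape. As a sketch of how to pull the vector out of a parsed response (the sample payload below is illustrative, not a real model response):

```python
import json

# Illustrative sample in the OpenAI-compatible response shape (values are made up).
sample = '''{
  "data": [{"object": "embedding", "index": 0, "embedding": [0.12, -0.03, 0.57]}],
  "model": "text-embedding-v4",
  "object": "list",
  "usage": {"prompt_tokens": 7, "total_tokens": 7}
}'''

resp = json.loads(sample)
vector = resp["data"][0]["embedding"]  # the embedding vector itself
print(len(vector), resp["usage"]["total_tokens"])
```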

DashScope

Python

import dashscope
from http import HTTPStatus

# If you use a model in the China (Beijing) Region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

input_text = "The quality of the clothes is excellent"
resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input=input_text,
)

if resp.status_code == HTTPStatus.OK:
    print(resp)
else:
    print(f"Request failed: {resp.code} - {resp.message}")

Java

import com.alibaba.dashscope.embeddings.TextEmbedding;
import com.alibaba.dashscope.embeddings.TextEmbeddingParam;
import com.alibaba.dashscope.embeddings.TextEmbeddingResult;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.Constants;

import java.util.Collections;
public class Main {
    static {
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
        // For the China (Beijing) Region, replace it with: https://dashscope.aliyuncs.com/api/v1
    }
     public static void main(String[] args) {
        String inputTexts = "The quality of the clothes is excellent";
        try {
            // Build the request parameters
            TextEmbeddingParam param = TextEmbeddingParam
                    .builder()
                    .model("text-embedding-v4")
                    // Input text
                    .texts(Collections.singleton(inputTexts))
                    .build();

            // Create a model instance and call it
            TextEmbedding textEmbedding = new TextEmbedding();
            TextEmbeddingResult result = textEmbedding.call(param);

            // Print the result
            System.out.println(result);

        } catch (NoApiKeyException e) {
            // Catch and handle the exception for an unset API key
            System.err.println("An exception occurred during the API call: " + e.getMessage());
            System.err.println("Check if your API key is configured correctly.");
            e.printStackTrace();
        }
    }
}

curl

# For the China (Beijing) Region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/embeddings/text-embedding/text-embedding
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/embeddings/text-embedding/text-embedding' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "text-embedding-v4",
    "input": {
        "texts": [
        "The quality of the clothes is excellent"
        ]
    }
}'

Independent multimodal vector

This feature generates an independent vector for each piece of content, across modalities such as text, image, and video. It is ideal for use cases that process each content type separately.

To generate independent multimodal vectors, use the DashScope SDK or call the API directly. This feature is not supported by the OpenAI-compatible interface or the console.

Python

import dashscope
import json
import os

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# The base URL above is for the Singapore Region. To use a model in the China (Beijing) Region, replace it with: https://dashscope.aliyuncs.com/api/v1

# The input can be a video
# video = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4"
# input = [{'video': video}]
# or an image
image = "https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png"
input = [{'image': image}]
resp = dashscope.MultiModalEmbedding.call(
    # If an environment variable is not configured, provide your Model Studio API key directly, for example: api_key="sk-xxx"
    # API keys differ by Region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model="tongyi-embedding-vision-plus",
    input=input
)

print(json.dumps(resp.output, indent=4))

Java

import com.alibaba.dashscope.embeddings.MultiModalEmbedding;
import com.alibaba.dashscope.embeddings.MultiModalEmbeddingItemImage;
import com.alibaba.dashscope.embeddings.MultiModalEmbeddingItemVideo;
import com.alibaba.dashscope.embeddings.MultiModalEmbeddingParam;
import com.alibaba.dashscope.embeddings.MultiModalEmbeddingResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

import java.util.Collections;

public class Main {
    static {
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
        // For the China (Beijing) Region, replace it with: https://dashscope.aliyuncs.com/api/v1
    }
    public static void main(String[] args) {
        try {
            MultiModalEmbedding embedding = new MultiModalEmbedding();
            // The input can be a video
            // MultiModalEmbeddingItemVideo video = new MultiModalEmbeddingItemVideo(
            //     "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4");
            // or an image
            MultiModalEmbeddingItemImage image = new MultiModalEmbeddingItemImage(
                "https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png");

            MultiModalEmbeddingParam param = MultiModalEmbeddingParam.builder()
                // If an environment variable is not configured, add your Model Studio API key, for example: .apiKey("sk-xxx")
                .model("tongyi-embedding-vision-plus")
                .contents(Collections.singletonList(image))
                .build();

            MultiModalEmbeddingResult result = embedding.call(param);
            System.out.println(result);

        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.err.println("An exception occurred during the API call: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Multimodal fused vector

This feature combines content from different modalities, such as text, image, and video, into a single fused vector. It is suitable for use cases like text-to-image search, image-to-image search, text-to-video search, and cross-modal retrieval.

To generate a multimodal fused vector, use the Python DashScope SDK or call the API directly. This feature is not supported by the OpenAI-compatible interface, the Java DashScope SDK, or the console.
  • qwen3-vl-embedding: Supports both fused and independent vectors. It generates a fused vector when text, image, and video are placed in the same input object, and an independent vector for each modality when they are provided as separate elements.

  • qwen2.5-vl-embedding: Supports only fused vectors and does not support independent vectors.

  • tongyi-embedding-vision-plus-2026-03-06 and tongyi-embedding-vision-flash-2026-03-06: Support both fused and independent vectors. Create a fused vector by placing text, image, and video in the same content object.

Python

import dashscope
import json
import os

# Multimodal Fused Vector: Combines text, image, and video into one Fused Vector.
# Suitable for use cases like cross-modal retrieval and image search.
text = "This is a test text for generating a Multimodal Fused Vector."
image = "https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png"
video = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4"

# The input contains text, image, and video, which the model fuses into one Fused Vector.
input_data = [
    {
        "text": text,
        "image": image,
        "video": video
    }
]

# Use qwen3-vl-embedding to generate a Fused Vector
resp = dashscope.MultiModalEmbedding.call(
    # If an environment variable is not configured, provide your Model Studio API key directly, for example: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-vl-embedding",
    input=input_data,
    # Optional parameter: Specifies the vector dimension. Supported values: 2560, 2048, 1536, 1024, 768, 512, 256. Default: 2560
    # dimension = 1024
)

print(json.dumps(resp.output, indent=4))

Java (HTTP)

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Main {
    public static void main(String[] args) throws Exception {
        // If an environment variable is not configured, provide your Model Studio API key directly, for example: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Multimodal Fused Vector: Combines text, image, and video into one Fused Vector.
        // When text, image, and video are placed in the same object in the input, the model generates a Fused Vector.
        String requestBody = "{"
                + "\"model\": \"qwen3-vl-embedding\","
                + "\"input\": {"
                + "  \"contents\": [{"
                + "    \"text\": \"This is a test text for generating a Multimodal Fused Vector.\","
                + "    \"image\": \"https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png\","
                + "    \"video\": \"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4\""
                + "  }]"
                + "}"
                + "}";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://dashscope.aliyuncs.com/api/v1/services/embeddings/multimodal-embedding/multimodal-embedding"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody))
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}

Model selection

The appropriate model depends on your input data type and use case.

  • For plain text or code: We recommend using text-embedding-v4. It is our highest-performing model, supports advanced features like task instructions and sparse vectors, and covers most text processing scenarios.

  • For multimodal content:

    • Fused multimodal embeddings: To represent single-modal or mixed-modal inputs as a fused embedding for scenarios like cross-modal retrieval and image search, use qwen3-vl-embedding, tongyi-embedding-vision-plus-2026-03-06, or tongyi-embedding-vision-flash-2026-03-06. For example, input an image of a shirt with the text "Find a similar style that looks more youthful." The model fuses the image and text instruction into a single embedding.

    • Independent embeddings: To generate an independent embedding for each input, such as an image and its corresponding text title, select tongyi-embedding-vision-plus, tongyi-embedding-vision-flash, tongyi-embedding-vision-plus-2026-03-06, tongyi-embedding-vision-flash-2026-03-06, or the general-purpose multimodal model multimodal-embedding-v1.

The table below lists the specifications for all embedding models.

Text embedding

Singapore

text-embedding-v4 (part of the Qwen3-Embedding series)

  • Embedding dimensions: 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, 64

  • Batch size: 10

  • Max tokens per batch (Note): 8,192

  • Price (per 1M input tokens): $0.07

  • Language: 100+ major languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian

  • Free quota (Note): 1 million tokens, valid for 90 days after you activate Model Studio

text-embedding-v3

  • Embedding dimensions: 1,024 (default), 768, 512

  • Language: 50+ major languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian

  • Free quota (Note): 500,000 tokens, valid for 90 days after you activate Model Studio

China (Beijing)

text-embedding-v4 (part of the Qwen3-Embedding series)

  • Embedding dimensions: 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, 64

  • Batch size: 10

  • Max tokens per batch (Note): 8,192

  • Price (per 1M input tokens): $0.072

  • Language: 100+ major languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, as well as multiple programming languages

China (Hong Kong)

text-embedding-v4 (part of the Qwen3-Embedding series)

  • Embedding dimensions: 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, 64

  • Batch size: 10

  • Max tokens per batch (Note): 8,192

  • Price (per 1M input tokens): $0.07

  • Language: 100+ major languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, and Russian, as well as multiple programming languages

Note

Batch size is the maximum number of texts processed in a single API call. For example, text-embedding-v4 has a batch size of 10, which means you can include up to 10 texts in a single request, and each text cannot exceed 8,192 tokens. This limit applies to:

  • String array input: The array can contain up to 10 elements.

  • File input: The text file can contain up to 10 lines of text.
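A simple way to stay within this limit is to chunk inputs client-side before calling the API. A minimal sketch (the helper name is ours, not part of any SDK):

```python
def batched(texts, batch_size=10):
    """Yield successive slices of at most batch_size texts."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

# 23 texts split into batches of 10, 10, and 3 for text-embedding-v4.
batches = list(batched([f"text {i}" for i in range(23)]))
print([len(b) for b in batches])
```

Each batch can then be passed as one `input` array in a separate embedding request.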

Multimodal embedding

The model generates vector embeddings from text, image, or video inputs. Use these embeddings for video classification, image classification, image-text retrieval, text-to-image search, and text-to-video search.

The API supports single inputs of text, image, or video, and combined inputs like text and images. Some models support multiple inputs of the same type, such as multiple images in one request. For specific restrictions, see the limitations for each model.

Singapore

tongyi-embedding-vision-plus

  • Embedding dimensions: 1152 (default), 1024, 512, 256, 128, 64

  • Text length limit: 1,024 tokens

  • Image size limit: up to 3 MB per image

  • Video size limit: up to 10 MB per video file

  • Price: $0.09 (image/video), $0.09 (text)

  • Free quota (Note): 1 million tokens, valid for 90 days after activating Model Studio

tongyi-embedding-vision-flash

  • Embedding dimensions: 768 (default), 512, 256, 128, 64

  • Price: $0.03 (image/video), $0.09 (text)

China (Beijing)

qwen3-vl-embedding

  • Embedding dimensions: 2560 (default), 2048, 1536, 1024, 768, 512, 256

  • Text length limit: 32,000 tokens

  • Image size limit: max. 1 image, up to 5 MB

  • Video size limit: up to 50 MB per video file

  • Price: $0.258 (image/video), $0.1 (text)

multimodal-embedding-v1

  • Embedding dimensions: 1024

  • Text length limit: 512 tokens

  • Image size limit: up to 8 images, 3 MB each

  • Video size limit: up to 10 MB per video file

  • Price: free trial

Input and language restrictions:

Fused multimodal model

qwen3-vl-embedding

  • Text: supports 33 major languages, including Chinese, English, Japanese, Korean, French, and German.

  • Image: JPEG, PNG, WEBP, BMP, TIFF, ICO, DIB, ICNS, and SGI (URL or Base64 supported).

  • Video: MP4, AVI, and MOV (URL only).

  • Request limit: max 20 elements per request (up to 5 images).

Independent multimodal model

tongyi-embedding-vision-plus and tongyi-embedding-vision-flash

  • Text: Chinese and English.

  • Image: JPG, PNG, and BMP (URL or Base64 supported).

  • Video: MP4, MPEG, AVI, MOV, MPG, WEBM, FLV, and MKV (URL only).

  • Request limit: no element count limit; requests are limited by the token count per batch.

multimodal-embedding-v1

  • Request limit: up to 20 content elements per request, including a maximum of 1 image and 1 video.

Core features

Custom embedding dimension

The text-embedding-v4, text-embedding-v3, tongyi-embedding-vision-plus, tongyi-embedding-vision-flash, and qwen3-vl-embedding models support custom vector dimensions. A higher dimension preserves more semantic information but also increases storage and compute costs.

  • General use cases (recommended): A dimension of 1024 offers an optimal balance between performance and cost, making it suitable for most semantic search tasks.

  • High-precision requirements: For high-precision applications, select a dimension of 1536 or 2048. This improves precision but also significantly increases storage and compute overhead.

  • Resource-constrained environments: In cost-sensitive scenarios, select a dimension of 768 or lower. This significantly reduces resource consumption at the cost of some semantic information.
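To make the storage side of this trade-off concrete, here is a rough estimate assuming vectors are stored as 4-byte float32 values (a common but not universal choice; real vector databases add index overhead on top):

```python
def storage_mb(num_vectors, dims, bytes_per_value=4):
    """Approximate raw storage for float32 vectors, in MiB."""
    return num_vectors * dims * bytes_per_value / (1024 * 1024)

# 1 million vectors at several dimensions:
for d in (256, 768, 1024, 2048):
    print(f"{d} dims: {storage_mb(1_000_000, d):.0f} MiB")
```

Doubling the dimension doubles raw vector storage, which is why lower dimensions are attractive in cost-sensitive deployments.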

OpenAI compatible interface

import os
from openai import OpenAI

client = OpenAI(
    # API Keys differ by Region. To get an API Key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore Region. If you use a model in the China (Beijing) Region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.embeddings.create(
    model="text-embedding-v4",
    input=["I like it and will buy from here again"],
    # Set the vector dimension to 256
    dimensions=256
)
print(f"Vector dimension: {len(resp.data[0].embedding)}")

DashScope

import dashscope

# If you use a model in the China (Beijing) Region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input=["I like it and will buy from here again"],
    # Set the vector dimension to 256
    dimension=256
)

print(f"Vector dimension: {len(resp.output['embeddings'][0]['embedding'])}")

Query and document text (text_type)

This parameter is only available through the DashScope SDK and API.

To achieve optimal results in search-related tasks, process different types of content based on their role. The text_type parameter is designed for this purpose:

  • text_type: 'query': Use for the query text that the user inputs. The model generates a "title-like" vector that is more directional and optimized for asking and finding.

  • text_type: 'document' (default): Use for the document text stored in your knowledge base. The model generates a "body-like" vector that contains more comprehensive information and is optimized for matching.

When you match a short text against a long text, distinguish between query and document. For tasks such as clustering or classification, where all texts play the same role, you do not need to set this parameter.

Task instructions (instruct)

This parameter is only available through the DashScope SDK and API.

Provide a clear English task instruction to guide the text-embedding-v4 Model to optimize vector quality for a specific retrieval scenario, improving precision. When you use this feature, you must set the text_type parameter to query.

import dashscope

# If you use a model in the China (Beijing) Region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Scenario: when embedding a search query, add an instruction that describes the retrieval task.
resp = dashscope.TextEmbedding.call(
    model="text-embedding-v4",
    input="Research papers on Machine Learning",
    text_type="query",
    instruct="Given a research paper query, retrieve relevant research paper"
)

Dense and sparse vectors

This parameter is only available through the DashScope SDK and API.

The text-embedding-v4 and text-embedding-v3 models support three types of vector outputs for different retrieval strategies.

dense

  • Core advantages: deep semantic understanding. Identifies synonyms and context, leading to more relevant retrieval results.

  • Key limitations: higher compute and storage costs. Does not guarantee an exact match for keywords.

  • Typical use cases: semantic search, AI-powered Q&A, and content recommendation.

sparse

  • Core advantages: high computational efficiency. Focuses on exact keyword matching and enables fast filtering.

  • Key limitations: lacks semantic understanding. Cannot process synonyms or context.

  • Typical use cases: log retrieval, product SKU search, and precise information filtering.

dense&sparse

  • Core advantages: combines semantic and keyword matching for optimal search results. The generation cost and API call overhead are identical to those of single-vector mode.

  • Key limitations: requires more storage. The system architecture and retrieval logic are more complex.

  • Typical use cases: high-quality, production-grade hybrid search engines.
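Downstream, one common way to use both outputs is a weighted combination of the dense (semantic) and sparse (keyword) similarity scores at ranking time. A minimal sketch of the idea (the weighting scheme is illustrative, not something the API prescribes):

```python
def hybrid_score(dense_sim, sparse_sim, alpha=0.7):
    """Blend a dense similarity and a sparse similarity into one ranking score."""
    return alpha * dense_sim + (1 - alpha) * sparse_sim

# A document with strong keyword overlap can outrank one with slightly higher
# semantic similarity once the sparse signal is weighted in.
print(hybrid_score(0.82, 0.10), hybrid_score(0.78, 0.95))
```

Tune `alpha` on your own relevance data; many hybrid search engines expose an equivalent knob.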

Examples

This code is for demonstration only. In a production environment, pre-compute and store embeddings in a vector database. For retrieval, you only need to generate the query embedding.

Semantic search

Compute the vector similarity between a query and documents to perform precise semantic search.

import dashscope
import numpy as np
from dashscope import TextEmbedding

# If you use a model in the China (Beijing) Region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def cosine_similarity(a, b):
    """Calculate cosine similarity."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def semantic_search(query, documents, top_k=5):
    """Perform semantic search."""
    # Generate the query embedding.
    query_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=query,
        dimension=1024
    )
    query_embedding = query_resp.output['embeddings'][0]['embedding']

    # Generate the document embeddings.
    doc_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=documents,
        dimension=1024
    )

    # Calculate similarities.
    similarities = []
    for i, doc_emb in enumerate(doc_resp.output['embeddings']):
        similarity = cosine_similarity(query_embedding, doc_emb['embedding'])
        similarities.append((i, similarity))

    # Sort and return the top_k results.
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [(documents[i], sim) for i, sim in similarities[:top_k]]

# Example usage
documents = [
    "Artificial intelligence is a branch of computer science",
    "Machine learning is an important method for achieving artificial intelligence",
    "Deep learning is a subfield of machine learning"
]
query = "What is AI?"
results = semantic_search(query, documents, top_k=2)
for doc, sim in results:
    print(f"Similarity: {sim:.3f}, Document: {doc}")

Recommendation system

Analyze vectors from user behavior history to identify preferences and recommend similar items.

import dashscope
import numpy as np
from dashscope import TextEmbedding

# If you use a model in the China (Beijing) Region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def cosine_similarity(a, b):
    """Calculate cosine similarity."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def build_recommendation_system(user_history, all_items, top_k=10):
    """Build a recommendation system."""
    # Generate user history embeddings.
    history_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=user_history,
        dimension=1024
    )

    # Calculate the user preference vector by averaging.
    user_embedding = np.mean([
        emb['embedding'] for emb in history_resp.output['embeddings']
    ], axis=0)

    # Generate all item embeddings.
    items_resp = TextEmbedding.call(
        model="text-embedding-v4",
        input=all_items,
        dimension=1024
    )

    # Calculate recommendation scores.
    recommendations = []
    for i, item_emb in enumerate(items_resp.output['embeddings']):
        score = cosine_similarity(user_embedding, item_emb['embedding'])
        recommendations.append((all_items[i], score))

    # Sort and return the recommendation results.
    recommendations.sort(key=lambda x: x[1], reverse=True)
    return recommendations[:top_k]

# Example usage
user_history = ["Science Fiction", "Action", "Suspense"]
all_movies = ["Future World", "Space Adventure", "Ancient War", "Romantic Journey", "Superhero"]
recommendations = build_recommendation_system(user_history, all_movies)
for movie, score in recommendations:
    print(f"Recommendation Score: {score:.3f}, Movie: {movie}")

Text clustering

Automatically group similar texts by analyzing the distances between their vectors.

# scikit-learn is required: pip install scikit-learn
import dashscope
import numpy as np
from sklearn.cluster import KMeans

# If you use a model in the China (Beijing) Region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def cluster_texts(texts, n_clusters=2):
    """Cluster a set of texts."""
    # 1. Get the embeddings for all texts.
    resp = dashscope.TextEmbedding.call(
        model="text-embedding-v4",
        input=texts,
        dimension=1024
    )
    embeddings = np.array([item['embedding'] for item in resp.output['embeddings']])

    # 2. Use the KMeans algorithm for clustering.
    kmeans = KMeans(n_clusters=n_clusters, random_state=0, n_init='auto').fit(embeddings)

    # 3. Organize and return the results.
    clusters = {i: [] for i in range(n_clusters)}
    for i, label in enumerate(kmeans.labels_):
        clusters[label].append(texts[i])
    return clusters


# Example usage
documents_to_cluster = [
    "Mobile phone company A releases a new phone",
    "Search engine company B launches a new system",
    "World Cup final: Argentina vs. France",
    "China wins another gold medal at the Olympics",
    "A company releases its latest AI chip",
    "European Cup match report"
]
clusters = cluster_texts(documents_to_cluster, n_clusters=2)
for cluster_id, docs in clusters.items():
    print(f"--- Cluster {cluster_id} ---")
    for doc in docs:
        print(f"- {doc}")

Text classification

Perform zero-shot text classification by computing the vector similarity between an input text and predefined labels. This method classifies text without requiring pre-labeled examples.

import dashscope
import numpy as np

# If you use a model in the China (Beijing) Region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def cosine_similarity(a, b):
    """Calculate cosine similarity."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def classify_text_zero_shot(text, labels):
    """Perform zero-shot text classification."""
    # 1. Get the embeddings for the input text and all labels.
    resp = dashscope.TextEmbedding.call(
        model="text-embedding-v4",
        input=[text] + labels,
        dimension=1024
    )
    embeddings = resp.output['embeddings']
    text_embedding = embeddings[0]['embedding']
    label_embeddings = [emb['embedding'] for emb in embeddings[1:]]

    # 2. Calculate the similarity with each label.
    scores = [cosine_similarity(text_embedding, label_emb) for label_emb in label_embeddings]

    # 3. Return the label with the highest similarity.
    best_match_index = np.argmax(scores)
    return labels[best_match_index], scores[best_match_index]


# Example usage
text_to_classify = "The fabric of this dress is comfortable and the style is nice"
possible_labels = ["Digital Products", "Apparel & Accessories", "Food & Beverage", "Home & Living"]

label, score = classify_text_zero_shot(text_to_classify, possible_labels)
print(f"Input text: '{text_to_classify}'")
print(f"Best matching category: '{label}' (Similarity: {score:.3f})")

Anomaly detection

Identify anomalous data by computing its vector similarity to the center vector of normal samples. A low similarity score indicates a potential anomaly.

The threshold in the example code is for demonstration purposes only. In practice, the ideal threshold depends on your data's content and distribution, so there is no fixed value. Calibrate this value on your own dataset.

import dashscope
import numpy as np

# If you use a model in the China (Beijing) Region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'


def cosine_similarity(a, b):
    """Calculate cosine similarity."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def detect_anomaly(new_comment, normal_comments, threshold=0.6):
    """Detect an anomaly."""
    # 1. Generate embeddings for all normal comments and the new comment.
    all_texts = normal_comments + [new_comment]
    resp = dashscope.TextEmbedding.call(
        model="text-embedding-v4",
        input=all_texts,
        dimension=1024
    )
    embeddings = [item['embedding'] for item in resp.output['embeddings']]

    # 2. Calculate the center vector (average) of the normal comments.
    normal_embeddings = np.array(embeddings[:-1])
    normal_center_vector = np.mean(normal_embeddings, axis=0)

    # 3. Calculate the similarity between the new comment embedding and the center vector.
    new_comment_embedding = np.array(embeddings[-1])
    similarity = cosine_similarity(new_comment_embedding, normal_center_vector)

    # 4. Determine if it is an anomaly.
    is_anomaly = similarity < threshold
    return is_anomaly, similarity


# Example usage
normal_user_comments = [
    "Today's meeting was productive",
    "The project is progressing smoothly",
    "The new version will be released next week",
    "User feedback is positive"
]

test_comments = {
    "Normal comment": "The feature works as expected",
    "Anomaly - meaningless garbled text": "asdfghjkl zxcvbnm"
}

print("--- Anomaly Detection Example ---")
for desc, comment in test_comments.items():
    is_anomaly, score = detect_anomaly(comment, normal_user_comments)
    result = "Yes" if is_anomaly else "No"
    print(f"Comment: '{comment}'")
    print(f"Is anomaly: {result} (Similarity to normal samples: {score:.3f})\n")

API reference

Error codes

If the model call fails and returns an error message, see Error messages for resolution.

Rate limiting

For the model's rate limits, see Rate limits.

Model performance (MTEB/CMTEB)

Evaluation benchmarks

  • MTEB: Massive Text Embedding Benchmark, a comprehensive benchmark that evaluates general performance on tasks such as classification, clustering, and retrieval.

  • CMTEB: Chinese Massive Text Embedding Benchmark, a benchmark that evaluates performance specifically on Chinese text.

  • Scores range from 0 to 100. Higher values indicate better performance.

Model                               | MTEB  | MTEB (Retrieval task) | CMTEB | CMTEB (Retrieval task)
----------------------------------- | ----- | --------------------- | ----- | ----------------------
text-embedding-v3 (512 dimensions)  | 62.11 | 54.30                 | 66.81 | 71.88
text-embedding-v3 (768 dimensions)  | 62.43 | 54.74                 | 67.90 | 72.29
text-embedding-v3 (1024 dimensions) | 63.39 | 55.41                 | 68.92 | 73.23
text-embedding-v4 (512 dimensions)  | 64.73 | 56.34                 | 68.79 | 73.33
text-embedding-v4 (1024 dimensions) | 68.36 | 59.30                 | 70.14 | 73.98
text-embedding-v4 (2048 dimensions) | 71.58 | 61.97                 | 71.99 | 75.01