AI_SIMILARITY

AI_SIMILARITY is a MaxCompute AI function that calls a model to evaluate the textual and semantic similarity of two texts. It returns a FLOAT value between 0 and 1, where a higher value indicates greater similarity.

Syntax

FLOAT AI_SIMILARITY(
  STRING <model_name>,
  STRING <version_name>,
  STRING <input1>,
  STRING <input2>
  [, STRING <model_parameters>]
);

Parameters

model_name: Required. A string that specifies the name of the model to use. For more information, see SQL AI functions.
version_name: Required. A string that specifies the name of the model version to use. To call the default version, you can use DEFAULT_VERSION.
input1: Required. A string representing the first text for the semantic similarity calculation.
input2: Required. A string representing the second text for the semantic similarity calculation.
model_parameters: Optional. A JSON-formatted string that specifies model parameters. For example: '{"max_tokens": 500, "temperature": 0.6, "top_p": 0.95}'. The following parameters are supported:
- max_tokens: Sets the maximum number of tokens to generate in a single model call. For MaxCompute public models, the default value is 4096.
- temperature: A value between 0 and 1 that controls the randomness of the model's output. A higher value results in more creative and diverse output, while a lower value makes the output more deterministic and conservative.
- top_p: A value between 0 and 1 that limits the range of candidate tokens the model considers. A higher value results in a wider range and more diversity, while a lower value results in a narrower range and more focused output.
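
As an illustration of the optional fifth argument, the following sketch passes the sample JSON string from the description above to the public model Qwen3-4B-GGUF. The two input texts are made up for illustration, and the parameter values are the sample values from the description, not tuning recommendations.

```sql
-- Session settings taken from the examples in this topic.
SET odps.sql.ai.treat.as.common.model=true;
SET odps.namespace.schema=true;

SELECT AI_SIMILARITY(
    bigdata_public_modelset.default.`Qwen3-4B-GGUF`,
    DEFAULT_VERSION,
    -- Illustrative input texts (not from the original documentation).
    'The order was shipped yesterday.',
    'Your package left the warehouse a day ago.',
    -- model_parameters: a single JSON-formatted string.
    '{"max_tokens": 500, "temperature": 0.6, "top_p": 0.95}'
) AS similarity_score;
```

Because temperature and top_p control randomness and diversity, lower values are the natural choice when more deterministic output is wanted.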

Return value

Returns a FLOAT value between 0 and 1 that represents the similarity score of the two input texts. The following rules apply:

- A score of 1.0 indicates that the two texts are identical, while a score of 0.0 indicates that they are completely unrelated.
- If input1 or input2 is NULL, the function returns NULL.
- If input1 or input2 is not a STRING, the function returns an error.
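
The NULL and type rules above suggest guarding the inputs before the call. The following is a minimal sketch under stated assumptions: the table support_tickets and its columns ticket_id, title, and description are hypothetical, and description is assumed to be a VARCHAR column that is cast to STRING explicitly rather than relying on any implicit conversion.

```sql
SET odps.sql.ai.treat.as.common.model=true;
SET odps.namespace.schema=true;

-- Hypothetical table:
--   support_tickets(ticket_id BIGINT, title STRING, description VARCHAR(2000))
SELECT
    ticket_id,
    AI_SIMILARITY(
        bigdata_public_modelset.default.`Qwen3-4B-GGUF`,
        DEFAULT_VERSION,
        title,
        -- Explicit cast so that both inputs are of the STRING type.
        CAST(description AS STRING)
    ) AS similarity_score
FROM support_tickets
-- Exclude rows for which the function would return NULL.
WHERE title IS NOT NULL
  AND description IS NOT NULL;
```

Excluding NULL inputs in the WHERE clause keeps NULL scores out of the result set, which simplifies later sorting or aggregation.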

Examples

Example 1: Compare text similarity

This example calls the public model Qwen3-4B-GGUF provided by MaxCompute to calculate the semantic similarity between two pieces of text about MaxCompute.

SET odps.sql.ai.treat.as.common.model=true;
SET odps.namespace.schema=true;

SELECT AI_SIMILARITY(
    bigdata_public_modelset.default.`Qwen3-4B-GGUF`,
    DEFAULT_VERSION,
    'MaxCompute is a big data computing platform.',
    'MaxCompute provides large-scale data processing capabilities.'
) AS similarity_score;

-- Result
+------------------+
| similarity_score |
+------------------+
| 0.85             |
+------------------+

Example 2: Compare and sort text pairs

This example calls the public model Qwen3-4B-GGUF provided by MaxCompute to compare multiple text pairs and sort them in descending order of similarity. This approach is suited to identifying the most semantically similar text pairs in a dataset.

-- Sample data
CREATE TABLE text_pairs (
    text1 STRING,
    text2 STRING
);

INSERT INTO text_pairs VALUES
    ('Cloud computing enables scalable infrastructure.', 'Businesses can scale their IT resources using cloud services.'),
    ('The weather is sunny today.', 'Machine learning algorithms require large datasets.'),
    ('Data warehousing stores historical data for analysis.', 'A data warehouse provides analytical processing of large data volumes.'),
    ('I enjoy reading books.', 'Reading is my favorite hobby.');

-- Compare multiple text pairs for semantic similarity
SET odps.sql.ai.treat.as.common.model=true;
SET odps.namespace.schema=true;

SELECT
    text1,
    text2,
    AI_SIMILARITY(
        bigdata_public_modelset.default.`Qwen3-4B-GGUF`,
        DEFAULT_VERSION,
        text1,
        text2
    ) AS similarity_score
FROM text_pairs
ORDER BY similarity_score DESC;

-- Result
+-------------------------------------------------------+------------------------------------------------------------------------+------------------+
| text1                                                 | text2                                                                  | similarity_score |
+-------------------------------------------------------+------------------------------------------------------------------------+------------------+
| Cloud computing enables scalable infrastructure.      | Businesses can scale their IT resources using cloud services.          | 0.92             |
| Data warehousing stores historical data for analysis. | A data warehouse provides analytical processing of large data volumes. | 0.88             |
| I enjoy reading books.                                | Reading is my favorite hobby.                                          | 0.82             |
| The weather is sunny today.                           | Machine learning algorithms require large datasets.                    | 0.05             |
+-------------------------------------------------------+------------------------------------------------------------------------+------------------+
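
To keep only the closely related pairs rather than ranking all of them, the score can be filtered in an outer query. The following sketch reuses the text_pairs table from Example 2. The threshold of 0.8 and the subquery alias scored are assumptions chosen for illustration, and a suitable cutoff depends on the model and the data.

```sql
SET odps.sql.ai.treat.as.common.model=true;
SET odps.namespace.schema=true;

-- Compute the score in a subquery so that the outer query can filter on the alias.
SELECT
    text1,
    text2,
    similarity_score
FROM (
    SELECT
        text1,
        text2,
        AI_SIMILARITY(
            bigdata_public_modelset.default.`Qwen3-4B-GGUF`,
            DEFAULT_VERSION,
            text1,
            text2
        ) AS similarity_score
    FROM text_pairs
) scored
-- Hypothetical threshold: keep only pairs scored 0.8 or higher.
WHERE similarity_score >= 0.8
ORDER BY similarity_score DESC;
```

With the sample scores shown in Example 2, this filter would keep the first three pairs and drop the unrelated weather and machine learning pair.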