AI_SIMILARITY - ApsaraDB for SelectDB - Alibaba Cloud Documentation Center

Measures the semantic similarity between two text strings using a Large Language Model (LLM) and returns a score between 0 and 10.

Syntax

AI_SIMILARITY([<resource_name>], <text_1>, <text_2>)

Parameters

Parameter	Required	Description
`<resource_name>`	No	The name of the resource that the function uses to call the LLM.
`<text_1>`	Yes	The first text string to compare.
`<text_2>`	Yes	The second text string to compare.
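
As a rough illustration of the two call forms implied by the optional first parameter, the statements below compare one pair of strings with and without an explicit resource. The name 'resource_name' is a placeholder, the input strings are made up for this sketch, and the second form assumes that a default resource has already been configured for the session.

```sql
-- Explicit resource: 'resource_name' is a placeholder for a resource that already exists.
SELECT AI_SIMILARITY('resource_name', 'The parcel arrived late.', 'My package was delayed.') AS score;

-- Resource omitted: assumes a default resource is already configured for the session.
SELECT AI_SIMILARITY('The parcel arrived late.', 'My package was delayed.') AS score;
```

Both statements return a single score between 0 and 10, and the exact value may differ from run to run.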

Return value

Returns a floating-point number between 0 and 10. A score of 0 indicates no similarity and a score of 10 indicates strong similarity; higher values indicate greater similarity.
The score is relative and is best suited for ranking results rather than applying as an absolute threshold.
Returns NULL if any input is NULL.
Because the result is generated by an LLM, the output may vary between calls.
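
Because a NULL input yields a NULL score, a ranking query over a nullable column may want to exclude such rows up front. The following is a minimal sketch, assuming the user_comments table from the example on this page and the placeholder resource 'resource_name'.

```sql
-- Skip rows whose comment is NULL so that every returned row has a numeric score.
SELECT comment,
       AI_SIMILARITY('resource_name', 'I am extremely dissatisfied with their service.', comment) AS score
FROM user_comments
WHERE comment IS NOT NULL
ORDER BY score DESC
LIMIT 5;
```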

Examples

Rank comments by semantic similarity

The following example creates a table of customer comments for a delivery service, then queries the five comments most semantically similar to a reference phrase.

CREATE TABLE user_comments (
    id      INT,
    comment VARCHAR(500)
) DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES (
    "replication_num" = "1"
);
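
The query that follows needs rows to rank. As a minimal sketch, the statement below loads the five comments that appear in the sample output; the id values are hypothetical and chosen only for illustration.

```sql
-- Sample rows matching the comments in the example output; ids are illustrative.
INSERT INTO user_comments VALUES
    (1, 'Absolutely fantastic, highly recommend it.'),
    (2, 'It is fine, nothing special to mention.'),
    (3, 'Not bad, but the packaging could be better.'),
    (4, 'Delivery was very slow and frustrating.'),
    (5, 'It arrived broken and I am really disappointed.');
```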

SELECT comment,
       AI_SIMILARITY('resource_name', 'I am extremely dissatisfied with their service.', comment) AS score
FROM user_comments
ORDER BY score DESC
LIMIT 5;

The output is similar to:

+-------------------------------------------------+-------+
| comment                                         | score |
+-------------------------------------------------+-------+
| It arrived broken and I am really disappointed. |   7.5 |
| Delivery was very slow and frustrating.         |   6.5 |
| Not bad, but the packaging could be better.     |   3.5 |
| It is fine, nothing special to mention.         |     3 |
| Absolutely fantastic, highly recommend it.      |     1 |
+-------------------------------------------------+-------+

Usage notes

The similarity score reflects relative ranking, not an absolute measure of semantic equivalence. Avoid using a fixed score threshold to filter rows; instead, rely on ORDER BY ... DESC LIMIT n to retrieve the most similar results.
Because the LLM output is non-deterministic, repeated calls with the same inputs may produce slightly different scores. Use the function for exploratory ranking rather than strict deterministic filtering.
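
One way to work with the non-determinism described above is to compute the scores once and store them, so that later queries read the same values instead of calling the LLM again. The sketch below assumes that CREATE TABLE ... AS SELECT is available in your instance (as it is in Apache Doris); comment_scores is a hypothetical table name and 'resource_name' is a placeholder.

```sql
-- Materialize the scores once; comment_scores is a hypothetical table name.
CREATE TABLE comment_scores
PROPERTIES ("replication_num" = "1")
AS
SELECT id,
       comment,
       AI_SIMILARITY('resource_name', 'I am extremely dissatisfied with their service.', comment) AS score
FROM user_comments;

-- Later queries rank the stored scores without invoking the LLM again.
SELECT comment, score
FROM comment_scores
ORDER BY score DESC
LIMIT 5;
```

The stored scores are still a single sample of the model's output, so they remain better suited to ranking than to fixed-threshold filtering.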