ApsaraDB for SelectDB:AI_SIMILARITY

Last Updated: Mar 28, 2026

Measures the semantic similarity between two text strings using a Large Language Model (LLM) and returns a score between 0 and 10.

Syntax

AI_SIMILARITY([<resource_name>], <text_1>, <text_2>)

Parameters

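+-----------------+----------+--------------------------------------------+
| Parameter       | Required | Description                                |
+-----------------+----------+--------------------------------------------+
| <resource_name> | No       | The name of the resource to use.           |
| <text_1>        | Yes      | The first text string to compare.          |
| <text_2>        | Yes      | The second text string to compare.         |
+-----------------+----------+--------------------------------------------+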
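Because <resource_name> is optional, client code that generates the call has to handle both the two-argument and the three-argument form. The following is a minimal Python sketch of that, assuming a MySQL-protocol DB-API driver that uses %s placeholders. The helper name build_similarity_query is hypothetical and not part of SelectDB, and how the resource is chosen when <resource_name> is omitted is outside this sketch.

```python
from typing import Optional, Tuple


def build_similarity_query(
    table: str,
    column: str,
    reference_text: str,
    limit: int,
    resource_name: Optional[str] = None,
) -> Tuple[str, tuple]:
    """Return (sql, params) for a top-n AI_SIMILARITY ranking query.

    Hypothetical helper. The table and column names are interpolated as
    identifiers, so they must come from trusted code, never from user input.
    """
    if not table.isidentifier() or not column.isidentifier():
        raise ValueError("table and column must be plain identifiers")

    if resource_name is None:
        # Two-argument form: <resource_name> is omitted.
        call = f"AI_SIMILARITY(%s, {column})"
        params = (reference_text, limit)
    else:
        # Three-argument form: <resource_name> comes first.
        call = f"AI_SIMILARITY(%s, %s, {column})"
        params = (resource_name, reference_text, limit)

    sql = (
        f"SELECT {column}, {call} AS score "
        f"FROM {table} ORDER BY score DESC LIMIT %s"
    )
    return sql, params
```

The returned pair is meant to be passed to a cursor, for example cursor.execute(sql, params), so that the text values are bound by the driver rather than concatenated into the statement.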

Return value

  • Returns a floating-point number between 0 and 10. A score of 0 indicates no similarity; a score of 10 indicates strong similarity. Higher values indicate greater similarity.

  • The score is relative and is best suited for ranking results rather than applying as an absolute threshold.

  • Returns NULL if any input is NULL.

  • Because the result is generated by an LLM, the output may vary between calls.
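One way to reduce the effect of call-to-call variation is to score the same pair several times and combine the results on the client. The following Python sketch takes the median of repeated scores. The function name stable_score is hypothetical, None stands in for SQL NULL, and the median is an illustrative choice, not something the function itself provides.

```python
from statistics import median
from typing import Iterable, Optional


def stable_score(scores: Iterable[Optional[float]]) -> Optional[float]:
    """Combine repeated scores for one text pair into a single value.

    None (SQL NULL, returned when any input is NULL) is skipped.
    Returns None when no usable score remains.
    """
    usable = [s for s in scores if s is not None]
    if not usable:
        return None
    for s in usable:
        # The documented range is 0 to 10; anything else signals a problem.
        if not 0 <= s <= 10:
            raise ValueError(f"score outside the documented 0-10 range: {s}")
    return median(usable)
```

Each extra call is another LLM request, so the number of repetitions is a trade-off between stability and cost.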

Examples

Rank comments by semantic similarity

The following example creates a table of customer comments for a delivery service, then queries the five comments most semantically similar to a reference phrase.

CREATE TABLE user_comments (
    id      INT,
    comment VARCHAR(500)
) DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES (
    "replication_num" = "1"
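);

-- Load sample comments (the same rows that appear in the output below; the id values are illustrative).
INSERT INTO user_comments VALUES
    (1, 'Absolutely fantastic, highly recommend it.'),
    (2, 'Delivery was very slow and frustrating.'),
    (3, 'It arrived broken and I am really disappointed.'),
    (4, 'It is fine, nothing special to mention.'),
    (5, 'Not bad, but the packaging could be better.');

SELECT comment,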
    AI_SIMILARITY('resource_name', 'I am extremely dissatisfied with their service.', comment) AS score
FROM user_comments ORDER BY score DESC LIMIT 5;

The output is similar to:

+-------------------------------------------------+-------+
| comment                                         | score |
+-------------------------------------------------+-------+
| It arrived broken and I am really disappointed. |   7.5 |
| Delivery was very slow and frustrating.         |   6.5 |
| Not bad, but the packaging could be better.     |   3.5 |
| It is fine, nothing special to mention.         |     3 |
| Absolutely fantastic, highly recommend it.      |     1 |
+-------------------------------------------------+-------+

Usage notes

  • The similarity score reflects relative ranking, not an absolute measure of semantic equivalence. Avoid using a fixed score threshold to filter rows; instead, rely on ORDER BY ... DESC LIMIT n to retrieve the most similar results.

  • Because the LLM output is non-deterministic, repeated calls with the same inputs may produce slightly different scores. Use the function for exploratory ranking rather than strict deterministic filtering.
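The difference between ranking and threshold filtering can be seen on rows that have already been fetched. The following Python sketch reuses the scores from the example output above. The helper names top_n and above_threshold are hypothetical, and None stands in for SQL NULL.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, Optional[float]]  # (comment, score); None stands for SQL NULL


def top_n(rows: List[Row], n: int) -> List[Row]:
    """Return the n highest-scoring rows, skipping rows without a score."""
    scored = [row for row in rows if row[1] is not None]
    return sorted(scored, key=lambda row: row[1], reverse=True)[:n]


def above_threshold(rows: List[Row], threshold: float) -> List[Row]:
    """Return rows whose score is at least the fixed threshold."""
    return [row for row in rows if row[1] is not None and row[1] >= threshold]


rows = [
    ("It arrived broken and I am really disappointed.", 7.5),
    ("Delivery was very slow and frustrating.", 6.5),
    ("Not bad, but the packaging could be better.", 3.5),
    ("It is fine, nothing special to mention.", 3.0),
    ("Absolutely fantastic, highly recommend it.", 1.0),
]

best_two = top_n(rows, 2)            # always yields the two closest comments
strict = above_threshold(rows, 8.0)  # yields nothing for these scores
```

With these scores, a fixed cutoff of 8.0 returns no rows even though two comments are clearly relevant, while the top-n selection still returns them.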