Measures the semantic similarity between two text strings using a Large Language Model (LLM) and returns a score between 0 and 10.
Syntax
AI_SIMILARITY([<resource_name>], <text_1>, <text_2>)

Parameters
| Parameter | Required | Description |
|---|---|---|
| `<resource_name>` | No | The name of the resource to use for the LLM call. |
| `<text_1>` | Yes | The first text string to compare. |
| `<text_2>` | Yes | The second text string to compare. |
Return value
Returns a floating-point number between 0 and 10. A score of 0 indicates no similarity and a score of 10 indicates strong similarity; higher values indicate greater similarity.
The score is relative and is best suited for ranking results rather than for comparison against an absolute threshold.
Returns NULL if any input is NULL.
Because the result is generated by an LLM, the output may vary between calls.
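Since a NULL input yields a NULL score, rows with NULL text can be filtered out before the function is called. The following is a minimal sketch, assuming the `user_comments` table defined under Examples and the placeholder resource name `'resource_name'` used there:

```sql
-- Sketch only: 'resource_name' is a placeholder resource name.
-- NULL comments would produce a NULL score, so they are excluded up front.
SELECT id,
       comment,
       AI_SIMILARITY('resource_name', 'I am extremely dissatisfied with their service.', comment) AS score
FROM user_comments
WHERE comment IS NOT NULL
ORDER BY score DESC
LIMIT 5;
```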
Examples
Rank comments by sentiment similarity
The following example creates a table of customer comments for a delivery service, then queries the five comments most semantically similar to a reference phrase.
CREATE TABLE user_comments (
    id INT,
    comment VARCHAR(500)
) DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES (
    "replication_num" = "1"
);

SELECT comment,
       AI_SIMILARITY('resource_name', 'I am extremely dissatisfied with their service.', comment) AS score
FROM user_comments
ORDER BY score DESC
LIMIT 5;

The output is similar to:
+-------------------------------------------------+-------+
| comment                                         | score |
+-------------------------------------------------+-------+
| It arrived broken and I am really disappointed. |   7.5 |
| Delivery was very slow and frustrating.         |   6.5 |
| Not bad, but the packaging could be better.     |   3.5 |
| It is fine, nothing special to mention.         |     3 |
| Absolutely fantastic, highly recommend it.      |     1 |
+-------------------------------------------------+-------+

Usage notes
The similarity score reflects relative ranking, not an absolute measure of semantic equivalence. Avoid using a fixed score threshold to filter rows; instead, rely on `ORDER BY ... DESC LIMIT n` to retrieve the most similar results.

Because the LLM output is non-deterministic, repeated calls with the same inputs may produce slightly different scores. Use the function for exploratory ranking rather than strict deterministic filtering.
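
The ranking query under Examples needs rows in `user_comments` before it returns anything. The following is a minimal sketch for loading test data. It reuses the five comments shown in the sample output; the `id` values are made up for illustration. Because the scores come from an LLM, the values returned for these rows may differ from the sample output.

```sql
-- Sketch only: ids are illustrative; comments are copied from the sample output.
INSERT INTO user_comments (id, comment) VALUES
    (1, 'It arrived broken and I am really disappointed.'),
    (2, 'Delivery was very slow and frustrating.'),
    (3, 'Not bad, but the packaging could be better.'),
    (4, 'It is fine, nothing special to mention.'),
    (5, 'Absolutely fantastic, highly recommend it.');
```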