This topic describes how to use AI_EMBED to generate vector representations of text.
Limitations
This feature requires Ververica Runtime (VVR) 11.4 or later.
The throughput of AI_EMBED operators is subject to the rate limits of Alibaba Cloud Model Studio. When the rate limits for a model are reached, the Flink job is backpressured with the AI_EMBED operators as the bottleneck. In some cases, timeout errors and job restarts may be triggered.
Syntax
AI_EMBED(
MODEL => MODEL <MODEL NAME>,
INPUT => <INPUT COLUMN NAME>
)
Input
Parameter | Data type | Description |
MODEL <MODEL NAME> | MODEL | The name of the registered model. Note: The output type of the model must be ARRAY<FLOAT>. |
<INPUT COLUMN NAME> | STRING | The source text for the model to analyze. |
Result
Parameter | Data type | Description |
embedding | ARRAY<FLOAT> | The generated embedding vector. Its length matches the dimension configured for the model (1024 in this topic's example). |
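Because the result is a plain ARRAY<FLOAT> column, standard Flink SQL array functions apply to it. The following sketch checks that the returned vector length matches the configured dimension; the table docs, its columns, and the model my_embedding_model are hypothetical names used only for illustration.
-- Sketch only: docs and my_embedding_model are hypothetical names.
-- CARDINALITY returns the array length, which should equal the
-- dimension configured for the model (1024 in this topic).
SELECT id, CARDINALITY(embedding) AS dim
FROM docs,
LATERAL TABLE(
AI_EMBED(
MODEL my_embedding_model,
content
));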
Example
Test data
id | content |
1 | Flink |
Test statement
This SQL example uses the text-embedding-v3 model and AI_EMBED to generate vectors.
-- Register text-embedding-v3 as a remote embedding model through the
-- OpenAI-compatible provider. The output type must be ARRAY<FLOAT>.
CREATE TEMPORARY MODEL embedding_model
INPUT (`input` STRING)
OUTPUT (`embedding` ARRAY<FLOAT>)
WITH (
  'provider' = 'openai-compat',
  'endpoint' = '<YOUR ENDPOINT>',
  'apiKey' = '<YOUR KEY>',
  'model' = 'text-embedding-v3',
  'dimension' = '1024'
);
CREATE TEMPORARY VIEW infos(id, content)
AS VALUES (1, 'Flink');
-- Use positional arguments to call AI_EMBED
SELECT id, embedding
FROM infos,
LATERAL TABLE(
AI_EMBED(
MODEL embedding_model,
content
));
-- Use named arguments to call AI_EMBED
SELECT id, embedding
FROM infos,
LATERAL TABLE(
AI_EMBED(
MODEL => MODEL embedding_model,
INPUT => content
));
Outputs
id | embedding |
1 | [-0.13219477, 0.054332353, -0.033010617, -0.0039787884, ...] |
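The embedding column can also be written to any sink whose schema declares a matching ARRAY<FLOAT> field. The following sketch reuses infos and embedding_model from the example above; the sink table name and the choice of the built-in print connector are illustrative only.
-- Illustrative sink; the print connector writes each row to the TaskManager logs.
CREATE TEMPORARY TABLE embedding_sink (
  id INT,
  embedding ARRAY<FLOAT>
) WITH (
  'connector' = 'print'
);

INSERT INTO embedding_sink
SELECT id, embedding
FROM infos,
LATERAL TABLE(
AI_EMBED(
MODEL embedding_model,
content
));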