This topic describes how to use AI_EMBED to generate vector representations (embeddings) of text.
Limitations
This feature requires Ververica Runtime (VVR) 11.4 or later.
The throughput of AI_EMBED operators is subject to the rate limits of Alibaba Cloud Model Studio. When a model reaches its rate limit, the Flink job is backpressured, with the AI_EMBED operators as the bottleneck. In some cases, this can also cause timeout errors and job restarts.
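A rough back-of-the-envelope sketch shows why the operator becomes the bottleneck. It rests on two assumptions that are not stated in this topic: AI_EMBED sends one model request per input row, and the rate limit is a fixed number of requests per second. Under these assumptions, the operator can emit at most the rate limit. Any excess input accumulates as backlog, which Flink surfaces as backpressure. The function names and rates below are hypothetical.

```python
# Hypothetical model of an AI_EMBED operator throttled by a model rate limit.
# Assumptions (not from the Flink docs): one model request per input row,
# and the rate limit is a fixed number of requests per second.


def sustained_throughput(input_rows_per_sec: float, limit_requests_per_sec: float) -> float:
    """Rows per second the operator can sustain: capped by the rate limit."""
    return min(input_rows_per_sec, limit_requests_per_sec)


def backlog_after(seconds: float, input_rows_per_sec: float, limit_requests_per_sec: float) -> float:
    """Rows waiting upstream after `seconds` if input outpaces the limit."""
    processed = sustained_throughput(input_rows_per_sec, limit_requests_per_sec)
    excess = input_rows_per_sec - processed
    return excess * seconds


if __name__ == "__main__":
    # Hypothetical numbers: 500 rows/s arriving, 300 requests/s allowed.
    print(sustained_throughput(500, 300))
    print(backlog_after(60, 500, 300))
```

When the input rate stays below the limit, the backlog stays at zero. When it stays above the limit, the backlog keeps growing, and that growing backlog is the situation in which the timeouts described above can occur.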
Syntax
AI_EMBED(
  MODEL => MODEL <MODEL NAME>,
  INPUT => <INPUT COLUMN NAME>
)
Input
Parameter | Data type | Description |
MODEL <MODEL NAME> | MODEL | The name of the registered model. Note: The output type of the model must be ARRAY<FLOAT>. |
<INPUT COLUMN NAME> | STRING | The source text for the model to analyze. |
<DIMENSION VALUE> | INTEGER | Optional. The dimension of the output vector. Default: 1024. If your model does not support 1024 dimensions, specify a supported value. A DIMENSION value passed to AI_EMBED overrides the one set in the CREATE MODEL statement. Because AI_EMBED always passes the DIMENSION parameter to the model, use ML_PREDICT instead if your model does not support this parameter. |
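The following minimal sketch makes the DIMENSION rules concrete. It assumes that the 'openai-compat' provider sends an OpenAI-style embeddings request body, in which the requested vector size is carried in a `dimensions` field. This topic does not document the actual payload that Flink sends. The helpers `build_embedding_request` and `check_embedding` are hypothetical. They only mirror the rules in the tables: the dimension defaults to 1024, it is always included in the request, and the returned vector's length should equal it.

```python
# Hypothetical sketch of the DIMENSION rules for AI_EMBED.
# Assumption: an OpenAI-style embeddings request body with a "dimensions"
# field. This is NOT the documented payload that Flink sends.

from typing import List

DEFAULT_DIMENSION = 1024  # default stated in the parameter table


def build_embedding_request(model: str, text: str, dimension: int = DEFAULT_DIMENSION) -> dict:
    """Build a request body. The dimension is always included, never omitted."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {"model": model, "input": text, "dimensions": dimension}


def check_embedding(vector: List[float], dimension: int = DEFAULT_DIMENSION) -> List[float]:
    """Verify that a returned vector has the requested length."""
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(vector)}")
    return vector


if __name__ == "__main__":
    print(build_embedding_request("text-embedding-v3", "Flink"))
    print(build_embedding_request("text-embedding-v3", "Flink", dimension=512)["dimensions"])
```

The sketch only checks that the value is positive. Whether a given size is accepted is decided by the model service, which is why a model that rejects the parameter entirely calls for ML_PREDICT.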
Result
Parameter | Data type | Description |
embedding | ARRAY<FLOAT> | The generated vector. Its length equals the DIMENSION value, or 1024 if DIMENSION is omitted. |
Example
Test data
id | content |
1 | Flink |
Test statement
This SQL example uses the text-embedding-v3 model and AI_EMBED to generate a vector for each row of the test data.
CREATE TEMPORARY MODEL embedding_model
INPUT (`input` STRING)
OUTPUT (`embedding` ARRAY<FLOAT>)
WITH (
'provider' = 'openai-compat',
'endpoint' = '<YOUR ENDPOINT>',
'apiKey' = '<YOUR KEY>',
'model' = 'text-embedding-v3',
'dimension' = '1024'
);
CREATE TEMPORARY VIEW infos(id, content)
AS VALUES (1, 'Flink');
-- Use positional arguments to call AI_EMBED
SELECT id, embedding
FROM infos,
LATERAL TABLE(
AI_EMBED(
MODEL embedding_model,
content
));
-- Use named arguments to call AI_EMBED
SELECT id, embedding
FROM infos,
LATERAL TABLE(
AI_EMBED(
MODEL => MODEL embedding_model,
INPUT => content
));
Output
id | embedding |
1 | [-0.13219477, 0.054332353, -0.033010617, -0.0039787884, ...] |