AI_SUMMARIZE is a table-valued function that calls a large language model (LLM) to generate a text summary of an input string.
Limitations
Requires Ververica Runtime (VVR) 11.4 or later.
Throughput is bounded by the rate limits of the underlying model platform. When those limits are reached, the Flink job experiences backpressure and AI_SUMMARIZE becomes the bottleneck. In severe cases, this triggers timeout errors and job restarts.
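Assuming each input row results in one request to the model, one way to ease pressure on the rate limit is to reduce the number of rows that reach AI_SUMMARIZE. The following is a minimal sketch, not a recommended configuration. It reuses the `infos` view and `general_model` model from the Examples section, while the view name `long_infos` and the 200-character threshold are hypothetical values chosen for illustration.

```sql
-- Hypothetical pre-filter: only descriptions longer than 200 characters
-- are passed to the model. Both the threshold and the view name
-- long_infos are illustrative assumptions.
CREATE TEMPORARY VIEW long_infos AS
SELECT id, description
FROM infos
WHERE CHAR_LENGTH(description) > 200;

-- Summarize only the rows that passed the filter.
SELECT id, summary
FROM long_infos, LATERAL TABLE(
  AI_SUMMARIZE(MODEL general_model, description, 10));
```

Filtering lowers the request volume but does not remove the platform limit, so backpressure can still occur when the remaining traffic exceeds it.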
Syntax
AI_SUMMARIZE(
  MODEL => MODEL <model_name>,
  INPUT => <input_column>,
  MAX_LENGTH => <max_length>
)

AI_SUMMARIZE supports both positional and named argument styles.
Parameters
| Parameter | Data type | Description |
|---|---|---|
| MODEL <model_name> | MODEL | Name of the registered model service. The model must return output of type VARIANT. See Model Settings to register a model service. |
| <input_column> | STRING | The column whose content is summarized by the model. |
| <max_length> | INTEGER | Maximum length of the model output. Must be a constant value. |
Output
| Column | Data type | Description |
|---|---|---|
| summary | STRING | The generated summary. |
Examples
Full example with table data
The following example creates a Qwen-Plus model, loads test data into a temporary view, and calls AI_SUMMARIZE on a table column using both positional and named argument styles.
Test data
| id | description |
|---|---|
| 1 | What is Flink? Apache Flink is an open source distributed stream processing framework for stateful computation over real-time data streams and batch data. In simple terms: Flink is a compute engine for processing real-time data. It handles continuous data streams such as website clicks, Internet of Things sensor data, and stock trades. It provides low latency, high throughput, and exactly-once semantics. It supports both stream processing and batch processing. |
SQL
CREATE TEMPORARY MODEL general_model
INPUT (`input` STRING)
OUTPUT (`content` VARIANT)
WITH (
'provider' = 'openai-compat',
'endpoint' = '<YOUR ENDPOINT>',
'apiKey' = '<YOUR KEY>',
'model' = 'qwen-plus'
);
CREATE TEMPORARY VIEW infos(id, description)
AS VALUES (1, '
What is Flink?
Apache Flink is an open source distributed stream processing framework for stateful computation over real-time data streams and batch data.
In simple terms:
Flink is a compute engine for processing real-time data.
It handles continuous data streams such as website clicks, Internet of Things sensor data, and stock trades.
It provides low latency, high throughput, and exactly-once semantics.
It supports both stream processing and batch processing.
');
-- Positional argument style
SELECT id, summary
FROM infos, LATERAL TABLE(
AI_SUMMARIZE(MODEL general_model, description, 10));
-- Named argument style
SELECT id, summary
FROM infos, LATERAL TABLE(
AI_SUMMARIZE(
MODEL => MODEL general_model,
INPUT => description,
MAX_LENGTH => 10));
Output
| id | summary |
|---|---|
| 1 | Real-time stream processing engine |