This topic describes how to use AI_SUMMARIZE to generate text summaries with a large AI model.
Limitations
This function requires Ververica Runtime (VVR) 11.4+.
The throughput of AI_SUMMARIZE operators is subject to the rate limits of Alibaba Cloud Model Studio. When the rate limit for a model is reached, the Flink job experiences backpressure, with the AI_SUMMARIZE operators as the bottleneck. In some cases, this triggers timeout errors and job restarts.
Syntax
AI_SUMMARIZE(
  MODEL => MODEL <MODEL NAME>,
  INPUT => <INPUT COLUMN NAME>,
  MAX_LENGTH => <MAX LENGTH>
)
Input parameters
Parameter | Data type | Description |
MODEL <MODEL NAME> | MODEL | The name of the registered model. For information about how to register a model service, see Model settings. Note: The output type of this model must be VARIANT. |
<INPUT COLUMN NAME> | STRING | The data for the model to analyze. |
<MAX LENGTH> | INTEGER | The maximum length of the model output. Note: This input parameter must be a constant. |
Outputs
Parameter | Data type | Description |
summary | STRING | The summarized content. |
Examples
Test data
id | description |
1 | What is Apache Flink? Apache Flink is an open-source, distributed compute engine for processing real-time data streams and batch data. It excels at stateful computations, handling continuous data like website clicks, IoT sensor data, and stock trades with low latency, high throughput, and exactly-once semantics. Flink seamlessly supports both stream and batch processing. |
Test statements
The following sample SQL statements register the Qwen-Plus model and call the AI_SUMMARIZE function to summarize the input data.
CREATE TEMPORARY MODEL general_model
INPUT (`input` STRING)
OUTPUT (`content` VARIANT)
WITH (
  'provider' = 'openai-compat',
  'endpoint' = '<YOUR ENDPOINT>',
  'apiKey' = '<YOUR KEY>',
  'model' = 'qwen-plus'
);
CREATE TEMPORARY VIEW infos(id, description)
AS VALUES (1, '
What is Flink?
Apache Flink is an open-source, distributed compute engine for processing real-time data streams and batch data. It excels at stateful computations, handling continuous data like website clicks, IoT sensor data, and stock trades with low latency, high throughput, and exactly-once semantics. Flink seamlessly supports both stream and batch processing.
');
-- Call AI_SUMMARIZE with positional arguments
SELECT id, summary
FROM infos, LATERAL TABLE(
  AI_SUMMARIZE(MODEL general_model, description, 10));
-- Call AI_SUMMARIZE with named arguments
SELECT id, summary
FROM infos, LATERAL TABLE(
  AI_SUMMARIZE(
    MODEL => MODEL general_model,
    INPUT => description,
    MAX_LENGTH => 10));
Output result
id | summary |
1 | Apache Flink is a real-time stream processing engine. |
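The note that MAX_LENGTH must be a constant can be illustrated with a short sketch. It reuses the general_model model and the infos view from the test statements. The value 50 and the column max_len are hypothetical and appear only for illustration. The second statement is left commented out because, according to the parameter note, a non-constant value is not accepted.

```sql
-- Allowed: MAX_LENGTH is a literal constant (50 is an arbitrary example value).
SELECT id, summary
FROM infos, LATERAL TABLE(
  AI_SUMMARIZE(
    MODEL => MODEL general_model,
    INPUT => description,
    MAX_LENGTH => 50));

-- Not allowed according to the parameter note: MAX_LENGTH read from a column.
-- max_len is a hypothetical column that does not exist in the infos view.
-- SELECT id, summary
-- FROM infos, LATERAL TABLE(
--   AI_SUMMARIZE(MODEL general_model, description, max_len));
```

If different rows need different summary lengths, one option under this constraint is to run separate queries, each with its own constant MAX_LENGTH.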