Model DDLs - Realtime Compute for Apache Flink - Alibaba Cloud Documentation Center

This document describes the data definition language (DDL) statements for registering, querying, modifying, and deleting AI models in Flink SQL.

Usage notes

Supports AI model services from Alibaba Cloud Model Studio, Platform for AI (PAI), and other providers with OpenAI-compatible APIs.
The inference service deployed on Platform for AI (PAI) must be in the same region as your Realtime Compute for Apache Flink service.
Requires Ververica Runtime (VVR) version 11.1 or later.

CREATE MODEL

Register a model

In the Data Query editor, run the following command.

CREATE [TEMPORARY] MODEL [catalog_name.][db_name.]model_name
INPUT ( { <physical_column_definition> } [, ...n] )
OUTPUT ( { <physical_column_definition> } [, ...n] )
WITH (key1=val1, key2=val2, ...)

<physical_column_definition>:
  column_name column_type [COMMENT column_comment]

Clause	Description	Key parameters	Schema constraints	Example
INPUT	Defines the column names, data types, and order of the model's input data.	`column_name` `column_type` `COMMENT`	Requires exactly one column of type STRING.	INPUT (`input_text` STRING COMMENT 'User comment')
OUTPUT	Defines the column names, data types, and order of the model's output data.	`column_name` `column_type` `COMMENT`	Constraints vary by model task type. chat/completions: requires exactly one output column; the column type must be STRING when the model is invoked via ML_PREDICT, or VARIANT when it is invoked via vertical AI functions (AI_CLASSIFY, AI_SUMMARIZE, etc.). embeddings: requires exactly one column of type ARRAY<FLOAT>.	OUTPUT (`sentiment_label` STRING COMMENT 'Sentiment label')
WITH	See WITH parameters.	`provider` `endpoint` `api-key` `model`	None.	`WITH ('provider'='openai-compat', 'endpoint'='${ENDPOINT}', 'model'='qwen-turbo', 'api-key'='${KEY}')`

Examples

Alibaba Cloud Model Studio

CREATE MODEL model_bailian
INPUT (`input` STRING)
OUTPUT (`content` STRING)
WITH (
  'provider'='openai-compat',
  'endpoint'='<Endpoint>',
  'api-key'='<bailian-key>',
  'model'='qwen3-235b-a22b'
);

The endpoint format for Alibaba Cloud Model Studio is <base-url>/compatible-mode/v1/<task>. For example, https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions.

base-url
Internet access: https://dashscope-intl.aliyuncs.com. To use internet access, you must enable it for your Flink workspace. For more information, see Select a network connection type.
Internal network access: See Access Alibaba Cloud Model Studio APIs over an internal network.
Internal network access is supported from the same region and across regions. To access the service from a different region, for example, a Flink instance in Shanghai accessing a Model Studio service in Beijing, you must configure Cloud Enterprise Network (CEN). For more information, see Cross-region private access to Alibaba Cloud Model Studio APIs.
Important
Data processing for Alibaba Cloud Model Studio occurs in the selected Model Studio region. If you have data residency requirements, select an appropriate region based on your compliance needs. For more information, see Service deployment scope comparison.
task: The model task type. The following values are supported:
- chat/completions
- embeddings
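
As an illustration of the endpoint format described above, the following sketch registers an embeddings model over the internet endpoint. The model identifier `model_bailian_embedding` and the model name `text-embedding-v3` are assumptions chosen for this example; substitute a model that your Model Studio account can access.

```sql
-- Sketch only. The endpoint follows <base-url>/compatible-mode/v1/<task> with
--   base-url = https://dashscope-intl.aliyuncs.com  (internet access)
--   task     = embeddings
CREATE TEMPORARY MODEL model_bailian_embedding
INPUT (`input` STRING)
OUTPUT (`embedding` ARRAY<FLOAT>)
WITH (
  'provider'='openai-compat',
  'endpoint'='https://dashscope-intl.aliyuncs.com/compatible-mode/v1/embeddings',
  'api-key'='<bailian-key>',
  'model'='text-embedding-v3'  -- assumed model name, not taken from this page
);
```

The output column is ARRAY<FLOAT> because the task in the endpoint is embeddings; a chat/completions endpoint would need a STRING (or VARIANT) output column instead.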

Platform for AI (PAI)

CREATE MODEL model_pai
INPUT (`input` STRING)
OUTPUT (`embedding` ARRAY<FLOAT>) 
WITH (
  'provider'='openai-compat',
  'endpoint'='<VPC_endpoint>',
  'api-key'='<Token>',
  'model'='qwen3-235b-a22b'
);

To get the endpoint and API key, you must first deploy a model service. For more information, see Deploy DeepSeek-V3 and DeepSeek-R1 models with one click and Quick start for Elastic Algorithm Service (EAS).

Model Gallery

Log on to the Platform for AI (PAI) console.
In the left navigation pane, go to Model Gallery > Job Management > Deployment Jobs and click the name of the target service.
Click View Invocation Information.
- The VPC endpoint uses HTTP. You must change it to HTTPS and append /v1/<task> to the URL. The task can be one of the following values:
  - chat/completions
  - embeddings
  For example: https://************.vpc.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/quickstart_deploy_20250722_7b22/v1/chat/completions.
- Use the value from the Token field for the api-key parameter.

Elastic Algorithm Service (EAS)

In the left navigation pane, go to Elastic Algorithm Service (EAS) > Inference Services, and click the name of the target service to go to the Overview page.
In the Basic Information section, click View Invocation Information.
On the Invocation Information panel, copy the endpoint and token:
- The VPC endpoint uses HTTP. You must change it to HTTPS and append /v1/<task> to the URL. The task can be one of the following values:
  - chat/completions
  - embeddings
  For example: https://************.vpc.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/quickstart_deploy_20250722_7b22/v1/chat/completions.
- Use the value from the Token field for the api-key parameter.

For more information, see Invoke a service by using a public or internal endpoint through a gateway.
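
The endpoint adjustment described in the steps above can be sketched as follows for a chat/completions service. `<service-host>`, `<service-name>`, and `<deployed-model-name>` are hypothetical placeholders, and `model_pai_chat` is an illustrative identifier; take the real values from the Invocation Information panel of your own service.

```sql
-- Value shown in the console (HTTP, no task path):
--   http://<service-host>/api/predict/<service-name>
-- Value to use in the DDL (HTTPS, with /v1/<task> appended):
--   https://<service-host>/api/predict/<service-name>/v1/chat/completions
CREATE MODEL model_pai_chat
INPUT (`input` STRING)
OUTPUT (`content` STRING)
WITH (
  'provider'='openai-compat',  -- required value for PAI services
  'endpoint'='https://<service-host>/api/predict/<service-name>/v1/chat/completions',
  'api-key'='<Token>',         -- value of the Token field
  'model'='<deployed-model-name>'
);
```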

WITH parameters

General

Parameter	Description	Type	Required	Default	Notes
provider	The model service type.	string	Yes	None	VVR 11.1–11.2: The value must be `bailian`. VVR 11.3–11.6: Valid values are `openai-compat` and `bailian`. We recommend using `openai-compat`. Note For PAI or other OpenAI-compatible model services, the value must be `openai-compat`.
endpoint	The endpoint for the OpenAI-compatible model service, such as an embeddings or chat/completions service.	string	Yes	None	For Alibaba Cloud Model Studio or PAI endpoints, see Examples. For other OpenAI-compatible model services, refer to the API documentation of the service.
api-key	The API key for the model service.	string	Yes	None	For more information, see Obtain an API key. Previous key name: `apiKey` (in VVR 11.1).
max-context-size	The maximum context size for a single request, in tokens.	integer	No	None	If the maximum capacity is exceeded, the action defined in `context-overflow-action` is triggered. Note Supported in VVR 11.2 and later.
context-overflow-action	The action to take when a request's context exceeds the token limit.	string	No	`truncated-tail`	Valid values: `truncated-tail`: Automatically truncates tokens when the capacity is exceeded, retaining the most recent `max-context-size` tokens. No logs are recorded. `truncated-tail-log`: Automatically truncates tokens from the tail that exceed the capacity, retaining the most recent `max-context-size` tokens. The truncation is logged. `truncated-head`: Truncates the earliest tokens from the head, retaining the most recent `max-context-size` tokens. `truncated-head-log`: Trims the earliest tokens from the head, keeping the latest `max-context-size` tokens. Logs the truncation. `skipped`: Discards the data record directly. No logs are recorded. `skipped-log`: Discards the data and records a log. Note Supported in VVR 11.2 and later.
max-context-size	The maximum context length (number of records).	integer	No	None	Note Supported in VVR 11.2 and later.
context-overflow-action	The action to take when the context length exceeds the record limit.	string	No	truncated-tail	Valid values: `truncated-tail`: Automatically truncates data from the tail end that exceeds the capacity, retaining the most recent `max-context-size` items. `truncated-head`: Truncates the oldest data from the beginning, keeping the most recent `max-context-size` data items. `skipped`: Directly discards new data that exceeds the capacity and does not update the context. `truncated-tail-log`: Builds on `truncated-tail` by logging the action of truncating the context. `truncated-head-log`: Builds on `truncated-head` and logs the action of truncating the context. `skipped-log`: Performs the same action as `skipped` and logs when the context is truncated. Note Supported in VVR 11.2 and later.
error-handling-strategy	The strategy for handling model request errors.	string	No	`retry`	Valid values: `retry`: Resends the request. `failover`: Throws an exception. `ignore`: Ignores the exception and skips the data record. Note Supported in VVR 11.4 and later.
retry-num	The number of retry attempts.	integer	No	100	Takes effect only when `error-handling-strategy = retry`. Note Supported in VVR 11.4 and later.
retry-fallback-strategy	The fallback strategy to use after the maximum number of retries is reached.	string	No	failover	Valid values are `failover` and `ignore`. This setting takes effect only when `error-handling-strategy` is set to `retry`. Note Supported in VVR 11.4 and later.
retry-backoff-strategy	The retry backoff strategy.	string	No	`fixed`	Valid values: `fixed`: Fixed interval. `exponential`: Exponential interval. Note Supported in VVR 11.4 and later.
retry-backoff-base-interval	The base time interval for the retry backoff strategy.	duration	No	1 s	Note Supported in VVR 11.4 and later.
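
To show how the general parameters combine, here is a minimal sketch that caps the context size and tunes retry behavior. It assumes VVR 11.4 or later (the retry options are not available earlier). The identifier `model_resilient` and the numeric values are illustrative assumptions, not recommended settings.

```sql
CREATE MODEL model_resilient
INPUT (`input` STRING)
OUTPUT (`content` STRING)
WITH (
  'provider'='openai-compat',
  'endpoint'='<Endpoint>',
  'api-key'='<api-key>',
  'model'='qwen-turbo',
  -- Context limit: on overflow, truncate from the head and log the truncation.
  'max-context-size'='4096',
  'context-overflow-action'='truncated-head-log',
  -- Error handling: retry up to 5 times instead of the default 100.
  'error-handling-strategy'='retry',
  'retry-num'='5',
  -- Backoff: exponential intervals starting from a 2-second base.
  'retry-backoff-strategy'='exponential',
  'retry-backoff-base-interval'='2 s'
);
```

Lowering `retry-num` from its default of 100 keeps a persistently failing endpoint from stalling the job for long; the right value depends on how the downstream pipeline tolerates delays.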

chat/completions

The following parameters are specific to chat/completions model tasks:

Parameter	Description	Type	Required	Default	Notes
model	The model to invoke.	string	Yes	None	Supports text generation models. Note You are charged based on the selected model and the number of tokens in the input and output.
system-prompt	The system prompt for the request.	string	Yes	"You are a helpful assistant."	Previous key name: `systemPrompt` (in VVR 11.1). Note In VVR 11.6 and later, you can set this parameter to an empty value.
temperature	Controls the smoothness of the probability distribution for each candidate token.	float	No	None	Valid range: [0, 2). A value of 0 is not recommended. A higher temperature makes the output more random, while a lower value makes it more deterministic.
top-p	The probability threshold for nucleus sampling.	float	No	None	A higher value increases randomness, while a lower value increases determinism. Previous key name: `topP` (in VVR 11.1).
stop	A stop sequence.	string	No	None	The model stops generating tokens when this sequence is produced. The sequence is not included in the final output.
max-tokens	The maximum number of tokens that the model can generate.	integer	No	None	Previous key name: `maxTokens` (in VVR 11.1).
content-type	The type of input data.	string	No	text	Valid values: `text` and `image_url`. Note Supported in VVR 11.6 and later.
presence-penalty	Controls token repetition.	double	No	None	Valid range: -2.0 to 2.0. Positive values penalize tokens that have already appeared in the text, making the model more likely to discuss new topics. Note Supported in VVR 11.3 and later.
n	The number of output choices to generate for each input message.	integer	No	None	Note Supported in VVR 11.3 and later.
seed	A random number seed for the model's response.	long	No	None	If specified, the model provider attempts deterministic sampling, so repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. Note Supported in VVR 11.3 and later.
response-format	The format of the return value.	string	No	text	Valid values: `text` and `json_object`. Note Supported in VVR 11.3 and later.
extra-header	Extra HTTP headers for the request.	string	No	None	Must be a JSON-formatted string. The values in the JSON key-value pairs must be strings or lists of strings. Note Supported in VVR 11.3 and later.
extra-body	Extra HTTP body for the request.	string	No	None	Must be a JSON-formatted string. Note Supported in VVR 11.3 and later.
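
The following sketch combines several chat/completions parameters to request a short, mostly deterministic JSON reply. It assumes VVR 11.3 or later. The identifier `model_sentiment_json`, the column names, the prompt text, and the `X-Request-Source` header are all hypothetical and only serve the example.

```sql
CREATE MODEL model_sentiment_json
INPUT (`review` STRING COMMENT 'Raw review text')
OUTPUT (`result` STRING COMMENT 'JSON string produced by the model')
WITH (
  'provider'='openai-compat',
  'endpoint'='<Endpoint>',  -- a chat/completions endpoint
  'api-key'='<api-key>',
  'model'='qwen-turbo',
  'system-prompt'='Classify the sentiment of the review. Reply in JSON with the key "label".',
  'temperature'='0.2',      -- low value for more deterministic output
  'max-tokens'='64',        -- cap the length of the generated reply
  'response-format'='json_object',
  -- JSON-formatted string; values must be strings or lists of strings.
  'extra-header'='{"X-Request-Source": "flink-sql"}'
);
```

The output column stays STRING even with `json_object`, because the model is meant to be invoked through ML_PREDICT here; parsing the JSON is left to the downstream query.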

embeddings

The following parameters are specific to embeddings model tasks:

Parameter	Description	Type	Required	Default	Notes
model	The model to invoke.	string	Yes	None	Supports text embedding models. Note You are charged based on the selected model and the number of tokens in the input and output.
dimension	The dimension of the output vectors.	integer	No	None	The supported dimensions depend on the specific model. Common values include 1024, 768, and 512.
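
A minimal sketch of an embeddings model that sets `dimension` explicitly. `model_embedding_768` and `<embedding-model>` are placeholders introduced for this example; whether 768 is accepted depends on the model you choose.

```sql
CREATE MODEL model_embedding_768
INPUT (`input` STRING)
OUTPUT (`embedding` ARRAY<FLOAT>)
WITH (
  'provider'='openai-compat',
  'endpoint'='<Endpoint>',      -- an embeddings endpoint
  'api-key'='<api-key>',
  'model'='<embedding-model>',  -- placeholder: a text embedding model
  'dimension'='768'             -- must be a dimension the model supports
);
```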

Querying models

In the Data Query editor, run one of the following commands.

List the names of registered models:

SHOW MODELS [ ( FROM | IN ) [catalog_name.]database_name ];

Show the statement used to create a model:

SHOW CREATE MODEL [catalog_name.][db_name.]model_name;

Show the input and output schema of a model:

DESCRIBE MODEL [catalog_name.][db_name.]model_name;

Example

SHOW MODELS;

-- RESULT
-- +------------+
-- | model name |
-- +------------+
-- |          m |
-- +------------+

DESCRIBE MODEL m;

-- RESULT
-- +---------+--------+------+----------+
-- |    name |   type | null | is input |
-- +---------+--------+------+----------+
-- | content | STRING | TRUE |     TRUE |
-- |   label | BIGINT | TRUE |    FALSE |
-- +---------+--------+------+----------+
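
The optional qualifiers in the syntax above can be used as in the following sketch. The catalog and database names `my_catalog` and `my_db` are hypothetical; the output is omitted because it depends on what is registered in your environment.

```sql
-- List the models registered in a specific database.
SHOW MODELS FROM my_catalog.my_db;

-- Show the CREATE MODEL statement for a fully qualified model.
SHOW CREATE MODEL my_catalog.my_db.m;

-- Show the input and output schema of the same model.
DESCRIBE MODEL my_catalog.my_db.m;
```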

Modifying models

In the Data Query editor, run the following command.

ALTER MODEL [IF EXISTS] [catalog_name.][db_name.]model_name {
    RENAME TO new_model_name
  | SET (key1=val1, ...)
  | RESET (key1, ...)
}

Examples

Rename a registered model:

ALTER MODEL m RENAME TO m1; -- Renames the model to m1.

Modify a model parameter:

ALTER MODEL m SET ('endpoint' = '<Your_Endpoint>'); -- Adjusts the endpoint path.

Reset a model parameter to its default value:

ALTER MODEL m RESET ('endpoint'); -- Resets the endpoint path.
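
Because SET and RESET accept a list of keys, several parameters can be changed in one statement. The sketch below assumes `m` is a chat/completions model; the chosen keys and values are illustrative only.

```sql
-- Change two parameters at once; IF EXISTS avoids an error if m is missing.
ALTER MODEL IF EXISTS m SET ('temperature'='0.2', 'max-tokens'='512');

-- Later, remove both overrides so the defaults apply again.
ALTER MODEL IF EXISTS m RESET ('temperature', 'max-tokens');
```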

Deleting models

In the Data Query editor, run the following command.

DROP [TEMPORARY] MODEL [IF EXISTS] [catalog_name.][db_name.]model_name

Example

DROP MODEL m;
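
The optional clauses in the syntax can be combined as in this sketch. The names `my_catalog`, `my_db`, and `tmp_m` are hypothetical.

```sql
-- Drop a fully qualified model without failing if it does not exist.
DROP MODEL IF EXISTS my_catalog.my_db.m;

-- Drop a model that was registered with CREATE TEMPORARY MODEL.
DROP TEMPORARY MODEL IF EXISTS tmp_m;
```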