AI Function - AnalyticDB - Alibaba Cloud Documentation Center

AI functions let you call AI services directly from SQL — no external API calls or data pipelines required. Use them to analyze, transform, and extract information from text and audio data stored in AnalyticDB for MySQL.

Important

AI models are probabilistic. The same input may produce different outputs across calls.

Available functions:

Text functions

Function	What it does
`ai_filter`	Checks whether text is factual or compliant — returns `true` or `false`
`ai_translate`	Translates text into a target language
`ai_classify`	Categorizes text into predefined labels
`ai_extract`	Extracts structured fields (names, dates, addresses, keywords) from unstructured text
`ai_generate`	Generates a text response from a prompt
`ai_sentiment`	Returns the sentiment of text: `positive`, `negative`, `neutral`, or `mixed`
`ai_similarity`	Scores the semantic similarity between two text segments (0–10)
`ai_mask`	Masks sensitive information such as phone numbers, email addresses, and ID numbers
`ai_summarize`	Compresses long text into a concise summary

Audio functions

Function	What it does
`ai_audio_transcribe`	Transcribes an audio file into text, with options for language and segmentation mode
`ai_audio_embed`	Converts an audio file into an embedding vector

Prerequisites

Before you begin, make sure that:

Your cluster is one of the following: Enterprise Edition, Basic Edition, Data Lakehouse Edition, or Data Warehouse Edition (Elastic Mode)
Elastic Network Interface (ENI) access is enabled on your cluster
This feature is in public preview. Submit a ticket to activate it before use.

To enable ENI access, log in to the AnalyticDB for MySQL console. In the left navigation pane, choose Cluster Management > Cluster Information > Network Information. In the Network Information section, turn on the ENI access switch.

Important

Enabling or disabling ENI access interrupts database connections for approximately 2 minutes. During this period, the cluster cannot read or write data. Plan this change during a maintenance window.

Set up network connectivity and create a model

AI functions call Alibaba Cloud Model Studio to run inference. Before calling any AI function, configure a network connection from your cluster to Model Studio, then create a model.

Option 1: Public network connection

Create an Internet NAT gateway in the same region as your cluster. Associate an Elastic IP Address (EIP) with the gateway, then create a Source Network Address Translation (SNAT) entry at the vSwitch level and specify a vSwitch. For details, see Internet NAT Gateway.

Create a model. For the full syntax, see CREATE MODEL.

CREATE MODEL qwen_plus_external
OPTIONS (
    type='external',
    provider='bailian',
    name='qwen-plus',
    interface='TEXT_TO_TEXT',
    api_key='sk-xxx'  -- Replace with your Alibaba Cloud Model Studio API key.
)

Option 2: PrivateLink private connection

Important

Model Studio PrivateLink endpoints are available in the following regions: Singapore, China (Beijing). The endpoint must be in the same region as the Model Studio service. To connect from a virtual private cloud (VPC) in a different region, see Cross-region private access.

Configure a PrivateLink endpoint and get its private endpoint address. For details, see Access Model Studio models or application APIs over a private endpoint.

Create a model with the endpoint parameter set to your private endpoint address.

CREATE MODEL qwen_plus_external
OPTIONS (
    type='external',
    provider='bailian',
    name='qwen-plus',
    interface='TEXT_TO_TEXT',
    api_key='sk-xxx',       -- Replace with your Alibaba Cloud Model Studio API key.
    endpoint='ep-xxx.com'   -- Replace with the domain of your PrivateLink endpoint.
)

Text functions

All text functions accept an optional model_name as the first argument. If omitted, the function uses the default model for its interface type. See Configure default models.

ai_filter

Checks whether the input text is factual or compliant. Returns true if the content is credible or compliant, false otherwise.

ai_filter(text)
ai_filter(model_name, text)

Parameters

Parameter	Type	Description
`model_name`	VARCHAR	Name of the model to use
`text`	VARCHAR	Text to evaluate

Return type: BOOLEAN

Example:

SELECT ai_filter('Is Zhejiang in China?');
-- Returns: 1

SELECT ai_filter('qwen_plus_external', 'Is Zhejiang in China?');
-- Returns: 1

ai_translate

Translates input text into the specified language.

ai_translate(text, targetLang)
ai_translate(model_name, text, targetLang)

Parameters

Parameter	Type	Description
`model_name`	VARCHAR	Name of the model to use
`text`	VARCHAR	Text to translate
`targetLang`	VARCHAR	Target language code

Important

The model must support the target language.

Return type: VARCHAR

Example:

SELECT ai_translate(
  'AnalyticDB for MySQL is a data analytics platform based on a lakehouse architecture.',
  'cn'
) AS translate_text;

ai_classify

Categorizes text into one of the predefined labels.

ai_classify(text, labels)
ai_classify(model_name, text, labels)

Parameters

Parameter	Type	Description
`model_name`	VARCHAR	Name of the model to use
`text`	VARCHAR	Text to classify
`labels`	VARCHAR or Array\<VARCHAR\>	List of category labels

Return type: VARCHAR

Classify a single value (string-format labels):

SELECT ai_classify(
  'Last night, I tried a new pasta recipe in the kitchen. The sauce was rich, and my family loved it!',
  "['travel', 'cooking', 'reading', 'driving']"
) AS predicted_label;
-- Returns: cooking

Classify a single value (array-format labels):

SELECT ai_classify(
  'Last night, I tried a new pasta recipe in the kitchen. The sauce was rich, and my family loved it!',
  ARRAY['travel', 'cooking', 'reading', 'driving']
) AS predicted_label;
-- Returns: cooking

Classify a column in a table (batch usage):

SELECT id, comment,
  ai_classify('qwen_plus_external', comment, ARRAY['complaint', 'inquiry', 'compliment']) AS category
FROM customer_feedback
WHERE dt = '20260101';

ai_extract

Extracts structured fields from unstructured text.

ai_extract(text, labels)
ai_extract(model_name, text, labels)

Parameters

Parameter	Type	Description
`model_name`	VARCHAR	Name of the model to use
`text`	VARCHAR	Source text
`labels`	VARCHAR or Array\<VARCHAR\>	Field names to extract

Return type: VARCHAR

Extract fields from a single value:

SELECT ai_extract(
  'Yesterday (June 15, 2024), I bought an iPhone 15 Pro on Taobao. Its titanium body and A17 chip are truly amazing!',
  "['product_name', 'date', 'key_feature']"
) AS result;
-- Returns: product_name=iPhone 15 Pro, date=June 15, 2024, key_feature=titanium body and A17 chip

Extract fields from a table column (batch usage):

SELECT id, feedback,
  ai_extract('qwen_plus_external', feedback, "['product_name', 'issue', 'contact_info']") AS extracted
FROM customer_feedback
WHERE dt = '20260101';

ai_generate

Generates a text response from the input prompt.

ai_generate(text)
ai_generate(model_name, text)

Parameters

Parameter	Type	Description
`model_name`	VARCHAR	Name of the model to use
`text`	VARCHAR	Input prompt

Return type: VARCHAR

Example:

SELECT ai_generate('Introduce the TPC-H test set in one sentence.') AS result;
-- Returns: TPC-H is a standard benchmark test set used to evaluate the decision support capabilities of databases.

ai_sentiment

Analyzes the sentiment of the input text. Returns one of: positive, negative, neutral, or mixed.

ai_sentiment(text)
ai_sentiment(model_name, text)

Parameters

Parameter	Type	Description
`model_name`	VARCHAR	Name of the model to use
`text`	VARCHAR	Text to analyze

Return type: VARCHAR

Example:

SELECT ai_sentiment(
  'The sound quality of these headphones is excellent, and they are comfortable to wear. Highly recommended!'
) AS sentiment;
-- Returns: positive

ai_similarity

Calculates the semantic similarity between two text segments. Returns a score from 0 to 10 — higher means more similar.

ai_similarity(text1, text2)
ai_similarity(model_name, text1, text2)

Parameters

Parameter	Type	Description
`model_name`	VARCHAR	Name of the model to use
`text1`	VARCHAR	First text segment
`text2`	VARCHAR	Second text segment

Return type: VARCHAR

Example:

SELECT ai_similarity(
  'How do I reset my account password?',
  'I forgot my logon password. How can I retrieve it?'
) AS result;
-- Returns: 9.3

ai_mask

Masks sensitive information in the input text. Detected values are replaced with [MSKED].

ai_mask(text, labels)
ai_mask(model_name, text, labels)

Parameters

Parameter	Type	Description
`model_name`	VARCHAR	Name of the model to use
`text`	VARCHAR	Text containing sensitive information
`labels`	VARCHAR or Array\<VARCHAR\>	Types of sensitive information to mask (for example, `phone`, `email`, `id_card`)

Return type: VARCHAR

Example:

SELECT ai_mask(
  'Contact me: 1381234****, email is user@example.com, ID card number 110101199003072316',
  "['phone', 'email', 'id_card']"
) AS result;
-- Returns: Contact me: [MSKED], email is [MSKED], ID card number [MSKED]

ai_summarize

Compresses long text into a concise, accurate summary.

ai_summarize(text)
ai_summarize(model_name, text)

Parameters

Parameter	Type	Description
`model_name`	VARCHAR	Name of the model to use
`text`	VARCHAR	Long text to summarize

Return type: VARCHAR

Example:

SELECT ai_summarize(
  'On the winding lotus pond, what meets the eye are lush lotus leaves...'
) AS result;
-- Returns a concise summary of the input text.

Audio functions

ai_audio_transcribe and ai_audio_embed support only built-in models. You cannot specify a custom model_name from a CREATE MODEL statement for these functions.

ai_audio_transcribe

Transcribes an audio file into text. Returns transcribed text as JSON or a URL to a file saved in Object Storage Service (OSS).

ai_audio_transcribe(url)
ai_audio_transcribe(model_name, url)
ai_audio_transcribe(model_name, url, options)

Parameters

Parameter	Type	Description
`model_name`	VARCHAR	Name of the built-in model to use
`url`	VARCHAR	URL of the input audio file
`options`	JSON	Optional parameters (see table below)

options parameters

Parameter	Options	Default
`language`	`cn` (Chinese), `en` (English), `ja` (Japanese), `yue` (Cantonese), `fspk` (free speech in Chinese and English)	`cn`
`diarization_mode`	`word` (by word), `sentence` (by sentence), `speaker` (by speaker)	`word`
`output_type`	`json` (returns transcribed JSON text directly), `url` (saves to OSS and returns file URL)	`json`

Return type: VARCHAR

Example 1 — Transcribe with default settings:

SELECT ai_audio_transcribe(
  'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav'
) AS result;

Example 2 — Transcribe with sentence-level segmentation:

SELECT ai_audio_transcribe(
  'tingwu',
  'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav',
  "{'diarization_mode':'sentence'}"
) AS result;

ai_audio_embed

Converts an audio file into an embedding vector. Optionally specify start and end times to embed a segment of the audio.

ai_audio_embed(text)
ai_audio_embed(model_name, text)
ai_audio_embed(model_name, text, options)

Parameters

Parameter	Type	Description
`model_name`	VARCHAR	Name of the built-in model to use
`text`	VARCHAR	URL of the input audio file
`options`	JSON	Optional parameters (see table below)

options parameters

Parameter	Description	Notes
`access_id`	OSS AccessKey ID	Not required if you use `role_arn`
`access_secret`	OSS AccessKey secret	Not required if you use `role_arn`
`token`	OSS security token	Not required if you use `role_arn`
`role_arn`	Role assumption credential	Use instead of `access_id`/`access_secret`/`token`
`start_time`	Start time of the audio segment	Defaults to the beginning of the file
`end_time`	End time of the audio segment	Defaults to the end of the file

Return type: Array\<FLOAT\>

Example:

SELECT ai_audio_embed(
  'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav'
);
-- Returns an array of FLOAT values representing the embedding vector.

Configure default models

When you call an AI function without specifying a model_name, the function uses the default model for its interface type. Change the default using the configuration items below. For instructions on modifying configuration items, see Config and Hint Configuration Parameters.

Interface type	Default model	Configuration item
TEXT_TO_TEXT	qwen-plus	`AI_FUNCTION_TEXT_TO_TEXT_MODEL`
TEXT_TO_EMBEDDING	text-embedding-v2	`AI_FUNCTION_TEXT_TO_EMBEDDING_MODEL`
AUDIO_TO_TEXT	tingwu	`AI_FUNCTION_AUDIO_TO_TEXT_MODEL`
AUDIO_TO_EMBEDDING	qwen2.5-vl-embedding	`AI_FUNCTION_AUDIO_TO_EMBEDDING_MODEL`

Use cases

Text retrieval for RAG

In Retrieval-Augmented Generation (RAG) and customer service applications, use ai_embedd with cosine_similarity to find the most semantically relevant text segments for a given query.

SELECT cosine_similarity(ai_embedd('<query text>'), ai_embedd(text)) AS cos
FROM text_table
ORDER BY cos DESC
LIMIT 10;

To optimize vector search performance at scale, see AISearch.

Voiceprint retrieval

In meeting recordings or conversation analysis, transcribe audio by speaker, then compare audio embeddings against a voice dataset to identify speakers.

Transcribe and segment by speaker:

SELECT AI_AUDIO_TRANSCRIBE('oss://xxx', "{'diarization_mode':'speaker'}");

Retrieve the 10 closest voiceprint matches using l2_distance:

SELECT l2_distance(input_embedding, origin_embedding) AS ld
FROM (
  SELECT ai_audio_embed('oss://xxx') AS input_embedding
  JOIN
  SELECT ai_audio_embed(url) AS origin_embedding
  FROM audio_table
)
ORDER BY ld
LIMIT 10;

For the complete solution, see Voiceprint Retrieval.

Filter dissatisfied customer reviews

Use ai_sentiment in a WHERE clause to identify negative feedback from a reviews table.

In this example, the qwen_plus_external model (backed by the qwen-plus model in Alibaba Cloud Model Studio) is created in advance.

SELECT name, comment
FROM customer_comment
WHERE ai_sentiment('qwen_plus_external', comment) = 'negative'
AND dt = '20260101';