AI functions let you call AI services directly from SQL — no external API calls or data pipelines required. Use them to analyze, transform, and extract information from text and audio data stored in AnalyticDB for MySQL.
AI models are probabilistic. The same input may produce different outputs across calls.
Available functions:
Text functions
| Function | What it does |
|---|---|
ai_filter | Checks whether text is factual or compliant — returns true or false |
ai_translate | Translates text into a target language |
ai_classify | Categorizes text into predefined labels |
ai_extract | Extracts structured fields (names, dates, addresses, keywords) from unstructured text |
ai_generate | Generates a text response from a prompt |
ai_sentiment | Returns the sentiment of text: positive, negative, neutral, or mixed |
ai_similarity | Scores the semantic similarity between two text segments (0–10) |
ai_mask | Masks sensitive information such as phone numbers, email addresses, and ID numbers |
ai_summarize | Compresses long text into a concise summary |
Audio functions
| Function | What it does |
|---|---|
ai_audio_transcribe | Transcribes an audio file into text, with options for language and segmentation mode |
ai_audio_embed | Converts an audio file into an embedding vector |
Prerequisites
Before you begin, make sure that:
Your cluster is one of the following: Enterprise Edition, Basic Edition, Data Lakehouse Edition, or Data Warehouse Edition (Elastic Mode)
Elastic Network Interface (ENI) access is enabled on your cluster
This feature is in public preview. Submit a ticket to activate it before use.
To enable ENI access, log in to the AnalyticDB for MySQL console. In the left navigation pane, choose Cluster Management > Cluster Information > Network Information. In the Network Information section, turn on the ENI access switch.
Enabling or disabling ENI access interrupts database connections for approximately 2 minutes. During this period, the cluster cannot read or write data. Plan this change during a maintenance window.
Set up network connectivity and create a model
AI functions call Alibaba Cloud Model Studio to run inference. Before calling any AI function, configure a network connection from your cluster to Model Studio, then create a model.
Option 1: Public network connection
Create an Internet NAT gateway in the same region as your cluster. Associate an Elastic IP Address (EIP) with the gateway, then create a Source Network Address Translation (SNAT) entry at the vSwitch level and specify a vSwitch. For details, see Internet NAT Gateway.
Create a model. For the full syntax, see CREATE MODEL.
CREATE MODEL qwen_plus_external OPTIONS ( type='external', provider='bailian', name='qwen-plus', interface='TEXT_TO_TEXT', api_key='sk-xxx' -- Replace with your Alibaba Cloud Model Studio API key. )
Option 2: PrivateLink private connection
Model Studio PrivateLink endpoints are available in the following regions: Singapore, China (Beijing). The endpoint must be in the same region as the Model Studio service. To connect from a virtual private cloud (VPC) in a different region, see Cross-region private access.
Configure a PrivateLink endpoint and get its private endpoint address. For details, see Access Model Studio models or application APIs over a private endpoint.
Create a model with the
endpointparameter set to your private endpoint address.CREATE MODEL qwen_plus_external OPTIONS ( type='external', provider='bailian', name='qwen-plus', interface='TEXT_TO_TEXT', api_key='sk-xxx', -- Replace with your Alibaba Cloud Model Studio API key. endpoint='ep-xxx.com' -- Replace with the domain of your PrivateLink endpoint. )
Text functions
All text functions accept an optional model_name as the first argument. If omitted, the function uses the default model for its interface type. See Configure default models.
ai_filter
Checks whether the input text is factual or compliant. Returns true if the content is credible or compliant, false otherwise.
ai_filter(text)
ai_filter(model_name, text)Parameters
| Parameter | Type | Description |
|---|---|---|
model_name | VARCHAR | Name of the model to use |
text | VARCHAR | Text to evaluate |
Return type: BOOLEAN
Example:
SELECT ai_filter('Is Zhejiang in China?');
-- Returns: 1
SELECT ai_filter('qwen_plus_external', 'Is Zhejiang in China?');
-- Returns: 1ai_translate
Translates input text into the specified language.
ai_translate(text, targetLang)
ai_translate(model_name, text, targetLang)Parameters
| Parameter | Type | Description |
|---|---|---|
model_name | VARCHAR | Name of the model to use |
text | VARCHAR | Text to translate |
targetLang | VARCHAR | Target language code |
The model must support the target language.
Return type: VARCHAR
Example:
SELECT ai_translate(
'AnalyticDB for MySQL is a data analytics platform based on a lakehouse architecture.',
'cn'
) AS translate_text;ai_classify
Categorizes text into one of the predefined labels.
ai_classify(text, labels)
ai_classify(model_name, text, labels)Parameters
| Parameter | Type | Description |
|---|---|---|
model_name | VARCHAR | Name of the model to use |
text | VARCHAR | Text to classify |
labels | VARCHAR or Array\<VARCHAR\> | List of category labels |
Return type: VARCHAR
Classify a single value (string-format labels):
SELECT ai_classify(
'Last night, I tried a new pasta recipe in the kitchen. The sauce was rich, and my family loved it!',
"['travel', 'cooking', 'reading', 'driving']"
) AS predicted_label;
-- Returns: cookingClassify a single value (array-format labels):
SELECT ai_classify(
'Last night, I tried a new pasta recipe in the kitchen. The sauce was rich, and my family loved it!',
ARRAY['travel', 'cooking', 'reading', 'driving']
) AS predicted_label;
-- Returns: cookingClassify a column in a table (batch usage):
SELECT id, comment,
ai_classify('qwen_plus_external', comment, ARRAY['complaint', 'inquiry', 'compliment']) AS category
FROM customer_feedback
WHERE dt = '20260101';ai_extract
Extracts structured fields from unstructured text.
ai_extract(text, labels)
ai_extract(model_name, text, labels)Parameters
| Parameter | Type | Description |
|---|---|---|
model_name | VARCHAR | Name of the model to use |
text | VARCHAR | Source text |
labels | VARCHAR or Array\<VARCHAR\> | Field names to extract |
Return type: VARCHAR
Extract fields from a single value:
SELECT ai_extract(
'Yesterday (June 15, 2024), I bought an iPhone 15 Pro on Taobao. Its titanium body and A17 chip are truly amazing!',
"['product_name', 'date', 'key_feature']"
) AS result;
-- Returns: product_name=iPhone 15 Pro, date=June 15, 2024, key_feature=titanium body and A17 chipExtract fields from a table column (batch usage):
SELECT id, feedback,
ai_extract('qwen_plus_external', feedback, "['product_name', 'issue', 'contact_info']") AS extracted
FROM customer_feedback
WHERE dt = '20260101';ai_generate
Generates a text response from the input prompt.
ai_generate(text)
ai_generate(model_name, text)Parameters
| Parameter | Type | Description |
|---|---|---|
model_name | VARCHAR | Name of the model to use |
text | VARCHAR | Input prompt |
Return type: VARCHAR
Example:
SELECT ai_generate('Introduce the TPC-H test set in one sentence.') AS result;
-- Returns: TPC-H is a standard benchmark test set used to evaluate the decision support capabilities of databases.ai_sentiment
Analyzes the sentiment of the input text. Returns one of: positive, negative, neutral, or mixed.
ai_sentiment(text)
ai_sentiment(model_name, text)Parameters
| Parameter | Type | Description |
|---|---|---|
model_name | VARCHAR | Name of the model to use |
text | VARCHAR | Text to analyze |
Return type: VARCHAR
Example:
SELECT ai_sentiment(
'The sound quality of these headphones is excellent, and they are comfortable to wear. Highly recommended!'
) AS sentiment;
-- Returns: positiveai_similarity
Calculates the semantic similarity between two text segments. Returns a score from 0 to 10 — higher means more similar.
ai_similarity(text1, text2)
ai_similarity(model_name, text1, text2)Parameters
| Parameter | Type | Description |
|---|---|---|
model_name | VARCHAR | Name of the model to use |
text1 | VARCHAR | First text segment |
text2 | VARCHAR | Second text segment |
Return type: VARCHAR
Example:
SELECT ai_similarity(
'How do I reset my account password?',
'I forgot my logon password. How can I retrieve it?'
) AS result;
-- Returns: 9.3ai_mask
Masks sensitive information in the input text. Detected values are replaced with [MSKED].
ai_mask(text, labels)
ai_mask(model_name, text, labels)Parameters
| Parameter | Type | Description |
|---|---|---|
model_name | VARCHAR | Name of the model to use |
text | VARCHAR | Text containing sensitive information |
labels | VARCHAR or Array\<VARCHAR\> | Types of sensitive information to mask (for example, phone, email, id_card) |
Return type: VARCHAR
Example:
SELECT ai_mask(
'Contact me: 1381234****, email is user@example.com, ID card number 110101199003072316',
"['phone', 'email', 'id_card']"
) AS result;
-- Returns: Contact me: [MSKED], email is [MSKED], ID card number [MSKED]ai_summarize
Compresses long text into a concise, accurate summary.
ai_summarize(text)
ai_summarize(model_name, text)Parameters
| Parameter | Type | Description |
|---|---|---|
model_name | VARCHAR | Name of the model to use |
text | VARCHAR | Long text to summarize |
Return type: VARCHAR
Example:
SELECT ai_summarize(
'On the winding lotus pond, what meets the eye are lush lotus leaves...'
) AS result;
-- Returns a concise summary of the input text.Audio functions
ai_audio_transcribe and ai_audio_embed support only built-in models. You cannot specify a custom model_name from a CREATE MODEL statement for these functions.
ai_audio_transcribe
Transcribes an audio file into text. Returns transcribed text as JSON or a URL to a file saved in Object Storage Service (OSS).
ai_audio_transcribe(url)
ai_audio_transcribe(model_name, url)
ai_audio_transcribe(model_name, url, options)Parameters
| Parameter | Type | Description |
|---|---|---|
model_name | VARCHAR | Name of the built-in model to use |
url | VARCHAR | URL of the input audio file |
options | JSON | Optional parameters (see table below) |
options parameters
| Parameter | Options | Default |
|---|---|---|
language | cn (Chinese), en (English), ja (Japanese), yue (Cantonese), fspk (free speech in Chinese and English) | cn |
diarization_mode | word (by word), sentence (by sentence), speaker (by speaker) | word |
output_type | json (returns transcribed JSON text directly), url (saves to OSS and returns file URL) | json |
Return type: VARCHAR
Example 1 — Transcribe with default settings:
SELECT ai_audio_transcribe(
'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav'
) AS result;Example 2 — Transcribe with sentence-level segmentation:
SELECT ai_audio_transcribe(
'tingwu',
'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav',
"{'diarization_mode':'sentence'}"
) AS result;ai_audio_embed
Converts an audio file into an embedding vector. Optionally specify start and end times to embed a segment of the audio.
ai_audio_embed(text)
ai_audio_embed(model_name, text)
ai_audio_embed(model_name, text, options)Parameters
| Parameter | Type | Description |
|---|---|---|
model_name | VARCHAR | Name of the built-in model to use |
text | VARCHAR | URL of the input audio file |
options | JSON | Optional parameters (see table below) |
options parameters
| Parameter | Description | Notes |
|---|---|---|
access_id | OSS AccessKey ID | Not required if you use role_arn |
access_secret | OSS AccessKey secret | Not required if you use role_arn |
token | OSS security token | Not required if you use role_arn |
role_arn | Role assumption credential | Use instead of access_id/access_secret/token |
start_time | Start time of the audio segment | Defaults to the beginning of the file |
end_time | End time of the audio segment | Defaults to the end of the file |
Return type: Array\<FLOAT\>
Example:
SELECT ai_audio_embed(
'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav'
);
-- Returns an array of FLOAT values representing the embedding vector.Configure default models
When you call an AI function without specifying a model_name, the function uses the default model for its interface type. Change the default using the configuration items below. For instructions on modifying configuration items, see Config and Hint Configuration Parameters.
| Interface type | Default model | Configuration item |
|---|---|---|
| TEXT_TO_TEXT | qwen-plus | AI_FUNCTION_TEXT_TO_TEXT_MODEL |
| TEXT_TO_EMBEDDING | text-embedding-v2 | AI_FUNCTION_TEXT_TO_EMBEDDING_MODEL |
| AUDIO_TO_TEXT | tingwu | AI_FUNCTION_AUDIO_TO_TEXT_MODEL |
| AUDIO_TO_EMBEDDING | qwen2.5-vl-embedding | AI_FUNCTION_AUDIO_TO_EMBEDDING_MODEL |
Use cases
Text retrieval for RAG
In Retrieval-Augmented Generation (RAG) and customer service applications, use ai_embedd with cosine_similarity to find the most semantically relevant text segments for a given query.
SELECT cosine_similarity(ai_embedd('<query text>'), ai_embedd(text)) AS cos
FROM text_table
ORDER BY cos DESC
LIMIT 10;To optimize vector search performance at scale, see AISearch.
Voiceprint retrieval
In meeting recordings or conversation analysis, transcribe audio by speaker, then compare audio embeddings against a voice dataset to identify speakers.
Transcribe and segment by speaker:
SELECT AI_AUDIO_TRANSCRIBE('oss://xxx', "{'diarization_mode':'speaker'}");Retrieve the 10 closest voiceprint matches using
l2_distance:SELECT l2_distance(input_embedding, origin_embedding) AS ld FROM ( SELECT ai_audio_embed('oss://xxx') AS input_embedding JOIN SELECT ai_audio_embed(url) AS origin_embedding FROM audio_table ) ORDER BY ld LIMIT 10;
For the complete solution, see Voiceprint Retrieval.
Filter dissatisfied customer reviews
Use ai_sentiment in a WHERE clause to identify negative feedback from a reviews table.
In this example, the qwen_plus_external model (backed by the qwen-plus model in Alibaba Cloud Model Studio) is created in advance.
SELECT name, comment
FROM customer_comment
WHERE ai_sentiment('qwen_plus_external', comment) = 'negative'
AND dt = '20260101';