StarRocks AI functions embed large language model (LLM) capabilities directly into SQL, so you can run text analysis, transformation, and generation as part of any OLAP query—without moving data to an external pipeline.
All 11 built-in functions return typed values (STRING, FLOAT, or JSON) that compose naturally with JOIN, GROUP BY, aggregation, and filter operations. Raw data stays in StarRocks; only the fields you pass to a function are sent to the model endpoint.
AI functions are in public preview. Each user gets a free trial quota of 1 million tokens. If your total token consumption exceeds this limit, the function call fails. For large-scale use, contact the StarRocks team.
Use cases
| Use case | Functions |
|---|---|
| Dashboard-ready sentiment scoring on customer reviews | ai_analyze_sentiment |
| Auto-tagging product catalog or support tickets | ai_classify |
| Extracting structured fields (names, dates, locations) from free text | ai_extract |
| Masking PII before writing to a staging environment | ai_mask |
| Cleaning and standardizing user-generated content | ai_fix_grammar |
| Summarizing long documents or ticket threads | ai_summarize |
| Translating and localizing multilingual records | ai_translate |
| Ranking FAQ or search results by semantic relevance | ai_similarity |
| Batch-generating marketing copy or report text | ai_gen |
| Running Q&A against data rows using a pre-configured prompt | ai_query |
| Calling your own private or fine-tuned model | ai_custom_query |
Prerequisites
Before you begin, make sure you have:
-
Trial access enabled (see Enable trial access)
-
Public network access configured for your BE nodes (see Configure network access)
Enable trial access
AI functions currently require trial access. Apply through one of these channels:
-
Contact the StarRocks team through your Alibaba Cloud technical support manager or cloud architect.
-
Join the DingTalk group by searching for group number
24010016636in the DingTalk app. -
Fill out the AI Center Application Form.
Configure network access
StarRocks BE nodes need outbound internet access to reach external model service endpoints. Set up a NAT Gateway on the VPC where your cluster runs:
-
Create a NAT Gateway and bind an Elastic IP Address (EIP) to it.
-
Add an SNAT rule that routes traffic from the BE node CIDR block through the NAT Gateway.
-
Confirm that your VPC route tables and security group policies allow outbound traffic.
For setup details, see Public NAT Gateway.
Function reference
All functions are called in standard SQL. Results are returned as typed values and can be used directly in subsequent query expressions.
| Function | What it does | Returns |
|---|---|---|
ai_analyze_sentiment(content) |
Classifies text as positive, negative, or neutral | STRING |
ai_classify(content, labels) |
Assigns one label from a custom list | STRING |
ai_extract(content, labels) |
Pulls out named entities and returns them as JSON | JSON object |
ai_mask(content, labels) |
Replaces specified entity types with [MASKED] |
STRING |
ai_fix_grammar(content) |
Corrects grammar and spelling | STRING |
ai_summarize(content [, max_words]) |
Generates a summary, with an optional word limit | STRING |
ai_translate(content, to_lang) |
Translates text to the target language | STRING |
ai_similarity(text1, text2) |
Returns a semantic similarity score from 0 to 1 | FLOAT |
ai_gen(prompt) |
Generates content from a prompt | STRING |
ai_query(prompt) |
Runs Q&A using a pre-configured system prompt | STRING |
ai_custom_query(content, parameters) |
Calls a custom model endpoint | STRING |
Function details
ai_analyze_sentiment
Classifies the sentiment of input text.
Syntax
ai_analyze_sentiment(content)
Parameters
| Parameter | Type | Description |
|---|---|---|
content |
STRING | The text to analyze |
Returns
STRING — 'positive', 'negative', or 'neutral'. Returns NULL if sentiment cannot be determined.
Examples
Single-value test:
SELECT ai_analyze_sentiment('I am happy');
-- Returns: 'positive'
Apply to a table column:
SELECT
review_id,
review_text,
ai_analyze_sentiment(review_text) AS sentiment
FROM customer_reviews
LIMIT 10;
ai_classify
Assigns one label from a list you define.
Syntax
ai_classify(content, labels)
Parameters
| Parameter | Type | Description |
|---|---|---|
content |
STRING | The text to classify |
labels |
ARRAY | Candidate labels — at least 2 and at most 20 elements |
Returns
STRING — one element from labels. Returns NULL if classification fails.
Examples
Single-value test:
SELECT ai_classify('My password is leaked.', ['urgent', 'not urgent']);
-- Returns: 'urgent'
Tag product descriptions in bulk:
SELECT
product_id,
description,
ai_classify(description, ['clothing', 'shoes', 'accessories', 'furniture']) AS category
FROM products
LIMIT 10;
ai_extract
Extracts named entities from text and returns them as a JSON object.
Syntax
ai_extract(content, labels)
Parameters
| Parameter | Type | Description |
|---|---|---|
content |
STRING | The text to extract from |
labels |
ARRAY | Entity types to extract, e.g. ['person', 'location', 'organization'] |
Returns
JSON object where keys are the entity types and values are the extracted text.
Example
SELECT ai_extract(
'John Doe lives in New York and works for Acme Corp.',
['person', 'location', 'organization']
);
-- Returns: {"person":"John Doe","location":"New York","organization":"Acme Corp"}
ai_mask
Replaces specified entity types with [MASKED].
Syntax
ai_mask(content, labels)
Parameters
| Parameter | Type | Description |
|---|---|---|
content |
STRING | The text to mask |
labels |
ARRAY | Entity types to mask, e.g. ['person', 'email', 'phone'] |
Returns
STRING with matched entities replaced by [MASKED].
Examples
Single-value test:
SELECT ai_mask(
'John Doe lives in New York. His email is john.doe@example.com.',
['person', 'email']
);
-- Returns: "[MASKED] lives in New York. His email is [MASKED]."
Mask PII across a log table before export:
SELECT
log_id,
ai_mask(log_text, ['person', 'email', 'phone']) AS masked_log
FROM audit_logs
LIMIT 10;
ai_fix_grammar
Corrects grammar and spelling in text.
Syntax
ai_fix_grammar(content)
Parameters
| Parameter | Type | Description |
|---|---|---|
content |
STRING | The text to correct |
Returns
STRING with corrected grammar and spelling.
Example
SELECT ai_fix_grammar('This sentence have some mistake');
-- Returns: "This sentence has some mistake"
ai_summarize
Generates a concise summary of the input text.
Syntax
ai_summarize(content [, max_words])
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
content |
STRING | Yes | The text to summarize |
max_words |
INTEGER | No | Target word count for the summary. Default: 50. Set to 0 for no limit. |
Returns
STRING summary.
Example
SELECT ai_summarize(
'Apache Spark is a unified analytics engine for large-scale data processing...',
10
);
-- Returns: "Spark: unified engine for large-scale data processing with APIs and tools."
ai_translate
Translates text into a target language.
Syntax
ai_translate(content, to_lang)
Parameters
| Parameter | Type | Description |
|---|---|---|
content |
STRING | The text to translate |
to_lang |
STRING | Target language code following the ISO 639-1 standard |
Common language codes:
| Language | Code |
|---|---|
| Arabic | 'ar' |
| Chinese (Simplified) | 'zh' |
| English | 'en' |
| French | 'fr' |
| German | 'de' |
| Hindi | 'hi' |
| Japanese | 'ja' |
| Korean | 'ko' |
| Portuguese | 'pt' |
| Russian | 'ru' |
| Spanish | 'es' |
Returns
Translated STRING.
Example
SELECT ai_translate('Hello, how are you?', 'es');
-- Returns: "Hola, ¿cómo estás?"
ai_similarity
Calculates the semantic similarity between two texts.
Syntax
ai_similarity(text1, text2)
Parameters
| Parameter | Type | Description |
|---|---|---|
text1 |
STRING | First text |
text2 |
STRING | Second text |
Returns
FLOAT between 0 and 1. A score of 1.0 means the texts are identical. This score is primarily used for sorting.
Example
SELECT ai_similarity(
'I enjoy hiking in the mountains.',
'I love walking through mountain trails.'
);
-- Returns: 0.82
ai_gen
Generates content from a prompt.
Syntax
ai_gen(prompt)
Parameters
| Parameter | Type | Description |
|---|---|---|
prompt |
STRING | The prompt that guides content generation |
Returns
Generated STRING.
Example
SELECT ai_gen('Write a catchy email subject for a summer bike sale with a 20% discount');
-- Returns: "Summer Cycling Carnival: 20% Off for a Limited Time!"
ai_query
Runs general Q&A using a pre-configured system prompt combined with user input.
Syntax
ai_query(prompt)
Parameters
| Parameter | Type | Description |
|---|---|---|
prompt |
STRING | The user's question or instruction |
Returns
STRING — the model's answer.
Example
SELECT ai_query('What is the capital of France?');
-- Returns: "Paris"
ai_custom_query
Calls a custom or private model endpoint. All connection and model parameters are passed inline.
Syntax
ai_custom_query(content, parameters)
Parameters
| Parameter | Type | Description |
|---|---|---|
content |
STRING | The text content to process |
parameters |
MAP\<STRING, STRING\> | Connection and model settings (see table below) |
Supported keys in parameters:
| Key | Required | Description |
|---|---|---|
prompt |
Yes | Instructions that tell the model how to process content |
endpoint |
Yes | The model service endpoint URL |
model |
Yes | The model name |
api_key |
Yes | The API key for authentication |
temperature |
No | Controls output randomness. Range: [0, 2). Higher = more varied, lower = more deterministic. |
top_p |
No | Controls diversity via nucleus sampling. Higher = more random. |
max_tokens |
No | Maximum number of output tokens |
stop |
No | Stop sequence — the model stops generating when it encounters this string |
Returns
STRING.
Example
SELECT ai_custom_query(
'I am happy',
map{
'endpoint': 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions',
'model': 'qwen-plus',
'api_key': 'sk-89c1***********95d0',
'prompt': 'Classify each text into [positive, negative, neutral, mixed]. Return only the labels in a JSON array, in the same order as the input. Do not include any explanation, markdown formatting, or code block delimiters. Text: ',
'max_tokens': '100',
'top_p': '1',
'temperature': '0.8'
}
);
-- Returns: "positive"
Cluster configuration
AI function behavior is controlled by BE dynamic parameters. Change them at runtime using ADMIN SET CONFIG — no restart required.
To view current values:
SELECT * FROM information_schema.be_configs WHERE NAME LIKE 'ai_%';
Model connection
| Parameter | Default | Description |
|---|---|---|
ai_function_model_endpoint |
"https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions" |
Model service endpoint. Supports any OpenAI-compatible Chat/Completions endpoint. |
ai_function_model_type |
"qwen-plus" |
The model to call. Supports text generation models. |
ai_function_model_api_key |
"sk-89c1***********95d0" |
API key for Alibaba Cloud Model Studio. See Get an API key. |
ai_function_system_prompt |
"You are a helpful assistant." |
System prompt applied to all AI functions. |
ai_function_http_connect_timeout_ms |
10000 |
HTTP connection timeout in milliseconds (default: 10 seconds). |
ai_function_http_timeout_ms |
600000 |
HTTP request timeout in milliseconds (default: 10 minutes). |
ai_function_max_inflight_requests |
100 |
Maximum concurrent HTTP requests to the model. Controls request throughput. |
ai_function_request_template |
(OpenAI-compatible format) | Request body template. Placeholders: $0=model, $1=system prompt, $2=user prompt, $3=optional extra params (comma-prefixed). |
Batch sizes
Each function sends requests to the model in batches. Larger batches reduce the number of model calls and can shorten total response time.
| Parameter | Default | Function |
|---|---|---|
ai_function_sentiment_batch_size |
10 |
ai_analyze_sentiment |
ai_function_classify_batch_size |
10 |
ai_classify |
ai_function_similarity_batch_size |
10 |
ai_similarity |
ai_function_fix_grammar_batch_size |
3 |
ai_fix_grammar |
ai_function_mask_batch_size |
3 |
ai_mask |
ai_function_translate_batch_size |
3 |
ai_translate |
ai_function_extract_batch_size |
1 |
ai_extract |
ai_function_gen_batch_size |
1 |
ai_gen |
ai_function_query_batch_size |
1 |
ai_query |
ai_function_summarize_batch_size |
1 |
ai_summarize |
Prompt templates
Each function uses a configurable prompt template. Template placeholders use $0, $1, etc.
| Parameter | Placeholders | Function |
|---|---|---|
ai_function_sentiment_prompt |
$0=texts |
ai_analyze_sentiment |
ai_function_classify_prompt |
$0=labels (JSON-encoded), $1=texts |
ai_classify |
ai_function_extract_prompt |
$0=labels, $1=texts |
ai_extract |
ai_function_fix_grammar_prompt |
$0=texts |
ai_fix_grammar |
ai_function_mask_prompt |
$0=labels, $1=texts |
ai_mask |
ai_function_summarize_prompt |
$0=word count, $1=texts |
ai_summarize |
ai_function_translate_prompt |
$0=target language, $1=texts |
ai_translate |
ai_function_similarity_prompt |
$0=text pairs |
ai_similarity |
ai_function_gen_prompt |
$0=generation instruction |
ai_gen |
ai_function_query_prompt |
$0=query instruction, $1=texts |
ai_query |
ai_function_custom_query_prompt |
$0=query instruction, $1=texts |
ai_custom_query |