StarRocks AI functions embed large language model (LLM) capabilities directly into SQL, so you can run text analysis, transformation, and generation as part of any OLAP query—without moving data to an external pipeline.
All 12 built-in functions return typed values (VARCHAR, FLOAT, JSON, or BOOLEAN) that compose naturally with JOIN, GROUP BY, aggregation, and filter operations. Raw data stays in StarRocks; only the fields you pass to a function are sent to the model endpoint.
Each user gets a free trial quota of 100 million tokens. After exceeding the free trial quota, additional usage will be charged based on actual token consumption. For pricing details, see AI Function.
Use cases
Use case | Functions |
Dashboard-ready sentiment scoring on customer reviews |
|
Auto-tagging product catalog or support tickets |
|
Extracting structured fields (names, dates, locations) from free text |
|
Masking PII before writing to a staging environment |
|
Cleaning and standardizing user-generated content |
|
Summarizing long documents or ticket threads |
|
Translating and localizing multilingual records |
|
Ranking FAQ or search results by semantic relevance |
|
General AI text completion and content generation |
|
Filtering rows by natural-language conditions |
|
Running Q&A against a custom knowledge base or resource |
|
Prerequisites
Before you begin, make sure you meet the kernel and network requirements:
Kernel version requirements
3.3.20-2.1.0 and later
3.5.16-2.1.0 and later
Configure network access
StarRocks BE nodes need outbound internet access to reach external model service endpoints. Set up a NAT Gateway on the VPC where your cluster runs:
Create a NAT Gateway and bind an Elastic IP Address (EIP) to it.
Add an SNAT rule that routes traffic from the BE node CIDR block through the NAT Gateway.
Confirm that your VPC route tables and security group policies allow outbound traffic.
For setup details, see Internet NAT gateway.
Function reference
All functions are called in standard SQL. Results are returned as typed values and can be used directly in subsequent query expressions.
Function | What it does | Returns |
| Classifies text as positive, negative, neutral, mixed, or unknown | VARCHAR |
| Assigns one label from a custom list | JSON |
| Pulls out named entities and returns them as JSON | JSON |
| Replaces specified PII categories with | VARCHAR |
| Corrects grammar and spelling | VARCHAR |
| Generates a summary, with an optional word limit | VARCHAR |
| Translates text to the target language | VARCHAR |
| Returns a semantic similarity score from 0 to 1 | FLOAT |
| Generates content using a specified model and prompt | VARCHAR |
| Generates content with additional parameters (temperature, max_tokens, etc.) | VARCHAR |
| Filters rows by evaluating a natural-language condition | BOOLEAN |
| Runs a query against a custom resource or knowledge base | VARCHAR |
Function details
ai_sentiment
Classifies the sentiment of input text.
Syntax
ai_sentiment(text)Parameters
Parameter | Type | Description |
| VARCHAR | The text to analyze |
Returns
VARCHAR — 'positive', 'negative', 'neutral', 'mixed', or 'unknown'. Returns NULL if sentiment cannot be determined.
Examples
Single-value test:
SELECT ai_sentiment('I am happy');
-- Returns: 'positive'Apply to a table column:
SELECT
review_id,
review_text,
ai_sentiment(review_text) AS sentiment
FROM customer_reviews
LIMIT 10;ai_classify
Assigns one label from a list you define.
Syntax
ai_classify(text, labels)Parameters
Parameter | Type | Description |
| VARCHAR | The text to classify |
| ARRAY<VARCHAR> | Candidate labels — at least 2 and at most 20 elements |
Returns
JSON — contains the classification result. Returns NULL if classification fails.
Examples
Single-value test:
SELECT ai_classify('My password is leaked.', ['urgent', 'not urgent']);
-- Returns: {"labels": ["urgent"]}Tag product descriptions in bulk:
SELECT
product_id,
description,
ai_classify(description, ['clothing', 'shoes', 'accessories', 'furniture']) AS category
FROM products
LIMIT 10;ai_extract
Extracts named entities from text and returns them as a JSON object.
Syntax
ai_extract(text, entity_labels)Parameters
Parameter | Type | Description |
| VARCHAR | The text to extract from |
| ARRAY<VARCHAR> | Entity types to extract, e.g. |
Returns
JSON object where keys are the entity types and values are the extracted text.
Example
SELECT ai_extract(
'John Doe lives in New York and works for Acme Corp.',
['person', 'location', 'organization']
);
-- Returns: {"person":"John Doe","location":"New York","organization":"Acme Corp"}ai_redact
Replaces specified PII categories with [REDACTED].
Syntax
ai_redact(text, categories)Parameters
Parameter | Type | Description |
| VARCHAR | The text to redact |
| ARRAY<VARCHAR> | PII categories to redact, e.g. |
Returns
VARCHAR with matched entities replaced by [REDACTED].
Examples
Single-value test:
SELECT ai_redact(
'John Doe lives in New York. His email is john.doe@example.com.',
['person', 'email']
);
-- Returns: "[REDACTED] lives in New York. His email is [REDACTED]."Redact PII across a log table before export:
SELECT
log_id,
ai_redact(log_text, ['person', 'email', 'phone']) AS redacted_log
FROM audit_logs
LIMIT 10;ai_fix_grammar
Corrects grammar and spelling in text.
Syntax
ai_fix_grammar(text)Parameters
Parameter | Type | Description |
| VARCHAR | The text to correct |
Returns
VARCHAR with corrected grammar and spelling.
Example
SELECT ai_fix_grammar('This sentence have some mistake');
-- Returns: "This sentence has some mistake"ai_summarize
Generates a concise summary of the input text.
Syntax
ai_summarize(text)Parameters
Parameter | Type | Required | Description |
| VARCHAR | Yes | The text to summarize |
Returns
VARCHAR summary.
Example
SELECT ai_summarize(
'Apache Spark is a unified analytics engine for large-scale data processing...',
10
);
-- Returns: "Spark: unified engine for large-scale data processing with APIs and tools."ai_translate
Translates text from one language to another.
Syntax
ai_translate(text, source_lang, target_lang)Parameters
Parameter | Type | Description |
| VARCHAR | The text to translate |
| VARCHAR | Source language code (ISO 639-1) |
| VARCHAR | Target language code (ISO 639-1) |
Common language codes:
Language | Code |
Arabic |
|
Chinese (Simplified) |
|
English |
|
French |
|
German |
|
Hindi |
|
Japanese |
|
Korean |
|
Portuguese |
|
Russian |
|
Spanish |
|
Returns
Translated VARCHAR.
Example
SELECT ai_translate('Hello, how are you?', 'en', 'es');
-- Returns: "Hola, ¿cómo estás?"ai_similarity
Calculates the semantic similarity between two texts.
Syntax
ai_similarity(text1, text2)Parameters
Parameter | Type | Description |
| VARCHAR | First text |
| VARCHAR | Second text |
Returns
FLOAT between 0 and 1. A score of 1.0 means the texts are identical. This score is primarily used for sorting.
Example
SELECT ai_similarity(
'I enjoy hiking in the mountains.',
'I love walking through mountain trails.'
);
-- Returns: 0.82ai_complete
Generates content using a specified model. Supports two overloads: a simple form and a parameterized form.
Syntax
-- Overload 1: Simple
ai_complete(model, prompt)
-- Overload 2: With parameters
ai_complete(model, prompt, params)Parameters
Parameter | Type | Description |
| VARCHAR | The model name, e.g. |
| VARCHAR | The prompt that guides content generation |
| MAP<VARCHAR, VARCHAR> | Optional. Additional model parameters such as |
Returns
VARCHAR — the generated text.
Examples
Simple form:
SELECT ai_complete('qwen-plus', 'Write a catchy email subject for a summer bike sale with a 20% discount');
-- Returns: "Summer Cycling Carnival: 20% Off for a Limited Time!"With parameters:
SELECT ai_complete('qwen-plus', 'Summarize the benefits of cloud computing', map{'temperature':'0.7', 'max_tokens':'200'});
-- Returns: "Cloud computing offers scalable resources, cost efficiency..."ai_filter
Evaluates a natural-language condition against text and returns TRUE or FALSE. Useful for filtering rows in WHERE clauses.
Syntax
ai_filter(text, condition)Parameters
Parameter | Type | Description |
| VARCHAR | The text to evaluate |
| VARCHAR | A natural-language condition describing the filter criteria |
Returns
BOOLEAN — TRUE if the text satisfies the condition, FALSE otherwise.
Example
SELECT * FROM reviews WHERE ai_filter(review_text, 'mentions product quality issues');
-- Returns rows where the review text discusses product quality problemsai_custom_query
Runs a query against a custom resource or knowledge base.
Syntax
ai_custom_query(resource, prompt)Parameters
Parameter | Type | Description |
| VARCHAR | The name of the custom resource or knowledge base to query |
| VARCHAR | The query or question to ask the resource |
Returns
VARCHAR.
Example
SELECT ai_custom_query('my_knowledge_base', 'What is StarRocks?');
-- Returns: "StarRocks is a high-performance analytical data warehouse..."Cluster configuration
AI function behavior is controlled by BE dynamic parameters. Change them at runtime using ADMIN SET CONFIG — no restart required.
To view current values:
SELECT * FROM information_schema.be_configs WHERE NAME LIKE 'ai_%';Model connection
Parameter | Default | Description |
|
| Model service endpoint. Supports any OpenAI-compatible Chat/Completions endpoint. |
|
| The model to call. Supports text generation models. |
|
| API key for Alibaba Cloud Model Studio. See Get an API key. |
|
| System prompt applied to all AI functions. |
|
| HTTP connection timeout in milliseconds (default: 10 seconds). |
|
| HTTP request timeout in milliseconds (default: 10 minutes). |
|
| Maximum concurrent HTTP requests to the model. Controls request throughput. |
| (OpenAI-compatible format) | Request body template. Placeholders: |
Batch sizes
Each function sends requests to the model in batches. Larger batches reduce the number of model calls and can shorten total response time.
Parameter | Default | Function |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Prompt templates
Each function uses a configurable prompt template. Template placeholders use $0, $1, etc.
Parameter | Placeholders | Function |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|