EMR StarRocks AI center - E-MapReduce - Alibaba Cloud Documentation Center

StarRocks AI functions embed large language model (LLM) capabilities directly into SQL, so you can run text analysis, transformation, and generation as part of any OLAP query—without moving data to an external pipeline.

All 12 built-in functions return typed values (VARCHAR, FLOAT, JSON, or BOOLEAN) that compose naturally with JOIN, GROUP BY, aggregation, and filter operations. Raw data stays in StarRocks; only the fields you pass to a function are sent to the model endpoint.

Each user gets a free trial quota of 100 million tokens. After exceeding the free trial quota, additional usage will be charged based on actual token consumption. For pricing details, see AI Function.

Use cases

Use case	Functions
Dashboard-ready sentiment scoring on customer reviews	`ai_sentiment`
Auto-tagging product catalog or support tickets	`ai_classify`
Extracting structured fields (names, dates, locations) from free text	`ai_extract`
Masking PII before writing to a staging environment	`ai_redact`
Cleaning and standardizing user-generated content	`ai_fix_grammar`
Summarizing long documents or ticket threads	`ai_summarize`
Translating and localizing multilingual records	`ai_translate`
Ranking FAQ or search results by semantic relevance	`ai_similarity`
General AI text completion and content generation	`ai_complete`
Filtering rows by natural-language conditions	`ai_filter`
Running Q&A against a custom knowledge base or resource	`ai_custom_query`

Prerequisites

Before you begin, make sure you meet the kernel and network requirements:

Kernel version requirements

3.3.20-2.1.0 and later
3.5.16-2.1.0 and later

Configure network access

StarRocks BE nodes need outbound internet access to reach external model service endpoints. Set up a NAT Gateway on the VPC where your cluster runs:

Create a NAT Gateway and bind an Elastic IP Address (EIP) to it.
Add an SNAT rule that routes traffic from the BE node CIDR block through the NAT Gateway.
Confirm that your VPC route tables and security group policies allow outbound traffic.

For setup details, see Internet NAT gateway.

Function reference

All functions are called in standard SQL. Results are returned as typed values and can be used directly in subsequent query expressions.

Function	What it does	Returns
`ai_sentiment(text)`	Classifies text as positive, negative, neutral, mixed, or unknown	VARCHAR
`ai_classify(text, labels)`	Assigns one label from a custom list	JSON
`ai_extract(text, entity_labels)`	Pulls out named entities and returns them as JSON	JSON
`ai_redact(text, categories)`	Replaces specified PII categories with `[REDACTED]`	VARCHAR
`ai_fix_grammar(text)`	Corrects grammar and spelling	VARCHAR
`ai_summarize(text)`	Generates a summary, with an optional word limit	VARCHAR
`ai_translate(text, source_lang, target_lang)`	Translates text to the target language	VARCHAR
`ai_similarity(text1, text2)`	Returns a semantic similarity score from 0 to 1	FLOAT
`ai_complete(model, prompt)`	Generates content using a specified model and prompt	VARCHAR
`ai_complete(model, prompt, params)`	Generates content with additional parameters (temperature, max_tokens, etc.)	VARCHAR
`ai_filter(text, condition)`	Filters rows by evaluating a natural-language condition	BOOLEAN
`ai_custom_query(resource, prompt)`	Runs a query against a custom resource or knowledge base	VARCHAR

Function details

ai_sentiment

Classifies the sentiment of input text.

Syntax

ai_sentiment(text)

Parameters

Parameter	Type	Description
`text`	VARCHAR	The text to analyze

Returns

VARCHAR — 'positive', 'negative', 'neutral', 'mixed', or 'unknown'. Returns NULL if sentiment cannot be determined.

Examples

Single-value test:

SELECT ai_sentiment('I am happy');
-- Returns: 'positive'

Apply to a table column:

SELECT
  review_id,
  review_text,
  ai_sentiment(review_text) AS sentiment
FROM customer_reviews
LIMIT 10;

ai_classify

Assigns one label from a list you define.

Syntax

ai_classify(text, labels)

Parameters

Parameter	Type	Description
`text`	VARCHAR	The text to classify
`labels`	ARRAY<VARCHAR>	Candidate labels — at least 2 and at most 20 elements

Returns

JSON — contains the classification result. Returns NULL if classification fails.

Examples

Single-value test:

SELECT ai_classify('My password is leaked.', ['urgent', 'not urgent']);
-- Returns: {"labels": ["urgent"]}

Tag product descriptions in bulk:

SELECT
  product_id,
  description,
  ai_classify(description, ['clothing', 'shoes', 'accessories', 'furniture']) AS category
FROM products
LIMIT 10;

ai_extract

Extracts named entities from text and returns them as a JSON object.

Syntax

ai_extract(text, entity_labels)

Parameters

Parameter	Type	Description
`text`	VARCHAR	The text to extract from
`entity_labels`	ARRAY<VARCHAR>	Entity types to extract, e.g. `['person', 'location', 'organization']`

Returns

JSON object where keys are the entity types and values are the extracted text.

Example

SELECT ai_extract(
  'John Doe lives in New York and works for Acme Corp.',
  ['person', 'location', 'organization']
);
-- Returns: {"person":"John Doe","location":"New York","organization":"Acme Corp"}

ai_redact

Replaces specified PII categories with [REDACTED].

Syntax

ai_redact(text, categories)

Parameters

Parameter	Type	Description
`text`	VARCHAR	The text to redact
`categories`	ARRAY<VARCHAR>	PII categories to redact, e.g. `['person', 'email', 'phone']`

Returns

VARCHAR with matched entities replaced by [REDACTED].

Examples

Single-value test:

SELECT ai_redact(
  'John Doe lives in New York. His email is john.doe@example.com.',
  ['person', 'email']
);
-- Returns: "[REDACTED] lives in New York. His email is [REDACTED]."

Redact PII across a log table before export:

SELECT
  log_id,
  ai_redact(log_text, ['person', 'email', 'phone']) AS redacted_log
FROM audit_logs
LIMIT 10;

ai_fix_grammar

Corrects grammar and spelling in text.

Syntax

ai_fix_grammar(text)

Parameters

Parameter	Type	Description
`text`	VARCHAR	The text to correct

Returns

VARCHAR with corrected grammar and spelling.

Example

SELECT ai_fix_grammar('This sentence have some mistake');
-- Returns: "This sentence has some mistake"

ai_summarize

Generates a concise summary of the input text.

Syntax

ai_summarize(text)

Parameters

Parameter	Type	Required	Description
`text`	VARCHAR	Yes	The text to summarize

Returns

VARCHAR summary.

Example

SELECT ai_summarize(
  'Apache Spark is a unified analytics engine for large-scale data processing...',
  10
);
-- Returns: "Spark: unified engine for large-scale data processing with APIs and tools."

ai_translate

Translates text from one language to another.

Syntax

ai_translate(text, source_lang, target_lang)

Parameters

Parameter	Type	Description
`text`	VARCHAR	The text to translate
`source_lang`	VARCHAR	Source language code (ISO 639-1)
`target_lang`	VARCHAR	Target language code (ISO 639-1)

Common language codes:

Language	Code
Arabic	`'ar'`
Chinese (Simplified)	`'zh'`
English	`'en'`
French	`'fr'`
German	`'de'`
Hindi	`'hi'`
Japanese	`'ja'`
Korean	`'ko'`
Portuguese	`'pt'`
Russian	`'ru'`
Spanish	`'es'`

Returns

Translated VARCHAR.

Example

SELECT ai_translate('Hello, how are you?', 'en', 'es');
-- Returns: "Hola, ¿cómo estás?"

ai_similarity

Calculates the semantic similarity between two texts.

Syntax

ai_similarity(text1, text2)

Parameters

Parameter	Type	Description
`text1`	VARCHAR	First text
`text2`	VARCHAR	Second text

Returns

FLOAT between 0 and 1. A score of 1.0 means the texts are identical. This score is primarily used for sorting.

Example

SELECT ai_similarity(
  'I enjoy hiking in the mountains.',
  'I love walking through mountain trails.'
);
-- Returns: 0.82

ai_complete

Generates content using a specified model. Supports two overloads: a simple form and a parameterized form.

Syntax

-- Overload 1: Simple
ai_complete(model, prompt)
-- Overload 2: With parameters
ai_complete(model, prompt, params)

Parameters

Parameter	Type	Description
`model`	VARCHAR	The model name, e.g. `'qwen-plus'`
`prompt`	VARCHAR	The prompt that guides content generation
`params`	MAP<VARCHAR, VARCHAR>	Optional. Additional model parameters such as `temperature`, `max_tokens`, `top_p`, `stop`

Returns

VARCHAR — the generated text.

Examples

Simple form:

SELECT ai_complete('qwen-plus', 'Write a catchy email subject for a summer bike sale with a 20% discount');
-- Returns: "Summer Cycling Carnival: 20% Off for a Limited Time!"

With parameters:

SELECT ai_complete('qwen-plus', 'Summarize the benefits of cloud computing', map{'temperature':'0.7', 'max_tokens':'200'});
-- Returns: "Cloud computing offers scalable resources, cost efficiency..."

ai_filter

Evaluates a natural-language condition against text and returns TRUE or FALSE. Useful for filtering rows in WHERE clauses.

Syntax

ai_filter(text, condition)

Parameters

Parameter	Type	Description
`text`	VARCHAR	The text to evaluate
`condition`	VARCHAR	A natural-language condition describing the filter criteria

Returns

BOOLEAN — TRUE if the text satisfies the condition, FALSE otherwise.

Example

SELECT * FROM reviews WHERE ai_filter(review_text, 'mentions product quality issues');
-- Returns rows where the review text discusses product quality problems

ai_custom_query

Runs a query against a custom resource or knowledge base.

Syntax

ai_custom_query(resource, prompt)

Parameters

Parameter	Type	Description
`resource`	VARCHAR	The name of the custom resource or knowledge base to query
`prompt`	VARCHAR	The query or question to ask the resource

Returns

VARCHAR.

Example

SELECT ai_custom_query('my_knowledge_base', 'What is StarRocks?');
-- Returns: "StarRocks is a high-performance analytical data warehouse..."

Cluster configuration

AI function behavior is controlled by BE dynamic parameters. Change them at runtime using ADMIN SET CONFIG — no restart required.

To view current values:

SELECT * FROM information_schema.be_configs WHERE NAME LIKE 'ai_%';

Model connection

Parameter	Default	Description
`ai_function_model_endpoint`	`"https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"`	Model service endpoint. Supports any OpenAI-compatible Chat/Completions endpoint.
`ai_function_model_type`	`"qwen-plus"`	The model to call. Supports text generation models.
`ai_function_model_api_key`	`"sk-89c1***********95d0"`	API key for Alibaba Cloud Model Studio. See Get an API key.
`ai_function_system_prompt`	`"You are a helpful assistant."`	System prompt applied to all AI functions.
`ai_function_http_connect_timeout_ms`	`10000`	HTTP connection timeout in milliseconds (default: 10 seconds).
`ai_function_http_timeout_ms`	`600000`	HTTP request timeout in milliseconds (default: 10 minutes).
`ai_function_max_inflight_requests`	`100`	Maximum concurrent HTTP requests to the model. Controls request throughput.
`ai_function_request_template`	(OpenAI-compatible format)	Request body template. Placeholders: `$0`=model, `$1`=system prompt, `$2`=user prompt, `$3`=optional extra params (comma-prefixed).

Batch sizes

Each function sends requests to the model in batches. Larger batches reduce the number of model calls and can shorten total response time.

Parameter	Default	Function
`ai_function_sentiment_batch_size`	`10`	`ai_sentiment`
`ai_function_classify_batch_size`	`10`	`ai_classify`
`ai_function_similarity_batch_size`	`10`	`ai_similarity`
`ai_function_fix_grammar_batch_size`	`3`	`ai_fix_grammar`
`ai_function_mask_batch_size`	`3`	`ai_redact`
`ai_function_translate_batch_size`	`3`	`ai_translate`
`ai_function_extract_batch_size`	`1`	`ai_extract`
`ai_function_gen_batch_size`	`1`	`ai_complete`
`ai_function_query_batch_size`	`1`	`ai_filter`
`ai_function_summarize_batch_size`	`1`	`ai_summarize`

Prompt templates

Each function uses a configurable prompt template. Template placeholders use $0, $1, etc.

Parameter	Placeholders	Function
`ai_function_sentiment_prompt`	`$0`=texts	`ai_sentiment`
`ai_function_classify_prompt`	`$0`=labels (JSON-encoded), `$1`=texts	`ai_classify`
`ai_function_extract_prompt`	`$0`=labels, `$1`=texts	`ai_extract`
`ai_function_fix_grammar_prompt`	`$0`=texts	`ai_fix_grammar`
`ai_function_mask_prompt`	`$0`=labels, `$1`=texts	`ai_redact`
`ai_function_summarize_prompt`	`$0`=word count, `$1`=texts	`ai_summarize`
`ai_function_translate_prompt`	`$0`=target language, `$1`=texts	`ai_translate`
`ai_function_similarity_prompt`	`$0`=text pairs	`ai_similarity`
`ai_function_gen_prompt`	`$0`=generation instruction	`ai_complete`
`ai_function_query_prompt`	`$0`=query instruction, `$1`=texts	`ai_filter`
`ai_function_custom_query_prompt`	`$0`=query instruction, `$1`=texts	`ai_custom_query`