All Products
Search
Document Center

E-MapReduce:AI functions

Last Updated:May 20, 2026

StarRocks AI functions embed large language model (LLM) capabilities directly into SQL, so you can run text analysis, transformation, and generation as part of any OLAP query—without moving data to an external pipeline.

All 12 built-in functions return typed values (VARCHAR, FLOAT, JSON, or BOOLEAN) that compose naturally with JOIN, GROUP BY, aggregation, and filter operations. Raw data stays in StarRocks; only the fields you pass to a function are sent to the model endpoint.

Each user gets a free trial quota of 100 million tokens. After exceeding the free trial quota, additional usage will be charged based on actual token consumption. For pricing details, see AI Function.

Use cases

Use case

Functions

Dashboard-ready sentiment scoring on customer reviews

ai_sentiment

Auto-tagging product catalog or support tickets

ai_classify

Extracting structured fields (names, dates, locations) from free text

ai_extract

Masking PII before writing to a staging environment

ai_redact

Cleaning and standardizing user-generated content

ai_fix_grammar

Summarizing long documents or ticket threads

ai_summarize

Translating and localizing multilingual records

ai_translate

Ranking FAQ or search results by semantic relevance

ai_similarity

General AI text completion and content generation

ai_complete

Filtering rows by natural-language conditions

ai_filter

Running Q&A against a custom knowledge base or resource

ai_custom_query

Prerequisites

Before you begin, make sure you meet the kernel and network requirements:

Kernel version requirements

  • 3.3.20-2.1.0 and later

  • 3.5.16-2.1.0 and later

Configure network access

StarRocks BE nodes need outbound internet access to reach external model service endpoints. Set up a NAT Gateway on the VPC where your cluster runs:

  1. Create a NAT Gateway and bind an Elastic IP Address (EIP) to it.

  2. Add an SNAT rule that routes traffic from the BE node CIDR block through the NAT Gateway.

  3. Confirm that your VPC route tables and security group policies allow outbound traffic.

For setup details, see Internet NAT gateway.

Function reference

All functions are called in standard SQL. Results are returned as typed values and can be used directly in subsequent query expressions.

Function

What it does

Returns

ai_sentiment(text)

Classifies text as positive, negative, neutral, mixed, or unknown

VARCHAR

ai_classify(text, labels)

Assigns one label from a custom list

JSON

ai_extract(text, entity_labels)

Pulls out named entities and returns them as JSON

JSON

ai_redact(text, categories)

Replaces specified PII categories with [REDACTED]

VARCHAR

ai_fix_grammar(text)

Corrects grammar and spelling

VARCHAR

ai_summarize(text)

Generates a summary, with an optional word limit

VARCHAR

ai_translate(text, source_lang, target_lang)

Translates text to the target language

VARCHAR

ai_similarity(text1, text2)

Returns a semantic similarity score from 0 to 1

FLOAT

ai_complete(model, prompt)

Generates content using a specified model and prompt

VARCHAR

ai_complete(model, prompt, params)

Generates content with additional parameters (temperature, max_tokens, etc.)

VARCHAR

ai_filter(text, condition)

Filters rows by evaluating a natural-language condition

BOOLEAN

ai_custom_query(resource, prompt)

Runs a query against a custom resource or knowledge base

VARCHAR

Function details

ai_sentiment

Classifies the sentiment of input text.

Syntax

ai_sentiment(text)

Parameters

Parameter

Type

Description

text

VARCHAR

The text to analyze

Returns

VARCHAR — 'positive', 'negative', 'neutral', 'mixed', or 'unknown'. Returns NULL if sentiment cannot be determined.

Examples

Single-value test:

SELECT ai_sentiment('I am happy');
-- Returns: 'positive'

Apply to a table column:

SELECT
  review_id,
  review_text,
  ai_sentiment(review_text) AS sentiment
FROM customer_reviews
LIMIT 10;

ai_classify

Assigns one label from a list you define.

Syntax

ai_classify(text, labels)

Parameters

Parameter

Type

Description

text

VARCHAR

The text to classify

labels

ARRAY<VARCHAR>

Candidate labels — at least 2 and at most 20 elements

Returns

JSON — contains the classification result. Returns NULL if classification fails.

Examples

Single-value test:

SELECT ai_classify('My password is leaked.', ['urgent', 'not urgent']);
-- Returns: {"labels": ["urgent"]}

Tag product descriptions in bulk:

SELECT
  product_id,
  description,
  ai_classify(description, ['clothing', 'shoes', 'accessories', 'furniture']) AS category
FROM products
LIMIT 10;

ai_extract

Extracts named entities from text and returns them as a JSON object.

Syntax

ai_extract(text, entity_labels)

Parameters

Parameter

Type

Description

text

VARCHAR

The text to extract from

entity_labels

ARRAY<VARCHAR>

Entity types to extract, e.g. ['person', 'location', 'organization']

Returns

JSON object where keys are the entity types and values are the extracted text.

Example

SELECT ai_extract(
  'John Doe lives in New York and works for Acme Corp.',
  ['person', 'location', 'organization']
);
-- Returns: {"person":"John Doe","location":"New York","organization":"Acme Corp"}

ai_redact

Replaces specified PII categories with [REDACTED].

Syntax

ai_redact(text, categories)

Parameters

Parameter

Type

Description

text

VARCHAR

The text to redact

categories

ARRAY<VARCHAR>

PII categories to redact, e.g. ['person', 'email', 'phone']

Returns

VARCHAR with matched entities replaced by [REDACTED].

Examples

Single-value test:

SELECT ai_redact(
  'John Doe lives in New York. His email is john.doe@example.com.',
  ['person', 'email']
);
-- Returns: "[REDACTED] lives in New York. His email is [REDACTED]."

Redact PII across a log table before export:

SELECT
  log_id,
  ai_redact(log_text, ['person', 'email', 'phone']) AS redacted_log
FROM audit_logs
LIMIT 10;

ai_fix_grammar

Corrects grammar and spelling in text.

Syntax

ai_fix_grammar(text)

Parameters

Parameter

Type

Description

text

VARCHAR

The text to correct

Returns

VARCHAR with corrected grammar and spelling.

Example

SELECT ai_fix_grammar('This sentence have some mistake');
-- Returns: "This sentence has some mistake"

ai_summarize

Generates a concise summary of the input text.

Syntax

ai_summarize(text)

Parameters

Parameter

Type

Required

Description

text

VARCHAR

Yes

The text to summarize

Returns

VARCHAR summary.

Example

SELECT ai_summarize(
  'Apache Spark is a unified analytics engine for large-scale data processing...',
  10
);
-- Returns: "Spark: unified engine for large-scale data processing with APIs and tools."

ai_translate

Translates text from one language to another.

Syntax

ai_translate(text, source_lang, target_lang)

Parameters

Parameter

Type

Description

text

VARCHAR

The text to translate

source_lang

VARCHAR

Source language code (ISO 639-1)

target_lang

VARCHAR

Target language code (ISO 639-1)

Common language codes:

Language

Code

Arabic

'ar'

Chinese (Simplified)

'zh'

English

'en'

French

'fr'

German

'de'

Hindi

'hi'

Japanese

'ja'

Korean

'ko'

Portuguese

'pt'

Russian

'ru'

Spanish

'es'

Returns

Translated VARCHAR.

Example

SELECT ai_translate('Hello, how are you?', 'en', 'es');
-- Returns: "Hola, ¿cómo estás?"

ai_similarity

Calculates the semantic similarity between two texts.

Syntax

ai_similarity(text1, text2)

Parameters

Parameter

Type

Description

text1

VARCHAR

First text

text2

VARCHAR

Second text

Returns

FLOAT between 0 and 1. A score of 1.0 means the texts are identical. This score is primarily used for sorting.

Example

SELECT ai_similarity(
  'I enjoy hiking in the mountains.',
  'I love walking through mountain trails.'
);
-- Returns: 0.82

ai_complete

Generates content using a specified model. Supports two overloads: a simple form and a parameterized form.

Syntax

-- Overload 1: Simple
ai_complete(model, prompt)
-- Overload 2: With parameters
ai_complete(model, prompt, params)

Parameters

Parameter

Type

Description

model

VARCHAR

The model name, e.g. 'qwen-plus'

prompt

VARCHAR

The prompt that guides content generation

params

MAP<VARCHAR, VARCHAR>

Optional. Additional model parameters such as temperature, max_tokens, top_p, stop

Returns

VARCHAR — the generated text.

Examples

Simple form:

SELECT ai_complete('qwen-plus', 'Write a catchy email subject for a summer bike sale with a 20% discount');
-- Returns: "Summer Cycling Carnival: 20% Off for a Limited Time!"

With parameters:

SELECT ai_complete('qwen-plus', 'Summarize the benefits of cloud computing', map{'temperature':'0.7', 'max_tokens':'200'});
-- Returns: "Cloud computing offers scalable resources, cost efficiency..."

ai_filter

Evaluates a natural-language condition against text and returns TRUE or FALSE. Useful for filtering rows in WHERE clauses.

Syntax

ai_filter(text, condition)

Parameters

Parameter

Type

Description

text

VARCHAR

The text to evaluate

condition

VARCHAR

A natural-language condition describing the filter criteria

Returns

BOOLEAN — TRUE if the text satisfies the condition, FALSE otherwise.

Example

SELECT * FROM reviews WHERE ai_filter(review_text, 'mentions product quality issues');
-- Returns rows where the review text discusses product quality problems

ai_custom_query

Runs a query against a custom resource or knowledge base.

Syntax

ai_custom_query(resource, prompt)

Parameters

Parameter

Type

Description

resource

VARCHAR

The name of the custom resource or knowledge base to query

prompt

VARCHAR

The query or question to ask the resource

Returns

VARCHAR.

Example

SELECT ai_custom_query('my_knowledge_base', 'What is StarRocks?');
-- Returns: "StarRocks is a high-performance analytical data warehouse..."

Cluster configuration

AI function behavior is controlled by BE dynamic parameters. Change them at runtime using ADMIN SET CONFIG — no restart required.

To view current values:

SELECT * FROM information_schema.be_configs WHERE NAME LIKE 'ai_%';

Model connection

Parameter

Default

Description

ai_function_model_endpoint

"https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"

Model service endpoint. Supports any OpenAI-compatible Chat/Completions endpoint.

ai_function_model_type

"qwen-plus"

The model to call. Supports text generation models.

ai_function_model_api_key

"sk-89c1***********95d0"

API key for Alibaba Cloud Model Studio. See Get an API key.

ai_function_system_prompt

"You are a helpful assistant."

System prompt applied to all AI functions.

ai_function_http_connect_timeout_ms

10000

HTTP connection timeout in milliseconds (default: 10 seconds).

ai_function_http_timeout_ms

600000

HTTP request timeout in milliseconds (default: 10 minutes).

ai_function_max_inflight_requests

100

Maximum concurrent HTTP requests to the model. Controls request throughput.

ai_function_request_template

(OpenAI-compatible format)

Request body template. Placeholders: $0=model, $1=system prompt, $2=user prompt, $3=optional extra params (comma-prefixed).

Batch sizes

Each function sends requests to the model in batches. Larger batches reduce the number of model calls and can shorten total response time.

Parameter

Default

Function

ai_function_sentiment_batch_size

10

ai_sentiment

ai_function_classify_batch_size

10

ai_classify

ai_function_similarity_batch_size

10

ai_similarity

ai_function_fix_grammar_batch_size

3

ai_fix_grammar

ai_function_mask_batch_size

3

ai_redact

ai_function_translate_batch_size

3

ai_translate

ai_function_extract_batch_size

1

ai_extract

ai_function_gen_batch_size

1

ai_complete

ai_function_query_batch_size

1

ai_filter

ai_function_summarize_batch_size

1

ai_summarize

Prompt templates

Each function uses a configurable prompt template. Template placeholders use $0, $1, etc.

Parameter

Placeholders

Function

ai_function_sentiment_prompt

$0=texts

ai_sentiment

ai_function_classify_prompt

$0=labels (JSON-encoded), $1=texts

ai_classify

ai_function_extract_prompt

$0=labels, $1=texts

ai_extract

ai_function_fix_grammar_prompt

$0=texts

ai_fix_grammar

ai_function_mask_prompt

$0=labels, $1=texts

ai_redact

ai_function_summarize_prompt

$0=word count, $1=texts

ai_summarize

ai_function_translate_prompt

$0=target language, $1=texts

ai_translate

ai_function_similarity_prompt

$0=text pairs

ai_similarity

ai_function_gen_prompt

$0=generation instruction

ai_complete

ai_function_query_prompt

$0=query instruction, $1=texts

ai_filter

ai_function_custom_query_prompt

$0=query instruction, $1=texts

ai_custom_query