All Products
Search
Document Center

E-MapReduce:AI Center (Beta)

Last Updated:Mar 26, 2026

StarRocks AI functions embed large language model (LLM) capabilities directly into SQL, so you can run text analysis, transformation, and generation as part of any OLAP query—without moving data to an external pipeline.

All 11 built-in functions return typed values (STRING, FLOAT, or JSON) that compose naturally with JOIN, GROUP BY, aggregation, and filter operations. Raw data stays in StarRocks; only the fields you pass to a function are sent to the model endpoint.

AI functions are in public preview. Each user gets a free trial quota of 1 million tokens. If your total token consumption exceeds this limit, the function call fails. For large-scale use, contact the StarRocks team.

Use cases

Use case Functions
Dashboard-ready sentiment scoring on customer reviews ai_analyze_sentiment
Auto-tagging product catalog or support tickets ai_classify
Extracting structured fields (names, dates, locations) from free text ai_extract
Masking PII before writing to a staging environment ai_mask
Cleaning and standardizing user-generated content ai_fix_grammar
Summarizing long documents or ticket threads ai_summarize
Translating and localizing multilingual records ai_translate
Ranking FAQ or search results by semantic relevance ai_similarity
Batch-generating marketing copy or report text ai_gen
Running Q&A against data rows using a pre-configured prompt ai_query
Calling your own private or fine-tuned model ai_custom_query

Prerequisites

Before you begin, make sure you have:

Enable trial access

AI functions currently require trial access. Apply through one of these channels:

  • Contact the StarRocks team through your Alibaba Cloud technical support manager or cloud architect.

  • Join the DingTalk group by searching for group number 24010016636 in the DingTalk app.

  • Fill out the AI Center Application Form.

Configure network access

StarRocks BE nodes need outbound internet access to reach external model service endpoints. Set up a NAT Gateway on the VPC where your cluster runs:

  1. Create a NAT Gateway and bind an Elastic IP Address (EIP) to it.

  2. Add an SNAT rule that routes traffic from the BE node CIDR block through the NAT Gateway.

  3. Confirm that your VPC route tables and security group policies allow outbound traffic.

For setup details, see Public NAT Gateway.

Function reference

All functions are called in standard SQL. Results are returned as typed values and can be used directly in subsequent query expressions.

Function What it does Returns
ai_analyze_sentiment(content) Classifies text as positive, negative, or neutral STRING
ai_classify(content, labels) Assigns one label from a custom list STRING
ai_extract(content, labels) Pulls out named entities and returns them as JSON JSON object
ai_mask(content, labels) Replaces specified entity types with [MASKED] STRING
ai_fix_grammar(content) Corrects grammar and spelling STRING
ai_summarize(content [, max_words]) Generates a summary, with an optional word limit STRING
ai_translate(content, to_lang) Translates text to the target language STRING
ai_similarity(text1, text2) Returns a semantic similarity score from 0 to 1 FLOAT
ai_gen(prompt) Generates content from a prompt STRING
ai_query(prompt) Runs Q&A using a pre-configured system prompt STRING
ai_custom_query(content, parameters) Calls a custom model endpoint STRING

Function details

ai_analyze_sentiment

Classifies the sentiment of input text.

Syntax

ai_analyze_sentiment(content)

Parameters

Parameter Type Description
content STRING The text to analyze

Returns

STRING — 'positive', 'negative', or 'neutral'. Returns NULL if sentiment cannot be determined.

Examples

Single-value test:

SELECT ai_analyze_sentiment('I am happy');
-- Returns: 'positive'

Apply to a table column:

SELECT
  review_id,
  review_text,
  ai_analyze_sentiment(review_text) AS sentiment
FROM customer_reviews
LIMIT 10;

ai_classify

Assigns one label from a list you define.

Syntax

ai_classify(content, labels)

Parameters

Parameter Type Description
content STRING The text to classify
labels ARRAY Candidate labels — at least 2 and at most 20 elements

Returns

STRING — one element from labels. Returns NULL if classification fails.

Examples

Single-value test:

SELECT ai_classify('My password is leaked.', ['urgent', 'not urgent']);
-- Returns: 'urgent'

Tag product descriptions in bulk:

SELECT
  product_id,
  description,
  ai_classify(description, ['clothing', 'shoes', 'accessories', 'furniture']) AS category
FROM products
LIMIT 10;

ai_extract

Extracts named entities from text and returns them as a JSON object.

Syntax

ai_extract(content, labels)

Parameters

Parameter Type Description
content STRING The text to extract from
labels ARRAY Entity types to extract, e.g. ['person', 'location', 'organization']

Returns

JSON object where keys are the entity types and values are the extracted text.

Example

SELECT ai_extract(
  'John Doe lives in New York and works for Acme Corp.',
  ['person', 'location', 'organization']
);
-- Returns: {"person":"John Doe","location":"New York","organization":"Acme Corp"}

ai_mask

Replaces specified entity types with [MASKED].

Syntax

ai_mask(content, labels)

Parameters

Parameter Type Description
content STRING The text to mask
labels ARRAY Entity types to mask, e.g. ['person', 'email', 'phone']

Returns

STRING with matched entities replaced by [MASKED].

Examples

Single-value test:

SELECT ai_mask(
  'John Doe lives in New York. His email is john.doe@example.com.',
  ['person', 'email']
);
-- Returns: "[MASKED] lives in New York. His email is [MASKED]."

Mask PII across a log table before export:

SELECT
  log_id,
  ai_mask(log_text, ['person', 'email', 'phone']) AS masked_log
FROM audit_logs
LIMIT 10;

ai_fix_grammar

Corrects grammar and spelling in text.

Syntax

ai_fix_grammar(content)

Parameters

Parameter Type Description
content STRING The text to correct

Returns

STRING with corrected grammar and spelling.

Example

SELECT ai_fix_grammar('This sentence have some mistake');
-- Returns: "This sentence has some mistake"

ai_summarize

Generates a concise summary of the input text.

Syntax

ai_summarize(content [, max_words])

Parameters

Parameter Type Required Description
content STRING Yes The text to summarize
max_words INTEGER No Target word count for the summary. Default: 50. Set to 0 for no limit.

Returns

STRING summary.

Example

SELECT ai_summarize(
  'Apache Spark is a unified analytics engine for large-scale data processing...',
  10
);
-- Returns: "Spark: unified engine for large-scale data processing with APIs and tools."

ai_translate

Translates text into a target language.

Syntax

ai_translate(content, to_lang)

Parameters

Parameter Type Description
content STRING The text to translate
to_lang STRING Target language code following the ISO 639-1 standard

Common language codes:

Language Code
Arabic 'ar'
Chinese (Simplified) 'zh'
English 'en'
French 'fr'
German 'de'
Hindi 'hi'
Japanese 'ja'
Korean 'ko'
Portuguese 'pt'
Russian 'ru'
Spanish 'es'

Returns

Translated STRING.

Example

SELECT ai_translate('Hello, how are you?', 'es');
-- Returns: "Hola, ¿cómo estás?"

ai_similarity

Calculates the semantic similarity between two texts.

Syntax

ai_similarity(text1, text2)

Parameters

Parameter Type Description
text1 STRING First text
text2 STRING Second text

Returns

FLOAT between 0 and 1. A score of 1.0 means the texts are identical. This score is primarily used for sorting.

Example

SELECT ai_similarity(
  'I enjoy hiking in the mountains.',
  'I love walking through mountain trails.'
);
-- Returns: 0.82

ai_gen

Generates content from a prompt.

Syntax

ai_gen(prompt)

Parameters

Parameter Type Description
prompt STRING The prompt that guides content generation

Returns

Generated STRING.

Example

SELECT ai_gen('Write a catchy email subject for a summer bike sale with a 20% discount');
-- Returns: "Summer Cycling Carnival: 20% Off for a Limited Time!"

ai_query

Runs general Q&A using a pre-configured system prompt combined with user input.

Syntax

ai_query(prompt)

Parameters

Parameter Type Description
prompt STRING The user's question or instruction

Returns

STRING — the model's answer.

Example

SELECT ai_query('What is the capital of France?');
-- Returns: "Paris"

ai_custom_query

Calls a custom or private model endpoint. All connection and model parameters are passed inline.

Syntax

ai_custom_query(content, parameters)

Parameters

Parameter Type Description
content STRING The text content to process
parameters MAP\<STRING, STRING\> Connection and model settings (see table below)

Supported keys in parameters:

Key Required Description
prompt Yes Instructions that tell the model how to process content
endpoint Yes The model service endpoint URL
model Yes The model name
api_key Yes The API key for authentication
temperature No Controls output randomness. Range: [0, 2). Higher = more varied, lower = more deterministic.
top_p No Controls diversity via nucleus sampling. Higher = more random.
max_tokens No Maximum number of output tokens
stop No Stop sequence — the model stops generating when it encounters this string

Returns

STRING.

Example

SELECT ai_custom_query(
  'I am happy',
  map{
    'endpoint': 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions',
    'model': 'qwen-plus',
    'api_key': 'sk-89c1***********95d0',
    'prompt': 'Classify each text into [positive, negative, neutral, mixed]. Return only the labels in a JSON array, in the same order as the input. Do not include any explanation, markdown formatting, or code block delimiters. Text: ',
    'max_tokens': '100',
    'top_p': '1',
    'temperature': '0.8'
  }
);
-- Returns: "positive"

Cluster configuration

AI function behavior is controlled by BE dynamic parameters. Change them at runtime using ADMIN SET CONFIG — no restart required.

To view current values:

SELECT * FROM information_schema.be_configs WHERE NAME LIKE 'ai_%';

Model connection

Parameter Default Description
ai_function_model_endpoint "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions" Model service endpoint. Supports any OpenAI-compatible Chat/Completions endpoint.
ai_function_model_type "qwen-plus" The model to call. Supports text generation models.
ai_function_model_api_key "sk-89c1***********95d0" API key for Alibaba Cloud Model Studio. See Get an API key.
ai_function_system_prompt "You are a helpful assistant." System prompt applied to all AI functions.
ai_function_http_connect_timeout_ms 10000 HTTP connection timeout in milliseconds (default: 10 seconds).
ai_function_http_timeout_ms 600000 HTTP request timeout in milliseconds (default: 10 minutes).
ai_function_max_inflight_requests 100 Maximum concurrent HTTP requests to the model. Controls request throughput.
ai_function_request_template (OpenAI-compatible format) Request body template. Placeholders: $0=model, $1=system prompt, $2=user prompt, $3=optional extra params (comma-prefixed).

Batch sizes

Each function sends requests to the model in batches. Larger batches reduce the number of model calls and can shorten total response time.

Parameter Default Function
ai_function_sentiment_batch_size 10 ai_analyze_sentiment
ai_function_classify_batch_size 10 ai_classify
ai_function_similarity_batch_size 10 ai_similarity
ai_function_fix_grammar_batch_size 3 ai_fix_grammar
ai_function_mask_batch_size 3 ai_mask
ai_function_translate_batch_size 3 ai_translate
ai_function_extract_batch_size 1 ai_extract
ai_function_gen_batch_size 1 ai_gen
ai_function_query_batch_size 1 ai_query
ai_function_summarize_batch_size 1 ai_summarize

Prompt templates

Each function uses a configurable prompt template. Template placeholders use $0, $1, etc.

Parameter Placeholders Function
ai_function_sentiment_prompt $0=texts ai_analyze_sentiment
ai_function_classify_prompt $0=labels (JSON-encoded), $1=texts ai_classify
ai_function_extract_prompt $0=labels, $1=texts ai_extract
ai_function_fix_grammar_prompt $0=texts ai_fix_grammar
ai_function_mask_prompt $0=labels, $1=texts ai_mask
ai_function_summarize_prompt $0=word count, $1=texts ai_summarize
ai_function_translate_prompt $0=target language, $1=texts ai_translate
ai_function_similarity_prompt $0=text pairs ai_similarity
ai_function_gen_prompt $0=generation instruction ai_gen
ai_function_query_prompt $0=query instruction, $1=texts ai_query
ai_function_custom_query_prompt $0=query instruction, $1=texts ai_custom_query