All Products
Search
Document Center

Hologres:AI functions

Last Updated:Feb 04, 2026

Hologres V3.2 and later supports AI functions and provides operators such as Embedding, Rank, and LLM. You can call AI functions directly using standard SQL without a separate inference service. This enables you to implement AI scenarios, such as building and using enterprise-level knowledge bases.

Prerequisites

  1. Purchase AI resources.

  2. You have deployed a model.

Summary of AI functions

Hologres supports the following AI functions:

  • Each AI function is assigned an optimal model by default based on the deployed models. You can view the default model assigned to an AI function in the system table.

  • When calling an AI function, you do not need to specify a model name. The system automatically uses the default model. To change the model for an AI function, modify the system table. For more information, see Modify the model for an AI function.

  • Each model requires different AI resources for deployment. Purchase the appropriate specifications as needed.

Function name

Description

Supported models

Supported versions

to_file

Converts a URL to the FILE type.

No model is required. It is typically used with Object Tables.

V4.0 and later

prompt

Assembles prompts for a large language model (LLM). It can wrap multimodal prompts.

No model is required.

ai_parse_document

Parses unstructured data, such as PDFs and images, into text.

ds4sd/docling-models.

ai_embed

Computes a fixed-dimension continuous vector for the given text or image.

Text:

  • iic/nlp_gte_sentence-embedding_chinese series.

  • Qwen/Qwen3-Embedding-XB series.

Image: clip-ViT-B series.

  • V3.2 and later for text embedding

  • Version 4.0 and later support image vectors.

ai_rank

Scores the relevance of given text.

Qwen3 series LLMs. Qwen/Qwen3-32B is recommended.

V3.2.2 and later

ai_chunk

Splits long text into segments.

recursive-character-text-splitter.

ai_gen

Calls an LLM using a prompt to perform inference on text or images and outputs the result.

Text: Qwen3 series LLMs. Qwen/Qwen3-32B is recommended.

Image: Qwen/Qwen2.5-VL-xB type.

  • V3.2 and later for text inference

  • V4.0 and later for image inference

ai_classify

Classifies input text based on the provided classification labels.

Qwen3 series LLMs. Qwen/Qwen3-32B is recommended.

V3.2 and later

ai_extract

Extracts specified label information from input text.

Qwen3 series LLMs. Qwen/Qwen3-32B is recommended.

ai_mask

Masks specified label information in the input text. The masked information is replaced with the [MASKED] placeholder.

Qwen3 series LLMs. Qwen/Qwen3-32B is recommended.

ai_fix_grammar

Fixes syntax errors in the input text.

Qwen3 series LLMs. Qwen/Qwen3-32B is recommended.

ai_summarize

Generates a summary of a piece of text.

Qwen3 series LLMs. Qwen/Qwen3-32B is recommended.

ai_translate

Translates input text into a specified language.

Qwen3 series LLMs. Qwen/Qwen3-32B is recommended.

ai_similarity

Calculates the similarity between two input texts.

Qwen3 series LLMs. Qwen/Qwen3-32B is recommended.

ai_analyze_sentiment

Performs sentiment analysis on the input text.

  • Qwen3 series LLMs. Qwen/Qwen3-32B is recommended.

  • iic/nlp_structbert_sentiment-classification_chinese-base model.

Use AI functions

ai_embed

  • Description: Computes a fixed-dimension continuous vector for input text or images.

    --Compute text embedding
    select ai_embed([model,] content);
    --Compute image embedding
    select ai_embed([model,] file);
  • Parameters

    • model: Optional. The name of the model used by the AI function. By default, the system automatically assigns the optimal model from the deployed models. To change the model, first deploy the target model, and then modify the system table configuration. For more information, see Modify the model for an AI function.

    • content: Required. The input text. It supports character types such as CHAR, VARCHAR, and TEXT.

    • file: Required. The FILE type. This is typically a FILE object for an image. This type is supported only in Hologres V4.0 and later.

  • Return value

  • Examples

    • Text embedding

      SELECT ai_embed('Hologres is a one-stop real-time data warehouse engine developed by Alibaba. It supports real-time writing, updating, processing, and analysis of massive amounts of data.');

      The following result is returned.

      ai_embed
      -------
      {-0.020090256, -0.009496426, -0.01584659, ..., -0.057956327}
    • Image embedding

      --Image embedding. The example shows how to create an embedding for an image in OSS.
      SELECT ai_embed('clip-ViT-B-32', to_file('oss://****', 'oss-cn-hangzhou-internal.aliyuncs.com', 'roleran'));

ai_rank

  • Description: Scores the relevance of a given document.

    SELECT ai_rank([model,] source_sentence, sentence_to_compare);
  • Parameters

    • model: Optional. The name of the model used by the AI function. By default, the system automatically assigns the optimal model from the deployed models. To change the model, first deploy the target model, and then modify the system table configuration. For more information, see Modify the model for an AI function.

    • source_sentence: Required. The text content. It supports character types such as CHAR, VARCHAR, and TEXT.

    • sentence_to_compare: Required. The statement to compare with the source_sentence parameter. It supports character types such as CHAR, VARCHAR, and TEXT.

  • Return value

    • Returns a relevance score of the FLOAT type. The value is in the range of [0, 1]. A higher value indicates higher relevance.

    • If either the source_sentence or sentence_to_compare parameter is NULL, 0 is returned.

  • Examples

    SELECT knowledge, ai_rank('What is Alibaba''s revenue in 2024?', knowledge) AS score
      FROM (
          VALUES ('Amazon''s 2024 revenue was 638 billion USD'), 
                 ('Alibaba''s 2024 revenue was 941.168 billion CNY'), 
                 ('Alibaba''s 2023 revenue was 868.687 billion CNY')
      ) AS knowledge_table(knowledge)
      ORDER BY score DESC;

    The following result is returned.

    knowledge	                  | score
    -----------------------------|-------
    Alibaba's 2024 revenue was 941.168 billion CNY	 |0.899999976
    Alibaba's 2023 revenue was 868.687 billion CNY |0.200000003
    Amazon's 2024 revenue was 638 billion USD	   |0.100000001

ai_chunk

  • Description: Splits long text into segments (chunks).

    SELECT ai_chunk([model,] long_sentence[, chunk_size, chunk_overlap, separators])
  • Parameters

    Parameter name

    Description

    model

    Optional. The name of the model used by the AI function. By default, the system automatically assigns the optimal model from the deployed models. To change the model, first deploy the target model, and then modify the system table configuration. For more information, see Modify the model for an AI function.

    long_sentence

    Required. The source text to be segmented. It supports character types such as CHAR, VARCHAR, and TEXT.

    chunk_size

    Optional. The length of each chunk. This parameter is of the INT type. The default value is 300.

    chunk_overlap

    Optional. The overlap length between adjacent chunks to avoid splitting sentences across chunks and breaking their semantics. This parameter is of the INT type. The default value is 50.

    separators

    Optional. The separators for splitting chunks. This parameter is of the TEXT[] type. The default value is ["\n\n", "\n", " ", ""], which is suitable for English text. For Chinese text, use Chinese separators such as ["\n\n", "\n", ".", "!", "?", ";", ",", " "].

  • Return Value Description

    • Returns a TEXT[] type, which is a list of the segmented chunks.

    • If the long_sentence parameter is NULL, NULL is returned.

  • Examples

    SELECT ai_chunk('Hologres is a one-stop real-time data warehouse engine (Real-Time Data Warehouse) developed by Alibaba. It supports real-time writing, updating, processing, and analysis of massive amounts of data. It supports standard SQL (compatible with the PostgreSQL protocol and syntax, and supports most PostgreSQL functions). It supports petabyte-scale multidimensional analysis (OLAP) and ad hoc analysis. It supports high-concurrency and low-latency online data services (Serving). It supports fine-grained isolation for multiple workloads and enterprise-level security capabilities. It is deeply integrated with MaxCompute, Flink, and DataWorks to provide an enterprise-level, integrated, online and offline, full-stack data warehouse solution.',40,10);

    The following result is returned.

    ai_chunk
    ---
    "{"Hologres is a one-stop real-time data warehouse (Real-Time Data","Warehouse) engine developed by Alibaba. It supports real-time writing, updating, processing,","and analysis of massive data. It is compatible with standard SQL, the PostgreSQL","protocol and syntax, and most PostgreSQL functions. It also supports petabyte-scale","multidimensional analysis (OLAP), ad hoc analysis (Ad","Hoc), high-concurrency and low-latency online data services (Serving), fine-grained","isolation for various workloads, and enterprise-level security capabilities. Deeply","integrated with MaxCompute, Flink, and DataWorks, it provides an enterprise-level,","integrated, full-stack data warehouse solution for both offline and online scenarios."}"

ai_gen

  • Description: Calls an LLM using a prompt to perform inference on text or images and outputs the result.

    --Text inference
    SELECT ai_gen([model,] text)
    --Image inference
    SELECT ai_gen([model,] text, file)
    --Encapsulate prompt
    SELECT ai_gen([model,] prompt)
  • Parameters

    • model: Optional. The name of the model used by the AI function. By default, the system automatically assigns the optimal model from the deployed models. To change the model, first deploy the target model, and then modify the system table configuration. For more information, see Modify the model for an AI function.

    • text: Required. The input prompt. It supports character types such as CHAR, VARCHAR, and TEXT.

    • file: Required. The FILE type, such as an image converted to a FILE type from an Object Table. This parameter is supported only in Hologres V4.0 and later.

    • prompt: Required. The JSON type. This is the return value of the prompt() function.

  • Return value

    • Returns the LLM's answer to the question.

    • If the text parameter is NULL, NULL is returned.

    • If the text parameter is an empty string (""), an empty string ("") is returned.

    • If the prompt parameter is NULL, an error is reported.

  • Examples

    • Text inference

      CREATE TABLE questions (
          question TEXT
      );
      
      INSERT INTO questions (question) VALUES
        ('What is artificial intelligence?'),
        ('How can I improve my spoken English?'),
        ('What are the key points for a healthy diet?');
      
      SELECT
        question,
        ai_gen('Please answer the following question in 20 words: ' || question) AS answer
      FROM
        questions;

      The following result is returned.

       question            |	answer
        --------------------|-------
      How can I improve my spoken English?	  |Speak and practice more, imitate pronunciation, build vocabulary, and be brave to speak.
      What are the key points for a healthy diet?	|Eat a balanced diet, control oil, salt, and sugar, eat more fruits and vegetables, and drink water in moderation.
      What is artificial intelligence?	        |Artificial intelligence is a computer system that simulates human intelligence. It can learn, reason, perceive, and solve problems.
    • Image inference

      SELECT  ai_gen('jpg_llm','What is in this picture?', to_file('oss://****/bd****k/val/images/b9b53753-91a5****.jpg','oss-cn-hangzhou-internal.aliyuncs.com','acs:ram::****' ) )

      The following result is returned.

      ai_gen
      -----
      This picture shows a city street scene. A white car with the license plate BTB-9784 is parked on the side of the road. There are several cars on the street, including a yellow taxi. In the background, there are buildings and trees. The weather looks a bit gloomy, possibly a rainy day. There are also pedestrians and traffic lights on the street.

ai_classify

  • Description: Classifies input text based on the provided classification labels.

    SELECT ai_classify([model,] content, labels)
  • Parameters

    • model: Optional. The name of the model used by the AI function. By default, the system automatically assigns the optimal model from the deployed models. To change the model, first deploy the target model, and then modify the system table configuration. For more information, see Modify the model for an AI function.

    • content: Required. The text to be classified. It supports character types such as CHAR, VARCHAR, and TEXT.

    • labels: Required. A list of expected classification labels. The number of labels must be between 2 and 20. This parameter is of the ARRAY type.

  • Return value

    • Returns the classification result, which is the matched label. The return type is TEXT.

    • If the content parameter is NULL, NULL is returned.

    • If content is an empty string (""), NULL is returned.

    • If the number of labels is incorrect, an error is reported.

  • Examples

    CREATE TABLE product_detail(
        product_name TEXT,
        product_desc TEXT
    );
    INSERT INTO product_detail VALUES
    ('iphone','Apple phone'),
    ('p50','Huawei phone'),
    ('x200','vivo phone'),
    ('aaa','Dior dress'),
    ('bbb','Dior pants'),
    ('Sheng Sheng Oolong','Milk tea from Cha Yan Yue Se'),
    ('Sandwich cookie','Oreo cookie');
    
    --Use ai_classify to classify text
    SELECT
        product_name,
        ai_classify(product_desc, ARRAY['Electronics', 'Clothing', 'Food']) AS category
      FROM
         product_detail
      LIMIT 10;

    The following result is returned.

    product_name	|category
    --------------|------
    aaa	          |Clothing
    iphone	      |Electronics
    Sheng Sheng Oolong	      |Food
    p50	          |Electronics
    x200	        |Electronics
    bbb	          |Clothing
    Sandwich cookie	      |Food

ai_extract

  • Description: Extracts specified label information from input text.

    SELECT ai_extract([model,] content, labels)
  • Parameters

    • model: Optional. The name of the model used by the AI function. By default, the system automatically assigns the optimal model from the deployed models. To change the model, first deploy the target model, and then modify the system table configuration. For more information, see Modify the model for an AI function.

    • content: Required. The input text. It supports character types such as CHAR, VARCHAR, and TEXT.

    • labels: Required. The specified label information. The number of labels must be between 1 and 20. This parameter is of the ARRAY type.

  • Return value

    • Returns the extracted information for each label in JSON format.

    • If the content parameter is NULL or an empty string (""), NULL is returned.

    • If the number of labels is incorrect, an error is reported.

  • Examples

    CREATE TABLE users (
      user_id TEXT,
      resume TEXT
    );
    
    INSERT INTO users (user_id, resume) VALUES
      ('u001', 'Name: Zhang San, Male, 28 years old. Email: zhangsan@example.com, Phone: 1380013****. Rich work experience.'),
      ('u002', 'Name: Li Si, Female, 35 years old. Phone: 1390013****, Email: lisi@example.com. Has management experience.'),
      ('u003', 'Name: Wang Wu, Male, 25 years old. Email: wangwu@example.com. Phone: 1370013****.');
    
    SELECT
      user_id,
      ai_extract(resume, ARRAY['Name','Email','Phone','Gender','Age']) AS user_desc_obj
    FROM
      users;

    The following result is returned.

    user_id	|user_desc_obj
    --------|-------------
    u002	  |"{"Name":"Li Si","Age":"35 years old","Gender":"Female","Phone":"1390013****","Email":"lisi@example.com"}"
    u003	  |"{"Name":"Wang Wu","Age":"25 years old","Gender":"Male","Phone":"1370013****","Email":"wangwu@example.com"}"
    u001	  |"{"Name":"Zhang San","Age":"28 years old","Gender":"Male","Phone":"1380013****","Email":"zhangsan@example.com"}"

ai_mask

  • Description: Masks specified label information in the input text. The masked information is replaced with the [MASKED] placeholder.

    SELECT ai_mask([model,] content, labels)
  • Parameters:

    • model: Optional. The name of the model used by the AI function. By default, the system automatically assigns the optimal model from the deployed models. To change the model, first deploy the target model, and then modify the system table configuration. For more information, see Modify the model for an AI function.

    • content: Required. The input text to be masked. It supports character types such as CHAR, VARCHAR, and TEXT.

    • labels: Required. The label information to be masked. The number of labels must be between 1 and 20. This parameter is of the ARRAY type.

  • Return value

    • Returns the masked text content.

    • If the content parameter is NULL, NULL is returned.

    • If the content parameter is an empty string (""), an empty string is returned.

    • If the number of labels is incorrect, an error is reported.

  • Examples

    SELECT ai_mask(
      'User Wang Xiaoming, ID card number: 2303061111111111, mobile phone number: 1388888****.',
      ARRAY['ID Card', 'Mobile Phone Number']); 

    The following result is returned.

    ai_mask
    -------
    User Wang Xiaoming, ID card number: [MASKED], mobile phone number: [MASKED].

ai_fix_grammar

  • Description: Fixes syntax errors in the input text.

    SELECT ai_fix_grammar([model,] content)
  • Parameters

    • model: Optional. The name of the model used by the AI function. By default, the system automatically assigns the optimal model from the deployed models. To change the model, first deploy the target model, and then modify the system table configuration. For more information, see Modify the model for an AI function.

    • content: Required. The input text with syntax to be fixed. It supports character types such as CHAR, VARCHAR, and TEXT.

  • Return value

    • Returns the corrected text content.

    • If the content parameter is NULL, NULL is returned.

    • If the content parameter is an empty string (""), an empty string ("") is returned.

  • Examples

    SELECT ai_fix_grammar('He dont know what to did.');

    The following result is returned.

    ai_fix_grammar
    --------------
    He doesn't know what to do.

ai_summarize

  • Description: Generates a summary of the input text.

    SELECT ai_summarize([model,] content[, max_words])
  • Parameters

    • model: Optional. The name of the model used by the AI function. By default, the system automatically assigns the optimal model from the deployed models. To change the model, first deploy the target model, and then modify the system table configuration. For more information, see Modify the model for an AI function.

    • content: Required. The input text. It supports character types such as CHAR, VARCHAR, and TEXT.

    • max_words: Optional. The maximum number of words for the LLM to output. The model tries to generate a result that is close to this value. The default value is 50. If you set it to 0, there is no limit.

  • Return value

    • Returns a summary of the text.

    • If the content parameter is NULL, NULL is returned.

    • If the content parameter is an empty string (""), an empty string ("") is returned.

    • If max_words is less than 0, an error is reported.

  • Examples

    SELECT ai_summarize('Hologres is a one-stop real-time data warehouse engine (Real-Time Data Warehouse) developed by Alibaba. It supports real-time writing, updating, processing, and analysis of massive amounts of data. It supports standard SQL (compatible with the PostgreSQL protocol and syntax, and supports most PostgreSQL functions). It supports petabyte-scale multidimensional analysis (OLAP) and ad hoc analysis. It supports high-concurrency and low-latency online data services (Serving). It supports fine-grained isolation for multiple workloads and enterprise-level security capabilities. It is deeply integrated with MaxCompute, Flink, and DataWorks to provide an enterprise-level, integrated, online and offline, full-stack data warehouse solution.', 15);

    The following result is returned.

    ai_summarize
    ------------
    Hologres is a real-time data warehouse engine developed by Alibaba that supports real-time processing and multidimensional analysis of massive data.

ai_translate

  • Description: Translates input text into a specified language.

    SELECT ai_translate([model,] content, to_lang)
  • Parameters

    • model: Optional. The name of the model used by the AI function. By default, the system automatically assigns the optimal model from the deployed models. To change the model, first deploy the target model, and then modify the system table configuration. For more information, see Modify the model for an AI function.

    • content: Required. The input text to be translated. It supports character types such as CHAR, VARCHAR, and TEXT.

    • to_lang: Required. The target language code. For more information, see ISO 639.

  • Return value

    • Returns the translated text.

    • If the content parameter is NULL, NULL is returned.

    • If the content parameter is an empty string (""), an empty string ("") is returned.

    • If the to_lang parameter is invalid, an error is reported.

  • Examples

    SELECT ai_translate('Hologres is a one-stop real-time data warehouse engine (Real-Time Data Warehouse) developed by Alibaba. It supports real-time writing, updating, processing, and analysis of massive amounts of data. It supports standard SQL (compatible with the PostgreSQL protocol and syntax, and supports most PostgreSQL functions). It supports petabyte-scale multidimensional analysis (OLAP) and ad hoc analysis. It supports high-concurrency and low-latency online data services (Serving). It supports fine-grained isolation for multiple workloads and enterprise-level security capabilities. It is deeply integrated with MaxCompute, Flink, and DataWorks to provide an enterprise-level, integrated, online and offline, full-stack data warehouse solution.', 'en');

    The following result is returned.

    ai_translate
    -----------
    Hologres is a self-developed one-stop real-time data warehouse engine by Alibaba, supporting real-time writing, real-time updating, real-time processing, and real-time analysis of massive data. It supports standard SQL (compatible with PostgreSQL protocol and syntax, supporting most PostgreSQL functions), supports multi-dimensional analysis (OLAP) and ad-hoc analysis at the PB-level, supports high-concurrency, low-latency online data services (Serving), supports fine-grained isolation for multiple workloads and enterprise-level security capabilities, and is deeply integrated with MaxCompute, Flink, and DataWorks, providing an enterprise-level fully stacked data warehouse solution that integrates online and offline processing.

ai_similarity

  • Description: Calculates the similarity between two input texts.

    SELECT ai_similarity([model,] text1, text2)
  • Parameters

    • model: Optional. The name of the model used by the AI function. By default, the system automatically assigns the optimal model from the deployed models. To change the model, first deploy the target model, and then modify the system table configuration. For more information, see Modify the model for an AI function.

    • text1 and text2: The two texts to be compared for similarity. They support character types such as CHAR, VARCHAR, and TEXT.

  • Return value

    • Returns a FLOAT value in the range of [0, 1]. A higher value indicates higher similarity. 0 means not similar at all, and 1 means the two texts are identical.

    • If either the text1 or text2 parameter is NULL, 0 is returned.

    • If both the text1 and text2 parameters are empty strings (""), 1 is returned.

    • If one of the text1 and text2 parameters is an empty string ("") and the other is a non-empty string, 0 is returned.

  • Examples

    CREATE TABLE products2 (
        product_name TEXT
    );
    
    INSERT INTO products2 (product_name) VALUES
      ('white shirt'), ('black dress pants'), ('casual top'), ('sports jacket'), ('white dress'),
      ('Bluetooth headset'), ('milk chocolate'), ('white top'), ('men''s T-shirt'), ('down jacket');
    
    SELECT product_name FROM products2 
      ORDER BY ai_similarity(product_name, 'white top') DESC LIMIT 5;
    

    The following result is returned.

    product_name
    ----------
    white top
    white shirt
    casual top
    white dress
    men's T-shirt

ai_analyze_sentiment

  • Description: Performs sentiment analysis on the input text.

    select ai_analyze_sentiment([model,] content);
  • Parameters

    • model: Optional. The name of the model used by the AI function. By default, the system automatically assigns the optimal model from the deployed models. To change the model, first deploy the target model, and then modify the system table configuration. For more information, see Modify the model for an AI function.

    • content: Required. The input text. It supports character types such as CHAR, VARCHAR, and TEXT.

  • Return value

    • Returns the sentiment labels after analysis. The return type is string. The returned labels vary depending on the model.

      • For Qwen3 series LLMs, one of the following labels is returned: positive, negative, neutral, or mixed. If the input is empty, NULL is returned.

      • For the iic/nlp_structbert_sentiment-classification_chinese-base model, the label with the highest probability is returned: positive, negative, or NULL. If the input is empty, NULL is returned.

    • If the content parameter is NULL or an empty string (""), NULL is returned.

  • Examples

    --LLM:
    SELECT ai_analyze_sentiment('Wedding night, and success in the imperial examination.');
    -- output example: positive
    
    --Use the iic/nlp_structbert_sentiment-classification_chinese-base model
    SELECT ai_analyze_sentiment('A drizzling rain falls on the Mourning Day; The mourner''s heart is going to break on his way.');
    -- output example: negative

ai_parse_document

  • Description: Parses unstructured data from various formats, such as PDF, image, Word, PPT, TXT, and Markdown, into text.

    SELECT ai_parse_document([model,] input_bytes , input_format [, output_format]);
    SELECT ai_parse_document([model,] file [, input_format, output_format]);
  • Parameters

    Parameter Name

    Description

    model

    Optional. The name of the model used by the AI function. By default, the system automatically assigns the optimal model from the deployed models. To change the model, first deploy the target model, and then modify the system table configuration. For more information, see Modify the model for an AI function.

    input_bytes

    Required. The binary content of the file to be parsed. The type is BYTEA.

    file

    Required. The File type. We recommend that you use this with an Object Table.

    input_format

    Optional. The file format. The type is TEXT. The default value is auto. Supported formats include PDF, Word, PPT, TXT, IMAGE, and AUTO.

    • Supported image formats: ["jpg", "jpeg", "png", "tif", "tiff", "bmp"].

    • If an OSS directory contains documents of different types, you can set input_format=auto. The model then automatically determines the file type based on the file name extension.

    output_format

    Optional. The format of the parsing result. The type is text. The default value is JSON. Supported formats include JSON and markdown.

  • Return value

    Returns a text type. The format is determined by the output_format setting.

    • If output_format=JSON, the output is a text type JSON string. You must explicitly cast it to JSON when you use it.

    • If parsing fails, text describing the error is returned instead of reporting an error.

  • Examples

    • Convert a single PDF file in OSS to text.

      SELECT object_uri, etag, ai_parse_document(to_file ('oss://xxxx-hangzhou/bs_challenge_financial_14b_dataset/pdf', 'oss-cn-hangzhou-internal.aliyuncs.com', 'acs:ram::18xxx:role/xxx'), 'auto', 'markdown') AS doc 
          FROM pdf_bs_challenge_financial_14b_dataset limit 1);
    • Convert unstructured data from an Object Table to text. For more information, see Unstructured data (Object Table).

prompt

  • Description: This helper function for AI functions assembles prompts for LLMs. It can include multimodal prompts.

    SELECT prompt('<template_string>', <expr_1> [ , <expr_2>, ... ])
        FROM <table>;
  • Usage notes

    Scalar strings are not supported. If there is only one string, pass it directly to the LLM. Avoid using the prompt function. This function is typically used with a FROM clause.

  • Parameters

    • template_string: Required. The prompt template string. The type is TEXT. Use {0} and {1} as placeholders for variables.

    • <expr_1> [ , <expr_2>, ... ]: Multiple expression parameters. Types such as TEXT, NUMERIC, and FILE are supported.

  • Return value

    Typically returns a JSON type with the following format:

    {
      "prompt": "<template_string>",
      "args": ARRAY(<value_1>, <value_2>, ...)
    }

    Special cases:

    • If the template_string parameter is NULL, an error is reported.

    • If an expression returns NULL, it is replaced by the string "None" in the template_string.

    • If a row in the result is all NULL, the row is not filtered. Instead, all values in Args are filled with "None".

  • Examples

    create table customer_service_konwledge_detail(
      question text, 
      question_summarize text
    );
    
    insert into customer_service_konwledge_detail values
    ('Many OOM SQL statements suddenly appeared in the instance', 'Backend investigation found that the customer''s traffic increased, and the existing resources were insufficient to support the traffic. The customer has resolved the issue by scaling out'),
    ('DataWorks cannot connect to Hologres', 'This is not our issue. Contact the DataWorks support on duty');
    
    -- prompt
    SELECT question, question_summarize, ai_gen(
      prompt('The customer''s ticket content is: {0}. The Alibaba Cloud support''s answer is: {1}. Did this answer solve the customer''s problem? Output only yes or no', question, question_summarize)) 
          from customer_service_konwledge_detail;
         

    The following result is returned.

              question           |                                 question_summarize                                 | ai_gen 
    -----------------------------+------------------------------------------------------------------------------------+--------
     DataWorks cannot connect to Hologres | This is not our issue. Contact the DataWorks support on duty                                              | No
     Many OOM SQL statements suddenly appeared in the instance    | Backend investigation found that the customer's traffic increased, and the existing resources were insufficient to support the traffic. The customer has resolved the issue by scaling out | Yes
    (2 rows)

to_file

  • Description: This tool function converts a URL to the FILE type.

    select to_file(oss_url, oss_endpoint, oss_rolearn);
  • Usage notes

    This is a tool function and does not require a model.

  • Parameters

    • oss_url: Required. The path of the OSS file to be parsed. The type is text.

    • oss_endpoint: Required. The domain name of the OSS region. The type is text. Only classic network domain names are supported.

    • oss_rolearn: Required. The RoleARN for accessing OSS.

  • Return value

    Returns the FILE type. An error is reported if the URL file path is invalid or the file does not exist.

  • Examples

    select to_file('oss://****/bd****k/val/images/b9b53753-91a5****.jpg','oss-cn-hangzhou-internal.aliyuncs.com','acs:ram::****' );

AI functions and models

View the mapping between functions and models

Hologres provides the list_ai_function_infos system table to view the mapping between AI functions and models. After you deploy a model in the Hologres console, this system table is automatically updated with the corresponding model for each AI function. You can then call the corresponding model using the AI function.

Note

Different AI functions must be adapted to specific types of models. For example, ai_embed is suitable for embedding models, and ai_classify is suitable for LLMs. If only one type of model is deployed in an instance, some AI functions may not have a model assigned. If an AI function does not have a corresponding model deployed, you cannot use that AI function.

SELECT * FROM list_ai_function_infos();

The following result is returned.

    function_name     |    model_name    
----------------------+------------------
 ai_embed             | my_gte_embedding
 ai_classify          | my_qwen32b
 ai_extract           | my_qwen32b

Modify the model for an AI function

AI functions have default mappings to deployed models. You can modify the model for an AI function as follows. After modification, the new model is called when you use the AI function.

  • Global modification

    SELECT set_ai_function_info('<function_name>', '<model_name>');

    Parameters

    • function_name: The name of the AI function. You can view AI function names in the Summary of AI functions table.

    • model_name: The name of a deployed model. You can log on to the Hologres console and view the names of deployed models on the AI Node page.

    Note

    An error is reported if the specified AI function name or deployed model name does not exist.

    Examples

    SELECT set_ai_function_info('ai_embed', 'my_gte_embedding');
  • Session-level modification

    After a session-level modification, the session-level configuration takes precedence over the global configuration (ai_function_info) when you call a model using an AI function.

    --Takes effect only for the current connection
    SET hg_experimental_ai_function_name_to_model_name_mapping='<function_name>:<model_name>[,<function_name1>:<model_name1>]';

Best practices

After you understand the basic usage of AI functions, you can learn how to combine them to solve complex business problems through the following real-world best practices.