API description for the RAG service

Retrieval-Augmented Generation (RAG) in Platform for AI (PAI) provides APIs for features such as service management and chat. This topic describes the APIs and calling methods supported by RAG services deployed from images earlier than v0.3.0.

Limits

This topic applies only to RAG services deployed from images earlier than v0.3.0.

To view the image version, go to the Elastic Algorithm Service (EAS) page and click the name of the RAG service. The image version is shown in the Environment Information section of the Overview tab.

Obtain the access address and token of the service

Before you call a RAG service by using an API, obtain the access address and token of the service.

Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
Click the name of the desired service. In the Basic Information section of the Overview tab, click View Endpoint Information.
On the Shared Gateway tab of the Invocation Method dialog box, obtain the service access address (EAS_SERVICE_URL) and token (EAS_Token).
Important
- Remove the forward slash (/) from the end of EAS_SERVICE_URL.
- To call the service by using a public endpoint, the client that you use must support access over the Internet.
- To call the service by using a VPC endpoint, the client that you use must be in the same virtual private cloud (VPC) as the service.
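
The notes above can be applied on the client side before any request is sent. The following is a minimal sketch, assuming the address and token are held as plain strings. The helper names `normalize_service_url` and `auth_headers` are illustrative and are not part of the RAG service.

```python
def normalize_service_url(url: str) -> str:
    """Strip surrounding whitespace and any trailing forward slashes from EAS_SERVICE_URL."""
    return url.strip().rstrip("/")


def auth_headers(token: str) -> dict:
    """Build the Authorization header. The token is sent as-is, as in the cURL examples in this topic."""
    return {"Authorization": token}
```

For example, `normalize_service_url("<EAS_SERVICE_URL>/") + "/v1/chat/completions"` yields a chat endpoint without a doubled slash.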

Chat API

The chat API is compatible with the OpenAI API, so you can call the service by using an OpenAI client. Before you call the service, complete the required configurations on the WebUI page of the RAG service based on your business requirements.

Supported features

Web search: You need to configure web search parameters.
Knowledge base query: You need to upload knowledge base files.
LLM chat: Use large language model (LLM) services to provide answers. You need to configure LLM services.
Agent chat: You need to complete agent-related code configurations on the WebUI page of the RAG service.
Database or table query: You need to configure chat_db-related parameters on the WebUI page of the RAG service.

Method
URL	`{EAS_SERVICE_URL}/v1/chat/completions`
Request method	POST
Request header	`Authorization: EAS_TOKEN` # The token of the Elastic Algorithm Service (EAS) service.
HTTP body
    {
      "model": "default",           # The model name. Set the value to default.
      "messages": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hello, how can I help you?"},
        {"role": "user", "content": "What is the capital of Zhejiang province"},
        {"role": "assistant", "content": "Hangzhou is the capital of Zhejiang province."},
        {"role": "user", "content": "What are some interesting places to visit"}
      ],
      "stream": true,               # Specifies whether to use the streaming mode.
      "chat_knowledgebase": true,   # Specifies whether to query the local knowledge base.
      "search_web": false,          # Specifies whether to use web search.
      "chat_llm": false,            # Specifies whether to use the LLM chat only.
      "chat_agent": false,          # Specifies whether to use the agent.
      "chat_db": false,             # Specifies whether to query the database.
      # The index name used for RAG scenarios. Only 1 index name is supported.
      # If you leave this parameter empty, the system will use the default index name.
      "index_name": "default_index",
      "max_tokens": 1024,           # The maximum number of tokens, such as 1024.
      # Controls the randomness of the generated content. Valid values: [0,1].
      # The smaller the value, the more deterministic the generated content.
      # The larger the value, the higher the randomness of the generated content.
      "temperature": 0.1
    }

Important
- If you configure multiple features at the same time, the system calls them based on the priority from highest to lowest: search_web, chat_knowledgebase, chat_agent, chat_db, chat_llm. Additionally, for each feature, the system performs preliminary intent recognition to determine whether to call that feature or directly use the LLM to generate a response.
- If you set all features to false or do not configure the features, the local knowledge base (`"chat_knowledgebase": true`) is queried by default.
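
The feature flags and their documented priority can be modelled on the client side when you assemble a request body. The following is a minimal sketch: `build_chat_body` and `enabled_features` are hypothetical helper names, and `enabled_features` only mirrors the order stated above. It does not reproduce the intent recognition that the service performs.

```python
# Priority order stated in this topic, from highest to lowest.
FEATURE_PRIORITY = ["search_web", "chat_knowledgebase", "chat_agent", "chat_db", "chat_llm"]


def build_chat_body(messages, stream=False, index_name=None, **features):
    """Assemble a /v1/chat/completions request body with the given feature flags."""
    unknown = set(features) - set(FEATURE_PRIORITY)
    if unknown:
        raise ValueError(f"Unknown feature flags: {sorted(unknown)}")
    body = {"model": "default", "messages": list(messages), "stream": stream}
    body.update({name: bool(value) for name, value in features.items()})
    if index_name is not None:
        body["index_name"] = index_name
    return body


def enabled_features(body):
    """Return the enabled features in the documented priority order.

    Falls back to chat_knowledgebase when no feature is enabled,
    which matches the default described in this topic.
    """
    enabled = [name for name in FEATURE_PRIORITY if body.get(name)]
    return enabled or ["chat_knowledgebase"]
```

Rejecting unknown flag names early is a design choice of this sketch: a misspelled flag would otherwise be sent silently and the service would fall back to its default behavior.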

Request example

Web search

from openai import OpenAI

##### API configuration #####
# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.

openai_api_key = "<EAS_TOKEN>"
openai_api_base = "<EAS_SERVICE_URL>/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)


#### Chat ######
def chat():
    stream = True
    chat_completion = client.chat.completions.create(
        model="default",
        stream=stream,
        messages=[
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": "Hello, how can I help you?"},
            {"role": "user", "content": "What is the capital of Zhejiang province"},
            {"role": "assistant", "content": "Hangzhou is the capital of Zhejiang province."},
            {"role": "user", "content": "What are some interesting places to visit"},
        ],
        extra_body={
            "search_web": True,
        },
    )

    if stream:
        for chunk in chat_completion:
            # delta.content can be None on some chunks; print an empty string instead.
            print(chunk.choices[0].delta.content or "", end="")
    else:
        result = chat_completion.choices[0].message.content
        print(result)


chat()

Database query

from openai import OpenAI

##### API configuration #####
# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.

openai_api_key = "<EAS_TOKEN>"
openai_api_base = "<EAS_SERVICE_URL>/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)


#### Chat ######
def chat():
    stream = True
    chat_completion = client.chat.completions.create(
        model="default",
        stream=stream,
        messages=[
            {"role": "user", "content": "How many cats are there"},
            {"role": "assistant", "content": "There are 2 cats"},
            {"role": "user", "content": "What about dogs"},
        ],
        extra_body={
            "chat_db": True,
        },
    )

    if stream:
        for chunk in chat_completion:
            # delta.content can be None on some chunks; print an empty string instead.
            print(chunk.choices[0].delta.content or "", end="")
    else:
        result = chat_completion.choices[0].message.content
        print(result)


chat()

Management API

Upload knowledge base files

Method
URL	`{EAS_SERVICE_URL}/api/v1/upload_data`
Request method	POST
Request header	`Authorization: EAS_TOKEN` # The token of the EAS service. `Content-Type: multipart/form-data`
Request parameter	files: the file. oss_path: the Object Storage Service (OSS) path. index_name: the index name. The default value is default_index.

cURL request example

Upload a single file

  # Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.
  # Replace the path after "-F 'files=@" with the path to your file.
  # Configure index_name as the name of your knowledge base index.
  curl -X 'POST' '<EAS_SERVICE_URL>/api/v1/upload_data' \
  -H 'Authorization: <EAS_TOKEN>' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@example_data/paul_graham/paul_graham_essay.txt' \
  -F 'index_name=default_index'

Upload multiple files. Use multiple -F 'files=@path' parameters with each parameter corresponding to a file to be uploaded, as shown in the example:

  # Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.
  # Replace the path after "-F 'files=@" with the path to your file.
  # Configure index_name as the name of your knowledge base index.
  curl -X 'POST' '<EAS_SERVICE_URL>/api/v1/upload_data' \
  -H 'Authorization: <EAS_TOKEN>' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@example_data/paul_graham/paul_graham_essay.txt' \
  -F 'files=@example_data/another_file1.md' \
  -F 'files=@example_data/another_file2.pdf' \
  -F 'index_name=default_index'

Response example

  { "task_id": "2c1e557733764fdb9fefa0635389****" }
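
The same upload can be prepared from Python without third-party packages. The following is a sketch under stated assumptions: it builds the multipart/form-data body by hand with the standard library, takes files as in-memory `(filename, bytes)` tuples, and labels every file part as `application/octet-stream`. The helper names `encode_multipart` and `build_upload_request` are illustrative.

```python
import urllib.request
import uuid


def encode_multipart(fields, files, file_field="files"):
    """Encode text fields and in-memory files as multipart/form-data.

    fields: dict of text fields, for example {"index_name": "default_index"}.
    files: list of (filename, content_bytes) tuples.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        chunks.append(part.encode("utf-8"))
    for filename, content in files:
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        chunks.append(head.encode("utf-8") + content + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_upload_request(service_url, token, files, index_name="default_index"):
    """Build (but do not send) a POST request for /api/v1/upload_data."""
    body, content_type = encode_multipart({"index_name": index_name}, files)
    return urllib.request.Request(
        f"{service_url.rstrip('/')}/api/v1/upload_data",
        data=body,
        headers={"Authorization": token, "Content-Type": content_type},
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and parse the JSON response to read `task_id`.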

Query the upload status

Method
URL	`{EAS_SERVICE_URL}/api/v1/get_upload_state`
Request method	GET
Request header	`Authorization: EAS_TOKEN` # The token of the EAS service.

cURL request example

# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.
# Configure task_id as the task_id value returned after you upload knowledge base files.
curl -X 'GET' '<EAS_SERVICE_URL>/api/v1/get_upload_state?task_id=2c1e557733764fdb9fefa0635389****' -H 'Authorization: <EAS_TOKEN>'

Response example

  {
    "task_id": "2c1e557733764fdb9fefa0635389****",
    "status": "completed",
    "detail": null
  }
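
Because an upload is processed asynchronously, a client typically polls this endpoint until the task finishes. The following is a minimal polling sketch. It assumes only what the response example shows, namely that a finished task reports `"status": "completed"`. Other status values are not listed in this topic, so the sketch treats every other value as "not finished yet" and relies on a timeout. The function name and its `fetch_state` callback are illustrative.

```python
import time


def wait_for_upload(fetch_state, task_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the task reports status "completed" or the timeout elapses.

    fetch_state(task_id) must return the parsed JSON body of get_upload_state.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state(task_id)
        if state.get("status") == "completed":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Upload task {task_id} did not complete: {state}")
        sleep(interval)
```

Passing the HTTP call in as `fetch_state` keeps the loop independent of any particular HTTP client and makes it easy to test with canned responses.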

Query a knowledge base

Method
URL	`{EAS_SERVICE_URL}/api/v1/query/retrieval`
Request method	POST
Request header	`Authorization: EAS_TOKEN` # The token of the EAS service. `Content-Type: application/json`
Request parameter	question: the user question. index_name: the name of the index. The default value is default_index.

cURL request example

# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.
# Configure the question parameter as the user question. 
# Specify an index name for the index_name parameter.  
  curl -X 'POST' '<EAS_SERVICE_URL>/api/v1/query/retrieval' \
  -H 'Authorization: <EAS_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
      "question": "What can I do when the x13-auto-arima component reports an error?",
      "index_name": "default_index"
  }'

Response example

{
  "docs": [
    {
      "text": "2.PAl-Studio/Designer FAQ 2.1. FAQ about algorithm components : \nCharacters that cannot be transcoded are displayed as \"blob.\" Ignore this error, because nodes in the downstream can read and process the data.\nWhat can I do when the x13-auto-arima component reports an error?\nMake sure that up to 1,200 training data samples are imported into the x13-auto-arima component.\nWhat can I do when the Doc2Vec component reports the CallExecutorToParseTaskFail error?",
      "score": 0.83608,
      "metadata": {
        "file_path": "***/pai_document.md",
        "file_name": "pai_document.md",
        "file_size": 3794,
        "creation_date": "2025-03-20",
        "last_modified_date": "2025-03-20"
      },
      "image_url": null
    }
  ]
}
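
The retrieval call and a simple post-processing step can be sketched in Python as follows. The request shape follows the cURL example above. The helper names, the `min_score` filter, and the choice of the standard-library `urllib` client are assumptions of this sketch.

```python
import json
import urllib.request


def build_retrieval_request(service_url, token, question, index_name="default_index"):
    """Build (but do not send) a POST request for /api/v1/query/retrieval."""
    payload = json.dumps({"question": question, "index_name": index_name}).encode("utf-8")
    return urllib.request.Request(
        f"{service_url.rstrip('/')}/api/v1/query/retrieval",
        data=payload,
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


def top_docs(response, min_score=0.0):
    """Return (score, file_name, text) tuples sorted by score, highest first."""
    docs = [d for d in response.get("docs", []) if d.get("score", 0.0) >= min_score]
    docs.sort(key=lambda d: d.get("score", 0.0), reverse=True)
    return [
        (d.get("score", 0.0), d.get("metadata", {}).get("file_name"), d.get("text", ""))
        for d in docs
    ]
```

To run the query, pass the request to `urllib.request.urlopen`, load the JSON body, and hand it to `top_docs`.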

Upload Excel or CSV files to query chat_db tables

Method
URL	`{EAS_SERVICE_URL}/api/v1/upload_datasheet`
Request method	POST
Request header	`Authorization: EAS_TOKEN` # The token of the EAS service. `Content-Type: multipart/form-data`
Request parameter	file: the Excel or CSV file.

cURL request example

  # Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.
  # Replace the path after "-F 'file=@" with the actual file path.
  curl -X 'POST' '<EAS_SERVICE_URL>/api/v1/upload_datasheet' \
  -H 'Authorization: <EAS_TOKEN>' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@example_data/titanic_train.csv'

Response example

  {
    "task_id": "3b12cf5fabee4a99a32895d2f693****",
    "destination_path": "./localdata/data_analysis/titanic_train.csv",
    "data_preview": "xxx"
  }

Upload JSON files of Q&A pairs to supplement chat_db database information

Method
URL	`{EAS_SERVICE_URL}/api/v1/upload_db_history`
Request method	POST
Request header	`Authorization: EAS_TOKEN` # The token of the EAS service. `Content-Type: multipart/form-data`
Request parameter	file: the JSON file. db_name: the database name.

cURL request example

  # Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.
  # Replace the path after "-F 'file=@" with the JSON file path.
  # Configure db_name as the name of your database.
  curl -X 'POST' '<EAS_SERVICE_URL>/api/v1/upload_db_history' \
  -H 'Authorization: <EAS_TOKEN>' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@example_data/db_query_history.json' \
  -F 'db_name=my_pets'

Response example

  {
    "task_id": "204191f946384a54a48b13ec00fd****",
    "destination_path": "./localdata/data_analysis/nl2sql/history/my_pets_db_query_history.json"
  }

Load database information

Method
URL	`{EAS_SERVICE_URL}/api/v1/query/load_db_info`
Request method	POST
Request header	`Authorization: EAS_TOKEN` # The token of the EAS service.

cURL request example

# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.
curl -X 'POST' <EAS_SERVICE_URL>/api/v1/query/load_db_info -H 'Authorization: <EAS_TOKEN>'

Response example

"Load database info successfully."

Query all knowledge base indexes

Method
URL	`{EAS_SERVICE_URL}/api/v1/indexes`
Request method	GET
Request header	`Authorization: EAS_TOKEN` # The token of the EAS service.

cURL request example

# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.
curl -X 'GET' '<EAS_SERVICE_URL>/api/v1/indexes' -H 'Authorization: <EAS_TOKEN>'

Response example

  {
    "indexes": {
      "default_index": {
        "index_name": "default_index",
        "vector_store_config": {
          "persist_path": "localdata/storage",
          "type": "faiss",
          "is_image_store": false
        },
        "embedding_config": {
          "source": "huggingface",
          "model": "bge-m3",
          "embed_batch_size": 10,
          "enable_sparse": false
        }
      }
    },
    "current_index_name": "default_index"
  }
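
A client usually needs only a few fields from this response. The following sketch, assuming the response shape shown above, condenses it into one entry per index. The function name `summarize_indexes` and the output keys are illustrative.

```python
def summarize_indexes(response):
    """Map each index name to its vector store type, embedding model, and current-index flag."""
    current = response.get("current_index_name")
    summary = {}
    for name, cfg in response.get("indexes", {}).items():
        summary[name] = {
            "vector_store": cfg.get("vector_store_config", {}).get("type"),
            "embedding_model": cfg.get("embedding_config", {}).get("model"),
            "is_current": name == current,
        }
    return summary
```

Applied to the response example above, the summary contains a single `default_index` entry with `faiss`, `bge-m3`, and `is_current` set to `True`.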

Create a knowledge base index

Method
URL	`{EAS_SERVICE_URL}/api/v1/indexes/{index_name}`
Request method	POST
Request header	`Authorization: EAS_TOKEN` # The token of the EAS service. `Content-Type: application/json`
Request parameter	index_name: the index name. vector_store_config: the vector database configuration. embedding_config: the embedding model configuration.

cURL request example

    # Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.
    # Replace <my_index> with the new knowledge base index.
    # Configure vector_store_config as the vector database configuration.
    curl -X 'POST' '<EAS_SERVICE_URL>/api/v1/indexes/<my_index>' \
    -H 'Authorization: <EAS_TOKEN>' \
    -H 'Content-Type: application/json' \
    -d '{
        "index_name": "<my_index>",
        "vector_store_config": {
            "type": "faiss"
        },
        "embedding_config": {
            "model": "bge-m3",
            "source": "huggingface"
        }
    }'

The vector_store_config parameter in the preceding code uses Faiss as an example. The vector_store_config configurations for other vector databases are as follows:

Milvus

"vector_store_config":
      {
          "type":"milvus",
          "host":"c-xxxxx.milvus.aliyuncs.com",
          "port":19530,
          "user":"root",
          "password":"xxx",
          "database":"default",
          "collection_name":"test",
          "reranker_weights":[0.5,0.5]
      }

Hologres

"vector_store_config":
      {
          "type":"hologres",
          "host":"xxx",
          "port":xxx,
          "user":"xxx",
          "password":"xxx",
          "database":"default",
          "table_name":"test",
          "pre_delete_table":"false"
      }

Elasticsearch

"vector_store_config":
      {
          "type":"elasticsearch",
          "es_url":"xxx",
          "es_user":"xxx",
          "es_password":"xxx",
          "es_index":"xxx"
      }

OpenSearch

"vector_store_config":
      {
          "type":"opensearch",
          "endpoint":"xxx",
          "instance_id":"xxx",
          "username":"xxx",
          "password":"xxx",
          "table_name":"xxx"
      }

AnalyticDB

"vector_store_config":
      {
          "type":"analyticdb",
          "ak":"xxx",
          "sk":"xxx",
          "region_id":"xxx",
          "instance_id":"xxx",
          "account":"xxx",
          "account_password":"xxx",
          "namespace":"xxx",
          "collection":"xxx"
      }

Tablestore

"vector_store_config":
      {
          "type":"tablestore",
          "endpoint":"xxx",
          "instance_name":"xxx",
          "access_key_id":"xxx",
          "access_key_secret":"xxx",
          "table_name":"xxx"
      }

DashVector

"vector_store_config":
      {
          "type":"dashvector",
          "endpoint":"xxx",
          "api_key":"xxx",
          "collection_name":"xxx",
          "partition_name":"xxx"
      }

Response example

  { "msg": "Add index 'my_index' successfully." }
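
When index creation is scripted, it can help to check a vector_store_config against the examples above before sending it. The following is a sketch only. `EXAMPLE_KEYS` lists the keys that appear in this topic's examples for each store type; whether the service treats all of them as mandatory is not stated here, so the check reports differences rather than rejecting the configuration. All helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Keys shown in this topic's vector_store_config examples, per store type.
EXAMPLE_KEYS = {
    "faiss": [],
    "milvus": ["host", "port", "user", "password", "database", "collection_name", "reranker_weights"],
    "hologres": ["host", "port", "user", "password", "database", "table_name", "pre_delete_table"],
    "elasticsearch": ["es_url", "es_user", "es_password", "es_index"],
    "opensearch": ["endpoint", "instance_id", "username", "password", "table_name"],
    "analyticdb": ["ak", "sk", "region_id", "instance_id", "account", "account_password",
                   "namespace", "collection"],
    "tablestore": ["endpoint", "instance_name", "access_key_id", "access_key_secret", "table_name"],
    "dashvector": ["endpoint", "api_key", "collection_name", "partition_name"],
}


def missing_example_keys(vector_store_config):
    """List keys that the examples show for this store type but the config lacks."""
    store_type = vector_store_config.get("type")
    if store_type not in EXAMPLE_KEYS:
        raise ValueError(f"Store type not covered by the examples: {store_type!r}")
    return [key for key in EXAMPLE_KEYS[store_type] if key not in vector_store_config]


def build_create_index_request(service_url, token, index_name, vector_store_config, embedding_config):
    """Build (but do not send) a POST request for /api/v1/indexes/{index_name}."""
    body = {
        "index_name": index_name,
        "vector_store_config": vector_store_config,
        "embedding_config": embedding_config,
    }
    url = f"{service_url.rstrip('/')}/api/v1/indexes/{urllib.parse.quote(index_name, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )
```

The index name is percent-encoded in the URL path so that names containing spaces or slashes do not change the request path.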

Update a knowledge base index

Method
URL	`{EAS_SERVICE_URL}/api/v1/indexes/{index_name}`
Request method	PATCH
Request header	`Authorization: EAS_TOKEN` # The token of the EAS service. `Content-Type: application/json`
Request parameter	index_name: the index name. vector_store_config: the vector database configuration. embedding_config: the embedding model configuration.

cURL request example

    # Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.
    # Replace <my_index> with the knowledge base index to be updated.
    # Configure vector_store_config as the vector database configuration to be updated.
    curl -X 'PATCH' '<EAS_SERVICE_URL>/api/v1/indexes/<my_index>' \
    -H 'Authorization: <EAS_TOKEN>' \
    -H 'Content-Type: application/json' \
    -d '{
        "index_name": "<my_index>",
        "vector_store_config": {
            "type": "faiss"
        },
        "embedding_config": {
            "model": "bge-m3",
            "source": "huggingface"
        }
    }'

The vector_store_config parameter in the preceding code uses Faiss as an example. For vector_store_config configurations of other vector databases, see Create a knowledge base index.

Response example

  { "msg": "Update index 'my_index' successfully." }

Delete a knowledge base index

Method
URL	`{EAS_SERVICE_URL}/api/v1/indexes/{index_name}`
Request method	DELETE
Request header	`Authorization: EAS_TOKEN` # The token of the EAS service. `Content-Type: application/json`
Request parameter	index_name: the index name.

cURL request example

# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.
# Replace <my_index> with the knowledge base index to be deleted.
curl -X 'DELETE' '<EAS_SERVICE_URL>/api/v1/indexes/<my_index>' -H 'Authorization: <EAS_TOKEN>' -H 'Content-Type: application/json' -d '{"index_name":"<my_index>"}'

Response example

  { "msg": "Delete index 'my_index' successfully." }

Query the configurations of a RAG service

Method
URL	`{EAS_SERVICE_URL}/api/v1/config`
Request method	GET
Request header	`Authorization: EAS_TOKEN` # The token of the EAS service.

cURL request example

# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.
curl -X 'GET' '<EAS_SERVICE_URL>/api/v1/config' -H 'Authorization: <EAS_TOKEN>'

Response example

  {
    "system": {
      "default_web_search": false,
      "query_type": "websearch"
    },
    "data_reader": {
      "concat_csv_rows": false,
      "enable_mandatory_ocr": false,
      "format_sheet_data_to_json": false,
      "sheet_column_filters": null,
      "number_workers": 4
    },
    "node_parser": {
      "type": "Sentence",
      "chunk_size": 500,
      "chunk_overlap": 10,
      "enable_multimodal": true,
      "paragraph_separator": "\n\n\n",
      "sentence_window_size": 3,
      "sentence_chunk_overlap": 200,
      "breakpoint_percentile_threshold": 95,
      "buffer_size": 1
    },
    "index": {
      "vector_store": {
        "persist_path": "localdata/storage",
        "type": "faiss",
        "is_image_store": false
      },
      "enable_multimodal": true,
      "persist_path": "localdata/storage"
    },
    "embedding": {
      "source": "huggingface",
      "model": "bge-m3",
      "embed_batch_size": 10,
      "enable_sparse": false
    },
    "multimodal_embedding": {
      "source": "cnclip",
      "model": "ViT-L-14",
      "embed_batch_size": 10,
      "enable_sparse": false
    },
    "llm": {
      "source": "openai_compatible",
      "temperature": 0.1,
      "system_prompt": null,
      "max_tokens": 4000,
      "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
      "api_key": "sk-xxx",
      "model": "qwen-max"
    },
    "multimodal_llm": {
      "source": "openai_compatible",
      "temperature": 0.1,
      "system_prompt": null,
      "max_tokens": 4000,
      "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
      "api_key": "sk-xxx",
      "model": ""
    },
    "functioncalling_llm": null,
    "agent": {
      "system_prompt": "You are a travel assistant, xxx",
      "python_scripts": "xxx",
      "function_definition": "xxx",
      "api_definition": "xxx"
    },
    "chat_store": {
      "type": "local",
      "persist_path": "localdata/storage"
    },
    "data_analysis": {
      "type": "mysql",
      "nl2sql_prompt": "Give a question, xxx",
      "synthesizer_prompt": "Give a question, xxx",
      "database": "my_pets",
      "tables": [],
      "descriptions": {},
      "enable_enhanced_description": false,
      "enable_db_history": true,
      "enable_db_embedding": true,
      "max_col_num": 100,
      "max_val_num": 1000,
      "enable_query_preprocessor": true,
      "enable_db_preretriever": true,
      "enable_db_selector": true,
      "user": "root",
      "password": "xxx",
      "host": "127.0.0.1",
      "port": 3306
    },
    "intent": {
      "descriptions": {
        "rag": "\nThis tool can help you get more specific information from the knowledge base.\n",
        "tool": "\nThis tool can help you get travel information about time, weather, flights, train and hotels.\n"
      }
    },
    "node_enhancement": {
      "tree_depth": 3,
      "max_clusters": 52,
      "proba_threshold": 0.1
    },
    "oss_store": {
      "bucket": "",
      "endpoint": "oss-cn-hangzhou.aliyuncs.com",
      "ak": null,
      "sk": null
    },
    "postprocessor": {
      "reranker_type": "no-reranker",
      "similarity_threshold": 0.5
    },
    "retriever": {
      "vector_store_query_mode": "default",
      "similarity_top_k": 3,
      "image_similarity_top_k": 2,
      "search_image": false,
      "hybrid_fusion_weights": [0.7, 0.3]
    },
    "search": {
      "source": "google",
      "search_count": 10,
      "serpapi_key": "142xxx",
      "search_lang": "zh-CN"
    },
    "synthesizer": {
      "use_multimodal_llm": false,
      "system_role_template": "You are xxx",
      "custom_prompt_template": "Your goal is to provide accurate, useful, and easy-to-understand information. xxx"
    },
    "query_rewrite": {
      "enabled": true,
      "rewrite_prompt_template": "# Role\nYou are a professional information retrieval expert, xxx",
      "llm": {
        "source": "openai_compatible",
        "temperature": 0.1,
        "system_prompt": null,
        "max_tokens": 4000,
        "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "api_key": null,
        "model": ""
      }
    },
    "guardrail": {
      "endpoint": null,
      "region": null,
      "access_key_id": null,
      "access_key_secret": null,
      "custom_advice": null
    }
  }
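
The configuration response contains credentials such as `api_key`, `password`, and `serpapi_key`, so it is worth masking them before the response is logged or shared. The following is a minimal sketch. The set of key names is taken from the response example above and may not be complete for your deployment; the function name `redact` is illustrative.

```python
# Key names that hold secrets in the response example above (possibly incomplete).
SENSITIVE_KEYS = {
    "api_key", "password", "ak", "sk", "serpapi_key", "access_key_id", "access_key_secret",
}


def redact(config):
    """Return a copy of the config with non-empty secret values replaced by '***'."""
    if isinstance(config, dict):
        return {
            key: "***" if key in SENSITIVE_KEYS and value else redact(value)
            for key, value in config.items()
        }
    if isinstance(config, list):
        return [redact(item) for item in config]
    return config
```

Empty and null secrets are left unchanged so that the output still shows which credentials are not configured.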

Update the configurations of a RAG service

Method
URL	`{EAS_SERVICE_URL}/api/v1/config`
Request method	PATCH
Request header	`Authorization: EAS_TOKEN` # The token of the EAS service. `Content-Type: application/json`
Request parameter	new_config: the updated configuration.

cURL request example

    # Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service access address, respectively.
    curl -X 'PATCH' '<EAS_SERVICE_URL>/api/v1/config' \
    -H 'Authorization: <EAS_TOKEN>' \
    -H 'Content-Type: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "system": {
          "default_web_search": false,
          "query_type": "websearch"
        },
        "data_reader": {
          "concat_csv_rows": false,
          "enable_mandatory_ocr": false,
          "format_sheet_data_to_json": false,
          "sheet_column_filters": null,
          "number_workers": 4
        },
        "node_parser": {
          "type": "Sentence",
          "chunk_size": 500,
          "chunk_overlap": 10,
          "enable_multimodal": true,
          "paragraph_separator": "\n\n\n",
          "sentence_window_size": 3,
          "sentence_chunk_overlap": 200,
          "breakpoint_percentile_threshold": 95,
          "buffer_size": 1
        },
        ...
    }' # (For more information, see the response example in the "Query the configurations of a RAG service" section.)

Response example

  { "msg": "Update RAG configuration successfully." }
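
One way to change a few settings is to fetch the current configuration, merge the changes into it, and send the merged result back. The following is a sketch of that read-modify-write pattern. This topic does not state whether the service accepts a partial body, so the sketch sends the full merged configuration, as the cURL example does. The helper names `deep_merge` and `build_config_patch_request` are illustrative.

```python
import copy
import json
import urllib.request


def deep_merge(base, changes):
    """Return a new dict in which nested keys from changes override those in base."""
    merged = copy.deepcopy(base)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def build_config_patch_request(service_url, token, new_config):
    """Build (but do not send) a PATCH request for /api/v1/config."""
    return urllib.request.Request(
        f"{service_url.rstrip('/')}/api/v1/config",
        data=json.dumps(new_config).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="PATCH",
    )
```

For example, `deep_merge(current, {"retriever": {"similarity_top_k": 5}})` changes only that one retriever setting and leaves every other section of the fetched configuration as it was.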