
ApsaraDB RDS: Use the EAS module of PAI and ApsaraDB RDS for PostgreSQL to deploy a RAG-based LLM chatbot

Last Updated: Mar 28, 2026

This tutorial shows you how to build a Retrieval-Augmented Generation (RAG)-based large language model (LLM) chatbot using Platform for AI (PAI) Elastic Algorithm Service (EAS) and ApsaraDB RDS for PostgreSQL as the vector database.

In this tutorial, you:

  • Set up a vector database in an ApsaraDB RDS for PostgreSQL instance

  • Deploy a RAG-based chatbot through the EAS console

  • Test the chatbot using the web UI with retrieval, LLM, and RAG query modes

  • Call the chatbot API from cURL and Python

How it works

RAG addresses the accuracy limits of standalone LLM applications by combining retrieval with generation. When a query arrives, the system retrieves relevant documents from the vector database, injects them into the prompt, and sends the enriched prompt to the LLM — producing answers grounded in your specific knowledge base rather than the model's training data alone. No model retraining is needed.
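The retrieve-then-generate flow can be sketched in a few lines of Python. This is an illustrative sketch only: the bag-of-words `embed` stub, the in-memory `index`, and the prompt template are stand-ins for the embedding model, vector database, and prompt policy that the deployed service manages for you.

```python
import re
from math import sqrt
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; the deployed service uses a trained embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Chunks that would live in the vector database.
chunks = [
    "PAI is Alibaba Cloud's machine learning platform.",
    "ApsaraDB RDS for PostgreSQL supports the pg_jieba extension.",
]
index = [(c, embed(c)) for c in chunks]

def rag_prompt(question, top_k=1):
    """Retrieve the top_k most similar chunks and inject them into the LLM prompt."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    context = "\n".join(c for c, _ in ranked[:top_k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(rag_prompt("What is PAI?"))
```

The enriched prompt, not the bare question, is what reaches the LLM, which is why answers stay grounded in the uploaded knowledge base.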

ApsaraDB RDS for PostgreSQL serves as the vector database in this architecture. The pg_jieba extension enables keyword-based retrieval and recall for Chinese text.

Prerequisites

Before you begin, note: if you use Faiss instead of ApsaraDB RDS for PostgreSQL to build the vector database, an OSS bucket is required.

Limitations

  • The ApsaraDB RDS for PostgreSQL instance and EAS must reside in the same region.

  • The chatbot is limited by the LLM service's token limit. Long conversations may hit this limit. To reduce the likelihood of reaching the limit in single-turn scenarios, disable Chat history in the web UI. For details, see Disable chat history.

Step 1: Set up the vector database

  1. Create an ApsaraDB RDS for PostgreSQL instance. Place it in the same region as your planned EAS deployment to enable VPC-internal connectivity. For more information, see Create an instance.

  2. Create a privileged account and a database for the instance. For more information, see Create a database and an account.

    • Set Account Type to Privileged Account.

    • When creating the database, select the privileged account from the Authorized By drop-down list.

  3. Get the database connection details.

    1. Go to the Instances page. In the top navigation bar, select the region where your instance resides, then click the instance ID.

    2. In the left-side navigation pane, click Database Connection.

    3. Note the endpoint and port number. You will need them when deploying the chatbot.

  4. Add pg_jieba to the shared_preload_libraries parameter. On the instance parameters page, find shared_preload_libraries and append pg_jieba to the Running Parameter Value. For example, if the current value is 'pg_stat_statements,auto_explain', change it to 'pg_stat_statements,auto_explain,pg_jieba'. For more information, see Modify the parameters of an ApsaraDB RDS for PostgreSQL instance.

    The pg_jieba extension segments Chinese text for keyword-based retrieval and recall. For more information, see Use the pg_jieba extension.
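The Running Parameter Value is a comma-separated list, so pg_jieba must be appended without removing the entries that are already there. The following local sketch illustrates that string edit only; it does not modify the instance, and the starting value is an example:

```python
def add_preload_library(current, lib="pg_jieba"):
    """Append lib to a shared_preload_libraries value, preserving existing entries."""
    entries = [e.strip() for e in current.strip("'").split(",") if e.strip()]
    if lib not in entries:  # idempotent: do nothing if lib is already listed
        entries.append(lib)
    return "'" + ",".join(entries) + "'"

print(add_preload_library("'pg_stat_statements,auto_explain'"))
# 'pg_stat_statements,auto_explain,pg_jieba'
```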

Step 2: Deploy the RAG-based chatbot

  1. Log on to the Platform for AI (PAI) console.

  2. In the left-side navigation pane, click Workspaces. On the Workspaces page, find your workspace and click its name. If no workspace exists, create one. For more information, see Create a workspace.

  3. In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS).

    image

  4. On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Scenario-based Model Deployment section, click RAG-based Smart Dialogue Deployment.


  5. On the RAG-based LLM Chatbot Deployment page, configure the parameters.

    Basic information

    • Service Name: The name of the service.

    • Model Source: The model source. Valid values: Open Source Model and Custom Fine-tuned Model.

    • Model Type: The model type. Select a type based on your requirements. If Model Source is Custom Fine-tuned Model, also configure the parameter quantity and precision.

    • Model Settings: Required when Model Source is Custom Fine-tuned Model. Specify where the fine-tuned model file is stored. The model file format must be compatible with Hugging Face Transformers. Valid values: Mount OSS (select the OSS path) and Mount NAS (select the NAS file system and source path).

    Resource configuration

    • Resource Configuration: If Model Source is Open Source Model, the system selects an instance type automatically. If Model Source is Custom Fine-tuned Model, select an instance type that matches your model. For more information, see Deploy LLM applications in EAS.

    • Inference Acceleration: Available for the Qwen, Llama2, ChatGLM, or Baichuan2 model on A10 or GU30 instances. Options: BladeLLM Inference Acceleration (high concurrency and low latency) and Open-source vLLM Inference Acceleration.

    Vector database settings

    • Vector Database Type: Select RDS PostgreSQL.

    • Host Address: The internal or public endpoint of the RDS instance. Use the internal endpoint when the RAG application and the database are in the same region. If they are in different regions, use the public endpoint. For more information, see Apply for or release a public endpoint.

    • Port: The port number of the RDS instance. Default: 5432.

    • Database: The name of the database you created in Step 1.

    • Table Name: A new or existing table name. If you use an existing table, its schema must be compatible with the RAG-based LLM chatbot format.

    • Account: The privileged account of the RDS instance.

    • Password: The password of the privileged account.

    VPC configuration

    • VPC: If Host Address is an internal endpoint, select the VPC of the RDS instance. If Host Address is a public endpoint, configure a VPC and vSwitch, then create a NAT gateway and an elastic IP address (EIP) for internet access. Add the EIP to the IP address whitelist of the RDS instance. For more information, see Use the SNAT feature of an Internet NAT gateway to access the Internet and Configure an IP address whitelist.

    • vSwitch: The vSwitch associated with your VPC.

    • Security Group Name: The security group. Do not use the security group named created_by_rds, which is reserved for system access control.
  6. Click Deploy. When the Service Status column shows Running, the chatbot is deployed.

Step 3: Test with the web UI

Use the built-in web UI to validate chatbot performance before integrating it into your application.

Configure the chatbot

  1. On the EAS page, click View Web App in the Service Type column.

  2. Set the embedding model:

    • Embedding Model Name: Four models are available. The optimal model is selected by default.

    • Embedding Dimension: Auto-configured after you select an Embedding Model Name.

  3. Click Connect PostgreSQL to verify the connection to the RDS vector database. The connection settings come from the deployment configuration and cannot be modified here.

Upload knowledge base files

On the Upload tab, upload your business data files.

image

Supported file formats: TXT, PDF, XLSX, XLS, CSV, DOCX, DOC, Markdown, and HTML.

  1. Configure chunking parameters to control how documents are split.

    • Chunk Size: The size of each chunk, in bytes. Default: 500.

    • Chunk Overlap: The overlap between adjacent chunks. Default: 10.

    • Process with QA Extraction Model: When set to Yes, the system extracts question-answer pairs from uploaded files, improving retrieval precision.
  2. Upload files on the Files tab, or upload a directory on the Directory tab. For example, upload the rag_chatbot_test_doc.txt file to test.

  3. The system runs data cleansing (text extraction and hyperlink replacement) and semantic-based chunking before storing the data.

    image
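Chunk Size and Chunk Overlap describe a sliding window over the document text. The sketch below shows only the effect of those two parameters on plain character windows; the service's actual chunker is semantic-based, so real chunk boundaries will differ.

```python
def chunk(text, chunk_size=500, chunk_overlap=10):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    step = chunk_size - chunk_overlap  # advance by size minus overlap each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# With the defaults, a 1200-character document yields three chunks.
pieces = chunk("x" * 1200)
print(len(pieces), [len(p) for p in pieces])  # 3 [500, 500, 220]

# Smaller numbers make the overlap visible.
print(chunk("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of some duplicated storage.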

Configure inference parameters

On the Chat tab, configure retrieval and generation parameters.

Retrieval-based query settings

image
  • Streaming Output: Return results in streaming mode.

  • Retrieval Model: The retrieval method. Embedding Only: vector database-based retrieval. Keyword Only: keyword-based retrieval. Hybrid: combines both methods. In most scenarios, vector-based retrieval delivers better results. For corpora where precise keyword matching matters, Keyword Only or Hybrid may perform better. ApsaraDB RDS for PostgreSQL uses pg_jieba for Chinese text segmentation. For more information, see Use the pg_jieba extension.

  • Reranker Type: Apply a second-pass ranking model to improve result precision. You can use the simple-weighted-reranker or model-based-reranker to re-rank the top K results with higher precision. Note: The first time you use a reranking model, it may take some time to load.

  • Top K: The number of top results to retrieve from the vector database.

  • Similarity Score Threshold: The minimum similarity score for a result to be returned. A higher value returns fewer but more relevant results.
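Top K and Similarity Score Threshold combine to bound both the count and the quality of returned results. The sketch below shows one straightforward interpretation of that interaction (take the K best, then drop anything below the threshold), with made-up scores; the service's exact scoring pipeline may differ.

```python
def select(results, top_k=3, threshold=0.5):
    """Keep the top_k highest-scoring results, then drop those below threshold."""
    ranked = sorted(results, key=lambda r: r[1], reverse=True)[:top_k]
    return [r for r in ranked if r[1] >= threshold]

scored = [("doc_a", 0.91), ("doc_b", 0.62), ("doc_c", 0.48), ("doc_d", 0.30)]
print(select(scored))                 # doc_a and doc_b pass; doc_c falls below 0.5
print(select(scored, threshold=0.8))  # raising the threshold leaves only doc_a
```

Fewer than K results can therefore come back when the corpus has little relevant material, which is usually preferable to padding the prompt with weak matches.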

RAG query settings (retrieval + LLM)

image

Select a predefined prompt template or define a custom one. You can configure Streaming Output, Retrieval Model, and Reranker Type in this mode as well. For more information on available prompt policies, see RAG-based LLM chatbot.

Run queries

The chatbot supports three query modes:

Retrieval — Returns the top K relevant results from the vector database.

image

LLM — Uses only the LLM application to generate an answer.

image

RAG (retrieval + LLM) — Retrieves relevant documents, injects them into the selected prompt template, and sends the combined input to the LLM.

image

Step 4: Call the API

Get the endpoint and token

  1. Click the chatbot service name to open the Service Details page.

  2. In the Basic Information section, click View Endpoint Information.

  3. On the Public Endpoint tab of the Invocation Method dialog, copy the service endpoint and token.

Upload knowledge base files

Connect to the vector database and upload knowledge base files before calling the API. Alternatively, populate the vector database directly using a table that conforms to the PAI-RAG schema.

Call the API

The chatbot exposes three endpoints, one for each query mode:

  • Retrieval: service/query/retrieval

  • LLM: service/query/llm

  • RAG (retrieval + LLM): service/query

Replace <service_url> and <service_token> with the values from the previous step. Remove the trailing slash (/) from the service URL.

cURL

Single-turn requests

# Retrieval mode
curl -X 'POST' '<service_url>service/query/retrieval' \
  -H 'Authorization: <service_token>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"question": "What is PAI?"}'

# LLM mode (supports optional parameters such as temperature)
curl -X 'POST' '<service_url>service/query/llm' \
  -H 'Authorization: <service_token>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"question": "What is PAI?", "temperature": 0.9}'

# RAG mode (retrieval + LLM)
curl -X 'POST' '<service_url>service/query' \
  -H 'Authorization: <service_token>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"question": "What is PAI?"}'

Multi-turn requests

Multi-turn conversations are supported in RAG and LLM modes. Use session_id to maintain conversation state across requests, or pass chat_history explicitly.

# Round 1: send the first question and get a session_id in the response
curl -X 'POST' '<service_url>service/query' \
  -H 'Authorization: <service_token>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"question": "What is PAI?"}'

# Round 2: include the session_id to continue the conversation
curl -X 'POST' '<service_url>service/query' \
  -H 'Authorization: <service_token>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"question": "What are the benefits of PAI?", "session_id": "ed7a80e2e20442eab****"}'

# Alternative: pass chat_history directly as a list of {user, bot} pairs
curl -X 'POST' '<service_url>service/query' \
  -H 'Authorization: <service_token>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"question": "What are the features of PAI?", "chat_history": [{"user": "What is PAI", "bot": "PAI is an AI platform provided by Alibaba Cloud..."}]}'

# When both session_id and chat_history are provided, the chat_history is appended to the session
curl -X 'POST' '<service_url>service/query' \
  -H 'Authorization: <service_token>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"question": "What are the features of PAI?", "chat_history": [{"user": "What is PAI", "bot": "PAI is an AI platform provided by Alibaba Cloud..."}], "session_id": "1702ffxxad3xxx6fxxx97daf7c"}'

Python

Single-turn requests

import requests

EAS_URL = 'http://xxxx.****.cn-beijing.pai-eas.aliyuncs.com'  # Remove trailing /
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': 'MDA5NmJkNzkyMGM1Zj****YzM4M2YwMDUzZTdiZmI5YzljYjZmNA==',
}


def test_post_api_query_llm():
    url = EAS_URL + '/service/query/llm'
    data = {"question": "What is PAI?"}
    response = requests.post(url, headers=headers, json=data)

    if response.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response.status_code}')
    ans = dict(response.json())
    print(f"======= Question =======\n {data['question']}")
    print(f"======= Answer =======\n {ans['answer']} \n\n")


def test_post_api_query_retrieval():
    url = EAS_URL + '/service/query/retrieval'
    data = {"question": "What is PAI?"}
    response = requests.post(url, headers=headers, json=data)

    if response.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response.status_code}')
    ans = dict(response.json())
    print(f"======= Question =======\n {data['question']}")
    print(f"======= Answer =======\n {ans['docs']}\n\n")


def test_post_api_query_rag():
    url = EAS_URL + '/service/query'
    data = {"question": "What is PAI?"}
    response = requests.post(url, headers=headers, json=data)

    if response.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response.status_code}')
    ans = dict(response.json())
    print(f"======= Question =======\n {data['question']}")
    print(f"======= Answer =======\n {ans['answer']}")
    print(f"======= Retrieved Docs =======\n {ans['docs']}\n\n")


# LLM mode
test_post_api_query_llm()
# Retrieval mode
test_post_api_query_retrieval()
# RAG mode (retrieval + LLM)
test_post_api_query_rag()

Multi-turn requests

Multi-turn conversations are supported in LLM and RAG modes. Pass the session_id from the previous response to continue a conversation.

import requests

EAS_URL = 'http://xxxx.****.cn-beijing.pai-eas.aliyuncs.com'  # Remove trailing /
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': 'MDA5NmJkN****jNlMDgzYzM4M2YwMDUzZTdiZmI5YzljYjZmNA==',
}


def test_post_api_query_llm_with_chat_history():
    url = EAS_URL + '/service/query/llm'

    # Round 1
    data = {"question": "What is PAI?"}
    response = requests.post(url, headers=headers, json=data)
    if response.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response.status_code}')
    ans = dict(response.json())
    print(f"=======Round 1: Question =======\n {data['question']}")
    print(f"=======Round 1: Answer =======\n {ans['answer']} session_id: {ans['session_id']} \n")

    # Round 2: use the session_id from Round 1
    data_2 = {
        "question": "What are the benefits of PAI?",
        "session_id": ans['session_id']
    }
    response_2 = requests.post(url, headers=headers, json=data_2)
    if response_2.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response_2.status_code}')
    ans_2 = dict(response_2.json())
    print(f"=======Round 2: Question =======\n {data_2['question']}")
    print(f"=======Round 2: Answer =======\n {ans_2['answer']} session_id: {ans_2['session_id']} \n\n")


def test_post_api_query_rag_with_chat_history():
    url = EAS_URL + '/service/query'

    # Round 1
    data = {"question": "What is PAI?"}
    response = requests.post(url, headers=headers, json=data)
    if response.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response.status_code}')
    ans = dict(response.json())
    print(f"=======Round 1: Question =======\n {data['question']}")
    print(f"=======Round 1: Answer =======\n {ans['answer']} session_id: {ans['session_id']}")
    print(f"=======Round 1: Retrieved Docs =======\n {ans['docs']}\n")

    # Round 2: use the session_id from Round 1
    data = {
        "question": "What are the features of PAI?",
        "session_id": ans['session_id']
    }
    response = requests.post(url, headers=headers, json=data)
    if response.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response.status_code}')
    ans = dict(response.json())
    print(f"=======Round 2: Question =======\n {data['question']}")
    print(f"=======Round 2: Answer =======\n {ans['answer']} session_id: {ans['session_id']}")
    print(f"=======Round 2: Retrieved Docs =======\n {ans['docs']}")


# LLM mode with chat history
test_post_api_query_llm_with_chat_history()
# RAG mode with chat history
test_post_api_query_rag_with_chat_history()

View knowledge base content

After the chatbot is running, connect to the RDS PostgreSQL database to inspect the imported knowledge base content directly. For connection instructions, see Connect to an ApsaraDB RDS for PostgreSQL instance.

image

FAQ

Disable chat history

To disable multi-turn conversation history on the web UI, clear the Chat history checkbox.

image

What's next