
ApsaraDB RDS: Build an LLM-driven dedicated chatbot on top of ApsaraDB RDS for PostgreSQL

Last Updated: Mar 28, 2026

This tutorial shows how to build a retrieval-augmented generation (RAG) chatbot that answers questions from your own knowledge base. The chatbot uses ApsaraDB RDS for PostgreSQL with the pgvector extension to store and search vector embeddings, and OpenAI to generate answers.

Tech stack:

  • ApsaraDB RDS for PostgreSQL + pgvector: vector storage and similarity search

  • OpenAI text-embedding-ada-002: converts text to vector embeddings

  • OpenAI gpt-3.5-turbo / text-davinci-003: generates answers

  • Python + psycopg2 + LangChain-compatible splitting: knowledge base ingestion

What you'll build:

  1. A data pipeline that splits your knowledge base into text chunks, converts them to vector embeddings, and stores them in RDS PostgreSQL.

  2. A Q&A function that embeds an incoming question, retrieves the most relevant chunks by cosine similarity, and passes them to an LLM for a grounded answer.

How it works

Phase 1: Data preparation

  1. Split documents from your knowledge base into text chunks.

  2. Call the OpenAI Embeddings API to convert each chunk into a 1536-dimension vector embedding.

  3. Store the text chunks, their embeddings, and metadata in your RDS instance.

Phase 2: Q&A

  1. Convert an incoming question to a vector embedding using the same OpenAI model.

  2. Query the RDS instance to find text chunks with a cosine similarity score above the threshold.

  3. Pass the matching chunks to an LLM to generate a grounded answer.

The following figure shows the end-to-end process.


Prerequisites

Before you begin, make sure you have:

  • An ApsaraDB RDS for PostgreSQL instance that supports the pgvector extension, a database account, and an endpoint that is reachable from your development machine.

  • An OpenAI Secret API key.

  • A Python 3 development environment (this tutorial uses PyCharm).

Key concepts

Embedding

An embedding represents text as a fixed-length numeric vector, capturing semantic meaning so that similar texts produce similar vectors. In this tutorial, each text chunk and each user question is converted to a 1536-dimension vector using OpenAI's text-embedding-ada-002 model.
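To make the idea concrete, here is a toy sketch. The 3-dimension vectors below are made-up stand-ins for real 1536-dimension embeddings, and cosine similarity is the same measure the retrieval query in phase 2 uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0.0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for embeddings of three texts (values are illustrative only)
create_instance = [0.9, 0.1, 0.0]   # "Create an RDS instance"
launch_database = [0.8, 0.2, 0.1]   # "Launch a PostgreSQL database"
pasta_recipe    = [0.0, 0.1, 0.9]   # "Cook pasta carbonara"

# Semantically close texts score much higher than unrelated ones
print(cosine_similarity(create_instance, launch_database))  # close to 1
print(cosine_similarity(create_instance, pasta_recipe))     # close to 0
```

This is why retrieval works: the question and the relevant chunks end up near each other in vector space, even when they share few exact words.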

Why ApsaraDB RDS for PostgreSQL for RAG

ApsaraDB RDS for PostgreSQL with the pgvector extension lets you run vector similarity searches inside the same database that handles your application's transactional data — chat history, user records, session state — without maintaining a separate vector store.

Phase 1: Prepare the knowledge base

Step 1: Set up the database

  1. Connect to your RDS instance.

  2. Create a test database.

    CREATE DATABASE rds_pgvector_test;
  3. Switch to the database and enable the pgvector extension.

    CREATE EXTENSION IF NOT EXISTS vector;
  4. Create a table to store knowledge base chunks and their embeddings.

    CREATE TABLE rds_pg_help_docs (
      id bigserial PRIMARY KEY,
      title text,
      description text,
      doc_chunk text,
      token_size int,
      embedding vector(1536)
    );

    The table stores the following fields for each text chunk:

    Column       Data type     Description
    id           bigserial     Auto-incrementing primary key
    title        text          Source document title
    description  text          Source document description
    doc_chunk    text          Text chunk content
    token_size   int           Number of tokens in the chunk
    embedding    vector(1536)  Vector embedding generated by the embedding model

    The dimension 1536 matches the output of OpenAI's text-embedding-ada-002 model. If you use a different embedding model, update this value to match its output dimensions.
  5. Create an ivfflat index on the embedding column to speed up similarity searches.

    For guidance on choosing index types and parameters for vector indexes, see Use the pgvector extension to perform high-dimensional vector similarity searches.
    CREATE INDEX ON rds_pg_help_docs USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
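Because the column dimension is fixed at table creation time, a mismatched embedding model fails only at insert time. A small guard like the following hypothetical helper, called before each INSERT, fails fast with a clear message; EXPECTED_DIM must equal the dimension declared in vector(1536).

```python
EXPECTED_DIM = 1536  # must match vector(1536) in the table definition

def validate_embedding(embedding, expected_dim=EXPECTED_DIM):
    """Raise early if an embedding's length does not match the column dimension."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions; table expects {expected_dim}"
        )
    return embedding
```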

Step 2: Install Python dependencies

In PyCharm, open the terminal and run:

pip install openai psycopg2 tiktoken requests beautifulsoup4 numpy

Step 3: Split and store knowledge base content

Create a file named knowledge_chunk_storage.py with the following code. This script fetches a web page, splits the HTML content into fixed-size text chunks, generates an embedding for each chunk, and stores the results in your RDS instance.

The example below uses a fixed character count to split content. For production use, consider the splitting utilities in LangChain or the ChatGPT Retrieval Plugin, which support Markdown, PDF, Word, and other formats. Chunking quality directly affects retrieval accuracy.
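As a middle ground between a fixed character count and a full splitting framework, chunks can overlap so that a sentence cut at one boundary still appears whole in the neighboring chunk. The sketch below is illustrative only (the function name and sizes are not part of the script):

```python
def get_overlapping_chunks(content, max_chunk_size=500, overlap=50):
    """Split text into fixed-size chunks where each pair of neighbors shares
    `overlap` characters, so text cut at a boundary survives intact in one chunk."""
    if overlap >= max_chunk_size:
        raise ValueError("overlap must be smaller than max_chunk_size")
    chunks, start = [], 0
    while start < len(content):
        chunks.append(content[start:start + max_chunk_size])
        if start + max_chunk_size >= len(content):
            break
        # Step forward by less than a full chunk to create the overlap
        start += max_chunk_size - overlap
    return chunks

text = "abcdefghij" * 20  # 200 characters
chunks = get_overlapping_chunks(text, max_chunk_size=80, overlap=10)
```

The last 10 characters of each chunk repeat as the first 10 of the next, at the cost of storing (and embedding) slightly more text.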

Replace the following placeholders before running:

Placeholder                       Description                                      Example
<Database name>                   Name of the database you created                 rds_pgvector_test
<Endpoint of the RDS instance>    Public or private endpoint of the RDS instance   pgm-xxx.pg.rds.aliyuncs.com
<Username>                        Database account username                        myuser
<Password>                        Database account password
<Database port number>            Listening port                                   5432
<Secret API Key>                  Your OpenAI Secret API key

import openai
import psycopg2
import tiktoken
import requests
from bs4 import BeautifulSoup

EMBEDDING_MODEL = "text-embedding-ada-002"
tokenizer = tiktoken.get_encoding("cl100k_base")

# Connect to the RDS instance
conn = psycopg2.connect(
    database="<Database name>",
    host="<Endpoint of the RDS instance>",
    user="<Username>",
    password="<Password>",
    port="<Database port number>"
)
conn.autocommit = True

# OpenAI API key
openai.api_key = '<Secret API Key>'

# Split HTML content into fixed-size character chunks
def get_text_chunks(text, max_chunk_size):
    chunks_ = []
    soup_ = BeautifulSoup(text, 'html.parser')
    content = ''.join(soup_.strings).strip()
    length = len(content)
    start = 0
    while start < length:
        end = min(start + max_chunk_size, length)
        chunks_.append(content[start:end])
        start = end
    return chunks_

# Fetch the source document
url = 'https://www.alibabacloud.com/help/document_detail/148038.html'
response = requests.get(url)

if response.status_code == 200:
    web_html_data = response.text
    soup = BeautifulSoup(web_html_data, 'html.parser')

    # Extract title and description
    title = soup.find('h1').text.strip()
    description = soup.find('p', class_='shortdesc').text.strip()

    # Split content into 500-character chunks and store with embeddings
    chunks = get_text_chunks(web_html_data, 500)
    for chunk in chunks:
        doc_item = {
            'title': title,
            'description': description,
            'doc_chunk': chunk,
            'token_size': len(tokenizer.encode(chunk))
        }

        # Generate embedding for the chunk
        query_embedding_response = openai.Embedding.create(
            model=EMBEDDING_MODEL,
            input=chunk,
        )
        doc_item['embedding'] = query_embedding_response['data'][0]['embedding']

        # Insert into RDS
        cur = conn.cursor()
        insert_query = '''
        INSERT INTO rds_pg_help_docs
            (title, description, doc_chunk, token_size, embedding)
        VALUES (%s, %s, %s, %s, %s);
        '''
        cur.execute(insert_query, (
            doc_item['title'],
            doc_item['description'],
            doc_item['doc_chunk'],
            doc_item['token_size'],
            # Pass the embedding in pgvector's '[x1, x2, ...]' text form so that
            # psycopg2 does not adapt the Python list to a PostgreSQL array
            str(doc_item['embedding'])
        ))
        conn.commit()
else:
    print('Failed to fetch web page')

Step 4: Run and verify

  1. Run the knowledge_chunk_storage.py script.

  2. Log in to the database and verify that data was stored correctly.

    SELECT * FROM rds_pg_help_docs;

    The output lists rows with populated doc_chunk, token_size, and embedding columns.


Phase 2: Build the Q&A function

Create a file named chatbot.py. The key parameters are listed at the top of the script.

Parameter               Value                   Description
GPT_MODEL               gpt-3.5-turbo           Chat model
EMBEDDING_MODEL         text-embedding-ada-002  Embedding model (must match phase 1)
GPT_COMPLETIONS_MODEL   text-davinci-003        Completions model for answer generation
MAX_TOKENS              1024                    Maximum tokens in a single LLM response
similarity_threshold    0.78                    Minimum cosine similarity score for a chunk to be retrieved
max_matched_doc_counts  8                       Maximum number of matching chunks to retrieve

The Q&A flow works as follows: embed the question using the same model as phase 1, retrieve chunks with cosine similarity above similarity_threshold, batch them into groups of up to 1,000 tokens, and call the LLM once per batch.
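The batching step can be isolated as a pure function for clarity. This is a sketch, not part of the script: docs are dicts with the doc_chunk and token_size keys stored in phase 1, and the 1,000-token budget matches the loop in the script.

```python
def batch_chunks(matched_docs, token_budget=1000):
    """Group retrieved chunks into batches whose token sizes sum to at most
    token_budget, preserving retrieval order. Each batch feeds one LLM call."""
    batches, current, total = [], [], 0
    for doc in matched_docs:
        # Flush the current batch before adding a chunk that would exceed the budget
        if current and total + doc['token_size'] > token_budget:
            batches.append(current)
            current, total = [], 0
        current.append(doc)
        total += doc['token_size']
    if current:
        batches.append(current)
    return batches

docs = [{'doc_chunk': f'chunk {i}', 'token_size': t}
        for i, t in enumerate([400, 450, 300, 900])]
batches = batch_chunks(docs)  # token sizes per batch: [400, 450], [300], [900]
```

Note that a single chunk larger than the budget still forms its own batch; whether to truncate such chunks instead is a design choice left to you.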

import openai
import psycopg2
from psycopg2.extras import DictCursor

GPT_MODEL = "gpt-3.5-turbo"
EMBEDDING_MODEL = "text-embedding-ada-002"
GPT_COMPLETIONS_MODEL = "text-davinci-003"
MAX_TOKENS = 1024

# OpenAI API key
openai.api_key = '<Secret API Key>'

# The question to answer
prompt = 'Create an RDS instance.'

# Embed the question
prompt_response = openai.Embedding.create(
    model=EMBEDDING_MODEL,
    input=prompt,
)
prompt_embedding = prompt_response['data'][0]['embedding']

# Connect to the RDS instance
conn = psycopg2.connect(
    database="<Database name>",
    host="<Endpoint of the RDS instance>",
    user="<Username>",
    password="<Password>",
    port="<Database port number>"
)
conn.autocommit = True


def answer(prompt_doc, prompt):
    """Send matching chunks and the question to the LLM and print the answer."""
    improved_prompt = f"""
    Answer the following questions based on the following file and steps:
    a. Check whether the content in the file is related to the question.
    b. Reference the content in the file to answer the question and provide the detailed answer as a Markdown file.
    c. Use "I don't know because the question is beyond my scope" to answer the questions that are irrelevant to ApsaraDB RDS for PostgreSQL.

    File:
    \"\"\"
    {prompt_doc}
    \"\"\"

    Question: {prompt}
    """

    response = openai.Completion.create(
        model=GPT_COMPLETIONS_MODEL,
        prompt=improved_prompt,
        temperature=0.2,
        max_tokens=MAX_TOKENS
    )
    print(f"{response['choices'][0]['text']}\n")


# Retrieve text chunks with cosine similarity above the threshold
similarity_threshold = 0.78
max_matched_doc_counts = 8

# Order by distance so the most similar chunks come first and the
# ivfflat index can accelerate the search
similarity_search_sql = f'''
SELECT doc_chunk, token_size, 1 - (embedding <=> '{prompt_embedding}') AS similarity
FROM rds_pg_help_docs
WHERE 1 - (embedding <=> '{prompt_embedding}') > {similarity_threshold}
ORDER BY embedding <=> '{prompt_embedding}'
LIMIT {max_matched_doc_counts};
'''

cur = conn.cursor(cursor_factory=DictCursor)
cur.execute(similarity_search_sql)
matched_docs = cur.fetchall()

# Batch chunks into groups of up to 1,000 tokens and call the LLM per batch
total_tokens = 0
prompt_doc = ''
print('Answer: \n')
for matched_doc in matched_docs:
    # Flush the current batch before adding a chunk that would exceed the budget
    if prompt_doc and total_tokens + matched_doc['token_size'] > 1000:
        answer(prompt_doc, prompt)
        total_tokens = 0
        prompt_doc = ''
    prompt_doc += f"\n---\n{matched_doc['doc_chunk']}"
    total_tokens += matched_doc['token_size']

# Send the final batch; skip the LLM call entirely when nothing matched
if prompt_doc:
    answer(prompt_doc, prompt)

Run the script. The chatbot prints an answer grounded in your knowledge base.

To improve answer accuracy, refine the text chunking strategy and adjust the prompt template in the answer() function.

Results

Without a connected knowledge base, OpenAI has no context about Alibaba Cloud products. The answer to "How do I create an ApsaraDB RDS for PostgreSQL instance?" is generic and unrelated to ApsaraDB RDS.


After connecting the knowledge base stored in your RDS instance, the same question returns an answer specific to ApsaraDB RDS for PostgreSQL, drawn directly from your ingested documentation.

What's next