This tutorial shows how to build a retrieval-augmented generation (RAG) chatbot that answers questions from your own knowledge base. The chatbot uses ApsaraDB RDS for PostgreSQL with the pgvector extension to store and search vector embeddings, and OpenAI to generate answers.
Tech stack:

- ApsaraDB RDS for PostgreSQL + pgvector: vector storage and similarity search
- OpenAI text-embedding-ada-002: converts text to vector embeddings
- OpenAI gpt-3.5-turbo / text-davinci-003: generates answers
- Python + psycopg2 + LangChain-compatible splitting: knowledge base ingestion
What you'll build:

- A data pipeline that splits your knowledge base into text chunks, converts them to vector embeddings, and stores them in RDS PostgreSQL.
- A Q&A function that embeds an incoming question, retrieves the most relevant chunks by cosine similarity, and passes them to an LLM for a grounded answer.
How it works
Phase 1: Data preparation
1. Split documents from your knowledge base into text chunks.
2. Call the OpenAI Embeddings API to convert each chunk into a 1536-dimension vector embedding.
3. Store the text chunks, their embeddings, and metadata in your RDS instance.
Phase 2: Q&A
1. Convert an incoming question to a vector embedding using the same OpenAI model.
2. Query the RDS instance to find text chunks with a cosine similarity score above the threshold.
3. Pass the matching chunks to an LLM to generate a grounded answer.
The following figure shows the end-to-end process.

Prerequisites
Before you begin, make sure you have:
- An ApsaraDB RDS for PostgreSQL instance running PostgreSQL 14 or later, with minor engine version 20230430 or later. To upgrade the major engine version, see Upgrade the major engine version; to update the minor engine version, see Update the minor engine version.
- Familiarity with the pgvector extension. For background, see Use the pgvector extension to perform high-dimensional vector similarity searches.
- An OpenAI Secret API key with network access to the OpenAI API; the sample code in this tutorial runs on an Elastic Compute Service (ECS) instance in the Singapore region. A quick connectivity check follows this list.
- Python 3.11.4 and PyCharm 2023.1.2 (or a compatible Python environment).
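Before ingesting any data, it can help to confirm that your environment can actually reach the OpenAI API. The following is a minimal sketch using the pre-1.0 OpenAI Python SDK that this tutorial relies on; the model-listing call is used here only as a connectivity probe.

```python
import openai

openai.api_key = '<Secret API Key>'

# Listing models is a cheap way to confirm that the key is valid and
# the OpenAI API is reachable from this machine.
try:
    models = openai.Model.list()
    print(f"OK: {len(models['data'])} models available")
except Exception as e:
    print(f"Cannot reach the OpenAI API: {e}")
```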
Key concepts
Embedding
An embedding represents text as a fixed-length numeric vector, capturing semantic meaning so that similar texts produce similar vectors. In this tutorial, each text chunk and each user question is converted to a 1536-dimension vector using OpenAI's text-embedding-ada-002 model.
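To make "similar texts produce similar vectors" concrete, the sketch below compares toy 4-dimension vectors with cosine similarity, the same metric the chatbot uses later over 1536-dimension embeddings. The vectors and their values are invented for illustration; numpy is already part of this tutorial's dependency list.

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means identical direction; values near 0 mean unrelated.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy "embeddings"; real ones from text-embedding-ada-002 have
# 1536 dimensions.
v_cat = np.array([0.80, 0.10, 0.05, 0.05])
v_kitten = np.array([0.75, 0.15, 0.05, 0.05])
v_invoice = np.array([0.05, 0.10, 0.80, 0.05])

print(cosine_similarity(v_cat, v_kitten))   # high: related meanings
print(cosine_similarity(v_cat, v_invoice))  # low: unrelated meanings
```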
Why ApsaraDB RDS for PostgreSQL for RAG
ApsaraDB RDS for PostgreSQL with the pgvector extension lets you run vector similarity searches inside the same database that handles your application's transactional data — chat history, user records, session state — without maintaining a separate vector store.
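To make that concrete, the sketch below runs a transactional read and a vector search over one connection. The chat_history table, its columns, and the user ID are invented for illustration; rds_pg_help_docs is the table created later in this tutorial, and a constant vector stands in for a real question embedding.

```python
import psycopg2

conn = psycopg2.connect(
    database="<Database name>",
    host="<Endpoint of the RDS instance>",
    user="<Username>",
    password="<Password>",
    port="<Database port number>"
)
cur = conn.cursor()

# Transactional read against a hypothetical chat_history table.
cur.execute(
    "SELECT question FROM chat_history "
    "WHERE user_id = %s ORDER BY asked_at DESC LIMIT 5;",
    ('user-123',)
)

# Vector search in the same session, against the table built in Phase 1.
cur.execute(
    "SELECT doc_chunk FROM rds_pg_help_docs "
    "ORDER BY embedding <=> %s::vector LIMIT 3;",
    (str([0.1] * 1536),)  # placeholder for a real question embedding
)
```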
Phase 1: Prepare the knowledge base
Step 1: Set up the database
Create a test database.

```sql
CREATE DATABASE rds_pgvector_test;
```

Switch to the database and enable the pgvector extension.

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

Create a table to store knowledge base chunks and their embeddings.

```sql
CREATE TABLE rds_pg_help_docs (
    id bigserial PRIMARY KEY,
    title text,
    description text,
    doc_chunk text,
    token_size int,
    embedding vector(1536)
);
```

The table stores the following fields for each text chunk:

| Column | Data type | Description |
|---|---|---|
| id | bigserial | Auto-incrementing primary key |
| title | text | Source document title |
| description | text | Source document description |
| doc_chunk | text | Text chunk content |
| token_size | int | Number of tokens in the chunk |
| embedding | vector(1536) | Vector embedding generated by the embedding model |

The dimension 1536 matches the output of OpenAI's text-embedding-ada-002 model. If you use a different embedding model, update this value to match its output dimensions.

Create an ivfflat index on the embedding column to speed up similarity searches. For guidance on choosing index types and parameters for vector indexes, see Use the pgvector extension to perform high-dimensional vector similarity searches.

```sql
CREATE INDEX ON rds_pg_help_docs USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```
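An ivfflat index trades recall for speed: at query time it scans only some of the lists built during indexing, so a search can miss relevant chunks. pgvector exposes a session-level setting to probe more lists; the value 10 below is just a starting point to tune against your data.

```sql
-- Probe more lists per query for better recall at some cost in speed.
-- Applies to the current session only.
SET ivfflat.probes = 10;
```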
Step 2: Install Python dependencies
In PyCharm, open the terminal and run:
```shell
pip install "openai<1.0" psycopg2 tiktoken requests beautifulsoup4 numpy
```

The scripts in this tutorial use the pre-1.0 interface of the OpenAI Python SDK (openai.Embedding.create and openai.Completion.create), so the openai package is pinned below version 1.0.

Step 3: Split and store knowledge base content
Create a file named knowledge_chunk_storage.py with the following code. This script fetches a web page, splits the HTML content into fixed-size text chunks, generates an embedding for each chunk, and stores the results in your RDS instance.
The example below uses a fixed character count to split content. For production use, consider the splitting utilities in LangChain or the ChatGPT Retrieval Plugin, which support Markdown, PDF, Word, and other formats. Chunking quality directly affects retrieval accuracy.
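For reference, a structure-aware splitter can replace the fixed-size get_text_chunks() function below with minimal changes. This sketch assumes the langchain package (pip install langchain) and its RecursiveCharacterTextSplitter, which prefers paragraph and sentence boundaries before falling back to character counts.

```python
from bs4 import BeautifulSoup
from langchain.text_splitter import RecursiveCharacterTextSplitter

def get_text_chunks_langchain(html, max_chunk_size):
    # Extract plain text from the HTML, as the fixed-size splitter does.
    content = ''.join(BeautifulSoup(html, 'html.parser').strings).strip()
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=max_chunk_size,  # target chunk length in characters
        chunk_overlap=50,           # overlap preserves context across chunks
    )
    return splitter.split_text(content)
```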
Replace the following placeholders before running:
| Placeholder | Description | Example |
|---|---|---|
| <Database name> | Name of the database you created | rds_pgvector_test |
| <Endpoint of the RDS instance> | Public or private endpoint of the RDS instance | pgm-xxx.pg.rds.aliyuncs.com |
| <Username> | Database account username | myuser |
| <Password> | Database account password | — |
| <Database port number> | Listening port | 5432 |
| <Secret API Key> | Your OpenAI Secret API key | — |
```python
import openai
import psycopg2
import tiktoken
import requests
from bs4 import BeautifulSoup

EMBEDDING_MODEL = "text-embedding-ada-002"
tokenizer = tiktoken.get_encoding("cl100k_base")

# Connect to the RDS instance
conn = psycopg2.connect(
    database="<Database name>",
    host="<Endpoint of the RDS instance>",
    user="<Username>",
    password="<Password>",
    port="<Database port number>"
)
conn.autocommit = True

# OpenAI API key
openai.api_key = '<Secret API Key>'

# Split HTML content into fixed-size character chunks
def get_text_chunks(text, max_chunk_size):
    chunks_ = []
    soup_ = BeautifulSoup(text, 'html.parser')
    content = ''.join(soup_.strings).strip()
    length = len(content)
    start = 0
    while start < length:
        end = min(start + max_chunk_size, length)
        chunks_.append(content[start:end])
        start = end
    return chunks_

# Fetch the source document
url = 'https://www.alibabacloud.com/help/document_detail/148038.html'
response = requests.get(url)
if response.status_code == 200:
    web_html_data = response.text
    soup = BeautifulSoup(web_html_data, 'html.parser')

    # Extract title and description
    title = soup.find('h1').text.strip()
    description = soup.find('p', class_='shortdesc').text.strip()

    # Split content into 500-character chunks and store with embeddings
    chunks = get_text_chunks(web_html_data, 500)
    for chunk in chunks:
        doc_item = {
            'title': title,
            'description': description,
            'doc_chunk': chunk,
            'token_size': len(tokenizer.encode(chunk))
        }

        # Generate embedding for the chunk
        query_embedding_response = openai.Embedding.create(
            model=EMBEDDING_MODEL,
            input=chunk,
        )
        doc_item['embedding'] = query_embedding_response['data'][0]['embedding']

        # Insert into RDS
        cur = conn.cursor()
        insert_query = '''
            INSERT INTO rds_pg_help_docs
                (title, description, doc_chunk, token_size, embedding)
            VALUES (%s, %s, %s, %s, %s);
        '''
        cur.execute(insert_query, (
            doc_item['title'],
            doc_item['description'],
            doc_item['doc_chunk'],
            doc_item['token_size'],
            doc_item['embedding']
        ))
        conn.commit()
else:
    print('Failed to fetch web page')
```

Step 4: Run and verify
Run the knowledge_chunk_storage.py script. Then log in to the database and verify that the data was stored correctly.

```sql
SELECT * FROM rds_pg_help_docs;
```

The output lists rows with populated doc_chunk, token_size, and embedding columns.
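Beyond eyeballing the rows, you can check that every chunk received an embedding of the expected dimension; vector_dims() is a standard pgvector function.

```sql
-- Count chunks and confirm each embedding has 1536 dimensions.
SELECT count(*) AS total_chunks,
       count(*) FILTER (WHERE vector_dims(embedding) = 1536) AS valid_embeddings
FROM rds_pg_help_docs;
```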
Phase 2: Build the Q&A function
Create a file named chatbot.py. The key parameters are listed at the top of the script.
| Parameter | Value | Description |
|---|---|---|
| GPT_MODEL | gpt-3.5-turbo | Chat model |
| EMBEDDING_MODEL | text-embedding-ada-002 | Embedding model (must match phase 1) |
| GPT_COMPLETIONS_MODEL | text-davinci-003 | Completions model for answer generation |
| MAX_TOKENS | 1024 | Maximum tokens in a single LLM response |
| similarity_threshold | 0.78 | Minimum cosine similarity score for a chunk to be retrieved |
| max_matched_doc_counts | 8 | Maximum number of matching chunks to retrieve |
The Q&A flow works as follows: embed the question using the same model as phase 1, retrieve chunks with cosine similarity above similarity_threshold, batch them into groups of up to 1,000 tokens, and call the LLM once per batch.
```python
import openai
import psycopg2
from psycopg2.extras import DictCursor

GPT_MODEL = "gpt-3.5-turbo"
EMBEDDING_MODEL = "text-embedding-ada-002"
GPT_COMPLETIONS_MODEL = "text-davinci-003"
MAX_TOKENS = 1024

# OpenAI API key
openai.api_key = '<Secret API Key>'

# The question to answer
prompt = 'Create an RDS instance.'

# Embed the question with the same model used in phase 1
prompt_response = openai.Embedding.create(
    model=EMBEDDING_MODEL,
    input=prompt,
)
prompt_embedding = prompt_response['data'][0]['embedding']

# Connect to the RDS instance
conn = psycopg2.connect(
    database="<Database name>",
    host="<Endpoint of the RDS instance>",
    user="<Username>",
    password="<Password>",
    port="<Database port number>"
)
conn.autocommit = True

def answer(prompt_doc, prompt):
    """Send matching chunks and the question to the LLM and print the answer."""
    improved_prompt = f"""
Answer the following questions based on the following file and steps:
a. Check whether the content in the file is related to the question.
b. Reference the content in the file to answer the question and provide the detailed answer as a Markdown file.
c. Use "I don't know because the question is beyond my scope" to answer the questions that are irrelevant to ApsaraDB RDS for PostgreSQL.
File:
\"\"\"
{prompt_doc}
\"\"\"
Question: {prompt}
"""
    response = openai.Completion.create(
        model=GPT_COMPLETIONS_MODEL,
        prompt=improved_prompt,
        temperature=0.2,
        max_tokens=MAX_TOKENS
    )
    print(f"{response['choices'][0]['text']}\n")

# Retrieve text chunks with cosine similarity above the threshold.
# pgvector's <=> operator returns cosine distance, so 1 - distance
# is cosine similarity. ORDER BY id keeps chunks in document order.
similarity_threshold = 0.78
max_matched_doc_counts = 8
similarity_search_sql = f'''
    SELECT doc_chunk, token_size, 1 - (embedding <=> '{prompt_embedding}') AS similarity
    FROM rds_pg_help_docs
    WHERE 1 - (embedding <=> '{prompt_embedding}') > {similarity_threshold}
    ORDER BY id
    LIMIT {max_matched_doc_counts};
'''
cur = conn.cursor(cursor_factory=DictCursor)
cur.execute(similarity_search_sql)
matched_docs = cur.fetchall()

# Batch chunks into groups of up to 1,000 tokens and call the LLM per batch
total_tokens = 0
prompt_doc = ''
print('Answer: \n')
for matched_doc in matched_docs:
    if total_tokens + matched_doc['token_size'] <= 1000:
        prompt_doc += f"\n---\n{matched_doc['doc_chunk']}"
        total_tokens += matched_doc['token_size']
        continue
    # Budget exceeded: answer with the current batch, then start a new
    # batch with the chunk that did not fit.
    answer(prompt_doc, prompt)
    prompt_doc = f"\n---\n{matched_doc['doc_chunk']}"
    total_tokens = matched_doc['token_size']
answer(prompt_doc, prompt)
```

Run the script. The chatbot prints an answer grounded in your knowledge base.
To improve answer accuracy, refine the text chunking strategy and adjust the prompt template in the answer() function.
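One such adjustment: the script defines GPT_MODEL = "gpt-3.5-turbo" but generates answers through the older completions endpoint. The sketch below is a possible chat-based variant of answer(), using the same pre-1.0 SDK and the constants already defined in chatbot.py; the system prompt wording is only a suggestion.

```python
def answer_with_chat_model(prompt_doc, prompt):
    """Variant of answer() that uses the chat completions endpoint."""
    response = openai.ChatCompletion.create(
        model=GPT_MODEL,
        temperature=0.2,
        max_tokens=MAX_TOKENS,
        messages=[
            {"role": "system", "content": (
                "Answer questions using only the provided file. If the "
                "question is irrelevant to ApsaraDB RDS for PostgreSQL, "
                "reply: I don't know because the question is beyond my scope."
            )},
            {"role": "user", "content": f"File:\n{prompt_doc}\n\nQuestion: {prompt}"},
        ],
    )
    print(f"{response['choices'][0]['message']['content']}\n")
```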
Results
Without a connected knowledge base, OpenAI has no context about Alibaba Cloud products. The answer to "How do I create an ApsaraDB RDS for PostgreSQL instance?" is generic and unrelated to ApsaraDB RDS.

After connecting the knowledge base stored in your RDS instance, the same question returns an answer specific to ApsaraDB RDS for PostgreSQL, drawn directly from your ingested documentation.
What's next
Improve retrieval accuracy by using LangChain or the ChatGPT Retrieval Plugin for more sophisticated document chunking and ingestion.
Learn more about vector indexing options and similarity operators in Use the pgvector extension to perform high-dimensional vector similarity searches.