Full Compatibility with MySQL! How to Build a RAG System Based on PolarDB

The article explains how to build a Retrieval-Augmented Generation (RAG) system on Alibaba Cloud PolarDB, leveraging its MySQL-compatible vector search and built-in AI capabilities.

By Nanlong Yu

1. Background

Large language models (LLMs) have limits in accuracy and real-time responsiveness of generated outputs and are not ideally suitable for direct deployment in customer service or Q&A systems that require precise, up-to-date information. To address the issue, the industry has widely adopted the RAG (Retrieval-Augmented Generation) technology to improve the performance of LLMs. RAG significantly improves the performance of natural language processing (NLP) tasks such as Q&A and summarization that require access to external knowledge. RAG integrates LLM models, such as Tongyi Qianwen, with an information retriever to enhance the accuracy and informativeness of the LLM models. When a user query is processed, RAG first uses the information retriever to search a designated knowledge base for contextually relevant documents, passages, or data snippets. Then, the retrieved contents are combined with the original query and fed into the LLM. The LLM then leverages its inherent generative capabilities to produce factually accurate responses based on the latest retrieved information, which eliminates the need to rebuild the model.

PolarDB, a cloud-native database owned by Alibaba Cloud ApsaraDB, supports MySQL 9.0 native vector search syntax (The Vector Type [1], Vector Functions [2]). This feature accelerates KNN search by using vectorized execution and parallel scan capabilities of IMCIs. Developers can use PolarDB IMCI as an information retrieval component and use the induction and generation capabilities provided by PolarDB for AI (the integrated Qwen model) to quickly build an RAG system. This effectively overcomes the limitations of LLM in terms of accuracy and real-time performance, and provides accurate and informative responses for various Q&A scenarios.

[1] https://dev.mysql.com/doc/refman/9.0/en/vector.html
[2] https://dev.mysql.com/doc/refman/9.0/en/vector-functions.html

2. RAG System Architecture Diagram

The following figure shows the overall process of building and using an RAG system based on PolarDB IMCI and PolarDB for AI.

3. RAG System Construction Process

Build a vector index offline

Create a document knowledge base

Upload the files to the database, parse the file content, and perform text vectorization.

Upload the files

The following SQL statement is used to upload files to the database.

/*polar4ai*/UPLOAD FILE docfile WITH
(    file_id = 'unique_file_id', 
file_type = '.docx',
src_file_location = 'your_file_url',
dest_file_name = 'your_file_name.docx',
metadata = '{}',
overwritten = 1
);

The following table describes the parameters in the WITH() clause:

Parameters	Note	Example
file_id	The unique identifier of the file.	'Polar4AI'
file_type	The file type.	'.docx'
src_file_location	The URL of the file.	`https://xxx.aliyuncs.com/Polar4AI-Introduction.docx`
dest_ffile_name	The name of the uploaded file. The name does not need to be unique and cannot be a storage path.	'Polar4AI-Introduction'
metadata	The metadata. You can leave the parameter empty.	'{}'

Create a metadata table for the uploaded files

To create and update a metadata table for the files uploaded to a database, execute the following SQL statement:

CREATE TABLE file_id_list_rag (
file_id varchar(256) PRIMARY KEY
);
INSERT INTO file_id_list_rag(file_id) VALUES ('unique_file_id');

The following table describes the parameters in the preceding SQL statement:

Parameters	Note	Example
file_id	The unique identifier of the file specified in the UPLOAD FILE statement.	'Polar4AI'

Text segmentation and vectorization

Create a temporary table

The temporary table that stores the chunks and vectors obtained from the text-to-vector conversion model has fixed parameter names and types. The SQL statement is as follows:

/*polar4ai*/CREATE TABLE file_index_table_rag(  chunk_id varchar(265),  chunk_content text_ik_smart,   file_id varchar(256),  file_name varchar(256),  vecs vector_768,   PRIMARY KEY(chunk_id));

The following table describes the parameters in the temporary table.

Parameters	Note	Example
file_id	The unique identifier for the source file, which is used to trace the origin of the chunked text.	'Polar4AI'
file_name	The name or title of the file. You can also specify a summary or key phrase related to the content based on your business requirements.	'Polar4AI-Introduction'
chunk_content	The text content of the chunk, which is processed by using a tokenizer. Example tokenizers: text_ik_smart (coarse-grained tokenization) and text_ik_max_word (fine-grained tokenization).	The content of the file.
vecs	The vector of the chunk. The _polar4ai_text2vec model converts the text chunk into a vector.	[0.1, 0.2, ...]
chunk_id	The ID for each text chunk. The value sequentially increments within the same file. Specify the parameter in the {file_id}-{chunk_order} format.	'Polar4AI-1'

Text-to-vector conversion

Select the file_id to be segmented and vectorized from the document metadata table, parse the document content, and use the predict method of the _polar4ai_text2vec model to convert the document chunks into vectors.

/*polar4ai*/SELECT chunk_id, file_id, file_name, chunk_content FROM predict(model _polar4ai_text2vec, SELECT file_id FROM file_id_list_rag) WITH (    x_cols='chunk_content',
primary_key='file_id',
resource='file', 
mode='async', 
to_chunk=1,
headers_to_split_on=2,
chunk_size=1024,
chunk_overlap=64,
separator='')INTO file_index_table_rag;

The _polar4ai_text2vec model is used to convert text into 768-dimensional vectors. The following table describes the parameters included in the WITH() clause.

Parameters	Note	Value range or example
x_cols	The field that stores the text content used as input for the model.	chunk_content
primary_key	The primary key of the file list.	id
mode	The writing mode of document data. Currently, only the async mode is supported.	async
resource	The type of the model. Specify the file model in the file parsing process.	file
to_chunk	Specify whether to chunk the file.	0 and 1
headers_to_split_on	If you set the to_chunk parameter to 1, the file is chunked based on the header level. • To chunk the file based on all header levels, set the headers_to_split_on parameter to -1. •To chunk the file based on the header level x, set the headers_to_split_on parameter to x.	-1 ≤ x ≤ 6, and the value is an integer.
chunk_size	The maximum length of a chunk.	chunk size ≥50
chunk_overlap	The number of overlapping characters between two adjacent chunks.	chunk_overlaps≤chunk_size
separator	The delimiter used to separate the chunks.	'' /', ',' etc.

Build a PolarDB IMCI vector index

Create a table that has a vector index

Use the MySQL native vector type and syntax to create a table that has a vector index and build a column store index.

create table vector_index (  chunk_id varchar(265),  chunk_content text,   file_id varchar(256),  file_name varchar(256),  vecs vector(768),   PRIMARY KEY(chunk_id)) COMMENT 'columnar=1';

Create a vector index

First, you need to query the intermediate results of chunks and vectors from the temporary table in the previous section.

/*polar4ai*/select * from file_index_table_rag;

Write the intermediate results into the table that has a vector index via the string_to_vector expression.

insert into vector_index (file_name,file_id,vecs,chunk_content,chunk_id)values ("file_name","file_id",string_to_vector("[...]"),'chunk_content',"chunk_id");

Online RAG query

Problem vectorization

The SQL statement for the online vectorization of the question is as follows:

/*polar4ai*/SELECT * FROM predict(model _polar4ai_text2vec, SELECT 'question') with();

PolarDB IMCI vector-based recall

Use distance expressions and accelerate KNN vector search with IMCI to recall the K vectors and text chunks with the highest similarity:

select /*+ SET_VAR(use_imci_engine=forced) */ file_name,chunk_content, distance(string_to_vector("[...]"), vecs, "COSINE") as d from vector_index order by d limit 5;

LLM generates answers based on Prompt

Compose the question and the recalled text chunks into a Prompt, and call the PolarDB for AI's built-in Qwen model for online inference to generate an answer:

"/*polar4ai*/SELECT * FROM PREDICT (MODEL _polar4ai_tongyi, SELECT 'prompt') with ()"

4. RAG System Example

You can use PolarDB IMCI and PolarDB for AI to quickly build an RAG system based on the preceding procedure. The appendix of this article provides a sample code. In this example, a document knowledge base is created for some official documents of IMCI. A vector index is built on PolarDB IMCI, and then an RAG system is created by using the PolarDB for AI-integrated Qwen model. A simple Web service is provided as a Q&A interaction. With this RAG system, LLM can correctly answer domain-specific questions related to IMCI.

Run the RAG system

After the Python environment is correctly installed on the ECS according to requirements.txt, run the following commands in sequence as shown in the figure:

1_upload_file.py Create a document knowledge base
2_vectorization.py Perform document segmentation and vectorization and build vector indexes on PolarDB IMCI
3_rag.py Start RAG Web services

Examples

The Hash Match operator is an operator proprietary to IMCI. For the question "What is the Hash Match operator in IMCI", the RAG system can recall similar document chunks from PolarDB IMCI and provide them to LLM as the Prompt. This enables LLM to correctly answer questions about the Hash Match operator.

Appendixes

RAG system sample code

Copy the link below to your browser to get it: https://github.com/nanlongyu/rag_service_demo

Runtime environment

Running the sample requires the developer to have the following resources:

A PolarDB cluster that contains column store nodes and AI nodes: 8.0.2.2.27
An ECS instance: The system image is Alibaba Cloud Linux 3.2104 LTS 64-bit and the Python version is 3.9.20.
OSS

Column store nodes are used to build vector indexes, while AI nodes provide text-to-vector conversion and LLM capabilities. The ECS instance runs the RAG system and deploys Web services, and OSS is used to upload knowledge base documents. While the PolarDB cluster and OSS are mandatory, developers may alternatively run the RAG system and deploy Web services in other appropriate environments.

Developers need to update the relevant access configurations of the above resources to 1_upload_file.py, 2_vectorization.py, and 3_rag.py, and then run the RAG system according to the RAG System Example section.

Community