Build an enterprise conversational chatbot using Hologres for RAG

This guide walks you through building a private enterprise chatbot grounded in your own knowledge base. It uses Hologres as the vector store, PAI Elastic Algorithm Service (EAS) to host Llama 2, and LangChain to wire them together.

By the end, you will have:

Deployed a Llama 2 model on PAI EAS
Configured the connection between Hologres, EAS, and the embedding model
Vectorized your corpus and loaded it into Hologres
Verified that RAG-based retrieval corrects hallucinated answers

How it works

The chatbot uses a Retrieval-Augmented Generation (RAG) architecture:

Your corpus is vectorized using an embedding model and stored in Hologres.
When a user asks a question, Hologres performs a vector search to retrieve the most relevant passages from the corpus.
The retrieved passages are injected into a prompt template and sent to Llama 2 on PAI EAS.
Llama 2 generates an answer grounded in your corpus rather than relying solely on its training data.
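The retrieval step in this flow can be sketched in plain Python as a rough illustration. This is not the holo-llm implementation. `toy_embed` is a hypothetical bag-of-words stand-in for the CoROM embedding model, and the brute-force cosine-similarity scan in the hypothetical `retrieve_top_k` stands in for the vector search that Hologres runs over the stored embeddings. The corpus is the sample data used later in this guide.

```python
import math
from collections import Counter
from typing import Dict, List


def toy_embed(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for an embedding model: a bag-of-words term-count vector."""
    tokens = (word.strip(".,?!").lower() for word in text.split())
    return {term: float(count) for term, count in Counter(t for t in tokens if t).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(question: str, passages: List[str], k: int = 4) -> List[str]:
    """Exact (brute-force) nearest-neighbor search: score every passage, keep the k best."""
    query_vec = toy_embed(question)
    ranked = sorted(passages, key=lambda p: cosine(query_vec, toy_embed(p)), reverse=True)
    return ranked[:k]


# Sample corpus (title followed by content), taken from the example data in this guide.
corpus = [
    "What is Hologres Hologres is a one-stop real-time data warehouse independently "
    "developed by Alibaba. It supports real-time data write, update, processing, and "
    "analysis of massive data.",
    "What is Proxima Proxima is a high-performance software library for vector nearest "
    "neighbor search from Alibaba's DAMO Academy. Compared with similar open-source "
    "products such as Faiss, the Proxima is better in stability and performance.",
    "What is the principle of JSONB column storage in Hologres Hologres supports "
    "column-oriented storage for JSONB type starting from V1.3, reducing storage size "
    "and accelerating queries.",
]

best = retrieve_top_k("What is Hologres?", corpus, k=1)[0]
print(best[:16])  # prints: What is Hologres
```

Because passages are ranked by similarity to the question, the Hologres definition comes first. With the default `query_topk` of 4, all three sample passages would be returned, in ranked order.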

LangChain orchestrates the retrieval and generation pipeline. The holo-llm tool in the Hologres connectors repository encapsulates this pipeline so you can get started without building the integration from scratch.

Prerequisites

Before you begin, ensure that you have:

A Hologres instance. For vector workloads with millions of vectors, use an instance with at least 8 compute units (CUs).
PAI activated with a default workspace. See Activate PAI and create a default workspace.
An ECS instance or local machine prepared as follows:
Install Anaconda. See Anaconda for instructions.
Create a Python environment:
```
conda create --name chatbot python=3.8
conda activate chatbot
```
Install dependencies:
```
pip install langchain modelscope psycopg2-binary sentence_transformers bottle requests
```
Clone the repository:
```
git clone https://github.com/aliyun/alibabacloud-hologres-connectors.git
```
The chatbot tool is in the holo-llm folder.

Step 1: Deploy Llama 2 on PAI EAS

Deploy Llama 2 using PAI EAS. Llama 2 is available in 7B, 13B, and 70B parameter sizes. See Deploying large language models and Deploy a Llama model as a web application in EAS for deployment instructions.

Note

You can also use other large language models (LLMs) instead of Llama 2. For other supported models, see LLM.

After deployment, go to the Service Details page in the EAS console and record the service endpoint and token. You will need these in the next step.

Step 2: Configure the connection

The config/config.json file tells the tool how to connect to each service and which embedding model to use. Open the file:

```
cd alibabacloud-hologres-connectors/holo-llm
vim config/config.json
```

Set the following configuration items:

| Configuration item | Description |
| --- | --- |
| `eas_config.url` | The endpoint of the Llama 2 service on EAS, recorded in Step 1. |
| `eas_config.token` | The authentication token of the EAS service, recorded in Step 1. |
| `holo_config.HOLO_ENDPOINT` | The endpoint of your Hologres instance. Find it on the Instance Details page in the Hologres console. |
| `holo_config.HOLO_PORT` | The port of your Hologres instance, also shown on the Instance Details page. |
| `holo_config.HOLO_DATABASE` | The name of the database in your Hologres instance. |
| `holo_config.HOLO_USER` | Your Alibaba Cloud AccessKey ID. Get it from the AccessKey Management page. |
| `holo_config.HOLO_PASSWORD` | Your Alibaba Cloud AccessKey secret. Get it from the AccessKey Management page. |
| `embedding.model_id` | The path of the embedding model. This guide uses the CoROM model from DAMO Academy: `damo/nlp_corom_sentence-embedding_english-base`. |
| `embedding.model_dimension` | The vector dimension of the embedding model. CoROM produces 768-dimensional vectors, so set this to `768`. See CoROM for details. |
| `query_topk` | The number of passages retrieved per query. Default: `4`. A higher value supplies more context but adds tokens to the prompt sent to Llama 2, which leaves less room for conversation history. |
| `prompt_template` | The template that combines the retrieved passages and the question into the prompt sent to Llama 2. The default works for most cases. Change it only if you need to customize the answer format, because the template affects answer quality and style. |

Important

The embedding, query_topk, and prompt_template settings directly affect answer quality. Change them only when you understand the trade-offs.
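Before running the tool, you can sanity-check the file. The following sketch assumes that each dotted name in the table is a path through nested JSON objects (for example, `eas_config.url` is the `url` key inside an `eas_config` object) and that `query_topk` and `prompt_template` sit at the top level; confirm the actual layout of config/config.json in the repository. The `check_config` and `tokens_left_for_history` helpers and all sample values are hypothetical. The token estimate uses Llama 2's nominal 4,096-token context window and assumed passage and template sizes to show how `query_topk` reduces the room left for conversation history.

```python
import json
from typing import Any, Dict, List

# Assumption: each dotted name from the table is a path through nested JSON objects.
REQUIRED_KEYS = [
    "eas_config.url",
    "eas_config.token",
    "holo_config.HOLO_ENDPOINT",
    "holo_config.HOLO_PORT",
    "holo_config.HOLO_DATABASE",
    "holo_config.HOLO_USER",
    "holo_config.HOLO_PASSWORD",
    "embedding.model_id",
    "embedding.model_dimension",
    "query_topk",
    "prompt_template",
]
COROM_MODEL_ID = "damo/nlp_corom_sentence-embedding_english-base"
LLAMA2_CONTEXT_TOKENS = 4096  # nominal Llama 2 context window


def lookup(cfg: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as 'eas_config.url' through nested dicts."""
    node: Any = cfg
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(dotted)
        node = node[part]
    return node


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    for key in REQUIRED_KEYS:
        try:
            value = lookup(cfg, key)
        except KeyError:
            problems.append(f"missing: {key}")
            continue
        if value is None or value == "":
            problems.append(f"empty: {key}")
    if problems:
        return problems
    # CoROM english-base produces 768-dimensional vectors.
    if (lookup(cfg, "embedding.model_id") == COROM_MODEL_ID
            and str(lookup(cfg, "embedding.model_dimension")) != "768"):
        problems.append("embedding.model_dimension should be 768 for CoROM")
    try:
        topk = int(lookup(cfg, "query_topk"))
    except (TypeError, ValueError):
        topk = 0
    if topk < 1:
        problems.append("query_topk must be a positive integer")
    return problems


def tokens_left_for_history(topk: int, avg_passage_tokens: int,
                            template_tokens: int, answer_tokens: int) -> int:
    """Rough budget: context window minus template, retrieved passages, and reserved answer."""
    used = template_tokens + topk * avg_passage_tokens + answer_tokens
    return max(LLAMA2_CONTEXT_TOKENS - used, 0)


# Hypothetical placeholder values; replace them with your own.
SAMPLE_CONFIG = """
{
  "eas_config": {"url": "<eas-endpoint>", "token": "<eas-token>"},
  "holo_config": {
    "HOLO_ENDPOINT": "<hologres-endpoint>",
    "HOLO_PORT": "<port>",
    "HOLO_DATABASE": "<database>",
    "HOLO_USER": "<accesskey-id>",
    "HOLO_PASSWORD": "<accesskey-secret>"
  },
  "embedding": {"model_id": "damo/nlp_corom_sentence-embedding_english-base",
                "model_dimension": 768},
  "query_topk": 4,
  "prompt_template": "<template>"
}
"""

print(check_config(json.loads(SAMPLE_CONFIG)))  # prints: []
# Assumed sizes: 300-token passages, 100-token template, 512 tokens reserved for the answer.
print(tokens_left_for_history(4, 300, 100, 512))  # prints: 2284
print(tokens_left_for_history(8, 300, 100, 512))  # prints: 1084
```

To check your real file, load it with `json.load(open("config/config.json"))` and pass the result to `check_config`. Under these assumed sizes, doubling `query_topk` from 4 to 8 cuts the room left for history by more than half.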

Step 3: Vectorize and load your corpus

Run the following commands to clear any existing data and load your corpus into Hologres:

```
# Clear existing vector data from the database
python main.py -l --clear

# Vectorize the corpus and import it into Hologres
python main.py -l
```

The tool reads holo-llm/data/example_data.csv, vectorizes each entry with the embedding model, and stores the vectors in the langchain_embedding table in Hologres.

Important

On the first run, the tool downloads the embedding model (~400 MB) automatically. Subsequent runs skip this download.

The sample corpus includes entries such as the following:

| Title | Content |
| --- | --- |
| What is Hologres | Hologres is a one-stop real-time data warehouse independently developed by Alibaba. It supports real-time data write, update, processing, and analysis of massive data. |
| What is Proxima | Proxima is a high-performance software library for vector nearest neighbor search from Alibaba's DAMO Academy. Compared with similar open-source products such as Faiss, the Proxima is better in stability and performance. |
| What is the principle of JSONB column storage in Hologres | Hologres supports column-oriented storage for JSONB type starting from V1.3, reducing storage size and accelerating queries. |
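The query output in the next check shows each stored row as a `document` of the form `title: ... content: ...`, with `metadata` holding the source file and a row number. That layout appears to match how LangChain's `CSVLoader` converts a CSV file: one `column: value` line per column, with `source` and a zero-based `row` in the metadata. The standard-library sketch below previews that conversion for the sample corpus before anything is embedded. It is an illustration under those assumptions, not the holo-llm code; `rows_to_documents` is a hypothetical helper, and the CSV headers `title` and `content` are inferred from the query output.

```python
import csv
import io
from typing import Any, Dict, List


def rows_to_documents(csv_text: str, source: str) -> List[Dict[str, Any]]:
    """Turn each CSV row into 'column: value' lines plus source/row metadata."""
    documents = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader):
        text = "\n".join(f"{key.strip()}: {(value or '').strip()}" for key, value in row.items())
        documents.append({"document": text, "metadata": {"source": source, "row": row_number}})
    return documents


# Hypothetical CSV text built from the sample corpus (headers assumed to be title, content).
SAMPLE_CSV = '''title,content
What is Hologres,"Hologres is a one-stop real-time data warehouse independently developed by Alibaba. It supports real-time data write, update, processing, and analysis of massive data."
What is Proxima,"Proxima is a high-performance software library for vector nearest neighbor search from Alibaba's DAMO Academy. Compared with similar open-source products such as Faiss, the Proxima is better in stability and performance."
What is the principle of JSONB column storage in Hologres,"Hologres supports column-oriented storage for JSONB type starting from V1.3, reducing storage size and accelerating queries."
'''

for doc in rows_to_documents(SAMPLE_CSV, "data/example_data.csv"):
    print(doc["metadata"], doc["document"].splitlines()[0])
```

Each resulting text block is what the embedding model turns into one 768-dimensional vector, so long content fields produce long documents rather than more rows.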

To confirm the data loaded successfully, run the following query in your Hologres database:

```sql
SELECT * FROM langchain_embedding LIMIT 1;

-- Expected output
-- id | embedding | metadata | document
-- 2419815e-1407-11ee-bee5-acde48001122 | {0.395261,0.123794,...} | {"source": "data/example_data.csv", "row": 1} | title: ... content: ...
```
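When you inspect the table from a SQL client, the `embedding` column prints as a brace-delimited array literal, as in the expected output above. As an extra check, you can confirm that a stored vector has as many elements as `embedding.model_dimension` (768 for CoROM). The sketch below parses such a literal copied from client output. `parse_float_array` and `check_dimension` are hypothetical helpers, not part of holo-llm; database drivers such as psycopg2 typically return array columns as Python lists already, in which case `len()` is enough.

```python
from typing import List


def parse_float_array(literal: str) -> List[float]:
    """Parse a one-dimensional PostgreSQL-style array literal such as '{0.1,0.2}'."""
    body = literal.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("expected a literal enclosed in braces")
    body = body[1:-1].strip()
    if not body:
        return []
    return [float(item) for item in body.split(",")]


def check_dimension(literal: str, expected: int = 768) -> bool:
    """True when the parsed vector has exactly `expected` elements."""
    return len(parse_float_array(literal)) == expected


# Hypothetical 3-element literal; a real CoROM vector has 768 elements.
print(parse_float_array("{0.25,-0.5,1.0}"))  # prints: [0.25, -0.5, 1.0]
```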

Step 4: Verify retrieval-augmented generation

Run both commands to see how RAG changes the answer quality.

Without RAG, query Llama 2 directly:

```
python main.py -n

Please enter a Question:
What is hologres?
PAI-LLM answer:
Hologres is a term used in the context of digital holography...
```

With RAG, query Llama 2 with Hologres as the retrieval backend:

```
python main.py

Please enter a Question:
What is hologres?
PAI-LLM + Hologres answer:
Hologres is a one-stop real-time data warehouse independently developed by Alibaba. It supports real-time data write, update, processing and analysis of massive data.
```

Without Hologres, Llama 2 answers from its training data alone and, in this case, returns an incorrect definition related to digital holography. With Hologres as the retrieval backend, the answer is grounded in your corpus and matches the corpus entry for Hologres.
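The difference comes from what the model sees. In the RAG run, the retrieved passage is placed into the prompt next to the question, so the model can answer from it. The sketch below shows that assembly step with Python's `str.format`. The template wording and the `build_prompt` helper are hypothetical and do not reproduce the default `prompt_template` in config/config.json.

```python
from typing import List

# Hypothetical template; the default prompt_template in config/config.json may differ.
TEMPLATE = (
    "Use the following information to answer the question.\n"
    "Information:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, passages: List[str], template: str = TEMPLATE) -> str:
    """Join retrieved passages into one context block and fill the template's placeholders."""
    context = "\n\n".join(passage.strip() for passage in passages)
    return template.format(context=context, question=question.strip())


# The top-ranked passage for "What is hologres?" in the sample corpus.
retrieved = [
    "title: What is Hologres\n"
    "content: Hologres is a one-stop real-time data warehouse independently developed by "
    "Alibaba. It supports real-time data write, update, processing, and analysis of "
    "massive data."
]
print(build_prompt("What is hologres?", retrieved))
```

Only the template string is parsed for `{...}` placeholders; braces inside retrieved passages are inserted as plain text, so corpus content cannot break the formatting step.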

What's next

You have deployed a private conversational chatbot using Hologres, PAI, and Llama 2. To integrate the chatbot into your business workflows, for example by connecting it to a DingTalk group, see Build a free chatbot with Hologres and LLMs.