Build a RAG chatbot using Hologres and a large language model

Use Hologres as a vector storage and retrieval engine to quickly build a domain-specific chatbot without training a model. Store your custom text as vectors in Hologres, retrieve the passages relevant to each question, and pass them to a large language model (LLM) as context in the prompt so that its answers are grounded in your own data.

How it works:

- Hologres: stores and retrieves vectors at scale, powered by the Proxima library from DAMO Academy.
- LangChain: orchestrates the pipeline between vector retrieval and the LLM.
- LLM: generates answers anchored to the retrieved documents.
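At the core of this pipeline is a similarity search: the question is embedded as a vector, and the stored vectors closest to it identify the documents to hand to the LLM. The following is a minimal in-memory sketch of that step, assuming cosine similarity as the ranking metric and tiny 3-dimensional toy vectors. Hologres performs this search at scale with Proxima, and its actual distance functions and SQL interface are not shown here; `cosine_similarity` and `top_k` are illustrative names, not a Hologres or LangChain API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k documents whose vectors are most similar to query_vec.

    store is a list of (document_text, vector) pairs, standing in for the
    rows of a vector table.
    """
    scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


# Toy 3-dimensional vectors (assumption); the tutorial's embeddings have 768 dimensions.
store = [
    ("Hologres supports column storage for JSONB.", [0.9, 0.1, 0.0]),
    ("DingTalk bots respond to @mentions.", [0.0, 0.2, 0.9]),
    ("Hologres is a real-time data warehouse.", [0.7, 0.6, 0.1]),
]
context_docs = top_k([1.0, 0.2, 0.0], store, k=2)
```

In the real pipeline, the documents in `context_docs` would be placed into the LLM's prompt alongside the question.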

When to use this approach:

- You need accurate answers to domain-specific questions that a general-purpose LLM cannot answer correctly.
- Your knowledge base changes frequently, and retraining a model is not practical.
- You want to keep your data private without sending it to an external training pipeline.

When not to use this approach:

- Your knowledge base is small enough to fit entirely in the LLM's context window.
- The questions are general enough that a pre-trained LLM already answers them accurately.
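One rough way to apply the first criterion is to estimate the corpus size in tokens and compare it with the model's context window. The sketch below assumes about 4 characters per token, a common rule of thumb for English text; actual counts depend on the model's tokenizer and on the language (Chinese text, for example, tokenizes very differently). The function names and the 1,000-token reserve for the question and answer are illustrative assumptions.

```python
import csv

# Rule-of-thumb assumption: about 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(rows):
    """Roughly estimate the token count of rows that have title and content fields."""
    total_chars = sum(
        len(row.get("title") or "") + len(row.get("content") or "") for row in rows
    )
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(rows, context_window_tokens, reserve_tokens=1000):
    """True if the whole corpus, plus room for the question and answer, fits."""
    return estimate_tokens(rows) + reserve_tokens <= context_window_tokens


def load_rows(path):
    """Read a title/content CSV file into a list of dicts."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))
```

For example, `fits_in_context(load_rows("data/example_data.csv"), context_window_tokens=8192)` checks the sample corpus; 8192 is a placeholder for your model's actual limit.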

Prerequisites

Before you begin, ensure that you have:

- Access to an LLM. The steps in this topic apply to various LLMs.
- An activated Hologres instance. Use an instance with 8 compute units (CUs) or more: an 8-CU instance can handle millions of vectors, and larger corpora call for an instance with more CUs.
- Anaconda installed.
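To put "millions of vectors" in perspective, the raw size of the embeddings is the vector count times the dimension times the bytes per value. The sketch assumes 4-byte floats and the 768-dimensional CoRom vectors used later in this topic, and it ignores index structures, metadata, and document text, so real storage usage is higher. Under those assumptions, 1 million vectors take 3,072,000,000 bytes, about 2.86 GiB.

```python
def raw_vector_bytes(num_vectors, dim=768, bytes_per_value=4):
    """Raw embedding payload size in bytes, assuming fixed-size float values.

    Excludes index structures, metadata, and document text.
    """
    return num_vectors * dim * bytes_per_value


for count in (1_000_000, 5_000_000):
    size = raw_vector_bytes(count)
    print(f"{count:,} vectors: {size / 1024**3:.2f} GiB")
```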

Step 1: Set up the environment

Create and activate a Python 3.8 environment in Anaconda, then install the required dependencies:

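```bash
conda create --name chatbot python=3.8
conda activate chatbot

pip install langchain psycopg2-binary torch transformers modelscope bottle
# Replace <LLM_dependencies> with the packages that your LLM's LangChain integration requires.
pip install <LLM_dependencies>
```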

Prepare your LLM. This tutorial uses an LLM that is accessed through an API, so have your API credentials (such as an AccessKey pair) ready.
Download the sample code and data:
- Option A (ZIP): Download and decompress the archive.
- Option B (Git): Install Git, then run:

```bash
git clone https://github.com/aliyun/alibabacloud-hologres-connectors.git
```

Step 2: Test the baseline LLM

Before adding vector retrieval, test how the LLM performs on Hologres-specific questions using only its pre-trained knowledge. This establishes a baseline to compare against later.

In holo-chatbot/chatbot.py, find the main function at line 107. Replace <chat_model_name> with the LangChain class for the LLM you want to use:

Any LLM supported by LangChain works; see LangChain chat models for the full list. For other LLMs, implement the corresponding LangChain interfaces yourself.
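```python
# Select an LLM: replace <chat_model_name> with a LangChain chat model class, and
# <api_key>/<api_base> with that class's parameter names for the API key and endpoint
# (for example, ChatOpenAI uses openai_api_key and openai_api_base).
from langchain.chat_models import <chat_model_name>

# A low temperature makes the output more deterministic.
chat_model = <chat_model_name>(<api_key>='xxx', <api_base>='xxx', temperature=0.1)
# Chatbot and args are defined elsewhere in chatbot.py.
bot = Chatbot(chat_model, args.clear, args.no_vector_store)
```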

Run the setup wizard, then start the chatbot in no-vector-store mode:

The embedding model (about 400 MB) downloads automatically from ModelScope on the first run, which may take several minutes. Make sure your server can reach ModelScope. Subsequent runs skip this download.

```bash
# Switch to the code directory.
cd alibabacloud-hologres-connectors/holo-chatbot
# Open the setup wizard. Press Enter to skip each setting for now.
python generate_config.py

# -n / --no-vector-store: call the LLM directly, without Hologres vector retrieval.
python chatbot.py -n
```

Ask the LLM Hologres-specific questions. The following examples show the baseline responses. Note the inaccuracies and missing details, which vector retrieval fixes in Step 4:
- Human: What is Hologres?
  Chatbot: Hologres is a cloud-native analytical data warehouse provided by Alibaba Cloud. It can process petabytes of data and provides high-performance query and analysis capabilities. Hologres supports the SQL and PostgreSQL protocols and is deeply integrated with the Alibaba Cloud ecosystem, such as MaxCompute and DataWorks.
- Human: Is there a way to accelerate JSONB queries in Hologres?
  Chatbot: Yes, Hologres provides a method called "Gin Index" to accelerate queries on the JSONB type. A GIN Index is an inverted index-based data structure that can quickly locate rows containing specific JSONB key-value pairs. In Hologres, you can use the CREATE INDEX statement to create a GIN Index. For example: CREATE INDEX idx_gin ON my_table USING gin(my_jsonb_column);
- Human: What is column-store JSONB in Hologres?
  Chatbot: Column-store JSONB in Hologres is a data type that can store data in JSON format. jsonb is a binary format of JSON that provides faster query speeds and requires less storage space. In Hologres, you can use JSONB columns to store and query JSON data.

Step 3: Load vector data into Hologres

Create a database

Create a database in your Hologres instance.

Configure the connection

Run the setup wizard and enter your Hologres connection details. Alternatively, edit config/config.yaml directly.

```bash
python generate_config.py
```

| Parameter | Description |
| --- | --- |
| `HOLO_ENDPOINT` | Endpoint of the Hologres instance. Find it on the Instance Details page in the Hologres console. |
| `HOLO_PORT` | Port number of the Hologres instance. Find it on the Instance Details page. |
| `HOLO_USER` | AccessKey ID of your Alibaba Cloud account. Get it from AccessKey Management. |
| `HOLO_PASSWORD` | AccessKey secret of your Alibaba Cloud account. Get it from AccessKey Management. |
| `HOLO_DATABASE` | Name of the database in the Hologres instance. |
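Because Hologres is compatible with the PostgreSQL protocol, these five settings correspond to a standard PostgreSQL connection. As an illustration, the hypothetical helper below maps them to a libpq key=value connection string, quoting values that contain spaces, single quotes, or backslashes. It assumes the values have already been loaded (for example, from config/config.yaml) and is not how chatbot.py itself reads the configuration.

```python
# Hypothetical mapping from the HOLO_* settings to libpq connection keywords.
SETTING_TO_KEYWORD = [
    ("host", "HOLO_ENDPOINT"),
    ("port", "HOLO_PORT"),
    ("user", "HOLO_USER"),
    ("password", "HOLO_PASSWORD"),
    ("dbname", "HOLO_DATABASE"),
]


def libpq_quote(value):
    """Quote a value for a libpq key=value connection string if needed."""
    text = str(value)
    if text and not any(ch in text for ch in " '\\"):
        return text
    # libpq requires single quotes around empty values or values with spaces,
    # and backslash escapes for embedded backslashes and single quotes.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_dsn(settings):
    """Build a connection string from a dict of HOLO_* settings."""
    missing = [key for _, key in SETTING_TO_KEYWORD if not settings.get(key)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return " ".join(
        f"{keyword}={libpq_quote(settings[key])}" for keyword, key in SETTING_TO_KEYWORD
    )
```

With psycopg2, which Step 1 installs, the result can be passed as `psycopg2.connect(build_dsn(settings))`, a quick way to confirm the credentials before you import data.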

Prepare the corpus

The corpus is the domain-specific text the chatbot uses to answer questions. It must be a CSV file with title and content columns.

The sample file data/example_data.csv from the downloaded code contains selected Hologres documentation as example content:

| title | content |
| --- | --- |
| Who are you | I am Holo chatbot, an AI chat robot. I will do my best to answer your questions based on the provided document database. |
| How are you implemented and what is your architecture | I am developed using the LangChain framework. LangChain is an easy-to-use Python development framework for large language models... My backend uses Hologres for vector data storage... |
| What is the real-time data warehouse Hologres | Hologres is a one-stop real-time data warehouse (Real-Time Data Warehouse) engine independently developed by Alibaba... |

Replace or extend this file with your own domain content, keeping the same CSV format.
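Before importing, it can help to check that an edited corpus still has the expected shape. The sketch below is a hypothetical pre-flight check built on Python's standard csv module: it confirms that the title and content columns exist, rejects rows that leave either one empty, and counts the rows. Treating an empty field as an error is an assumption of this sketch, not a documented requirement of chatbot.py.

```python
import csv

REQUIRED_COLUMNS = ("title", "content")


def validate_corpus_lines(lines):
    """Validate CSV text lines and return the number of data rows.

    Raises ValueError if a required column is missing or a row has an
    empty title or content.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError("missing column(s): " + ", ".join(missing))
    count = 0
    for count, row in enumerate(reader, start=1):
        for col in REQUIRED_COLUMNS:
            if not (row.get(col) or "").strip():
                raise ValueError(f"data row {count}: empty {col}")
    return count


def validate_corpus(path):
    """Validate a corpus file such as data/example_data.csv."""
    # utf-8-sig also accepts files saved with a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return validate_corpus_lines(f)
```

When you write new rows programmatically, Python's csv writer quotes fields that contain commas or line breaks automatically, which keeps the two-column format intact.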

Vectorize and import the corpus

Use the open-source CoRom embedding model from DAMO Academy, available on ModelScope, to convert the corpus into 768-dimensional vectors and write them to Hologres:

```bash
# Vectorize data/example_data.csv and write the vectors to the langchain_embedding table in Hologres.
python chatbot.py -l
```

If you switch to a different embedding model, update the vector dimension in Hologres to match the output dimension of the new model. The CoRom model produces 768-dimensional vectors, and a dimension mismatch causes import errors.
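If you do switch models, a cheap safeguard is to check each vector's length before it is written. The helper below is a hypothetical sketch: it formats a vector as a PostgreSQL array literal like the `embedding` values in the sample query result, and it rejects vectors whose length differs from the configured dimension. It does not change how `python chatbot.py -l` performs the import.

```python
import math

EMBEDDING_DIM = 768  # Output dimension of the CoRom model used in this topic


def to_pg_array_literal(vector, dim=EMBEDDING_DIM):
    """Format an embedding as a PostgreSQL array literal, for example {0.5,1.0}.

    Raises ValueError when the length differs from dim (the mismatch that
    breaks imports after an embedding-model switch) or a value is not finite.
    """
    if len(vector) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vector)}")
    values = [float(x) for x in vector]
    if not all(math.isfinite(v) for v in values):
        raise ValueError("vector contains NaN or infinity")
    return "{" + ",".join(repr(v) for v in values) + "}"
```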

Verify the import by querying the langchain_embedding table:

```sql
SELECT * FROM langchain_embedding LIMIT 1;

-- Sample result
-- id | embedding | metadata | document
-- 2419815e-1407-11ee-bee5-acde48001122 | {0.395261,0.123794,0.761932,0.413286,...} | {"source": "data/example_data.csv", "row": 1} | title: How are you implemented...
```

Step 4: Run the chatbot with vector retrieval

Start the chatbot with Hologres vector retrieval enabled:

```bash
python chatbot.py
```
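In this mode, the documents retrieved from Hologres are passed to the LLM together with your question. The exact prompt template that chatbot.py uses is not shown in this topic, so the sketch below uses a hypothetical template to illustrate the general technique of grounding a prompt in retrieved documents. Each document is treated as an opaque string, such as the `document` column value in the sample query result; `PROMPT_TEMPLATE`, `build_prompt`, and the three-document limit are assumptions.

```python
# Hypothetical prompt template; chatbot.py may word its prompt differently.
PROMPT_TEMPLATE = (
    "Answer the question using only the reference documents below. "
    "If they do not contain the answer, say that you do not know.\n\n"
    "Reference documents:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question, retrieved_docs, max_docs=3):
    """Combine the top retrieved documents and the user's question into one prompt.

    retrieved_docs is assumed to be ordered from most to least similar.
    """
    context = "\n---\n".join(doc.strip() for doc in retrieved_docs[:max_docs])
    return PROMPT_TEMPLATE.format(context=context, question=question.strip())
```

Limiting the number of documents keeps the prompt within the model's context window, and the instruction to admit missing information discourages the model from falling back on its generic pre-trained answers.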

Ask the same questions from Step 2 to see how the corpus improves the answers:

- Human: What is Hologres?
  Chatbot: Hologres is a one-stop real-time data warehouse (Real-Time Data Warehouse) engine developed by Alibaba. It supports real-time writing, updating, processing, and analysis of massive data. Hologres supports standard SQL and is compatible with the PostgreSQL protocol, syntax, and most functions. It also supports petabyte-level multidimensional analysis (OLAP), ad hoc analysis, and online data services (Serving) with high concurrency and low latency...
  Improvement: The baseline answer was generic. The corpus-grounded answer covers OLAP, Serving, fine-grained isolation, ecosystem integrations (MaxCompute, Flink, DataWorks), and concrete use cases, all details that the baseline LLM lacked.
- Human: Is there a way to accelerate JSONB queries in Hologres?
  Chatbot: Yes. Starting from V1.3, Hologres supports column storage optimization for the JSONB type. This optimization reduces the storage size of JSONB data and accelerates queries. When you enable this feature, the system automatically converts the JSONB column to a column store with a strong schema at the underlying layer...
  Improvement: The baseline answer described a generic GIN index approach. The corpus-grounded answer correctly identifies the Hologres-specific column storage optimization introduced in V1.3.
- Human: What is column-store JSONB in Hologres?
  Chatbot: Column-store JSONB in Hologres means that data of the JSONB type is stored in a columnar format. Each key-value pair in the JSONB data is split into a separate column for storage... Starting from V1.3, Hologres supports column storage optimization for the JSONB type.
  Improvement: The baseline answer described JSONB as a binary JSON format. The corpus-grounded answer correctly explains column storage optimization, including how key-value pairs are split into separate columns and the version information (V1.3).

Step 5 (optional): Connect the chatbot to DingTalk

Deploy the chatbot as a DingTalk group bot so team members can query it directly in a chat.

Open a DingTalk group and add a custom bot. In Security Settings, select Add Signature, and enable the Outgoing mechanism. Enter the IP address of your server and a token of your choice. From the Webhook section, copy the access_token and the secret, and set them as the DINGDING_TOKEN and DINGDING_SECRET values in config/config.yaml.
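With Add Signature enabled, calls to the group's webhook must carry a timestamp and a signature computed from the secret. The sketch below follows the signing scheme described in DingTalk's custom-robot documentation: HMAC-SHA256 over "timestamp\nsecret" keyed with the secret, then Base64-encoded and URL-encoded. Whether dingding_server.py computes the signature exactly this way is an assumption, and `sign_webhook` and `webhook_url` are hypothetical names.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def sign_webhook(secret, timestamp_ms=None):
    """Return (timestamp, sign) for a signed DingTalk webhook call.

    The string to sign is "<timestamp in ms>\n<secret>", signed with
    HMAC-SHA256 keyed by the secret, then Base64- and URL-encoded.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"), string_to_sign.encode("utf-8"), digestmod=hashlib.sha256
    ).digest()
    sign = urllib.parse.quote_plus(base64.b64encode(digest))
    return str(timestamp_ms), sign


def webhook_url(access_token, secret, timestamp_ms=None):
    """Build the webhook URL with access_token, timestamp, and sign parameters."""
    timestamp, sign = sign_webhook(secret, timestamp_ms)
    query = urllib.parse.urlencode({"access_token": access_token, "timestamp": timestamp})
    return f"https://oapi.dingtalk.com/robot/send?{query}&sign={sign}"
```

Here `access_token` and `secret` correspond to the DINGDING_TOKEN and DINGDING_SECRET values. The timestamp is regenerated on each call because signed requests are time-sensitive.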

Start the HTTP service on your server:

```bash
# Run in the background and handle HTTP requests from DingTalk at http://localhost:8889/chat.
python dingding_server.py > log.txt 2>&1 &
```

To connect to other messaging services, follow the same pattern in dingding_server.py.
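The pattern is: receive a POST request at /chat, extract the question text, get an answer from the chatbot, and return a reply. Below is a hypothetical, standard-library-only sketch of that pattern; dingding_server.py itself is not shown in this topic. The incoming payload shape `{"text": {"content": ...}}` and the reply shape are assumptions based on DingTalk's text message format, and `answer_question` is a placeholder for a call into the chatbot.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def extract_question(payload):
    """Return the message text from a DingTalk-style text payload (assumed shape)."""
    text = payload.get("text") if isinstance(payload, dict) else None
    content = text.get("content", "") if isinstance(text, dict) else ""
    return content.strip() if isinstance(content, str) else ""


def build_reply(answer):
    """Wrap an answer in a text-message body (assumed DingTalk format)."""
    return {"msgtype": "text", "text": {"content": answer}}


def answer_question(question):
    """Placeholder: call your chatbot here instead of echoing the question."""
    return f"You asked: {question}"


class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/chat":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # Covers malformed JSON and undecodable bytes.
            self.send_error(400, "invalid JSON")
            return
        reply = build_reply(answer_question(extract_question(payload)))
        body = json.dumps(reply, ensure_ascii=False).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8889):
    """Listen on all interfaces; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), ChatHandler).serve_forever()
```

Calling `serve()` starts the listener on port 8889, the port used in this topic. For another messaging service, only `extract_question` and `build_reply` would need to change to match that service's message formats.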

In the DingTalk group, @mention the chatbot to start a conversation.

What's next

- For details about Proxima vector computation, see Proxima vector computation.
- To customize the corpus, replace data/example_data.csv with your own CSV file in the same title/content format.
- To use a different LLM, replace <chat_model_name> in holo-chatbot/chatbot.py and install the corresponding LangChain dependencies.