Build a RAG system with Vector Retrieval Service for Milvus and Dify

This topic describes how to build a retrieval-augmented generation (RAG) system using Vector Retrieval Service for Milvus (Milvus) and the Dify platform.

Background

RAG

Large language models (LLMs) often hallucinate because their knowledge is limited to their training data. Retrieval-augmented generation (RAG) mitigates this problem by retrieving relevant content from an external knowledge base and supplying it to the model as context. An efficient RAG system requires a vector database that can store embeddings and run similarity searches quickly.

This topic focuses on Milvus and the Dify low-code artificial intelligence (AI) platform. It shows you how to combine them to quickly build an enterprise-grade RAG application. This demonstrates the core value of a vector database in solving the "last mile" problem in AI.
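The retrieve-then-generate flow behind RAG can be illustrated with a minimal, self-contained sketch. It is not how Dify or Milvus is implemented: the `embed` function below is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and all function names and sample sentences are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine the retrieved chunks and the question into one LLM prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Hypothetical knowledge base; a real system stores embeddings in a vector database.
knowledge_base = [
    "Milvus stores embedding vectors and serves similarity search.",
    "Dify provides a visual workflow editor for LLM applications.",
    "Docker Compose starts multi-container applications.",
]
question = "Which component stores embedding vectors?"
prompt = build_prompt(question, retrieve(question, knowledge_base, top_k=1))
print(prompt)
```

In the system built in this topic, the embedding model and Milvus take over the roles of `embed` and the in-memory list, and Dify handles prompt assembly and the LLM call.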

Dify

Dify is an open-source platform for developing AI applications. It features low-code pipelines and a user-friendly interface. Its core mission is to simplify and accelerate the process of building AI applications by integrating backend-as-a-service (BaaS) and LLM Operations (LLMOps) concepts.

Dify is a full-stack solution. On the backend, it provides reliable infrastructure, including API services and data management, so that developers do not have to build these from scratch. For LLMOps, Dify offers an intuitive visual interface for prompt engineering, which simplifies complex prompt workflows. The built-in RAG engine connects to private knowledge bases, such as company documents and databases. This connection allows the LLM to answer questions using domain-specific knowledge, which reduces hallucinations and helps keep answers accurate and traceable.

Prerequisites

You have created a Milvus instance. For more information, see Create a Milvus instance.
You have activated Alibaba Cloud Model Studio and obtained an API key. For more information, see Obtain an App ID and a Workspace ID.
You have installed Docker and Docker Compose. For more information, see Install and use Docker and Docker Compose.

Procedure

Step 1: Install Dify

Clone the open-source Dify project from GitHub to your local machine.
```
git clone https://github.com/langgenius/dify.git
```
Go to the deployment directory and create the .env configuration file by copying the .env.example template.
```
cd dify/docker/
cp .env.example .env
```

Modify the following configuration information in the .env file.

```
# Vector storage engine configuration
VECTOR_STORE=milvus  # Specifies Milvus as the vector storage engine

# Milvus connection information
MILVUS_URI=http://YOUR_ALIYUN_MILVUS_ENDPOINT:19530
MILVUS_USER=YOUR_ALIYUN_MILVUS_USER
MILVUS_PASSWORD=YOUR_ALIYUN_MILVUS_PASSWORD
```

This example uses the following parameters. Replace them with your actual values.

Parameter	Description
`MILVUS_URI`	The endpoint of the Milvus instance, in the format `http://<Public IP address>:<Port>`. You can view both the public IP address and the port on the Instance Details page of your Milvus instance. The default port is 19530.
`MILVUS_USER`	The username that you specified when you created the Milvus instance.
`MILVUS_PASSWORD`	The password for the user that you specified when you created the Milvus instance.
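Before you start Dify, it can help to check that the placeholders were actually replaced. The following is a minimal sketch that uses only the Python standard library. The helper names are hypothetical, the parser is deliberately simple (it is not the parser that Docker Compose uses and would truncate a value that contains ` #`), and the sample address, user, and password are made-up values.

```python
from urllib.parse import urlparse


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "milvus  # note".
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def check_milvus_settings(env: dict[str, str]) -> list[str]:
    """Return the problems found; an empty list means the settings look plausible."""
    problems = []
    if env.get("VECTOR_STORE") != "milvus":
        problems.append("VECTOR_STORE is not set to milvus")
    try:
        uri = urlparse(env.get("MILVUS_URI", ""))
        uri_ok = uri.scheme in ("http", "https") and bool(uri.hostname) and uri.port is not None
    except ValueError:  # raised when the port is not a valid number
        uri_ok = False
    if not uri_ok:
        problems.append("MILVUS_URI should look like http://<Public IP address>:<Port>")
    for key in ("MILVUS_URI", "MILVUS_USER", "MILVUS_PASSWORD"):
        value = env.get(key, "")
        if not value or "YOUR_" in value:
            problems.append(f"{key} is empty or still contains a placeholder")
    return problems


# Made-up sample values; in practice, read the text from dify/docker/.env.
sample = """
VECTOR_STORE=milvus  # Specifies Milvus as the vector storage engine
MILVUS_URI=http://203.0.113.10:19530
MILVUS_USER=root
MILVUS_PASSWORD=example-password
"""
print(check_milvus_settings(parse_env(sample)))
```

The check only validates the shape of the values. It does not confirm that the endpoint is reachable or that the credentials are accepted.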

Start Dify.
```
docker compose up -d --build
```
In a browser, navigate to http://127.0.0.1/ to access the Dify service. Set the administrator account and password, and then log on to the console.
Note
If Dify is running on a remote server, such as an Elastic Compute Service (ECS) instance or a virtual machine, replace 127.0.0.1 with the public IP address or domain name of the server. Make sure that the server is accessible over the network.

Step 2: Set up the model

After you log on, click your profile picture in the upper-right corner and select Settings from the drop-down menu.
In the navigation pane on the left, select Model Provider. On the right, find Qwen and install it.
After the model is installed, click API-Key Setup. Enter your API key from Alibaba Cloud Model Studio.
Configure the System Model Settings as needed.

Step 3: Create a knowledge base

At the top of the page, click Knowledge, and then click Create Knowledge.
Click Import from file. Download the sample data file README.md, and then upload it.
Modify the parameters and click Save & Process.
This example modifies the following parameters:
- Maximum chunk length: Set it to 1024.
- Embedding Model: Select text-embedding-v1.
After the process is complete, the knowledge base is created.
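To give a sense of what a maximum chunk length of 1024 means, the following sketch splits a document into chunks before embedding. It is an illustration under stated assumptions, not Dify's chunking algorithm: it counts characters, splits on blank lines, and hard-cuts oversized paragraphs, whereas Dify's own unit, separators, and overlap handling may differ. The function name is hypothetical.

```python
def chunk_text(text: str, max_length: int = 1024, separator: str = "\n\n") -> list[str]:
    """Split text on a separator, then pack the pieces into chunks of at most max_length characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        # Hard-split any single piece that exceeds the limit on its own.
        while len(piece) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_length])
            piece = piece[max_length:]
        candidate = f"{current}{separator}{piece}" if current else piece
        if len(candidate) <= max_length:
            current = candidate  # the piece still fits into the current chunk
        else:
            chunks.append(current)  # close the current chunk and start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Synthetic document: three short paragraphs and one oversized paragraph.
document = "\n\n".join(["a" * 400, "b" * 400, "c" * 400, "d" * 3000])
for index, chunk in enumerate(chunk_text(document, max_length=1024)):
    print(index, len(chunk))
```

Each resulting chunk is embedded separately and stored as one vector, so the chunk length controls how much text a single retrieval hit returns.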

Step 4: Verify vector retrieval

Log on to the Vector Retrieval Service for Milvus console and select your Milvus instance. In the upper-right corner, click Attu Manager to open the Attu page. Verify that the collection that corresponds to the knowledge base exists and contains the imported data. For more information, see Manage Milvus instances with Attu.
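As an alternative to the console, the same check can be scripted. The following is a minimal sketch that assumes the `pymilvus` package is installed (`pip install pymilvus`) and that the instance is reachable from your machine. The helper names are hypothetical, and the reported row count may lag behind recent writes.

```python
def connect(uri: str, user: str, password: str):
    """Open a client connection to the Milvus instance (requires pymilvus)."""
    # Imported lazily so that the helper below has no hard dependency.
    from pymilvus import MilvusClient

    return MilvusClient(uri=uri, token=f"{user}:{password}")


def collection_row_counts(client) -> dict[str, int]:
    """Map each collection name to the row count that Milvus reports."""
    counts = {}
    for name in client.list_collections():
        stats = client.get_collection_stats(collection_name=name)
        counts[name] = int(stats.get("row_count", 0))
    return counts


# Example usage, with the same placeholders as in the .env file:
# client = connect(
#     "http://YOUR_ALIYUN_MILVUS_ENDPOINT:19530",
#     "YOUR_ALIYUN_MILVUS_USER",
#     "YOUR_ALIYUN_MILVUS_PASSWORD",
# )
# print(collection_row_counts(client))
```

A collection created by Dify with a nonzero row count indicates that the knowledge base chunks were embedded and written to Milvus.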

Step 5: Verify the RAG performance

At the top of the page, click Studio to return to the home page. Select Create from Template.
In the search box, search for and use the Knowledge Retrieval + Chatbot template.
In the dialog box that appears, click Create.
Select the KNOWLEDGE RETRIEVAL node. Set the knowledge base to the one that you created in Step 3.
Select the LLM node and set the model to qwen-max.
In the upper-right corner, click Publish, and select Update.
Run the workflow to open the test page. Enter a question related to the content in the knowledge base to receive an answer.
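After the application is published, it can also be queried programmatically. The sketch below builds a request for Dify's chat-messages API by using only the Python standard library. It assumes that Dify is served at http://127.0.0.1/ as in Step 1 and that you have created an API key for the application in Dify, which this topic does not cover. The helper names, the `app-xxxx` key, and the `user` identifier are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, question: str,
                       user: str = "rag-test-user") -> urllib.request.Request:
    """Build a POST request for the chat-messages endpoint without sending it."""
    payload = {
        "inputs": {},
        "query": question,
        "response_mode": "blocking",  # wait for the full answer instead of streaming
        "user": user,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(base_url: str, api_key: str, question: str) -> str:
    """Send the question to the published application and return the answer text."""
    request = build_chat_request(base_url, api_key, question)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["answer"]


# Example usage (requires a running Dify instance and a real application API key):
# print(ask("http://127.0.0.1/", "app-xxxx", "What is Milvus?"))
```

Keeping request construction separate from sending makes it easy to inspect the URL, headers, and payload before any network call is made.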