Build a retrieval-augmented generation (RAG) system with Vector Retrieval Service for Milvus (Milvus) and the Dify platform.
Background
RAG principles
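Large language models can "hallucinate", that is, generate incorrect information, when a question falls outside what they learned during training. Retrieval-augmented generation (RAG) addresses this by connecting the model to an external knowledge base: passages relevant to the question are retrieved at query time and passed to the model as context. Because this retrieval typically relies on vector similarity search, an efficient RAG system requires a powerful vector database.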
This topic shows how to integrate Milvus and Dify to build an enterprise-grade RAG application that demonstrates the value of a vector database in solving the "last mile" problem in AI.
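The retrieve-then-generate idea described above can be sketched in a few lines. This is only a toy illustration, not how Dify or Milvus work internally: the `embed` function is a hypothetical bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the sample documents are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: term counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved passages in front of the question for the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Hypothetical knowledge base, for illustration only.
DOCS = [
    "Milvus stores embeddings and serves similarity search.",
    "Dify is a platform for building LLM applications.",
    "Docker Compose starts multi-container applications.",
]

question = "Which system stores embeddings?"
context = retrieve(question, DOCS, top_k=1)
prompt = build_prompt(question, context)
print(prompt)
```

In a real deployment the embedding model produces dense vectors and the ranking step is delegated to the vector database, which is the role Milvus plays in this topic.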
Dify
Dify is an open-source AI application development platform with low-code workflows. It simplifies the process of building AI applications by integrating backend-as-a-service (BaaS) and LLMOps.
Dify provides backend infrastructure (API services, data management) so developers do not have to build from scratch. Its visual prompt orchestration interface simplifies prompt engineering. The built-in RAG engine connects to private knowledge bases such as enterprise documents and databases, enabling the LLM to generate domain-specific answers that are accurate and traceable, with reduced hallucinations.
Prerequisites
-
A Milvus instance is created. For more information, see Create a Milvus instance.
-
Alibaba Cloud Model Studio is activated and an API key is obtained. For more information, see Obtain an App ID and a Workspace ID.
-
Docker and Docker Compose are installed. For more information, see Install and use Docker and Docker Compose.
Procedure
Step 1: Install Dify
-
Clone the open-source Dify project from GitHub to your local machine.
git clone https://github.com/langgenius/dify.git
-
Navigate to the deployment directory and create the .env configuration file by copying the provided example file.
cd dify/docker/
cp .env.example .env
-
Modify the following settings in the .env file.
# Vector storage engine configuration
VECTOR_STORE=milvus # Specifies Milvus as the vector storage engine
# Milvus connection information
MILVUS_URI=http://YOUR_ALIYUN_MILVUS_ENDPOINT:19530
MILVUS_USER=YOUR_ALIYUN_MILVUS_USER
MILVUS_PASSWORD=YOUR_ALIYUN_MILVUS_PASSWORD
Replace the placeholder values with your actual information.
Parameter | Description
MILVUS_URI | The endpoint of the Milvus instance, in the format http://<public IP address>:<port>. Both the <public IP address> and the <port> are available on the Details page of your Milvus instance. The default port is 19530.
MILVUS_USER | The username you set when creating the Milvus instance.
MILVUS_PASSWORD | The password for the user you set when creating the Milvus instance.
-
Start Dify.
docker compose up -d --build
Sample output:
[root@xxx /docker]# docker compose up -d --build
[+] Running 15/15
 ✔ Network docker_default              Created
 ✔ Network docker_milvus               Created
 ✔ Network docker_ssrf_proxy_network   Created
 ✔ Container docker-db-1               Healthy
 ✔ Container docker-redis-1            Started
 ✔ Container docker-sandbox-1          Started
 ✔ Container milvus-etcd               Started
 ✔ Container milvus-minio              Started
 ✔ Container docker-ssrf_proxy-1       Started
 ✔ Container docker-web-1              Started
 ✔ Container docker-plugin_daemon-1    Started
 ✔ Container docker-worker-1           Started
 ✔ Container docker-api-1              Started
 ✔ Container milvus-standalone         Started
 ✔ Container docker-nginx-1            Started
-
Open http://127.0.0.1/ in a browser to access Dify. Set the administrator account and password, and then log in.
Note: If Dify runs on a remote server (ECS instance or virtual machine), replace 127.0.0.1 with the server's public IP address or domain name. Ensure the server is publicly accessible.
Enter your Email, Username, and Password (at least 8 characters, with both letters and numbers), and click Set up.
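The Milvus settings edited in the .env file during this step can be sanity-checked with a short script. The sketch below is a convenience check, not part of Dify: `parse_env` and `check_milvus_settings` are hypothetical helper names, the parser handles only simple KEY=VALUE lines, and the user name and password in the sample are made-up values.

```python
from urllib.parse import urlparse

REQUIRED_KEYS = ("VECTOR_STORE", "MILVUS_URI", "MILVUS_USER", "MILVUS_PASSWORD")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks and full-line comments.

    Simplification: anything after ' #' is treated as an inline comment,
    so values that contain ' #' are truncated.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def check_milvus_settings(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif "YOUR_" in value:
            problems.append(f"{key} still contains a placeholder")
    if env.get("VECTOR_STORE") not in (None, "", "milvus"):
        problems.append("VECTOR_STORE is not set to milvus")
    raw_uri = env.get("MILVUS_URI", "")
    if raw_uri:
        parsed = urlparse(raw_uri)
        try:
            has_port = parsed.port is not None
        except ValueError:  # raised for a non-numeric or out-of-range port
            has_port = False
        if parsed.scheme not in ("http", "https") or not parsed.hostname or not has_port:
            problems.append("MILVUS_URI should look like http://<host>:<port>")
    return problems


# Sample content; the user and password here are made-up illustration values.
SAMPLE = """
# Vector storage engine configuration
VECTOR_STORE=milvus # Specifies Milvus as the vector storage engine
MILVUS_URI=http://YOUR_ALIYUN_MILVUS_ENDPOINT:19530
MILVUS_USER=example-user
MILVUS_PASSWORD=example-password
"""

env = parse_env(SAMPLE)
problems = check_milvus_settings(env)
for problem in problems:
    print(problem)
```

To check the real file, pass its contents instead of the sample, for example `parse_env(open(".env").read())` when run from the dify/docker/ directory.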
Step 2: Configure models
-
Click your profile picture in the upper-right corner and select Settings.
-
In the left-side navigation pane, select Model Provider. Find Qwen and click Install.
-
After the model is installed, select it and enter the API key from Alibaba Cloud Model Studio.
-
In the System Model Settings panel, configure the System Inference Model, Embedding Model, Rerank Model, speech-to-text model, and text-to-speech model, and then click Save.
Step 3: Create a knowledge base
-
At the top of the page, click Knowledge, and then click Create Knowledge.
-
For Data source, select Import Existing Text. Download the sample data (README.md) and upload it.
-
Modify the parameters as needed and click Save & Process.
Use the default values for key parameters: indexing method High quality, retrieval setting vector retrieval, Rerank Model gte-rerank enabled, Top K 3, Score Threshold 0.5.
In this example, modify the following parameters:
-
Maximum chunk length: Set to 1024.
-
Embedding Model: Select text-embedding-v1.
After processing completes, the knowledge base is created.
The summary confirms the settings: chunking mode Custom, text preprocessing replaces consecutive spaces/newlines/tabs, indexing method High quality, retrieval setting vector retrieval. Click Go to Documentation to view details.
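To make the chunking and retrieval parameters above more concrete, the following sketch shows one way a maximum chunk length, Top K, and Score Threshold can act on data. It is an illustration under stated assumptions, not Dify's implementation: `chunk_text` and `select_hits` are hypothetical helpers, chunk length is counted in characters here (Dify's own unit and splitting rules may differ), and the scores are made-up numbers.

```python
def chunk_text(text: str, max_len: int = 1024) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_len characters."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Hard-split any single paragraph that is longer than max_len.
        pieces = [para[i:i + max_len] for i in range(0, len(para), max_len)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_len:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def select_hits(
    hits: list[tuple[str, float]], top_k: int = 3, score_threshold: float = 0.5
) -> list[tuple[str, float]]:
    """Drop hits below the threshold, then keep the top_k highest scores."""
    kept = [h for h in hits if h[1] >= score_threshold]
    kept.sort(key=lambda h: h[1], reverse=True)
    return kept[:top_k]


# Three paragraphs: the first two cannot share one 1024-character chunk.
text = "A" * 600 + "\n\n" + "B" * 600 + "\n\n" + "C" * 100
chunks = chunk_text(text, max_len=1024)
print([len(c) for c in chunks])

# Made-up (chunk id, similarity score) pairs.
hits = [("chunk-a", 0.82), ("chunk-b", 0.41), ("chunk-c", 0.67),
        ("chunk-d", 0.55), ("chunk-e", 0.91)]
selected = select_hits(hits, top_k=3, score_threshold=0.5)
print(selected)
```

A smaller chunk length yields more, shorter passages; a higher score threshold or a lower Top K passes fewer passages to the LLM.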
Step 4: Verify vector retrieval
Log on to the Vector Retrieval Service for Milvus console. Select your Milvus instance and click Attu Manager in the upper-right corner. On the Attu page, verify that the collection for your knowledge base has been created. For more information, see Attu tool management.
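As an alternative to checking in Attu, the collection can also be looked up from a script. The sketch below assumes the optional pymilvus package is installed (`pip install pymilvus`) and that the instance is reachable from your machine. The `Vector_index_` prefix is an assumption about how Dify names its collections; adjust it to match the name shown in Attu. `connect` and `dify_collections` are hypothetical helper names.

```python
def connect(uri: str, user: str, password: str):
    """Open a Milvus client connection (requires the pymilvus package)."""
    # Imported lazily so the filtering helper below has no extra dependency.
    from pymilvus import MilvusClient

    return MilvusClient(uri=uri, token=f"{user}:{password}")


def dify_collections(client, prefix: str = "Vector_index_") -> list[str]:
    """List collections whose names start with the assumed Dify prefix."""
    return sorted(
        name for name in client.list_collections() if name.startswith(prefix)
    )


# Usage with the same values as in the .env file (placeholders shown):
# client = connect(
#     "http://YOUR_ALIYUN_MILVUS_ENDPOINT:19530",
#     "YOUR_ALIYUN_MILVUS_USER",
#     "YOUR_ALIYUN_MILVUS_PASSWORD",
# )
# print(dify_collections(client))
```

An empty result suggests that either the knowledge base has not finished processing or Dify is writing to a different vector store than the one configured in MILVUS_URI.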
Step 5: Verify the RAG performance
-
Click Studio at the top of the page, then select Create from Template.
-
Search for and select the Knowledge Retrieval + Chatbot template.
-
In the dialog box, click Create.
-
Select the Knowledge Retrieval node and set the knowledge base to the one you created in Step 3.
The workflow connects nodes in this order: START → KNOWLEDGE RETRIEVAL → LLM (qwen-max) → ANSWER. The query variable is sys.query.
-
Select the LLM node and set the model to qwen-max.
-
In the upper-right corner, click Publish, and then click Publish Update.
-
Click Run to open the test page. Enter a question related to the knowledge base content to verify the answer.
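Once the application is published, the same question can also be sent programmatically. The sketch below assumes the app exposes Dify's chat-messages endpoint under http://127.0.0.1/v1 and that you have an app API key (assumed here to come from the app's API access settings in Dify); `YOUR_DIFY_APP_API_KEY`, the user ID, and the helper names are placeholders, and the response is assumed to carry the reply in an `answer` field.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, query: str, user: str = "rag-test-user"
) -> urllib.request.Request:
    """Build (but do not send) a blocking chat request for a Dify app."""
    payload = {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",  # wait for the full answer in one response
        "user": user,                 # arbitrary end-user identifier
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(base_url: str, api_key: str, query: str) -> str:
    """Send the request and return the answer text (needs a running Dify)."""
    request = build_chat_request(base_url, api_key, query)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))["answer"]


# Building the request has no side effects; nothing is sent until ask() runs.
request = build_chat_request(
    "http://127.0.0.1/v1", "YOUR_DIFY_APP_API_KEY", "What is Milvus?"
)
print(request.get_method(), request.full_url)
# print(ask("http://127.0.0.1/v1", "YOUR_DIFY_APP_API_KEY", "What is Milvus?"))
```

If Dify runs on a remote server, replace 127.0.0.1 with the server's public IP address or domain name, as in Step 1.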