Large language models (LLMs) may lack enterprise-specific or real-time data. Retrieval-Augmented Generation (RAG) technology enhances the accuracy and relevance of model responses by providing LLMs with access to private knowledge bases. This topic describes how to develop and deploy a RAG-based application in LangStudio.
Background information
In modern information retrieval, RAG combines information retrieval with generative artificial intelligence to deliver more accurate and relevant answers in specific scenarios. For example, in specialized fields such as finance and healthcare, users often require accurate and pertinent information for decision-making. Traditional generative models excel at natural language understanding and generation but may lack accuracy in specialized knowledge. RAG models improve the accuracy and contextual relevance of answers by integrating retrieval and generation technologies. This topic uses Platform for AI (PAI) as the core platform to build a RAG-based application for fields such as finance and healthcare.
Prerequisites
LangStudio supports Faiss or Milvus as its vector database. If you want to use Milvus, you must first create a Milvus database.
Note: In most cases, Faiss is used in test environments because it does not require an additional database. In production environments, we recommend that you use Milvus, which can process larger volumes of data.
The data required for the RAG knowledge base has been uploaded to Object Storage Service (OSS).
1. (Optional) Deploy an LLM and an embedding model
The RAG-based application flow requires both an LLM and an embedding model. This section describes how to quickly deploy the required model services through Model Gallery. If you have already deployed a model service that meets your business requirements and supports OpenAI APIs, you can skip this step and use that service directly.
Choose QuickStart > Model Gallery and deploy the models for the following two scenarios.
Make sure that you select an instruction-tuned LLM. Base models cannot reliably follow user instructions to answer questions.
Select large-language-model in the Scenarios section and deploy DeepSeek-R1-Distill-Qwen-7B.

Select embedding in the Scenarios section and deploy the bge-m3 embedding model.
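After deployment, you can send a quick test request to verify that the LLM service responds. The following is a minimal sketch that assumes the service exposes an OpenAI-compatible API; the endpoint and token are placeholders that you can copy from the EAS service details page, and the model name may differ depending on your deployment.

```python
# Sanity check for the deployed LLM service.
# base_url and api_key are placeholders: replace them with the endpoint
# and token of your EAS service.
from openai import OpenAI

client = OpenAI(
    base_url="http://<your-llm-service-endpoint>/v1",  # placeholder endpoint
    api_key="<your-eas-token>",                        # placeholder token
)

response = client.chat.completions.create(
    model="DeepSeek-R1-Distill-Qwen-7B",
    messages=[{"role": "user", "content": "Briefly introduce yourself."}],
)
print(response.choices[0].message.content)
```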

2. Create a connection
The LLM and embedding connections created in this topic are based on the Elastic Algorithm Service (EAS) model services deployed in QuickStart > Model Gallery. For information about other connection types, see Create a connection.
2.1 Create an LLM connection
Go to LangStudio, select a workspace, and then choose Connection > Model Service. On the tab that appears, click New Connection to create a general LLM model service connection.

The following table describes the key parameters.
| Parameter | Description |
| --- | --- |
| Name | If you deploy a model through Model Gallery, you can obtain the model name on the model details page. To go to the model details page, click the related model card on the Model Gallery page. For more information, see the "Model service" section of Create a connection. |
| Service Provider | Select the EAS model service deployed in QuickStart > Model Gallery. |
2.2 Create an embedding connection
You can create an embedding connection by referring to 2.1 Create an LLM connection.
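Before or after you create the connection, you can verify that the embedding service responds. The following is a minimal sketch that assumes the bge-m3 service exposes an OpenAI-compatible embeddings endpoint; the endpoint and token are placeholders.

```python
# Sanity check for the deployed embedding service.
from openai import OpenAI

client = OpenAI(
    base_url="http://<your-embedding-service-endpoint>/v1",  # placeholder
    api_key="<your-eas-token>",                              # placeholder
)

result = client.embeddings.create(model="bge-m3", input=["test sentence"])
print(len(result.data[0].embedding))  # bge-m3 produces 1024-dimensional vectors
```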

2.3 Create a vector database connection
On the Application Development (LangStudio) page, choose Connection > Database. On the tab that appears, click New Connection to create a Milvus database connection.

The following table describes the key parameters.
| Parameter | Description |
| --- | --- |
| uri | The endpoint of the Milvus instance, in the format of `http://<Milvus instance endpoint>:19530`. |
| token | The username and password used to log on to the Milvus instance, in the format of `<username>:<password>`. |
| database | The name of the database. The default database is named `default`. |
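To confirm that the values above are correct, you can test the connection with the pymilvus SDK. This is a minimal sketch; the placeholders mirror the uri, token, and database parameters in the table.

```python
# Connectivity check for the Milvus instance using the pymilvus SDK.
from pymilvus import MilvusClient

client = MilvusClient(
    uri="http://<milvus-endpoint>:19530",  # same value as the uri parameter
    token="<username>:<password>",         # same value as the token parameter
    db_name="default",                     # same value as the database parameter
)
print(client.list_collections())  # succeeds only if the connection is valid
```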
3. Create a knowledge base index
You must create a knowledge base index to parse, chunk, vectorize, and store the corpus in the vector database. The following table describes the key parameters. For information about other configurations, see Create a knowledge base index.
| Parameter | Description |
| --- | --- |
| Basic Configurations | |
| Data Source OSS Path | Set the value to the OSS path of the RAG knowledge base described in Prerequisites. |
| Output OSS Path | Set the value to the path for storing intermediate results and index information generated during document parsing. Important: If you use Faiss as the vector database, the application flow saves the generated index files to OSS. If you use a default role of PAI (set Instance RAM Role to Default Roles of PAI on the Start Runtime page), the application flow can access the default storage bucket of your workspace. Therefore, we recommend that you set this parameter to a directory in the OSS bucket that serves as the storage path of the workspace. If you use a custom role, you must grant the role access permissions on OSS. We recommend that you attach the AliyunOSSFullAccess policy to the role. |
| Embedding Model and Databases | |
| Embedding Type | Select General Embedding Model. |
| Embedding Connection | Select the embedding connection created in 2.2 Create an embedding connection. |
| Vector Database Type | Select Vector Database Milvus. |
| Vector Database Connection | Select the Milvus database connection created in 2.3 Create a vector database connection. |
| Table Name | Set the value to the name of the collection in the Milvus database created in Prerequisites. |
| VPC Configuration | |
| VPC | Select the same VPC as that of the Milvus instance or a VPC that is connected to the VPC where the Milvus instance resides. |
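LangStudio performs the parsing, chunking, vectorization, and storage for you. For intuition, the following sketch shows a simplified, illustrative equivalent of these steps, not LangStudio's actual implementation; the endpoints, token, file name, and collection name are all placeholders.

```python
# Illustrative sketch of index creation: chunk a document, embed each chunk,
# and store the vectors in Milvus. All endpoints and names are placeholders.
from openai import OpenAI
from pymilvus import MilvusClient

emb = OpenAI(base_url="http://<your-embedding-service-endpoint>/v1",
             api_key="<your-eas-token>")
db = MilvusClient(uri="http://<milvus-endpoint>:19530",
                  token="<username>:<password>")

document = open("report.txt", encoding="utf-8").read()
# Fixed-size chunking without overlap, for simplicity.
chunks = [document[i:i + 500] for i in range(0, len(document), 500)]

# bge-m3 produces 1024-dimensional dense vectors.
db.create_collection(collection_name="rag_demo", dimension=1024)

vectors = emb.embeddings.create(model="bge-m3", input=chunks)
db.insert(
    collection_name="rag_demo",
    data=[
        {"id": i, "vector": v.embedding, "text": chunks[i]}
        for i, v in enumerate(vectors.data)
    ],
)
```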
4. Create and run a RAG-based application flow
Go to LangStudio, select a workspace, and then click the Application Flow tab. On the tab that appears, click Create Application Flow to create a RAG-based application flow.

On the application flow details page, click Create Runtime to create a runtime and start it. Note: Make sure that the runtime is started, because subsequent operations, such as parsing the dependencies of Python nodes and viewing More Tools, require a running runtime.

Key parameter:
VPC: Select the same VPC as that of the Milvus instance in Prerequisites or a VPC that is connected to the VPC where the Milvus instance resides.
Develop the application flow.

Retain the default settings for the nodes or configure them based on your business requirements. Use the following settings for the key nodes:
Knowledge Retrieval: retrieves text relevant to user questions from the knowledge base.
Index Name: Select the knowledge base index created in 3. Create a knowledge base index.
Top K: The number of most relevant text chunks to return.
LLM: uses the retrieved documents as context, sends the documents together with user questions to the LLM, and then generates an answer.
Model Configuration: Select the connection created in 2.1 Create an LLM connection.
Chat History: If you turn on this switch, the chat history feature is enabled, and previous conversations are used as input variables.
For more information about each component, see Develop an application flow.
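For intuition, the following sketch shows an illustrative equivalent of the two key nodes, reusing the placeholder services and collection from the previous steps: the Knowledge Retrieval node corresponds to a vector search that returns the Top K chunks, and the LLM node answers the question with the retrieved context.

```python
# Illustrative equivalent of the Knowledge Retrieval and LLM nodes.
from openai import OpenAI
from pymilvus import MilvusClient

emb = OpenAI(base_url="http://<your-embedding-service-endpoint>/v1",
             api_key="<your-eas-token>")
llm = OpenAI(base_url="http://<your-llm-service-endpoint>/v1",
             api_key="<your-eas-token>")
db = MilvusClient(uri="http://<milvus-endpoint>:19530",
                  token="<username>:<password>")

question = "What does the knowledge base say about claim processing times?"

# Knowledge Retrieval node: embed the question and return the Top K chunks.
qvec = emb.embeddings.create(model="bge-m3", input=[question]).data[0].embedding
hits = db.search(
    collection_name="rag_demo",  # placeholder collection from step 3
    data=[qvec],
    limit=3,                     # corresponds to the Top K parameter
    output_fields=["text"],
)
context = "\n\n".join(hit["entity"]["text"] for hit in hits[0])

# LLM node: send the retrieved context together with the user question.
answer = llm.chat.completions.create(
    model="DeepSeek-R1-Distill-Qwen-7B",
    messages=[{
        "role": "user",
        "content": f"Answer based on the context.\n\nContext:\n{context}\n\nQuestion: {question}",
    }],
)
print(answer.choices[0].message.content)
```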
Debug or run the application flow. Click Run in the upper-right corner of the details page to run the application flow. For common issues during application flow runtime, see FAQ.

View the traces. Click View Traces below the generated answer to view the trace details or topology view.

5. Deploy the application flow
On the development page of the application flow, click Deploy in the upper-right corner to deploy the application flow as an EAS service. Retain the default settings for the parameters or configure them based on your business requirements. Use the following settings for the key parameters:
Instances in the Resource Information section: Enter the number of service instances. Set this parameter to 1 for testing purposes. In production environments, we recommend that you configure multiple instances to mitigate the risk of a single point of failure (SPOF).
VPC in the VPC section: Select the same VPC as that of the Milvus instance or a VPC that is connected to the VPC where the Milvus instance resides.
For more information about deployment, see Deploy an application flow.
6. Call the service
After the deployment is successful, you are redirected to the Elastic Algorithm Service (EAS) page of PAI. On the Online Debugging tab, configure and send a request. The key in the request body must be consistent with the Chat Input field in the Start Node of the application flow. In this example, the default field question is used.

For more methods to call the service (such as API operations) and detailed instructions, see Call a service.
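For reference, the following sketch calls the deployed service over HTTP. The endpoint and token are placeholders that you can obtain from the EAS service details page, and the request body uses the default question field; the exact response schema depends on your flow's output node.

```python
# Minimal HTTP call to the deployed application flow.
import requests

resp = requests.post(
    "http://<your-application-service-endpoint>",   # placeholder endpoint
    headers={"Authorization": "<your-eas-token>"},  # placeholder token
    json={"question": "What is covered by the critical illness plan?"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # response schema depends on the flow's output node
```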