This topic describes AI retrieval-augmented generation (enhanced) and how to use it.
Feature description
This feature connects to popular retrieval-augmented generation (RAG) engines to automatically retrieve information before calling a large language model (LLM). It supports RAGFlow and Alibaba Cloud Model Studio knowledge bases.
Runtime properties
Plugin execution stage: Default stage.
Plugin execution priority: 400.
Configuration
Basic configuration
| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| rag_engine_type | string | Required | - | The RAG engine type. Valid values: ragflow and bailian. |
|  | string | Optional | append | Specifies how to place the RAG result. Valid values: append and replace. |
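As a quick orientation, the following minimal sketch shows only the top-level block; the engine-specific block (ragflow or bailian) described in the following sections is omitted.
rag:
  rag_engine_type: "ragflow"   # set to bailian to use a Model Studio knowledge base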
RAGFlow configuration
If you set rag.rag_engine_type=ragflow, configure the following parameters.
| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| api_key | string | Required | - | The API key for calling the RAGFlow API. To obtain the API key, go to the RAGFlow console, click your profile picture in the upper-right corner, and then choose API > RAGFlow API. |
| serviceFQDN | string | Required | - | The FQDN of the RAGFlow service in the AI gateway. |
| servicePort | string | Required | - | The port of the RAGFlow service in the AI gateway. |
| dataset_ids | list[string] | Required | - | The IDs of the RAGFlow datasets to retrieve from. |
|  | string | Optional | - | The domain name used by the AI gateway to access RAGFlow. |
| document_ids | list[string] | Optional | - | The IDs of the RAGFlow documents to retrieve from. |
| similarity_threshold | float | Optional | 0.2 | The similarity threshold. Segments with a similarity score below this threshold are filtered out. |
| top_n | integer | Optional | 30 | The number of segments with the highest similarity scores to return. Other segments are filtered out. |
| vector_similarity_weight | float | Optional | 0.3 | The weight of the vector cosine similarity. If x is the weight of the vector cosine similarity, (1 - x) is the weight of the term similarity. |
| rerank_id | string | Optional | - | The ID of the rerank model configured in the RAG engine. |
For more information about how to use the RAGFlow knowledge base, see Configure knowledge base.
For more information about the retrieval parameters for RAGFlow, see Retrieve chunks.
The following is a basic configuration example.
rag:
  rag_engine_type: "ragflow"
  ragflow:
    api_key: "xxxxxxxx"
    serviceFQDN: "xxxxxxxx"
    servicePort: 80
    dataset_ids:
      - "xxxxxxxx"
    document_ids:
      - "xxxxxxxx"
    similarity_threshold: 0.2
    top_n: 5
    vector_similarity_weight: 0.3
    rerank_id: "gte-rerank"
Model Studio knowledge base configuration
If you set rag.rag_engine_type=bailian, configure the following parameters.
| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| ak | string | Required | - | The AccessKey ID for calling Model Studio. For more information, see Member management. |
| sk | string | Required | - | The AccessKey secret for calling Model Studio. For more information, see Member management. |
| serviceFQDN | string | Required | - | The FQDN of the Model Studio service in the AI gateway. |
| workspace_id | string | Required | - | The workspace ID of Alibaba Cloud Model Studio. For more information, see Member management. |
| index_id | string | Required | - | The knowledge base ID of Alibaba Cloud Model Studio. For more information, see Knowledge Base API Guide. |
| servicePort | string | Optional | 443 | The port of the Model Studio service in the AI gateway. |
|  | string | Optional | bailian.cn-beijing.aliyuncs.com | The domain name used by the AI gateway to access Model Studio. |
| enable_reranking | bool | Optional | false | Specifies whether to enable reranking. |
| rerank_min_score | float | Optional | The similarity threshold configured for the current knowledge base | Takes effect only when reranking is enabled. Specifies the similarity threshold after reranking. Segments with a similarity score below this threshold are filtered out. |
| rerank_top_n | integer | Optional | 5 | Takes effect only when reranking is enabled. Specifies the number of top segments to return after reranking. |
|  | string | Optional | gte-rerank-hybrid | Takes effect only when reranking is enabled. Specifies the rerank model. Supported models include gte-rerank-hybrid and gte-rerank. |
|  | bool | Optional | false | Specifies whether to enable session rewriting. |
|  | string | Optional | conv-rewrite-qwen-1.8b | Takes effect only when session rewriting is enabled. Specifies the session rewriting model, which automatically adjusts the original user query based on the session context to improve retrieval results. Supported models include conv-rewrite-qwen-1.8b. |
| save_retriever_history | bool | Optional | false | Specifies whether to save historical retrieval data. |
| dense_similarity_top_k | integer | Optional | 100 | The top K for vector retrieval. Vector retrieval generates a vector for the input text and retrieves the K text segments from the knowledge base that are most similar to that vector. The sum of dense_similarity_top_k and sparse_similarity_top_k must be less than or equal to 200. |
| sparse_similarity_top_k | integer | Optional | 100 | The top K for keyword retrieval. Keyword retrieval finds segments in the knowledge base that exactly match the keywords in the input text, which helps filter out irrelevant segments and return more accurate results. The sum of dense_similarity_top_k and sparse_similarity_top_k must be less than or equal to 200. |
For more information about how to use the Model Studio knowledge base, see Instructions for operating and using the Model Studio knowledge base.
For more information about the retrieval parameters for Model Studio, see Retrieve - Retrieve knowledge index.
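For example, to tune the dense and sparse retrieval channels listed in the preceding table, you can add the two top K parameters to the bailian block shown in the example below. This is a partial sketch with illustrative values; the sum of the two values must not exceed 200.
bailian:
  dense_similarity_top_k: 120    # top K candidates from vector retrieval
  sparse_similarity_top_k: 80    # top K candidates from keyword retrieval; 120 + 80 <= 200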
The following is a basic configuration example.
rag:
  rag_engine_type: bailian
  bailian:
    ak: xxxxxxxx
    sk: xxxxxxxx
    workspace_id: xxxxxxxx
    index_id: xxxxxxxx
    serviceFQDN: xxxxxxxx.dns
    enable_reranking: true
    rerank_min_score: 0.3
    rerank_top_n: 5
    save_retriever_history: false
Procedure
RAGFlow
Create a Model API. In the AI gateway, create an AI service and a Model API for the text generation scenario. Verify that you can access the model for text-based conversations through the Model API.

Create a RAGFlow retrieval service using a fixed address as the service source. In the gateway instance console, click Service in the navigation pane on the left, and then click Create Service. Configure the parameters as follows.
If your RAGFlow instance is deployed in a container within the same VPC, you can also create a service using a container service as the source.
Service Source: Select Fixed Address.
Service Name: Enter a custom service name, such as ragflow.
Service Address: Enter the address in the IP:Port format. Set the port to 80.
TLS Mode: Keep the default setting, which is disabled.
Obtain the FQDN and port of the RAGFlow service.
Click the RAGFlow service that you created in the previous step to view its details and obtain its FQDN. The default port for RAGFlow is 80. You can use this port or configure a different one as required.
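For reference, the FQDN and port that you record in this step are the values that you later set as the plugin's serviceFQDN and servicePort parameters. The following partial sketch uses placeholder values.
ragflow:
  serviceFQDN: "xxxxxxxx"   # FQDN copied from the service details page
  servicePort: 80           # default RAGFlow port, or the port that you configured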
Obtain the required information for the RAGFlow service.
Obtain the API key. Go to the RAGFlow console. In the upper-right corner, click your profile picture. In the navigation pane on the left, select API, and then click API KEY to obtain the API key.

Obtain the Dataset ID. Go to the knowledge base page in the RAGFlow console. Click the knowledge base that you want to retrieve. The Dataset ID is the 'id' value in the page URL.

(Optional) Obtain the Document ID. Go to the knowledge base page in the RAGFlow console. Click the knowledge base that you want to retrieve, and then click the name of the document that you want to retrieve. The Document ID is the 'doc_id' value in the page URL.

Configure the plugin in the AI gateway. In the gateway instance console, choose . Click Install and Configure and then set the effective scope. Enter the required parameters obtained from the previous steps (api_key, serviceFQDN, dataset_ids, and servicePort). You can add optional parameters as needed. The configuration takes effect after you click Enable.
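A minimal plugin configuration for this step might look like the following sketch. Every value is a placeholder that you replace with the API key, FQDN, port, and dataset ID obtained in the previous steps.
rag:
  rag_engine_type: "ragflow"
  ragflow:
    api_key: "xxxxxxxx"       # RAGFlow API key
    serviceFQDN: "xxxxxxxx"   # FQDN of the RAGFlow service in the gateway
    servicePort: 80           # port of the RAGFlow service
    dataset_ids:
      - "xxxxxxxx"            # dataset ID taken from the knowledge base page URL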

Test and verify the result. In the gateway instance console, click Model API. Select the API that the plugin applies to and click Debugging. Verify the model's response now that the RAG retrieval capability is enabled.
Model Studio knowledge base
Create a Model API. In the AI gateway, create an AI service and a Model API for the text generation scenario. Verify that you can access the model for text-based conversations through the Model API.

Create a Model Studio retrieval service with a DNS domain name as the service source. In the gateway instance console, click Service in the navigation pane on the left, and then click Create Service. Configure the parameters as follows:
Service Source: Select DNS Domain Name.
Service Name: Enter a custom service name, such as bailian-rag.
Service Address: Enter the address in the DNS Domain Name:Port format. Set the port to 443.
TLS Mode: Select One-way TLS.
Obtain the FQDN of the Model Studio retrieval service. Click the Model Studio retrieval service that you created in the previous step to view its details and obtain its FQDN.

Obtain the information for the Model Studio knowledge base that you want to access.
Obtain the AccessKey and AccessKey secret. Log on to the Alibaba Cloud RAM console. For more information, see Create an AccessKey.
Note: For data security, we recommend that you create a RAM user and use that user's AccessKey ID and AccessKey secret. Ensure that the RAM user:
Is granted the AliyunBailianDataFullAccess or sfm:Retrieve operation permission. For more information, see Grant permissions to a RAM user.
Is a member of the Model Studio workspace. For more information, see Member management.
Obtain the knowledge base namespace ID. Go to the Model Studio application page. In the lower-left corner, click your account to view the space details and obtain the workspace ID. This ID serves as the namespace ID.
Obtain the knowledge base ID. Navigate to Data > Knowledge Base. Select the knowledge base that you want to use for external retrieval to obtain its ID.
Configure the plugin in the AI gateway. In the console of the gateway instance, choose . Click Install and Configure, and then configure the effective scope. Enter the required parameters obtained in the previous steps (ak, sk, serviceFQDN, workspace_id, and index_id). You can add optional parameters as needed. The configuration takes effect after you click Enable.
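A minimal plugin configuration for this step might look like the following sketch. Every value is a placeholder that you replace with the AccessKey pair, FQDN, workspace ID, and knowledge base ID obtained in the previous steps.
rag:
  rag_engine_type: bailian
  bailian:
    ak: xxxxxxxx              # AccessKey ID of the RAM user
    sk: xxxxxxxx              # AccessKey secret of the RAM user
    serviceFQDN: xxxxxxxx     # FQDN of the Model Studio retrieval service
    workspace_id: xxxxxxxx    # workspace ID of the Model Studio workspace
    index_id: xxxxxxxx        # knowledge base ID from Data > Knowledge Base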


Test and verify the result. In the gateway instance console, click Model API. Select the API affected by the plugin and click Debugging. Verify the model's response with the RAG retrieval capability enabled.