This topic describes the AI Advanced RAG feature and how to use it.
Features
This feature connects to popular RAG engines to automatically perform retrieval-augmented generation (RAG) before calling a large language model (LLM). It supports RAGFlow and the Model Studio knowledge base.
Runtime properties
-
Plugin execution stage:
Default stage. -
Plugin execution priority:
400.
Configuration
Basic configuration
|
Parameter |
Type |
Required |
Default |
Description |
|
|
string |
Required |
- |
The type of RAG engine. Valid values: |
|
|
string |
Optional |
append |
This parameter specifies how the RAG result is placed. |
RAGFlow configuration
If you set rag.rag_engine_type=ragflow, configure the following parameters.
|
Parameter |
Type |
Required |
Default |
Description |
|
|
string |
Required |
- |
The API key for calling the RAGFlow API. To obtain the API key, go to the RAGFlow console, click your profile picture in the upper-right corner, and then choose API > RAGFlow API. |
|
|
string |
Required |
- |
The service name of RAGFlow in the AI gateway. |
|
|
string |
Required |
- |
The service port of RAGFlow in the AI gateway. |
|
|
list[string] |
Required |
- |
The dataset ID to retrieve from RAGFlow. |
|
|
string |
Optional |
- |
The domain name used by the AI gateway to access RAGFlow. |
|
|
list[string] |
Optional |
- |
The document ID to retrieve from RAGFlow. |
|
|
float |
Optional |
0.2 |
The similarity threshold. Segments with a similarity score below this threshold are filtered out. |
|
|
integer |
Optional |
30 |
The maximum number of segments to return, ranked by similarity score. Segments with lower scores are filtered out. |
|
|
float |
Optional |
0.3 |
The weight of vector cosine similarity. If x represents the vector cosine similarity, |
|
|
integer |
Optional |
- |
The ID of the rerank model configured in the RAG engine. |
For more information about how to use the RAGFlow knowledge base, see Configure knowledge base.
For more information about the retrieval parameters for RAGFlow, see Retrieve chunks.
The following is a basic configuration example.
rag:
rag_engine_type: "ragflow"
ragflow:
api_key: "xxxxxxxx"
serviceFQDN: "xxxxxxxx"
servicePort: 80
dataset_ids:
- "xxxxxxxx"
document_ids:
- "xxxxxxxx"
similarity_threshold: 0.2
top_n: 5
vector_similarity_weight: 0.3
rerank_id: "gte-rerank"
Model Studio knowledge base configuration
If you set rag.rag_engine_type=bailian, configure the following parameters.
|
Parameter |
Type |
Required |
Default value |
Description |
|
|
string |
Required |
- |
The AccessKey for calling Model Studio. To obtain this value, see Member management. |
|
|
string |
Required |
- |
The AccessKey secret for calling Model Studio. To obtain this value, see Member management. |
|
|
string |
Required |
- |
The service name of the Model Studio service in the AI gateway. |
|
|
string |
Required |
- |
The ID of the Alibaba Cloud Model Studio workspace. To obtain this value, see Member management. |
|
|
string |
Required |
- |
The ID of the Alibaba Cloud Model Studio knowledge base. To obtain this value, see Knowledge Base API Guide. |
|
|
string |
Optional |
443 |
The service port of Model Studio in the AI gateway. |
|
|
string |
Optional |
bailian.cn-beijing.aliyuncs.com |
The domain name used by the AI gateway to access Model Studio. |
|
|
bool |
Optional |
false |
Specifies whether to enable rerank. |
|
|
float |
Optional |
The similarity threshold configured for the current knowledge base. |
This setting takes effect only when Rerank is enabled. It specifies the similarity threshold after reranking. Segments with a similarity score below this threshold are filtered out. The value range is |
|
|
integer |
Optional |
5 |
This parameter takes effect only when Rerank is enabled. It specifies the number of top segments to return after reranking, with a value range of |
|
|
string |
Optional |
gte-rerank-hybrid |
If rerank is enabled, this parameter specifies the rerank model. Supported models include |
|
|
bool |
Optional |
false |
Specifies whether to enable session rewriting. |
|
|
string |
Optional |
conv-rewrite-qwen-1.8b |
If session rewriting is enabled, this parameter specifies the session rewriting model. This model automatically adjusts the original user query based on the conversation history to improve retrieval results. The supported model is |
|
|
bool |
Optional |
false |
Specifies whether to save historical retrieval data. |
|
|
integer |
Optional |
100 |
Vector retrieval top-K generates a vector for the input text and retrieves from the knowledge base the K text chunks that are most similar to that vector. Value range: The sum of |
|
|
integer |
Optional |
100 |
Keyword retrieval top-K finds slices in the knowledge base that exactly match the keywords of the input text. It helps you filter out irrelevant text slices and provide more accurate results. The value range is The sum of |
For more information about how to use the Model Studio knowledge base, see Instructions for operating and using the Model Studio knowledge base.
For more information about the retrieval parameters for Model Studio, see Retrieve from a knowledge base.
The following is a basic configuration example.
rag:
rag_engine_type: bailian
bailian:
ak: xxxxxxxx
sk: xxxxxxxx
workspace_id: xxxxxxxx
index_id: xxxxxxxx
serviceFQDN: xxxxxxxx.dns
enable_reranking: true
rerank_min_score: 0.3
rerank_top_n: 5
save_retriever_history: false
Procedure
RAGFlow
-
Ensure the API can be used for text-based conversations with the model. Create a Model API. In the AI gateway, create an AI service and a Model API for the text generation scenario.
After you create the Model API, the API named bailian-llm appears in the Model API list. Its type is text generation, its domain name is
*.example.com(HTTP), and its model service is bailian.ai. You can Edit, Debug, or Delete this API from the Actions column. -
Create a RAGFlow retrieval service with **Fixed Address** as the service source. In the gateway instance console, navigate to Service > Create Service.
If your RAGFlow instance is deployed in a container within the same VPC, you can also create a service by using Container Service as the service source.
-
Service Source: Select **Fixed Address**.
-
Service Name: Enter a custom name, such as
ragflow. -
Service URL: The format is
IP:Port. Set the port to 80. -
TLS Mode: Keep it disabled (default).
-
Obtain the Ragflow service FQDN and port.
Click the RAGFlow service that you created in the previous step to view its details and obtain its FQDN. The default port for RAGFlow is 80. You can use this port or configure a different one as required.
-
Obtain the required information for the RAGFlow service.
-
Obtain the API key. Go to the RAGFlow console. In the upper-right corner, click your profile picture, and then in the left-side navigation pane, select API > API KEY to get the API key.
-
Obtain the Dataset ID. Go to the Knowledge Base page in the RAGFlow console. Click the knowledge base that you want to retrieve. The Dataset ID is the
idvalue in the page URL. -
(Optional) Obtain the Document ID. Go to the Knowledge Base page in the RAGFlow console. Click the knowledge base to retrieve, and then click the document name. The Document ID is the
doc_idvalue in the page URL.
-
-
Configure the plugin in the AI gateway. In the gateway instance console, navigate to Plug-in > Install Plug-in > AI > AI Advanced RAG, and click Install and Configure. Configure the plugin and its effective scope. Enter the required parameters (api_key, serviceFQDN, dataset_ids, and servicePort). Add optional parameters as needed. Enable the plugin to apply the configuration.
The configuration is in YAML format. Set the top-level field
rag_engine_typetoragflowand nest the other parameters under theragflownode. Optional parameters includedocument_ids,similarity_threshold(example: 0.2),top_n(example: 5),vector_similarity_weight(example: 0.3), andrerank_id(example: gte-rerank). For Effective Scope, select Instance-level. Enable the plugin and click Save. -
Debug and verify the result. In the gateway instance console, click Model API, select the target API, and click Debug. Verify the model's response to confirm that the RAG retrieval capability is active.
Model Studio knowledge base
-
Ensure the API can be used for text-based conversations with the model. Create a Model API. In the AI gateway, create an AI service and a Model API for the text generation scenario.
After you create the Model API, the API named bailian-llm appears in the Model API list. Its type is text generation, its domain name is
*.example.com(HTTP), and its model service is bailian.ai. You can Edit, Debug, or Delete this API from the Actions column. -
Create a Model Studio retrieval service with **DNS Domain Name** as the service source. In the gateway instance console, navigate to and configure the form.
-
Service Source: Select **DNS Domain Name**.
-
Service Name: Enter a custom name, such as
bailian-rag. -
Service URL: The format is
DNS domain name:Port, where the port is set to 443. -
TLS Mode: Select **One-way TLS**.
-
-
Click the retrieval service you just created to find its FQDN.
On the service details page, go to the Overview > Basic Information section and find the value of the FQDN field, for example,
bailian-rag.dns. -
Obtain the required information for the Model Studio knowledge base.
-
Obtain the AccessKey and AccessKey secret. Log on to the Alibaba Cloud RAM console and create an AccessKey. For more information, see Create an AccessKey.
NoteFor data security, we strongly recommend that you create a RAM user and use that user's AccessKey and AccessKey secret. Make sure that the RAM user meets the following requirements:
-
The RAM user must have the
AliyunBailianDataFullAccessorsfm:Retrievepermission. For more information about how to grant permissions, see Manage RAM user permissions. -
The RAM user must be added to the Model Studio workspace. For more information, see Member management.
-
-
Obtain the workspace ID. Go to the Model Studio application page. In the lower-left corner, click your account to view the workspace details and get the workspace ID.
-
Obtain the knowledge base ID. Navigate to **Data** > **Knowledge Base**. Select the target knowledge base and note its ID.
-
-
Configure the plugin in the AI gateway. In the gateway instance console, navigate to Plug-in > Install Plug-in > AI > AI Advanced RAG, and click Install and Configure. Configure the plugin and its effective scope. Enter the required parameters (ak, sk, serviceFQDN, workspace_id, and index_id). You can add optional parameters as needed. Enable the plugin to apply the configuration.
For the Effective Scope, select Model API-level plugin rules. In the plugin rule YAML editor, under the
ragnode, setrag_engine_typetobailian. In thebailiansub-node, enter the required parameters. For example, setserviceFQDNtobailian-rag.dns. In the API selection panel above the editor, select the target API to bind.After the configuration is complete, a rule entry associated with bailian-llm appears in the Model API-level plugin rules list. The rule is not enabled by default.
-
Debug and verify the result. In the gateway instance console, click Model API, select the target API, and click Debug. Verify the model's response to confirm that the RAG retrieval capability is active.