AI Advanced RAG - API Gateway - Alibaba Cloud Documentation Center

Features

This feature connects to popular RAG engines to automatically perform retrieval-augmented generation (RAG) before calling a large language model (LLM). It supports RAGFlow and the Model Studio knowledge base.

Runtime properties

Plugin execution stage: Default stage.
Plugin execution priority: 400.

Configuration

Basic configuration

Parameter	Type	Required	Default	Description
`rag.rag_engine_type`	string	Required	-	The type of RAG engine. Valid values: `ragflow` and `bailian`.
`result_placement_type`	string	Optional	append	This parameter specifies how the RAG result is placed. `replace`: Replaces the`{{higressRagResultReplacementKey}}` placeholder in the system prompt template with the RAG content. `append`: Appends the RAG content to the user template.

RAGFlow configuration

If you set rag.rag_engine_type=ragflow, configure the following parameters.

Parameter	Type	Required	Default	Description
`rag.ragflow.api_key`	string	Required	-	The API key for calling the RAGFlow API. To obtain the API key, go to the RAGFlow console, click your profile picture in the upper-right corner, and then choose API > RAGFlow API.
`rag.ragflow.serviceFQDN`	string	Required	-	The service name of RAGFlow in the AI gateway.
`rag.ragflow.servicePort`	string	Required	-	The service port of RAGFlow in the AI gateway.
`rag.ragflow.dataset_ids`	list[string]	Required	-	The dataset ID to retrieve from RAGFlow.
`rag.ragflow.serviceHost`	string	Optional	-	The domain name used by the AI gateway to access RAGFlow.
`rag.ragflow.document_ids`	list[string]	Optional	-	The document ID to retrieve from RAGFlow.
`rag.ragflow.similarity_threshold`	float	Optional	0.2	The similarity threshold. Segments with a similarity score below this threshold are filtered out.
`rag.ragflow.top_n`	integer	Optional	30	The maximum number of segments to return, ranked by similarity score. Segments with lower scores are filtered out.
`rag.ragflow.vector_similarity_weight`	float	Optional	0.3	The weight of vector cosine similarity. If x represents the vector cosine similarity, `(1-x)` represents the semantic similarity weight.
`rag.ragflow.rerank_id`	integer	Optional	-	The ID of the rerank model configured in the RAG engine.

For more information about how to use the RAGFlow knowledge base, see Configure knowledge base.

For more information about the retrieval parameters for RAGFlow, see Retrieve chunks.

The following is a basic configuration example.

rag:
  rag_engine_type: "ragflow"
  ragflow:
    api_key: "xxxxxxxx"
    serviceFQDN: "xxxxxxxx"
    servicePort: 80
    dataset_ids:
      - "xxxxxxxx"
    document_ids:
      - "xxxxxxxx"
    similarity_threshold: 0.2
    top_n: 5
    vector_similarity_weight: 0.3
    rerank_id: "gte-rerank"

Model Studio knowledge base configuration

If you set rag.rag_engine_type=bailian, configure the following parameters.

Parameter	Type	Required	Default value	Description
`rag.bailian.ak`	string	Required	-	The AccessKey for calling Model Studio. To obtain this value, see Member management.
`rag.bailian.sk`	string	Required	-	The AccessKey secret for calling Model Studio. To obtain this value, see Member management.
`rag.bailian.serviceFQDN`	string	Required	-	The service name of the Model Studio service in the AI gateway.
`rag.bailian.workspace_id`	string	Required	-	The ID of the Alibaba Cloud Model Studio workspace. To obtain this value, see Member management.
`rag.bailian.index_id`	string	Required	-	The ID of the Alibaba Cloud Model Studio knowledge base. To obtain this value, see Knowledge Base API Guide.
`rag.bailian.servicePort`	string	Optional	443	The service port of Model Studio in the AI gateway.
`rag.bailian.serviceHost`	string	Optional	bailian.cn-beijing.aliyuncs.com	The domain name used by the AI gateway to access Model Studio.
`rag.bailian.enable_reranking`	bool	Optional	false	Specifies whether to enable rerank.
`rag.bailian.rerank_min_score`	float	Optional	The similarity threshold configured for the current knowledge base.	This setting takes effect only when Rerank is enabled. It specifies the similarity threshold after reranking. Segments with a similarity score below this threshold are filtered out. The value range is `[0.01 - 1.00]`.
`rag.bailian.rerank_top_n`	integer	Optional	5	This parameter takes effect only when Rerank is enabled. It specifies the number of top segments to return after reranking, with a value range of `[1 - 20]`.
`rag.bailian.rerank_model`	string	Optional	gte-rerank-hybrid	If rerank is enabled, this parameter specifies the rerank model. Supported models include `gte-rerank-hybrid` and `gte-rerank`.
`rag.bailian.enable_rewrite`	bool	Optional	false	Specifies whether to enable session rewriting.
`rag.bailian.rewrite_model`	string	Optional	conv-rewrite-qwen-1.8b	If session rewriting is enabled, this parameter specifies the session rewriting model. This model automatically adjusts the original user query based on the conversation history to improve retrieval results. The supported model is `conv-rewrite-qwen-1.8b`.
`rag.bailian.save_retriever_history`	bool	Optional	false	Specifies whether to save historical retrieval data.
`rag.bailian.dense_similarity_top_k`	integer	Optional	100	Vector retrieval top-K generates a vector for the input text and retrieves from the knowledge base the K text chunks that are most similar to that vector. Value range: `[0 - 100]`. The sum of `dense_similarity_top_k` and `sparse_similarity_top_k` must be less than or equal to 200.
`rag.bailian.sparse_similarity_top_k`	integer	Optional	100	Keyword retrieval top-K finds slices in the knowledge base that exactly match the keywords of the input text. It helps you filter out irrelevant text slices and provide more accurate results. The value range is `[0 - 100]`. The sum of `dense_similarity_top_k` and `sparse_similarity_top_k` must be less than or equal to 200.

For more information about how to use the Model Studio knowledge base, see Instructions for operating and using the Model Studio knowledge base.

For more information about the retrieval parameters for Model Studio, see Retrieve from a knowledge base.

The following is a basic configuration example.

rag:
  rag_engine_type: bailian
  bailian:
    ak: xxxxxxxx
    sk: xxxxxxxx
    workspace_id: xxxxxxxx
    index_id: xxxxxxxx
    serviceFQDN: xxxxxxxx.dns
    enable_reranking: true
    rerank_min_score: 0.3
    rerank_top_n: 5
    save_retriever_history: false

Procedure

RAGFlow

Ensure the API can be used for text-based conversations with the model. Create a Model API. In the AI gateway, create an AI service and a Model API for the text generation scenario.

After you create the Model API, the API named bailian-llm appears in the Model API list. Its type is text generation, its domain name is *.example.com (HTTP), and its model service is bailian.ai. You can Edit, Debug, or Delete this API from the Actions column.
Create a RAGFlow retrieval service with **Fixed Address** as the service source. In the gateway instance console, navigate to Service > Create Service.

If your RAGFlow instance is deployed in a container within the same VPC, you can also create a service by using Container Service as the service source.
- Service Source: Select **Fixed Address**.
- Service Name: Enter a custom name, such as ragflow.
- Service URL: The format is IP:Port. Set the port to 80.
- TLS Mode: Keep it disabled (default).
Obtain the Ragflow service FQDN and port.

Click the RAGFlow service that you created in the previous step to view its details and obtain its FQDN. The default port for RAGFlow is 80. You can use this port or configure a different one as required.
Obtain the required information for the RAGFlow service.
1. Obtain the API key. Go to the RAGFlow console. In the upper-right corner, click your profile picture, and then in the left-side navigation pane, select API > API KEY to get the API key.
2. Obtain the Dataset ID. Go to the Knowledge Base page in the RAGFlow console. Click the knowledge base that you want to retrieve. The Dataset ID is the id value in the page URL.
3. (Optional) Obtain the Document ID. Go to the Knowledge Base page in the RAGFlow console. Click the knowledge base to retrieve, and then click the document name. The Document ID is the doc_id value in the page URL.
Configure the plugin in the AI gateway. In the gateway instance console, navigate to Plug-in > Install Plug-in > AI > AI Advanced RAG, and click Install and Configure. Configure the plugin and its effective scope. Enter the required parameters (api_key, serviceFQDN, dataset_ids, and servicePort). Add optional parameters as needed. Enable the plugin to apply the configuration.

The configuration is in YAML format. Set the top-level field rag_engine_type to ragflow and nest the other parameters under the ragflow node. Optional parameters include document_ids, similarity_threshold (example: 0.2), top_n (example: 5), vector_similarity_weight (example: 0.3), and rerank_id (example: gte-rerank). For Effective Scope, select Instance-level. Enable the plugin and click Save.
Debug and verify the result. In the gateway instance console, click Model API, select the target API, and click Debug. Verify the model's response to confirm that the RAG retrieval capability is active.

Model Studio knowledge base

Ensure the API can be used for text-based conversations with the model. Create a Model API. In the AI gateway, create an AI service and a Model API for the text generation scenario.

After you create the Model API, the API named bailian-llm appears in the Model API list. Its type is text generation, its domain name is *.example.com (HTTP), and its model service is bailian.ai. You can Edit, Debug, or Delete this API from the Actions column.
Create a Model Studio retrieval service with **DNS Domain Name** as the service source. In the gateway instance console, navigate to Service > Create Service and configure the form.
- Service Source: Select **DNS Domain Name**.
- Service Name: Enter a custom name, such as bailian-rag.
- Service URL: The format is DNS domain name:Port, where the port is set to 443.
- TLS Mode: Select **One-way TLS**.
Click the retrieval service you just created to find its FQDN.

On the service details page, go to the Overview > Basic Information section and find the value of the FQDN field, for example, bailian-rag.dns.
Obtain the required information for the Model Studio knowledge base.
1. Obtain the AccessKey and AccessKey secret. Log on to the Alibaba Cloud RAM console and create an AccessKey. For more information, see Create an AccessKey.
  Note
  For data security, we strongly recommend that you create a RAM user and use that user's AccessKey and AccessKey secret. Make sure that the RAM user meets the following requirements:
  - The RAM user must have the AliyunBailianDataFullAccess or sfm:Retrieve permission. For more information about how to grant permissions, see Manage RAM user permissions.
  - The RAM user must be added to the Model Studio workspace. For more information, see Member management.
2. Obtain the workspace ID. Go to the Model Studio application page. In the lower-left corner, click your account to view the workspace details and get the workspace ID.
3. Obtain the knowledge base ID. Navigate to **Data** > **Knowledge Base**. Select the target knowledge base and note its ID.
Configure the plugin in the AI gateway. In the gateway instance console, navigate to Plug-in > Install Plug-in > AI > AI Advanced RAG, and click Install and Configure. Configure the plugin and its effective scope. Enter the required parameters (ak, sk, serviceFQDN, workspace_id, and index_id). You can add optional parameters as needed. Enable the plugin to apply the configuration.

For the Effective Scope, select Model API-level plugin rules. In the plugin rule YAML editor, under the rag node, set rag_engine_type to bailian. In the bailian sub-node, enter the required parameters. For example, set serviceFQDN to bailian-rag.dns. In the API selection panel above the editor, select the target API to bind.

After the configuration is complete, a rule entry associated with bailian-llm appears in the Model API-level plugin rules list. The rule is not enabled by default.
Debug and verify the result. In the gateway instance console, click Model API, select the target API, and click Debug. Verify the model's response to confirm that the RAG retrieval capability is active.