AI retrieval-augmented generation (enhanced) - API Gateway

This topic describes AI retrieval-augmented generation (enhanced) and how to use it.

Feature description

This feature connects to popular retrieval-augmented generation (RAG) engines to automatically retrieve information before calling a large language model (LLM). This feature supports RAGFlow and Model Studio knowledge bases.

Runtime properties

Plugin execution stage: Default stage.
Plugin execution priority: 400.

Configuration

Basic configuration

Name	Data type	Required	Default value	Description
`rag.rag_engine_type`	string	Required	-	The RAG engine type. Supported enumeration types include ragflow and bailian.
`result_placement_type`	string	Optional	append	Specifies how to place the RAG result. If you set this parameter to replace, the `{{higressRagResultReplacementKey}}` placeholder in the system prompt template is replaced with the content generated by RAG. If you set this parameter to append, the content generated by RAG is appended to the user template. Supported enumeration types include append and replace.

RAGFlow configuration

If you set rag.rag_engine_type=ragflow, configure the following parameters.

Name	Data type	Required	Default value	Description
`rag.ragflow.api_key`	string	Required	-	The API key for calling the RAGFlow API. To obtain the API key, go to the RAGFlow console, click your profile picture in the upper-right corner, and then choose API > RAGFlow API.
`rag.ragflow.serviceFQDN`	string	Required	-	The service name of RAGFlow in the AI gateway.
`rag.ragflow.servicePort`	string	Required	-	The service port of RAGFlow in the AI gateway.
`rag.ragflow.dataset_ids`	list[string]	Required	-	The ID of the dataset to retrieve from RAGFlow.
`rag.ragflow.serviceHost`	string	Optional	-	The domain name used by the AI gateway to access RAGFlow.
`rag.ragflow.document_ids`	list[string]	Optional	-	The ID of the document to retrieve from RAGFlow.
`rag.ragflow.similarity_threshold`	float	Optional	0.2	The similarity threshold. Segments with a similarity score below this threshold are filtered out.
`rag.ragflow.top_n`	integer	Optional	30	The number of segments with the highest similarity scores to return. Other segments are filtered out.
`rag.ragflow.vector_similarity_weight`	float	Optional	0.3	The weight of the vector cosine similarity. If x represents the vector cosine similarity, `(1-x)` represents the weight of the semantic similarity.
`rag.ragflow.rerank_id`	integer	Optional	-	The ID of the rerank model configured in the RAG engine.

For more information about how to use the RAGFlow knowledge base, see Configure knowledge base.

For more information about the retrieval parameters for RAGFlow, see Retrieve chunks.

The following is a basic configuration example.

rag:
  rag_engine_type: "ragflow"
  ragflow:
    api_key: "xxxxxxxx"
    serviceFQDN: "xxxxxxxx"
    servicePort: 80
    dataset_ids:
      - "xxxxxxxx"
    document_ids:
      - "xxxxxxxx"
    similarity_threshold: 0.2
    top_n: 5
    vector_similarity_weight: 0.3
    rerank_id: "gte-rerank"

Model Studio knowledge base configuration

If you set rag.rag_engine_type=bailian, configure the following parameters.

Name	Data type	Required	Default value	Description
`rag.bailian.ak`	string	Required	-	The access key to call Model Studio. For more information, see Member management.
`rag.bailian.sk`	string	Required	-	The AccessKey secret for calling Model Studio. For more information, see Member management.
`rag.bailian.serviceFQDN`	string	Required	-	The service name of the Model Studio service in the AI gateway.
`rag.bailian.workspace_id`	string	Required	-	The workspace ID of Alibaba Cloud Model Studio. For more information, see Member management.
`rag.bailian.index_id`	string	Required	-	The knowledge base ID of Alibaba Cloud Model Studio. For more information, see Knowledge Base API Guide.
`rag.bailian.servicePort`	string	Optional	443	The service port of Model Studio in the AI gateway.
`rag.bailian.serviceHost`	string	Optional	bailian.cn-beijing.aliyuncs.com	The domain name used by the AI gateway to access Model Studio.
`rag.bailian.enable_reranking`	bool	Optional	false	Specifies whether to enable reranking.
`rag.bailian.rerank_min_score`	float	Optional	The similarity threshold configured for the current knowledge base	This parameter takes effect only when reranking is enabled. It specifies the similarity threshold after reranking. Segments with a similarity score below this threshold are filtered out. Valid values: `[0.01 - 1.00]`.
`rag.bailian.rerank_top_n`	integer	Optional	5	This parameter takes effect only when reranking is enabled. It specifies the number of top_n segments to return after reranking. Valid values: `[1 - 20]`.
`rag.bailian.rerank_model`	string	Optional	gte-rerank-hybrid	This parameter takes effect only when reranking is enabled. It specifies the rerank model. Supported models include gte-rerank-hybrid and gte-rerank.
`rag.bailian.enable_rewrite`	bool	Optional	false	Specifies whether to enable session rewriting.
`rag.bailian.rewrite_model`	string	Optional	conv-rewrite-qwen-1.8b	This parameter takes effect only when session rewriting is enabled. It specifies the name of the session rewriting model. This model automatically adjusts the original user query based on the session context to improve retrieval results. Supported models include conv-rewrite-qwen-1.8b.
`rag.bailian.save_retriever_history`	bool	Optional	false	Specifies whether to save historical retrieval data.
`rag.bailian.dense_similarity_top_k`	integer	Optional	100	The number of top K results for vector retrieval. This feature generates a vector for the input text and retrieves the K text segments from the knowledge base that are most similar to the input vector. Valid values: `[0 - 100]`. The sum of dense_similarity_top_k and sparse_similarity_top_k must be less than or equal to 200.
`rag.bailian.sparse_similarity_top_k`	integer	Optional	100	The number of top K results for keyword retrieval. This feature finds segments in the knowledge base that exactly match the keywords in the input text. It helps filter out irrelevant text segments and provides more accurate results. Valid values: `[0 - 100]`. The sum of dense_similarity_top_k and sparse_similarity_top_k must be less than or equal to 200.

For more information about how to use the Model Studio knowledge base, see Instructions for operating and using the Model Studio knowledge base.

For more information about the retrieval parameters for Model Studio, see Retrieve - Retrieve knowledge index.

The following is a basic configuration example.

rag:
  rag_engine_type: bailian
  bailian:
    ak: xxxxxxxx
    sk: xxxxxxxx
    workspace_id: xxxxxxxx
    index_id: xxxxxxxx
    serviceFQDN: xxxxxxxx.dns
    enable_reranking: true
    rerank_min_score: 0.3
    rerank_top_n: 5
    save_retriever_history: false

Procedure

RAGFlow

Create a Model API. In the AI gateway, create an AI service and a Model API for the text generation scenario. Verify that you can access the model for text-based conversations through the Model API.
Create a RAGFlow retrieval service using a fixed address as the service source. In the gateway instance console, click Service in the navigation pane on the left, and then click Create Service. Configure the parameters as follows.
If your RAGFlow instance is deployed in a container within the same VPC, you can also create a service using a container service as the source.
- Service Source: Select Fixed Address.
- Service Name: Enter a custom service name, such as ragflow.
- Service Address: Enter the address in the IP:Port format. Set the port to 80.
- TLS Mode: Keep the default setting, which is disabled.
Obtain the FQDN and port of the RAGFlow service.
Click the RAGFlow service that you created in the previous step to view its details and obtain its FQDN. The default port for RAGFlow is 80. You can use this port or configure a different one as required.
Obtain the required information for the RAGFlow service.
1. Obtain the API key. Go to the RAGFlow console. In the upper-right corner, click your profile picture. In the navigation pane on the left, select API, and then click API KEY to obtain the API key.
2. Obtain the Dataset ID. Go to the knowledge base page in the RAGFlow console. Click the knowledge base that you want to retrieve. The Dataset ID is the 'id' value in the page URL.
3. (Optional) Obtain the Document ID. Go to the knowledge base page in the RAGFlow console. Click the knowledge base that you want to retrieve, and then click the name of the document that you want to retrieve. The Document ID is the 'doc_id' value in the page URL.
Configure the plugin in the AI gateway. In the gateway instance console, choose Plug-in > Install Plug-in > AI > Al Advanced RAG. Click Install and Configure and then set the effective scope. Enter the required parameters obtained from the previous steps (api_key, serviceFQDN, dataset_ids, and servicePort). You can add optional parameters as needed. The configuration takes effect after you click Enable.
Test and verify the result. In the gateway instance console, click Model API. Select the API that the plugin applies to and click Debugging. Verify the model's response now that the RAG retrieval capability is enabled.

Model Studio knowledge base

Create a Model API. In the AI gateway, create an AI service and a Model API for the text generation scenario. Verify that you can access the model for text-based conversations through the Model API.
Create a Model Studio retrieval service with a DNS domain name as the service source. In the console of the gateway instance, navigate to Service > Create Service. Configure the parameters as follows:
- Service Source: Select DNS Domain Name.
- Service Name: Enter a custom service name, such as bailian-rag.
- Service Address: Enter the address in the DNS Domain Name:Port format. Set the port to 443.
- TLS Mode: Select One-way TLS.
Obtain the FQDN of the Model Studio retrieval service. Click the Model Studio retrieval service that you created in the previous step to view its details and obtain its FQDN.
Obtain the information for the Model Studio knowledge base that you want to access.
1. Obtain the AccessKey and AccessKey secret. Log on to the Alibaba Cloud RAM console. For more information, see Create an AccessKey.
  Note
  For data security, we recommend that you create a RAM user and use that user's AccessKey and AccessKey secret. Ensure that the RAM user:
  - Is granted the AliyunBailianDataFullAccess or sfm:Retrieve operation permission. For more information, see Grant permissions to a RAM user.
  - Is a member of the Model Studio workspace. For more information, see Member management.
2. Obtain the knowledge base namespace ID. Go to the Model Studio application page. In the lower-left corner, click your account to view the space details and obtain the workspace ID. This ID serves as the namespace ID.
3. Obtain the knowledge base ID. Navigate to Data > Knowledge Base. Select the knowledge base that you want to use for external retrieval to obtain its ID.
Configure the plugin in the AI gateway. In the console of the gateway instance, choose Plug-in > Install Plug-in > AI > Al Advanced RAG. Click Install And Configure, and then configure the effective scope. Enter the required parameters obtained in the previous steps (ak, sk, serviceFQDN, workspace_id, and index_id). You can add optional parameters as needed. As shown in the following figures, after you click Enable, the configuration takes effect.
Test and verify the result. In the gateway instance console, click Model API. Select the API affected by the plugin and click Debugging. Verify the model's response with the RAG retrieval capability enabled.