All Products
Search
Document Center

API Gateway:AI retrieval-augmented generation (enhanced)

Last Updated:Dec 03, 2025

This topic describes AI retrieval-augmented generation (enhanced) and how to use it.

Feature description

This feature connects to popular retrieval-augmented generation (RAG) engines to automatically retrieve information before calling a large language model (LLM). This feature supports RAGFlow and Model Studio knowledge bases.

Runtime properties

  • Plugin execution stage: Default stage.

  • Plugin execution priority: 400.

Configuration

Basic configuration

Name

Data type

Required

Default value

Description

rag.rag_engine_type

string

Required

-

The RAG engine type. Supported enumeration types include ragflow and bailian.

result_placement_type

string

Optional

append

Specifies how to place the RAG result. If you set this parameter to replace, the {{higressRagResultReplacementKey}} placeholder in the system prompt template is replaced with the content generated by RAG. If you set this parameter to append, the content generated by RAG is appended to the user template. Supported enumeration types include append and replace.

RAGFlow configuration

If you set rag.rag_engine_type=ragflow, configure the following parameters.

Name

Data type

Required

Default value

Description

rag.ragflow.api_key

string

Required

-

The API key for calling the RAGFlow API. To obtain the API key, go to the RAGFlow console, click your profile picture in the upper-right corner, and then choose API > RAGFlow API.

rag.ragflow.serviceFQDN

string

Required

-

The service name of RAGFlow in the AI gateway.

rag.ragflow.servicePort

string

Required

-

The service port of RAGFlow in the AI gateway.

rag.ragflow.dataset_ids

list[string]

Required

-

The ID of the dataset to retrieve from RAGFlow.

rag.ragflow.serviceHost

string

Optional

-

The domain name used by the AI gateway to access RAGFlow.

rag.ragflow.document_ids

list[string]

Optional

-

The ID of the document to retrieve from RAGFlow.

rag.ragflow.similarity_threshold

float

Optional

0.2

The similarity threshold. Segments with a similarity score below this threshold are filtered out.

rag.ragflow.top_n

integer

Optional

30

The number of segments with the highest similarity scores to return. Other segments are filtered out.

rag.ragflow.vector_similarity_weight

float

Optional

0.3

The weight of the vector cosine similarity. If x represents the vector cosine similarity, (1-x) represents the weight of the semantic similarity.

rag.ragflow.rerank_id

integer

Optional

-

The ID of the rerank model configured in the RAG engine.

For more information about how to use the RAGFlow knowledge base, see Configure knowledge base.

For more information about the retrieval parameters for RAGFlow, see Retrieve chunks.

The following is a basic configuration example.

rag:
  rag_engine_type: "ragflow"
  ragflow:
    api_key: "xxxxxxxx"
    serviceFQDN: "xxxxxxxx"
    servicePort: 80
    dataset_ids:
      - "xxxxxxxx"
    document_ids:
      - "xxxxxxxx"
    similarity_threshold: 0.2
    top_n: 5
    vector_similarity_weight: 0.3
    rerank_id: "gte-rerank"

Model Studio knowledge base configuration

If you set rag.rag_engine_type=bailian, configure the following parameters.

Name

Data type

Required

Default value

Description

rag.bailian.ak

string

Required

-

The access key to call Model Studio. For more information, see Member management.

rag.bailian.sk

string

Required

-

The AccessKey secret for calling Model Studio. For more information, see Member management.

rag.bailian.serviceFQDN

string

Required

-

The service name of the Model Studio service in the AI gateway.

rag.bailian.workspace_id

string

Required

-

The workspace ID of Alibaba Cloud Model Studio. For more information, see Member management.

rag.bailian.index_id

string

Required

-

The knowledge base ID of Alibaba Cloud Model Studio. For more information, see Knowledge Base API Guide.

rag.bailian.servicePort

string

Optional

443

The service port of Model Studio in the AI gateway.

rag.bailian.serviceHost

string

Optional

bailian.cn-beijing.aliyuncs.com

The domain name used by the AI gateway to access Model Studio.

rag.bailian.enable_reranking

bool

Optional

false

Specifies whether to enable reranking.

rag.bailian.rerank_min_score

float

Optional

The similarity threshold configured for the current knowledge base

This parameter takes effect only when reranking is enabled. It specifies the similarity threshold after reranking. Segments with a similarity score below this threshold are filtered out. Valid values: [0.01 - 1.00].

rag.bailian.rerank_top_n

integer

Optional

5

This parameter takes effect only when reranking is enabled. It specifies the number of top_n segments to return after reranking. Valid values: [1 - 20].

rag.bailian.rerank_model

string

Optional

gte-rerank-hybrid

This parameter takes effect only when reranking is enabled. It specifies the rerank model. Supported models include gte-rerank-hybrid and gte-rerank.

rag.bailian.enable_rewrite

bool

Optional

false

Specifies whether to enable session rewriting.

rag.bailian.rewrite_model

string

Optional

conv-rewrite-qwen-1.8b

This parameter takes effect only when session rewriting is enabled. It specifies the name of the session rewriting model. This model automatically adjusts the original user query based on the session context to improve retrieval results. Supported models include conv-rewrite-qwen-1.8b.

rag.bailian.save_retriever_history

bool

Optional

false

Specifies whether to save historical retrieval data.

rag.bailian.dense_similarity_top_k

integer

Optional

100

The number of top K results for vector retrieval. This feature generates a vector for the input text and retrieves the K text segments from the knowledge base that are most similar to the input vector. Valid values: [0 - 100].

The sum of dense_similarity_top_k and sparse_similarity_top_k must be less than or equal to 200.

rag.bailian.sparse_similarity_top_k

integer

Optional

100

The number of top K results for keyword retrieval. This feature finds segments in the knowledge base that exactly match the keywords in the input text. It helps filter out irrelevant text segments and provides more accurate results. Valid values: [0 - 100].

The sum of dense_similarity_top_k and sparse_similarity_top_k must be less than or equal to 200.

For more information about how to use the Model Studio knowledge base, see Instructions for operating and using the Model Studio knowledge base.

For more information about the retrieval parameters for Model Studio, see Retrieve - Retrieve knowledge index.

The following is a basic configuration example.

rag:
  rag_engine_type: bailian
  bailian:
    ak: xxxxxxxx
    sk: xxxxxxxx
    workspace_id: xxxxxxxx
    index_id: xxxxxxxx
    serviceFQDN: xxxxxxxx.dns
    enable_reranking: true
    rerank_min_score: 0.3
    rerank_top_n: 5
    save_retriever_history: false

Procedure

RAGFlow

  1. Create a Model API. In the AI gateway, create an AI service and a Model API for the text generation scenario. Verify that you can access the model for text-based conversations through the Model API.

    image.png

  2. Create a RAGFlow retrieval service using a fixed address as the service source. In the gateway instance console, click Service in the navigation pane on the left, and then click Create Service. Configure the parameters as follows.

    If your RAGFlow instance is deployed in a container within the same VPC, you can also create a service using a container service as the source.
    • Service Source: Select Fixed Address.

    • Service Name: Enter a custom service name, such as ragflow.

    • Service Address: Enter the address in the IP:Port format. Set the port to 80.

    • TLS Mode: Keep the default setting, which is disabled.

  3. Obtain the FQDN and port of the RAGFlow service.

    Click the RAGFlow service that you created in the previous step to view its details and obtain its FQDN. The default port for RAGFlow is 80. You can use this port or configure a different one as required.

  4. Obtain the required information for the RAGFlow service.

    1. Obtain the API key. Go to the RAGFlow console. In the upper-right corner, click your profile picture. In the navigation pane on the left, select API, and then click API KEY to obtain the API key.

      image.png

    2. Obtain the Dataset ID. Go to the knowledge base page in the RAGFlow console. Click the knowledge base that you want to retrieve. The Dataset ID is the 'id' value in the page URL.

      image.png

    3. (Optional) Obtain the Document ID. Go to the knowledge base page in the RAGFlow console. Click the knowledge base that you want to retrieve, and then click the name of the document that you want to retrieve. The Document ID is the 'doc_id' value in the page URL.

      image.png

  5. Configure the plugin in the AI gateway. In the gateway instance console, choose Plug-in > Install Plug-in > AI > Al Advanced RAG. Click Install and Configure and then set the effective scope. Enter the required parameters obtained from the previous steps (api_key, serviceFQDN, dataset_ids, and servicePort). You can add optional parameters as needed. The configuration takes effect after you click Enable.

    image.png

  6. Test and verify the result. In the gateway instance console, click Model API. Select the API that the plugin applies to and click Debugging. Verify the model's response now that the RAG retrieval capability is enabled.

Model Studio knowledge base

  1. Create a Model API. In the AI gateway, create an AI service and a Model API for the text generation scenario. Verify that you can access the model for text-based conversations through the Model API.

    image.png

  2. Create a Model Studio retrieval service with a DNS domain name as the service source. In the console of the gateway instance, navigate to Service > Create Service. Configure the parameters as follows:

    • Service Source: Select DNS Domain Name.

    • Service Name: Enter a custom service name, such as bailian-rag.

    • Service Address: Enter the address in the DNS Domain Name:Port format. Set the port to 443.

    • TLS Mode: Select One-way TLS.

  3. Obtain the FQDN of the Model Studio retrieval service. Click the Model Studio retrieval service that you created in the previous step to view its details and obtain its FQDN.

    image.png

  4. Obtain the information for the Model Studio knowledge base that you want to access.

    1. Obtain the AccessKey and AccessKey secret. Log on to the Alibaba Cloud RAM console. For more information, see Create an AccessKey.

      Note

      For data security, we recommend that you create a RAM user and use that user's AccessKey and AccessKey secret. Ensure that the RAM user:

    2. Obtain the knowledge base namespace ID. Go to the Model Studio application page. In the lower-left corner, click your account to view the space details and obtain the workspace ID. This ID serves as the namespace ID.

    3. Obtain the knowledge base ID. Navigate to Data > Knowledge Base. Select the knowledge base that you want to use for external retrieval to obtain its ID.

  5. Configure the plugin in the AI gateway. In the console of the gateway instance, choose Plug-in > Install Plug-in > AI > Al Advanced RAG. Click Install And Configure, and then configure the effective scope. Enter the required parameters obtained in the previous steps (ak, sk, serviceFQDN, workspace_id, and index_id). You can add optional parameters as needed. As shown in the following figures, after you click Enable, the configuration takes effect.

    image.png

    image.png

  6. Test and verify the result. In the gateway instance console, click Model API. Select the API affected by the plugin and click Debugging. Verify the model's response with the RAG retrieval capability enabled.