All Products
Search
Document Center

API Gateway:AI Advanced RAG

Last Updated:Jun 21, 2026

This topic describes the AI Advanced RAG feature and how to use it.

Features

This feature connects to popular RAG engines to automatically perform retrieval-augmented generation (RAG) before calling a large language model (LLM). It supports RAGFlow and the Model Studio knowledge base.

Runtime properties

  • Plugin execution stage: Default stage.

  • Plugin execution priority: 400.

Configuration

Basic configuration

Parameter

Type

Required

Default

Description

rag.rag_engine_type

string

Required

-

The type of RAG engine. Valid values: ragflow and bailian.

result_placement_type

string

Optional

append

This parameter specifies how the RAG result is placed. replace: Replaces the{{higressRagResultReplacementKey}} placeholder in the system prompt template with the RAG content. append: Appends the RAG content to the user template.

RAGFlow configuration

If you set rag.rag_engine_type=ragflow, configure the following parameters.

Parameter

Type

Required

Default

Description

rag.ragflow.api_key

string

Required

-

The API key for calling the RAGFlow API. To obtain the API key, go to the RAGFlow console, click your profile picture in the upper-right corner, and then choose API > RAGFlow API.

rag.ragflow.serviceFQDN

string

Required

-

The service name of RAGFlow in the AI gateway.

rag.ragflow.servicePort

string

Required

-

The service port of RAGFlow in the AI gateway.

rag.ragflow.dataset_ids

list[string]

Required

-

The dataset ID to retrieve from RAGFlow.

rag.ragflow.serviceHost

string

Optional

-

The domain name used by the AI gateway to access RAGFlow.

rag.ragflow.document_ids

list[string]

Optional

-

The document ID to retrieve from RAGFlow.

rag.ragflow.similarity_threshold

float

Optional

0.2

The similarity threshold. Segments with a similarity score below this threshold are filtered out.

rag.ragflow.top_n

integer

Optional

30

The maximum number of segments to return, ranked by similarity score. Segments with lower scores are filtered out.

rag.ragflow.vector_similarity_weight

float

Optional

0.3

The weight of vector cosine similarity. If x represents the vector cosine similarity, (1-x) represents the semantic similarity weight.

rag.ragflow.rerank_id

integer

Optional

-

The ID of the rerank model configured in the RAG engine.

For more information about how to use the RAGFlow knowledge base, see Configure knowledge base.

For more information about the retrieval parameters for RAGFlow, see Retrieve chunks.

The following is a basic configuration example.

rag:
  rag_engine_type: "ragflow"
  ragflow:
    api_key: "xxxxxxxx"
    serviceFQDN: "xxxxxxxx"
    servicePort: 80
    dataset_ids:
      - "xxxxxxxx"
    document_ids:
      - "xxxxxxxx"
    similarity_threshold: 0.2
    top_n: 5
    vector_similarity_weight: 0.3
    rerank_id: "gte-rerank"

Model Studio knowledge base configuration

If you set rag.rag_engine_type=bailian, configure the following parameters.

Parameter

Type

Required

Default value

Description

rag.bailian.ak

string

Required

-

The AccessKey for calling Model Studio. To obtain this value, see Member management.

rag.bailian.sk

string

Required

-

The AccessKey secret for calling Model Studio. To obtain this value, see Member management.

rag.bailian.serviceFQDN

string

Required

-

The service name of the Model Studio service in the AI gateway.

rag.bailian.workspace_id

string

Required

-

The ID of the Alibaba Cloud Model Studio workspace. To obtain this value, see Member management.

rag.bailian.index_id

string

Required

-

The ID of the Alibaba Cloud Model Studio knowledge base. To obtain this value, see Knowledge Base API Guide.

rag.bailian.servicePort

string

Optional

443

The service port of Model Studio in the AI gateway.

rag.bailian.serviceHost

string

Optional

bailian.cn-beijing.aliyuncs.com

The domain name used by the AI gateway to access Model Studio.

rag.bailian.enable_reranking

bool

Optional

false

Specifies whether to enable rerank.

rag.bailian.rerank_min_score

float

Optional

The similarity threshold configured for the current knowledge base.

This setting takes effect only when Rerank is enabled. It specifies the similarity threshold after reranking. Segments with a similarity score below this threshold are filtered out. The value range is [0.01 - 1.00].

rag.bailian.rerank_top_n

integer

Optional

5

This parameter takes effect only when Rerank is enabled. It specifies the number of top segments to return after reranking, with a value range of [1 - 20].

rag.bailian.rerank_model

string

Optional

gte-rerank-hybrid

If rerank is enabled, this parameter specifies the rerank model. Supported models include gte-rerank-hybrid and gte-rerank.

rag.bailian.enable_rewrite

bool

Optional

false

Specifies whether to enable session rewriting.

rag.bailian.rewrite_model

string

Optional

conv-rewrite-qwen-1.8b

If session rewriting is enabled, this parameter specifies the session rewriting model. This model automatically adjusts the original user query based on the conversation history to improve retrieval results. The supported model is conv-rewrite-qwen-1.8b.

rag.bailian.save_retriever_history

bool

Optional

false

Specifies whether to save historical retrieval data.

rag.bailian.dense_similarity_top_k

integer

Optional

100

Vector retrieval top-K generates a vector for the input text and retrieves from the knowledge base the K text chunks that are most similar to that vector. Value range: [0 - 100].

The sum of dense_similarity_top_k and sparse_similarity_top_k must be less than or equal to 200.

rag.bailian.sparse_similarity_top_k

integer

Optional

100

Keyword retrieval top-K finds slices in the knowledge base that exactly match the keywords of the input text. It helps you filter out irrelevant text slices and provide more accurate results. The value range is [0 - 100].

The sum of dense_similarity_top_k and sparse_similarity_top_k must be less than or equal to 200.

For more information about how to use the Model Studio knowledge base, see Instructions for operating and using the Model Studio knowledge base.

For more information about the retrieval parameters for Model Studio, see Retrieve from a knowledge base.

The following is a basic configuration example.

rag:
  rag_engine_type: bailian
  bailian:
    ak: xxxxxxxx
    sk: xxxxxxxx
    workspace_id: xxxxxxxx
    index_id: xxxxxxxx
    serviceFQDN: xxxxxxxx.dns
    enable_reranking: true
    rerank_min_score: 0.3
    rerank_top_n: 5
    save_retriever_history: false

Procedure

RAGFlow

  1. Ensure the API can be used for text-based conversations with the model. Create a Model API. In the AI gateway, create an AI service and a Model API for the text generation scenario.

    After you create the Model API, the API named bailian-llm appears in the Model API list. Its type is text generation, its domain name is *.example.com (HTTP), and its model service is bailian.ai. You can Edit, Debug, or Delete this API from the Actions column.

  2. Create a RAGFlow retrieval service with **Fixed Address** as the service source. In the gateway instance console, navigate to Service > Create Service.

    If your RAGFlow instance is deployed in a container within the same VPC, you can also create a service by using Container Service as the service source.
    • Service Source: Select **Fixed Address**.

    • Service Name: Enter a custom name, such as ragflow.

    • Service URL: The format is IP:Port. Set the port to 80.

    • TLS Mode: Keep it disabled (default).

  3. Obtain the Ragflow service FQDN and port.

    Click the RAGFlow service that you created in the previous step to view its details and obtain its FQDN. The default port for RAGFlow is 80. You can use this port or configure a different one as required.

  4. Obtain the required information for the RAGFlow service.

    1. Obtain the API key. Go to the RAGFlow console. In the upper-right corner, click your profile picture, and then in the left-side navigation pane, select API > API KEY to get the API key.

    2. Obtain the Dataset ID. Go to the Knowledge Base page in the RAGFlow console. Click the knowledge base that you want to retrieve. The Dataset ID is the id value in the page URL.

    3. (Optional) Obtain the Document ID. Go to the Knowledge Base page in the RAGFlow console. Click the knowledge base to retrieve, and then click the document name. The Document ID is the doc_id value in the page URL.

  5. Configure the plugin in the AI gateway. In the gateway instance console, navigate to Plug-in > Install Plug-in > AI > AI Advanced RAG, and click Install and Configure. Configure the plugin and its effective scope. Enter the required parameters (api_key, serviceFQDN, dataset_ids, and servicePort). Add optional parameters as needed. Enable the plugin to apply the configuration.

    The configuration is in YAML format. Set the top-level field rag_engine_type to ragflow and nest the other parameters under the ragflow node. Optional parameters include document_ids, similarity_threshold (example: 0.2), top_n (example: 5), vector_similarity_weight (example: 0.3), and rerank_id (example: gte-rerank). For Effective Scope, select Instance-level. Enable the plugin and click Save.

  6. Debug and verify the result. In the gateway instance console, click Model API, select the target API, and click Debug. Verify the model's response to confirm that the RAG retrieval capability is active.

Model Studio knowledge base

  1. Ensure the API can be used for text-based conversations with the model. Create a Model API. In the AI gateway, create an AI service and a Model API for the text generation scenario.

    After you create the Model API, the API named bailian-llm appears in the Model API list. Its type is text generation, its domain name is *.example.com (HTTP), and its model service is bailian.ai. You can Edit, Debug, or Delete this API from the Actions column.

  2. Create a Model Studio retrieval service with **DNS Domain Name** as the service source. In the gateway instance console, navigate to Service > Create Service and configure the form.

    • Service Source: Select **DNS Domain Name**.

    • Service Name: Enter a custom name, such as bailian-rag.

    • Service URL: The format is DNS domain name:Port, where the port is set to 443.

    • TLS Mode: Select **One-way TLS**.

  3. Click the retrieval service you just created to find its FQDN.

    On the service details page, go to the Overview > Basic Information section and find the value of the FQDN field, for example, bailian-rag.dns.

  4. Obtain the required information for the Model Studio knowledge base.

    1. Obtain the AccessKey and AccessKey secret. Log on to the Alibaba Cloud RAM console and create an AccessKey. For more information, see Create an AccessKey.

      Note

      For data security, we strongly recommend that you create a RAM user and use that user's AccessKey and AccessKey secret. Make sure that the RAM user meets the following requirements:

      • The RAM user must have the AliyunBailianDataFullAccess or sfm:Retrieve permission. For more information about how to grant permissions, see Manage RAM user permissions.

      • The RAM user must be added to the Model Studio workspace. For more information, see Member management.

    2. Obtain the workspace ID. Go to the Model Studio application page. In the lower-left corner, click your account to view the workspace details and get the workspace ID.

    3. Obtain the knowledge base ID. Navigate to **Data** > **Knowledge Base**. Select the target knowledge base and note its ID.

  5. Configure the plugin in the AI gateway. In the gateway instance console, navigate to Plug-in > Install Plug-in > AI > AI Advanced RAG, and click Install and Configure. Configure the plugin and its effective scope. Enter the required parameters (ak, sk, serviceFQDN, workspace_id, and index_id). You can add optional parameters as needed. Enable the plugin to apply the configuration.

    For the Effective Scope, select Model API-level plugin rules. In the plugin rule YAML editor, under the rag node, set rag_engine_type to bailian. In the bailian sub-node, enter the required parameters. For example, set serviceFQDN to bailian-rag.dns. In the API selection panel above the editor, select the target API to bind.

    After the configuration is complete, a rule entry associated with bailian-llm appears in the Model API-level plugin rules list. The rule is not enabled by default.

  6. Debug and verify the result. In the gateway instance console, click Model API, select the target API, and click Debug. Verify the model's response to confirm that the RAG retrieval capability is active.