This topic describes the AI RAG plug-in.
Description
The knowledge of a large language model (LLM) is limited to the data that it was trained on. After training is complete, the LLM cannot acquire or learn new information. In addition, even an LLM that is trained on massive amounts of data may lack information about specific fields or cover certain topics only superficially. As a result, its answers to queries in those fields may be inaccurate or lack depth. Retrieval-Augmented Generation (RAG) addresses this limitation: a retrieval system finds relevant information in a large-scale database and provides that information to the text generation model, which helps the model generate more accurate, richer, and better-grounded text.
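The retrieve-then-generate flow described above can be illustrated with a minimal, self-contained sketch. It is not the plug-in's implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the names `retrieve` and `build_prompt` as well as the sample documents are hypothetical. The final call to an LLM is omitted; the sketch stops at the augmented prompt.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Prepend the retrieved passages to the user question."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}"
    )


# Hypothetical sample documents, used for illustration only.
documents = [
    "A fire broke out in a warehouse district on Monday night.",
    "Heavy rain caused flooding along the river on Tuesday.",
]
query = "Where did the flooding happen?"
context = retrieve(query, documents, top_k=1)
prompt = build_prompt(query, context)
print(prompt)
```

In a real deployment, the augmented prompt would be sent to the LLM in place of the original question, so that the model can draw on the retrieved passages.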
Higress can connect to Alibaba Cloud DashVector to quickly implement the RAG feature. The following figure shows the process of implementing the RAG feature.
Configuration description
| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
|  | STRING | Yes | - | The token that is used for authentication when a gateway accesses Qwen. |
|  | STRING | Yes | - | The service name of Qwen. |
|  | INT | Yes | - | The service port of Qwen. |
|  | STRING | Yes | - | The domain name that is used to access Qwen. |
|  | STRING | Yes | - | The token that is used for authentication when a gateway accesses Alibaba Cloud DashVector. |
|  | STRING | Yes | - | The service name of Alibaba Cloud DashVector. |
|  | INT | Yes | - | The service port of Alibaba Cloud DashVector. |
|  | STRING | Yes | - | The domain name that is used to access Alibaba Cloud DashVector. |
Example
The CEC-Corpus dataset contains 332 news reports about emergencies, together with their labeled data. In this example, the original text of the news reports is extracted, vectorized, and then added to Alibaba Cloud DashVector.
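The ingestion step (extract text, vectorize it, add it to a vector store) can be sketched as follows. This is only an illustration under stated assumptions: `InMemoryCollection` is a hypothetical stand-in for a vector collection and does not reproduce the DashVector API, `embed` is a toy hashed bag-of-words function rather than a real embedding model, and the two sample reports are invented and not taken from CEC-Corpus.

```python
import hashlib
import math
from dataclasses import dataclass, field

DIM = 64  # Assumed vector size for the toy embedding.


def embed(text: str) -> list[float]:
    """Toy embedding: hash each token into a bucket, then L2-normalize."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class InMemoryCollection:
    """Hypothetical stand-in for a vector collection (not the DashVector API)."""
    records: dict = field(default_factory=dict)

    def upsert(self, doc_id: str, vector: list[float], fields: dict) -> None:
        """Insert or overwrite a record keyed by doc_id."""
        self.records[doc_id] = (vector, fields)

    def query(self, vector: list[float], top_k: int = 1) -> list[tuple]:
        """Return (score, doc_id, fields) tuples, highest dot product first."""
        scored = [
            (sum(a * b for a, b in zip(vector, vec)), doc_id, fields)
            for doc_id, (vec, fields) in self.records.items()
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


# Invented sample reports, used for illustration only.
reports = {
    "report-1": "An earthquake struck the coastal region early on Sunday.",
    "report-2": "A traffic accident closed the highway for several hours.",
}

collection = InMemoryCollection()
for doc_id, raw_text in reports.items():
    # Store the raw text next to its vector so it can be returned as context.
    collection.upsert(doc_id, embed(raw_text), {"raw": raw_text})

# Querying with the exact text of a stored report yields a score close to 1.0.
hits = collection.query(embed(reports["report-1"]), top_k=1)
print(hits[0][1], round(hits[0][0], 3))
```

Storing the original text as a field alongside each vector is a design choice that lets the retrieval step return readable passages, not just identifiers.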