The ai-rag plug-in adds Retrieval Augmented Generation (RAG) to a cloud-native gateway by connecting Alibaba Cloud DashVector with Qwen. Instead of relying solely on a large language model's (LLM) training data, the plug-in retrieves relevant documents from a vector database at request time and injects them into the prompt -- producing responses grounded in your actual data.
By handling RAG at the gateway level, you centralize retrieval logic in your infrastructure rather than embedding it in each application. This simplifies your RAG workflow, limits vector database access to the gateway, and lets platform teams manage configurations without requiring changes to application code.
How it works
The RAG workflow has two phases: data preparation (offline) and retrieval + generation (per request).
Data preparation (offline)
Extract text from your source documents (PDFs, articles, knowledge base entries).
Split the text into semantically meaningful chunks.
Convert each chunk into a vector embedding and store it in an Alibaba Cloud DashVector collection.
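The preparation steps above can be sketched in Python. The chunker below is a generic character-window splitter; the embedding call and the DashVector upsert are shown only as hypothetical comments, since the exact model and SDK calls depend on your setup:

```python
def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping chunks before embedding,
    so content near a chunk boundary is not lost to either side."""
    chunks, start = [], 0
    step = max_chars - overlap
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += step
    return chunks

# Each chunk would then be embedded and written to a DashVector collection.
# Hypothetical calls -- substitute your embedding model and the DashVector SDK:
#   vector = embed(chunk)
#   collection.insert(id=chunk_id, vector=vector, fields={"raw": chunk})
```

Storing the original text in a field (here `raw`) alongside the vector is what later lets the plug-in return readable chunks, via the `dashvector.field` parameter.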
Retrieval and generation (per request)
A user sends a query to the cloud-native gateway.
The ai-rag plug-in converts the query into a vector embedding.
It searches Alibaba Cloud DashVector for the top-k most relevant document chunks, filtered by a vector distance threshold.
It injects the retrieved chunks into the prompt and forwards the enriched prompt to Qwen.
Qwen generates a response grounded in both the retrieved context and its own knowledge.
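The per-request flow can be sketched end to end. Here `search` stands in for the combined embedding-plus-DashVector lookup (with topk and threshold applied inside it), and the prompt template is an illustration of context injection, not the plug-in's actual template:

```python
def build_prompt(query: str, chunks: list[str]) -> str:
    """Inject the retrieved chunks ahead of the user's question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer the question using the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

def handle_request(query: str, search) -> str:
    # 1. `search` embeds the query and fetches the top-k chunks from DashVector.
    # 2. The chunks are injected into the prompt, which is then sent to Qwen.
    chunks = search(query)
    return build_prompt(query, chunks)
```

A request with no matching chunks simply yields an empty context section, so the model falls back to its own knowledge.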

Runtime attributes
| Attribute | Value |
|---|---|
| Execution stage | default stage |
| Execution priority | 400 |
Configuration
Configure connections to two backend services: Qwen (the LLM) and Alibaba Cloud DashVector (the vector database). All parameters are required.
Qwen (LLM) parameters
| Parameter | Type | Description |
|---|---|---|
| dashscope.apiKey | string | API key for authenticating with Qwen. |
| dashscope.serviceFQDN | string | Fully qualified domain name of the Qwen service within the gateway's service registry. |
| dashscope.servicePort | int | Port of the Qwen service. |
| dashscope.serviceHost | string | Domain name for accessing Qwen (used in the HTTP Host header). |
DashVector (vector database) parameters
| Parameter | Type | Description |
|---|---|---|
| dashvector.apiKey | string | API key for authenticating with Alibaba Cloud DashVector. |
| dashvector.serviceFQDN | string | Fully qualified domain name of the DashVector service within the gateway's service registry. |
| dashvector.servicePort | int | Port of the DashVector service. |
| dashvector.serviceHost | string | Domain name for accessing Alibaba Cloud DashVector (used in the HTTP Host header). |
| dashvector.collection | string | Name of the DashVector collection to search. |
| dashvector.topk | int | Number of top matching document chunks to retrieve during vector search. A higher value provides more context but increases latency. |
| dashvector.threshold | float | Maximum vector distance allowed. Documents with a vector distance above this value are filtered out. Lower values return more precise matches; higher values return broader results. |
| dashvector.field | string | Field name in the DashVector collection that stores the document text. |
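The interplay between topk and threshold can be illustrated with a small sketch (the distance values are invented for the example):

```python
def select_chunks(matches, topk, threshold):
    """matches: list of (distance, doc_id) pairs; lower distance = closer match.
    Selection as described above: take the topk nearest candidates,
    then drop any whose distance exceeds the threshold."""
    nearest = sorted(matches)[:topk]
    return [doc for dist, doc in nearest if dist <= threshold]

matches = [(0.15, "doc-a"), (0.35, "doc-b"), (0.62, "doc-c")]
print(select_chunks(matches, topk=2, threshold=0.4))  # → ['doc-a', 'doc-b']
```

With `topk: 2` and `threshold: 0.4`, `doc-c` is never considered (outside the top 2) and would be filtered out by the threshold anyway; raising topk to 3 changes nothing here, because the threshold still excludes it.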
When the plug-in is enabled and Tracing Analysis is active, the document ID retrieved by the ai-rag plug-in is added to the attribute field in each span. Use this to trace which documents influenced a specific response.
Example
The following YAML shows a complete ai-rag plug-in configuration:
```yaml
dashscope:
  apiKey: <your-dashscope-api-key>
  serviceFQDN: dashscope
  servicePort: 443
  serviceHost: dashscope.aliyuncs.com
dashvector:
  apiKey: <your-dashvector-api-key>
  serviceFQDN: dashvector
  servicePort: 443
  serviceHost: <your-dashvector-endpoint>
  collection: <your-collection-name>
  topk: 1
  threshold: 0.4
  field: raw
```

Replace the following placeholders with your actual values:
| Placeholder | Description | Example |
|---|---|---|
| <your-dashscope-api-key> | DashScope API key from the Alibaba Cloud console | sk-xxxxxxxxxxxxxxxx |
| <your-dashvector-api-key> | DashVector API key from the Alibaba Cloud console | sk-xxxxxxxxxxxxxxxx |
| <your-dashvector-endpoint> | Endpoint of your DashVector cluster | vrs-cn-xxxxxxx.dashvector.cn-hangzhou.aliyuncs.com |
| <your-collection-name> | Name of the target collection in DashVector | my_knowledge_base |
RAG in action: before and after
The following example uses the CEC-Corpus dataset -- 332 labeled emergency news reports, vectorized and stored in a DashVector collection.
Query: "Where did the Hainan rear-end collision occur? What is the reason? What are the casualties?"
Without ai-rag -- the LLM lacks external context and cannot answer:
I'm sorry. As an AI model, I cannot obtain and update specific information about news events in real time, including details such as locations, causes, and casualties. For such specific events, you are advised to view the latest news reports or official announcements for accurate information. You can also visit mainstream media websites, use news applications, or follow the announcements of relevant government departments to obtain the latest news.
With ai-rag -- the LLM retrieves relevant documents and returns a grounded response:
The rear-end collision in Hainan occurred at 37 kilometers from Wenchang to Haikou on the Haiwen Expressway. Regarding the specific cause of the accident, the traffic police department was still under investigation at the time. The casualties were 1 person died (the driver died on the spot) and 8 persons injured (including 2 children and 6 adults). All injured persons were rescued and sent to the hospital for treatment.
What's next
Explore other AI-related plug-ins available for cloud-native gateways.