When your application needs to call multiple LLM providers, each with its own API format, authentication method, and model naming convention, integration complexity grows with every provider you add. The ai-proxy plug-in solves this by exposing a single OpenAI-compatible interface on your gateway route. The plug-in supports providers such as OpenAI, Azure OpenAI, Moonshot, and Qwen. You configure the target provider, and the plug-in handles protocol translation, authentication, and model name mapping automatically.
Enable this plug-in only on routes that handle AI traffic. Non-conforming requests receive an HTTP 404 response.
- Requests to paths ending in /v1/chat/completions are parsed as OpenAI chat completions (text-to-text) and converted to the target provider's format.
- Requests to paths ending in /v1/embeddings are parsed as OpenAI embeddings (text vectorization) and converted to the target provider's format.
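The dispatch rule above can be sketched in a few lines (an illustrative Python sketch; the plug-in itself is not written in Python, and the function name is hypothetical):

```python
from typing import Optional

# Illustrative sketch of the path-to-protocol dispatch described above.
def match_protocol(path: str) -> Optional[str]:
    """Return the request protocol for a path, or None (handled as HTTP 404)."""
    if path.endswith("/v1/chat/completions"):
        return "chat-completions"  # text-to-text
    if path.endswith("/v1/embeddings"):
        return "embeddings"        # text vectorization
    return None                    # non-conforming request

print(match_protocol("/api/openai/v1/chat/completions"))  # chat-completions
print(match_protocol("/healthz"))                         # None
```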
How it works
```
Client request                 ai-proxy plug-in                    LLM provider
──────────────                 ────────────────                    ────────────
POST /v1/chat/completions
model: "gpt-4"       ──────>   1. Match path to protocol
                               2. Look up modelMapping:
                                  "gpt-4" → "qwen-max"
                               3. Select apiToken
                               4. Convert to provider API   ──>    Qwen API
      <────────────────────────────────────────────────────────    Response
```

Quick start
A provider type and an API token are all you need for the simplest configuration:
```yaml
provider:
  type: qwen
  apiTokens:
    - "<your-api-token>"
```

All requests on this route are forwarded to Qwen. Clients send standard OpenAI-format requests, and the plug-in translates them to the Qwen API automatically.
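From the client's perspective, the route behaves like a normal OpenAI endpoint. A minimal sketch of building such a request with the standard library (the gateway host below is a placeholder, not a real address):

```python
import json
from urllib import request

# Standard OpenAI-format chat request; the plug-in maps the model name and
# injects the configured apiToken before forwarding to Qwen.
payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}],
}
req = request.Request(
    "http://<your-gateway>/v1/chat/completions",  # placeholder gateway route
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req) would send it against a live gateway.
```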
To map client-side model names to provider-specific models, add modelMapping:
```yaml
provider:
  type: qwen
  apiTokens:
    - "<your-api-token>"
  modelMapping:
    "gpt-4-turbo": "qwen-max"
    "gpt-3": "qwen-turbo"
    "*": "qwen-turbo"
```

Running attributes
| Attribute | Value |
|---|---|
| Execution stage | default stage |
| Execution priority | 100 |
Configuration reference
Provider
The provider object is the top-level configuration item. All fields below are nested under provider.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| type | string | Yes | - | Provider identifier. See Supported providers. |
| apiTokens | array of string | No | - | API tokens for authentication. When multiple tokens are configured, the plug-in selects one at random per request. Some providers support only one token. |
| timeout | number | No | 120000 | Request timeout in milliseconds (120,000 ms = 2 minutes). |
| modelMapping | map of string | No | - | Maps model names in client requests to provider-specific model names. See Model mapping. |
| protocol | string | No | openai | API protocol. Valid values: openai (OpenAI-compatible, default) and original (provider's native protocol). |
| context | object | No | - | External context file for AI conversations. See Context. |
| customSettings | array of object | No | - | Override or inject request parameters. See Custom settings. |
Model mapping
The modelMapping field maps model names from client requests to provider-specific names. Three matching modes are supported:
| Mode | Example key | Behavior |
|---|---|---|
| Exact match | "gpt-4" | Matches the model name gpt-4 exactly. |
| Prefix match | "gpt-3-*" | Matches all models whose names start with gpt-3-. |
| Wildcard fallback | "*" or "" | Catches all models not matched by other rules. Either "*" or "" (an empty string) works as the key for a general mapping. |
If the mapped value is an empty string (""), the original model name from the request is preserved.
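The lookup order and the empty-value rule can be sketched as follows (an illustrative re-implementation, not the plug-in's source):

```python
# Sketch of modelMapping resolution: exact match first, then prefix rules
# (keys ending in "*"), then the "*" / "" fallback. An empty mapped value
# preserves the original model name from the request.
def resolve_model(mapping: dict, model: str) -> str:
    if model in mapping:                        # exact match
        target = mapping[model]
    else:
        for key, value in mapping.items():      # prefix match, e.g. "gpt-4-*"
            if key.endswith("*") and key != "*" and model.startswith(key[:-1]):
                target = value
                break
        else:                                   # wildcard fallback
            target = mapping.get("*", mapping.get("", None))
    if not target:                              # "" keeps the original name
        return model
    return target

mapping = {"gpt-4-turbo": "qwen-max", "gpt-4-*": "qwen-max", "*": "qwen-turbo"}
print(resolve_model(mapping, "gpt-4-0613"))   # qwen-max (prefix rule)
print(resolve_model(mapping, "llama-3"))      # qwen-turbo (fallback)
```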
Example: Map OpenAI model names to Qwen models.
```yaml
modelMapping:
  "gpt-3": "qwen-turbo"
  "gpt-35-turbo": "qwen-plus"
  "gpt-4-turbo": "qwen-max"
  "gpt-4-*": "qwen-max"
  "*": "qwen-turbo"
```

Context
Load an external plaintext file as AI conversation context. This object is nested under provider.
| Field | Type | Required | Description |
|---|---|---|---|
| fileUrl | string | Yes | URL of the plaintext context file. |
| serviceName | string | Yes | Full name of the Higress backend service that hosts the file. |
| servicePort | number | Yes | Port of the Higress backend service. |
Custom settings
Override or inject parameters in AI requests. Each entry in the customSettings array has the following fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | - | Parameter name, such as max_tokens. |
| value | string, number, float, or boolean | Yes | - | Parameter value. |
| mode | string | No | auto | auto: rewrites the parameter name to match the target provider's protocol. raw: uses the parameter name as-is with no validation. |
| overwrite | boolean | No | true | true: always overwrite the parameter. false: set the parameter only if the client did not include it. |
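The overwrite flag's two behaviors can be sketched as (a hypothetical helper for illustration, not the plug-in's code):

```python
# Sketch of the overwrite semantics: with overwrite=True the setting always
# wins; with overwrite=False it only fills in a parameter the client omitted.
def apply_setting(body: dict, name: str, value, overwrite: bool = True) -> dict:
    if overwrite or name not in body:
        body[name] = value
    return body

body = {"model": "qwen-max", "max_tokens": 64}
apply_setting(body, "max_tokens", 1024, overwrite=False)  # client value kept
apply_setting(body, "temperature", 0.2)                   # injected
print(body)  # {'model': 'qwen-max', 'max_tokens': 64, 'temperature': 0.2}
```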
Parameter name rewriting (auto mode)
In auto mode, the plug-in rewrites parameter names to match each provider's API. The following table shows how standard parameter names map to each provider's protocol. none means the provider does not support the parameter.
| Setting name | openai | baidu | spark | qwen | gemini | hunyuan | claude | minimax |
|---|---|---|---|---|---|---|---|---|
| max_tokens | max_tokens | max_output_tokens | max_tokens | max_tokens | maxOutputTokens | none | max_tokens | tokens_to_generate |
| temperature | temperature | temperature | temperature | temperature | temperature | Temperature | temperature | temperature |
| top_p | top_p | top_p | none | top_p | topP | TopP | top_p | top_p |
| top_k | none | none | top_k | none | topK | none | top_k | none |
| seed | seed | none | none | seed | none | none | none | none |
If a parameter name is not in this table and mode is auto, the setting has no effect. Use raw mode to pass arbitrary parameters.
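Auto-mode rewriting amounts to a table lookup. A sketch using a slice of the table above (illustrative only; the function and dict names are hypothetical):

```python
from typing import Optional

# A slice of the auto-mode rewrite table above. Missing entries correspond to
# "none" (the provider does not support the parameter).
REWRITE = {
    "max_tokens": {"openai": "max_tokens", "baidu": "max_output_tokens",
                   "gemini": "maxOutputTokens", "minimax": "tokens_to_generate"},
    "top_p": {"openai": "top_p", "gemini": "topP", "hunyuan": "TopP"},
}

def rewrite_name(name: str, provider: str) -> Optional[str]:
    """Provider-specific parameter name, or None if unsupported/unknown."""
    return REWRITE.get(name, {}).get(provider)

print(rewrite_name("max_tokens", "gemini"))  # maxOutputTokens
print(rewrite_name("top_p", "spark"))        # None (not supported)
```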
Parameter injection paths (raw mode)
In raw mode, the name and value are injected directly into the request JSON. The injection path depends on the provider:
| Provider | Injection path |
|---|---|
| Most providers | Root of the JSON body |
| Qwen | parameters subpath |
| Gemini | generation_config subpath |
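The injection-path rule can be sketched as (a hypothetical helper for illustration):

```python
# Sketch of raw-mode injection: most providers take the parameter at the root
# of the JSON body, while Qwen and Gemini nest it under a subpath.
SUBPATH = {"qwen": "parameters", "gemini": "generation_config"}

def inject_raw(body: dict, provider: str, name: str, value) -> dict:
    path = SUBPATH.get(provider)
    if path is None:                      # most providers: JSON root
        body[name] = value
    else:                                 # Qwen / Gemini: nested subpath
        body.setdefault(path, {})[name] = value
    return body

print(inject_raw({}, "openai", "logprobs", True))
print(inject_raw({}, "qwen", "enable_search", True))
```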
Supported providers
Providers that require only type and apiTokens
| Provider | type value |
|---|---|
| OpenAI | openai |
| Baichuan AI | baichuan |
| Yi | yi |
| Zhipu AI | zhipuai |
| DeepSeek | deepseek |
| Groq | groq |
| Baidu | baidu |
| 360 Brain | ai360 |
| Mistral | mistral |
| Stepfun | stepfun |
| Cohere | cohere |
Providers with additional fields
OpenAI (custom endpoint)
Set type to openai.
| Field | Type | Required | Description |
|---|---|---|---|
| openaiCustomUrl | string | No | Custom backend URL for OpenAI-compatible services. Example: www.example.com/myai/v1/chat/completions. |
| responseJsonSchema | object | No | Predefined JSON schema for structured responses. Only specific models support this field. |
Azure OpenAI
Set type to azure.
| Field | Type | Required | Description |
|---|---|---|---|
| azureServiceUrl | string | Yes | Azure OpenAI service URL. Must include the api-version query parameter. |
Azure OpenAI supports only one API token.
Moonshot
Set type to moonshot.
| Field | Type | Required | Description |
|---|---|---|---|
| moonshotFileId | string | No | ID of a file uploaded through the Moonshot file API. The file content serves as conversation context. Cannot be used together with the context field. |
Qwen
Set type to qwen.
| Field | Type | Required | Description |
|---|---|---|---|
| qwenEnableSearch | boolean | No | Enable Qwen's built-in internet search. |
| qwenFileIds | array of string | No | IDs of files uploaded to DashScope through the file API. File content serves as conversation context. Cannot be used together with the context field. |
MiniMax
Set type to minimax.
| Field | Type | Required | Description |
|---|---|---|---|
| minimaxGroupId | string | Required for abab6.5-chat, abab6.5s-chat, abab5.5s-chat, and abab5.5-chat models | Group ID for ChatCompletion Pro. |
Anthropic Claude
Set type to claude.
| Field | Type | Required | Description |
|---|---|---|---|
| claudeVersion | string | No | Anthropic Claude API version. Default: 2023-06-01. |
Ollama
Set type to ollama.
| Field | Type | Required | Description |
|---|---|---|---|
| ollamaServerHost | string | Yes | Host IP address of the Ollama server. |
| ollamaServerPort | number | Yes | Port of the Ollama server (typically 11434). |
Hunyuan
Set type to hunyuan.
| Field | Type | Required | Description |
|---|---|---|---|
| hunyuanAuthId | string | Yes | Hunyuan ID for v3 authentication. |
| hunyuanAuthKey | string | Yes | Hunyuan key for v3 authentication. |
Cloudflare Workers AI
Set type to cloudflare.
| Field | Type | Required | Description |
|---|---|---|---|
| cloudflareAccountId | string | Yes | Cloudflare account ID. For details, see Cloudflare account ID. |
Spark (iFLYTEK)
Set type to spark.
No provider-specific fields are required. However, the apiTokens value must use the format APIKey:APISecret (colon-separated).
Gemini
Set type to gemini.
| Field | Type | Required | Description |
|---|---|---|---|
| geminiSafetySetting | map of string | No | Content filtering and safety settings. For details, see Safety settings. |
DeepL
Set type to deepl.
| Field | Type | Required | Description |
|---|---|---|---|
| targetLang | string | Yes | Target language code for translation (for example, ZH). |
Configuration examples
Azure OpenAI
Route requests to Azure OpenAI with a single deployment:
```yaml
provider:
  type: azure
  apiTokens:
    - "<your-azure-openai-api-token>"
  azureServiceUrl: "https://<your-resource-name>.openai.azure.com/openai/deployments/<your-deployment-name>/chat/completions?api-version=2024-02-15-preview"
```

Qwen with model mapping
Map OpenAI model names to Qwen equivalents:
```yaml
provider:
  type: qwen
  apiTokens:
    - "<your-qwen-api-token>"
  modelMapping:
    "gpt-3": "qwen-turbo"
    "gpt-35-turbo": "qwen-plus"
    "gpt-4-turbo": "qwen-max"
    "gpt-4-*": "qwen-max"
    "gpt-4o": "qwen-vl-plus"
    "text-embedding-v1": "text-embedding-v1"
    "*": "qwen-turbo"
```

Alibaba Cloud Model Studio (native protocol)
Use the Qwen provider type with the provider's native DashScope protocol instead of the OpenAI-compatible one:
```yaml
provider:
  type: qwen
  apiTokens:
    - "<your-dashscope-api-token>"
  protocol: original
```

Doubao with extended timeout
Route requests to Doubao (ByteDance). Use modelMapping to point to your Doubao endpoint, and set a longer timeout for large model responses:
```yaml
provider:
  type: doubao
  apiTokens:
    - "<your-doubao-api-key>"
  modelMapping:
    "*": "<your-doubao-endpoint>"
  timeout: 1200000
```

Moonshot with file context
Upload a file to Moonshot, then reference it as conversation context:
```yaml
provider:
  type: moonshot
  apiTokens:
    - "<your-moonshot-api-token>"
  moonshotFileId: "<your-moonshot-file-id>"
  modelMapping:
    "*": "moonshot-v1-32k"
```

Groq
```yaml
provider:
  type: groq
  apiTokens:
    - "<your-groq-api-token>"
```

Anthropic Claude
```yaml
provider:
  type: claude
  apiTokens:
    - "<your-claude-api-token>"
  claudeVersion: "2023-06-01"
```

Hunyuan
```yaml
provider:
  type: hunyuan
  hunyuanAuthId: "<your-auth-id>"
  hunyuanAuthKey: "<your-auth-key>"
  apiTokens:
    - ""
  timeout: 1200000
  modelMapping:
    "*": "hunyuan-lite"
```

Baidu
```yaml
provider:
  type: baidu
  apiTokens:
    - "<your-baidu-api-token>"
  modelMapping:
    "gpt-3": "ERNIE-4.0"
    "*": "ERNIE-4.0"
```

MiniMax
```yaml
provider:
  type: minimax
  apiTokens:
    - "<your-minimax-api-token>"
  modelMapping:
    "gpt-3": "abab6.5g-chat"
    "gpt-4": "abab6.5-chat"
    "*": "abab6.5g-chat"
  minimaxGroupId: "<your-minimax-group-id>"
```

360 Brain
```yaml
provider:
  type: ai360
  apiTokens:
    - "<your-ai360-api-token>"
  modelMapping:
    "gpt-4o": "360gpt-turbo-responsibility-8k"
    "gpt-4": "360gpt2-pro"
    "gpt-3.5": "360gpt-turbo"
    "text-embedding-3-small": "embedding_s1_v1.2"
    "*": "360gpt-pro"
```

Cloudflare Workers AI
```yaml
provider:
  type: cloudflare
  apiTokens:
    - "<your-workers-ai-api-token>"
  cloudflareAccountId: "<your-cloudflare-account-id>"
  modelMapping:
    "*": "@cf/meta/llama-3-8b-instruct"
```

Spark (iFLYTEK)
The apiTokens value uses the format APIKey:APISecret:
```yaml
provider:
  type: spark
  apiTokens:
    - "<your-api-key>:<your-api-secret>"
  modelMapping:
    "gpt-4o": "generalv3.5"
    "gpt-4": "generalv3"
    "*": "general"
```

Gemini
```yaml
provider:
  type: gemini
  apiTokens:
    - "<your-gemini-api-token>"
  modelMapping:
    "*": "gemini-pro"
  geminiSafetySetting:
    "HARM_CATEGORY_SEXUALLY_EXPLICIT": "BLOCK_NONE"
    "HARM_CATEGORY_HATE_SPEECH": "BLOCK_NONE"
    "HARM_CATEGORY_HARASSMENT": "BLOCK_NONE"
    "HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_NONE"
```

DeepL translation
```yaml
provider:
  type: deepl
  apiTokens:
    - "<your-deepl-api-token>"
  targetLang: "ZH"
```

Request format: Set model to Free or Pro to select the DeepL service tier. Each message in the messages array contains text to translate. A message with role: system provides translation context to improve accuracy but is not itself translated.
```json
{
  "model": "Free",
  "messages": [
    {
      "role": "system",
      "content": "money"
    },
    {
      "content": "sit by the bank"
    },
    {
      "content": "a bank in China"
    }
  ]
}
```

Response:
```json
{
  "choices": [
    {
      "index": 0
    },
    {
      "index": 1
    }
  ],
  "created": 1722747752,
  "model": "Free",
  "object": "chat.completion",
  "usage": {}
}
```