
Microservices Engine: AI proxy

Last Updated: Mar 11, 2026

When your application needs to call multiple LLM providers, each with its own API format, authentication method, and model naming convention, integration complexity grows with every provider you add. The ai-proxy plug-in solves this by exposing a single OpenAI-compatible interface on your gateway route. The plug-in supports providers such as OpenAI, Azure OpenAI, Moonshot, and Qwen. You configure the target provider, and the plug-in handles protocol translation, authentication, and model name mapping automatically.

Important
  • Enable this plug-in only on routes that handle AI traffic. Requests that do not match a supported AI API path receive an HTTP 404 response.

  • Requests to paths ending in /v1/chat/completions are parsed as OpenAI chat completions (text-to-text) and converted to the target provider's format.

  • Requests to paths ending in /v1/embeddings are parsed as OpenAI embeddings (text vectorization) and converted to the target provider's format.

How it works

Client request            ai-proxy plug-in                  LLM provider
─────────────           ──────────────────                ──────────────
POST /v1/chat/completions
  model: "gpt-4"  ──────>  1. Match path to protocol
                            2. Look up modelMapping
                               "gpt-4" → "qwen-max"
                            3. Select apiToken
                            4. Convert to provider API  ──>  Qwen API
                        <──────────────────────────────────  Response

Quick start

A provider type and an API token are all you need for the simplest configuration:

provider:
  type: qwen
  apiTokens:
    - "<your-api-token>"

All requests on this route are forwarded to Qwen. Clients send standard OpenAI-format requests, and the plug-in translates them to the Qwen API automatically.
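With this configuration in place, a client calls the route exactly as it would call OpenAI. A typical chat-completion request body looks like the following (the model name and message are illustrative):

```json
{
  "model": "gpt-4",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}
```

The plug-in authenticates with one of the configured apiTokens values and forwards the translated request to Qwen.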

To map client-side model names to provider-specific models, add modelMapping:

provider:
  type: qwen
  apiTokens:
    - "<your-api-token>"
  modelMapping:
    "gpt-4-turbo": "qwen-max"
    "gpt-3": "qwen-turbo"
    "*": "qwen-turbo"

Running attributes

Attribute            Value
Execution stage      default stage
Execution priority   100

Configuration reference

Provider

The provider object is the top-level configuration item. All fields below are nested under provider.

  • type (string, required): Provider identifier. See Supported providers.

  • apiTokens (array of string, optional): API tokens for authentication. When multiple tokens are configured, the plug-in selects one at random per request. Some providers support only one token.

  • timeout (number, optional, default 120000): Request timeout in milliseconds (120,000 ms = 2 minutes).

  • modelMapping (map of string, optional): Maps model names in client requests to provider-specific model names. See Model mapping.

  • protocol (string, optional, default openai): API protocol. Valid values: openai (OpenAI-compatible) and original (the provider's native protocol).

  • context (object, optional): External context file for AI conversations. See Context.

  • customSettings (array of object, optional): Override or inject request parameters. See Custom settings.

Model mapping

The modelMapping field maps model names from client requests to provider-specific names. Three matching modes are supported:

  • Exact match (for example, "gpt-4"): matches the model name gpt-4 exactly.

  • Prefix match (for example, "gpt-3-*"): matches all model names that start with gpt-3-.

  • Wildcard fallback ("*" or "", empty double quotation marks): catches all models not matched by other rules.

If the mapped value is an empty string (""), the original model name from the request is preserved.

Example: Map OpenAI model names to Qwen models.

modelMapping:
  "gpt-3": "qwen-turbo"
  "gpt-35-turbo": "qwen-plus"
  "gpt-4-turbo": "qwen-max"
  "gpt-4-*": "qwen-max"
  "*": "qwen-turbo"

Context

Load an external plaintext file as AI conversation context. This object is nested under provider.

  • fileUrl (string, required): URL of the plaintext context file.

  • serviceName (string, required): Full name of the Higress backend service that hosts the file.

  • servicePort (number, required): Port of the Higress backend service.
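A minimal sketch of a context configuration, assuming a Higress backend service that serves the file (the service name, URL, and port are placeholders, not values from this document):

```yaml
provider:
  type: qwen
  apiTokens:
    - "<your-api-token>"
  context:
    fileUrl: "http://<your-file-host>/context.txt"
    serviceName: "<your-backend-service-full-name>"
    servicePort: 80
```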

Custom settings

Override or inject parameters in AI requests. Each entry in the customSettings array has the following fields:

  • name (string, required): Parameter name, such as max_tokens.

  • value (string, number, float, or boolean; required): Parameter value.

  • mode (string, optional, default auto): auto rewrites the parameter name to match the target provider's protocol; raw uses the parameter name as-is with no validation.

  • overwrite (boolean, optional, default true): true always overwrites the parameter; false sets the parameter only if the client did not include it.
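For example, a sketch that caps token usage on every request and supplies a default temperature only when the client omits one (values are illustrative):

```yaml
provider:
  type: qwen
  apiTokens:
    - "<your-api-token>"
  customSettings:
    - name: max_tokens
      value: 1024
      mode: auto
      overwrite: true    # always enforced, even if the client sets its own value
    - name: temperature
      value: 0.2
      mode: auto
      overwrite: false   # applied only when the client omits temperature
```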

Parameter name rewriting (auto mode)

In auto mode, the plug-in rewrites parameter names to match each provider's API. The following table shows how standard parameter names map to each provider's protocol. none means the provider does not support the parameter.

Setting name  openai       baidu              spark        qwen         gemini           hunyuan      claude       minimax
max_tokens    max_tokens   max_output_tokens  max_tokens   max_tokens   maxOutputTokens  none         max_tokens   tokens_to_generate
temperature   temperature  temperature        temperature  temperature  temperature      Temperature  temperature  temperature
top_p         top_p        top_p              none         top_p        topP             TopP         top_p        top_p
top_k         none         none               top_k        none         topK             none         top_k        none
seed          seed         none               none         seed         none             none         none         none

If a parameter name is not in this table and mode is auto, the setting has no effect. Use raw mode to pass arbitrary parameters.

Parameter injection paths (raw mode)

In raw mode, the name and value are injected directly into the request JSON. The injection path depends on the provider:

  • Most providers: root of the JSON body

  • Qwen: the parameters subpath

  • Gemini: the generation_config subpath
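A raw-mode sketch: the parameter name below is used verbatim, and with a Qwen backend it would be injected under the parameters subpath of the request JSON (the parameter is shown purely to illustrate the injection path; check your provider's API for the names it accepts):

```yaml
customSettings:
  - name: enable_search
    value: true
    mode: raw
    overwrite: true
```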

Supported providers

Providers that require only type and apiTokens

Provider       type value
OpenAI         openai
Baichuan AI    baichuan
Yi             yi
Zhipu AI       zhipuai
DeepSeek       deepseek
Groq           groq
Baidu          baidu
360 Brain      ai360
Mistral        mistral
Stepfun        stepfun
Cohere         cohere

Providers with additional fields

OpenAI (custom endpoint)

Set type to openai.

  • openaiCustomUrl (string, optional): Custom backend URL for OpenAI-compatible services. Example: www.example.com/myai/v1/chat/completions.

  • responseJsonSchema (object, optional): Predefined JSON schema for structured responses. Only specific models support this field.
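A sketch that points the openai provider type at a self-hosted OpenAI-compatible service, using the example URL format above (the URL is a placeholder):

```yaml
provider:
  type: openai
  apiTokens:
    - "<your-api-token>"
  openaiCustomUrl: "www.example.com/myai/v1/chat/completions"
```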

Azure OpenAI

Set type to azure.

  • azureServiceUrl (string, required): Azure OpenAI service URL. Must include the api-version query parameter.
Important
  • Azure OpenAI supports only one API token.

Moonshot

Set type to moonshot.

  • moonshotFileId (string, optional): ID of a file uploaded through the Moonshot file API. The file content serves as conversation context. Cannot be used together with the context field.

Qwen

Set type to qwen.

  • qwenEnableSearch (boolean, optional): Enables Qwen's built-in internet search.

  • qwenFileIds (array of string, optional): IDs of files uploaded to DashScope through the file API. File content serves as conversation context. Cannot be used together with the context field.
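A hedged sketch combining these fields with a basic Qwen setup (the token and file ID are placeholders; remember that qwenFileIds and context are mutually exclusive):

```yaml
provider:
  type: qwen
  apiTokens:
    - "<your-qwen-api-token>"
  qwenEnableSearch: true
  qwenFileIds:
    - "<your-dashscope-file-id>"
```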

MiniMax

Set type to minimax.

  • minimaxGroupId (string, required for the abab6.5-chat, abab6.5s-chat, abab5.5s-chat, and abab5.5-chat models): Group ID for ChatCompletion Pro.

Anthropic Claude

Set type to claude.

  • claudeVersion (string, optional, default 2023-06-01): Anthropic Claude API version.

Ollama

Set type to ollama.

  • ollamaServerHost (string, required): Host IP address of the Ollama server.

  • ollamaServerPort (number, required, default 11434): Port of the Ollama server.
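The Configuration examples section does not include an Ollama entry, so here is a minimal sketch under the fields above (the host and model name are placeholders):

```yaml
provider:
  type: ollama
  ollamaServerHost: "<your-ollama-host-ip>"
  ollamaServerPort: 11434
  modelMapping:
    "*": "<your-ollama-model-name>"
```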

Hunyuan

Set type to hunyuan.

  • hunyuanAuthId (string, required): Hunyuan ID for v3 authentication.

  • hunyuanAuthKey (string, required): Hunyuan key for v3 authentication.

Cloudflare Workers AI

Set type to cloudflare.

  • cloudflareAccountId (string, required): Cloudflare account ID. For details, see Cloudflare account ID.

Spark (iFLYTEK)

Set type to spark.

No provider-specific fields are required. However, the apiTokens value must use the format APIKey:APISecret (colon-separated).

Gemini

Set type to gemini.

  • geminiSafetySetting (map of string, optional): Content filtering and safety settings. For details, see Safety settings.

DeepL

Set type to deepl.

  • targetLang (string, required): Target language code for translation (for example, ZH).

Configuration examples

Azure OpenAI

Route requests to Azure OpenAI with a single deployment:

provider:
  type: azure
  apiTokens:
    - "<your-azure-openai-api-token>"
  azureServiceUrl: "https://<your-resource-name>.openai.azure.com/openai/deployments/<your-deployment-name>/chat/completions?api-version=2024-02-15-preview"

Qwen with model mapping

Map OpenAI model names to Qwen equivalents:

provider:
  type: qwen
  apiTokens:
    - "<your-qwen-api-token>"
  modelMapping:
    "gpt-3": "qwen-turbo"
    "gpt-35-turbo": "qwen-plus"
    "gpt-4-turbo": "qwen-max"
    "gpt-4-*": "qwen-max"
    "gpt-4o": "qwen-vl-plus"
    "text-embedding-v1": "text-embedding-v1"
    "*": "qwen-turbo"

Alibaba Cloud Model Studio (native protocol)

Use the Qwen provider type with the original DashScope protocol instead of OpenAI-compatible:

provider:
  type: qwen
  apiTokens:
    - "<your-dashscope-api-token>"
  protocol: original

Doubao with extended timeout

Route requests to Doubao (ByteDance). Use modelMapping to point to your Doubao endpoint, and set a longer timeout for large model responses:

provider:
  type: doubao
  apiTokens:
    - "<your-doubao-api-key>"
  modelMapping:
    "*": "<your-doubao-endpoint>"
  timeout: 1200000

Moonshot with file context

Upload a file to Moonshot, then reference it as conversation context:

provider:
  type: moonshot
  apiTokens:
    - "<your-moonshot-api-token>"
  moonshotFileId: "<your-moonshot-file-id>"
  modelMapping:
    "*": "moonshot-v1-32k"

Groq

provider:
  type: groq
  apiTokens:
    - "<your-groq-api-token>"

Anthropic Claude

provider:
  type: claude
  apiTokens:
    - "<your-claude-api-token>"
  claudeVersion: "2023-06-01"

Hunyuan

provider:
  type: hunyuan
  hunyuanAuthKey: "<your-auth-key>"
  apiTokens:
    - ""
  hunyuanAuthId: "<your-auth-id>"
  timeout: 1200000
  modelMapping:
    "*": "hunyuan-lite"

Baidu

provider:
  type: baidu
  apiTokens:
    - "<your-baidu-api-token>"
  modelMapping:
    "gpt-3": "ERNIE-4.0"
    "*": "ERNIE-4.0"

MiniMax

provider:
  type: minimax
  apiTokens:
    - "<your-minimax-api-token>"
  modelMapping:
    "gpt-3": "abab6.5g-chat"
    "gpt-4": "abab6.5-chat"
    "*": "abab6.5g-chat"
  minimaxGroupId: "<your-minimax-group-id>"

360 Brain

provider:
  type: ai360
  apiTokens:
    - "<your-ai360-api-token>"
  modelMapping:
    "gpt-4o": "360gpt-turbo-responsibility-8k"
    "gpt-4": "360gpt2-pro"
    "gpt-3.5": "360gpt-turbo"
    "text-embedding-3-small": "embedding_s1_v1.2"
    "*": "360gpt-pro"

Cloudflare Workers AI

provider:
  type: cloudflare
  apiTokens:
    - "<your-workers-ai-api-token>"
  cloudflareAccountId: "<your-cloudflare-account-id>"
  modelMapping:
    "*": "@cf/meta/llama-3-8b-instruct"

Spark (iFLYTEK)

The apiTokens value uses the format APIKey:APISecret:

provider:
  type: spark
  apiTokens:
    - "<your-api-key>:<your-api-secret>"
  modelMapping:
    "gpt-4o": "generalv3.5"
    "gpt-4": "generalv3"
    "*": "general"

Gemini

provider:
  type: gemini
  apiTokens:
    - "<your-gemini-api-token>"
  modelMapping:
    "*": "gemini-pro"
  geminiSafetySetting:
    "HARM_CATEGORY_SEXUALLY_EXPLICIT": "BLOCK_NONE"
    "HARM_CATEGORY_HATE_SPEECH": "BLOCK_NONE"
    "HARM_CATEGORY_HARASSMENT": "BLOCK_NONE"
    "HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_NONE"

DeepL translation

provider:
  type: deepl
  apiTokens:
    - "<your-deepl-api-token>"
  targetLang: "ZH"

Request format: Set model to Free or Pro to select the DeepL service tier. Each message in the messages array contains text to translate. A message with role: system provides translation context to improve accuracy but is not itself translated.

{
  "model": "Free",
  "messages": [
    {
      "role": "system",
      "content": "money"
    },
    {
      "content": "sit by the bank"
    },
    {
      "content": "a bank in China"
    }
  ]
}

Response:

{
  "choices": [
    {
      "index": 0
    },
    {
      "index": 1
    }
  ],
  "created": 1722747752,
  "model": "Free",
  "object": "chat.completion",
  "usage": {}
}