
Microservices Engine: AI proxy

Last Updated: Jan 10, 2025

The ai-proxy plug-in implements the AI proxy feature based on the OpenAI API contract. The plug-in supports multiple AI service providers, such as OpenAI, Azure OpenAI, Moonshot, and Qwen.

Important
  • Enable this plug-in for routes that process only AI traffic. For the requests that do not comply with the OpenAI API specifications, the plug-in returns the HTTP 404 status code.

  • If the suffix of a request path matches /v1/chat/completions, the text-to-text protocol of OpenAI is used to parse the request body. Then, the system converts the parsed request to comply with the text-to-text protocol of the related large language model (LLM) provider.

  • If the suffix of a request path matches /v1/embeddings, the text vectorization protocol of OpenAI is used to parse the request body. Then, the system converts the parsed request to comply with the text vectorization protocol of the related LLM provider.
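For example, a request whose path ends with /v1/chat/completions is expected to carry an OpenAI-style request body such as the following (the model name and message content are placeholders):

```json
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ]
}
```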

Running attributes

Plug-in execution stage: default stage. Plug-in execution priority: 100.

Configuration items

Basic configuration items

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| provider | object | Yes | - | The information about the AI service provider. |

The following table describes the fields in the provider configuration item.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| type | string | Yes | - | The name of the AI service provider. |
| apiTokens | array of string | No | - | The tokens used for authentication when the plug-in accesses the AI service. If multiple tokens are configured, the plug-in randomly selects one for each request. Some AI service providers support only one token. |
| timeout | number | No | 120000 | The timeout period for accessing the AI service. Unit: milliseconds. The default value of 120000 is equivalent to 2 minutes. |
| modelMapping | map of string | No | - | The AI model mapping table, which maps the model name in the request to a model name supported by the service provider. (1) Prefix matching is supported. For example, "gpt-3-*" matches all model names that start with "gpt-3-". (2) An asterisk ("*") can be used as the key to configure a general fallback mapping. (3) If the mapped value is an empty string (""), the original model name in the request is retained. |
| protocol | string | No | openai | The API contract provided by the plug-in. Valid values: openai (the API contract of OpenAI; this is the default) and original (the original API contract of the service provider). |
| context | object | No | - | The AI context information. |
| customSettings | array of customSetting | No | - | The parameters to be overwritten or filled in for the AI request. |
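The modelMapping matching rules can be sketched as follows. This is a minimal illustration of the documented precedence (exact match, then prefix match on keys that end with "*", then the "*" fallback, with an empty string keeping the original name); the plug-in's actual implementation may differ.

```python
def map_model(model: str, model_mapping: dict) -> str:
    """Resolve a requested model name against a modelMapping table."""
    # 1. An exact match takes precedence.
    if model in model_mapping:
        mapped = model_mapping[model]
    else:
        # 2. Prefix matching: keys that end with "*" match by prefix.
        #    (A bare "*" key matches everything, acting as the fallback.)
        mapped = None
        for key, value in model_mapping.items():
            if key.endswith("*") and model.startswith(key[:-1]):
                mapped = value
                break
    # 3. An empty string (or no match at all) keeps the original name.
    return mapped if mapped else model

mapping = {"gpt-35-turbo": "qwen-plus", "gpt-4-*": "qwen-max", "*": "qwen-turbo"}
print(map_model("gpt-4-turbo", mapping))   # -> qwen-max (prefix match)
print(map_model("gpt-35-turbo", mapping))  # -> qwen-plus (exact match)
print(map_model("other-model", mapping))   # -> qwen-turbo ("*" fallback)
```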

The following table describes the fields in the context configuration item.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| fileUrl | string | Yes | - | The URL of the file in which the AI context is saved. Only plaintext file content is supported. |
| serviceName | string | Yes | - | The full name of the Higress backend service that corresponds to the URL. |
| servicePort | number | Yes | - | The access port of the Higress backend service that corresponds to the URL. |
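For example, a context configuration might look like the following sketch; the file URL, service name, and port are placeholders for your own Higress backend service:

```yaml
provider:
  type: qwen
  apiTokens:
    - "YOUR_API_TOKEN"
  context:
    fileUrl: "http://file.example.com/ai-context.txt"
    serviceName: "file.dns"
    servicePort: 80
```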

The following table describes the fields in the customSettings configuration item.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| name | string | Yes | - | The name of the parameter that you want to manage. Example: max_tokens. |
| value | string/int/float/bool | Yes | - | The value of the parameter that you want to manage. Example: 0. |
| mode | string | No | "auto" | The parameter configuration mode. Valid values: "auto" and "raw". In "auto" mode, the parameter name is automatically rewritten based on the protocol in use. In "raw" mode, the parameter name is not rewritten and no restriction check is performed on it. |
| overwrite | bool | No | true | If set to false, the parameter is filled in only when it is not already present in the request. Otherwise, the value that you specify overwrites the existing parameter. |

The following table describes the parameter name rewriting rules for customSettings. Parameter names specified in the name field are rewritten based on the protocol in use, and you must use the values in the settingName column when you set name. For example, if you set name to max_tokens, the parameter name stays max_tokens under the OpenAI protocol but is rewritten to maxOutputTokens under the Gemini protocol. none indicates that the protocol does not support the parameter. If name is set to a value that is not listed in this table, or to a value that the protocol does not support, the configuration does not take effect unless raw mode is enabled.

| settingName | openai | baidu | spark | qwen | gemini | hunyuan | claude | minimax |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| max_tokens | max_tokens | max_output_tokens | max_tokens | max_tokens | maxOutputTokens | none | max_tokens | tokens_to_generate |
| temperature | temperature | temperature | temperature | temperature | temperature | Temperature | temperature | temperature |
| top_p | top_p | top_p | none | top_p | topP | TopP | top_p | top_p |
| top_k | none | none | top_k | none | topK | none | top_k | none |
| seed | seed | none | none | seed | none | none | none | none |

If raw mode is enabled, the name and value that you specify in customSettings are used to rewrite the JSON content of the request directly, without any parameter name rewriting or restriction checks. For most protocols, customSettings modifies or fills in parameters at the root path of the JSON content. For the Qwen protocol, parameters are configured under the parameters subpath of the JSON content. For the Gemini protocol, parameters are configured under the generation_config subpath.
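For example, the following sketch fills in max_tokens in auto mode, so its name is rewritten per protocol, and writes a second key into the request JSON as-is in raw mode. The parameter names and values here are illustrative only:

```yaml
provider:
  type: qwen
  apiTokens:
    - "YOUR_API_TOKEN"
  customSettings:
    # auto mode: "max_tokens" is rewritten to the provider-specific name.
    - name: "max_tokens"
      value: 1024
      mode: "auto"
      overwrite: true
    # raw mode: the key is written into the request JSON unchanged.
    - name: "enable_search"
      value: true
      mode: "raw"
```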

Configuration items specific to each service provider

OpenAI

The value of type for OpenAI is openai. The following table describes the configuration item that is specific to this service provider.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| openaiCustomUrl | string | No | - | The custom URL of a backend service based on the OpenAI protocol. Example: www.example.com/myai/v1/chat/completions. |
| responseJsonSchema | object | No | - | The JSON schema that is predefined in OpenAI responses. Only specific models support this configuration item. |

Azure OpenAI

The value of type for Azure OpenAI is azure. The following table describes the configuration items specific to this service provider.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| azureServiceUrl | string | Yes | - | The URL of the Azure OpenAI service. The value must include the api-version query parameter. |

Important

Azure OpenAI supports only one API token.

Moonshot

The value of type for Moonshot is moonshot. The following table describes the configuration items that are specific to this service provider.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| moonshotFileId | string | No | - | The ID of the file that is uploaded to Moonshot by using the file interface. The file content is used as the context of the AI conversation. This field cannot be used together with the context field. |

Qwen

The value of type for Qwen is qwen. The following table describes the configuration items that are specific to this service provider.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| qwenEnableSearch | boolean | No | - | Specifies whether to enable the built-in Internet search feature of Qwen. |
| qwenFileIds | array of string | No | - | The IDs of the files that are uploaded to DashScope by using the file interface. The file content is used as the context of the AI conversation. This field cannot be used together with the context field. |

Baichuan AI

The value of type for Baichuan AI is baichuan. No specific configuration items are required.

Yi

The value of type for Yi is yi. No specific configuration items are required.

Zhipu AI

The value of type for Zhipu AI is zhipuai. No specific configuration items are required.

DeepSeek

The value of type for DeepSeek is deepseek. No specific configuration items are required.

Groq

The value of type for Groq is groq. No specific configuration items are required.

Baidu

The value of type for Baidu is baidu. No specific configuration items are required.

360 Brain

The value of type for 360 is ai360. No specific configuration items are required.

Mistral

The value of type for Mistral is mistral. No specific configuration items are required.

MiniMax

The value of type for MiniMax is minimax. The following table describes the configuration item that is specific to this service provider.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| minimaxGroupId | string | Required when the abab6.5-chat, abab6.5s-chat, abab5.5s-chat, or abab5.5-chat model is used | - | These models are accessed through the ChatCompletion Pro interface, which requires the group ID. |

Anthropic Claude

The value of type for Anthropic Claude is claude. The following table describes the configuration item that is specific to this service provider.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| claudeVersion | string | No | 2023-06-01 | The API version of the Anthropic Claude service. |

Ollama

The value of type for Ollama is ollama. The following table describes the configuration items that are specific to this service provider.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| ollamaServerHost | string | Yes | - | The host IP address of the Ollama server. |
| ollamaServerPort | number | Yes | - | The port number of the Ollama server. Default value: 11434. |

Hunyuan

The value of type for Hunyuan is hunyuan. The following table describes the configuration items that are specific to this service provider.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| hunyuanAuthId | string | Yes | - | The Hunyuan authentication ID used for v3 authentication. |
| hunyuanAuthKey | string | Yes | - | The Hunyuan authentication key used for v3 authentication. |

Stepfun

The value of type for Stepfun is stepfun. No specific configuration items are required.

Cloudflare Workers AI

The value of type for Cloudflare Workers AI is cloudflare. The following table describes the configuration item that is specific to this service provider.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| cloudflareAccountId | string | Yes | - | The Cloudflare account ID. For more information, see Cloudflare account ID. |

Spark

The value of type for Spark is spark. No specific configuration items are required.

The value of the apiTokens field of iFLYTEK Spark is in the APIKey:APISecret format. You must replace APIKey and APISecret with your API key and API secret. Separate the API key and API secret with a colon (:).

Gemini

The value of type for Gemini is gemini. The following table describes the configuration item that is specific to this service provider.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| geminiSafetySetting | map of string | No | - | The content filtering and safety settings of Gemini. For more information, see Safety settings. |

DeepL

The value of type for DeepL is deepl. The following table describes the configuration item that is specific to this service provider.

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| targetLang | string | Yes | - | The target language that you specify when you use the DeepL translation service. |

Cohere

The value of type for Cohere is cohere. No specific configuration items are required.

Examples

Use the Azure OpenAI service by using the OpenAI protocol

Use the most basic Azure OpenAI service without the need to configure context.

Configuration information

provider:
  type: azure
  apiTokens:
    - "YOUR_AZURE_OPENAI_API_TOKEN"
  azureServiceUrl: "https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2024-02-15-preview"

Use the Qwen service by using the OpenAI protocol

Use the Qwen service and configure the model mapping from the OpenAI LLM to the Qwen service.

Configuration information

provider:
  type: qwen
  apiTokens:
    - "YOUR_QWEN_API_TOKEN"
  modelMapping:
    'gpt-3': "qwen-turbo"
    'gpt-35-turbo': "qwen-plus"
    'gpt-4-turbo': "qwen-max"
    'gpt-4-*': "qwen-max"
    'gpt-4o': "qwen-vl-plus"
    'text-embedding-v1': 'text-embedding-v1'
    '*': "qwen-turbo"

Use the Alibaba Cloud Model Studio service by using the original protocol

Configuration information

provider:
  type: qwen
  apiTokens:
    - "YOUR_DASHSCOPE_API_TOKEN"
  protocol: original

Use the Doubao service by using the OpenAI protocol

Configuration information

provider:
  type: doubao
  apiTokens:
    - YOUR_DOUBAO_API_KEY
  modelMapping:
    '*': YOUR_DOUBAO_ENDPOINT
  timeout: 1200000

Use the Moonshot service based on the content of a file

Upload a file to the Moonshot service in advance. Then, use the Moonshot service together with the content of the file as the context.

Configuration information

provider:
  type: moonshot
  apiTokens:
    - "YOUR_MOONSHOT_API_TOKEN"
  moonshotFileId: "YOUR_MOONSHOT_FILE_ID"
  modelMapping:
    '*': "moonshot-v1-32k"

Use the Groq service by using the OpenAI protocol

Configuration information

provider:
  type: groq
  apiTokens:
    - "YOUR_GROQ_API_TOKEN"

Use the Anthropic Claude service by using the OpenAI protocol

Configuration information

provider:
  type: claude
  apiTokens:
    - "YOUR_CLAUDE_API_TOKEN"
  version: "2023-06-01"

Use the Hunyuan service by using the OpenAI protocol

Configuration information

provider:
  type: "hunyuan"
  hunyuanAuthKey: "<YOUR AUTH KEY>"
  apiTokens:
    - ""
  hunyuanAuthId: "<YOUR AUTH ID>"
  timeout: 1200000
  modelMapping:
    "*": "hunyuan-lite"

Use the Baidu service by using the OpenAI protocol

Configuration information

provider:
  type: baidu
  apiTokens:
    - "YOUR_BAIDU_API_TOKEN"
  modelMapping:
    'gpt-3': "ERNIE-4.0"
    '*': "ERNIE-4.0"

Use the MiniMax service by using the OpenAI protocol

Configuration information

provider:
  type: minimax
  apiTokens:
    - "YOUR_MINIMAX_API_TOKEN"
  modelMapping:
    "gpt-3": "abab6.5g-chat"
    "gpt-4": "abab6.5-chat"
    "*": "abab6.5g-chat"
  minimaxGroupId: "YOUR_MINIMAX_GROUP_ID"

Use the 360 Brain service by using the OpenAI protocol

Configuration information

provider:
  type: ai360
  apiTokens:
    - "YOUR_MINIMAX_API_TOKEN"
  modelMapping:
    "gpt-4o": "360gpt-turbo-responsibility-8k"
    "gpt-4": "360gpt2-pro"
    "gpt-3.5": "360gpt-turbo"
    "text-embedding-3-small": "embedding_s1_v1.2"
    "*": "360gpt-pro"

Use the Cloudflare Workers AI service by using the OpenAI protocol

Configuration information

provider:
  type: cloudflare
  apiTokens:
    - "YOUR_WORKERS_AI_API_TOKEN"
  cloudflareAccountId: "YOUR_CLOUDFLARE_ACCOUNT_ID"
  modelMapping:
    "*": "@cf/meta/llama-3-8b-instruct"

Use the Spark service by using the OpenAI protocol

Configuration information

provider:
  type: spark
  apiTokens:
    - "APIKey:APISecret"
  modelMapping:
    "gpt-4o": "generalv3.5"
    "gpt-4": "generalv3"
    "*": "general"

Use the Gemini service by using the OpenAI protocol

Configuration information

provider:
  type: gemini
  apiTokens:
    - "YOUR_GEMINI_API_TOKEN"
  modelMapping:
    "*": "gemini-pro"
  geminiSafetySetting:
    "HARM_CATEGORY_SEXUALLY_EXPLICIT" :"BLOCK_NONE"
    "HARM_CATEGORY_HATE_SPEECH" :"BLOCK_NONE"
    "HARM_CATEGORY_HARASSMENT" :"BLOCK_NONE"
    "HARM_CATEGORY_DANGEROUS_CONTENT" :"BLOCK_NONE"

Use the DeepL translation service by using the OpenAI protocol

Configuration information

provider:
  type: deepl
  apiTokens:
    - "YOUR_DEEPL_API_TOKEN"
  targetLang: "ZH"

Sample request

In the following sample request, model indicates the service type of DeepL. You can only set this value to Free or Pro. Enter the text that you want to translate in content. The content parameter in the role: system configuration can contain contextual information that helps improve translation accuracy but is not itself translated. For example, you can enter the description of a product as contextual information when you use the service to translate the product name. This may help improve the quality of the translation.

{
  "model": "Free",
  "messages": [
    {
      "role": "system",
      "content": "money"
    },
    {
      "content": "sit by the bank"
    },
    {
      "content": "a bank in China"
    }
  ]
}

Sample response

{
  "choices": [
    {
      "index": 0
    },
    {
      "index": 1
    }
  ],
  "created": 1722747752,
  "model": "Free",
  "object": "chat.completion",
  "usage": {}
}