When your application needs to call multiple LLM providers, each with its own API format, authentication method, and model naming convention, integration complexity grows with every provider you add. The ai-proxy plug-in solves this by exposing a single OpenAI-compatible interface on your gateway route. The plug-in supports providers such as OpenAI, Azure OpenAI, Moonshot, and Qwen. You configure the target provider, and the plug-in handles protocol translation, authentication, and model name mapping automatically.
Enable this plug-in only on routes that handle AI traffic. Non-conforming requests receive an HTTP 404 response.
- Requests to paths ending in /v1/chat/completions are parsed as OpenAI chat completions (text-to-text) and converted to the target provider's format.
- Requests to paths ending in /v1/embeddings are parsed as OpenAI embeddings (text vectorization) and converted to the target provider's format.
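The dispatch rule above can be sketched in a few lines (an illustrative Python sketch; the plug-in itself is not written in Python, and the function name is hypothetical):

```python
from typing import Optional

# Illustrative sketch of the path-to-protocol dispatch described above.
def match_protocol(path: str) -> Optional[str]:
    """Return the request protocol for a path, or None (handled as HTTP 404)."""
    if path.endswith("/v1/chat/completions"):
        return "chat-completions"  # text-to-text
    if path.endswith("/v1/embeddings"):
        return "embeddings"        # text vectorization
    return None                    # non-conforming request

print(match_protocol("/api/openai/v1/chat/completions"))  # chat-completions
print(match_protocol("/healthz"))                         # None
```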
How it works
```
Client request                 ai-proxy plug-in                    LLM provider
──────────────                 ────────────────                    ────────────
POST /v1/chat/completions
model: "gpt-4"       ──────>   1. Match path to protocol
                               2. Look up modelMapping:
                                  "gpt-4" → "qwen-max"
                               3. Select apiToken
                               4. Convert to provider API   ──>    Qwen API
      <────────────────────────────────────────────────────────    Response
```

Quick start
A provider type and an API token are all you need for the simplest configuration:
```yaml
provider:
  type: qwen
  apiTokens:
    - "<your-api-token>"
```

All requests on this route are forwarded to Qwen. Clients send standard OpenAI-format requests, and the plug-in translates them to the Qwen API automatically.
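From the client's perspective, the route behaves like a normal OpenAI endpoint. A minimal sketch of building such a request with the standard library (the gateway host below is a placeholder, not a real address):

```python
import json
from urllib import request

# Standard OpenAI-format chat request; the plug-in maps the model name and
# injects the configured apiToken before forwarding to Qwen.
payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}],
}
req = request.Request(
    "http://<your-gateway>/v1/chat/completions",  # placeholder gateway route
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req) would send it against a live gateway.
```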
To map client-side model names to provider-specific models, add modelMapping:
```yaml
provider:
  type: qwen
  apiTokens:
    - "<your-api-token>"
  modelMapping:
    "gpt-4-turbo": "qwen-max"
    "gpt-3": "qwen-turbo"
    "*": "qwen-turbo"
```

Running attributes
| Attribute | Value |
|---|---|
| Execution stage | default stage |
| Execution priority | 100 |
Configuration reference
Provider
The provider object is the top-level configuration item. All fields below are nested under provider.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| type | string | Yes | - | Provider identifier. See Supported providers. |
| apiTokens | array of string | No | - | API tokens for authentication. When multiple tokens are configured, the plug-in selects one at random per request. Some providers support only one token. |
| timeout | number | No | 120000 | Request timeout in milliseconds (120,000 ms = 2 minutes). |
| modelMapping | map of string | No | - | Maps model names in client requests to provider-specific model names. See Model mapping. |
| protocol | string | No | openai | API protocol. Valid values: openai (OpenAI-compatible, default) and original (provider's native protocol). |
| context | object | No | - | External context file for AI conversations. See Context. |
| customSettings | array of object | No | - | Override or inject request parameters. See Custom settings. |
Model mapping
The modelMapping field maps model names from client requests to provider-specific names. Three matching modes are supported:
| Mode | Example key | Behavior |
|---|---|---|
| Exact match | "gpt-4" | Matches the model name gpt-4 exactly. |
| Prefix match | "gpt-3-*" | Matches all models whose names start with gpt-3-. |
| Wildcard fallback | "*" or "" | Catches all models not matched by other rules. Either "*" or "" (an empty string) works as the key for a general mapping. |
If the mapped value is an empty string (""), the original model name from the request is preserved.
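The lookup order and the empty-value rule can be sketched as follows (an illustrative re-implementation, not the plug-in's source):

```python
# Sketch of modelMapping resolution: exact match first, then prefix rules
# (keys ending in "*"), then the "*" / "" fallback. An empty mapped value
# preserves the original model name from the request.
def resolve_model(mapping: dict, model: str) -> str:
    if model in mapping:                        # exact match
        target = mapping[model]
    else:
        for key, value in mapping.items():      # prefix match, e.g. "gpt-4-*"
            if key.endswith("*") and key != "*" and model.startswith(key[:-1]):
                target = value
                break
        else:                                   # wildcard fallback
            target = mapping.get("*", mapping.get("", None))
    if not target:                              # "" keeps the original name
        return model
    return target

mapping = {"gpt-4-turbo": "qwen-max", "gpt-4-*": "qwen-max", "*": "qwen-turbo"}
print(resolve_model(mapping, "gpt-4-0613"))   # qwen-max (prefix rule)
print(resolve_model(mapping, "llama-3"))      # qwen-turbo (fallback)
```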
Example: Map OpenAI model names to Qwen models.
```yaml
modelMapping:
  "gpt-3": "qwen-turbo"
  "gpt-35-turbo": "qwen-plus"
  "gpt-4-turbo": "qwen-max"
  "gpt-4-*": "qwen-max"
  "*": "qwen-turbo"
```

Context
Load an external plaintext file as AI conversation context. This object is nested under provider.
| Field | Type | Required | Description |
|---|---|---|---|
| fileUrl | string | Yes | URL of the plaintext context file. |
| serviceName | string | Yes | Full name of the Higress backend service that hosts the file. |
| servicePort | number | Yes | Port of the Higress backend service. |
Custom settings
Override or inject parameters in AI requests. Each entry in the customSettings array has the following fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | - | Parameter name, such as max_tokens. |
| value | string, number, float, or boolean | Yes | - | Parameter value. |
| mode | string | No | auto | auto: rewrites the parameter name to match the target provider's protocol. raw: uses the parameter name as-is with no validation. |
| overwrite | boolean | No | true | true: always overwrite the parameter. false: set the parameter only if the client did not include it. |
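The overwrite flag's two behaviors can be sketched as (a hypothetical helper for illustration, not the plug-in's code):

```python
# Sketch of the overwrite semantics: with overwrite=True the setting always
# wins; with overwrite=False it only fills in a parameter the client omitted.
def apply_setting(body: dict, name: str, value, overwrite: bool = True) -> dict:
    if overwrite or name not in body:
        body[name] = value
    return body

body = {"model": "qwen-max", "max_tokens": 64}
apply_setting(body, "max_tokens", 1024, overwrite=False)  # client value kept
apply_setting(body, "temperature", 0.2)                   # injected
print(body)  # {'model': 'qwen-max', 'max_tokens': 64, 'temperature': 0.2}
```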
Parameter name rewriting (auto mode)
In auto mode, the plug-in rewrites parameter names to match each provider's API. The following table shows how standard parameter names map to each provider's protocol. none means the provider does not support the parameter.
| Setting name | openai | baidu | spark | qwen | gemini | hunyuan | claude | minimax |
|---|---|---|---|---|---|---|---|---|
| max_tokens | max_tokens | max_output_tokens | max_tokens | max_tokens | maxOutputTokens | none | max_tokens | tokens_to_generate |
| temperature | temperature | temperature | temperature | temperature | temperature | Temperature | temperature | temperature |
| top_p | top_p | top_p | none | top_p | topP | TopP | top_p | top_p |
| top_k | none | none | top_k | none | topK | none | top_k | none |
| seed | seed | none | none | seed | none | none | none | none |
If a parameter name is not in this table and mode is auto, the setting has no effect. Use raw mode to pass arbitrary parameters.
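Auto-mode rewriting amounts to a table lookup. A sketch using a slice of the table above (illustrative only; the function and dict names are hypothetical):

```python
from typing import Optional

# A slice of the auto-mode rewrite table above. Missing entries correspond to
# "none" (the provider does not support the parameter).
REWRITE = {
    "max_tokens": {"openai": "max_tokens", "baidu": "max_output_tokens",
                   "gemini": "maxOutputTokens", "minimax": "tokens_to_generate"},
    "top_p": {"openai": "top_p", "gemini": "topP", "hunyuan": "TopP"},
}

def rewrite_name(name: str, provider: str) -> Optional[str]:
    """Provider-specific parameter name, or None if unsupported/unknown."""
    return REWRITE.get(name, {}).get(provider)

print(rewrite_name("max_tokens", "gemini"))  # maxOutputTokens
print(rewrite_name("top_p", "spark"))        # None (not supported)
```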
Parameter injection paths (raw mode)
In raw mode, the name and value are injected directly into the request JSON. The injection path depends on the provider:
| Provider | Injection path |
|---|---|
| Most providers | Root of the JSON body |
| Qwen | parameters subpath |
| Gemini | generation_config subpath |
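The injection-path rule can be sketched as (a hypothetical helper for illustration):

```python
# Sketch of raw-mode injection: most providers take the parameter at the root
# of the JSON body, while Qwen and Gemini nest it under a subpath.
SUBPATH = {"qwen": "parameters", "gemini": "generation_config"}

def inject_raw(body: dict, provider: str, name: str, value) -> dict:
    path = SUBPATH.get(provider)
    if path is None:                      # most providers: JSON root
        body[name] = value
    else:                                 # Qwen / Gemini: nested subpath
        body.setdefault(path, {})[name] = value
    return body

print(inject_raw({}, "openai", "logprobs", True))
print(inject_raw({}, "qwen", "enable_search", True))
```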
Supported providers
Providers that require only type and apiTokens
| Provider | type value |
|---|---|
| OpenAI | openai |
| Baichuan AI | baichuan |
| Yi | yi |
| Zhipu AI | zhipuai |
| DeepSeek | deepseek |
| Groq | groq |
| Baidu | baidu |
| 360 Brain | ai360 |
| Mistral | mistral |
| Stepfun | stepfun |
| Cohere | cohere |
Providers with additional fields
OpenAI (custom endpoint)
Set type to openai.
| Field | Type | Required | Description |
|---|---|---|---|
| openaiCustomUrl | string | No | Custom backend URL for OpenAI-compatible services. Example: www.example.com/myai/v1/chat/completions. |
| responseJsonSchema | object | No | Predefined JSON schema for structured responses. Only specific models support this field. |
Azure OpenAI
Set type to azure.
| Field | Type | Required | Description |
|---|---|---|---|
| azureServiceUrl | string | Yes | Azure OpenAI service URL. Must include the api-version query parameter. |
Azure OpenAI supports only one API token.
Moonshot
Set type to moonshot.
| Field | Type | Required | Description |
|---|---|---|---|
| moonshotFileId | string | No | ID of a file uploaded through the Moonshot file API. The file content serves as conversation context. Cannot be used together with the context field. |
Qwen
Set type to qwen.
| Field | Type | Required | Description |
|---|---|---|---|
| qwenEnableSearch | boolean | No | Enable Qwen's built-in internet search. |
| qwenFileIds | array of string | No | IDs of files uploaded to DashScope through the file API. File content serves as conversation context. Cannot be used together with the context field. |
MiniMax
Set type to minimax.
| Field | Type | Required | Description |
|---|---|---|---|
| minimaxGroupId | string | Required for abab6.5-chat, abab6.5s-chat, abab5.5s-chat, and abab5.5-chat models | Group ID for ChatCompletion Pro. |
Anthropic Claude
Set type to claude.
| Field | Type | Required | Description |
|---|---|---|---|
| claudeVersion | string | No | Anthropic Claude API version. Default: 2023-06-01. |
Ollama
Set type to ollama.
| Field | Type | Required | Description |
|---|---|---|---|
| ollamaServerHost | string | Yes | Host IP address of the Ollama server. |
| ollamaServerPort | number | Yes | Port of the Ollama server (typically 11434). |
Hunyuan
Set type to hunyuan.
| Field | Type | Required | Description |
|---|---|---|---|
| hunyuanAuthId | string | Yes | Hunyuan ID for v3 authentication. |
| hunyuanAuthKey | string | Yes | Hunyuan key for v3 authentication. |
Cloudflare Workers AI
Set type to cloudflare.
| Field | Type | Required | Description |
|---|---|---|---|
| cloudflareAccountId | string | Yes | Cloudflare account ID. For details, see Cloudflare account ID. |
Spark (iFLYTEK)
Set type to spark.
No provider-specific fields are required. However, the apiTokens value must use the format APIKey:APISecret (colon-separated).
Gemini
Set type to gemini.
| Field | Type | Required | Description |
|---|---|---|---|
| geminiSafetySetting | map of string | No | Content filtering and safety settings. For details, see Safety settings. |
DeepL
Set type to deepl.
| Field | Type | Required | Description |
|---|---|---|---|
| targetLang | string | Yes | Target language code for translation (for example, ZH). |
Configuration examples
Azure OpenAI
Route requests to Azure OpenAI with a single deployment:
```yaml
provider:
  type: azure
  apiTokens:
    - "<your-azure-openai-api-token>"
  azureServiceUrl: "https://<your-resource-name>.openai.azure.com/openai/deployments/<your-deployment-name>/chat/completions?api-version=2024-02-15-preview"
```

Qwen with model mapping
Map OpenAI model names to Qwen equivalents:
```yaml
provider:
  type: qwen
  apiTokens:
    - "<your-qwen-api-token>"
  modelMapping:
    "gpt-3": "qwen-turbo"
    "gpt-35-turbo": "qwen-plus"
    "gpt-4-turbo": "qwen-max"
    "gpt-4-*": "qwen-max"
    "gpt-4o": "qwen-vl-plus"
    "text-embedding-v1": "text-embedding-v1"
    "*": "qwen-turbo"
```

Alibaba Cloud Model Studio (native protocol)
Use the Qwen provider type with the provider's native DashScope protocol instead of the OpenAI-compatible one:
```yaml
provider:
  type: qwen
  apiTokens:
    - "<your-dashscope-api-token>"
  protocol: original
```

Doubao with extended timeout
Route requests to Doubao (ByteDance). Use modelMapping to point to your Doubao endpoint, and set a longer timeout for large model responses:
```yaml
provider:
  type: doubao
  apiTokens:
    - "<your-doubao-api-key>"
  modelMapping:
    "*": "<your-doubao-endpoint>"
  timeout: 1200000
```

Moonshot with file context
Upload a file to Moonshot, then reference it as conversation context:
```yaml
provider:
  type: moonshot
  apiTokens:
    - "<your-moonshot-api-token>"
  moonshotFileId: "<your-moonshot-file-id>"
  modelMapping:
    "*": "moonshot-v1-32k"
```

Groq
```yaml
provider:
  type: groq
  apiTokens:
    - "<your-groq-api-token>"
```

Anthropic Claude
```yaml
provider:
  type: claude
  apiTokens:
    - "<your-claude-api-token>"
  claudeVersion: "2023-06-01"
```

Hunyuan
```yaml
provider:
  type: hunyuan
  hunyuanAuthId: "<your-auth-id>"
  hunyuanAuthKey: "<your-auth-key>"
  apiTokens:
    - ""
  timeout: 1200000
  modelMapping:
    "*": "hunyuan-lite"
```

Baidu
```yaml
provider:
  type: baidu
  apiTokens:
    - "<your-baidu-api-token>"
  modelMapping:
    "gpt-3": "ERNIE-4.0"
    "*": "ERNIE-4.0"
```

MiniMax
```yaml
provider:
  type: minimax
  apiTokens:
    - "<your-minimax-api-token>"
  modelMapping:
    "gpt-3": "abab6.5g-chat"
    "gpt-4": "abab6.5-chat"
    "*": "abab6.5g-chat"
  minimaxGroupId: "<your-minimax-group-id>"
```

360 Brain
```yaml
provider:
  type: ai360
  apiTokens:
    - "<your-ai360-api-token>"
  modelMapping:
    "gpt-4o": "360gpt-turbo-responsibility-8k"
    "gpt-4": "360gpt2-pro"
    "gpt-3.5": "360gpt-turbo"
    "text-embedding-3-small": "embedding_s1_v1.2"
    "*": "360gpt-pro"
```

Cloudflare Workers AI
```yaml
provider:
  type: cloudflare
  apiTokens:
    - "<your-workers-ai-api-token>"
  cloudflareAccountId: "<your-cloudflare-account-id>"
  modelMapping:
    "*": "@cf/meta/llama-3-8b-instruct"
```

Spark (iFLYTEK)
The apiTokens value uses the format APIKey:APISecret:
```yaml
provider:
  type: spark
  apiTokens:
    - "<your-api-key>:<your-api-secret>"
  modelMapping:
    "gpt-4o": "generalv3.5"
    "gpt-4": "generalv3"
    "*": "general"
```

Gemini
```yaml
provider:
  type: gemini
  apiTokens:
    - "<your-gemini-api-token>"
  modelMapping:
    "*": "gemini-pro"
  geminiSafetySetting:
    "HARM_CATEGORY_SEXUALLY_EXPLICIT": "BLOCK_NONE"
    "HARM_CATEGORY_HATE_SPEECH": "BLOCK_NONE"
    "HARM_CATEGORY_HARASSMENT": "BLOCK_NONE"
    "HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_NONE"
```

DeepL translation
```yaml
provider:
  type: deepl
  apiTokens:
    - "<your-deepl-api-token>"
  targetLang: "ZH"
```

Request format: Set model to Free or Pro to select the DeepL service tier. Each message in the messages array contains text to translate. A message with role: system provides translation context to improve accuracy but is not itself translated.
```json
{
  "model": "Free",
  "messages": [
    {
      "role": "system",
      "content": "money"
    },
    {
      "content": "sit by the bank"
    },
    {
      "content": "a bank in China"
    }
  ]
}
```

Response:
```json
{
  "choices": [
    {
      "index": 0
    },
    {
      "index": 1
    }
  ],
  "created": 1722747752,
  "model": "Free",
  "object": "chat.completion",
  "usage": {}
}
```