This topic describes the input and output parameters of the Qwen-MT API for both the OpenAI-compatible interface and the DashScope interface.
References: Machine translation (Qwen-MT)
OpenAI compatibility
Singapore region
For SDK, set base_url to: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
For HTTP, the endpoint is: POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
Beijing region
For SDK, set base_url to: https://dashscope.aliyuncs.com/compatible-mode/v1
For HTTP, the endpoint is: POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
You must first create an API key and export the API key as an environment variable. If using the OpenAI SDK, install the SDK.
Request body
(The original page shows tabbed code samples here for basic usage, term intervention, translation memory, and domain prompting, in Python, Node.js, and curl.)
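A minimal basic-usage sketch with the OpenAI Python SDK against the Singapore endpoint. It assumes the API key is exported as DASHSCOPE_API_KEY and that translation_options accepts source_lang and target_lang fields; those field names follow the Qwen-MT documentation and are not spelled out in this section.

```python
import os
from openai import OpenAI

# Assumes the API key is exported as DASHSCOPE_API_KEY and the Singapore
# endpoint; use the Beijing base_url instead if needed.
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-mt-turbo",
    messages=[{"role": "user", "content": "我看到这部电影后没有笑。"}],
    # translation_options is not a standard OpenAI parameter, so it is passed
    # through extra_body; the source_lang/target_lang values are illustrative.
    extra_body={
        "translation_options": {
            "source_lang": "auto",
            "target_lang": "English",
        }
    },
)

print(completion.choices[0].message.content)
```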
model: The name of the model. Supported models: qwen-mt-plus, qwen-mt-flash, qwen-mt-lite, and qwen-mt-turbo.
messages: An array of messages that provides context to the model. Only user messages are supported.
stream: Specifies whether to return the response in streaming mode. Valid values: false (default) and true.
Note: Currently, only qwen-mt-flash and qwen-mt-lite support incremental data return, where each chunk contains only the newly generated content. qwen-mt-plus and qwen-mt-turbo return data non-incrementally, where each chunk contains the entire sequence generated so far; this behavior cannot be changed. For example, successive chunks might contain "I", "I didn", "I didn't", "I didn't laugh", "I didn't laugh after", and so on.
stream_options: The configuration items for streaming output. This parameter takes effect only when stream is true.
max_tokens: The maximum number of tokens to generate. If the generated content exceeds this value, the response is truncated. The default and maximum values are the maximum output length of the model. For more information, see Model selection.
seed: The random number seed. This ensures that results are reproducible with the same input and parameters. If you use the same seed value and keep the other parameters unchanged, the model attempts to return the same result. Value range: 0 to 2^31−1.
temperature: The sampling temperature, which controls the diversity of the generated text. A higher temperature value results in more diverse text. A lower temperature value results in more deterministic text. Value range: [0, 2). Both `temperature` and `top_p` control the diversity of the generated text. Set only one of them.
top_p: The probability threshold for nucleus sampling, which controls the diversity of the generated text. A higher `top_p` value results in more diverse text. A lower `top_p` value results in more deterministic text. Value range: (0, 1.0]. Both `temperature` and `top_p` control the diversity of the generated text. Set only one of them.
top_k: The size of the candidate set for sampling during generation. For example, if you set this parameter to 50, only the 50 tokens with the highest scores in a single generation are used to form the candidate set for sampling. A larger value increases randomness, and a smaller value increases determinism. If the value is `None` or greater than 100, the `top_k` policy is disabled and only the `top_p` policy takes effect. The value must be greater than or equal to 0. This parameter is not a standard OpenAI parameter. When you use the Python SDK, place it in the extra_body object, as shown in the sketch after translation_options.
repetition_penalty: The penalty for repetition in consecutive sequences during model generation. A higher `repetition_penalty` value reduces repetition. A value of 1.0 indicates no penalty. The value must be greater than 0, but there is no strict value range. This parameter is not a standard OpenAI parameter. When you use the Python SDK, place it in the extra_body object, as shown in the sketch after translation_options.
translation_options: The translation parameters to configure. This parameter is not a standard OpenAI parameter. When you use the Python SDK, place it in the extra_body object, as shown in the sketch below.
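Because top_k, repetition_penalty, and translation_options are not standard OpenAI parameters, the following sketch passes all three through extra_body. It reuses the client from the basic-usage sketch; the values are illustrative.

```python
completion = client.chat.completions.create(
    model="qwen-mt-turbo",
    messages=[{"role": "user", "content": "我看到这部电影后没有笑。"}],
    extra_body={
        "top_k": 50,                # candidate-set size for sampling
        "repetition_penalty": 1.1,  # values above 1.0 reduce repetition
        "translation_options": {    # field names as assumed in the basic-usage sketch
            "source_lang": "auto",
            "target_lang": "English",
        },
    },
)
```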
Chat response object (non-streaming output)
id: The unique identifier of the request.
choices: An array of content generated by the model.
created: The UNIX timestamp when the request was created.
model: The model used for the request.
object: This is always `chat.completion`.
service_tier: This parameter is currently fixed to null.
system_fingerprint: This parameter is currently fixed to null.
usage: The token usage information for the request.
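A short sketch of reading these fields from the response object returned by the OpenAI Python SDK, reusing `completion` from the basic-usage sketch:

```python
print(completion.id)                          # unique identifier of the request
print(completion.created)                     # UNIX timestamp of the request
print(completion.model)                       # model used for the request
print(completion.object)                      # "chat.completion"
print(completion.choices[0].message.content)  # translated text
print(completion.usage)                       # token usage information
```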
Chat response chunk object (streaming output)
(The original page shows example chunk sequences for both incremental and non-incremental output.)
id: The unique identifier of the call. Each chunk object has the same ID.
choices: An array of content generated by the model. If stream_options.include_usage is set to true, the choices array in the last chunk is empty.
created: The UNIX timestamp when the request was created. Each chunk has the same timestamp.
model: The model used for the request.
object: This is always `chat.completion.chunk`.
service_tier: This parameter is currently fixed to null.
system_fingerprint: This parameter is currently fixed to null.
usage: The tokens consumed by the request. This is returned in the last chunk only when stream_options.include_usage is set to true.
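A minimal streaming sketch, reusing the same client. With stream_options={"include_usage": True}, the final chunk carries usage and an empty choices array. Because qwen-mt-plus and qwen-mt-turbo send the entire sequence generated so far in every chunk, the sketch keeps the latest chunk instead of concatenating deltas.

```python
stream = client.chat.completions.create(
    model="qwen-mt-turbo",
    messages=[{"role": "user", "content": "我看到这部电影后没有笑。"}],
    stream=True,
    stream_options={"include_usage": True},
    extra_body={
        "translation_options": {"source_lang": "auto", "target_lang": "English"}
    },
)

full_text = ""
for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta.content or ""
        # qwen-mt-plus / qwen-mt-turbo: each chunk holds the whole sequence so
        # far, so keep the latest non-empty chunk. For qwen-mt-flash /
        # qwen-mt-lite (incremental output), use `full_text += delta` instead.
        if delta:
            full_text = delta
    else:
        # With include_usage=True, the last chunk has an empty choices array
        # and carries the usage statistics.
        print(chunk.usage)

print(full_text)
```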
DashScope
Singapore region
For HTTP, the endpoint is: POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
For SDK, set base_url to:
Python code:
```python
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
```
Java code:
Method 1:
```java
import com.alibaba.dashscope.protocol.Protocol;
Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
```
Method 2:
```java
import com.alibaba.dashscope.utils.Constants;
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
```
Beijing region
For HTTP, the endpoint is: POST https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
For SDK calls, you do not need to configure base_url.
You must create an API key and export the API key as an environment variable. If using the DashScope SDK, install the DashScope SDK.
Request body
(The original page shows tabbed code samples here for basic usage, term intervention, translation memory, and domain prompting, in Python, Java, and curl.)
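A minimal basic-usage sketch with the DashScope Python SDK against the Singapore endpoint. It assumes the API key is exported as DASHSCOPE_API_KEY and that translation_options accepts source_lang and target_lang fields; those field names follow the Qwen-MT documentation and are not spelled out in this section.

```python
import os
import dashscope

# Singapore endpoint; skip this line for the Beijing region.
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

response = dashscope.Generation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-mt-turbo",
    messages=[{"role": "user", "content": "我看到这部电影后没有笑。"}],
    result_format="message",
    # source_lang / target_lang field names are assumptions here.
    translation_options={
        "source_lang": "auto",
        "target_lang": "English",
    },
)

print(response.output.choices[0].message.content)
```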
model: The name of the model. Supported models: qwen-mt-plus, qwen-mt-flash, qwen-mt-lite, and qwen-mt-turbo.
messages: An array of messages that provides context to the model. Only user messages are supported.
max_tokens: The maximum number of tokens to generate. If the generated content exceeds this value, the response is truncated. The default and maximum values are the maximum output length of the model. For more information, see Model selection. In the Java SDK, the parameter is maxTokens. For an HTTP call, place max_tokens in the parameters object.
seed: The random number seed. This ensures that results are reproducible with the same input and parameters. If you use the same seed value and keep the other parameters unchanged, the model attempts to return the same result. Value range: 0 to 2^31−1. When you make an HTTP call, place seed in the parameters object.
temperature: The sampling temperature, which controls the diversity of the generated text. A higher temperature value results in more diverse text. A lower temperature value results in more deterministic text. Value range: [0, 2). Both `temperature` and `top_p` control the diversity of the generated text. Set only one of them. When you make an HTTP call, place temperature in the parameters object.
top_p: The probability threshold for nucleus sampling, which controls the diversity of the generated text. A higher `top_p` value results in more diverse text. A lower `top_p` value results in more deterministic text. Value range: (0, 1.0]. Both `temperature` and `top_p` control the diversity of the generated text. Set only one of them. In the Java SDK, the parameter is topP. For HTTP calls, place top_p in the parameters object.
repetition_penalty: The penalty for repetition in consecutive sequences during model generation. A higher `repetition_penalty` value reduces repetition. A value of 1.0 indicates no penalty. The value must be greater than 0, but there is no strict value range. In the Java SDK, the parameter is repetitionPenalty. When you make an HTTP call, place repetition_penalty in the parameters object.
top_k: The size of the candidate set for sampling during generation. For example, if you set this parameter to 50, only the 50 tokens with the highest scores in a single generation are used to form the candidate set for sampling. A larger value increases randomness, and a smaller value increases determinism. If the value is `None` or greater than 100, the `top_k` policy is disabled and only the `top_p` policy takes effect. The value must be greater than or equal to 0. In the Java SDK, the parameter is topK. When you make an HTTP call, place top_k in the parameters object.
stream: Specifies whether to return the response in streaming mode. Valid values: false (default) and true.
Note: Currently, only qwen-mt-flash and qwen-mt-lite support incremental data return, where each chunk contains only the newly generated content. qwen-mt-plus and qwen-mt-turbo return data non-incrementally, where each chunk contains the entire sequence generated so far; this behavior cannot be changed. For example, successive chunks might contain "I", "I didn", "I didn't", "I didn't laugh", "I didn't laugh after", and so on. This parameter is supported only by the Python SDK. To implement streaming output with the Java SDK, call the streamCall method.
translation_options: The translation parameters to configure. In the Java SDK, this parameter is translationOptions. See the sketch below.
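A hedged sketch of term intervention through translation_options with the DashScope Python SDK. The terms structure, a list of source/target pairs, follows the Qwen-MT term-intervention examples and is an assumption here rather than something defined in this section.

```python
import os
import dashscope

dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"  # Singapore

# The "terms" field and its {"source": ..., "target": ...} entries are assumed
# from the Qwen-MT term-intervention examples.
translation_options = {
    "source_lang": "Chinese",
    "target_lang": "English",
    "terms": [
        {"source": "舱外活动", "target": "extravehicular activity"},
    ],
}

response = dashscope.Generation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-mt-turbo",
    messages=[{"role": "user", "content": "航天员完成了舱外活动。"}],
    result_format="message",
    translation_options=translation_options,
)

print(response.output.choices[0].message.content)
```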
Chat response object (same for streaming and non-streaming output)
status_code: The status code of the request. A value of 200 indicates that the request is successful. Otherwise, the request failed. The Java SDK does not return this parameter. If the call fails, an exception is thrown. The exception message contains the content of status_code and message.
request_id: The unique identifier of the call. In the Java SDK, the returned parameter is requestId.
code: The error code. This is empty if the call is successful. Only the Python SDK returns this parameter.
output: The information about the call result.
usage: The token usage details for the request.
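A short sketch of reading these fields from the DashScope Python SDK response, reusing `response` from the earlier sketch (text access via output.choices assumes result_format="message"):

```python
print(response.status_code)                        # 200 indicates success
print(response.request_id)                         # unique identifier of the call
print(response.output.choices[0].message.content)  # translated text
print(response.usage)                              # token usage details
```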
Error codes
If the model call fails and an error message is returned, see Error messages to resolve the issue.