This topic describes the input and output parameters for calling Qwen-MT through the OpenAI-compatible interface or the DashScope API.
References: Machine translation (Qwen-MT)
OpenAI compatible
Singapore region
base_url for SDK: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
HTTP endpoint: POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
Virginia region
base_url for SDK: https://dashscope-us.aliyuncs.com/compatible-mode/v1
HTTP endpoint: POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
Beijing region
base_url for SDK: https://dashscope.aliyuncs.com/compatible-mode/v1
HTTP endpoint: POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
You must first create an API key and configure it as an environment variable. If you use the OpenAI SDK to make calls, you must also install the SDK.
Request body | Request examples for basic usage, term intervention, translation memory, and domain prompting are available in Python, Node.js, and curl. Each region has a different request endpoint and API key; the examples use the request endpoint for the Singapore region. |
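A minimal basic-usage sketch with the OpenAI Python SDK follows. It assumes the API key is stored in the DASHSCOPE_API_KEY environment variable and uses the Singapore region base_url; the sample sentence is illustrative.
Python code
import os
from openai import OpenAI

# Sketch: translate an English sentence to Chinese with qwen-mt-turbo.
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # Singapore region
)

completion = client.chat.completions.create(
    model="qwen-mt-turbo",
    messages=[{"role": "user", "content": "I see. I didn't laugh after hearing it."}],
    # translation_options is not a standard OpenAI parameter, so it goes in extra_body.
    extra_body={
        "translation_options": {
            "source_lang": "English",
            "target_lang": "Chinese",
        }
    },
)
print(completion.choices[0].message.content)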
model The model name. Supported models: qwen-mt-plus, qwen-mt-flash, qwen-mt-lite, and qwen-mt-turbo. | |
messages An array of messages that provides context to the model. Only user messages are supported. | |
stream Specifies whether to return the response in streaming output mode. Valid values: false (default) returns the complete response after generation finishes; true returns the response in chunks as it is generated.
Note Currently, only the qwen-mt-flash and qwen-mt-lite models support incremental streaming, where each returned chunk contains only the newly generated content. The qwen-mt-plus and qwen-mt-turbo models stream non-incrementally: each chunk contains the entire sequence generated so far, and this behavior cannot be changed. For example, successive non-incremental chunks look like: "I", "I didn", "I didn't", "I didn't laugh", "I didn't laugh after", ... | |
stream_options The configuration items for streaming output. This parameter takes effect only when stream is true. For example, set stream_options to {"include_usage": true} to report token usage in the final chunk; see the streaming sketch after this parameter list. | |
max_tokens The maximum number of tokens to generate. If the generated content exceeds this value, the response is truncated. The default and maximum values are the maximum output length of the model. For more information, see Model selection. | |
seed The random number seed. This makes results reproducible: if you pass the same seed value and keep all other parameters unchanged, the model returns the same result as much as possible. Value range: 0 to 2^31−1. | |
temperature The sampling temperature, which controls the diversity of the generated text. A higher temperature value results in more diverse text. A lower temperature value results in more deterministic text. Value range: [0, 2) Both temperature and top_p control the diversity of the generated text. Set only one of them. | |
top_p The probability threshold for nucleus sampling, which controls the diversity of the generated text. A higher top_p value results in more diverse text. A lower top_p value results in more deterministic text. Value range: (0, 1.0] Both temperature and top_p control the diversity of the generated text. Set only one of them. | |
top_k The size of the candidate set for sampling during generation. For example, a value of 50 means that only the 50 highest-scoring tokens in a single generation step form the candidate set for random sampling. A larger value increases randomness; a smaller value increases determinism. If the value is None or greater than 100, the top_k policy is disabled and only the top_p policy takes effect. The value must be greater than or equal to 0. This parameter is not a standard OpenAI parameter. When you use the Python SDK, place it in the extra_body object; see the extra_body sketch after this parameter list. | |
repetition_penalty The penalty for repetition in consecutive sequences during model generation. A higher value reduces repetition; a value of 1.0 indicates no penalty. The value must be greater than 0, but there is no strict upper bound. This parameter is not a standard OpenAI parameter. When you use the Python SDK, place it in the extra_body object; see the extra_body sketch after this parameter list. | |
translation_options The translation parameters to configure, such as the source and target languages. This parameter is not a standard OpenAI parameter. When you use the Python SDK, place it in the extra_body object; see the extra_body sketch after this parameter list. |
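Two minimal Python sketches follow, reusing the client from the basic-usage example above; all literal values are illustrative.
A streaming sketch combining the stream and stream_options parameters; qwen-mt-flash is chosen because it streams incrementally:
Python code
stream = client.chat.completions.create(
    model="qwen-mt-flash",
    messages=[{"role": "user", "content": "I see. I didn't laugh after hearing it."}],
    stream=True,
    stream_options={"include_usage": True},  # token usage arrives in the final chunk
    extra_body={
        "translation_options": {"source_lang": "English", "target_lang": "Chinese"}
    },
)
for chunk in stream:
    # The final usage-only chunk has an empty choices array.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    elif chunk.usage:
        print(f"\ntotal tokens: {chunk.usage.total_tokens}")
An extra_body sketch for the non-standard parameters (top_k, repetition_penalty, translation_options), with a fuller translation_options payload that assumes the terms (term intervention), tm_list (translation memory), and domains (domain prompting) fields from the request examples above:
Python code
completion = client.chat.completions.create(
    model="qwen-mt-turbo",
    messages=[{"role": "user", "content": "The memory usage is high."}],
    extra_body={
        "top_k": 50,                # illustrative value
        "repetition_penalty": 1.1,  # illustrative value
        "translation_options": {
            "source_lang": "English",
            "target_lang": "Chinese",
            # Term intervention: force a fixed translation for a term.
            "terms": [{"source": "memory", "target": "内存"}],
            # Translation memory: reuse approved sentence pairs.
            "tm_list": [{"source": "Usage is high.", "target": "使用率很高。"}],
            # Domain prompting: steer the style with an English domain hint.
            "domains": "The text is from the IT domain.",
        },
    },
)
print(completion.choices[0].message.content)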
Chat response object (non-streaming output) | |
id The unique ID of the request. | |
choices An array of content generated by the model. | |
created The UNIX timestamp when the request was created. | |
model The model used for the request. | |
object This is always chat.completion. | |
service_tier This parameter is currently fixed to null. | |
system_fingerprint This parameter is currently fixed to null. | |
usage The token consumption information for the request. |
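A trimmed, illustrative example of the non-streaming response object; the ID, timestamp, content, and token counts are placeholders:
{
  "id": "chatcmpl-xxxxxxxx",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": { "role": "assistant", "content": "我明白了。听了之后我没有笑。" }
    }
  ],
  "created": 1700000000,
  "model": "qwen-mt-turbo",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": { "prompt_tokens": 20, "completion_tokens": 12, "total_tokens": 32 }
}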
Chat response chunk object (streaming output) | Examples are available for both incremental and non-incremental output. |
id The unique ID of the call. Each chunk object has the same ID. | |
choices An array of content generated by the model. If stream_options.include_usage is set to true, the choices array in the last chunk is empty. | |
created The UNIX timestamp when the request was created. Each chunk has the same timestamp. | |
model The model used for the request. | |
object This is always chat.completion.chunk. | |
service_tier This parameter is currently fixed to null. | |
system_fingerprint This parameter is currently fixed to null. | |
usage The tokens consumed by the request. This is returned in the last chunk only when stream_options.include_usage is set to true. |
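A trimmed, illustrative incremental chunk (for example, from qwen-mt-flash); all values are placeholders:
{
  "id": "chatcmpl-xxxxxxxx",
  "choices": [
    { "delta": { "content": "我明白" }, "finish_reason": null, "index": 0 }
  ],
  "created": 1700000000,
  "model": "qwen-mt-flash",
  "object": "chat.completion.chunk",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": null
}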
DashScope
Singapore
HTTP endpoint: POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
Set base_url to:
Python code
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
Java code
Method 1:
import com.alibaba.dashscope.protocol.Protocol;
Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
Method 2:
import com.alibaba.dashscope.utils.Constants;
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
Virginia
HTTP endpoint: POST https://dashscope-us.aliyuncs.com/api/v1/services/aigc/text-generation/generation
Set base_url to:
Python code
dashscope.base_http_api_url = 'https://dashscope-us.aliyuncs.com/api/v1'
Java code
Method 1:
import com.alibaba.dashscope.protocol.Protocol;
Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-us.aliyuncs.com/api/v1");
Method 2:
import com.alibaba.dashscope.utils.Constants;
Constants.baseHttpApiUrl = "https://dashscope-us.aliyuncs.com/api/v1";
Beijing
HTTP endpoint: POST https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
You do not need to configure base_url for SDK calls. The default value is https://dashscope.aliyuncs.com/api/v1.
You must first create an API key and configure it as an environment variable. If you use the DashScope SDK to make calls, you must also install the SDK.
Request body | Request examples for basic usage, term intervention, translation memory, and domain prompting are available in Python, Java, and curl. Each region has a different request endpoint and API key; the examples use the request endpoint for the Singapore region. |
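A minimal basic-usage sketch with the DashScope Python SDK, assuming Generation.call accepts the translation_options parameter described below and that the API key is stored in the DASHSCOPE_API_KEY environment variable; the sample sentence is illustrative.
Python code
import os
import dashscope
from dashscope import Generation

# Singapore region; for the Beijing region, the default base URL works as-is.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

response = Generation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-mt-turbo",
    messages=[{"role": "user", "content": "I see. I didn't laugh after hearing it."}],
    result_format="message",
    translation_options={"source_lang": "English", "target_lang": "Chinese"},
)
print(response.output.choices[0].message.content)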
model The model name. Supported models: qwen-mt-plus, qwen-mt-flash, qwen-mt-lite, and qwen-mt-turbo. | |
messages An array of messages that provides context to the model. Only user messages are supported. | |
max_tokens The maximum number of tokens to generate. If the generated content exceeds this value, the response is truncated. The default and maximum values are the maximum output length of the model. For more information, see Model selection. In the Java SDK, the parameter is maxTokens. For HTTP calls, place max_tokens in the parameters object. | |
seed The random number seed. This makes results reproducible: if you pass the same seed value and keep all other parameters unchanged, the model returns the same result as much as possible. Value range: 0 to 2^31−1. When you make an HTTP call, place seed in the parameters object. | |
temperature The sampling temperature, which controls the diversity of the generated text. A higher temperature value results in more diverse text. A lower temperature value results in more deterministic text. Value range: [0, 2) Both temperature and top_p control the diversity of the generated text. Set only one of them. When you make an HTTP call, place temperature in the parameters object. | |
top_p The probability threshold for nucleus sampling, which controls the diversity of the generated text. A higher top_p value results in more diverse text. A lower top_p value results in more deterministic text. Value range: (0, 1.0] Both temperature and top_p control the diversity of the generated text. Set only one of them. In the Java SDK, the parameter is topP. For HTTP calls, place top_p in the parameters object. | |
repetition_penalty The penalty for repetition in consecutive sequences during model generation. A higher repetition_penalty value reduces repetition. A value of 1.0 indicates no penalty. The value must be greater than 0, but there is no strict value range. In the Java SDK, the parameter is repetitionPenalty. For HTTP calls, add repetition_penalty to the parameters object. | |
top_k The size of the candidate set for sampling during generation. For example, if you set this parameter to 50, only the 50 tokens with the highest scores in a single generation are used to form the candidate set for random sampling. A larger value increases randomness. A smaller value increases determinism. If the value is None or greater than 100, the top_k policy is disabled and only the top_p policy takes effect. The value must be greater than or equal to 0. In the Java SDK, the parameter is topK. When you make an HTTP call, set top_k in the parameters object. | |
stream Specifies whether to return the response in streaming output mode. Valid values: false (default) returns the complete response after generation finishes; true returns the response in chunks as it is generated.
Note Currently, only the qwen-mt-flash and qwen-mt-lite models support incremental streaming, where each returned chunk contains only the newly generated content. The qwen-mt-plus and qwen-mt-turbo models stream non-incrementally: each chunk contains the entire sequence generated so far, and this behavior cannot be changed. For example, successive non-incremental chunks look like: "I", "I didn", "I didn't", "I didn't laugh", "I didn't laugh after", ... This parameter is supported only by the Python SDK. To implement streaming output with the Java SDK, call the streamCall interface. A Python streaming sketch follows this parameter list. | |
translation_options The translation parameters to configure, such as the source and target languages. In the Java SDK, the parameter is translationOptions. For HTTP calls, place translation_options in the parameters object. |
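A minimal streaming sketch with the DashScope Python SDK, reusing the setup from the basic-usage example above; qwen-mt-flash is chosen because it streams incrementally, so each chunk can be printed as it arrives.
Python code
responses = Generation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-mt-flash",  # streams incrementally, per the stream parameter note
    messages=[{"role": "user", "content": "I see. I didn't laugh after hearing it."}],
    result_format="message",
    stream=True,
    translation_options={"source_lang": "English", "target_lang": "Chinese"},
)
for chunk in responses:
    # Each chunk carries only the newly generated text for qwen-mt-flash.
    print(chunk.output.choices[0].message.content, end="")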
Chat response object (same for streaming and non-streaming output) | |
status_code The status code of the request. A value of 200 indicates that the request is successful. Otherwise, the request failed. The Java SDK does not return this parameter. If the call fails, an exception is thrown. The exception message contains the content of status_code and message. | |
request_id The unique ID of the call. In the Java SDK, the returned parameter is requestId. | |
code The error code. This is empty if the call is successful. Only the Python SDK returns this parameter. | |
output The information about the call result. | |
usage The token usage information for the request. |
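A trimmed, illustrative example of the response object; the request ID, content, and token counts are placeholders:
{
  "status_code": 200,
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "code": "",
  "message": "",
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": { "role": "assistant", "content": "我明白了。听了之后我没有笑。" }
      }
    ]
  },
  "usage": { "input_tokens": 20, "output_tokens": 12, "total_tokens": 32 }
}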
Error codes
If the model call fails and an error message is returned, see Error messages to resolve the issue.