AI Search Open Platform enables API calls to large model services, including the RAG-specific large model built on Alibaba's proprietary model foundation. This service is ideal for RAG scenarios, enhancing answer accuracy and reducing hallucination rates when used alongside document processing and retrieval services.
Service | Service ID (service_id) | Service description | QPS limit for API calls (for Alibaba Cloud accounts and RAM users) |
Qwen3-235B-A22B | qwen3-235b-a22b | A new-generation Qwen series large language model (LLM) trained on an extensive corpus. Qwen3 makes significant breakthroughs in reasoning, instruction following, agent capabilities, and multilingual support; it supports more than 100 languages and dialects and delivers strong multilingual understanding, reasoning, and generation. | 3 (Note: To apply for a higher QPS limit, submit a ticket.) |
QwQ | qwq-32b | A QwQ reasoning model trained on the Qwen2.5-32B model, with reasoning capability greatly improved through reinforcement learning. Its performance in math and coding (AIME 24/25 and LiveCodeBench) and on some general benchmarks, such as IFEval and LiveBench, reaches the level of the full DeepSeek-R1. | |
OpenSearch-Qwen-Turbo | ops-qwen-turbo | A model built by supervised fine-tuning of the qwen-turbo large language model, which enhances retrieval capabilities and reduces harmful output. | |
Qwen-Turbo | qwen-turbo | A Qwen model that features fast speed and low cost and is suitable for simple tasks. | |
Qwen-Plus | qwen-plus | A model whose inference performance, cost, and speed are positioned between Qwen-Max and Qwen-Turbo and is suitable for moderately complex tasks. | |
Qwen-Max | qwen-max | A Qwen model that features best performance among Qwen models and is suitable for complex and multi-step tasks. | |
DeepSeek-R1 | deepseek-r1 | An LLM that focuses on complex inference tasks, performs well in understanding complex instructions and ensuring result accuracy, and supports the web search feature. | |
DeepSeek-V3 | deepseek-v3 | A mixture of experts (MoE) model that excels in long text, coding, mathematics, encyclopedic knowledge, and Chinese language proficiency. | |
DeepSeek-R1-distill-qwen-7b | deepseek-r1-distill-qwen-7b | A model obtained by fine-tuning Qwen-7B on training samples generated by DeepSeek-R1 through knowledge distillation. | |
DeepSeek-R1-distill-qwen-14b | deepseek-r1-distill-qwen-14b | A model obtained by fine-tuning Qwen-14B on training samples generated by DeepSeek-R1 through knowledge distillation. | |
Prerequisites
The authentication information is obtained.
When you call an AI Search Open Platform service by using an API, you need to authenticate the caller's identity.
The service access address is obtained.
You can call a service over the Internet or a virtual private cloud (VPC). For more information, see Get service registration address.
Request description
Common description
The request body cannot exceed 8 MB in size.
HTTP request method
POST
URL
{host}/v3/openapi/workspaces/{workspace_name}/text-generation/{service_id}
Parameter description:
host: the address for calling the service. You can call the service over the Internet or a virtual private cloud (VPC). For more information, see Query service endpoint.
workspace_name: the name of the workspace, such as default.
service_id: the ID of the system's built-in service, such as ops-qwen-turbo.
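As a quick sanity check, the endpoint URL can be assembled from its three variable parts as follows. Note that the host value here is a placeholder, not a real endpoint; use the address obtained for your region.

```python
# The three variable parts of the endpoint URL. The host below is a
# placeholder; substitute the endpoint obtained for your region over
# the Internet or a VPC.
host = "http://default-hangzhou.opensearch.aliyuncs.com"
workspace_name = "default"
service_id = "ops-qwen-turbo"

url = f"{host}/v3/openapi/workspaces/{workspace_name}/text-generation/{service_id}"
print(url)
```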
Request parameters
Header parameters
API key authentication
Parameter | Type | Required | Description | Example |
Content-Type | String | Yes | The request content type. Set the value to application/json. | application/json |
Authorization | String | Yes | The API key for authentication. | Bearer OS-d1**2a |
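The two required headers can be prepared like this. The API key value is a placeholder; use the key issued for your workspace, prefixed with "Bearer ".

```python
# Build the two required headers. The API key is a placeholder;
# substitute your own key, keeping the "Bearer " prefix.
api_key = "OS-xxxx"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}
```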
Body parameters
Parameter | Type | Required | Description | Example |
messages | List | Yes | The conversation history between the user and the model. Each list element is a JSON object with role and content keys. Valid role values: system, user, and assistant. | |
stream | Boolean | No | Indicates whether to return results in streaming mode. By default, this is set to false. When this parameter is set to true, each output is the entire sequence generated up to that point, with the last output being the final complete result. | false |
enable_search | Boolean | No | Indicates whether to enable web search. Default value: false. If you set this parameter to true, the large model uses the built-in prompt to determine whether to enable web search. Note Only deepseek-r1 is supported. | false |
csi_level | String | No | The content moderation filtering level. Default value: strict. | strict |
parameters | Map | No | A set of adjustable parameters for the large model request. | |
parameters.search_return_result | Boolean | No | Specifies whether to return web search results. Valid values: true and false. This parameter takes effect only when you set enable_search to true. | false |
parameters.search_top_k | Integer | No | The number of the outputs returned by web search. Note This parameter takes effect only when you set enable_search to true. This parameter supports only the deepseek-r1 model. | 5 |
parameters.search_way | String | No | The web search strategy, which is the same as that of the web search API. Note: This parameter takes effect only when you set enable_search to true and supports only the deepseek-r1 model. | normal |
parameters.seed | Integer | No | The random seed used during content generation. This parameter controls the randomness of the content generated by the model. Valid values: 64-bit unsigned integers. If you specify the random seed, the model tries to generate the same or similar content for the output of each model call. However, the model cannot ensure that the output is exactly the same for each model call. | "parameters":{"seed":666} |
parameters.max_tokens | Integer | No | The maximum number of tokens that the model can generate. For the qwen-turbo model, the maximum value and default value are 1500. For the qwen-max and qwen-plus models, the maximum value and default value are 2000. | "parameters":{"max_tokens":1500} |
parameters.top_p | Float | No | The probability threshold in the nucleus sampling method used during the generation process. For example, if this parameter is set to 0.8, only the smallest subset of the most probable tokens that sum to a cumulative probability of at least 0.8 is kept as the candidate set. Valid values: (0,1.0). A larger value indicates the higher randomness of generated content. A smaller value indicates the lower randomness of generated content. | "parameters":{"top_p":0.7} |
parameters.top_k | Integer | No | The size of the candidate set from which tokens are sampled during the generation process. For example, if this parameter is set to 50, only the 50 tokens with the highest scores generated at a time are used as the candidate set for random sampling. A larger value indicates the higher randomness of generated content. A smaller value indicates the higher accuracy of generated content. If this parameter is left empty or set to a value greater than 100, the top_k policy is disabled. In this case, only the top_p policy takes effect. | "parameters":{"top_k":50} |
parameters.repetition_penalty | Float | No | Controls repetition in the content generated by the model. A larger value indicates lower repetition. The value 1.0 indicates no penalty. This parameter has no fixed value range; we recommend a value greater than 0. | "parameters":{"repetition_penalty":1.0} |
parameters.presence_penalty | Float | No | The repetition of words in generated content. A larger value indicates lower repetition. Valid values: [-2.0, 2.0]. | "parameters":{"presence_penalty":1.0} |
parameters.temperature | Float | No | The level of randomness and diversity of the content generated by the model. To be specific, the value of this parameter determines the smoothness of the probability distribution of each candidate word for text generation. A larger value indicates a smaller peak value of the probability distribution. In this case, more low-probability words are selected and the generated content is more diversified. A smaller value indicates a larger peak value of the probability distribution. In this case, more high-probability words are selected and the generated content is more accurate. Valid values: [0,2). We recommend that you do not set this parameter to 0, which is meaningless. | "parameters":{"temperature":0.85} |
parameters.stop | String/Array | No | A stop condition for content generation. The model automatically stops generating when the output is about to contain the specified string or token ID. The value can be a string or an array. | "parameters":{"stop":["Hello","Weather"]} |
Note: The maximum token limit for ops-qwen-turbo is 4000.
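Putting the body parameters together, a request payload might be built as sketched below. The parameter values are illustrative, not recommendations, and the 8 MB size check mirrors the request limit stated above.

```python
import json

# Sketch of a request body that combines messages with tuning
# parameters; the values shown are illustrative only.
payload = {
    "messages": [
        {"role": "system", "content": "You are an AI assistant."},
        {"role": "user", "content": "What is the capital of Henan Province?"},
    ],
    "stream": False,
    "parameters": {
        "temperature": 0.85,
        "top_p": 0.7,
        "max_tokens": 1500,
    },
}

body = json.dumps(payload)
# The serialized body must stay under the 8 MB request limit.
assert len(body.encode("utf-8")) <= 8 * 1024 * 1024
```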
Response parameters
Parameter | Type | Description | Example value |
result.text | String | The text generated by the model during the current interaction. | Zhengzhou is a... |
result.search_results | List<SearchResult> | The web search results, returned when you enable web search and set search_return_result to true. | [] |
result.search_results[].title | String | The title of the search result. | Today's weather in Zhengzhou |
result.search_results[].url | String | The search result link. | https://xxxx.com |
result.search_results[].snippet | String | The summary of the content from the search result web pages. | It is sunny in Zhengzhou. |
usage.output_tokens | Integer | The number of tokens in the content generated by the model. | 100 |
usage.input_tokens | Integer | The number of tokens in the user's input content. | 100 |
usage.total_tokens | Integer | The combined token count of the user's input and the model's generated content. | 200 |
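A response with the fields above can be consumed as sketched here. The JSON is a hand-written sample (values are illustrative); the assertion reflects that total_tokens is the sum of input and output tokens.

```python
import json

# Parse a sample success response (values are illustrative) and
# extract the generated text and token usage.
raw = """{
  "request_id": "sample-request-id",
  "latency": 1.0,
  "result": {"text": "Zhengzhou is a...", "search_results": []},
  "usage": {"output_tokens": 100, "input_tokens": 100, "total_tokens": 200}
}"""

resp = json.loads(raw)
text = resp["result"]["text"]
usage = resp["usage"]
# total_tokens is the sum of input and output tokens.
assert usage["total_tokens"] == usage["input_tokens"] + usage["output_tokens"]
```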
cURL request example
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer Your API key" \
"http://xxxx-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/text-generation/deepseek-r1" \
-d '{
"messages":[
{
"role":"system",
"content":"You are an AI assistant."
},
{
"role":"user",
"content":"What is the capital of Henan Province?"
},
{
"role":"assistant",
"content":"Zhengzhou"
},
{
"role":"user",
"content":"What is the weather like in Zhengzhou?"
}
],
"parameters":{
"search_return_result":true,
"search_top_k":5,
"search_way":"normal"
},
"stream":false,
"enable_search":true
}'
Note: enable_search enables the web search feature; search_top_k and search_way take effect only for the deepseek-r1 model.
Response example
Sample success example
{
"request_id": "450fcb80-f796-****-8d69-e1e86d29aa9f",
"latency": 564.903929,
"result": {
"text":"According to the latest weather forecast, Zhengzhou will be cloudy during the day, with the temperature ranging from approximately 9°C to 19°C and a northeast wind at about level 2....",
"search_results":[
{
"url":"https://xxxxx.com",
"title":"xxxx",
"snippet":" It is sunny in Zhengzhou."
}
]
},
"usage": {
"output_tokens": 934,
"input_tokens": 798,
"total_tokens": 1732
}
}
Sample error example
If an error occurs during the request, the response provides the error reason through the code and message fields.
{
"request_id": "45C8C9E5-6BCB-****-80D3-E298F788512B",
"latency": 0,
"code": "InvalidParameter",
"message": "JSON parse error: Unexpected character ..."
}
Status code description
For more information, see Status codes of AI Search Open Platform.
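Since a success response carries a result object while an error response carries code and message fields instead, callers can dispatch on the presence of code. A minimal sketch:

```python
import json

# A success response carries a "result" object; an error response
# carries "code" and "message" instead. Dispatch on "code".
def handle(response_body: str) -> str:
    data = json.loads(response_body)
    if "code" in data:
        raise RuntimeError(f'{data["code"]}: {data["message"]}')
    return data["result"]["text"]
```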