Choose the right text generation model for AI agents, chatbots, and document processing.
OpenClaw, Claude Code, or Hermes
qwen3.6-plus is ideal for large codebases, offering a balance of performance and cost, full tool calling support, and a 1 million-token context window. Coding Plan users can also select glm-5 or MiniMax-M2.5. All these models are optimized for agent workflows.
Use cases
For chatbots, content generation, summarization, and document processing, we recommend qwen3.6-plus. It offers a good balance of performance and cost, a 1 million-token context window, and built-in tools. Once you've confirmed its performance meets your needs, try qwen3.6-flash to reduce costs. Its performance is close to that of the flagship model, and it supports the same context length and features. If you require the most powerful reasoning capabilities, select qwen3.6-max-preview, which has a higher cost.
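As a starting point, here is a minimal sketch of calling qwen3.6-plus through an OpenAI-compatible SDK. The base URL and API key environment variable are placeholders, so substitute the endpoint and credentials for your account and region.

```python
import os
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

# Placeholder endpoint; replace with the base URL for your region.
client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="https://example.com/compatible-mode/v1",
)

# qwen3.6-plus: balanced cost and performance with a 1 million-token context window.
response = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the key obligations in this contract: ..."},
    ],
)
print(response.choices[0].message.content)
```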
Context window
1 million tokens is roughly equivalent to 750,000 English words or about 8 to 10 novels.
- For long documents or large codebases: qwen3.6-plus or qwen3.6-flash (1 million tokens).
- For standard tasks: a context window of 128k to 256k tokens is typically sufficient.
For context window details, visit the Models page.
Thinking mode
This mode enables step-by-step reasoning, which is ideal for scenarios like multi-step mathematical calculations, code debugging, architecture planning, or legal cross-referencing.
Use the enable_thinking parameter to turn this mode on. In the Responses API, the reasoning.effort parameter enables or disables thinking mode and controls its depth. All Qwen3 and later models support this feature; most operate in a hybrid mode that can be toggled per request.
See Deep thinking.
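For illustration, here is a minimal sketch of turning on thinking mode. It assumes OpenAI-compatible access where the enable_thinking parameter is passed through extra_body and the reasoning text is streamed back in a separate field; check Deep thinking for the exact request shape in your SDK.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="https://example.com/compatible-mode/v1",  # placeholder endpoint
)

# enable_thinking turns on step-by-step reasoning for hybrid models.
# Passing it via extra_body is an assumption for OpenAI-compatible access;
# a native SDK may accept it as a top-level parameter instead.
stream = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=[{"role": "user", "content": "Debug this recursive function: ..."}],
    extra_body={"enable_thinking": True},
    stream=True,  # reasoning output is typically consumed as a stream
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # The reasoning text may arrive in a separate field from the final answer.
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="")
    elif delta.content:
        print(delta.content, end="")
```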
Function calling and built-in tools
These features allow the model to perform actions, such as querying the weather, searching a database, or booking a meeting.
- Function calling (custom tools that you define and the model calls): supported by all general-purpose models.
- Built-in tools (such as web search, code interpreter, and web scraping): ready to use with no complex configuration.
See Tool calling.
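Below is a minimal function-calling sketch. The get_weather tool, its schema, and the model name are illustrative examples rather than built-in definitions, and the request follows the OpenAI-compatible tools format.

```python
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="https://example.com/compatible-mode/v1",  # placeholder endpoint
)

# A hypothetical custom tool the model may choose to call.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=[{"role": "user", "content": "What is the weather in Singapore?"}],
    tools=tools,
)

# If the model decided to call the tool, the arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```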
Structured output
This feature ensures that the model returns valid JSON, for example, when extracting names and addresses from text.
See Structured output.
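For example, the following sketch requests JSON when extracting a name and address. It assumes the OpenAI-compatible response_format parameter with type json_object, and the prompt explicitly asks for JSON, as such modes typically require.

```python
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="https://example.com/compatible-mode/v1",  # placeholder endpoint
)

response = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=[
        {
            "role": "user",
            "content": 'Extract the name and address as JSON with keys "name" and "address": '
                       "Jane Doe, 42 Example Road, Springfield.",
        }
    ],
    response_format={"type": "json_object"},  # ask the model to return valid JSON
)

print(json.loads(response.choices[0].message.content))
```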
Batch inference
Batch inference reduces costs, making it ideal for high-volume scenarios that are not latency-sensitive.
See Batch inference.
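A rough sketch of submitting a batch job through an OpenAI-compatible Batch API is shown below. The JSONL request format, the endpoint path, and the model name are assumptions that may differ in your environment; see Batch inference for the authoritative workflow.

```python
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="https://example.com/compatible-mode/v1",  # placeholder endpoint
)

# Each JSONL line is one chat completion request with its own custom_id.
request = {
    "custom_id": "doc-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "qwen3.6-flash",
        "messages": [{"role": "user", "content": "Summarize: ..."}],
    },
}
with open("requests.jsonl", "w") as f:
    f.write(json.dumps(request) + "\n")

# Upload the request file, then create the batch job.
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(job.id, job.status)  # poll the job and download results once it completes
```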
Recommended models
International
| Model | Context | Thinking mode | Function calling | Built-in tools | Structured output | Batch calling |
| --- | --- | --- | --- | --- | --- | --- |
|  | 256k |  |  |  |  |  |
|  | 1M |  |  |  |  |  |
|  | 1M |  |  |  |  |  |
Global
| Model | Context | Thinking mode | Function calling | Built-in tools | Structured output | Batch calling |
| --- | --- | --- | --- | --- | --- | --- |
|  | 256k |  |  |  |  |  |
|  | 1M |  |  |  |  |  |
|  | 1M |  |  |  |  |  |
US
| Model | Context | Thinking mode | Function calling | Built-in tools | Structured output | Batch calling |
| --- | --- | --- | --- | --- | --- | --- |
|  | 1M |  |  |  |  |  |
|  | 1M |  |  |  |  |  |
Chinese mainland
| Model | Context | Thinking mode | Function calling | Built-in tools | Structured output | Batch calling |
| --- | --- | --- | --- | --- | --- | --- |
|  | 256k |  |  |  |  |  |
|  | 1M |  |  |  |  |  |
|  | 1M |  |  |  |  |  |
|  | 256k |  |  |  |  |  |
|  | 1M |  |  |  |  |  |
|  | 1M |  |  |  |  |  |
|  | 198k |  |  |  |  |  |
|  | 192k |  |  |  |  |  |
China (Hong Kong) and EU
| Model | Context | Thinking mode | Function calling | Built-in tools | Structured output | Batch calling |
| --- | --- | --- | --- | --- | --- | --- |
|  | 1M |  |  |  |  |  |
|  | 1M |  |  |  |  |  |
All models
Qwen3.6
| Model ID | Context | Max output | Thinking budget | Function calling | Built-in tools | Structured output | Batch calling | Coding Plan |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 256k | 64k | 128k |  |  |  |  |  |
|  | 1M | 64k | 80k |  |  |  |  | (Pro only) |
|  | 1M | 64k | 80k |  |  |  |  |  |
|  | 1M | 64k | 128k |  |  |  |  |  |
|  | 1M | 64k | 128k |  |  |  |  |  |
Qwen3.5
Global and Chinese mainland
| Model ID | Context | Max output | Thinking budget | Function calling | Built-in tools | Structured output | Batch calling | Coding Plan |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 1M | 64k | 80k |  |  |  |  |  |
|  | 1M | 64k | 80k |  |  |  |  |  |
|  | 1M | 64k | 80k |  |  |  |  |  |
|  | 1M | 64k | 80k |  |  |  |  |  |
|  | 256k | 64k | 80k |  |  |  |  |  |
|  | 256k | 64k | 80k |  |  |  |  |  |
|  | 256k | 64k | 80k |  |  |  |  |  |
|  | 256k | 64k | 80k |  |  |  |  |  |
China (Hong Kong) and EU
| Model ID | Context | Max output | Thinking budget | Function calling | Built-in tools | Structured output | Batch calling | Coding Plan |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 1M | 64k | 80k |  |  |  |  |  |
|  | 1M | 64k | 80k |  |  |  |  |  |
Third-party models
| Model ID | Context | Max output | Thinking budget | Function calling | Built-in tools | Structured output | Batch calling | Coding Plan |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 1M | 384k (shared) |  |  |  |  |  |  |
|  | 1M | 384k (shared) |  |  |  |  |  |  |
|  | 198k | 128k | 128k |  |  |  |  |  |
|  | 256k | 96k | 80k |  |  |  |  |  |
|  | 192k | 32k (includes chain-of-thought) |  |  |  |  |  |  |
Legacy and other models
For new projects, use the Qwen3.6 or Qwen3.5 series. The following models are legacy and no longer recommended. Visit the Models page to view detailed model parameters, such as context window and billing.