This topic describes how to call DeepSeek models on Alibaba Cloud Model Studio using an OpenAI compatible API or the DashScope SDK.
This document applies only to the China (Beijing) region. To use these models, you must use an API key from the China (Beijing) region.
Model availability
deepseek-v3.2-exp and deepseek-v3.1 (Use a parameter to control whether the model thinks before it replies)
deepseek-v3.2-exp and deepseek-v3.1 are hybrid thinking models. The thinking mode is disabled by default. In thinking mode, the response quality of deepseek-v3.1 is on par with deepseek-r1-0528. deepseek-v3.2-exp uses a sparse attention mechanism to improve training and inference efficiency for long text. It is less expensive than deepseek-v3.1.
Use the enable_thinking parameter to control the thinking mode.
deepseek-r1 (Always thinks before replying)
deepseek-r1-0528, released in May 2025, is an upgraded version of the deepseek-r1 model that was released in January 2025. The new version shows significant improvement in complex reasoning tasks. It has an increased depth of thought during inference, which results in a longer response time.
The deepseek-r1 model on Model Studio has been upgraded to version 0528.
The deepseek-r1-distill series of models are created by fine-tuning open-source large language models, such as Qwen and Llama, with training samples generated by deepseek-r1 through knowledge distillation.
deepseek-v3 (Does not think before replying)
The deepseek-v3 model was pre-trained on 14.8T tokens and excels at long-text processing, coding, math, encyclopedic knowledge, and Chinese-language tasks.
This is the version released on December 26, 2024, not the version released on March 24, 2025.
In thinking mode, the model thinks before it replies. The thinking steps are displayed in the reasoning_content field. Compared to the non-thinking mode, the response time is longer, but the response quality is better.
We recommend selecting the deepseek-v3.2-exp model. It is the latest model from DeepSeek, features an optional thinking mode, has less restrictive rate limits, and is priced lower than deepseek-v3.1.
Model | Context window | Max input | Max chain-of-thought | Max response |
deepseek-v3.2-exp (685B full version) | 131,072 | 98,304 | 32,768 | 65,536 |
deepseek-v3.1 (685B full version) | 131,072 | 98,304 | 32,768 | 65,536 |
deepseek-r1 (685B full version) | 131,072 | 98,304 | 32,768 | 16,384 |
deepseek-r1-0528 (685B full version) | 131,072 | 98,304 | 32,768 | 16,384 |
deepseek-v3 (671B full version) | 131,072 | 98,304 | - | 16,384 |
All values are in tokens.
Max chain-of-thought is the maximum number of tokens for the thinking process in thinking mode.
The models listed above are not integrated third-party services. They are all deployed on Model Studio servers.
For information about concurrent request limits, see DeepSeek rate limits.
Getting started
deepseek-v3.2-exp is the latest model in the DeepSeek series. Use the enable_thinking parameter to switch between thinking and non-thinking modes. The following code shows how to quickly call the deepseek-v3.2-exp model in thinking mode.
Before you begin, create an API key and export the API key as an environment variable. If you call the model using an SDK, install the OpenAI or DashScope SDK.
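For example, on Linux or macOS, you might export the API key in your shell as follows. The key value below is a placeholder; replace it with your actual China (Beijing) region API key:

```shell
# Replace sk-xxx with your Model Studio API key (China (Beijing) region)
export DASHSCOPE_API_KEY="sk-xxx"
# Verify that the variable is set
echo "$DASHSCOPE_API_KEY"
```

On Windows, set the variable through the system environment variable settings or with `set` in the current command prompt session.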
OpenAI compatible
The enable_thinking parameter is not a standard OpenAI parameter. In the OpenAI Python SDK, you must pass this parameter in extra_body. In the Node.js SDK, you must pass it as a top-level parameter.
Python
Sample code
from openai import OpenAI
import os

# Initialize the OpenAI client
client = OpenAI(
    # If the environment variable is not configured, replace the following with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you"}]
completion = client.chat.completions.create(
    # This example uses deepseek-v3.2-exp. You can replace it with deepseek-v3.1, deepseek-v3, or deepseek-r1 as needed.
    model="deepseek-v3.2-exp",
    messages=messages,
    # Set enable_thinking in extra_body to enable thinking mode. This parameter is valid only for deepseek-v3.2-exp and deepseek-v3.1. Setting it for deepseek-v3 or deepseek-r1 does not cause an error.
    extra_body={"enable_thinking": True},
    stream=True,
    stream_options={
        "include_usage": True
    },
)

reasoning_content = ""  # Full thinking process
answer_content = ""  # Full response
is_answering = False  # Indicates whether the response phase has started

print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")
for chunk in completion:
    if not chunk.choices:
        print("\n" + "=" * 20 + "Token usage" + "=" * 20 + "\n")
        print(chunk.usage)
        continue
    delta = chunk.choices[0].delta
    # Collect only the thinking content
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content
    # Start replying when content is received
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content
Response
====================Thinking process====================
Hmm, the user is asking a simple self-introduction question. This is a common query, so I need to state my identity and function clearly and quickly. I'll use a relaxed and friendly tone to introduce myself as DeepSeek-V3, created by DeepSeek. I can also mention the types of help I can provide, such as answering questions, chatting, and tutoring. Finally, I'll add an emoji to be more approachable. I should keep it concise and clear.
====================Full response====================
I am DeepSeek-V3, an intelligent assistant created by DeepSeek! I can help you answer various questions, provide suggestions, look up information, and even chat with you! Feel free to ask me anything about your studies, work, or daily life. How can I help you?
====================Token usage====================
CompletionUsage(completion_tokens=140, prompt_tokens=4, total_tokens=144, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=79, rejected_prediction_tokens=None), prompt_tokens_details=None)
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client
const openai = new OpenAI({
    // If the environment variable is not configured, replace the following with your Model Studio API key: apiKey: "sk-xxx"
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = ''; // Full thinking process
let answerContent = ''; // Full response
let isAnswering = false; // Indicates whether the response phase has started

async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you' }];
        const stream = await openai.chat.completions.create({
            // This example uses deepseek-v3.2-exp. You can replace it with deepseek-v3.1, deepseek-v3, or deepseek-r1 as needed.
            model: 'deepseek-v3.2-exp',
            messages,
            // Note: In the Node.js SDK, non-standard parameters such as enable_thinking are passed as top-level properties and do not need to be placed in extra_body.
            // This parameter is valid only for deepseek-v3.2-exp and deepseek-v3.1. Setting it for deepseek-v3 or deepseek-r1 does not cause an error.
            enable_thinking: true,
            stream: true,
            stream_options: {
                include_usage: true
            },
        });
        console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');
        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\n' + '='.repeat(20) + 'Token usage' + '='.repeat(20) + '\n');
                console.log(chunk.usage);
                continue;
            }
            const delta = chunk.choices[0].delta;
            // Collect only the thinking content
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }
            // Start replying when content is received
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();
Response
====================Thinking process====================
Hmm, the user is asking a simple self-introduction question. This is a common query, so I need to state my identity and function clearly and quickly. I'll use a relaxed and friendly tone to introduce myself as DeepSeek-V3, created by DeepSeek. I can also mention the types of help I can provide, such as answering questions, chatting, and tutoring. Finally, I'll add an emoji to be more approachable. I should keep it concise and clear.
====================Full response====================
I am DeepSeek-V3, an intelligent assistant created by DeepSeek! I can help you answer various questions, provide suggestions, look up information, and even chat with you! Feel free to ask me anything about your studies, work, or daily life. ✨ How can I help you?
====================Token usage====================
CompletionUsage(completion_tokens=140, prompt_tokens=4, total_tokens=144, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=79, rejected_prediction_tokens=None), prompt_tokens_details=None)
HTTP
Sample code
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-v3.2-exp",
"messages": [
{
"role": "user",
"content": "Who are you"
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"enable_thinking": true
}'
DashScope
Python
Sample code
import os
from dashscope import Generation

# Initialize the request parameters
messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
    # If the environment variable is not configured, replace the following with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # This example uses deepseek-v3.2-exp. You can replace it with deepseek-v3.1, deepseek-v3, or deepseek-r1 as needed.
    model="deepseek-v3.2-exp",
    messages=messages,
    result_format="message",  # Set the result format to message
    enable_thinking=True,  # Enable thinking mode. This parameter is valid only for deepseek-v3.2-exp and deepseek-v3.1. Setting it for deepseek-v3 or deepseek-r1 does not cause an error.
    stream=True,  # Enable streaming output
    incremental_output=True,  # Enable incremental output
)

reasoning_content = ""  # Full thinking process
answer_content = ""  # Full response
is_answering = False  # Indicates whether the response phase has started

print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")
for chunk in completion:
    message = chunk.output.choices[0].message
    # Collect only the thinking content
    if "reasoning_content" in message:
        if not is_answering:
            print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content
    # Start replying when content is received
    if message.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content

print("\n" + "=" * 20 + "Token usage" + "=" * 20 + "\n")
print(chunk.usage)
Response
====================Thinking process====================
Hmm, the user is asking a simple self-introduction question. This is a common query, so I need to state my identity and function clearly and quickly. I'll use a relaxed and friendly tone to introduce myself as DeepSeek-V3, created by DeepSeek. I can also mention the types of help I can provide, such as answering questions, chatting, and tutoring. Finally, I'll add an emoji to be more approachable. I should keep it concise and clear.
====================Full response====================
I am DeepSeek-V3, an intelligent assistant created by DeepSeek! I can help you answer various questions, provide suggestions, look up information, and even chat with you! Feel free to ask me anything about your studies, work, or daily life. How can I help you?
====================Token usage====================
{"input_tokens": 5, "output_tokens": 167, "total_tokens": 172, "output_tokens_details": {"reasoning_tokens": 113}}
Java
Sample code
The DashScope Java SDK must be version 2.19.4 or later.
// The DashScope SDK version must be 2.19.4 or later.
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;

public class Main {
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;

    private static void handleGenerationResult(GenerationResult message) {
        String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (reasoning != null && !reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thinking process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }
        if (content != null && !content.isEmpty()) {
            finalContent.append(content);
            if (!isFirstPrint) {
                System.out.println("\n====================Full response====================");
                isFirstPrint = true;
            }
            System.out.print(content);
        }
    }

    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // If the environment variable is not configured, replace the following line with: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // This example uses deepseek-v3.2-exp. You can replace it with deepseek-v3.1, deepseek-v3, or deepseek-r1 as needed.
                .model("deepseek-v3.2-exp")
                // Enable thinking mode. This parameter is valid only for deepseek-v3.2-exp and deepseek-v3.1. Setting it for deepseek-v3 or deepseek-r1 does not cause an error.
                .enableThinking(true)
                .incrementalOutput(true)
                .resultFormat("message")
                .messages(Arrays.asList(userMsg))
                .build();
    }

    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
    }

    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.err.println("An exception occurred: " + e.getMessage());
        }
    }
}
Response
====================Thinking process====================
Hmm, the user is asking a simple self-introduction question. This is a common query, so I need to state my identity and function clearly and quickly. I'll use a relaxed and friendly tone to introduce myself as DeepSeek-V3, created by DeepSeek. I can also mention the types of help I can provide, such as answering questions, chatting, and tutoring. Finally, I'll add an emoji to be more approachable. I should keep it concise and clear.
====================Full response====================
I am DeepSeek-V3, an intelligent assistant created by DeepSeek! I can help you answer various questions, provide suggestions, look up information, and even chat with you! Feel free to ask me anything about your studies, work, or daily life. How can I help you?
HTTP
Sample code
curl
curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
"model": "deepseek-v3.2-exp",
"input":{
"messages":[
{
"role": "user",
"content": "Who are you?"
}
]
},
"parameters":{
"enable_thinking": true,
"incremental_output": true,
"result_format": "message"
}
}'
Other features
Model | |
deepseek-v3.2-exp | Supported only in non-thinking mode. |
deepseek-v3.1 | Supported only in non-thinking mode. |
deepseek-r1 | |
deepseek-r1-0528 | |
deepseek-v3 | |
Distilled models | |
Default parameter values
Model | temperature | top_p | repetition_penalty | presence_penalty |
deepseek-v3.2-exp | 0.6 | 0.95 | 1.0 | - |
deepseek-v3.1 | 0.6 | 0.95 | 1.0 | - |
deepseek-r1 | 0.6 | 0.95 | - | 1 |
deepseek-r1-0528 | 0.6 | 0.95 | - | 1 |
Distilled models | 0.6 | 0.95 | - | 1 |
deepseek-v3 | 0.7 | 0.6 | - | - |
A hyphen (-) indicates that the parameter has no default value and cannot be configured.
The deepseek-r1, deepseek-r1-0528, and distilled models do not support modifying these parameters. The values listed above are fixed.
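As a sketch, the models that do support these parameters accept them as standard OpenAI-compatible request fields, while enable_thinking remains non-standard and must be separated out when using the OpenAI Python SDK. The snippet below only builds an illustrative request payload; the parameter values are examples, not recommendations, and nothing is sent to the API:

```python
# Illustrative payload overriding the default sampling parameters of
# deepseek-v3.2-exp (defaults: temperature 0.6, top_p 0.95).
payload = {
    "model": "deepseek-v3.2-exp",
    "messages": [{"role": "user", "content": "Who are you"}],
    "temperature": 0.3,  # example override of the 0.6 default
    "top_p": 0.9,        # example override of the 0.95 default
    "enable_thinking": False,  # non-standard parameter
}

# With the OpenAI Python SDK, standard fields are passed directly and
# non-standard fields such as enable_thinking go into extra_body.
standard = {k: v for k, v in payload.items() if k != "enable_thinking"}
extra_body = {"enable_thinking": payload["enable_thinking"]}

print(sorted(standard))  # ['messages', 'model', 'temperature', 'top_p']
print(extra_body)        # {'enable_thinking': False}
```

The actual call would then be `client.chat.completions.create(**standard, extra_body=extra_body)`, as in the streaming examples earlier in this topic.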
Billing
Billing is based on the number of input and output tokens. For pricing details, see Models and pricing.
In thinking mode, the chain-of-thought is billed as output tokens.
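Because the chain-of-thought is billed at the output rate, you can estimate the cost of a thinking-mode call from the usage object returned in the response. The unit prices below are placeholders, not actual Model Studio pricing; the token counts are taken from the sample responses earlier in this topic:

```python
# Estimate the cost of a thinking-mode call from the returned usage numbers.
# completion_tokens already includes reasoning_tokens (the chain-of-thought),
# so both are billed as output tokens.
INPUT_PRICE = 0.0000005   # placeholder price per input token
OUTPUT_PRICE = 0.000002   # placeholder price per output token

usage = {"prompt_tokens": 4, "completion_tokens": 140, "reasoning_tokens": 79}

cost = (usage["prompt_tokens"] * INPUT_PRICE
        + usage["completion_tokens"] * OUTPUT_PRICE)
reasoning_share = usage["reasoning_tokens"] / usage["completion_tokens"]

print(cost)                        # total estimated cost
print(round(reasoning_share, 2))  # fraction of output tokens spent on thinking
```

Note that reasoning_tokens is a subset of completion_tokens, so the chain-of-thought is not billed separately on top of the response.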
FAQ
Can I upload images or documents to ask questions?
DeepSeek models support only text input. They do not support image or document input. The Qwen-VL model supports image input, and the Qwen-Long model supports document input.
How do I view token usage and the number of calls?
Call statistics become available about one hour after a model is called. Go to the Model Observation page, set the query conditions, such as the time range and workspace, find the target model in the Models area, and then click Monitor in the Actions column to view the call statistics for the model. For more information, see Usage and performance monitoring.
Data is updated hourly. During peak hours, updates may be delayed by up to one hour.

Error codes
If an error occurs, see Error messages for solutions.