DeepSeek R1, DeepSeek V3, DeepSeek V3.1 API - Alibaba Cloud Model Studio

This topic describes how to call DeepSeek models on Alibaba Cloud Model Studio using an OpenAI-compatible API or the DashScope SDK.

Important

This document applies only to the China (Beijing) region. To use these models, you must use an API key from the China (Beijing) region.

Model availability

deepseek-v3.2, deepseek-v3.2-exp, and deepseek-v3.1 (a parameter controls whether the model thinks before it replies)
- Hybrid thinking models, with thinking mode disabled by default. Use the enable_thinking parameter to control the thinking mode.
- deepseek-v3.2 is the first DeepSeek model that integrates thinking with tool use. It supports tool calling in both thinking and non-thinking modes.
deepseek-r1 (always thinks before replying)
- deepseek-r1-0528, released in May 2025, is an upgrade of the deepseek-r1 version released in January 2025. It shows significant improvement on complex reasoning tasks and thinks in greater depth during inference, which results in a longer response time. deepseek-r1 on Model Studio has been upgraded to version 0528.
- The deepseek-r1-distill models are open-source large language models, such as Qwen and Llama, fine-tuned on training samples generated by deepseek-r1 (knowledge distillation).
deepseek-v3 (does not think before replying)
- Pre-trained on 14.8 T tokens, deepseek-v3 excels at long-text processing, code, math, encyclopedic knowledge, and Chinese.
- This is the version released on December 26, 2024, not the version released on March 24, 2025.

In thinking mode, the model thinks before it replies. The thinking steps are displayed in the reasoning_content field. Compared to the non-thinking mode, the response time is longer, but the response quality is better.

We recommend deepseek-v3.2, the latest model from DeepSeek. It features an optional thinking mode, has less restrictive rate limits, and is priced lower than deepseek-v3.1.

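Model	Context window (tokens)	Max input (tokens)	Max chain-of-thought (tokens)	Max response (tokens)
deepseek-v3.2 (685B full version)	131,072	98,304	32,768	65,536
deepseek-v3.2-exp (685B full version)	131,072	98,304	32,768	65,536
deepseek-v3.1 (685B full version)	131,072	98,304	32,768	65,536
deepseek-r1 (685B full version)	131,072	98,304	32,768	16,384
deepseek-r1-0528 (685B full version)	131,072	98,304	32,768	16,384
deepseek-v3 (671B full version)	131,072	131,072	-	16,384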

Distilled models

Model	Context window (tokens)	Max input (tokens)	Max chain-of-thought (tokens)	Max response (tokens)
deepseek-r1-distill-qwen-1.5b (based on Qwen2.5-Math-1.5B)	32,768	32,768	16,384	16,384
deepseek-r1-distill-qwen-7b (based on Qwen2.5-Math-7B)	32,768	32,768	16,384	16,384
deepseek-r1-distill-qwen-14b (based on Qwen2.5-14B)	32,768	32,768	16,384	16,384
deepseek-r1-distill-qwen-32b (based on Qwen2.5-32B)	32,768	32,768	16,384	16,384
deepseek-r1-distill-llama-8b (based on Llama-3.1-8B)	32,768	32,768	16,384	16,384
deepseek-r1-distill-llama-70b (based on Llama-3.3-70B)	32,768	32,768	16,384	16,384

Max chain-of-thought is the maximum number of tokens for the thinking process in thinking mode.
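
As a rough illustration of how these limits apply to a single request, the sketch below checks an input size and a requested output size against figures from the tables above. LIMITS and check_request are hypothetical names, not part of any SDK, and the token counts are assumed to come from your own tokenizer or from the usage field of an earlier response. Only two rows whose values appear explicitly in the tables are included.

```python
# Limits in tokens, copied from the tables above. Hypothetical helper, not an SDK API.
LIMITS = {
    "deepseek-v3.2": {
        "context": 131_072, "max_input": 98_304, "max_cot": 32_768, "max_response": 65_536,
    },
    "deepseek-r1-distill-qwen-1.5b": {
        "context": 32_768, "max_input": 32_768, "max_cot": 16_384, "max_response": 16_384,
    },
}


def check_request(model: str, input_tokens: int, requested_output_tokens: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    limits = LIMITS[model]
    problems = []
    if input_tokens > limits["max_input"]:
        problems.append(
            f"input of {input_tokens} tokens exceeds max input {limits['max_input']}"
        )
    if requested_output_tokens > limits["max_response"]:
        problems.append(
            f"requested output of {requested_output_tokens} tokens exceeds "
            f"max response {limits['max_response']}"
        )
    return problems


# An oversized input for deepseek-v3.2 is reported before any request is sent.
print(check_request("deepseek-v3.2", input_tokens=100_000, requested_output_tokens=4_096))
```

A check like this is only a local guard; the service remains the authority on what it accepts.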

The models listed above are not integrated third-party services. They are all deployed on Model Studio servers.

For information about concurrent request limits, see DeepSeek rate limits.

Getting started

deepseek-v3.2 is the latest model in the DeepSeek series. Use the enable_thinking parameter to switch between thinking and non-thinking modes. The following code shows how to call the deepseek-v3.2 model in thinking mode.

Before you begin, create an API key and export the API key as an environment variable. If you call the model using an SDK, install the OpenAI or DashScope SDK.
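
A minimal sketch of reading the key defensively, so that a missing environment variable fails with a clear message instead of an authentication error later on. DASHSCOPE_API_KEY is the variable name used by the samples in this topic; load_api_key is a hypothetical helper, not an SDK function.

```python
import os


def load_api_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error if it is missing."""
    key = os.getenv(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export an API key from the China (Beijing) region first."
        )
    return key


if __name__ == "__main__":
    try:
        # Print only the length so the key itself never reaches the console.
        print("API key loaded, length:", len(load_api_key()))
    except RuntimeError as err:
        print(err)
```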

OpenAI compatible

Note

The enable_thinking parameter is not a standard OpenAI parameter. In the OpenAI Python SDK, you must pass this parameter in extra_body. In the Node.js SDK, you must pass it as a top-level parameter.

Python

Sample code

from openai import OpenAI
import os

# Initialize the OpenAI client
client = OpenAI(
    # If the environment variable is not configured, replace the following with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you"}]
completion = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=messages,
    # Set enable_thinking in extra_body to enable thinking mode
    extra_body={"enable_thinking": True},
    stream=True,
    stream_options={
        "include_usage": True
    },
)

reasoning_content = ""  # Full thinking process
answer_content = ""  # Full response
is_answering = False  # Indicates whether the response phase has started
print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")

for chunk in completion:
    if not chunk.choices:
        print("\n" + "=" * 20 + "Token usage" + "=" * 20 + "\n")
        print(chunk.usage)
        continue

    delta = chunk.choices[0].delta

    # Collect only the thinking content
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content

    # Start replying when content is received
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content

Response

====================Thinking process====================

Ah, the user is asking who I am. This is a very common opening question. I need to introduce my identity and functions simply and clearly. I can start with my company background and core capabilities to help the user quickly understand.
I should highlight my free-to-use nature and text-based strengths, but avoid going into too much detail. Finally, I'll guide the conversation with an open-ended question, which is in line with the nature of an assistant.
I'll position myself as an enterprise-level AI assistant, which is both professional and friendly. The emoji in parentheses can add a touch of friendliness.
====================Full response====================

Hello! I am DeepSeek, an AI assistant created by DeepSeek.

I am a text-only model. Although I do not support multimodal recognition, I have a file upload feature that can help you process various files such as images, txt, pdf, ppt, word, and excel, and read text information from them to assist you. I am completely free to use, have a 128K context window, and support web search (you need to manually enable it in the Web/App).

My knowledge is current up to July 2024, and I will help you with enthusiasm and care. You can download my app from the official app store.

Is there anything I can help you with? Whether it's a question about your studies, work, or daily life, I'm happy to assist you! ✨
====================Token usage====================

CompletionUsage(completion_tokens=238, prompt_tokens=5, total_tokens=243, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=93, rejected_prediction_tokens=None), prompt_tokens_details=None)

Node.js

Sample code

import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client
const openai = new OpenAI({
    // If the environment variable is not configured, replace the following with your Model Studio API key: apiKey: "sk-xxx"
    apiKey: process.env.DASHSCOPE_API_KEY, 
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = ''; // Full thinking process
let answerContent = ''; // Full response
let isAnswering = false; // Indicates whether the response phase has started

async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you' }];
        
        const stream = await openai.chat.completions.create({
            model: 'deepseek-v3.2',
            messages,
            // Note: In the Node.js SDK, non-standard parameters such as enable_thinking are passed as top-level properties and do not need to be placed in extra_body.
            enable_thinking: true,
            stream: true,
            stream_options: {
                include_usage: true
            },
        });

        console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');

        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\n' + '='.repeat(20) + 'Token usage' + '='.repeat(20) + '\n');
                console.log(chunk.usage);
                continue;
            }

            const delta = chunk.choices[0].delta;
            
            // Collect only the thinking content
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }

            // Start replying when content is received
            if (delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();

Response

====================Thinking process====================

Ah, the user is asking who I am. This is a very common opening question. I need to introduce my identity and core functions simply and clearly, without going into too much detail.

I can start with my company background and basic positioning, then list a few key capabilities to let the user quickly understand what I can do. I'll end with an open-ended question to make it easy for the user to continue.

I should highlight practical features like being free, having a long context, and file processing. I'll maintain a friendly but restrained tone, without using emojis.
====================Full response====================

Hello! I am DeepSeek, an AI assistant created by DeepSeek.

I am a text-only model with a 128K context window, and I can help you answer questions, engage in conversations, and assist with text-based tasks. Although I do not support multimodal recognition, I can process files you upload, such as images, txt, pdf, ppt, word, and excel, and read text information from them to help you.

I am completely free to use and have no voice function, but you can download my app from the official app store. To use web search, remember to manually enable it in the Web or App.

My knowledge is current up to July 2024, and I will help you with enthusiasm and care. If you have any questions or need assistance, just let me know! I'm happy to help. ✨
====================Token usage====================

{
  prompt_tokens: 5,
  completion_tokens: 243,
  total_tokens: 248,
  completion_tokens_details: { reasoning_tokens: 83 }
}

HTTP

Sample code

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "deepseek-v3.2",
    "messages": [
        {
            "role": "user", 
            "content": "Who are you"
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "enable_thinking": true
}'
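
With "stream": true, the call above returns a stream of server-sent events rather than a single JSON body. The sketch below shows one way to split such a stream into thinking text, answer text, and the usage block. It assumes the usual OpenAI-compatible framing (lines that start with data: and carry a JSON chunk, ending with data: [DONE]) and the same reasoning_content and content delta fields that the SDK samples read. The sample lines are hand-written for illustration, not captured output, and split_stream is a hypothetical helper.

```python
import json

# Hand-written sample lines for illustration; real chunks carry more fields.
SAMPLE_LINES = [
    'data: {"choices": [{"delta": {"reasoning_content": "The user greets me. ", "content": null}}]}',
    'data: {"choices": [{"delta": {"reasoning_content": null, "content": "Hello!"}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 9, "total_tokens": 14}}',
    "data: [DONE]",
]


def split_stream(lines):
    """Accumulate thinking text, answer text, and the usage block from SSE lines."""
    reasoning, answer, usage = [], [], None
    for line in lines:
        if not line.startswith("data:"):
            continue  # Skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if not chunk.get("choices"):
            usage = chunk.get("usage")  # The usage-only chunk has no choices
            continue
        delta = chunk["choices"][0].get("delta", {})
        reasoning.append(delta.get("reasoning_content") or "")
        answer.append(delta.get("content") or "")
    return "".join(reasoning), "".join(answer), usage


thinking, reply, usage = split_stream(SAMPLE_LINES)
print(thinking, reply, usage, sep="\n")
```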

DashScope

Python

Sample code

import os
from dashscope import Generation

# Initialize the request parameters
messages = [{"role": "user", "content": "Who are you?"}]

completion = Generation.call(
    # If the environment variable is not configured, replace the following with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="deepseek-v3.2",
    messages=messages,
    result_format="message",  # Set the result format to message
    enable_thinking=True,
    stream=True,              # Enable streaming output
    incremental_output=True,  # Enable incremental output
)

reasoning_content = ""  # Full thinking process
answer_content = ""     # Full response
is_answering = False    # Indicates whether the response phase has started

print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")

for chunk in completion:
    message = chunk.output.choices[0].message
    # Collect only the thinking content
    if "reasoning_content" in message:
        if not is_answering:
            print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content

    # Start replying when content is received
    if message.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content

print("\n" + "=" * 20 + "Token usage" + "=" * 20 + "\n")
print(chunk.usage)

Response

====================Thinking process====================

Oh, the user is asking who I am. This is a very basic self-introduction question. I need to state my identity and functions concisely and clearly, avoiding complexity. I can start with my company background and core capabilities to help the user quickly understand.
Considering the user might be new, I can add some typical use cases and features, such as being free, having a long context, and file processing. I'll end with an open-ended invitation for help, maintaining a friendly attitude.
No need for too many technical details, the focus should be on ease of use and practicality.
====================Full response====================

Hello! I am DeepSeek, an AI assistant created by DeepSeek.

I am a text-only model. Although I do not support multimodal recognition, I have a file upload feature that can help you process files like images, txt, pdf, ppt, word, and excel by reading the text information for analysis. I am completely free to use, have a 128K context window, and support web search (you need to manually enable it).

My knowledge is current up to July 2024, and I will help you with enthusiasm and care. You can download my app from the official app store.

If you have any questions or need help, just ask! I'm happy to answer your questions and assist with various tasks. ✨
====================Token usage====================

{"input_tokens": 6, "output_tokens": 240, "total_tokens": 246, "output_tokens_details": {"reasoning_tokens": 92}}

Java

Sample code

Important

The DashScope Java SDK must be version 2.19.4 or later.

// The DashScope SDK version must be 2.19.4 or later.
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;

public class Main {
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;
    private static void handleGenerationResult(GenerationResult message) {
        String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (reasoning != null && !reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thinking process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }
        if (content != null && !content.isEmpty()) {
            finalContent.append(content);
            if (!isFirstPrint) {
                System.out.println("\n====================Full response====================");
                isFirstPrint = true;
            }
            System.out.print(content);
        }
    }
    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // If the environment variable is not configured, replace the following line with: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("deepseek-v3.2")
                .enableThinking(true)
                .incrementalOutput(true)
                .resultFormat("message")
                .messages(Arrays.asList(userMsg))
                .build();
    }
    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
    }
    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.err.println("An exception occurred: " + e.getMessage());
        }
    }
}

Response

====================Thinking process====================

Hmm, the user is asking a simple self-introduction question. This is a common query, so I need to state my identity and function clearly and quickly. I'll use a relaxed and friendly tone to introduce myself as DeepSeek-V3, created by DeepSeek. I can also mention the types of help I can provide, such as answering questions, chatting, and tutoring. Finally, I'll add an emoji to be more approachable. I should keep it concise and clear.
====================Full response====================

I am DeepSeek-V3, an intelligent assistant created by DeepSeek! I can help you answer various questions, provide suggestions, look up information, and even chat with you! Feel free to ask me anything about your studies, work, or daily life. How can I help you?

HTTP

Sample code

curl

curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "deepseek-v3.2",
    "input":{
        "messages":[      
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters":{
        "enable_thinking": true,
        "incremental_output": true,
        "result_format": "message"
    }
}'

Other features

Model	Multi-turn conversation	Function calling	Structured output	Partial mode	Context cache
deepseek-v3.2	Supported	Supported	Not supported	Not supported	Not supported
deepseek-v3.2-exp	Supported	Supported (non-thinking mode only)	Not supported	Not supported	Not supported
deepseek-v3.1	Supported	Supported (non-thinking mode only)	Not supported	Not supported	Not supported
deepseek-r1	Supported	Supported	Not supported	Not supported	Not supported
deepseek-r1-0528	Supported	Supported	Not supported	Not supported	Not supported
deepseek-v3	Supported	Supported	Not supported	Not supported	Not supported
Distilled models	Supported	Not supported	Not supported	Not supported	Not supported
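
The function calling column can be encoded as a small lookup that guards tool use before a request is sent. This is a sketch transcribed from the table above; FUNCTION_CALLING and supports_function_calling are hypothetical names, not SDK APIs, and matching distilled models by the deepseek-r1-distill- prefix is an assumption based on the model names listed earlier.

```python
# Function-calling support transcribed from the table above.
# "always": supported in whichever mode the model runs.
# "non_thinking": supported only when thinking mode is off.
FUNCTION_CALLING = {
    "deepseek-v3.2": "always",
    "deepseek-v3.2-exp": "non_thinking",
    "deepseek-v3.1": "non_thinking",
    "deepseek-r1": "always",
    "deepseek-r1-0528": "always",
    "deepseek-v3": "always",
}


def supports_function_calling(model: str, thinking: bool) -> bool:
    """Return True if the table lists function calling as supported for this mode."""
    # Distilled models share one table row: function calling is not supported.
    if model.startswith("deepseek-r1-distill-"):
        return False
    rule = FUNCTION_CALLING[model]
    return rule == "always" or (rule == "non_thinking" and not thinking)


print(supports_function_calling("deepseek-v3.1", thinking=True))
```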

Default parameter values

Model	temperature	top_p	repetition_penalty	presence_penalty
deepseek-v3.2	1.0	0.95	-	-
deepseek-v3.2-exp	0.6	0.95	1.0	-
deepseek-v3.1	0.6	0.95	1.0	-
deepseek-r1	0.6	0.95	-	1
deepseek-r1-0528	0.6	0.95	-	1
Distilled models	0.6	0.95	-	1
deepseek-v3	0.7	0.6	-	-

A hyphen (-) indicates that the parameter has no default value and cannot be set.
The deepseek-r1, deepseek-r1-0528, and distilled models do not support setting these parameters.
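
One way to apply these rules in client code is to drop sampling parameters that a model does not accept before sending the request. The sketch below is transcribed from the table and the two notes above; SETTABLE and filter_sampling_params are hypothetical names. Model names missing from SETTABLE (the deepseek-r1 family, the distilled models, and any unknown name) are treated as having no settable sampling parameters, which is a simplifying assumption.

```python
SAMPLING_PARAMS = {"temperature", "top_p", "repetition_penalty", "presence_penalty"}

# Parameters that have a default in the table and may be set, per model.
SETTABLE = {
    "deepseek-v3.2": {"temperature", "top_p"},
    "deepseek-v3.2-exp": {"temperature", "top_p", "repetition_penalty"},
    "deepseek-v3.1": {"temperature", "top_p", "repetition_penalty"},
    "deepseek-v3": {"temperature", "top_p"},
}


def filter_sampling_params(model: str, params: dict) -> tuple[dict, list[str]]:
    """Split params into those safe to send and the sampling params that were dropped."""
    allowed = SETTABLE.get(model, set())
    # Non-sampling keys (for example stream) pass through untouched.
    kept = {k: v for k, v in params.items() if k not in SAMPLING_PARAMS or k in allowed}
    dropped = sorted(k for k in params if k in SAMPLING_PARAMS and k not in allowed)
    return kept, dropped


kept, dropped = filter_sampling_params(
    "deepseek-v3.2", {"temperature": 0.3, "presence_penalty": 1, "stream": True}
)
print(kept, dropped)
```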

Billing

Billing is based on the number of input and output tokens. For pricing details, see Model list and pricing.

In thinking mode, the chain-of-thought is billed as output tokens.
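
To make the effect on a bill concrete, the sketch below separates thinking tokens from answer tokens and applies per-token prices. The prices are placeholder values chosen only for the arithmetic, not Model Studio pricing. The usage figures come from the OpenAI-compatible Python sample response earlier in this topic, on the assumption that completion_tokens already includes reasoning_tokens (consistent with total_tokens = 5 + 238 in that sample). estimate_cost is a hypothetical helper.

```python
def estimate_cost(prompt_tokens, completion_tokens, reasoning_tokens,
                  input_price_per_1k, output_price_per_1k):
    """Estimate cost; thinking tokens are billed at the output price with the answer."""
    answer_tokens = completion_tokens - reasoning_tokens
    cost = (prompt_tokens * input_price_per_1k
            + completion_tokens * output_price_per_1k) / 1000
    return {
        "answer_tokens": answer_tokens,
        "billed_output_tokens": completion_tokens,  # Thinking + answer
        "cost": cost,
    }


# Usage from the sample response; prices are placeholders, not real pricing.
result = estimate_cost(
    prompt_tokens=5, completion_tokens=238, reasoning_tokens=93,
    input_price_per_1k=1.0, output_price_per_1k=2.0,
)
print(result)
```

In this sample, 93 of the 238 billed output tokens are chain-of-thought, so a short visible answer can still carry a noticeably larger output charge in thinking mode.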

FAQ

Can I upload images or documents to ask questions?

DeepSeek models support only text input. They do not support image or document input. Qwen-VL supports image input, and Qwen-Long supports document input.

How do I view token usage and the number of calls?

One hour after you call a model, go to the Model Observation page. Set query conditions, such as the time range and workspace. Find the target model in the Models area and click Monitor in the Actions column to view its call statistics. For more information, see Usage and performance monitoring.

Data is updated hourly. During peak hours, updates may be delayed by up to one hour.

Error codes

If an error occurs, see Error messages for solutions.