
Alibaba Cloud Model Studio: DeepSeek

Last Updated: Nov 11, 2025

This topic describes how to call DeepSeek models on Alibaba Cloud Model Studio using an OpenAI compatible API or the DashScope SDK.

Important

This document applies only to the China (Beijing) region. To use these models, you must use an API key from the China (Beijing) region.

Model availability

  • deepseek-v3.2-exp and deepseek-v3.1 (Use a parameter to control whether the model thinks before it replies)

    deepseek-v3.2-exp and deepseek-v3.1 are hybrid thinking models. The thinking mode is disabled by default. In thinking mode, the response quality of deepseek-v3.1 is on par with deepseek-r1-0528. deepseek-v3.2-exp uses a sparse attention mechanism to improve training and inference efficiency for long text. It is less expensive than deepseek-v3.1.

    Use the enable_thinking parameter to control the thinking mode.
  • deepseek-r1 (Always thinks before replying)

    • deepseek-r1-0528, released in May 2025, is an upgraded version of the deepseek-r1 model that was released in January 2025. The new version shows significant improvement in complex reasoning tasks. It has an increased depth of thought during inference, which results in a longer response time.

      The deepseek-r1 model on Model Studio has been upgraded to version 0528.
    • The deepseek-r1-distill series of models is created by fine-tuning open-source large language models, such as Qwen and Llama, on training samples generated by deepseek-r1. This process is known as knowledge distillation.

  • deepseek-v3 (Does not think before replying)

    The deepseek-v3 model was pre-trained on 14.8 trillion tokens and excels at long-text processing, code, math, encyclopedic knowledge, and Chinese-language tasks.

    This is the version released on December 26, 2024, not the version released on March 24, 2025.

In thinking mode, the model thinks before it replies. The thinking steps are displayed in the reasoning_content field. Compared to the non-thinking mode, the response time is longer, but the response quality is better.
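
As a quick illustration, the following sketch (OpenAI-compatible endpoint; a minimal variant of the full samples in the Getting started section) toggles enable_thinking and collects both streamed fields. The arithmetic question is only a placeholder prompt.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

for thinking in (False, True):
    stream = client.chat.completions.create(
        model="deepseek-v3.2-exp",
        messages=[{"role": "user", "content": "What is 17 * 24?"}],
        # enable_thinking is meaningful only for deepseek-v3.2-exp and deepseek-v3.1
        extra_body={"enable_thinking": thinking},
        stream=True,
    )
    reasoning, answer = "", ""
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # reasoning_content carries the thinking steps; content carries the reply
        reasoning += getattr(delta, "reasoning_content", None) or ""
        answer += delta.content or ""
    print(f"enable_thinking={thinking}: {len(reasoning)} reasoning characters, answer: {answer[:60]}")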

We recommend selecting the deepseek-v3.2-exp model. It is the latest model from DeepSeek, features an optional thinking mode, has less restrictive rate limits, and is priced lower than deepseek-v3.1.

Model | Context window | Max input | Max chain-of-thought | Max response
(all values in tokens)

deepseek-v3.2-exp (685B full version) | 131,072 | 98,304 | 32,768 | 65,536
deepseek-v3.1 (685B full version) | 131,072 | 98,304 | 32,768 | 65,536
deepseek-r1 (685B full version) | 131,072 | 98,304 | 32,768 | 16,384
deepseek-r1-0528 (685B full version) | 131,072 | 98,304 | 32,768 | 16,384
deepseek-v3 (671B full version) | 131,072 | 98,304 | - | 16,384

Distilled models

Model | Context window | Max input | Max chain-of-thought | Max response
(all values in tokens)

deepseek-r1-distill-qwen-1.5b (based on Qwen2.5-Math-1.5B) | 32,768 | 32,768 | 16,384 | 16,384
deepseek-r1-distill-qwen-7b (based on Qwen2.5-Math-7B) | 32,768 | 32,768 | 16,384 | 16,384
deepseek-r1-distill-qwen-14b (based on Qwen2.5-14B) | 32,768 | 32,768 | 16,384 | 16,384
deepseek-r1-distill-qwen-32b (based on Qwen2.5-32B) | 32,768 | 32,768 | 16,384 | 16,384
deepseek-r1-distill-llama-8b (based on Llama-3.1-8B) | 32,768 | 32,768 | 16,384 | 16,384
deepseek-r1-distill-llama-70b (based on Llama-3.3-70B) | 32,768 | 32,768 | 16,384 | 16,384

  • Max chain-of-thought is the maximum number of tokens for the thinking process in thinking mode.
  • The models listed above are not integrated third-party services; they are all deployed on Model Studio servers.
  • For information about concurrent request limits, see DeepSeek rate limits.

Getting started

deepseek-v3.2-exp is the latest model in the DeepSeek series. Use the enable_thinking parameter to switch between thinking and non-thinking modes. The following code shows how to quickly call the deepseek-v3.2-exp model in thinking mode.

Before you begin, create an API key and export the API key as an environment variable. If you call the model using an SDK, install the OpenAI or DashScope SDK.
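
A quick sanity check (a minimal sketch; the variable name matches the samples below) can catch a missing key before the first request fails with an authentication error:

import os

if not os.getenv("DASHSCOPE_API_KEY"):
    raise RuntimeError(
        "DASHSCOPE_API_KEY is not set. Export your Model Studio API key first, "
        "for example: export DASHSCOPE_API_KEY=sk-xxx"
    )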

OpenAI compatible

Note

The enable_thinking parameter is not a standard OpenAI parameter. In the OpenAI Python SDK, you must pass this parameter in extra_body. In the Node.js SDK, you must pass it as a top-level parameter.

Python

Sample code

from openai import OpenAI
import os

# Initialize the OpenAI client
client = OpenAI(
    # If the environment variable is not configured, replace the following with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you"}]
completion = client.chat.completions.create(
    # This example uses deepseek-v3.2-exp. You can replace it with deepseek-v3.1, deepseek-v3, or deepseek-r1 as needed.
    model="deepseek-v3.2-exp",
    messages=messages,
    # Set enable_thinking in extra_body to enable thinking mode. This parameter is valid only for deepseek-v3.2-exp and deepseek-v3.1. Setting it for deepseek-v3 or deepseek-r1 does not cause an error.
    extra_body={"enable_thinking": True},
    stream=True,
    stream_options={
        "include_usage": True
    },
)

reasoning_content = ""  # Full thinking process
answer_content = ""  # Full response
is_answering = False  # Indicates whether the response phase has started
print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")

for chunk in completion:
    if not chunk.choices:
        print("\n" + "=" * 20 + "Token usage" + "=" * 20 + "\n")
        print(chunk.usage)
        continue

    delta = chunk.choices[0].delta

    # Collect only the thinking content
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content

    # Start replying when content is received
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content

Response

====================Thinking process====================

Hmm, the user is asking a simple self-introduction question. This is a common query, so I need to state my identity and function clearly and quickly. I'll use a relaxed and friendly tone to introduce myself as DeepSeek-V3, created by DeepSeek. I can also mention the types of help I can provide, such as answering questions, chatting, and tutoring. Finally, I'll add an emoji to be more approachable. I should keep it concise and clear.
====================Full response====================

I am DeepSeek-V3, an intelligent assistant created by DeepSeek! I can help you answer various questions, provide suggestions, look up information, and even chat with you! Feel free to ask me anything about your studies, work, or daily life. How can I help you?
====================Token usage====================

CompletionUsage(completion_tokens=140, prompt_tokens=4, total_tokens=144, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=79, rejected_prediction_tokens=None), prompt_tokens_details=None)

Node.js

Sample code

import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client
const openai = new OpenAI({
    // If the environment variable is not configured, replace the following with your Model Studio API key: apiKey: "sk-xxx"
    apiKey: process.env.DASHSCOPE_API_KEY, 
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = ''; // Full thinking process
let answerContent = ''; // Full response
let isAnswering = false; // Indicates whether the response phase has started

async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you' }];
        
        const stream = await openai.chat.completions.create({
            // This example uses deepseek-v3.2-exp. You can replace it with deepseek-v3.1, deepseek-v3, or deepseek-r1 as needed.
            model: 'deepseek-v3.2-exp',
            messages,
            // Note: In the Node.js SDK, non-standard parameters such as enable_thinking are passed as top-level properties and do not need to be placed in extra_body.
            // This parameter is valid only for deepseek-v3.2-exp and deepseek-v3.1. Setting it for deepseek-v3 or deepseek-r1 does not cause an error.
            enable_thinking: true,
            stream: true,
            stream_options: {
                include_usage: true
            },
        });

        console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');

        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\n' + '='.repeat(20) + 'Token usage' + '='.repeat(20) + '\n');
                console.log(chunk.usage);
                continue;
            }

            const delta = chunk.choices[0].delta;
            
            // Collect only the thinking content
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }

            // Start replying when content is received
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();

Response

====================Thinking process====================

Hmm, the user is asking a simple self-introduction question. This is a common query, so I need to state my identity and function clearly and quickly. I'll use a relaxed and friendly tone to introduce myself as DeepSeek-V3, created by DeepSeek. I can also mention the types of help I can provide, such as answering questions, chatting, and tutoring. Finally, I'll add an emoji to be more approachable. I should keep it concise and clear.
====================Full response====================

I am DeepSeek-V3, an intelligent assistant created by DeepSeek! I can help you answer various questions, provide suggestions, look up information, and even chat with you! Feel free to ask me anything about your studies, work, or daily life. ✨ How can I help you?
====================Token usage====================

CompletionUsage(completion_tokens=140, prompt_tokens=4, total_tokens=144, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=79, rejected_prediction_tokens=None), prompt_tokens_details=None)

HTTP

Sample code

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "deepseek-v3.2-exp",
    "messages": [
        {
            "role": "user", 
            "content": "Who are you"
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "enable_thinking": true
}'

DashScope

Python

Sample code

import os
from dashscope import Generation

# Initialize the request parameters
messages = [{"role": "user", "content": "Who are you?"}]

completion = Generation.call(
    # If the environment variable is not configured, replace the following with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # This example uses deepseek-v3.2-exp. You can replace it with deepseek-v3.1, deepseek-v3, or deepseek-r1 as needed.
    model="deepseek-v3.2-exp",
    messages=messages,
    result_format="message",  # Set the result format to message
    enable_thinking=True,     # Enable thinking mode. This parameter is valid only for deepseek-v3.2-exp and deepseek-v3.1. Setting it for deepseek-v3 or deepseek-r1 does not cause an error.
    stream=True,              # Enable streaming output
    incremental_output=True,  # Enable incremental output
)

reasoning_content = ""  # Full thinking process
answer_content = ""     # Full response
is_answering = False    # Indicates whether the response phase has started

print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")

for chunk in completion:
    message = chunk.output.choices[0].message
    # Collect only the thinking content
    if "reasoning_content" in message:
        if not is_answering:
            print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content

    # Start replying when content is received
    if message.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content

print("\n" + "=" * 20 + "Token usage" + "=" * 20 + "\n")
print(chunk.usage)

Response

====================Thinking process====================

Hmm, the user is asking a simple self-introduction question. This is a common query, so I need to state my identity and function clearly and quickly. I'll use a relaxed and friendly tone to introduce myself as DeepSeek-V3, created by DeepSeek. I can also mention the types of help I can provide, such as answering questions, chatting, and tutoring. Finally, I'll add an emoji to be more approachable. I should keep it concise and clear.
====================Full response====================

I am DeepSeek-V3, an intelligent assistant created by DeepSeek! I can help you answer various questions, provide suggestions, look up information, and even chat with you! Feel free to ask me anything about your studies, work, or daily life. How can I help you?
====================Token usage====================

{"input_tokens": 5, "output_tokens": 167, "total_tokens": 172, "output_tokens_details": {"reasoning_tokens": 113}}

Java

Sample code

Important

The DashScope Java SDK must be version 2.19.4 or later.

// The DashScope SDK version must be 2.19.4 or later.
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;

public class Main {
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;
    private static void handleGenerationResult(GenerationResult message) {
        String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (reasoning != null && !reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thinking process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }
        if (content != null && !content.isEmpty()) {
            finalContent.append(content);
            if (!isFirstPrint) {
                System.out.println("\n====================Full response====================");
                isFirstPrint = true;
            }
            System.out.print(content);
        }
    }
    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // If the environment variable is not configured, replace the following line with: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // This example uses deepseek-v3.2-exp. You can replace it with deepseek-v3.1, deepseek-v3, or deepseek-r1 as needed.
                .model("deepseek-v3.2-exp")
                // Enable thinking mode. This parameter is valid only for deepseek-v3.2-exp and deepseek-v3.1. Setting it for deepseek-v3 or deepseek-r1 does not cause an error.
                .enableThinking(true)
                .incrementalOutput(true)
                .resultFormat("message")
                .messages(Arrays.asList(userMsg))
                .build();
    }
    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
    }
    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.err.println("An exception occurred: " + e.getMessage());
        }
    }
}

Response

====================Thinking process====================

Hmm, the user is asking a simple self-introduction question. This is a common query, so I need to state my identity and function clearly and quickly. I'll use a relaxed and friendly tone to introduce myself as DeepSeek-V3, created by DeepSeek. I can also mention the types of help I can provide, such as answering questions, chatting, and tutoring. Finally, I'll add an emoji to be more approachable. I should keep it concise and clear.
====================Full response====================

I am DeepSeek-V3, an intelligent assistant created by DeepSeek! I can help you answer various questions, provide suggestions, look up information, and even chat with you! Feel free to ask me anything about your studies, work, or daily life. How can I help you?

HTTP

Sample code

curl

curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "deepseek-v3.2-exp",
    "input":{
        "messages":[      
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters":{
        "enable_thinking": true,
        "incremental_output": true,
        "result_format": "message"
    }
}'

Other features

Model | Multi-turn conversation | Function calling | Structured output | Prefix completion | Context cache

deepseek-v3.2-exp | Supported | Supported (non-thinking mode only) | Not supported | Not supported | Not supported
deepseek-v3.1 | Supported | Supported (non-thinking mode only) | Not supported | Not supported | Not supported
deepseek-r1 | Supported | Supported | Not supported | Not supported | Not supported
deepseek-r1-0528 | Supported | Supported | Not supported | Not supported | Not supported
deepseek-v3 | Supported | Supported | Not supported | Not supported | Not supported
Distilled models | Supported | Not supported | Not supported | Not supported | Not supported
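
As the table shows, function calling on deepseek-v3.2-exp and deepseek-v3.1 works only in non-thinking mode (the default). The sketch below uses the standard OpenAI tools format against the compatible-mode endpoint; the get_weather tool is a hypothetical example, and tool-call details should be checked against the Function calling documentation.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

# Hypothetical tool definition, for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

completion = client.chat.completions.create(
    model="deepseek-v3.2-exp",
    messages=[{"role": "user", "content": "What is the weather in Beijing?"}],
    tools=tools,
    # Leave thinking mode disabled (the default): function calling requires non-thinking mode.
)
print(completion.choices[0].message.tool_calls)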

Default parameter values

Model | temperature | top_p | repetition_penalty | presence_penalty

deepseek-v3.2-exp | 0.6 | 0.95 | 1.0 | -
deepseek-v3.1 | 0.6 | 0.95 | 1.0 | -
deepseek-r1 | 0.6 | 0.95 | - | 1
deepseek-r1-0528 | 0.6 | 0.95 | - | 1
Distilled models | 0.6 | 0.95 | - | 1
deepseek-v3 | 0.7 | 0.6 | - | -

  • A hyphen (-) indicates that the parameter has no default value and cannot be configured.

  • The deepseek-r1, deepseek-r1-0528, and distilled models do not support setting these parameters.
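
For the models that do accept these parameters, temperature and top_p are standard OpenAI parameters and can be passed directly. A minimal sketch (reusing the client setup from the samples above):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="deepseek-v3.2-exp",
    messages=[{"role": "user", "content": "Write a haiku about autumn."}],
    temperature=0.3,  # overrides the 0.6 default listed above
    top_p=0.9,        # overrides the 0.95 default
)
print(completion.choices[0].message.content)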

Billing

Billing is based on the number of input and output tokens. For pricing details, see Models and pricing.

In thinking mode, the chain-of-thought is billed as output tokens.
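
This is visible in the usage object: in the streamed samples above, reasoning tokens appear under completion_tokens_details and are already included in completion_tokens (for example, 79 of the 140 completion tokens). A sketch of splitting the two, assuming the OpenAI Python SDK's CompletionUsage type:

from openai.types.completion_usage import CompletionUsage

def split_output_tokens(usage: CompletionUsage) -> tuple[int, int]:
    """Return (reasoning_tokens, reply_tokens) for a thinking-mode response."""
    details = usage.completion_tokens_details
    reasoning = (details.reasoning_tokens or 0) if details else 0
    # Both parts are billed as output tokens.
    return reasoning, usage.completion_tokens - reasoning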

FAQ

Can I upload images or documents to ask questions?

DeepSeek models support only text input. They do not support image or document input. The Qwen-VL model supports image input, and the Qwen-Long model supports document input.

How do I view token usage and the number of calls?

One hour after a model is called, you can go to the Model Observation page. On this page, set the query conditions, such as the time range and workspace, find the target model in the Models area, and then click Monitor in the Actions column to view the call statistics for the model. For more information, see the Usage and performance monitoring document.

Data is updated hourly. During peak hours, updates may be delayed by up to one hour.


Error codes

If an error occurs, see Error messages for solutions.