Alibaba Cloud Model Studio: Stream

Last Updated: Apr 10, 2025

In streaming output mode, the model generates and returns intermediate results in real time instead of a single final response. This reduces wait time and the risk of request timeouts.

Overview

In streaming output mode, the model returns intermediate results in real time. You can read the output as the model generates it, which shortens the wait for the full response. This is particularly effective at reducing the risk of request timeouts for lengthy outputs.

Typical request timeout error messages: Request timed out, please try again later. or Response timeout.

Compared with non-streaming output, where you may wait several seconds for the complete response, streaming output lets you read the first results almost immediately.

How to use

Prerequisites

You must first obtain an API key and set it as an environment variable. If you use the OpenAI SDK or DashScope SDK, you must also install the SDK.

Get started

OpenAI compatible

To enable streaming output mode, set the stream parameter to true.

Python

By default, streaming output does not return the number of tokens used by the request. You can enable this by setting stream_options to {"include_usage": True}, which makes the last returned chunk include the token usage for the request.
In the future, the default value of stream_options will be {"include_usage": True}. As a result, the choices field of the last chunk will be an empty list. We recommend following the latest code in this topic and adding a conditional check such as if chunk.choices: to your business code.
import os
from openai import OpenAI

client = OpenAI(
    # If the environment variable is not configured, replace the following line with: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen-plus", # qwen-plus is used as an example. You can use other models in the model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"}
    ],
    stream=True,
    # Return token usage in the last chunk
    stream_options={"include_usage": True}
)
full_content = ""
print("Streaming output content is:")
for chunk in completion:
    # If stream_options.include_usage is True, the choices field of the last chunk is empty and needs to be skipped. Obtain token usage from chunk.usage instead.
    if chunk.choices:
        full_content += chunk.choices[0].delta.content
        print(chunk.choices[0].delta.content)
print(f"Full content is: {full_content}")

Sample response

Streaming output content is:

I am a
large
language model
from Alibaba Cloud
. I am
called
Qwen.

Full content is: I am a large language model from Alibaba Cloud. I am called Qwen.
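
Because stream_options is set in the example above, the final chunk has an empty choices list and carries the token usage instead. The following is a minimal sketch, not part of the official example, of reading both the content and the token usage from a completion stream created the same way as above:

for chunk in completion:
    if chunk.choices:
        # Ordinary chunks: print the incremental content
        print(chunk.choices[0].delta.content, end="")
    elif chunk.usage:
        # Final chunk: choices is empty and usage holds the token statistics
        print(f"\nprompt_tokens: {chunk.usage.prompt_tokens}, "
              f"completion_tokens: {chunk.usage.completion_tokens}, "
              f"total_tokens: {chunk.usage.total_tokens}")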

Node.js

By default, streaming output does not return the number of tokens used by the request. You can enable this by setting stream_options to {"include_usage": true}, which makes the last returned chunk include the token usage for the request.
In the future, the default value of stream_options will be {"include_usage": true}. As a result, the choices field of the last chunk will be an empty list. We recommend following the latest code in this topic and adding a check such as if (Array.isArray(chunk.choices) && chunk.choices.length > 0) to your business code.
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If the environment variable is not configured, replace the following line with: apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const completion = await openai.chat.completions.create({
    model: "qwen-plus", // qwen-plus is used as an example. You can use other models in the model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
    messages: [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"}
    ],
    stream: true,
    stream_options: {
        include_usage: true
    }
});

let fullContent = "";
console.log("Streaming output content is:");
for await (const chunk of completion) {
    // If stream_options.include_usage is true, the choices field of the last chunk is empty and needs to be skipped. Obtain token usage from chunk.usage instead.
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        fullContent = fullContent + chunk.choices[0].delta.content;
        console.log(chunk.choices[0].delta.content);
    }
}
console.log("\nFull content is:");
console.log(fullContent);

Sample response

Streaming output content is:

I am a
large
language model
from Alibaba Cloud
. I am
called
Qwen

Full content is: 
I am a large language model from Alibaba Cloud. I am called Qwen.

cURL

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    }
}'

Sample response

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":"a"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":"large language"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":"model from"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":"Alibaba Cloud"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":", called Qwen."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"finish_reason":"stop","delta":{"content":""},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":22,"completion_tokens":17,"total_tokens":39},"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: [DONE]
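
If you call the HTTP endpoint directly instead of using an SDK, you must parse the SSE stream yourself. The following is a minimal sketch, not an official example, that uses the third-party Python requests library; the endpoint, payload, and environment variable are the same as in the cURL example above:

import json
import os
import requests

url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
    "Content-Type": "application/json",
}
payload = {
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    "stream": True,
    "stream_options": {"include_usage": True},
}

with requests.post(url, headers=headers, json=payload, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        # Each SSE event is a line of the form "data: {...}"; the stream ends with "data: [DONE]".
        if not line or not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk["choices"]:
            print(chunk["choices"][0]["delta"].get("content", ""), end="")
        elif chunk.get("usage"):
            # Last chunk: choices is empty and usage holds the token statistics
            print(f"\nToken usage: {chunk['usage']}")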

DashScope

For the Python SDK, set the stream parameter to True.

For the Java SDK, use the streamCall interface.

For HTTP, set the Header parameter X-DashScope-SSE to enable.

By default, streaming output is non-incremental: each returned result contains the entire content generated so far. To use incremental streaming output mode, set the incremental_output parameter (incrementalOutput for Java) to true.
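
For comparison, the following is a minimal sketch of handling the default non-incremental mode: because each returned result already contains the full text generated so far, the latest result replaces the previous one instead of being appended to it. The incremental examples below concatenate fragments instead.

import os
import dashscope
from dashscope import Generation

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [{'role': 'user', 'content': 'Who are you?'}]
responses = Generation.call(
    # If the environment variable is not configured, replace the following line with: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-plus",
    messages=messages,
    result_format='message',
    stream=True  # incremental_output defaults to False: non-incremental output
)
full_content = ""
for response in responses:
    # Each result contains all content generated so far, so replace rather than append
    full_content = response.output.choices[0].message.content
print(f"Full content is: {full_content}")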

Python

import os
from dashscope import Generation
import dashscope
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Who are you?'}
]
responses = Generation.call(
    # If the environment variable is not configured, replace the following line with: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-plus",
    messages=messages,
    result_format='message',
    stream=True,
    # Incremental streaming data
    incremental_output=True
)
full_content = ""
print("Streaming output content is:")
for response in responses:
    full_content += response.output.choices[0].message.content
    print(response.output.choices[0].message.content)
print(f"Full content is: {full_content}")

Sample response

Streaming output content is:

I am a
large
language model
from Alibaba Cloud
. I am
called
Qwen

Full content is: I am a large language model from Alibaba Cloud. I am called Qwen.

Java

import java.util.Arrays;
import java.lang.System;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.protocol.Protocol;

public class Main {
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    private static StringBuilder fullContent = new StringBuilder();
    private static void handleGenerationResult(GenerationResult message) {
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();
        fullContent.append(content);
        System.out.println(content);
    }
    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        System.out.println("Streaming output content is:");
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
        System.out.println("Full content is: " + fullContent.toString());
    }
    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // If the environment variable is not configured, replace the following line with: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-plus")
                .messages(Arrays.asList(userMsg))
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                // Enable incremental streaming data
                .incrementalOutput(true)
                .build();
    }
    public static void main(String[] args) {
        try {
            Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
        } catch (ApiException | NoApiKeyException | InputRequiredException  e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
        System.exit(0);
    }
}

Sample response

Streaming output content is:

I am a
large
language model
from Alibaba Cloud
. I am
called
Qwen

Full content is: 
I am a large language model from Alibaba Cloud. I am called Qwen.

cURL

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "qwen-plus",
    "input":{
        "messages":[      
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters": {
        "result_format": "message",
        "incremental_output":true
    }
}'

Sample response

id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"I am","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":23,"input_tokens":22,"output_tokens":1},"request_id":"xxx"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"Qwen","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":24,"input_tokens":22,"output_tokens":2},"request_id":"xxx"}

id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":", an","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":25,"input_tokens":22,"output_tokens":3},"request_id":"xxx"}

id:4
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"AI","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":30,"input_tokens":22,"output_tokens":8},"request_id":"xxx"}

id:5
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"assistant developed by Alibaba","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":38,"input_tokens":22,"output_tokens":16},"request_id":"xxx"}

id:6
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"Cloud. I am designed to answer various questions, provide information","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":46,"input_tokens":22,"output_tokens":24},"request_id":"xxx"}

id:7
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"and engage in conversations with users. How can I","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":54,"input_tokens":22,"output_tokens":32},"request_id":"xxx"}

id:8
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"assist you?","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":58,"input_tokens":22,"output_tokens":36},"request_id":"xxx"}

Error code

If the call fails and an error message is returned, see Error messages.

FAQ

Q1: Does the streaming output mode affect the model's response quality?

A1: No, enabling streaming output mode does not impact the quality of the response.

Q2: Is there an additional charge for using streaming output mode?

A2: No. Streaming output mode is billed in the same manner as non-streaming output mode, based on the number of input and output tokens.