Qwen continues generation based on a prefix - Alibaba Cloud Model Studio

For tasks such as code completion and text continuation, the model must generate content from an existing text fragment, called a prefix. Partial mode makes the model's output continue directly from the prefix you provide, which gives you tighter control over the result.

How it works

Set the role of the last message in the messages array to assistant, put the prefix in its content, and set "partial": true on that same message. The format of the messages array is as follows:

[
    {
        "role": "user",
        "content": "Complete this Fibonacci function. Do not add any other content."
    },
    {
        "role": "assistant",
        "content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
        "partial": true
    }
]

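The model continues generating from the end of the provided prefix. The response contains only the newly generated text, not the prefix, so concatenate the two to obtain the full result.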
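As a minimal sketch, the message layout described above can be assembled by a small helper. The function name build_partial_messages is hypothetical and not part of any SDK. It only builds the list and does not call the API.

```python
def build_partial_messages(prompt, prefix):
    """Return a messages list whose last entry is an assistant prefix in partial mode."""
    return [
        {"role": "user", "content": prompt},
        # The last message has the assistant role, carries the prefix,
        # and sets "partial": True.
        {"role": "assistant", "content": prefix, "partial": True},
    ]


messages = build_partial_messages(
    "Complete this Fibonacci function. Do not add any other content.",
    "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
)
print(messages[-1]["role"], messages[-1]["partial"])
```

The returned list can be passed as the messages argument in the examples that follow.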

Model availability

Qwen-Max series
qwen3-max, qwen3-max-2025-09-23, qwen3-max-preview (non-thinking mode), qwen-max, qwen-max-latest, and snapshot models from qwen-max-2025-01-25 or later
Qwen-Plus series (non-thinking mode)
qwen-plus, qwen-plus-latest, and snapshot models from qwen-plus-2025-01-25 or later
Qwen-Flash series (non-thinking mode)
qwen-flash, and snapshot models from qwen-flash-2025-07-28 or later
Qwen-Coder series
qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-480b-a35b-instruct, qwen3-coder-30b-a3b-instruct
Qwen-VL series
- qwen3-vl-plus series (non-thinking mode)
  qwen3-vl-plus, and snapshot models from qwen3-vl-plus-2025-09-23 or later
- qwen3-vl-flash series (non-thinking mode)
  qwen3-vl-flash, and snapshot models from qwen3-vl-flash-2025-10-15 or later
- qwen-vl-max series
  qwen-vl-max, qwen-vl-max-latest, and snapshot models from qwen-vl-max-2025-04-08 or later
- qwen-vl-plus series
  qwen-vl-plus, qwen-vl-plus-latest, and snapshot models from qwen-vl-plus-2025-01-25 or later
Qwen-Turbo series (non-thinking mode)
qwen-turbo, qwen-turbo-latest, and snapshot models from qwen-turbo-2024-11-01 or later
Qwen open-source series
Qwen3 open-source models (non-thinking mode), Qwen2.5 series text models, Qwen3-VL open-source models (non-thinking mode)

Important

Models in thinking mode do not support partial mode.

Getting started

Prerequisites

Ensure that you have created an API key and exported the API key as an environment variable. If you make calls using the OpenAI SDK or DashScope SDK, install the SDK. If you are a member of a workspace, ensure that the super administrator has completed the model authorization for that workspace.

Note

Partial mode is not supported in the DashScope Java SDK.

Example code

Code completion is a core use case for partial mode. The following example uses qwen3-coder-plus to complete a Python function.

OpenAI compatible

Python

import os
from openai import OpenAI

# 1. Initialize the client
client = OpenAI(
    # API keys for different regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured the environment variable, replace this with your API key
    api_key=os.getenv("DASHSCOPE_API_KEY"), 
    # If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  
)
# 2. Define the code prefix to complete
prefix = """def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
"""

# 3. Send a partial mode request
# Note: The last message in the messages array has the role "assistant" and includes "partial": True
completion = client.chat.completions.create(
    model="qwen3-coder-plus",
    messages=[
        {"role": "user", "content": "Complete this Fibonacci function. Do not add any other content."},
        {"role": "assistant", "content": prefix, "partial": True},
    ],
)

# 4. Manually combine the prefix and the model-generated content
generated_code = completion.choices[0].message.content
complete_code = prefix + generated_code

print(complete_code)

Response

def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    // If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx",
    apiKey: process.env.DASHSCOPE_API_KEY,
    // If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
});

// Define the code prefix to complete
const prefix = `def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
`;

const completion = await openai.chat.completions.create({
    model: "qwen3-coder-plus",  // Use a code model
    messages: [
        { role: "user", content: "Complete this Fibonacci function. Do not add any other content." },
        { role: "assistant", content: prefix, partial: true }
    ],
});

// Manually combine the prefix and the model-generated content
const generatedCode = completion.choices[0].message.content;
const completeCode = prefix + generatedCode;

console.log(completeCode);

curl

# ======= Important =======
# API keys for different regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-coder-plus",
    "messages": [
        {
            "role": "user", 
            "content": "Complete this Fibonacci function. Do not add any other content."
        },
        {
            "role": "assistant",
            "content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
            "partial": true
        }
    ]
}'

Response

{
    "choices": [
        {
            "message": {
                "content": "        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)",
                "role": "assistant"
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 48,
        "completion_tokens": 19,
        "total_tokens": 67,
        "prompt_tokens_details": {
            "cache_type": "implicit",
            "cached_tokens": 0
        }
    },
    "created": 1756800231,
    "system_fingerprint": null,
    "model": "qwen3-coder-plus",
    "id": "chatcmpl-d103b1cf-4bda-942f-92d6-d7ecabfeeccb"
}

DashScope

Python

import os
import dashscope

# If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Define the code prefix to complete
prefix = """def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
"""

messages = [
    {
        "role": "user", 
        "content": "Complete this Fibonacci function. Do not add any other content."
    },
    {
        "role": "assistant",
        "content": prefix,
        "partial": True
    }
]

response = dashscope.Generation.call(
    # API keys for different regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='qwen3-coder-plus',  # Use a code model
    messages=messages,
    result_format='message',  
)

# Manually combine the prefix and the model-generated content
generated_code = response.output.choices[0].message.content
complete_code = prefix + generated_code

print(complete_code)

Response

def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)

curl

# ======= Important =======
# API keys for different regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === Delete this comment before execution ===

curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-coder-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": "Complete this Fibonacci function. Do not add any other content."
            },
            {
                "role": "assistant",
                "content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
                "partial": true
            }
        ]
    },
    "parameters": {
        "result_format": "message"
    }
}'

Response

{
    "output": {
        "choices": [
            {
                "message": {
                    "content": "        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)",
                    "role": "assistant"
                },
                "finish_reason": "stop"
            }
        ]
    },
    "usage": {
        "total_tokens": 67,
        "output_tokens": 19,
        "input_tokens": 48,
        "prompt_tokens_details": {
            "cached_tokens": 0
        }
    },
    "request_id": "c61c62e5-cf97-90bc-a4ee-50e5e117b93f"
}

Usage examples

Pass an image or video

Qwen-VL models support partial mode for image or video inputs. This is useful for scenarios such as product descriptions, social media content creation, press release generation, and creative copywriting.

OpenAI compatible

Python

import os
from openai import OpenAI

client = OpenAI(
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    # API keys for different regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

completion = client.chat.completions.create(
    model="qwen3-vl-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
                    },
                },
                {"type": "text", "text": "I want to post on social media. Help me write the copy."},
            ],
        },
        {
            "role": "assistant",
            "content": "I found a hidden gem of a coffee shop today",
            "partial": True,
        },
    ],
)
print(completion.choices[0].message.content)

Response

, and this tiramisu is a treat for the taste buds! Every bite is a perfect blend of coffee and cream, pure happiness~ #FoodShare #Tiramisu #CoffeeTime

I hope you like this copy! Let me know if you need any changes.

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
  // API keys for different regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
  // If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
  apiKey: process.env.DASHSCOPE_API_KEY,
  // If you use a model in the China (Beijing) region, replace the baseURL with https://dashscope.aliyuncs.com/compatible-mode/v1
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
});

async function main() {
    const response = await openai.chat.completions.create({
        model: "qwen3-vl-plus",  
        messages: [
            {
                role: "user",
                content: [
                    {
                        type: "image_url",
                        image_url: {
                            "url": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
                        }
                    },
                    {
                        type: "text",
                        text: "I want to post on social media. Help me write the copy."
                    }
                ]
            },
            {
                role: "assistant",
                content: "I found a hidden gem of a coffee shop today",
                partial: true
            }
        ]
    });
    console.log(response.choices[0].message.content);
}

main();

curl

# ======= Important =======
# If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# API keys for different regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===

curl -X POST 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen3-vl-plus",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
          }
        },
        {
          "type": "text",
          "text": "I want to post on social media. Help me write the copy."
        }
      ]
    },
    {
      "role": "assistant",
      "content": "I found a hidden gem of a coffee shop today",
      "partial": true
    }
  ]
}'

Response

{
    "choices": [
        {
            "message": {
                "content": ", and this tiramisu is a treat for the taste buds! Every bite is a perfect blend of coffee and cream, pure happiness~ #FoodShare #Tiramisu #CoffeeTime\n\nI hope you like this copy! Let me know if you need any changes.",
                "role": "assistant"
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 282,
        "completion_tokens": 56,
        "total_tokens": 338,
        "prompt_tokens_details": {
            "cached_tokens": 0
        }
    },
    "created": 1756802933,
    "system_fingerprint": null,
    "model": "qwen3-vl-plus",
    "id": "chatcmpl-5780cbb7-ebae-9c63-b098-f8cc49e321f0"
}

DashScope

Python

import os
import dashscope

# If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {
        "role": "user",
        "content": [
            {
                "image": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
            },
            {"text": "I want to post on social media. Help me write the copy."},
        ],
    },
    {"role": "assistant", "content": "I found a hidden gem of a coffee shop today", "partial": True},
]

response = dashscope.MultiModalConversation.call(
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"), 
    model="qwen3-vl-plus", 
    messages=messages
)

print(response.output.choices[0].message.content[0]["text"])

Response

, and this tiramisu is a treat for the taste buds! Every bite is a perfect blend of coffee and cream, pure happiness~ #FoodShare #Tiramisu #CoffeeTime

I hope you like this copy! Let me know if you need any changes.

curl

# ======= Important =======
# If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for different regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {"role": "user",
             "content": [
               {"image": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"},
               {"text": "I want to post on social media. Help me write the copy."}]
            },
            {"role": "assistant",
             "content": "I found a hidden gem of a coffee shop today",
             "partial": true
            }
        ]
    }
}'

Response

{
    "output": {
        "choices": [
            {
                "message": {
                    "content": [
                        {
                            "text": ", and this tiramisu is a treat for the taste buds! Every bite is a perfect blend of coffee and cream, pure happiness~ #FoodShare #Tiramisu #CoffeeTime\n\nI hope you like this copy! Let me know if you need any changes."
                        }
                    ],
                    "role": "assistant"
                },
                "finish_reason": "stop"
            }
        ]
    },
    "usage": {
        "total_tokens": 339,
        "input_tokens_details": {
            "image_tokens": 258,
            "text_tokens": 24
        },
        "output_tokens": 57,
        "input_tokens": 282,
        "output_tokens_details": {
            "text_tokens": 57
        },
        "image_tokens": 258
    },
    "request_id": "c741328c-23dc-9286-bfa7-626a4092ca09"
}
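The responses above return the continuation in two shapes: a plain string for the OpenAI-compatible and text-generation calls, and a list of parts such as [{"text": "..."}] for the DashScope multimodal call. The sketch below normalizes both before the prefix is prepended. The helper name extract_text is hypothetical, and the list handling assumes every part that carries text uses a "text" key, as in the sample response.

```python
def extract_text(content):
    """Return the text of a message whose content is a string or a list of parts."""
    if isinstance(content, str):
        return content
    # Multimodal responses shown above use a list such as [{"text": "..."}].
    return "".join(
        part.get("text", "") for part in content if isinstance(part, dict)
    )


prefix = "I found a hidden gem of a coffee shop today"
continuation = ", and this tiramisu is a treat for the taste buds!"

# Both shapes produce the same combined copy.
print(prefix + extract_text(continuation))
print(prefix + extract_text([{"text": continuation}]))
```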

Continue generation from incomplete output

If the max_tokens parameter is set too low, the model stops before the content is complete and the finish reason is length. Pass the truncated output back as the assistant prefix in partial mode to continue from where it stopped and make the content semantically complete.

OpenAI compatible

Python

import os
from openai import OpenAI

client = OpenAI(
    # API keys for different regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured the environment variable, replace this with your API key
    api_key=os.getenv("DASHSCOPE_API_KEY"), 
    # If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  
)

def chat_completion(messages, max_tokens=None):
    response = client.chat.completions.create(
        model="qwen-plus",
        messages=messages,
        max_tokens=max_tokens
    )
    print(f"###Stop reason: {response.choices[0].finish_reason}")
    
    return response.choices[0].message.content

# Example usage
messages = [{"role": "user", "content": "Write a short science fiction story"}]

# First call, set max_tokens to 40
first_content = chat_completion(messages, max_tokens=40)
print(first_content)
# Add the response from the first call to an assistant message and set partial=True
messages.append({"role": "assistant", "content": first_content, "partial": True})

# Second call
second_content = chat_completion(messages)
print("###Full content:")
print(first_content + second_content)

Response

A finish_reason of length means the output reached the max_tokens limit. A finish_reason of stop means the generation ended naturally or the stop parameter was triggered.

###Stop reason: length
**"The End of Memory"**

In the distant future, Earth was no longer habitable. The atmosphere layer was polluted, the oceans had dried up, and cities had turned to ruins. Humanity was forced to migrate to a planet named "
###Stop reason: stop
###Full content:
**"The End of Memory"**

In the distant future, Earth was no longer habitable. The atmosphere layer was polluted, the oceans had dried up, and cities had turned to ruins. Humanity was forced to migrate to a planet named "Eden Star," a habitable planet with blue skies, fresh air, and endless resources.

However, Eden Star was not a true paradise. It had no human history, no past, and no memory.

......

**"If we forget who we are, are we still human?"**

--The End--

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    // API keys for different regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    // If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx",
    apiKey: process.env.DASHSCOPE_API_KEY,
    // If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
});

async function chatCompletion(messages, maxTokens = null) {
    const completion = await openai.chat.completions.create({
        model: "qwen-plus",
        messages: messages,
        max_tokens: maxTokens
    });
    
    console.log(`###Stop reason: ${completion.choices[0].finish_reason}`);
    return completion.choices[0].message.content;
}

// Example usage
async function main() {
    let messages = [{"role": "user", "content": "Write a short science fiction story"}];

    try {
        // First call, set max_tokens to 40
        const firstContent = await chatCompletion(messages, 40);
        console.log(firstContent);
        
        // Add the response from the first call to an assistant message and set partial=true
        messages.push({"role": "assistant", "content": firstContent, "partial": true});

        // Second call
        const secondContent = await chatCompletion(messages);
        console.log("###Full content:");
        console.log(firstContent + secondContent);
        
    } catch (error) {
        console.error('Execution error:', error);
    }
}

// Run the example
main();

DashScope

Python

import os
import dashscope
# If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def chat_completion(messages, max_tokens=None):
    response = dashscope.Generation.call(
        # API keys for different regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        model='qwen-plus',
        messages=messages,
        max_tokens=max_tokens,
        result_format='message',  
    )
    
    print(f"###Stop reason: {response.output.choices[0].finish_reason}")
    return response.output.choices[0].message.content

# Example usage
messages = [{"role": "user", "content": "Write a short science fiction story"}]

# First call, set max_tokens to 40
first_content = chat_completion(messages, max_tokens=40)
print(first_content)

# Add the response from the first call to an assistant message and set partial=True
messages.append({"role": "assistant", "content": first_content, "partial": True})

# Second call
second_content = chat_completion(messages)
print("###Full content:")
print(first_content + second_content)

Response

###Stop reason: length
Title: **"Time Origami"**

---

In the year 2179, humanity finally mastered the technology of time travel. But this technology was not achieved through massive machines or complex energy fields, but rather a piece of
###Stop reason: stop
###Full content:
Title: **"Time Origami"**

---

In the year 2179, humanity finally mastered the technology of time travel. But this technology was not achieved through massive machines or complex energy fields, but rather a piece of paper.

A piece of paper that could fold time.

It was called "Time Origami," made from an unknown substance from an alien civilization. Scientists could not explain its principles. They only knew that by drawing a scene on the paper and folding it in a specific way, they could open a door to the past or the future.

......

"You are not the key to time, but a reminder that the future is always in our hands."

Then, I tore it into fragments.

---

**(The End)**
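The two-call pattern in the examples above can be generalized to a loop that keeps requesting continuations while the finish reason is length. The following is a sketch under stated assumptions, not an official recipe. The names continue_until_done, call_model, and make_fake_model are hypothetical; call_model stands for any wrapper around the SDK calls shown earlier that returns a (content, finish_reason) pair, and the fake model exists only so the sketch runs offline.

```python
def continue_until_done(call_model, messages, max_rounds=5):
    """Repeat partial-mode calls while the finish reason is "length".

    call_model(messages) must return (content, finish_reason); it is a
    hypothetical wrapper around whichever SDK call you use.
    """
    history = list(messages)
    full_text = ""
    for _ in range(max_rounds):
        content, finish_reason = call_model(history)
        full_text += content
        if finish_reason != "length":
            break
        # Resend everything generated so far as the assistant prefix.
        history = list(messages) + [
            {"role": "assistant", "content": full_text, "partial": True}
        ]
    return full_text


def make_fake_model(chunks):
    """Offline stand-in: returns chunks in order, "length" until the last one."""
    remaining = list(chunks)

    def call(history):
        content = remaining.pop(0)
        return content, ("length" if remaining else "stop")

    return call


story = continue_until_done(
    make_fake_model(["Once upon ", "a time, ", "the end."]),
    [{"role": "user", "content": "Write a short science fiction story"}],
)
print(story)
```

The max_rounds cap is a design choice of this sketch: it bounds the number of requests when every call ends with length, in which case the text returned may still be incomplete.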

Billing

Billing is based on the number of input and output tokens in a request. The prefix is counted as input tokens.
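As a rough illustration of the arithmetic, the sketch below sums the usage blocks returned by one or more calls. The helper name total_tokens is hypothetical; the field names are the ones that appear in the sample responses in this document (prompt_tokens and completion_tokens for the OpenAI-compatible interface, input_tokens and output_tokens for DashScope). Because the prefix counts as input, text produced by a first call and then sent back as a prefix appears as output tokens of the first call and as input tokens of the second.

```python
def total_tokens(usages):
    """Sum input and output tokens over several calls.

    Accepts usage dicts in either shape shown in this document:
    prompt_tokens/completion_tokens or input_tokens/output_tokens.
    """
    input_total = sum(
        u.get("prompt_tokens", u.get("input_tokens", 0)) for u in usages
    )
    output_total = sum(
        u.get("completion_tokens", u.get("output_tokens", 0)) for u in usages
    )
    return input_total, output_total


# Usage blocks copied from the code-completion responses above.
openai_usage = {"prompt_tokens": 48, "completion_tokens": 19, "total_tokens": 67}
dashscope_usage = {"input_tokens": 48, "output_tokens": 19, "total_tokens": 67}

print(total_tokens([openai_usage]))                   # (48, 19)
print(total_tokens([openai_usage, dashscope_usage]))  # (96, 38)
```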

Error codes

If a call fails, see Error messages for troubleshooting.