
Alibaba Cloud Model Studio: Deployment overview

Last Updated: Mar 11, 2026

Deploy a model on dedicated inference resources to meet performance requirements such as high concurrency, low latency, and predictable traffic.

Important

This document applies only to the China (Beijing) region.

Supported models and pricing

Deployment uses Provisioned Throughput, which is billed by usage duration and the number of Provisioned Throughput Units (PTU).

Before you deploy, check the estimated hourly cost for each model in the Deployment console (Beijing).

| Model | Type | Context window (input + output) | Max input tokens | Pay-as-you-go input (per 10k TPM, hourly) | Pay-as-you-go output (per 1k TPM, hourly) | Subscription input (per 10k TPM, daily) | Subscription output (per 1k TPM, daily) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen3-Max-2025-09-23 | Instruct | 128,000 | 128,000 | $1.11 | $0.45 | $13.32 | $5.40 |
| Qwen-Plus-2025-12-01 | Instruct | | | $0.28 | $0.07 | $3.36 | $0.84 |
| Qwen-Plus-2025-12-01 | Thinking | | | $0.28 | | $3.36 | |
| Qwen-Flash-2025-07-28 | Instruct/Thinking | | | $0.06 | $0.06 | $0.72 | $0.72 |
| Qwen3-VL-Plus-2025-09-23 | Instruct/Thinking | | | $0.35 | $0.35 | $4.20 | $4.20 |
| DeepSeek-v3.2 | Instruct/Thinking | 64,000 | | $1.04 | $0.16 | $12.48 | $1.92 |

Model types:

  • Instruct: The model runs in non-thinking mode after deployment.

  • Thinking: The model runs in thinking mode after deployment.
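Whether hourly pay-as-you-go or daily subscription is cheaper depends on how many hours per day the deployment actually runs. A rough break-even sketch using the Qwen-Flash rates from the table above (this treats the hourly and daily unit prices as directly comparable, which is an assumption):

```python
# Break-even point between hourly pay-as-you-go and daily subscription
# pricing, using the Qwen-Flash rates from the table above.
PAYG_HOURLY = 0.06 + 0.06  # input + output unit prices, per hour
SUB_DAILY = 0.72 + 0.72    # input + output unit prices, per day

break_even_hours = SUB_DAILY / PAYG_HOURLY
print(f"Subscription is cheaper beyond {break_even_hours:.0f} hours/day")
```

With these rates the subscription pays off once the deployment runs more than about 12 hours a day; the same ratio holds for the other models in the table.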

To deploy a model that is not in this list, check the options available in this deployment solution.

View token usage and call statistics for individual invocations in the Monitoring (Beijing) console.

Deploy a model

Note

If you get a permission error, see What to do if I get a permission error during deployment in the FAQ section.

  1. Go to the Deployment console (Beijing).


  2. Select a model and billing method. Keep other settings at their defaults. Set a model name and start the deployment.

  3. When the deployment status shows Running, the model is ready.

Important

Billing starts as soon as the model is deployed.

Invoke a deployed model

After deployment, invoke the model through one of these APIs:

Set the model parameter to the Model Code shown in the Deployment console (Beijing).


OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # If you haven't configured an environment variable, replace the next line with: api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="<your-deployed-model-code>",  # Replace with your Model Code from the deployment console
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    extra_body={"enable_thinking": False},
)
print(completion)

DashScope

import os
import dashscope

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
response = dashscope.Generation.call(
    # If you haven't configured an environment variable, replace the next line with: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="<your-deployed-model-code>",  # Replace with your Model Code from the deployment console
    messages=messages,
    result_format="message",
    enable_thinking=False,
)
print(response)

Replace <your-deployed-model-code> with the Model Code from the deployment console.

Scale a deployed service

Click Scaling in the Deployment console (Beijing) to manually adjust the number of instances.

Deactivate a deployed service

  1. Go to the Deployment console (Beijing).

  2. Find the service and click Deactivate, then confirm.

Billing stops after deactivation.


Billing

Billing methods

Important

You cannot change the billing method after you create the service. To switch, deactivate the deployed model and redeploy it.

| | Pay-as-you-go | Subscription |
| --- | --- | --- |
| Minimum billing unit | Per minute | Per day |
| Scaling | Self-service throughput adjustment | Self-service throughput adjustment |
| Advantages | Stable throughput capacity, lower latency, and stronger resource certainty for high-load production environments | Stable throughput capacity, lower latency, and stronger resource certainty for high-load production environments; supports auto-renewal |
| Early termination | N/A | Days already used are charged at 1.5x the standard rate |
| Overdue payment | Resources remain active and billed for 24 hours, then released automatically | N/A |

Billing formula

Cost = Usage duration x (Input TPM unit price x Input TPM + Output TPM unit price x Output TPM)
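The formula can be checked with a quick calculation. The traffic figures below are made up for illustration; the unit prices are the Qwen3-Max pay-as-you-go rates from the table above (input per 10k TPM, output per 1k TPM):

```python
# Worked example of the billing formula, using made-up traffic figures
# and the Qwen3-Max pay-as-you-go rates (input $1.11 per 10k TPM/hour,
# output $0.45 per 1k TPM/hour).
hours = 24
input_tpm = 100_000   # provisioned input tokens per minute (illustrative)
output_tpm = 10_000   # provisioned output tokens per minute (illustrative)

input_units = input_tpm / 10_000   # input is priced per 10k TPM
output_units = output_tpm / 1_000  # output is priced per 1k TPM

cost = hours * (1.11 * input_units + 0.45 * output_units)
print(f"${cost:.2f}")  # 24 * (11.10 + 4.50) = $374.40
```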

Subscription lifecycle

  • Orders take effect immediately after payment and expire at 23:59 on the last day of the subscription period (day N). Orders placed after 22:00 have the expiration extended by one day.

  • After expiration, the service stops after a 2-hour grace period. Resources are retained for 14 hours before release.

  • Subscription orders cannot be terminated early.
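The expiration rule above can be sketched as a small date calculation. The 23:59 expiry time, the 22:00 cut-off, and the one-day extension come from the rule; whether the payment day itself counts as day 1, and whether exactly 22:00 counts as "after 22:00", are assumptions:

```python
from datetime import datetime, timedelta

def subscription_expiry(paid_at: datetime, days: int) -> datetime:
    """Expiry is 23:59 on day N; orders placed after 22:00 get one extra day."""
    n = days + 1 if paid_at.hour >= 22 else days  # treats 22:00 itself as "after"
    # Assumes the payment day counts as day 1, so the last day is n - 1 days later.
    last_day = paid_at.date() + timedelta(days=n - 1)
    return datetime(last_day.year, last_day.month, last_day.day, 23, 59)

print(subscription_expiry(datetime(2026, 3, 1, 10, 0), days=7))   # 2026-03-07 23:59:00
print(subscription_expiry(datetime(2026, 3, 1, 22, 30), days=7))  # 2026-03-08 23:59:00
```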

Overflow handling

If input exceeds the maximum input token limit or the purchased TPM quota, calls automatically fall back to Model Studio's standard model invocation service.

Monitor TPM statistics in the Monitoring (Beijing) console.
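Because overflow silently changes which service handles a call, it can also help to track token throughput on the client side before each request. A minimal sliding-window sketch (the quota value and the class itself are illustrative, not part of any SDK):

```python
import time
from collections import deque

class TpmTracker:
    """Tracks tokens consumed in the last 60 seconds against a TPM quota."""
    def __init__(self, tpm_quota):
        self.tpm_quota = tpm_quota
        self.events = deque()  # (timestamp, tokens) pairs

    def record(self, tokens, now=None):
        self.events.append((now if now is not None else time.time(), tokens))

    def current_tpm(self, now=None):
        now = now if now is not None else time.time()
        while self.events and self.events[0][0] <= now - 60:
            self.events.popleft()  # drop usage older than the 60s window
        return sum(tokens for _, tokens in self.events)

    def would_overflow(self, tokens, now=None):
        return self.current_tpm(now) + tokens > self.tpm_quota

tracker = TpmTracker(tpm_quota=10_000)  # quota value is illustrative
tracker.record(8_000, now=0.0)
print(tracker.would_overflow(3_000, now=30.0))  # True: 11,000 > 10,000
print(tracker.would_overflow(3_000, now=61.0))  # False: old usage expired
```

A request flagged by `would_overflow` could be delayed or queued instead of sent, keeping traffic within the provisioned quota.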

FAQ

Can I deploy my own models?

Not currently. Model Studio does not support uploading and deploying custom models at this time. Check the latest announcements for updates.

To deploy your own models, use Platform for AI (PAI).

What to do if I get a permission error during deployment

"Lack permissions for this module"

Grant the ModelDeploy-FullAccess permission to your account in the workspace's Permissions page.


If you cannot proceed, contact your organization or IT administrator.

"Workspace xx does not have deployment privilege for model xx"

Go to the Workspaces page and add deployment permissions for the model to the workspace.


If you cannot resolve the error, contact your organization or IT administrator.