Alibaba Cloud Model Studio: Scale a deployment

Last Updated: Jun 06, 2026

Prerequisites

Update a model deployment

This operation adjusts the number of resource units for a dedicated service.

Endpoint

PUT https://dashscope-intl.aliyuncs.com/api/v1/deployments/{deployed_model}/scale

Request example

To scale a specified service, run the following command:

curl --request PUT "https://dashscope-intl.aliyuncs.com/api/v1/deployments/emo-35b3f106-sample01/scale" \
    --header "Authorization: Bearer ${DASHSCOPE_API_KEY}" \
    --header 'Content-Type: application/json' \
    --data '{"capacity": 2}'
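The same call can be sketched in Python with only the standard library. The helper names `build_scale_request` and `send_scale_request` are hypothetical and not part of any SDK; the URL, headers, and body simply mirror the curl command above. The request object is built separately from sending it, so it can be inspected first.

```python
import json
import urllib.request

# Base URL taken from the endpoint shown above.
BASE_URL = "https://dashscope-intl.aliyuncs.com/api/v1/deployments"


def build_scale_request(deployed_model, body, api_key):
    """Build (but do not send) the PUT request for the scale endpoint.

    Hypothetical helper for illustration only.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/{deployed_model}/scale",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


def send_scale_request(request):
    """Send the prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A caller would typically read the key from the `DASHSCOPE_API_KEY` environment variable, pass `{"capacity": 2}` as the body, and hand the result of `build_scale_request` to `send_scale_request`.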

Request parameters

| Parameter | Type | Location | Required | Description |
| --- | --- | --- | --- | --- |
| deployed_model | String | path | Yes | The unique identifier of the deployment. Obtain this ID by calling the Create a deployment or List deployments API. |
| capacity | Number | body | Conditionally required | The new number of resource units. The value must be an integer multiple of base_capacity. Applies only when "plan": "mu" (model unit). For feature availability, see Feature support for model unit deployment. |
| ptu_capacity | Object | body | Conditionally required | Applies only when "plan": "ptu". For feature availability, see Feature support for PTU deployment. Example: "ptu_capacity": { "input_tpm": 10000, "output_tpm": 1000 }. |
| ptu_capacity.input_tpm | Number | body | | Maximum number of input tokens that the deployed model can process per minute (TPM). |
| ptu_capacity.output_tpm | Number | body | | Maximum number of output tokens that the deployed model can generate per minute (TPM). |
| ptu_capacity.thinking_output_tpm | Number | body | | Maximum number of thinking (pre-computation) output tokens that the deployed model can generate per minute (TPM). |
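Because the body field depends on the deployment plan, a small client-side helper can keep the two cases apart. The following is a minimal sketch, not an official client: `build_scale_body` is a hypothetical name, and it assumes the caller already knows the plan ("mu" or "ptu") and the deployment's base_capacity, for example from an earlier API response.

```python
# Sub-fields of ptu_capacity listed in the request parameters.
PTU_FIELDS = {"input_tpm", "output_tpm", "thinking_output_tpm"}


def build_scale_body(plan, *, capacity=None, base_capacity=1, ptu_capacity=None):
    """Return the request body for the scale call (hypothetical helper).

    plan "mu"  -> {"capacity": <int>}, an integer multiple of base_capacity
    plan "ptu" -> {"ptu_capacity": {...}}, using only the documented sub-fields
    """
    if plan == "mu":
        if isinstance(capacity, bool) or not isinstance(capacity, int):
            raise ValueError("capacity must be an integer when plan is 'mu'")
        if base_capacity <= 0:
            raise ValueError("base_capacity must be a positive integer")
        if capacity % base_capacity != 0:
            raise ValueError(
                f"capacity {capacity} is not a multiple of base_capacity {base_capacity}"
            )
        return {"capacity": capacity}

    if plan == "ptu":
        if not ptu_capacity:
            raise ValueError("ptu_capacity is required when plan is 'ptu'")
        unknown = set(ptu_capacity) - PTU_FIELDS
        if unknown:
            raise ValueError(f"unknown ptu_capacity fields: {sorted(unknown)}")
        return {"ptu_capacity": dict(ptu_capacity)}

    raise ValueError(f"unsupported plan: {plan!r}")
```

The checks only catch obvious mistakes before the request is sent; the service remains the authority on which values are accepted.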

Response example

{
  "request_id": "6c6b7676-3fea-423b-bc26-c9e2337e1142",
  "output": {
    "deployed_model": "emo-35b3f106-sample01",
    "gmt_create": "2025-06-17T11:00:38",
    "gmt_modified": "2025-06-17T11:42:02.311",
    "status": "UPDATING",
    "model_name": "emo",
    "base_model": "emo",
    "base_capacity": 1,
    "capacity": 2,
    "ready_capacity": 1,
    "workspace_id": "llm-v71tlv3dezezp2en",
    "charge_type": "post_paid",
    "creator": "17580541***",
    "modifier": "17580541***"
  }
}
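The response above shows "status": "UPDATING" with ready_capacity still at 1 while capacity is already 2. A caller could summarize that state as sketched below. This rests on an assumption that this page does not confirm: that ready_capacity counts the units already available and that scaling is finished once the status is no longer UPDATING and ready_capacity equals capacity. The function name `scaling_progress` is hypothetical.

```python
def scaling_progress(response):
    """Summarize a scale response (hypothetical helper).

    Assumption (not confirmed by this page): scaling is complete when the
    status has left UPDATING and ready_capacity has reached capacity.
    """
    output = response["output"]
    ready = output["ready_capacity"]
    target = output["capacity"]
    return {
        "ready": ready,
        "target": target,
        "done": output["status"] != "UPDATING" and ready == target,
    }
```

For the sample response, this yields a ready value of 1, a target of 2, and done set to False.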

Response parameters

For response parameter descriptions, see Create a model deployment task.

Error responses

Response example

{
    "request_id": "ca218d57-b91b-46b2-bd35-c41c6287bcf4",
    "message": "Model: qwen-plus-20230703-cx7f not found!",
    "code": "NotFound"
}

Response parameters

| Field | Type | Description |
| --- | --- | --- |
| request_id | String | A unique ID for the request. |
| code | String | The error code. |
| message | String | The error message. |
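A client may find it convenient to turn the three error fields into an exception. The sketch below is illustrative only: `ScaleError` and `parse_error` are hypothetical names, and missing fields fall back to empty strings rather than failing.

```python
import json


class ScaleError(Exception):
    """Carries the request_id, code, and message of an error response."""

    def __init__(self, code, message, request_id):
        super().__init__(f"{code}: {message} (request_id={request_id})")
        self.code = code
        self.message = message
        self.request_id = request_id


def parse_error(body):
    """Convert a JSON error body (str or bytes) into a ScaleError."""
    payload = json.loads(body)
    return ScaleError(
        code=payload.get("code", ""),
        message=payload.get("message", ""),
        request_id=payload.get("request_id", ""),
    )
```

When the request is sent with `urllib.request`, a non-2xx status raises `urllib.error.HTTPError`, and the result of its `read()` method can be passed to `parse_error`. Whether every error listed here arrives with a non-2xx status is an assumption.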

The following table describes common error codes for this API call.

| Error code | Error message | Cause |
| --- | --- | --- |
| NotFound | Model: xxx not found! | A non-existent model was specified for the deployment task, or the specified deployment does not exist. |
| Conflict | Deployed model xxx already exists, please specify a suffix. | A duplicate suffix was used when creating the deployment task. |
| InvalidParameter | Invalid capacity (xx), capacity must be larger than or equal to 0 and multiples of 1 and less than 1000! | An invalid value was specified for the capacity parameter. |
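For logging or user-facing hints, the causes listed above can be kept in a small lookup keyed by error code. This is a sketch with hypothetical names (`COMMON_ERROR_CAUSES`, `describe_error`); it covers only the three codes on this page and falls back to a generic hint for anything else.

```python
# Causes copied from the common error codes listed above.
COMMON_ERROR_CAUSES = {
    "NotFound": (
        "A non-existent model was specified for the deployment task, "
        "or the specified deployment does not exist."
    ),
    "Conflict": "A duplicate suffix was used when creating the deployment task.",
    "InvalidParameter": "An invalid value was specified for the capacity parameter.",
}


def describe_error(code):
    """Return the documented cause for a code, or a generic fallback."""
    return COMMON_ERROR_CAUSES.get(
        code, "Unlisted error code; see the message field of the response."
    )
```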