Scale a deployment

Prerequisites

You have read Introduction to model deployment and Deploy models using API, and understand the basic steps to deploy a model on the Alibaba Cloud Model Studio platform.
You have configured your Model Studio API key. For more information, see Get an API key.

Update a model deployment

This operation adjusts the number of resource units for a dedicated service.

Endpoint

PUT https://dashscope-intl.aliyuncs.com/api/v1/deployments/{deployed_model}/scale

Request example

To scale a specified service, run the following command:

curl --request PUT "https://dashscope-intl.aliyuncs.com/api/v1/deployments/emo-35b3f106-sample01/scale" \
    --header "Authorization: Bearer ${DASHSCOPE_API_KEY}" \
    --header 'Content-Type: application/json' \
    --data '{
                "capacity":2
            }'
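
The same request can be sketched in Python with only the standard library. The helper name `build_scale_request` is illustrative and not part of any SDK. It only constructs the request, so the URL, headers, and body can be inspected before anything is sent.

```python
import json
import os
import urllib.request

BASE_URL = "https://dashscope-intl.aliyuncs.com/api/v1/deployments"


def build_scale_request(deployed_model: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request for the scale endpoint.

    Hypothetical helper: mirrors the curl command above.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/{deployed_model}/scale",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Build the request from the example above; the key is read from the environment.
request = build_scale_request(
    "emo-35b3f106-sample01",
    {"capacity": 2},
    os.environ.get("DASHSCOPE_API_KEY", ""),
)

# Sending the request triggers a real scaling operation, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping construction separate from sending makes it easier to log or review the exact payload before a change that may affect billing.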

Request parameters

Parameter	Type	Location	Required	Description
deployed_model	String	path	Yes	The unique identifier of the deployment. Obtain this ID by calling the Create a deployment or List deployments API.
capacity	Number	body	Conditionally required	The new number of resource units. The value must be an integer multiple of `base_capacity`. Applies only when `"plan": "mu"` (model unit). For feature availability, see Feature support for model unit deployment.
ptu_capacity	Object	body	Conditionally required	Applies only when `"plan": "ptu"`. For feature availability, see Feature support for PTU deployment. Example: `"ptu_capacity": { "input_tpm": 10000, "output_tpm": 1000 }`.
ptu_capacity.input_tpm	Number	body			Maximum number of input tokens that the deployed model can process per minute (tpm).
ptu_capacity.output_tpm	Number	body			Maximum number of output tokens that the deployed model can generate per minute (tpm).
ptu_capacity.thinking_output_tpm	Number	body			Maximum number of thinking (reasoning) output tokens that the deployed model can generate per minute (tpm).
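
As a rough illustration of the table above, the following sketch builds the request body for either plan and checks the `base_capacity` rule on the client side. The function name `build_scale_body` and the choice to reject unknown `ptu_capacity` keys are assumptions made for this example. The service performs its own validation and remains the authority.

```python
# Sub-fields of ptu_capacity listed in the request parameters table.
PTU_FIELDS = {"input_tpm", "output_tpm", "thinking_output_tpm"}


def build_scale_body(plan, *, capacity=None, base_capacity=None, ptu_capacity=None):
    """Return the JSON body for the scale request (hypothetical helper)."""
    if plan == "mu":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(capacity, int) or isinstance(capacity, bool):
            raise ValueError("capacity must be an integer")
        if capacity < 0:
            raise ValueError("capacity must not be negative")
        if base_capacity is not None:
            if base_capacity <= 0:
                raise ValueError("base_capacity must be positive")
            if capacity % base_capacity != 0:
                raise ValueError(
                    f"capacity ({capacity}) must be a multiple of base_capacity ({base_capacity})"
                )
        return {"capacity": capacity}

    if plan == "ptu":
        if not ptu_capacity:
            raise ValueError("ptu_capacity is required when plan is 'ptu'")
        unknown = set(ptu_capacity) - PTU_FIELDS
        if unknown:
            raise ValueError(f"unknown ptu_capacity fields: {sorted(unknown)}")
        return {"ptu_capacity": dict(ptu_capacity)}

    raise ValueError(f"unsupported plan: {plan!r}")


mu_body = build_scale_body("mu", capacity=2, base_capacity=1)
ptu_body = build_scale_body("ptu", ptu_capacity={"input_tpm": 10000, "output_tpm": 1000})
```

The value of `base_capacity` for a deployment appears in the response of this and related APIs, so it can be passed in from a previous call.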

Response example

{
  "request_id": "6c6b7676-3fea-423b-bc26-c9e2337e1142",
  "output": {
    "deployed_model": "emo-35b3f106-sample01",
    "gmt_create": "2025-06-17T11:00:38",
    "gmt_modified": "2025-06-17T11:42:02.311",
    "status": "UPDATING",
    "model_name": "emo",
    "base_model": "emo",
    "base_capacity": 1,
    "capacity": 2,
    "ready_capacity": 1,
    "workspace_id": "llm-v71tlv3dezezp2en",
    "charge_type": "post_paid",
    "creator": "17580541***",
    "modifier": "17580541***"
  }
}
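
In the example, `capacity` is already 2 while `ready_capacity` is still 1 and `status` is `UPDATING`. One way to read these fields from a parsed response is sketched below. Treating the scale operation as complete once `ready_capacity` reaches `capacity` is an assumption of this example, not documented behavior, and `scaling_progress` is a hypothetical helper name.

```python
import json


def scaling_progress(response: dict) -> tuple:
    """Return (ready_capacity, capacity, reached_target) from a scale response.

    Assumption: the target is reached when ready_capacity >= capacity.
    """
    output = response["output"]
    ready = output["ready_capacity"]
    target = output["capacity"]
    return ready, target, ready >= target


# Trimmed copy of the response example above.
sample = json.loads("""
{
  "request_id": "6c6b7676-3fea-423b-bc26-c9e2337e1142",
  "output": {
    "deployed_model": "emo-35b3f106-sample01",
    "status": "UPDATING",
    "base_capacity": 1,
    "capacity": 2,
    "ready_capacity": 1
  }
}
""")

ready, target, done = scaling_progress(sample)
print(f"{ready}/{target} resource units ready, target reached: {done}")
```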

Response parameters

For response parameter descriptions, see Create a model deployment task.

Error responses

Response example

{
    "request_id": "ca218d57-b91b-46b2-bd35-c41c6287bcf4",
    "message": "Model: qwen-plus-20230703-cx7f not found!",
    "code": "NotFound"
}

Response parameters

Field	Type	Description
request_id	String	A unique ID for the request.
code	String	The error code.
message	String	The error message.

The following table describes common error codes for this API call.

Error code	Error message	Cause
NotFound	Model: xxx not found!	The deployment specified by `deployed_model` does not exist, or a non-existent model was specified for the deployment task.
Conflict	Deployed model xxx already exists, please specify a suffix.	A duplicate suffix was used when creating the deployment task.
InvalidParameter	Invalid capacity (xx), capacity must be larger than or equal to 0 and multiples of 1 and less than 1000!	An invalid value was specified for the `capacity` parameter.
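
A client might turn the error body into an exception along the following lines. This is a minimal sketch: the `ScaleError` class and `raise_for_error` function are hypothetical names, and detecting an error by the presence of `code` together with the absence of `output` is an assumption based only on the two response examples shown on this page.

```python
class ScaleError(Exception):
    """Carries the three fields of an error response (hypothetical class)."""

    def __init__(self, code: str, message: str, request_id: str):
        super().__init__(f"{code}: {message} (request_id={request_id})")
        self.code = code
        self.message = message
        self.request_id = request_id


def raise_for_error(body: dict) -> dict:
    """Return the parsed body unchanged, or raise ScaleError for an error body.

    Assumption: error bodies contain "code" and lack "output".
    """
    if "code" in body and "output" not in body:
        raise ScaleError(
            code=body["code"],
            message=body.get("message", ""),
            request_id=body.get("request_id", ""),
        )
    return body


error_body = {
    "request_id": "ca218d57-b91b-46b2-bd35-c41c6287bcf4",
    "message": "Model: qwen-plus-20230703-cx7f not found!",
    "code": "NotFound",
}

try:
    raise_for_error(error_body)
except ScaleError as err:
    # The code field selects the handling path; request_id is useful when contacting support.
    print(err.code, err.request_id)
```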