Prerequisites
- You have read Introduction to model deployment and Deploy models using API, and understand the basic steps to deploy a model on the Alibaba Cloud Model Studio platform.
- You have configured your Model Studio API key. For more information, see Get an API key.
Update a model deployment
This operation adjusts the number of resource units for a dedicated service.
Endpoint
PUT https://dashscope-intl.aliyuncs.com/api/v1/deployments/{deployed_model}/scale
Request example
To scale a specified service, run the following command:
curl --request PUT "https://dashscope-intl.aliyuncs.com/api/v1/deployments/emo-35b3f106-sample01/scale" \
  --header "Authorization: Bearer ${DASHSCOPE_API_KEY}" \
  --header 'Content-Type: application/json' \
  --data '{
    "capacity": 2
  }'
Request parameters
| Parameter | Type | Location | Required | Description |
| --- | --- | --- | --- | --- |
| deployed_model | String | path | Yes | The unique identifier of the deployment. Obtain this ID by calling the Create a deployment or List deployments API. |
| capacity | Number | body | Conditionally required | This parameter is applicable only when […]. For feature availability, see Feature support for model unit deployment. The new number of resource units. The value must be an integer multiple of […]. |
| ptu_capacity | Object | body | Conditionally required | This parameter is applicable only when […]. For feature availability, see Feature support for PTU deployment. This parameter takes effect only when […]. Example: […] |
| ptu_capacity.input_tpm | Number | body | | Maximum number of input tokens that the deployed model can process per minute (TPM). |
| ptu_capacity.output_tpm | Number | body | | Maximum number of output tokens that the deployed model can generate per minute (TPM). |
| ptu_capacity.thinking_output_tpm | Number | body | | Maximum number of pre-computation output tokens that the deployed model can generate per minute (TPM). |
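The nested ptu_capacity fields can be sent with the same PUT request as capacity. The following is a minimal sketch, assuming that the dotted names in the table (ptu_capacity.input_tpm and so on) map to keys of a JSON object named ptu_capacity. The helper names build_ptu_body and scale_ptu_deployment and all numeric values are hypothetical, and the table does not state which of the three fields must be sent together.

```shell
# Build the JSON body for a PTU scale request.
# $1 = input_tpm, $2 = output_tpm, $3 = thinking_output_tpm (optional).
# Assumption: the dotted parameter names are keys of one nested object.
build_ptu_body() {
  if [ -n "${3:-}" ]; then
    printf '{"ptu_capacity": {"input_tpm": %d, "output_tpm": %d, "thinking_output_tpm": %d}}' "$1" "$2" "$3"
  else
    printf '{"ptu_capacity": {"input_tpm": %d, "output_tpm": %d}}' "$1" "$2"
  fi
}

# Send the scale request for a PTU deployment.
# $1 = deployed_model ID; the remaining arguments are passed to build_ptu_body.
scale_ptu_deployment() {
  deployed_model=$1
  shift
  curl --request PUT "https://dashscope-intl.aliyuncs.com/api/v1/deployments/${deployed_model}/scale" \
    --header "Authorization: Bearer ${DASHSCOPE_API_KEY}" \
    --header 'Content-Type: application/json' \
    --data "$(build_ptu_body "$@")"
}

# Usage (sends a real request, so it is left commented out; values are illustrative only):
# scale_ptu_deployment "emo-35b3f106-sample01" 10000 1000
```

Keeping the body construction in its own function makes it possible to inspect the payload before anything is sent.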
Response example
{
  "request_id": "6c6b7676-3fea-423b-bc26-c9e2337e1142",
  "output": {
    "deployed_model": "emo-35b3f106-sample01",
    "gmt_create": "2025-06-17T11:00:38",
    "gmt_modified": "2025-06-17T11:42:02.311",
    "status": "UPDATING",
    "model_name": "emo",
    "base_model": "emo",
    "base_capacity": 1,
    "capacity": 2,
    "ready_capacity": 1,
    "workspace_id": "llm-v71tlv3dezezp2en",
    "charge_type": "post_paid",
    "creator": "17580541***",
    "modifier": "17580541***"
  }
}
Response parameters
For response parameter descriptions, see Create a model deployment task.
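In the example response, status is UPDATING while capacity (2) is ahead of ready_capacity (1), which suggests that the scale operation has been accepted but is not yet complete. That reading is an assumption based on the sample values, not on the field descriptions. The sketch below pulls those three fields out of a response so they can be logged or compared. The helpers json_field and summarize_scale_response are hypothetical, and the sed-based extraction only handles simple scalar fields like the ones shown; a real JSON parser such as jq is the more robust choice.

```shell
# json_field NAME: read JSON on stdin and print the first scalar value stored under "NAME".
# A sed-based sketch for flat fields only; it is not a general JSON parser.
json_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\{0,1\}\([^",}]*\)"\{0,1\}.*/\1/p' | head -n 1
}

# summarize_scale_response RESPONSE: print status and capacity figures on one line.
summarize_scale_response() {
  state=$(printf '%s\n' "$1" | json_field status)
  capacity=$(printf '%s\n' "$1" | json_field capacity)
  ready=$(printf '%s\n' "$1" | json_field ready_capacity)
  printf 'status=%s capacity=%s ready_capacity=%s\n' "$state" "$capacity" "$ready"
}

# Usage with a trimmed copy of the example response:
response='{
  "output": {
    "status": "UPDATING",
    "base_capacity": 1,
    "capacity": 2,
    "ready_capacity": 1
  }
}'
summarize_scale_response "$response"
# prints: status=UPDATING capacity=2 ready_capacity=1
```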
Error responses
Response example
{
  "request_id": "ca218d57-b91b-46b2-bd35-c41c6287bcf4",
  "message": "Model: qwen-plus-20230703-cx7f not found!",
  "code": "NotFound"
}
Response parameters
| Field | Type | Description |
| --- | --- | --- |
| request_id | String | A unique ID for the request. |
| code | String | The error code. |
| message | String | The error message. |
The following table describes common error codes for this API call.
| Error code | Error message | Cause |
| --- | --- | --- |
| NotFound | Model: xxx not found! | |
| Conflict | Deployed model xxx already exists, please specify a suffix. | A duplicate suffix was used when creating the deployment task. |
| InvalidParameter | Invalid capacity (xx), capacity must be larger than or equal to 0 and multiples of 1 and less than 1000! | The capacity value in the request is invalid. |
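The sample InvalidParameter message describes three constraints on capacity: it must be at least 0, a multiple of some step, and below an upper limit. A local check before sending the request can catch obvious mistakes early. The sketch below is one way to do that; the function name check_capacity is hypothetical, and the defaults (step 1, limit 1000) are taken from the sample message only and may differ for a given deployment.

```shell
# check_capacity VALUE [STEP] [LIMIT]
# Succeeds (exit status 0) when VALUE is a non-negative integer,
# a multiple of STEP, and strictly less than LIMIT.
# Assumption: STEP=1 and LIMIT=1000 mirror the sample error message only.
check_capacity() {
  value=$1
  step=${2:-1}
  limit=${3:-1000}
  case "$value" in
    ''|*[!0-9]*) return 1 ;;  # reject empty, negative, or non-integer input
    0[0-9]*) return 1 ;;      # reject leading zeros, which the shell would read as octal
  esac
  [ $((value % step)) -eq 0 ] && [ "$value" -lt "$limit" ]
}

# Usage: validate locally before sending the scale request.
if check_capacity 2; then
  echo "capacity 2 passes the local check"
fi
```

A passing local check does not guarantee that the service accepts the value; the server-side response remains authoritative.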