Update the throttling settings for a specified deployment.
Prerequisites
- You have read Introduction to model deployment and Use API for model deployment and are familiar with the basic steps for deploying models on the Model Studio platform.
- You have an API key for Model Studio. For instructions, see Obtain an API key.
Update model deployment settings
For the model unit deployment method, only some models support modifying the rpm and tpm settings.
Endpoint
PUT https://dashscope-intl.aliyuncs.com/api/v1/deployments/{deployed_model}/update
Request example
Use the following command to update the throttling settings for a specified deployment:
curl -X PUT "https://dashscope-intl.aliyuncs.com/api/v1/deployments/{deployed_model}/update" \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"rpm_limit": 1000,
"tpm_limit": 200
}'
Request parameters
| Parameter | Type | In | Required | Description |
| --- | --- | --- | --- | --- |
| deployed_model | String | path | Yes | The unique identifier for the deployment. Obtain it by calling the Create a deployment or List deployments operation. |
| rpm_limit | Number | body | At least one of rpm_limit and tpm_limit is required. | The maximum number of requests per minute (rpm). |
| tpm_limit | Number | body | At least one of rpm_limit and tpm_limit is required. | The maximum number of tokens per minute (tpm). |
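The request parameters above can be sketched in Python as follows. This is a minimal illustration, not an official SDK: the helper name `build_update_request` is hypothetical, and it only builds the request object, using the endpoint, headers, and body fields documented here. It enforces the rule that at least one of `rpm_limit` and `tpm_limit` must be present.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the endpoint documented above.
BASE_URL = "https://dashscope-intl.aliyuncs.com/api/v1/deployments"


def build_update_request(deployed_model, api_key, rpm_limit=None, tpm_limit=None):
    """Build (but do not send) the PUT request that updates throttling limits.

    Hypothetical helper for illustration only.
    """
    body = {}
    if rpm_limit is not None:
        body["rpm_limit"] = rpm_limit
    if tpm_limit is not None:
        body["tpm_limit"] = tpm_limit
    if not body:
        # The parameter table requires at least one of the two limits.
        raise ValueError("Specify at least one of rpm_limit and tpm_limit.")

    # Percent-encode the path parameter so unusual characters cannot break the URL.
    model_id = urllib.parse.quote(deployed_model, safe="")
    return urllib.request.Request(
        url=f"{BASE_URL}/{model_id}/update",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the response. Reading the API key from the `DASHSCOPE_API_KEY` environment variable, as the curl example does, keeps it out of source code.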
Response example
A successful request returns a response similar to the following:
{
  "request_id": "1d121fd9-876c-40ad-bc40-a9e68ef3b986",
  "output": {
    "deployed_model": "qwen-plus-2025-12-01-b6d61c71",
    "gmt_create": "2026-01-07T13:52:44",
    "gmt_modified": "2026-01-07T14:01:41",
    "status": "PENDING",
    "model_name": "qwen-plus-2025-12-01",
    "base_model": "qwen-plus-2025-12-01",
    "base_capacity": 4,
    "capacity": 4,
    "ready_capacity": 0,
    "workspace_id": "llm-8v53e*******",
    "charge_type": "post_paid",
    "creator": "16542902******",
    "modifier": "16542902********",
    "plan": "mu",
    "model_unit_spec": "MU1",
    "enable_thinking": true,
    "max_context_length": 1,
    "rpm_limit": 1000,
    "tpm_limit": 200
  }
}
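As a rough sketch, a client could compare the limits echoed in `output` with the values it sent. The function name `find_limit_mismatches` is hypothetical, and the check assumes the response always echoes the requested limits the way the example above does; this page does not state that guarantee.

```python
def find_limit_mismatches(output, requested):
    """Return the requested limits whose values differ in the response output.

    `output` is the parsed "output" object of a successful response and
    `requested` is the body that was sent, e.g. {"rpm_limit": 1000}.
    Hypothetical helper for illustration only.
    """
    return {
        name: output.get(name)
        for name, value in requested.items()
        if output.get(name) != value
    }


# Values copied from the response example above (other fields omitted).
example_output = {"status": "PENDING", "rpm_limit": 1000, "tpm_limit": 200}
mismatches = find_limit_mismatches(
    example_output, {"rpm_limit": 1000, "tpm_limit": 200}
)
```

An empty result means every requested limit appears in the response with the value that was sent.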
Response parameters
For response parameter descriptions, see Create a model deployment task.
Errors
Response example
{
  "request_id": "ca218d57-b91b-46b2-bd35-c41c6287bcf4",
  "message": "Model: qwen-plus-20230703-cx7f not found!",
  "code": "NotFound"
}
Response parameters
| Parameter | Type | Description |
| --- | --- | --- |
| request_id | String | The unique ID of the request. |
| code | String | The error code. |
| message | String | The error message. |
If a request fails, the response may contain one of the following errors.
| Error code | Error message | Cause |
| --- | --- | --- |
| NotFound | Model: xxx not found! | |
| Conflict | Deployed model xxx already exists, please specify a suffix. | The specified suffix is already in use by another deployment. |
| InvalidParameter | Invalid capacity (xx), capacity must be larger than or equal to 0 and multiples of 1 and less than 1000! | You specified an invalid capacity value when creating or updating a deployment. |
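One way to handle these errors on the client side is sketched below. The names `DeploymentUpdateError` and `parse_update_response` are hypothetical. The sketch assumes that a failed request is recognizable by a top-level `code` field and a successful one by a top-level `output` object, as in the two response examples on this page; the API may signal failures in other ways as well, such as HTTP status codes.

```python
import json


class DeploymentUpdateError(Exception):
    """Carries the code, message, and request_id fields of an error response."""

    def __init__(self, code, message, request_id):
        super().__init__(f"{code}: {message} (request_id={request_id})")
        self.code = code
        self.message = message
        self.request_id = request_id


def parse_update_response(response_text):
    """Return the "output" object on success, or raise DeploymentUpdateError.

    Assumption: an error body has a top-level "code" field, as in the
    error response example; this is inferred from the examples only.
    """
    payload = json.loads(response_text)
    if "code" in payload:
        raise DeploymentUpdateError(
            payload["code"],
            payload.get("message", ""),
            payload.get("request_id", ""),
        )
    return payload["output"]
```

Callers can branch on `error.code` (for example `NotFound` or `InvalidParameter`) and log `error.request_id` for troubleshooting.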