Update deployment throttling - Alibaba Cloud Model Studio

Update the throttling settings for a specified deployment.

Prerequisites

You have read Introduction to model deployment and Use API for model deployment and are familiar with the basic steps for deploying models on the Model Studio platform.
You have an API key for Model Studio. For instructions, see Obtain an API key.

Update model deployment settings

Note

For deployments that use the model unit deployment method, only some models support changing the rpm_limit and tpm_limit settings.

Endpoint

PUT https://dashscope-intl.aliyuncs.com/api/v1/deployments/{deployed_model}/update

Request example

Use the following command to update the throttling settings for a specified deployment. Replace {deployed_model} with the identifier of your deployment:

curl -X PUT "https://dashscope-intl.aliyuncs.com/api/v1/deployments/{deployed_model}/update" \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "rpm_limit": 1000,
    "tpm_limit": 200
}'

Request parameters

Parameter	Type	In	Required	Description
deployed_model	String	path	Yes	The unique identifier for the deployment. Obtain it by calling the Create a deployment or List deployments operation.
rpm_limit	Number	body	Conditional	The maximum number of requests per minute (rpm). Specify at least one of rpm_limit and tpm_limit.
tpm_limit	Number	body	Conditional	The maximum number of tokens per minute (tpm). Specify at least one of rpm_limit and tpm_limit.
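Because the body must contain rpm_limit, tpm_limit, or both, a small wrapper can reject an empty update before any request is sent. The following is a minimal sketch, not an official client: the function names build_throttle_body and update_throttling are hypothetical, and the check that each limit is a non-negative integer is an assumption of this sketch (the table above only states the type as Number).

```shell
#!/bin/sh
# Build the JSON request body from optional limits.
# Usage: build_throttle_body RPM TPM   (pass "" to omit a limit)
build_throttle_body() {
    rpm="$1"
    tpm="$2"

    # The API requires at least one of the two limits.
    if [ -z "$rpm" ] && [ -z "$tpm" ]; then
        echo "error: specify rpm_limit, tpm_limit, or both" >&2
        return 1
    fi

    # Assumption of this sketch: limits are non-negative integers
    # without leading zeros, so that the output is valid JSON.
    for value in "$rpm" "$tpm"; do
        case "$value" in
            *[!0-9]*|0?*)
                echo "error: invalid limit: $value" >&2
                return 1
                ;;
        esac
    done

    body=""
    if [ -n "$rpm" ]; then
        body="\"rpm_limit\": $rpm"
    fi
    if [ -n "$tpm" ]; then
        if [ -n "$body" ]; then
            body="$body, "
        fi
        body="$body\"tpm_limit\": $tpm"
    fi
    printf '{%s}\n' "$body"
}

# Send the update request for one deployment.
# Usage: update_throttling DEPLOYED_MODEL RPM TPM
update_throttling() {
    deployed_model="$1"
    request_body=$(build_throttle_body "$2" "$3") || return 1
    curl -sS -X PUT \
        "https://dashscope-intl.aliyuncs.com/api/v1/deployments/${deployed_model}/update" \
        --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
        --header 'Content-Type: application/json' \
        --data "$request_body"
}
```

For example, update_throttling "qwen-plus-2025-12-01-b6d61c71" 1000 "" would send only rpm_limit and leave tpm_limit out of the body.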

Response example

A successful request returns a response similar to the following:

{
    "request_id": "1d121fd9-876c-40ad-bc40-a9e68ef3b986",
    "output": {
        "deployed_model": "qwen-plus-2025-12-01-b6d61c71",
        "gmt_create": "2026-01-07T13:52:44",
        "gmt_modified": "2026-01-07T14:01:41",
        "status": "PENDING",
        "model_name": "qwen-plus-2025-12-01",
        "base_model": "qwen-plus-2025-12-01",
        "base_capacity": 4,
        "capacity": 4,
        "ready_capacity": 0,
        "workspace_id": "llm-8v53e*******",
        "charge_type": "post_paid",
        "creator": "16542902******",
        "modifier": "16542902********",
        "plan": "mu",
        "model_unit_spec": "MU1",
        "enable_thinking": true,
        "max_context_length": 1,
        "rpm_limit": 1000,
        "tpm_limit": 200
    }
}
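The example response echoes the rpm_limit and tpm_limit values from the request, so a script can compare them against what it asked for. The sketch below assumes that the response always includes both fields as plain integers, which this page does not state explicitly. The helper names json_number_field and limits_applied are hypothetical, and the sed-based extraction is deliberately simple; a real JSON parser is more robust.

```shell
#!/bin/sh
# Print the first integer value of the named field from JSON on stdin.
# This is a pattern match, not a full JSON parser.
json_number_field() {
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if the response reports the expected limits.
# Usage: limits_applied RESPONSE WANT_RPM WANT_TPM   (pass "" to skip a check)
limits_applied() {
    response="$1"
    want_rpm="$2"
    want_tpm="$3"
    got_rpm=$(printf '%s\n' "$response" | json_number_field rpm_limit)
    got_tpm=$(printf '%s\n' "$response" | json_number_field tpm_limit)

    # An empty expected value means that limit was not part of the update.
    [ -z "$want_rpm" ] || [ "$got_rpm" = "$want_rpm" ] || return 1
    [ -z "$want_tpm" ] || [ "$got_tpm" = "$want_tpm" ] || return 1
}
```

With the response above stored in a variable named response, limits_applied "$response" 1000 200 would succeed.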

Response parameters

For response parameter descriptions, see Create a model deployment task.

Errors

Response example

{
    "request_id": "ca218d57-b91b-46b2-bd35-c41c6287bcf4",
    "message": "Model: qwen-plus-20230703-cx7f not found!",
    "code": "NotFound"
}

Response parameters

Parameter	Type	Description
request_id	String	The unique ID of the request.
code	String	The error code.
message	String	The error message.

If a request fails, the response may contain one of the following errors.

Error code	Error message	Cause
NotFound	Model: xxx not found!	The model that you specified when creating, querying, updating, or deleting a deployment does not exist.
Conflict	Deployed model xxx already exists, please specify a suffix.	The specified suffix is already in use by another deployment.
InvalidParameter	Invalid capacity (xx), capacity must be larger than or equal to 0 and multiples of 1 and less than 1000!	You specified an invalid capacity value when creating or updating a deployment.
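A caller can tell the two response shapes apart by looking for the code field, which appears in the error example but not in the success example. The sketch below relies on that observation as an assumption. The helper names json_string_field and check_response are hypothetical, and the sed pattern handles only simple string values without escaped quotes.

```shell
#!/bin/sh
# Print the first string value of the named field from JSON on stdin.
# This is a pattern match, not a full JSON parser.
json_string_field() {
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Return 0 for a success response. For an error response, print
# "CODE: MESSAGE" to stderr and return 1.
check_response() {
    response="$1"
    code=$(printf '%s\n' "$response" | json_string_field code)
    if [ -z "$code" ]; then
        return 0
    fi
    message=$(printf '%s\n' "$response" | json_string_field message)
    echo "$code: $message" >&2
    return 1
}
```

For the error example above, check_response would print "NotFound: Model: qwen-plus-20230703-cx7f not found!" to stderr and return a non-zero status.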