Alibaba Cloud Model Studio: Create deployment

Last Updated: Jun 06, 2026

Create a model deployment task.

Prerequisites

Model deployment

Endpoint

POST https://dashscope-intl.aliyuncs.com/api/v1/deployments

Request examples

Provisioned Throughput (PTU)

Note

After you run the deployment command, billing starts as soon as the service is deployed, even if you do not use it. Before you proceed, we recommend that you review the service billing rules.

The provisioned throughput billing method charges based on usage duration. This method is suitable for scenarios that require stable throughput, high concurrency, low latency, and predictable traffic. In this mode, the platform provisions both throughput/concurrency and generation speed, which you cannot adjust.

curl "https://dashscope-intl.aliyuncs.com/api/v1/deployments" \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "name": "my_qwen_flash",
    "model_name": "qwen-flash-2025-07-28",
    "plan": "ptu",
    "ptu_capacity": {
        "input_tpm": 10000,
        "output_tpm": 1000
    }
}'

Model unit

Note
  • After you run the deployment command, billing starts as soon as the service is deployed, even if you do not use it. Before you proceed, we recommend that you review the service billing rules.

  • Computing resources for the post-paid model unit plan are allocated on a first-come, first-served basis. If the purchase is unsuccessful, a full refund will be issued.

The model unit billing method charges based on usage duration. It is ideal for large-scale inference after model fine-tuning, offering dedicated resources with flexible performance and cost adjustments. You can customize both throughput/concurrency and generation speed.

curl "https://dashscope-intl.aliyuncs.com/api/v1/deployments" \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "name": "my_qwen_plus",
    "model_name": "qwen-plus-2025-12-01",
    "plan": "mu",
    "deploy_spec": "MU1",
    "enable_thinking": true,
    "capacity": 4,
    "max_context_length": 10000,
    "rpm_limit": 500,
    "tpm_limit": 1000
}'

The model unit deployment mode supports the following additional settings:

Configuration

Details

Configure model inference mode

For some models deployed with the Model Unit method, you can configure the inference mode, maximum context length, and other settings.

  • Instruct - The model is deployed for inference in non-thinking mode.

  • Thinking - The model is deployed for inference in thinking mode.

Maximum context length

This setting is supported for the Model Unit deployment mode of some models. The maximum context length depends on the model type.

Service throttling

This setting is supported for the Model Unit deployment mode of some models. It lets you limit the RPM and TPM of model calls.

To configure these settings by using the API, set the enable_thinking, max_context_length, rpm_limit, and tpm_limit request parameters, as in the model unit request example above.

Token usage

Token usage billing charges you based on the number of tokens consumed. This method suits cost-sensitive scenarios where concurrency and latency requirements are not critical, and it offers the best price advantage. The platform provisions the throughput/concurrency and generation speed, and you cannot adjust them.

curl "https://dashscope-intl.aliyuncs.com/api/v1/deployments" \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model_name": "qwen3-8b-ft-202511132025-0260",
    "plan": "lora",
    "capacity": 1,
    "name": "qwen3-8b-ft"
}'

For LoRA shared deployments, the capacity parameter is required but currently has no effect. To request scaling, submit a form in the model deployment console.

Request parameters

Parameter

Type

Location

Required

Description

model_name

String

body

Yes

The name of the model to deploy. This corresponds to the model ID in My Models. You can also get this ID from the output of the Create Training Job or Create Import Job operations.

plan

String

body

Yes

The deployment plan. The following billing methods are supported:

Billing method

Plan setting

Billing by model unit

"plan": "mu"

Billing by compute unit

"plan": "cu"

Provisioned throughput

"plan": "ptu"

LoRA shared deployment (billed by token usage)

"plan": "lora"

You can quickly find the supported deployment plans for a fine-tuned model in My Models.

Note

Fine-tuned CosyVoice models currently support only "plan": "mu".

name

String

body

Yes

The display name of the model in the console.

capacity

Integer

body

No

Required when "plan": "mu" is specified. Specifies the number of resource units for the deployment. The value must be an integer multiple of base_capacity, and the constraints vary with the deploy_spec value: for example, for MU2 the value must be a multiple of 8, while for MU5 it can be 1. Example: "capacity": 1. The LoRA shared deployment example also passes capacity, where it currently has no effect.

Note

CosyVoice models currently provide the following two deployment templates with corresponding capacity constraints:

  • single-node deployment: capacity must be an integer multiple of 1, such as 1, 2, 3, 4, or 5.

  • single-node deployment - flagship complex inference edition: capacity must be an integer multiple of 8, such as 8, 16, 24, or 32.
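Before you send a request, you can check the capacity rule locally. The following POSIX shell sketch uses a hypothetical function, check_capacity, and assumes you already know the base capacity that applies to your deploy_spec (for example, 8 for MU2 or 1 for MU5, as described above). The service remains the final authority on which values it accepts.

```sh
# Hypothetical helper: return 0 if CAPACITY ($1) is a positive integer
# multiple of BASE_CAPACITY ($2); otherwise print a reason and return 1.
check_capacity() {
  # Reject empty values, zero or leading zeros, and non-digit characters.
  case "$1" in ''|0*|*[!0-9]*) echo "capacity must be a positive integer" >&2; return 1 ;; esac
  case "$2" in ''|0*|*[!0-9]*) echo "base capacity must be a positive integer" >&2; return 1 ;; esac
  if [ $(( $1 % $2 )) -ne 0 ]; then
    echo "capacity $1 is not a multiple of base capacity $2" >&2
    return 1
  fi
}
```

For example, check_capacity 16 8 succeeds, while check_capacity 12 8 prints an error and returns a non-zero status.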

billing_method

String

body

No

Required only when "plan": "mu" is specified. Currently, only "POST_PAY" (Post-paid) is supported. Example: "billing_method": "POST_PAY".

deploy_spec

String

body

No

Required when "plan": "mu" is specified, and applicable only to that plan. Example: "deploy_spec": "MU1".

For details about feature support, see Feature support for model unit deployment.

Note

You can get this value from the template_id field returned by the Get Deployable Model List operation.

enable_thinking

Boolean

body

No

Specifies whether to enable thinking mode. Valid values: true and false. Supported by some models.

max_context_length

Number

body

No

The maximum context length of the deployed model. Supported by some models. Example: "max_context_length": 131072.

rpm_limit

Number

body

No

Supported by some models. Specifies the maximum number of requests per minute (RPM).

tpm_limit

Number

body

No

Supported by some models. Specifies the maximum number of tokens per minute (TPM).

ptu_capacity

Object

body

No

This setting is applicable only when "plan": "ptu" is specified.

For details about feature support, see Feature Support for PTU Deployment.

If you do not specify this parameter, the system defaults to 10,000 input_tpm and 1,000 output_tpm.

Example: "ptu_capacity": { "input_tpm": 10000, "output_tpm": 1000 }.

ptu_capacity.input_tpm

Number

body

No

Supported by all models. Specifies the maximum number of input tokens per minute (TPM).

ptu_capacity.output_tpm

Number

body

No

Supported by all models. Specifies the maximum number of output tokens per minute (TPM).

ptu_capacity.thinking_output_tpm

Number

body

No

Supported by some models. Specifies the maximum number of provisioned thinking output tokens per minute (TPM).
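The ptu_capacity fields and their defaults can be combined into a request body with a small helper. This is a sketch: ptu_body is a hypothetical function name, it omits thinking_output_tpm, and it does not escape JSON special characters, so it assumes name and model_name contain none.

```sh
# Hypothetical helper: print a PTU request body.
#   $1 name   $2 model_name   $3 input_tpm (default 10000)   $4 output_tpm (default 1000)
# The defaults mirror the documented defaults for ptu_capacity.
ptu_body() {
  printf '{"name": "%s", "model_name": "%s", "plan": "ptu", "ptu_capacity": {"input_tpm": %d, "output_tpm": %d}}\n' \
    "$1" "$2" "${3:-10000}" "${4:-1000}"
}

# Usage sketch (sends a real, billable request; curl reads the body from stdin):
#   ptu_body my_qwen_flash qwen-flash-2025-07-28 20000 2000 |
#     curl "https://dashscope-intl.aliyuncs.com/api/v1/deployments" \
#       --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
#       --header 'Content-Type: application/json' \
#       --data @-
```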

suffix

String

body

No

After a model is deployed, a new model name is generated. The suffix parameter specifies the suffix for this new name. It must be globally unique and have a maximum length of 8 characters. You can omit the suffix for the first deployment of a model. If you deploy the same model multiple times, you must specify a unique suffix for each deployment.

See the deployed_model output parameter for more information.
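Only the length limit on suffix can be checked on the client; global uniqueness is enforced by the service, which returns a Conflict error when a suffix is already in use. A minimal sketch with a hypothetical check_suffix function:

```sh
# Hypothetical helper: enforce the documented 8-character limit on suffix.
# Uniqueness cannot be checked locally; the service reports it as Conflict.
check_suffix() {
  if [ "${#1}" -eq 0 ] || [ "${#1}" -gt 8 ]; then
    echo "suffix must be 1 to 8 characters long" >&2
    return 1
  fi
}
```

If the check passes, add the value to the request body, for example "suffix": "v2".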

Supported models

View supported features and billing.

Billed by usage duration (Provisioned Throughput)

Fee = Usage duration × (Input TPM / 10,000 × Unit price per 10K input TPM + Output TPM / 1,000 × Unit price per 1K output TPM)

Pay-as-you-go is billed hourly at the unit prices in the pay-as-you-go input and output columns (per hour) of the tables below. Subscription is billed daily at the unit prices in the subscription input and output columns (per day).

  • Subscription orders take effect immediately after payment. A subscription for N days is valid until 23:59 on the Nth day. If you place an order after 22:00, the expiration date is automatically extended by one day.

  • After a subscription order expires, the service stops after a 2-hour grace period. The resources are retained for 14 hours after the service stops and are then released.

  • You cannot terminate a subscription service in advance.

  • For the pay-as-you-go method, if your account has an overdue payment, the deployed resources are retained and continue to be billed for 24 hours before they are automatically released.
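The provisioned throughput formula can be estimated with a short shell function. This sketch assumes that each unit price applies per block of the TPM it is quoted for, so input TPM is divided by 10,000 and output TPM by 1,000 to match the "Per 10K TPM" and "Per 1K TPM" column headings in the tables below. ptu_fee is a hypothetical name, and the result is an estimate, not a bill.

```sh
# Hypothetical helper: estimate a provisioned throughput fee.
#   $1 duration (hours for pay-as-you-go, days for subscription)
#   $2 input unit price per 10K input TPM
#   $3 output unit price per 1K output TPM
#   $4 input TPM   $5 output TPM
ptu_fee() {
  LC_ALL=C awk -v d="$1" -v pin="$2" -v pout="$3" -v tin="$4" -v tout="$5" \
    'BEGIN { printf "%.2f\n", d * (pin * tin / 10000 + pout * tout / 1000) }'
}
```

For example, 24 hours of pay-as-you-go qwen-flash-2025-07-28 in China (Beijing) at the default 10,000 input TPM and 1,000 output TPM is ptu_fee 24 0.06 0.06 10000 1000, which prints 2.88.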

If a request exceeds the model's maximum input tokens or the purchased TPM, the affected calls automatically switch to pay-as-you-go mode for the current model, and inference performance may degrade. These calls are rate-limited by the public (shared) limits of the current snapshot model in the workspace and are billed at the standard pay-as-you-go rates for model calls.

  • For these calls, the API response includes the header x-dashscope-ptu-overflow: true.

  • For TPM statistics, go to Model Monitoring (Beijing).
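To spot overflow calls in a script, you can save the response headers and look for the x-dashscope-ptu-overflow header. This sketch assumes the headers were saved to a file, for example with curl's -D (--dump-header) option. ptu_overflowed is a hypothetical name, and the model invocation endpoint is not covered on this page, so the curl line in the comments is only a placeholder.

```sh
# Hypothetical helper: succeed if the saved response headers in file $1
# flag the call as PTU overflow (billed at pay-as-you-go rates).
# Matching is case-insensitive and tolerates an optional space after the colon.
ptu_overflowed() {
  grep -qiE '^x-dashscope-ptu-overflow:[[:space:]]*true' "$1"
}

# Usage sketch (placeholder; the model call itself is not covered here):
#   curl -sS -D headers.txt -o body.json <your model invocation request>
#   ptu_overflowed headers.txt && echo "served at pay-as-you-go rates"
```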

For the specific rules on fee reductions and refunds for scale-in scenarios (downgrades), see Refund rules for downgrades.

Singapore
Qwen

Model name

Model code

Maximum input tokens

Pay-as-you-go input

Per 10K TPM/hour

Pay-as-you-go output

Per 1K TPM/hour

Subscription input

Per 10K TPM/day

Subscription output

Per 1K TPM/day

Qwen3.6-Plus-2026-04-02

qwen3.6-plus-2026-04-02

128,000

$1.2

$0.72

$14.4

$8.64

Qwen3.5-Plus-2026-04-20

qwen3.5-plus-2026-04-20

128,000

$0.96

$0.576

$11.52

$6.912

Qwen-VL

Model name

Model code

Maximum input tokens

Pay-as-you-go input

Per 10K TPM/hour

Pay-as-you-go output

Per 1K TPM/hour

Subscription input

Per 10K TPM/day

Subscription output

Per 1K TPM/day

Qwen3-VL-Plus-2025-09-23

qwen3-vl-plus-2025-09-23

128,000

$0.48

$0.384

$5.76

$4.608

DeepSeek

Model name

Model code

Maximum input tokens

Pay-as-you-go input

Per 10K TPM/hour

Pay-as-you-go output

Per 1K TPM/hour

Subscription input

Per 10K TPM/day

Subscription output

Per 1K TPM/day

DeepSeek-v3.2

deepseek-v3.2

64,000

$2.05

$0.616

$24.62

$7.387

China (Beijing)
Qwen

Model name

Model code

Maximum input tokens

Pay-as-you-go input

Per 10K TPM/hour

Pay-as-you-go output

Per 1K TPM/hour

Subscription input

Per 10K TPM/day

Subscription output

Per 1K TPM/day

Qwen3.7-Max-2026-05-20

qwen3.7-max-2026-05-20

128,000

$3.96

$1.188

$47.53

$14.258

Qwen3.6-Flash-2026-04-16

qwen3.6-flash-2026-04-16

128,000

$0.4

$0.238

$4.75

$2.852

Qwen3.6-Plus-2026-04-02

qwen3.6-plus-2026-04-02

128,000

$0.67

$0.397

$7.93

$4.753

Qwen3.5-Plus-2026-04-20

qwen3.5-plus-2026-04-20

128,000

$0.26

$0.16

$3.17

$1.9

Qwen3-Max-2025-09-23

qwen3-max-2025-09-23

128,000

$1.11

$0.45

$13.32

$5.4

Qwen-Flash-2025-07-28

qwen-flash-2025-07-28

128,000

$0.06

$0.06

$0.72

$0.72

Qwen-Plus-2025-12-01

qwen-plus-2025-12-01

128,000

$0.28

Non-thinking: $0.07

Thinking: $0.28

$3.36

Non-thinking: $0.84

Thinking: $3.36

DeepSeek

Model name

Model code

Maximum input tokens

Pay-as-you-go input

Per 10K TPM/hour

Pay-as-you-go output

Per 1K TPM/hour

Subscription input

Per 10K TPM/day

Subscription output

Per 1K TPM/day

DeepSeek-v4-Pro

deepseek-v4-pro

64,000

$5.94

$1.188

$71.3

$14.26

DeepSeek-v3.2

deepseek-v3.2

64,000

$1.04

$0.16

$12.48

$1.92

DeepSeek-v3

deepseek-v3

64,000

$0.99

$0.396

$11.9

$4.75

Qwen-VL

Model name

Model code

Maximum input tokens

Pay-as-you-go input

Per 10K TPM/hour

Pay-as-you-go output

Per 1K TPM/hour

Subscription input

Per 10K TPM/day

Subscription output

Per 1K TPM/day

Qwen3-VL-Plus-2025-09-23

qwen3-vl-plus-2025-09-23

128,000

$0.35

$0.35

$4.2

$4.2

More models

Model name

Model code

Maximum input tokens

Pay-as-you-go input

Per 10K TPM/hour

Pay-as-you-go output

Per 1K TPM/hour

Subscription input

Per 10K TPM/day

Subscription output

Per 1K TPM/day

GLM-5.1

glm-5.1

64,000

$2.97

$1.19

$35.65

$14.26

Billed by usage duration (Model Unit)

Fee = Usage duration (hours) × Number of model units × Unit price per model unit

For the pay-as-you-go method, the 'Unit price per model unit' is the value in the 'Hourly price' column in the table below. For monthly subscriptions, the formula is: Number of months × Number of model units × Monthly price.

  • For a monthly subscription, if you cancel within the first month, each day of use is charged at 1.2 times the daily price (daily price ≈ monthly price / 30). Usage of less than a day is billed as a full day.
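The model unit formulas can be sketched as two shell functions. The hourly price column lists a price for each model unit specification row (for example "MU1 x 8"), and this page does not spell out whether that is the price of one unit or of the whole row, so mu_payg_fee takes the per-unit hourly price as an input. mu_early_cancel_fee applies the first-month cancellation rule to a monthly price you supply. Both function names are hypothetical.

```sh
# Hypothetical helper: pay-as-you-go model unit fee.
#   $1 hours   $2 number of model units   $3 hourly price per model unit
mu_payg_fee() {
  LC_ALL=C awk -v h="$1" -v n="$2" -v p="$3" 'BEGIN { printf "%.2f\n", h * n * p }'
}

# Hypothetical helper: charge for cancelling a monthly subscription in the
# first month. Each day costs 1.2 x (monthly price / 30).
#   $1 days used (a partial day counts as a full day)   $2 monthly price
mu_early_cancel_fee() {
  LC_ALL=C awk -v d="$1" -v m="$2" 'BEGIN {
    days = d + 0
    if (days > int(days)) days = int(days) + 1   # round a partial day up
    printf "%.2f\n", days * 1.2 * m / 30
  }'
}
```

For example, assuming 41,832 is the monthly price of the whole subscription (the Singapore qwen3.6-plus-2026-04-02 row), cancelling after 10 days is mu_early_cancel_fee 10 41832, which prints 16732.80.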

Note

Computing power resources for the pay-as-you-go model unit method are allocated on a first-come, first-served basis. If the purchase is unsuccessful, you will receive a full refund.

Singapore
Text generation

Model name

Model code

Model unit specification

Hourly price ($)

Monthly price ($)

Qwen3.6-Plus-2026-04-02

qwen3.6-plus-2026-04-02

MU1 x 8

$88

$41,832

Qwen3.5-397B-A17B

qwen3.5-397b-a17b

MU2 x 8

$112

$52,392

Qwen3.5-35B-A3B

qwen3.5-35b-a3b

MU2 x 8

$112

$52,392

Qwen3-32B

qwen3-32b

MU1 x 4

$44

$20,916

MU2 x 8

$112

$52,392

Qwen3-14B

qwen3-14b

MU1 x 4

$44

$20,916

GLM-5.1

glm-5.1

MU2 x 8

$112

$52,392

DeepSeek-V4-Flash

deepseek-v4-flash

MU1 x 8

$88

$41,832

Multimodal

Model name

Model code

Model unit specification

Hourly price ($)

Monthly price ($)

Qwen3-VL-32B-Instruct

qwen3-vl-32b-instruct

MU2 x 8

$112

$52,392

Qwen3-VL-8B-Instruct

qwen3-vl-8b-instruct

MU1 x 2

$22

$10,458

Model type:

  • Instruct - The model is deployed for inference in non-thinking mode.

China (Beijing)
Text generation
Qwen

Model name

Model code

Model unit specification

Hourly price ($)

Monthly price ($)

Qwen3.6-35B-A3B

qwen3.6-35b-a3b

MU8 x 1

$6.464

$3,080.477

MU9 x 1

$7.014

$3,383.024

Qwen3.6-27B

qwen3.6-27b

MU9 x 1

$7.014

$3,383.024

Qwen3.6-Flash-2026-04-16

qwen3.6-flash-2026-04-16

MU1 x 2

$14.852

$7,183.564

Qwen3.6-Plus-2026-04-02

qwen3.6-plus-2026-04-02

MU1 x 8

$59.408

$28,734.256

Qwen3.5-397B-A17B

qwen3.5-397b-a17b

MU2 x 8

$69.312

$33,044.72

MU3 x 8

$150.72

$72,577.152

MU6 x 16

$55.008

$26,599.92

Qwen3.5-122B-A10B

qwen3.5-122b-a10b

MU1 x 4

$29.704

$14,367.128

MU2 x 8

$69.312

$33,044.72

MU6 x 16

$55.008

$26,599.92

MU9 x 2

$14.028

$6,766.048

Qwen3.5-35B-A3B

qwen3.5-35b-a3b

MU1 x 2

$14.852

$7,183.564

MU2 x 8

$69.312

$33,044.72

MU8 x 1

$6.464

$3,080.477

MU9 x 1

$7.014

$3,383.024

Qwen3.5-27B

qwen3.5-27b

MU1 x 2

$14.852

$7,183.564

MU9 x 1

$7.014

$3,383.024

Qwen3.5-9B

qwen3.5-9b

MU1 x 2

$14.852

$7,183.564

MU8 x 1

$6.464

$3,080.477

MU9 x 1

$7.014

$3,383.024

Qwen3.5-Flash-2026-02-23

qwen3.5-flash-2026-02-23

MU1 x 2

$14.852

$7,183.564

MU8 x 1 (model compression)

$6.464

$3,080.477

Qwen3.5-Plus-2026-02-15

qwen3.5-plus-2026-02-15

MU1 x 8

$59.408

$28,734.256

MU3 x 8

$150.72

$72,577.152

Qwen3-235B-A22B-Instruct

qwen3-235b-a22b-instruct-2507

MU1 x 4

$29.704

$14,367.128

MU2 x 8

$69.312

$33,044.72

Qwen3-Next-80B-A3B-Instruct

qwen3-next-80b-a3b-instruct

MU1 x 2

$14.852

$7,183.564

Qwen3-32B

qwen3-32b

MU1 x 4

$29.704

$14,367.128

MU6 x 4

$13.752

$6,649.98

Qwen3-30B-A3B

qwen3-30b-a3b

MU9 x 2

$14.028

$6,766.048

Qwen3-30B-A3B-Instruct-2507

qwen3-30b-a3b-instruct-2507

MU1 x 4

$29.704

$14,367.128

MU2 x 8

$69.312

$33,044.72

Qwen3-8B

qwen3-8b

MU1 x 2

$14.852

$7,183.564

MU2 x 2

$17.328

$8,261.18

MU5 x 1

$2.888

$1,394.329

Qwen3-4B

qwen3-4b

MU1 x 2

$14.852

$7,183.564

MU5 x 1

$2.888

$1,394.329

Qwen3-1.7B

qwen3-1.7b

MU1 x 2

$14.852

$7,183.564

MU5 x 1

$2.888

$1,394.329

Qwen3-Max-2025-09-23

qwen3-max-2025-09-23

MU2 x 8

$69.312

$33,044.72

MU3 x 8

$150.72

$72,577.152

Qwen2.5-72B

qwen2.5-72b-instruct

MU1 x 4

$29.704

$14,367.128

Qwen2.5-32B

qwen2.5-32b-instruct

MU1 x 4

$29.704

$14,367.128

Qwen2.5-14B

qwen2.5-14b-instruct

MU1 x 2

$14.852

$7,183.564

Qwen2.5-7B

qwen2.5-7b-instruct

MU1 x 2

$14.852

$7,183.564

MU5 x 1

$2.888

$1,394.329

Qwen2.5-3B-Instruct

qwen2.5-3b-instruct

MU5 x 1

$2.888

$1,394.329

Qwen-Flash-2025-07-28

qwen-flash-2025-07-28

MU1 x 4

$29.704

$14,367.128

Qwen-Plus-2025-07-28

qwen-plus-2025-07-28

MU1 x 4

$29.704

$14,367.128

Qwen-Plus-2025-12-01

qwen-plus-2025-12-01

MU1 x 4

$29.704

$14,367.128

GLM

Model name

Model code

Model unit specification

Hourly price ($)

Monthly price ($)

GLM-5

glm-5

MU3 x 8

$150.72

$72,577.152

GLM-4.7

glm-4.7

MU6 x 16

$55.008

$26,599.92

DeepSeek

Model name

Model code

Model unit specification

Hourly price ($)

Monthly price ($)

DeepSeek-V4-Flash

deepseek-v4-flash

MU1 x 8

$59.408

$28,734.256

DeepSeek-V3.2

deepseek-v3.2

MU2 x 8

$69.312

$33,044.72

Other models

Model name

Model code

Model unit specification

Hourly price ($)

Monthly price ($)

MiniMax-M2.5

MiniMax-M2.5

MU1 x 8

$59.408

$28,734.256

Kimi-K2.5

kimi-k2.5

MU2 x 8

$69.312

$33,044.72

Multimodal
Qwen-VL

Model name

Model code

Model unit specification

Hourly price ($)

Monthly price ($)

Qwen3-VL-235B-A22B-Instruct

qwen3-vl-235b-a22b-instruct

MU1 x 4

$29.704

$14,367.128

Qwen3-VL-235B-A22B-Thinking

qwen3-vl-235b-a22b-thinking

MU1 x 4

$29.704

$14,367.128

Qwen3-VL-32B-Instruct

qwen3-vl-32b-instruct

MU2 x 8

$69.312

$33,044.72

Qwen3-VL-8B-Instruct

qwen3-vl-8b-instruct

MU1 x 2

$14.852

$7,183.564

Qwen3-VL-Flash-2025-10-15

qwen3-vl-flash-2025-10-15

MU1 x 4

$29.704

$14,367.128

Qwen3-VL-Plus-2025-09-23

qwen3-vl-plus-2025-09-23

MU1 x 4

$29.704

$14,367.128

Qwen-VL-Max-2025-08-13

qwen-vl-max-2025-08-13

MU6 x 4

$13.752

$6,649.98

Qwen-VL-OCR-2025-11-20

qwen-vl-ocr-2025-11-20

MU6 x 4

$13.752

$6,649.98

Qwen Omni

Model name

Model code

Model unit specification

Hourly price ($)

Monthly price ($)

Qwen3.5-Omni-Flash

qwen3.5-omni-flash

MU8 x 1

$6.464

$3,080.477

MU9 x 1

$7.014

$3,383.024

Qwen3.5-Omni-Plus

qwen3.5-omni-plus

MU9 x 8

$56.112

$27,064.192

Model type:

  • Instruct - The model is deployed for inference in non-thinking mode.

  • Thinking - The model is deployed for inference in thinking mode.

By model token usage

Fee = Input tokens / 1,000 × Input unit price + Output tokens / 1,000 × Output unit price (minimum billing unit: 1 token)

  • Billing by model token usage is supported only after you complete efficient SFT training on the following foundation models and create a custom model.
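The token usage formula can be estimated the same way. In this sketch, token_fee is a hypothetical function, and unit prices are taken per 1,000 tokens, as in the table that follows.

```sh
# Hypothetical helper: token usage fee with unit prices per 1,000 tokens.
#   $1 input tokens   $2 output tokens
#   $3 input price per 1,000 tokens   $4 output price per 1,000 tokens
token_fee() {
  LC_ALL=C awk -v i="$1" -v o="$2" -v pi="$3" -v po="$4" \
    'BEGIN { printf "%.6f\n", i / 1000 * pi + o / 1000 * po }'
}
```

For example, with the Singapore qwen3-14b prices, 1,000,000 input tokens and 200,000 non-thinking output tokens is token_fee 1000000 200000 0.00035 0.0014, which prints 0.630000.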

Singapore

Foundation model

Model code

Input

USD/1,000 tokens

Output

USD/1,000 tokens

Qwen3-14B

qwen3-14b

$0.00035

Non-thinking mode: $0.0014

Thinking mode: $0.0042

Response example

The command returns the following:

{
  "request_id": "f2ae64f7-83cc-410c-bc0b-840443f7eb86",
  "output": {
    "deployed_model": "emo-35b3f106-sample01",
    "gmt_create": "2025-06-17T11:00:38.68",
    "gmt_modified": "2025-06-17T11:00:38.68",
    "status": "PENDING",
    "model_name": "emo",
    "base_model": "emo",
    "base_capacity": 1,
    "capacity": 1,
    "ready_capacity": 0,
    "workspace_id": "llm-v71tlv3d***",
    "charge_type": "post_paid",
    "creator": "175805416***",
    "modifier": "175805416***"
  }
}
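To use the response in a script, you typically need deployed_model and status. If jq is not available, a sed one-liner can pull a string field out of the response. json_string_field is a hypothetical name, request.json in the usage comment is a hypothetical file holding one of the request bodies above, and the sketch assumes the key appears once with a plain string value, as deployed_model and status do in the example above; a real JSON parser is more robust.

```sh
# Hypothetical helper: read JSON on stdin and print the string value of the
# key named in $1. Assumes the key appears once with a plain string value.
json_string_field() {
  sed -n "s/.*\"$1\": *\"\([^\"]*\)\".*/\1/p"
}

# Usage sketch (sends a real, billable request):
#   response=$(curl -sS "https://dashscope-intl.aliyuncs.com/api/v1/deployments" \
#     --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
#     --header 'Content-Type: application/json' --data @request.json)
#   printf '%s\n' "$response" | json_string_field deployed_model
```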

Response parameters

Parameter

Type

Description

request_id

String

The ID of the request.

output

Object

Details of the deployment task.

deployed_model

String

A unique identifier for the deployed model. This ID is used for API operations, such as querying deployment details, modifying deployment rate limiting, deployment scaling, and deleting deployments, and is also passed as an SDK parameter when you invoke the model.

gmt_create

String

The creation time of the deployment task.

gmt_modified

String

The last modification time of the deployment task.

status

String

The status of the deployment task.

  • PENDING: The task is being created.

  • UPDATING: The task is being updated.

  • RUNNING: The deployment task is running, and the deployed model can process requests.

  • STOPPED: The deployment task is stopped and is not billed.

  • DELETING: The task is being deleted.

  • FAILED: The creation or update of the task failed.
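For scripts that wait on a deployment, it helps to group these statuses. In this sketch, deployment_phase is a hypothetical function that maps each documented status to one of ready, wait, stopped, or failed. How you fetch the current status (for example, with the operation that queries deployment details) is outside the scope of this sketch.

```sh
# Hypothetical helper: map a documented deployment status to a coarse phase.
# Returns 1 (and prints "unknown") for any status not listed on this page.
deployment_phase() {
  case "$1" in
    RUNNING)                   echo ready ;;
    PENDING|UPDATING|DELETING) echo wait ;;
    STOPPED)                   echo stopped ;;
    FAILED)                    echo failed ;;
    *)                         echo unknown; return 1 ;;
  esac
}
```

A polling loop would keep checking while deployment_phase prints wait and stop on any other result.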

model_name

String

The name of the model used in the deployment task.

base_model

String

The ID of the base model used in the deployment task.

base_capacity

Number

The minimum number of resource units required to run the base model.

capacity

Number

The number of resource units used by the deployment task.

ready_capacity

Number

The number of resource units that are ready to process requests immediately. Resource initialization speed or hardware status can limit this value.

workspace_id

String

The ID of the deployment task's workspace.

charge_type

String

The billing method for the deployment task.

post_paid: Post-paid.

creator

String

The UID of the user who created the deployment task.

modifier

String

The UID of the user who last modified the deployment task.

plan

String

The billing model for the deployment task. This parameter is not returned for some billing models.

Returned only for Model Unit deployments.

model_unit_spec

String

The model unit specification.

enable_thinking

Boolean

Indicates whether thinking mode is enabled. Available only for certain models.

max_context_length

Number

The maximum context length.

rpm_limit

Number

The maximum number of requests per minute (RPM).

tpm_limit

Number

The maximum number of tokens per minute (TPM).

Returned only for provisioned throughput (PTU) deployments.

ptu_capacity

Object

The provisioned throughput of the deployment. Returned only when "plan": "ptu" is set.

Example: "ptu_capacity": { "input_tpm": 10000, "output_tpm": 1000 }.

ptu_capacity.input_tpm

Number

The maximum number of input tokens per minute (TPM) for the deployed model. This feature is supported by all models.

ptu_capacity.output_tpm

Number

The maximum number of output tokens per minute (TPM) for the deployed model. This feature is supported by all models.

ptu_capacity.thinking_output_tpm

Number

The maximum number of thinking output tokens per minute (TPM) for the deployed model. This feature is only available for certain models.

Error response

Response example

{
    "request_id": "ca218d57-b91b-46b2-bd35-c41c6287bcf4",
    "message": "Model: qwen-plus-20230703-cx7f not found!",
    "code": "NotFound"
}

Response parameters

Parameter

Type

Description

request_id

String

The unique ID of the request.

code

String

The error code.

message

String

The error message.

The following errors can occur when a request fails:

Error code

Error message

Reason

NotFound

Model: xxx not found!

  • You are creating a deployment task with a model that does not exist.

  • You are querying, updating, or deleting a deployment task with a model that does not exist.

Conflict

Deployed model xxx already exists, please specify a suffix.

You are creating a deployment task with a suffix that is already in use.

InvalidParameter

Invalid capacity (xx), capacity must be larger than or equal to 0 and multiples of 1 and less than 1000!

You are creating or updating a deployment task with an invalid number of capacity units.
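A script can branch on the code field of an error response. The following sketch maps the three documented codes to a suggested next step. handle_deploy_error is a hypothetical name, the suggestions paraphrase the reasons in the table above, and the sed extraction assumes code appears once with a string value.

```sh
# Hypothetical helper: read an error response on stdin and print a suggested
# next step. Sets the global variable err_code as a side effect.
handle_deploy_error() {
  # Extract the value of "code" (assumes it appears once as a string).
  err_code=$(sed -n 's/.*"code": *"\([^"]*\)".*/\1/p')
  case "$err_code" in
    NotFound)         echo "check model_name against My Models" ;;
    Conflict)         echo "retry with a different suffix" ;;
    InvalidParameter) echo "check capacity and the other request parameters" ;;
    *)                echo "unrecognized error code: ${err_code:-none}"; return 1 ;;
  esac
}
```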