Pricing overview | Activating Alibaba Cloud Model Studio does not incur any fees. Fees for model inference (calling) are generated when you call models to perform tasks such as text generation, image generation, and speech synthesis.
View bills: Go to the Bill Details and Cost Analysis pages.
View statistics: Go to the Model Observation (Singapore or Beijing) page.
Billable items | Model inference (calling)
Model inference (calling) | Billing overview & free quota
For model call prices, see Models. For limits such as requests per minute (RPM) and tokens per minute (TPM), see Rate limits.
Note: A free quota is available only in the Singapore region. For more information about how to claim a free quota and view the remaining free quota, see New user free quota.
On the Model Observation (Singapore or Beijing) page, you can view the number of calls and tokens consumed for a specific model.

Subscription (savings plan)
You can purchase one or more savings plans to offset inference fees incurred after your free quota is used up. After a savings plan is exhausted, fees are deducted from your account balance.
Large language model
Wan model
Batch discounts (Singapore region only)
The Batch Inference (Batch API) service asynchronously processes large datasets at 50% of the cost of real-time calls. You can submit files through the console or the API to create batch tasks. The system processes data during off-peak hours and returns the results when the task is complete or the maximum wait time is reached.
Supported models
Text generation models: qwen-max, qwen-plus, qwen-turbo
Limits
Batch inference does not support services or discounts such as subscription (savings plan), free quota, or Context Cache.

Context cache discounts
Includes implicit cache and explicit cache:
FAQ | General

How to pay or top up my account?
Model calling fees are automatically deducted. Bills are generated hourly. For more information, see Introduction to payment methods.
Subscription method: For model inference (calling), purchase an LLM inference savings plan.

How to renew my service?
After March 15, 2024, Model Studio upgraded its commercial services. All subscription services were changed to pay-as-you-go services. Therefore, you do not need to manually renew your services. The pay-as-you-go billing method is used automatically.

How to stop billing?
You can set a monthly spending alert. Set the alert threshold to a low value. Alibaba Cloud will notify you when unexpected charges occur to help you avoid further losses.

How to view the number of calls and tokens consumed?

How are tokens calculated?
Tokens are the basic units that a model uses to represent text. You can think of them roughly as characters or words.
Different LLMs may split text into tokens in different ways. You can use an SDK to view, on your local machine, how a Qwen model splits text into tokens.
View the token data chunked by a Qwen model:
The local tokenizer helps estimate the number of tokens in your text. However, the result is for reference only and may not match the server-side count exactly. For more information about the Qwen tokenizer, see the tokenizer reference.

What to do if a model call fails?
Refer to the Error messages document for the corresponding solution.

Billing rules

Why does my free quota not decrease after I call a model? (Singapore only)
The free quota data is updated hourly. During peak hours, there may be a delay of up to one hour. Therefore, check the remaining quota one hour after the model call is complete.

How are tokens that exceed the free quota billed? (Singapore only)
You are billed based on the actual number of tokens consumed. Because the unit price (input or output cost) is per 1 million tokens, the formula is:
Fee = Actual number of tokens consumed / 1,000,000 × Unit price
For example, the input cost of qwen-vl-max is $0.80 per 1 million tokens, and the remaining free quota is 50,000 tokens. In a call where the input is 50,400 tokens, the fee for the 400 tokens that exceed the free quota is 400 / 1,000,000 × $0.80 = $0.00032.

How are multi-turn conversations billed?
In a multi-turn conversation, the input and output from historical conversations are billed as input tokens for the new turn.

Are model applications charged?
You are not charged for creating an application. However, when you call the application, you are charged a model calling fee based on the model that is called.

Why is my LLM inference savings plan not used for deduction?
If the free quota is not used up, no bill is generated and no fee is incurred. In this case, the savings plan is not used for deduction. The savings plan is used for deduction after the free quota is used up and a bill is generated.
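The overage formula and the multi-turn rule from the billing rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not an official billing implementation: the function names are hypothetical, the multi-turn helper assumes the full history is resent on every turn and ignores any system prompt or template overhead, and actual bills may be rounded differently.

```python
from decimal import Decimal

TOKENS_PER_PRICING_UNIT = 1_000_000  # unit prices are quoted per 1 million tokens


def overage_fee(tokens_used: int, free_quota_left: int, unit_price: Decimal) -> Decimal:
    """Fee for the tokens of one call that exceed the remaining free quota.

    Hypothetical helper: Fee = billable tokens / 1,000,000 x unit price.
    """
    billable_tokens = max(tokens_used - free_quota_left, 0)
    return Decimal(billable_tokens) / TOKENS_PER_PRICING_UNIT * unit_price


def billed_input_per_turn(turns: list[tuple[int, int]]) -> list[int]:
    """Billed input tokens for each turn of a multi-turn conversation.

    Each turn is (new_user_tokens, output_tokens). Earlier inputs and
    outputs are counted again as input for every later turn.
    """
    billed = []
    history_tokens = 0
    for new_user_tokens, output_tokens in turns:
        input_tokens = history_tokens + new_user_tokens
        billed.append(input_tokens)
        history_tokens = input_tokens + output_tokens
    return billed


# The qwen-vl-max example from the text: 50,400 input tokens, 50,000 still free.
fee = overage_fee(50_400, 50_000, Decimal("0.80"))
print(f"Overage fee: ${fee}")

# Two turns: 100 tokens in and 50 out, then a 20-token follow-up question.
print(billed_input_per_turn([(100, 50), (20, 30)]))
```

`Decimal` is used instead of `float` so that very small per-call fees are not distorted by binary rounding. In the two-turn example, the second turn is billed for 170 input tokens: the 150 tokens of history plus the 20 new ones.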
Overdue payments

What are the impacts of an overdue payment?
If your account has an overdue payment, you cannot make model calls even if you have a free quota (Singapore only) or resource plans. You can go to the Recharge page to top up your account.

API call error: How to quickly resolve issues with service activation or overdue payments?
1. Service not activated
Use your Alibaba Cloud account to go to the Model Studio console (Singapore or Beijing) and activate the model service of Model Studio.
2. Insufficient account balance
3. Set a spending alert to prevent repeated errors
Bills

After model inference, why can't I find the relevant bills on the Bill Details page?
Possible reasons are:
How to view the costs of all Model Studio services?
On the Cost Analysis page, set Cost Type to Pretax Amount, set Time Unit to Month, select a time range, and set Product Name to Alibaba Cloud Model Studio. You can then view the costs of Model Studio within the selected time range.

How to view the costs of the model inference service?
On the Cost Analysis page, set Cost Type to Pretax Amount, set Time Granularity to Month, select a time range, and set Product Detail to Model Studio Foundation Model Inference. You can then view the total cost of model inference within the selected time range.

How to view the inference cost of a specific model?
Take qwen-max as an example. On the Bill Details page, select a Billing Month. Set Commodity Name to Model Studio Foundation Model Inference and click Search. In the Instance ID column, find all instances related to qwen-max. Sum the pretax amounts for these instances to get the total inference fee for the qwen-max model in the selected billing cycle.
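If you export the search results, the manual summation described above could be scripted along these lines. This is only a sketch: the column names `Instance ID` and `Pretax Amount`, the sample rows, and the instance ID layout are all made up for illustration, so adjust them to match your actual export.

```python
import csv
import io
from decimal import Decimal

# Made-up sample export; real bills will have different columns and instance IDs.
SAMPLE_BILL = """Instance ID,Pretax Amount
example-key;example-workspace;qwen-max;input,0.12
example-key;example-workspace;qwen-max;output,0.18
example-key;example-workspace;qwen-plus;input,0.05
"""


def model_inference_total(rows, model_name,
                          id_column="Instance ID",
                          amount_column="Pretax Amount"):
    """Sum the pretax amounts of all rows whose instance ID mentions the model.

    Hypothetical helper. It uses a plain substring match, so a name such as
    "qwen-max" also matches longer model names that start with it.
    """
    total = Decimal("0")
    for row in rows:
        if model_name in row[id_column]:
            total += Decimal(row[amount_column])
    return total


rows = list(csv.DictReader(io.StringIO(SAMPLE_BILL)))
print(model_inference_total(rows, "qwen-max"))
```

To read a downloaded file instead of the inline sample, pass `csv.DictReader(open(path, newline=""))` as `rows`, where `path` points to your export.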
How to export and view the number of consumed tokens in a detailed bill?
On the Bill Details page, set Statistics Item to Billable Item and export the bill. You can view the token usage in the bill.
How to reconcile detailed bills for models?
Bills for model inference, deployment, and training that are generated after September 7, 2024 can be reconciled based on the ApiKeyID, workspace ID, model name, input/output type, calling channel, and tags of instances.
On the Bill Details page, select a Billing Month. Set Commodity Name to Model Studio Foundation Model Inference and click Search. Download the search results to your local machine and reconcile the bills based on the content in the Instance ID column.
A complete Asset/Resource Instance ID, such as
A complete Instance ID, such as
A complete instance tag, such as
Go to the Model Studio API Key Management page and confirm the API key that corresponds to the ApiKeyID to reconcile bills based on the API key.
Go to the Workspace Management (Singapore or Beijing) page and confirm the workspace that corresponds to the workspace ID to reconcile bills based on the workspace.
Calling channels include app, bmp, and assistant-api:
app indicates that the model is called through an application.
bmp indicates that the model is called through the Playground (Singapore or Beijing).
assistant-api indicates that the model is called through the assistant API.
How are pay-as-you-go bills settled?
Pay-as-you-go cloud resource bills are not settled in real time. Instead, the system first freezes the consumed but unsettled amount from the account's available credit. At the beginning of the next month, after the final monthly bill is issued, the amount for the previous month is deducted.

Cost control

How to set an alert for high spending?
You can set a monthly spending alert in the Expenses and Costs center.
How to limit the usage of model calls?








