View usage for Alibaba Cloud Model Studio models.
For more information about free quotas, see New user free quota. Manage free quota usage on the Free Quota page.
Availability
Regions: Singapore only. For more information, see Region and deployment mode.
Models: All models in the Model list.
View free quota usage
In the console
Go to the Model Usage: Free Quota page and select a model type tab to view the free quota usage for each model.
In the models table, locate specific models using search, sort, or filter. Turn on or off Free Quota Only individually or in a batch to manage free quota settings.
Free Quota Only: When enabled, the service stops automatically once the free quota is exhausted and returns a 403 error with the code AllocationQuota.FreeTierOnly. This prevents charges beyond the free quota. To continue using the model and pay for actual usage, leave this feature off. The feature is available only while your account still has unconsumed free quota, and once turned on, it cannot be turned off until the free quota fully runs out. Free quota consumption is metered at minute-level granularity, and the data displayed in the console may be delayed. Refer to the free quota value shown in the console.
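With Free Quota Only enabled, client code should expect the 403 AllocationQuota.FreeTierOnly error once the free quota is used up. The sketch below shows one way to detect and classify it; the shape of the error body (a JSON object with a `code` field) is an assumption, so check it against the actual response from your endpoint.

```python
# Minimal sketch of handling the free-quota exhaustion error.
# Assumption: the error body is a JSON object with a "code" field;
# verify against the actual response returned by your endpoint.

def is_free_quota_exhausted(status_code: int, body: dict) -> bool:
    """True when a call failed because Free Quota Only is on and the
    free quota is used up (HTTP 403, code AllocationQuota.FreeTierOnly)."""
    return status_code == 403 and body.get("code") == "AllocationQuota.FreeTierOnly"

def handle_response(status_code: int, body: dict) -> str:
    if is_free_quota_exhausted(status_code, body):
        # Either turn off Free Quota Only in the console to switch to
        # pay-as-you-go, or stop sending requests for this model.
        return "free-quota-exhausted"
    if status_code != 200:
        return "other-error"
    return "ok"
```

Classifying this error separately lets an application distinguish "stop by design, quota protection worked" from genuine failures.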
View model usage
The console currently shows only free quota usage. To view detailed usage statistics, go to the Bill Details page and export the bill to view token usage.
Usage units
In Model Studio, usage is measured and billed in the following units:

| Type | Subcategory | Unit | Billing (invocation) |
| --- | --- | --- | --- |
| Large language model | | Token | Billed by the number of tokens for input and output. |
| Visual model | Image generation | Image | Billed by the number of images successfully generated. |
| Visual model | Video generation | Second | Billed by the number of video seconds successfully generated. |
| Speech model | | Second, character, or token | May be billed by audio duration (seconds), corresponding text characters, or tokens, depending on the model. |
| Omni-modal model | | Token | Text is billed by token count. Other modalities (audio, image, video) are billed by their corresponding token counts. |
| Embedding model | Text embedding | Token | Billed by the number of tokens in the input text. |
Going live
Recommendations for managing model usage:
- Control model output length: Limit the maximum length of content generated per invocation, and thus control costs, by reasonably limiting the thinking length and setting the `max_tokens` parameter.
- Select models based on task type: For simple jobs such as categorization and summarization, prioritize lower-cost, lightweight models (such as `qwen-turbo`) over powerful but more expensive models (such as `qwen-max`).
- Monitoring and alerting: Monitor usage trends, and set usage alerts to receive timely notifications of abnormal usage.
- Optimize prompts: Concise, clear prompts improve model output quality and reduce unnecessary input tokens.
- Use batch inference: For non-real-time, large-batch processing tasks, batch inference is often more cost-effective than real-time invocation.
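The output-length recommendation above can be applied directly when building a request. The sketch below assumes an OpenAI-compatible chat endpoint; the parameter name `max_tokens` and the model name `qwen-turbo` are illustrative and should be checked against the API reference for the model you use.

```python
# Sketch of capping output length per invocation, assuming an
# OpenAI-compatible chat request body. Parameter and model names
# are illustrative, not confirmed for every model.

def build_chat_request(prompt: str, model: str = "qwen-turbo",
                       max_tokens: int = 256) -> dict:
    """Build a request body that limits the tokens generated per call."""
    return {
        "model": model,                                   # lightweight model for simple jobs
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,                         # hard cap on output length
    }

payload = build_chat_request("Summarize this ticket in one sentence.")
```

Setting a cap per call bounds the worst-case cost of any single invocation, which is especially useful before usage alerts have had time to fire.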
Glossary
| Noun | Explanation |
| --- | --- |
| Token | Large language models process input and output in tokens. A token can be a word, a phrase, a punctuation mark, or a single character, depending on the tokenizer. Based on experience, 1 Chinese character corresponds to approximately 1.5-2 tokens, 1 English letter to approximately 0.25 tokens, and 1 English word to approximately 1.3 tokens. Each model has a maximum input and output token count (see Model list). Exceeding this limit causes the request to fail. |
| Real-time | Refers to all direct and indirect online invocations of a model that return results synchronously. |
| Batch | Large-scale data processing performed offline for scenarios that do not require real-time responses, using the OpenAI compatible - Batch (file input) API. |
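The rule-of-thumb ratios in the glossary can be turned into a quick pre-flight estimator for budgeting input size. The sketch below uses the midpoint of the stated Chinese-character ratio (1.75 tokens per character) and the stated 1.3 tokens per English word; these are rough heuristics, and exact counts require the model's own tokenizer.

```python
# Rough token estimator built from the glossary's rule-of-thumb ratios:
# ~1.75 tokens per Chinese character (midpoint of 1.5-2) and
# ~1.3 tokens per English word. A heuristic only; for exact counts,
# use the model's tokenizer.

def estimate_tokens(text: str) -> int:
    # Count characters in the CJK Unified Ideographs block.
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    # Treat whitespace-separated runs of the remaining characters as words.
    non_cjk = "".join(" " if "\u4e00" <= ch <= "\u9fff" else ch for ch in text)
    words = len(non_cjk.split())
    return round(cjk * 1.75 + words * 1.3)
```

An estimate like this is useful for choosing a `max_tokens` budget or checking that a prompt fits a model's input limit before sending it.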
FAQ
Q: How do I view the total token usage for my Alibaba Cloud account?
A: Use your Alibaba Cloud account to access the Bill Details page and export the bill.