View usage for Alibaba Cloud Model Studio models.
For more information about free quotas, see New user free quota. Manage free quota usage on the Free Quota page.
Availability
Regions: Singapore only. For more information, see Region and deployment mode.
Models: All models in the Model list.
View free quota usage
In the console
Go to the Model Usage: Free Quota page and select a model type tab to view the free quota usage for each model.
In the models table, locate specific models using search, sort, or filter. Turn on or off Free Quota Only individually or in a batch to manage free quota settings.
Free Quota Only: When enabled, the service stops automatically once the free quota is exhausted and returns a 403 error with the code AllocationQuota.FreeTierOnly. This prevents charges beyond the free quota. To continue using the model and pay for actual usage, leave this feature off. The feature is available only while your account still has unconsumed free quota, and once turned on, it cannot be turned off until the free quota fully runs out. Free quota consumption is metered at minute-level granularity, and the data displayed in the console may be delayed. Refer to the free quota value shown in the console.
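With Free Quota Only enabled, client code should expect the 403 AllocationQuota.FreeTierOnly error once the free quota is used up. The sketch below shows one way to detect and classify it; the shape of the error body (a JSON object with a `code` field) is an assumption, so check it against the actual response from your endpoint.

```python
# Minimal sketch of handling the free-quota exhaustion error.
# Assumption: the error body is a JSON object with a "code" field;
# verify against the actual response returned by your endpoint.

def is_free_quota_exhausted(status_code: int, body: dict) -> bool:
    """True when a call failed because Free Quota Only is on and the
    free quota is used up (HTTP 403, code AllocationQuota.FreeTierOnly)."""
    return status_code == 403 and body.get("code") == "AllocationQuota.FreeTierOnly"

def handle_response(status_code: int, body: dict) -> str:
    if is_free_quota_exhausted(status_code, body):
        # Either turn off Free Quota Only in the console to switch to
        # pay-as-you-go, or stop sending requests for this model.
        return "free-quota-exhausted"
    if status_code != 200:
        return "other-error"
    return "ok"
```

Classifying this error separately lets an application distinguish "stop by design, quota protection worked" from genuine failures.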
View model usage
The console currently shows only free quota usage. To view detailed usage statistics, go to the Bill Details page and export the bill to view token usage.
Usage units
In Model Studio, usage is measured and billed in the following units:

| Type | Subcategory | Unit | Billing (invocation) |
| --- | --- | --- | --- |
| Large language model | | Token | Billed by the number of tokens for input and output. |
| Visual model | Image generation | Image | Billed by the number of images successfully generated. |
| Visual model | Video generation | Second | Billed by the number of video seconds successfully generated. |
| Speech model | | Second, character, or token | May be billed by audio duration (seconds), corresponding text characters, or tokens, depending on the model. |
| Omni-modal model | | Token | Text is billed by token count. Other modalities (audio, image, video) are billed by their corresponding token counts. |
| Embedding model | Text embedding | Token | Billed by the number of tokens in the input text. |
Going live
Recommendations for managing model usage:
- Control model output length: Limit the maximum length of content generated per invocation, and thus control costs, by reasonably limiting the thinking length and setting the `max_tokens` parameter.
- Select models based on task type: For simple jobs such as categorization and summarization, prioritize lower-cost, lightweight models (such as `qwen-turbo`) over powerful but more expensive models (such as `qwen-max`).
- Monitoring and alerting: Monitor usage trends, and set usage alerts to receive timely notifications of abnormal usage.
- Optimize prompts: Concise, clear prompts improve model output quality and reduce unnecessary input tokens.
- Use batch inference: For non-real-time, large-batch processing tasks, batch inference is often more cost-effective than real-time invocation.
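The output-length recommendation above can be applied directly when building a request. The sketch below assumes an OpenAI-compatible chat endpoint; the parameter name `max_tokens` and the model name `qwen-turbo` are illustrative and should be checked against the API reference for the model you use.

```python
# Sketch of capping output length per invocation, assuming an
# OpenAI-compatible chat request body. Parameter and model names
# are illustrative, not confirmed for every model.

def build_chat_request(prompt: str, model: str = "qwen-turbo",
                       max_tokens: int = 256) -> dict:
    """Build a request body that limits the tokens generated per call."""
    return {
        "model": model,                                   # lightweight model for simple jobs
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,                         # hard cap on output length
    }

payload = build_chat_request("Summarize this ticket in one sentence.")
```

Setting a cap per call bounds the worst-case cost of any single invocation, which is especially useful before usage alerts have had time to fire.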
Glossary
| Noun | Explanation |
| --- | --- |
| Token | Large language models process input and output in tokens. A token can be a word, a phrase, a punctuation mark, or a single character, depending on the tokenizer. Based on experience, 1 Chinese character corresponds to approximately 1.5-2 tokens, 1 English letter to approximately 0.25 tokens, and 1 English word to approximately 1.3 tokens. Each model has a maximum input and output token count (see Model list). Exceeding this limit causes the request to fail. |
| Real-time | Refers to all direct and indirect online invocations of a model that return results synchronously. |
| Batch | Large-scale data processing performed offline for scenarios that do not require real-time responses, using the OpenAI compatible - Batch (file input) API. |
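The rule-of-thumb ratios in the glossary can be turned into a quick pre-flight estimator for budgeting input size. The sketch below uses the midpoint of the stated Chinese-character ratio (1.75 tokens per character) and the stated 1.3 tokens per English word; these are rough heuristics, and exact counts require the model's own tokenizer.

```python
# Rough token estimator built from the glossary's rule-of-thumb ratios:
# ~1.75 tokens per Chinese character (midpoint of 1.5-2) and
# ~1.3 tokens per English word. A heuristic only; for exact counts,
# use the model's tokenizer.

def estimate_tokens(text: str) -> int:
    # Count characters in the CJK Unified Ideographs block.
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    # Treat whitespace-separated runs of the remaining characters as words.
    non_cjk = "".join(" " if "\u4e00" <= ch <= "\u9fff" else ch for ch in text)
    words = len(non_cjk.split())
    return round(cjk * 1.75 + words * 1.3)
```

An estimate like this is useful for choosing a `max_tokens` budget or checking that a prompt fits a model's input limit before sending it.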
FAQ
Q: How do I view the total token usage for my Alibaba Cloud account?
A: Use your Alibaba Cloud account to access the Bill Details page and export the bill.