
Alibaba Cloud Model Studio: Model usage

Last Updated: Mar 24, 2026

View usage for Alibaba Cloud Model Studio models.

For more information about free quotas, see New user free quota. Manage free quota usage on the Free Quota page.

Availability

View free quota usage

In the console

  1. Go to the Model Usage: Free Quota page and select a model type tab to view the free quota usage for each model.

  2. In the models table, locate specific models using search, sort, or filter. Turn Free Quota Only on or off for individual models or in batches to manage free quota settings.

Note

Free Quota Only: When enabled, the service stops automatically once the free quota is exhausted and returns a 403 error with the code AllocationQuota.FreeTierOnly. This prevents charges beyond the free quota. To continue using the service on a pay-as-you-go basis after the free quota is used up, keep this feature off.

  • This feature is available only when your account has an unconsumed free quota.

  • Once turned on, it cannot be turned off until the free quota is fully consumed.

  • Free quota consumption is metered at minute-level granularity, so the data displayed in the console may be delayed. The free quota value shown in the console prevails.
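When Free Quota Only is on, client code can detect quota exhaustion by checking the error code in the 403 response. A minimal sketch, assuming the error body is JSON with a top-level code field (the exact response shape may vary by API):

```python
def is_free_quota_exhausted(status_code: int, error_body: dict) -> bool:
    """Return True when a response indicates the free quota is used up.

    Assumes the error body carries the error code in a top-level
    "code" field, e.g. {"code": "AllocationQuota.FreeTierOnly", ...}.
    """
    return (status_code == 403
            and error_body.get("code") == "AllocationQuota.FreeTierOnly")

# A 403 with this code means Free Quota Only has stopped the service.
print(is_free_quota_exhausted(403, {"code": "AllocationQuota.FreeTierOnly"}))  # → True
print(is_free_quota_exhausted(403, {"code": "InvalidApiKey"}))                 # → False
```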

View model usage

The console currently displays only free quota usage. For detailed usage statistics, go to the Bill Details page and export the bill to view token usage.

Usage units

In Model Studio, usage for different model types is measured and billed as follows:

| Type | Subcategory | Unit | Billing (per invocation) |
| --- | --- | --- | --- |
| Large language model | Text generation, Deep thinking, Visual understanding | Token | Billed by the number of input and output tokens. |
| Visual model | Image generation | Image | Billed by the number of images successfully generated. |
| Visual model | Video generation | Second | Billed by the number of video seconds successfully generated. |
| Speech model | Speech synthesis, Real-time speech synthesis, Audio file recognition, Real-time speech recognition, Audio and video translation | Second, character, or token | May be billed by audio duration (seconds), corresponding text characters, or tokens, depending on the model. |
| Omni-modal model | Omni-modal, Real-time multimodal | Token | Text is billed by token count; other modalities (audio, image, video) are billed by their corresponding token counts. |
| Embedding model | Multimodal embedding, Text embedding | Token | Billed by the number of tokens in the input text. |

Going live

Recommendations for managing model usage:

  • Control model output length: Cap the content generated per invocation, and therefore the cost per request, by limiting the thinking length and setting the max_tokens parameter.

  • Select models based on task type: For simple tasks such as classification and summarization, prefer lower-cost, lightweight models (such as qwen-turbo) over more powerful but more expensive models (such as qwen-max).

  • Monitoring and alerting: Monitor usage trends. Set usage alerts to receive timely notifications for abnormal usage.

  • Optimize prompts: Concise and clear prompts improve model output quality and reduce unnecessary input tokens.

  • Use batch inference: For non-real-time, large-batch processing tasks, batch inference is often more cost-effective than real-time invocation.
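To illustrate the first recommendation, the worst-case cost of a single invocation can be bounded before the request is sent, because the output can never exceed max_tokens. A minimal sketch; the prices and token counts below are placeholders, not actual Model Studio rates:

```python
def max_invocation_cost(input_tokens: int, max_tokens: int,
                        input_price_per_1k: float,
                        output_price_per_1k: float) -> float:
    """Upper bound on the cost of one call: output is capped at
    max_tokens, so the bill is bounded even if the model runs long."""
    input_cost = input_tokens / 1000 * input_price_per_1k
    worst_output_cost = max_tokens / 1000 * output_price_per_1k
    return input_cost + worst_output_cost

# Hypothetical prices: 0.3 per 1K input tokens, 0.6 per 1K output tokens.
cap = max_invocation_cost(input_tokens=500, max_tokens=1000,
                          input_price_per_1k=0.3, output_price_per_1k=0.6)
print(cap)  # → 0.75
```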

Glossary

Term

Explanation

Token

Large language models process input and output in tokens. A token can be:

  • Single character: such as A, I

  • Complete word: such as large, Model

  • Part of a long word: A long word is often split into multiple tokens. This splitting process is called tokenization.

As a rule of thumb, 1 Chinese character corresponds to approximately 1.5-2 tokens, 1 English letter to approximately 0.25 tokens, and 1 English word to approximately 1.3 tokens. For example:

  • Alibaba Cloud Model Studio: approximately 4-5 tokens

  • Hello World: approximately 2 tokens

Each model has a maximum input and output token count (see Model list). Exceeding this limit causes failure.
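The rule-of-thumb ratios above can be turned into a rough pre-flight estimate of token usage. A minimal sketch; the actual tokenizer may split text differently, so treat this as an approximation only:

```python
import re

def estimate_tokens(text: str) -> tuple[float, float]:
    """Rough (low, high) token estimate from the rule-of-thumb ratios:
    1 Chinese character ~ 1.5-2 tokens, 1 English word ~ 1.3 tokens."""
    cjk_chars = len(re.findall(r"[\u4e00-\u9fff]", text))
    english_words = len(re.findall(r"[A-Za-z]+", text))
    low = cjk_chars * 1.5 + english_words * 1.3
    high = cjk_chars * 2.0 + english_words * 1.3
    return low, high

# "Hello World" is two English words, so roughly 2-3 tokens.
print(estimate_tokens("Hello World"))  # → (2.6, 2.6)
```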

Real-time

Refers to all direct and indirect online invocations of a model, where each request is processed as it arrives (as opposed to batch processing).

Batches

Large-scale data processing performed offline for scenarios that do not require real-time responses, using the OpenAI compatible - Batch (file input) API.
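Each line of a batch input file is one JSON request. A minimal sketch of building one such line, assuming the OpenAI-compatible batch file format (custom_id/method/url/body); the model name and endpoint path here are illustrative placeholders to confirm against the Batch API documentation:

```python
import json

def batch_request_line(custom_id: str, model: str, prompt: str) -> str:
    """Build one JSONL line for an OpenAI-compatible batch input file."""
    request = {
        "custom_id": custom_id,          # your own ID, echoed back in the output file
        "method": "POST",
        "url": "/v1/chat/completions",   # endpoint path; confirm against the Batch API docs
        "body": {
            "model": model,              # e.g. "qwen-turbo"; placeholder here
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    return json.dumps(request, ensure_ascii=False)

# One line per request; write many of these into a .jsonl file for upload.
line = batch_request_line("req-1", "qwen-turbo", "Summarize: ...")
print(line)
```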

FAQ

Q: How do I view the total token usage for my Alibaba Cloud account?

A: Use your Alibaba Cloud account to access the Bill Details page and export the bill.