Activate Model Studio in the Singapore region to receive free quota for each model.
Free quota is only available for models in the Singapore region. Other regions do not offer free quota.
Rules
Validity period
Free quota is valid for 30 to 90 days from activation or model approval. After expiration or depletion, continued inference incurs charges.
Starting from 3:00 UTC on September 8, 2025, the validity period for first-time activations is adjusted to 90 days. Users who activated the service before this date are not affected. For more information, see Validity period change for new user free quota.
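The cutoff rule above can be sketched as a small date calculation. This is only an illustration: the function name `quota_expiry` is hypothetical, and it assumes the 90 days are counted from the exact activation timestamp, which the text does not confirm. Treat the Free Quota tab in the console as the authoritative source for the actual validity period.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Cutoff stated in the text: 3:00 UTC on September 8, 2025.
CUTOFF = datetime(2025, 9, 8, 3, 0, tzinfo=timezone.utc)
NEW_VALIDITY = timedelta(days=90)


def quota_expiry(activated_at: datetime) -> Optional[datetime]:
    """Estimate when free quota expires for a first-time activation.

    Returns None for activations before the cutoff: those keep their
    original validity period (30 to 90 days), which must be read from
    the console. Assumption: validity is counted from the activation
    timestamp itself.
    """
    if activated_at.tzinfo is None:
        raise ValueError("activated_at must be timezone-aware")
    if activated_at < CUTOFF:
        return None
    return activated_at + NEW_VALIDITY


if __name__ == "__main__":
    print(quota_expiry(datetime(2025, 9, 10, tzinfo=timezone.utc)))
    print(quota_expiry(datetime(2025, 9, 1, tzinfo=timezone.utc)))
```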
Scope
Free quota only offsets real-time inference costs. It does not offset fees for:
- Custom models (fine-tuned and deployed models)
Notes
Free quota is shared across the account and all RAM users.
Example: Total quota for qwen-max is 1,000,000 tokens. If the account uses 100,000 tokens and a RAM user uses 200,000 tokens, the remaining quota is 700,000 tokens.
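The shared-quota arithmetic in the example can be written out as a short sketch. The function name `remaining_quota` and the identity labels are hypothetical; the point is only that usage from the account and every RAM user is subtracted from a single per-model pool.

```python
from typing import Mapping


def remaining_quota(total: int, usage_by_identity: Mapping[str, int]) -> int:
    """Return tokens left in a model's shared free quota.

    usage_by_identity maps the account or a RAM user (labels are
    illustrative) to the tokens that identity has consumed. All
    identities draw from the same pool, so usage is simply summed.
    """
    used = sum(usage_by_identity.values())
    # Usage beyond the quota is billed rather than shown as negative quota.
    return max(total - used, 0)


if __name__ == "__main__":
    usage = {"account": 100_000, "ram_user": 200_000}
    print(remaining_quota(1_000_000, usage))
```

With the figures from the example (1,000,000 total, 100,000 and 200,000 used), the result is 700,000 tokens.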
Get your free quota
Go to the Model Studio console (Singapore region) and accept the Terms of Service to activate the service and receive your free quota.
If the Terms of Service prompt does not appear, you have already activated the service and received your free quota.
View remaining quota
View your remaining free quota using either of the following methods.
Method 1: Usage page
On the Model Usage page, click the Free Quota tab to view remaining quota and validity period for all models.
Method 2: Models page
After you activate Model Studio, go to the Models page (Singapore) in the console. Click the target model to view the remaining quota on its product page.
For example, 24,098/1,000,000 means that 24,098 tokens remain out of a total of 1,000,000.

Use your quota
Real-time calls (Singapore region) automatically use free quota. For more information, see Get started with Model Studio.
Prevent overage charges
By default, calls continue after the quota is exhausted and incur charges. Enable Free Quota Only to block calls once the quota is depleted; blocked calls return the error AllocationQuota.FreeTierOnly.
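Client code may want to tell this error apart from other failures, for example to stop retrying. The following is a minimal sketch, not an official client. It assumes the error body has already been parsed from JSON into a dictionary and that the code appears either as a top-level `code` field or under `error.code`; both shapes are assumptions, so check the actual response your calls return. The helper names are hypothetical.

```python
from collections.abc import Mapping
from typing import Any, Optional

# Error code named in the text for calls blocked by Free Quota Only.
FREE_TIER_ONLY = "AllocationQuota.FreeTierOnly"


def extract_error_code(payload: Mapping[str, Any]) -> Optional[str]:
    """Pull an error code out of a parsed error body.

    Assumed shapes (not confirmed by the text):
      {"code": "...", "message": "..."}
      {"error": {"code": "...", "message": "..."}}
    """
    nested = payload.get("error")
    source = nested if isinstance(nested, Mapping) else payload
    code = source.get("code")
    return code if isinstance(code, str) else None


def is_free_quota_block(payload: Mapping[str, Any]) -> bool:
    """True if the call was rejected because Free Quota Only is enabled
    and the free quota is depleted."""
    return extract_error_code(payload) == FREE_TIER_ONLY


if __name__ == "__main__":
    body = {"code": FREE_TIER_ONLY, "message": "example message"}
    if is_free_quota_block(body):
        print("Free quota depleted; retrying will not help.")
```

Retrying a call rejected this way is unlikely to succeed, so treating it as a terminal condition rather than a transient failure is usually the safer choice.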
How to enable
Method 1: Usage page
For a single model:
- On the Model Usage page in the console, click the Free Quota tab.
- Find the target model in the list and turn on the Free Quota Only switch in the Actions column. (This switch is only available for models that still have a free quota.)
In batch:
- On the Model Usage page in the console, click the Free Quota tab.
- Click Free Quota Only Batch Operation and select Batch Enable from the drop-down menu.
- Check the target models and click Batch Enable. To enable this feature for all eligible models that do not have it enabled, click Enable for All Models.
- In the confirmation dialog box, click Enable Free Quota Only.

Method 2: Models page
For example, to enable the feature for Qwen3-Coder-Plus, go to the Qwen3-Coder-Plus product page (Singapore region) and turn on the Free Quota Only switch.

If the switch isn't displayed, the quota is exhausted, expired, or the model doesn't offer free quota.
How to disable
Free Quota Only is disabled by default. After you enable it, you can turn it off only once the console shows the quota as exhausted.
The quota shown in the console is updated hourly, not in real time.
FAQ
Are there notifications when the free quota runs out?
No, there is no alert when quota runs out.
What happens when the free quota is used up?
If Free Quota Only is not enabled, calls continue and excess tokens are billed according to Models pricing. Charges are deducted from your account balance and may put the account into overdue status.
Overdue status blocks all calls, even with remaining quota.
Before calling models, check quota and set up budget management.
Why am I being charged?
Possible reasons:
- You called a model that has no remaining free quota. Quotas are per model: for example, qwen-max and qwen-max-latest have separate quotas.
- Free quota doesn't cover OpenAI compatible - Batch (file input) fees.
- The console updates hourly, so the displayed quota may lag behind actual usage. Check again later for the current status.
To confirm billing, see How can I check which model incurred charges? and How can I view model call records?.
How can I check which model incurred charges?
One hour after calling a model, on the Bill Details page, select Billing Month, set Commodity Name to Model Studio Foundation Model Inference, and click Search. View charged models in the Instance ID column.

How can I view model call records?
One hour after you call a model, go to the Monitoring (Singapore or Beijing) page. Set the query conditions, such as the time range and workspace. Then, in the Models area, find the target model and click Monitor in the Actions column to view the model's call statistics. For more information, see the Monitoring document.
Data is updated hourly. During peak periods, the delay can be on the order of hours.

How to avoid unexpected charges?
After the quota is exhausted, charges are deducted from your balance. To reduce unexpected charges:
- Go to the API-Key (Singapore) or API-Key (Beijing) page and delete all API keys to prevent further calls and charges.
- Set a spending limit alert to receive email notifications when monthly spending exceeds the threshold.
Why did my call fail even though I have remaining quota?
An overdue balance blocks all calls, even with remaining quota.
Why can't I see my free quota and its validity period?
If the quota column shows No free quota or the Free Quota area is missing, the quota has expired.
The Beijing region does not offer a free quota.
