When you use the model service in Alibaba Cloud Model Studio, choose the correct region and deployment mode. They affect the service's response speed, cost, available models, and default rate limits.
Region: Determines the access point (endpoint/base URL) of your model service and the storage location for static data, such as prompts and model outputs.
Deployment mode: Determines where model inference is computed.
Currently, regions and deployment modes are pre-set by the system and cannot be freely combined.
Choose a region
Consider the following:
Proximity: Choose the region closest to your primary callers. This usually reduces network latency and improves the model's response speed.
Available platform features: Model Studio offers different features in different regions. See the following table.
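Category | Feature | China (Beijing) | US | Singapore |
Usage | Real-time inference | | | |
Usage | Batch inference | | | |
Usage | Playground | | | |
Management | Model monitoring | | | |
Management | Model alerting | | | |
Management | Transmission security | | | |
Management | Permission management | | | |
Optimization | Model fine-tuning | | | |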
Supported regions
Name | ID | Data storage location |
Singapore | | Singapore |
US (Virginia) | | Virginia |
China (Beijing) | | Beijing |
Choose a deployment mode
Deployment modes differ in the models they support, how model calls are billed, and their rate limits. Consider the following recommendations:
Global: Choose this mode to draw on the global pool of compute resources. This improves model availability and raises the default rate limits.
International: Choose this mode to use compute resources outside Mainland China.
US: Choose this mode if you want data processing and inference to occur only within the United States.
Mainland China: Choose this mode to use compute resources and process data within Mainland China.
Supported deployment modes
Deployment mode | Data storage region | Inference compute scope | Cross-border computing |
Global | US (Virginia) | Global | Yes (You are responsible for ensuring the legality of cross-border data processing.) |
International | Singapore | Global (excluding Mainland China) | Yes (You are responsible for ensuring the legality of cross-border data processing.) |
US | US (Virginia) | United States only | No |
Mainland China | China (Beijing) | Mainland China only | No |
In Global and International modes, the frontend endpoint in your selected region receives cross-region inference requests. Static data, such as prompts and model outputs, is processed only temporarily during inference and is not persisted in the region of the compute node. All data is encrypted during transmission.
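For compliance checks, it can help to keep the table above as data. The following is a minimal sketch, not part of any Model Studio SDK: the `DeploymentMode` class and both helper functions are hypothetical names introduced here for illustration, and the values are transcribed from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentMode:
    """One row of the supported deployment modes table (hypothetical type)."""
    name: str
    data_storage_region: str
    inference_scope: str
    cross_border: bool


# Rows transcribed from the table above; the mode names are the short
# forms used in the recommendations list.
DEPLOYMENT_MODES = [
    DeploymentMode("Global", "US (Virginia)", "Global", True),
    DeploymentMode("International", "Singapore",
                   "Global (excluding Mainland China)", True),
    DeploymentMode("US", "US (Virginia)", "United States only", False),
    DeploymentMode("Mainland China", "China (Beijing)",
                   "Mainland China only", False),
]


def modes_without_cross_border() -> list[str]:
    """Return the names of modes that involve no cross-border computing."""
    return [m.name for m in DEPLOYMENT_MODES if not m.cross_border]


def modes_storing_in(region: str) -> list[str]:
    """Return the names of modes whose static data is stored in `region`."""
    return [m.name for m in DEPLOYMENT_MODES if m.data_storage_region == region]
```

For example, `modes_without_cross_border()` returns the US and Mainland China modes, the two modes for which the table lists no cross-border computing.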
Switch regions and deployment modes
Go to the Model Studio console and click the region icon in the upper-right corner of the page.
Select an option based on your situation:
To use Mainland China mode: Select the China (Beijing) region.
To use US mode or Global mode: Select the US region. The deployment mode then depends on the model name you use, which you can check in the model list.
Model name with a "-us" suffix (uses US mode): For example, qwen-flash-us and qwen-plus-us.
Model name without a "-us" suffix (uses Global mode): For example, qwen-flash and qwen-plus.
To use International mode: Select the Singapore region.
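The selection rules above can be sketched as a small lookup. This is an illustrative helper only, assuming the region is passed by its display name: `resolve_deployment_mode` is a hypothetical function, not a Model Studio API, and the authoritative mapping for a given model remains the model list.

```python
def resolve_deployment_mode(region: str, model_name: str) -> str:
    """Return the deployment mode implied by a region and a model name.

    Hypothetical helper that mirrors the console rules described above.
    """
    if region == "China (Beijing)":
        return "Mainland China"
    if region == "Singapore":
        return "International"
    if region == "US (Virginia)":
        # In the US region, a "-us" suffix on the model name selects
        # US mode; any other model name uses Global mode.
        return "US" if model_name.endswith("-us") else "Global"
    raise ValueError(f"Unknown region: {region!r}")
```

For instance, `resolve_deployment_mode("US (Virginia)", "qwen-plus-us")` yields US mode, while the same region with `qwen-plus` yields Global mode.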