Alibaba Cloud Model Studio: Select region and deployment mode

Last Updated: Jan 04, 2026

When you use the model service in Alibaba Cloud Model Studio, choose the correct region and deployment mode. They affect the service's response speed, cost, available models, and default rate limits.

  • Region: Determines the access point (endpoint/base URL) of your model service and the storage location for static data, such as prompts and model outputs.

  • Deployment mode: Determines where model inference is computed.

Currently, regions and deployment modes are pre-set by the system and cannot be freely combined.

Choose a region

Consider the following:

  • Proximity: Choose the region closest to your primary callers. This usually reduces network latency and improves the model's response speed.

  • Available platform features: Model Studio offers different features in different regions. See the following table.

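    | Category     | Feature               | China (Beijing) | US            | Singapore |
    |--------------|-----------------------|-----------------|---------------|-----------|
    | Usage        | Real-time inference   | Supported       | Supported     | Supported |
    | Usage        | Batch inference       | Supported       | Not supported | Supported |
    | Usage        | Playground            | Supported       | Supported     | Supported |
    | Management   | Model monitoring      | Supported       | Supported     | Supported |
    | Management   | Model alerting        | Supported       | Not supported | Supported |
    | Management   | Transmission security | Supported       | Supported     | Supported |
    | Management   | Permission management | Supported       | Supported     | Supported |
    | Optimization | Model fine-tuning     | Supported       | Supported     | Supported |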
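To check which regions cover a given set of features, the feature table above can be encoded as a small lookup. This is a minimal sketch, not part of any Model Studio SDK: the names `REGIONS`, `ALL_FEATURES`, `NOT_SUPPORTED`, and `regions_supporting` are hypothetical, and the data is transcribed from the table as of this page's last update.

```python
# Regions and features as listed in the feature table (hypothetical names).
REGIONS = ["China (Beijing)", "US", "Singapore"]

ALL_FEATURES = [
    "Real-time inference", "Batch inference", "Playground",
    "Model monitoring", "Model alerting", "Transmission security",
    "Permission management", "Model fine-tuning",
]

# Only the "Not supported" cells are recorded; every other cell is "Supported".
NOT_SUPPORTED = {"US": {"Batch inference", "Model alerting"}}


def regions_supporting(required):
    """Return the regions in which every feature in `required` is supported."""
    unknown = set(required) - set(ALL_FEATURES)
    if unknown:
        raise ValueError(f"Unknown feature(s): {sorted(unknown)}")
    return [
        region for region in REGIONS
        if not (set(required) & NOT_SUPPORTED.get(region, set()))
    ]


print(regions_supporting(["Batch inference", "Model alerting"]))
# → ['China (Beijing)', 'Singapore']
```

Because availability can change, treat the table in this document as the source of truth and update the lookup when it changes.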

Supported regions

| Name            | ID             | Data storage location |
|-----------------|----------------|-----------------------|
| Singapore       | ap-southeast-1 | Singapore             |
| US (Virginia)   | us-east-1      | Virginia              |
| China (Beijing) | cn-beijing     | Beijing               |

Choose a deployment mode

Different deployment modes vary in their supported models, billing for model calls, and rate limits. Consider the following recommendations:

  • Global: Choose this mode to use the global computing power resource pool. This improves model availability and increases the default rate limits.

  • International: Choose this mode to use compute resources outside Mainland China.

  • US: Choose this mode if you want data processing and inference to occur only within the United States.

  • Mainland China: Choose this mode to use compute resources and process data within Mainland China.

Supported deployment modes

| Deployment mode | Data storage region | Inference compute scope           | Cross-border computing |
|-----------------|---------------------|-----------------------------------|------------------------|
| Global          | US (Virginia)       | Global                            | Yes (You are responsible for ensuring the legality of cross-border data processing.) |
| International   | Singapore           | Global (excluding Mainland China) | Yes (You are responsible for ensuring the legality of cross-border data processing.) |
| United States   | US (Virginia)       | United States only                | No                     |
| Mainland China  | China (Beijing)     | Mainland China only               | No                     |
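If data residency drives your choice, it can help to filter the deployment modes by whether inference may cross borders. The following is an illustrative sketch only: `DeploymentMode`, `DEPLOYMENT_MODES`, and `modes_without_cross_border` are hypothetical names, and the values are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentMode:
    name: str
    storage_region: str   # where static data (prompts, outputs) is stored
    compute_scope: str    # where inference may be computed
    cross_border: bool    # True if inference may run outside the storage region's country


# Transcribed from the "Supported deployment modes" table.
DEPLOYMENT_MODES = [
    DeploymentMode("Global", "US (Virginia)", "Global", True),
    DeploymentMode("International", "Singapore", "Global (excluding Mainland China)", True),
    DeploymentMode("United States", "US (Virginia)", "United States only", False),
    DeploymentMode("Mainland China", "China (Beijing)", "Mainland China only", False),
]


def modes_without_cross_border():
    """Return the names of modes whose inference stays within one country."""
    return [mode.name for mode in DEPLOYMENT_MODES if not mode.cross_border]


print(modes_without_cross_border())
# → ['United States', 'Mainland China']
```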

Important

In Global and International modes, the frontend endpoint in your selected region receives cross-region inference requests. Static data, such as prompts and model outputs, is processed only temporarily during inference and is not persisted in the compute node's region. All data is encrypted during transmission.

Switch regions and deployment modes

  1. Go to the Model Studio console and click the region icon in the upper-right corner of the page.

  2. Select an option based on your situation:

    • To use Mainland China mode: Select the China (Beijing) region.

    • To use US mode or Global mode: Select the US region. The deployment mode then depends on the name of the model you call (check the model list).

      • Model name with a "-us" suffix (uses US mode): For example, qwen-flash-us and qwen-plus-us.

      • Model name without a "-us" suffix (uses Global mode): For example, qwen-flash and qwen-plus.

    • To use International mode: Select the Singapore region.
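The selection rules above can be summarized as a small function that maps a region ID and a model name to the resulting deployment mode. This is a sketch under stated assumptions: `deployment_mode` is a hypothetical helper, the region IDs come from the "Supported regions" table, and the "-us" suffix check only mirrors the naming rule described here. The model list remains the authoritative source for which mode a model uses.

```python
def deployment_mode(region_id: str, model_name: str) -> str:
    """Infer the deployment mode from the selected region and the model name."""
    if region_id == "cn-beijing":
        return "Mainland China"
    if region_id == "ap-southeast-1":
        return "International"
    if region_id == "us-east-1":
        # In the US region, a "-us" suffix selects US mode; otherwise Global mode.
        return "US" if model_name.endswith("-us") else "Global"
    raise ValueError(f"Unsupported region ID: {region_id!r}")


print(deployment_mode("us-east-1", "qwen-plus-us"))   # → US
print(deployment_mode("us-east-1", "qwen-plus"))      # → Global
print(deployment_mode("ap-southeast-1", "qwen-flash"))  # → International
```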

References

Data Processing Addendum