How to choose a deployment mode - Alibaba Cloud Model Studio

The deployment mode determines the computing region for model inference and the location where static data is stored. Choose the mode that keeps network latency low for your users and keeps data processing within the boundaries that apply to your business.

Compare deployment modes

Each deployment mode fixes two things: the compute scope (the available computing power and the regions where model inference can run) and the bound region where static data is stored. These two settings are currently bound together and cannot be configured independently.

To reduce network latency and improve model response speed, select a deployment mode that corresponds to a region near your primary users or business applications:

Deployment mode	Bound region (data storage)	Model inference computing scope
Global	US (Virginia)	Global dynamic scheduling
International	Singapore	Global dynamic scheduling (excluding Mainland China)
United States	US (Virginia)	Limited to the US
Mainland China	China (Beijing)	Limited to Mainland China

Important

The Global and International deployment modes involve cross-border computation, and you are responsible for ensuring that this cross-border data processing is lawful. Cross-region inference requests are received by the frontend endpoint of the selected region. Static data generated during a model invocation, such as prompt inputs and model outputs, is processed only transiently on the compute node and is not persistently stored in the region where that node resides. Data is encrypted in transit.

Usage

Models in Global deployment mode

Before using these models, configure the request address, API key, and model name:

Request address (base URL): The Global deployment mode is bound to the US (Virginia) region. Use the dashscope-us.aliyuncs.com domain name. Example request addresses are listed below. For other APIs, see the corresponding documentation:
- OpenAI Chat Completions API: https://dashscope-us.aliyuncs.com/compatible-mode/v1
- DashScope: https://dashscope-us.aliyuncs.com/api/v1
API key: Go to Key Management (Virginia) to create or view an API key.
Model name: See Model list and choose a model for the Global deployment mode.

Models in International deployment mode

Before using these models, configure the request address, API key, and model name:

Request address (base URL): The International deployment mode is bound to the Singapore region. Use the dashscope-intl.aliyuncs.com domain name. Example request addresses are listed below. For other APIs, see the corresponding documentation:
- OpenAI Chat Completions API: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
- DashScope: https://dashscope-intl.aliyuncs.com/api/v1
API key: Go to Key Management (Singapore) to create or view an API key.
Model name: See Model list and choose a model for the International deployment mode.

Models in United States deployment mode

Before using these models, configure the request address, API key, and model name:

Request address (base URL): The United States deployment mode is bound to the US (Virginia) region. Use the dashscope-us.aliyuncs.com domain name. Example request addresses are listed below. For other APIs, see the corresponding documentation:
- OpenAI Chat Completions API: https://dashscope-us.aliyuncs.com/compatible-mode/v1
- DashScope: https://dashscope-us.aliyuncs.com/api/v1
API key: Go to Key Management (Virginia) to create or view an API key.
Model name: See Model list and choose a model for the United States deployment mode (with the -us suffix).

Models in Mainland China deployment mode

Before using these models, configure the request address, API key, and model name:

Request address (base URL): The Mainland China deployment mode is bound to the China (Beijing) region. Use the dashscope.aliyuncs.com domain name. Example request addresses are listed below. For other APIs, see the corresponding documentation:
- OpenAI Chat Completions API: https://dashscope.aliyuncs.com/compatible-mode/v1
- DashScope: https://dashscope.aliyuncs.com/api/v1
API key: Go to Key Management (Beijing) to create or view an API key.
Model name: See Model list and choose a model for the Mainland China deployment mode.
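
The four sections above differ only in the service domain name, so a script can derive both base URLs from the chosen mode. The following is a minimal sketch, not an official tool: the function name dashscope_host, the mode keywords (global, intl, us, cn), and the variable names are assumptions made for illustration. Only the domain names and URL paths come from this page.

```shell
# Hypothetical helper: map a deployment mode keyword to its service domain name.
# The keywords global/intl/us/cn are illustrative, not official identifiers.
dashscope_host() {
  case "$1" in
    global|us) echo "dashscope-us.aliyuncs.com" ;;   # both bound to US (Virginia)
    intl)      echo "dashscope-intl.aliyuncs.com" ;; # bound to Singapore
    cn)        echo "dashscope.aliyuncs.com" ;;      # bound to China (Beijing)
    *)         echo "unknown deployment mode: $1" >&2; return 1 ;;
  esac
}

# Build both base URLs for the chosen mode (defaults to intl if MODE is unset).
MODE="${MODE:-intl}"
SERVICE_HOST="$(dashscope_host "$MODE")"
COMPAT_BASE_URL="https://$SERVICE_HOST/compatible-mode/v1"
DASHSCOPE_BASE_URL="https://$SERVICE_HOST/api/v1"
echo "$COMPAT_BASE_URL"
echo "$DASHSCOPE_BASE_URL"
```

Global and United States share a domain name, so the model name (for example, one with the -us suffix) is what selects between those two modes. The API key must still come from the Key Management page of the matching region.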

Asynchronous tasks

For asynchronous tasks, such as image generation and video generation, every subsequent operation on a task (for example, querying its result) must use the same service domain name and API key that were used to create it. Otherwise, the request returns an error.

The following example shows how to create an image generation task and query its result in global deployment mode:

# Create task (global deployment mode, service domain name dashscope-us.aliyuncs.com)
curl --location 'https://dashscope-us.aliyuncs.com/api/v1/services/aigc/image-generation/generation' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'X-DashScope-Async: enable' \
--data '{
    "model": "wan2.6-t2i",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "text": "A flower shop with exquisite windows, beautiful wooden doors, and flowers."
                    }
                ]
            }
        ]
    },
    "parameters": {
        "n": 1
    }
}'

# Response example: {"output":{"task_id":"abc123..."},"request_id":"..."}

# Query task (must use the same service domain name and API key as the create call)
# Replace {task_id} with the task_id value returned in the create response.
curl -X GET https://dashscope-us.aliyuncs.com/api/v1/tasks/{task_id} \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

# [Error] Querying with a different service domain name (here, the Mainland China domain) causes an error
curl -X GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id} \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
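
One way to avoid mixing domains is to record the service domain name once and derive every follow-up URL from it. The sketch below illustrates that idea under stated assumptions: the names TASK_HOST, task_query_url, and extract_task_id are hypothetical, the sample response is made up, and the sed pattern only handles the compact JSON shape shown in the response example above. A JSON parser would be more robust in a real script.

```shell
# Domain used for the create call; every follow-up query reuses it.
TASK_HOST="dashscope-us.aliyuncs.com"

# Build the query URL for a task ID on the recorded domain.
task_query_url() {
  printf 'https://%s/api/v1/tasks/%s\n' "$TASK_HOST" "$1"
}

# Pull task_id out of a compact create response (assumed shape, see lead-in).
extract_task_id() {
  printf '%s' "$1" | sed -n 's/.*"task_id":"\([^"]*\)".*/\1/p'
}

# Made-up sample response, used only to exercise the helpers offline.
RESPONSE='{"output":{"task_id":"abc123"},"request_id":"example"}'
TASK_ID="$(extract_task_id "$RESPONSE")"
task_query_url "$TASK_ID"

# With network access and a valid key, the query itself would be:
# curl -X GET "$(task_query_url "$TASK_ID")" \
#   --header "Authorization: Bearer $DASHSCOPE_API_KEY"
```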

Region information

A region is the physical location of the node through which you access Alibaba Cloud Model Studio services. The region IDs are:

Singapore: ap-southeast-1
US (Virginia): us-east-1
China (Beijing): cn-beijing

The platform features supported in each region are:

Section	Feature	Singapore	US (Virginia)	China (Beijing)
Usage	Real-time inference	Supported	Supported	Supported
Usage	Batch inference	Supported	Not supported	Supported
Usage	Playground	Supported	Supported	Supported
Management	Monitoring	Supported	Supported	Supported
Management	Alerting	Supported	Not supported	Supported
Management	Transmission security	Supported	Supported	Supported
Management	Permission management	Supported	Supported	Supported
Optimization	Fine-tuning	Supported	Supported	Supported
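
A script that targets several regions can encode this table as a small lookup and check it before calling a feature. The following is a sketch only: the function name feature_supported and the hyphenated feature keys are assumptions made for illustration, and the table above remains the source of truth if support changes.

```shell
# Hypothetical lookup based on the table above.
# Returns 0 if supported, 1 if not supported, 2 for an unknown region or feature.
feature_supported() {
  region="$1"
  feature="$2"
  case "$region" in
    ap-southeast-1|us-east-1|cn-beijing) ;;
    *) echo "unknown region: $region" >&2; return 2 ;;
  esac
  case "$feature" in
    real-time-inference|batch-inference|playground|monitoring|alerting|transmission-security|permission-management|fine-tuning) ;;
    *) echo "unknown feature: $feature" >&2; return 2 ;;
  esac
  # The only "Not supported" cells in the table are in US (Virginia).
  case "$region:$feature" in
    us-east-1:batch-inference|us-east-1:alerting) return 1 ;;
  esac
  return 0
}

if feature_supported us-east-1 batch-inference; then
  echo "batch inference is available in us-east-1"
else
  echo "batch inference is not available in us-east-1"
fi
```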

References

Data processing and cross-border transfer terms
Qwen API reference
Model list: View supported models and context information for each deployment mode.
Model invocation pricing: View price differences for each deployment mode.
Rate limits: View RPM and TPM limits for each deployment mode.
Get an API key: Create and manage API keys for each deployment mode.