Function Compute provides CPU and GPU-accelerated instances. CPU instances handle general-purpose workloads such as web services and data processing. GPU-accelerated instances handle compute-intensive workloads where parallelism matters: artificial intelligence (AI) inference, deep learning, audio and video processing, and image editing.
Choose an instance type
CPU functions run exclusively on Elastic Instances. GPU functions support all three instance types: Elastic Instance, Provisioned Instance, and Mixed Mode. You can switch between types at any time without service interruption.
The following table summarizes the key differences:
| | Elastic Instance | Provisioned Instance | Mixed Mode |
|---|---|---|---|
| Billing model | Pay-as-you-go | Subscription | Pay-as-you-go + subscription |
| Cold starts | Yes (mitigatable) | No | Partial |
| Best for | Variable traffic; cost-sensitive workloads | High-utilization, latency-sensitive, or stable-billing workloads | Workloads with significant traffic spikes on a stable baseline |
| GPU functions | Yes | Yes (Ada, Ada.2, Ada.3, Hopper, Xpu.1 series only) | Yes |
| CPU functions | Yes | No | No |
Provisioned Instances can only be bound to GPU functions in the Ada, Ada.2, Ada.3, Hopper, or Xpu.1 series.
Elastic Instance
When the minimum number of instances is set to 0, instances scale with request volume and are released when idle. This pay-as-you-go model charges only for active compute time.
Consider Elastic Instances when:
Traffic is variable or unpredictable
Cost efficiency matters more than guaranteed capacity
You are running CPU functions (Elastic Instance is the only instance type they support)
Cold starts: Cold starts can occur. To reduce cold-start latency, set the minimum number of instances to 1 or more. This pre-allocates elastic resources so instances are ready to handle incoming requests immediately.
Billing: Charges apply for active Elastic Instances and instances in Shallow Hibernation. If the minimum number of instances is 1 or more, enable Shallow Hibernation to reduce idle costs: in this state, vCPU resources are not charged, and GPU resources are billed at one-fifth of the active rate. For details on the active and Shallow Hibernation states, see Elastic Instance.
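The Shallow Hibernation rule above (vCPU free, GPU at one-fifth of the active rate while hibernating) can be sketched as a cost estimate. The unit prices below are placeholders for illustration, not real Function Compute rates:

```python
# Hypothetical unit prices -- substitute the published rates for your region.
ACTIVE_GPU_RATE = 1.0   # placeholder price per GPU-second
ACTIVE_VCPU_RATE = 0.1  # placeholder price per vCPU-second

def elastic_cost(active_s: float, hibernate_s: float,
                 vcpus: float, gpus: float) -> float:
    """Estimate Elastic Instance cost across active and hibernating time."""
    active = active_s * (vcpus * ACTIVE_VCPU_RATE + gpus * ACTIVE_GPU_RATE)
    # Shallow Hibernation: vCPU is not charged; GPU is billed at 1/5 the rate.
    hibernating = hibernate_s * gpus * ACTIVE_GPU_RATE / 5
    return active + hibernating

# 10 min active, 50 min hibernating, 8 vCPUs, 1 GPU card.
print(elastic_cost(600, 3000, 8, 1))  # 1680.0
```

With these placeholder rates, hibernating for 50 minutes costs less than 6 minutes of active GPU time, which is why Shallow Hibernation is worthwhile for latency-sensitive but bursty workloads.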
Provisioned Instance
Purchase a Provisioned Resource Pool in advance, then allocate a specific number and type of provisioned instances to your function. This delivers predictable costs and guaranteed capacity.
Consider Provisioned Instances when:
Resource utilization is consistently high
Latency requirements are strict and cold starts are unacceptable
Billing must be stable and predictable
You are running GPU functions in the Ada, Ada.2, Ada.3, Hopper, or Xpu.1 series
When you purchase a monthly Provisioned Resource Pool, the platform also grants a boost instance quota at no extra charge.
Cold starts: No cold starts. Requests within your allocated capacity receive real-time responses. Maximum concurrent requests = (number of allocated Provisioned Instances) × (instance concurrency) + boost instance quota. Requests that exceed this limit are throttled.
Billing: You pay the total subscription fee for all purchased Provisioned Resource Pools. The boost instance quota is not billed.
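The throttling limit above can be computed directly from the formula. A minimal sketch with hypothetical numbers:

```python
def max_concurrent_requests(provisioned_instances: int,
                            instance_concurrency: int,
                            boost_quota: int) -> int:
    """Provisioned Instance limit: requests beyond this are throttled."""
    return provisioned_instances * instance_concurrency + boost_quota

# Hypothetical allocation: 10 provisioned instances, concurrency 5,
# and a granted boost quota of 10.
print(max_concurrent_requests(10, 5, 10))  # 60
```

A request that would push in-flight concurrency past this value is throttled rather than queued, so size the pool for your steady-state peak.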
Mixed Mode
Mixed Mode applies to GPU functions only. The provisioned resource pool handles steady-state traffic with no cold starts. When requests exceed provisioned capacity, the system auto-scales by launching Elastic Instances to absorb bursts.
Consider Mixed Mode when:
Traffic has a stable baseline with significant spikes
You need guaranteed capacity for steady-state traffic but want elastic overflow for bursts
You are running GPU functions
Cold starts: Partial. Requests handled within the provisioned pool run without cold starts. New Elastic Instances launched during scale-out do experience cold starts.
Billing: Both subscription and pay-as-you-go apply:
Provisioned portion: Billed against the purchased Provisioned Resource Pool quota.
Elastic portion: Instances launched beyond the provisioned quota are billed on a pay-as-you-go basis at the same rates as active and Shallow Hibernation Elastic Instances.
Instance specifications
CPU instances
| vCPU (cores) | Memory (MB) | Max code package size (GB) | Max execution duration (s) | Disk size | Max bandwidth (Gbps) |
|---|---|---|---|---|---|
| 0.05–16 (multiples of 0.05) | 128–32768 (multiples of 64) | 10 | 86400 | 512 MB (default) or 10 GB | 5 |
The vCPU-to-memory ratio must be between 1:1 and 1:4 (GB). For example, 4 vCPUs require between 4 GB and 16 GB of memory.
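The ratio rule above can be checked with a few lines. This is a sketch of the constraint as stated, not an official validator:

```python
def valid_cpu_config(vcpu: float, memory_mb: int) -> bool:
    """Check the 1:1 to 1:4 vCPU-to-memory (GB) ratio for CPU instances."""
    memory_gb = memory_mb / 1024
    return vcpu * 1 <= memory_gb <= vcpu * 4

print(valid_cpu_config(4, 8192))   # True:  4 vCPUs with 8 GB (ratio 1:2)
print(valid_cpu_config(4, 2048))   # False: 2 GB is below the 1:1 floor
print(valid_cpu_config(4, 20480))  # False: 20 GB exceeds the 1:4 ceiling
```

The vCPU value must also be a multiple of 0.05 and memory a multiple of 64 MB, per the specifications table.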
GPU instances
Hardware specifications
See Supported GPU instance families in ACS.
| Instance type | GPU memory | FP16 | FP32 | Max cards per instance |
|---|---|---|---|---|
| fc.gpu.tesla.1 | 16 GB | 65 TFLOPS | 8 TFLOPS | 4 |
| fc.gpu.ampere.1 | 24 GB | 125 TFLOPS | 31.2 TFLOPS | 8 |
| fc.gpu.ada.1 | 48 GB | 119 TFLOPS | 60 TFLOPS | — |
| fc.gpu.ada.2 | 24 GB | 166 TFLOPS | 83 TFLOPS | — |
| fc.gpu.ada.3 | 48 GB | 148 TFLOPS | 73.5 TFLOPS | — |
| fc.gpu.hopper.1 | 96 GB | 148 TFLOPS | 44 TFLOPS | — |
| fc.gpu.hopper.2 | 141 GB | 148 TFLOPS | 44 TFLOPS | — |
| fc.gpu.blackwell.1 | 32 GB | 104.8 TFLOPS | 104.8 TFLOPS | — |
| fc.gpu.xpu.1 | 96 GB | 123 TFLOPS | 61.5 TFLOPS | 16 |
- Setting the instance type to `g1` is equivalent to `fc.gpu.tesla.1`.
- Tesla series instances are available in: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Japan (Tokyo), US (Virginia), and Singapore.
- Ada series instances are available in: China (Beijing), China (Hangzhou), China (Shanghai), China (Shenzhen), Singapore, and US (Virginia).
vCPU and memory configuration (per card)
For multi-card instances: total vCPUs = vCPUs per card × number of cards; total memory = memory per card × number of cards.
| Instance type | vCPU per card | Memory per card |
|---|---|---|
| fc.gpu.tesla.1 | 4 cores | 4–16 GB (4096–16384 MB), in 4 GB increments |
| | 8 cores | 8–32 GB (8192–32768 MB) |
| | 16 cores | 16–64 GB (16384–65536 MB) |
| fc.gpu.ampere.1 | 8 cores | 8–32 GB (8192–32768 MB) |
| | 16 cores | 16–64 GB (16384–65536 MB) |
| fc.gpu.ada.1, fc.gpu.ada.2, fc.gpu.ada.3 | 4 cores | 16–32 GB (16384–32768 MB) |
| | 8 cores | 32–64 GB (32768–65536 MB) |
| | 16 cores | 64–120 GB (65536–122880 MB) |
| fc.gpu.hopper.1 | 4 cores | 16–32 GB (16384–32768 MB) |
| | 8 cores | 32–64 GB (32768–65536 MB) |
| | 16 cores | 64–96 GB (65536–98304 MB) |
| | 24 cores | 96–120 GB (98304–122880 MB) |
| fc.gpu.hopper.2 | 4 cores | 16–32 GB (16384–32768 MB) |
| | 8 cores | 32–64 GB (32768–65536 MB) |
| | 16 cores | 64–128 GB (65536–131072 MB) |
| | 24 cores | 96–248 GB (98304–253952 MB) |
| fc.gpu.blackwell.1 | 4 cores | 16–32 GB (16384–32768 MB) |
| | 8 cores | 32–64 GB (32768–65536 MB) |
| | 16 cores | 64–120 GB (65536–122880 MB) |
| | 24 cores | 96–184 GB (98304–188416 MB) |
| fc.gpu.xpu.1 | 4 cores | 16–48 GB (16384–49152 MB) |
| | 8 cores | 32–96 GB (32768–98304 MB) |
| | 12 cores | 48–120 GB (49152–122880 MB) |
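The multi-card scaling rule stated before the table is a simple multiplication. A sketch using a hypothetical 4-card `fc.gpu.tesla.1` instance at 8 vCPUs and 16 GB per card:

```python
def instance_totals(vcpu_per_card: int, memory_mb_per_card: int,
                    cards: int) -> tuple[int, int]:
    """Total vCPUs and memory scale linearly with the number of cards."""
    return vcpu_per_card * cards, memory_mb_per_card * cards

# fc.gpu.tesla.1, 8 vCPUs and 16384 MB per card, 4 cards.
vcpus, memory_mb = instance_totals(8, 16384, 4)
print(vcpus, memory_mb)  # 32 65536
```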
Common GPU instance limits
| Max image size | Max execution duration (s) | Disk size | Max bandwidth (Gbps) |
|---|---|---|---|
| 15 GB (all ACR editions) | 86400 | 512 MB, or 10–200 GB in 10 GB increments | 5 |
GPU concurrency and regional quotas
An Ada.1 GPU has 48 GB of memory, and a Tesla series GPU has 16 GB of memory. Function Compute allocates the full memory of a GPU card to a single GPU container. With a default regional quota of 30 GPU cards:
At instance concurrency = 1, up to 30 inference requests run concurrently per region.
At instance concurrency = 5, up to 150 inference requests run concurrently per region.
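Because each GPU container receives a full card, the two figures above follow from one multiplication. A minimal sketch:

```python
def regional_max_requests(gpu_card_quota: int,
                          instance_concurrency: int) -> int:
    """One GPU card per container, so regional concurrency scales
    linearly with the card quota."""
    return gpu_card_quota * instance_concurrency

print(regional_max_requests(30, 1))  # 30
print(regional_max_requests(30, 5))  # 150
```

To serve more concurrent inference requests per region, either raise instance concurrency (if the model fits multiple requests in GPU memory) or request a quota increase.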
Instance concurrency
Configure instance concurrency to run multiple tasks on a single instance, sharing CPU and memory resources. This improves resource utilization and reduces costs compared to running one task per instance. For configuration steps, see Configure instance concurrency.
How execution duration is measured
Execution duration for single-instance, single-concurrency
Single-instance, single request: Duration is measured from when the request arrives at the instance to when execution completes.
Execution duration for single-instance, multiple-concurrency
Single-instance, concurrent requests: Duration is measured from when the first request arrives to when the last request completes. Sharing a single instance across multiple requests reduces total billed duration.
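The two measurement rules above can be contrasted with a short sketch. The request timeline below is hypothetical; each request is an (arrival, completion) pair in seconds:

```python
def billed_duration_single(requests: list[tuple[float, float]]) -> float:
    """One request per instance: each request is billed separately."""
    return sum(end - start for start, end in requests)

def billed_duration_concurrent(requests: list[tuple[float, float]]) -> float:
    """Concurrent requests on one instance: billed from the first
    arrival to the last completion."""
    return (max(end for _, end in requests)
            - min(start for start, _ in requests))

reqs = [(0, 4), (1, 5), (2, 6)]  # three overlapping 4-second requests
print(billed_duration_single(reqs))      # 12.0 billed across 3 instances
print(billed_duration_concurrent(reqs))  # 6 billed on 1 shared instance
```

With overlapping requests, the concurrent measurement bills the shared wall-clock window once instead of each request's duration separately, which is the cost advantage of instance concurrency.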
What's next
For billing details, see Billing overview.
To specify the instance type via API, use the `instanceType` parameter in CreateFunction.
To set the instance type and specifications in the console, see Function creation.