
Function Compute: Instance types and specifications

Last Updated: Apr 01, 2026

Function Compute provides CPU and GPU-accelerated instances. CPU instances handle general-purpose workloads such as web services and data processing. GPU-accelerated instances handle compute-intensive workloads where parallelism matters: artificial intelligence (AI) inference, deep learning, audio and video processing, and image editing.

Choose an instance type

CPU functions run exclusively on Elastic Instances. GPU functions support all three instance types: Elastic Instance, Provisioned Instance, and Mixed Mode. You can switch between types at any time without service interruption.

The following table summarizes the key differences:

| | Elastic Instance | Provisioned Instance | Mixed Mode |
|---|---|---|---|
| Billing model | Pay-as-you-go | Subscription | Pay-as-you-go + subscription |
| Cold starts | Yes (mitigatable) | No | Partial |
| Best for | Variable traffic; cost-sensitive workloads | High-utilization, latency-sensitive, or stable-billing workloads | Workloads with significant traffic spikes on a stable baseline |
| GPU functions | Yes | Yes (Ada, Ada.2, Ada.3, Hopper, Xpu.1 series only) | Yes |
| CPU functions | Yes | No | No |
Note: Provisioned Instances can be bound only to GPU functions in the Ada, Ada.2, Ada.3, Hopper, or Xpu.1 series.

Elastic Instance

When the minimum number of instances is set to 0, instances scale with request volume and are released when idle. This pay-as-you-go model charges only for active compute time.

Consider Elastic Instances when:

  • Traffic is variable or unpredictable

  • Cost efficiency matters more than guaranteed capacity

  • You are running CPU functions (the only supported instance type for CPU)

Cold starts: Cold starts can occur. To reduce cold-start latency, set the minimum number of instances to 1 or more. This pre-allocates elastic resources so instances are ready to handle incoming requests immediately.

Billing: Charges apply for active Elastic Instances and instances in Shallow Hibernation. When the minimum number of instances is set to 1 or more, you can enable Shallow Hibernation to reduce idle costs: in this state, vCPU resources are not charged, and GPU resources are billed at one-fifth of the active rate. For details on the active and Shallow Hibernation states, see Elastic Instance.
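As an illustration, the cost split between the active and Shallow Hibernation states can be estimated as below. The unit rates are hypothetical placeholders, not actual Function Compute prices; only the free vCPU during hibernation and the one-fifth GPU rate reflect the rule above.

```python
# Hypothetical per-second unit rates for illustration only.
# Consult the official pricing page for real rates in your region.
GPU_ACTIVE_RATE = 0.00011    # per GPU-memory-GB-second (assumed)
VCPU_ACTIVE_RATE = 0.000127  # per vCPU-second (assumed)

def elastic_cost(gpu_gb, vcpu, active_s, hibernated_s):
    """Estimate cost for one Elastic Instance over a billing window.

    During Shallow Hibernation, vCPU is free and GPU memory is billed
    at one-fifth of the active rate.
    """
    active = (gpu_gb * GPU_ACTIVE_RATE + vcpu * VCPU_ACTIVE_RATE) * active_s
    hibernated = gpu_gb * GPU_ACTIVE_RATE / 5 * hibernated_s
    return active + hibernated
```

For example, an instance with 48 GB of GPU memory and 4 vCPUs that spends half an hour active and half an hour hibernated costs markedly less than one that stays active for the full hour.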

Provisioned Instance

Purchase a Provisioned Resource Pool in advance, then allocate a specific number and type of provisioned instances to your function. This delivers predictable costs and guaranteed capacity.

Consider Provisioned Instances when:

  • Resource utilization is consistently high

  • Latency requirements are strict and cold starts are unacceptable

  • Billing must be stable and predictable

  • You are running GPU functions in the Ada, Ada.2, Ada.3, Hopper, or Xpu.1 series

When you purchase a monthly provisioned resource pool, the platform also grants a boost instance quota at no extra charge.

Cold starts: No cold starts. Requests within your allocated capacity receive real-time responses. Maximum concurrent requests = (number of allocated Provisioned Instances) × (instance concurrency) + boost instance quota. Requests that exceed this limit are throttled.
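The throttling limit above follows directly from the formula; the figures in the usage example are hypothetical:

```python
def max_concurrent_requests(provisioned_instances, instance_concurrency, boost_quota):
    """Maximum concurrent requests a function bound to Provisioned
    Instances can serve; requests beyond this limit are throttled."""
    return provisioned_instances * instance_concurrency + boost_quota

# e.g. 10 allocated instances at concurrency 5, with a boost quota of 2:
max_concurrent_requests(10, 5, 2)  # 52
```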

Billing: You pay the total subscription fee for all purchased provisioned resource pools. The boost instance quota is not billed.

Mixed Mode

Mixed Mode applies to GPU functions only. The provisioned resource pool handles steady-state traffic with no cold starts. When requests exceed provisioned capacity, the system auto-scales by launching Elastic Instances to absorb bursts.

Consider Mixed Mode when:

  • Traffic has a stable baseline with significant spikes

  • You need guaranteed capacity for steady-state traffic but want elastic overflow for bursts

  • You are running GPU functions

Cold starts: Partial. Requests handled within the provisioned pool run without cold starts. New Elastic Instances launched during scale-out do experience cold starts.

Billing: Both subscription and pay-as-you-go apply:

  • Provisioned portion: Billed against the purchased Provisioned Resource Pool quota.

  • Elastic portion: Instances launched beyond the provisioned quota are billed on a pay-as-you-go basis at the same rates as active and Shallow Hibernation Elastic Instances.

Instance specifications

CPU instances

| vCPU (cores) | Memory (MB) | Max code package size (GB) | Max execution duration (s) | Disk size | Max bandwidth (Gbps) |
|---|---|---|---|---|---|
| 0.05–16 (multiples of 0.05) | 128–32768 (multiples of 64) | 10 | 86400 | 512 MB (default) or 10 GB | 5 |
The vCPU-to-memory ratio must be between 1:1 and 1:4 (GB). For example, 4 vCPUs require between 4 GB and 16 GB of memory.
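A quick way to check a configuration against these limits is a validation helper like the following sketch (the function name is illustrative, not part of any SDK):

```python
def valid_cpu_config(vcpu, memory_mb):
    """Check a CPU-instance configuration against the documented limits:
    vCPU in 0.05-core steps within 0.05-16, memory in 64 MB steps within
    128-32768 MB, and a vCPU:memory (GB) ratio between 1:1 and 1:4."""
    centi_vcpu = round(vcpu * 100)  # avoid float-step comparison issues
    if not (5 <= centi_vcpu <= 1600) or centi_vcpu % 5 != 0:
        return False
    if not (128 <= memory_mb <= 32768) or memory_mb % 64 != 0:
        return False
    mem_gb = memory_mb / 1024
    return vcpu <= mem_gb <= 4 * vcpu

valid_cpu_config(4, 8192)   # True: 4 vCPUs with 8 GB is within 1:1-1:4
valid_cpu_config(4, 2048)   # False: 2 GB is below the 1:1 floor of 4 GB
```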

GPU instances

Hardware specifications

See Supported GPU instance families in ACS.
| Instance type | GPU memory | FP16 | FP32 | Max cards per instance |
|---|---|---|---|---|
| fc.gpu.tesla.1 | 16 GB | 65 TFLOPS | 8 TFLOPS | 4 |
| fc.gpu.ampere.1 | 24 GB | 125 TFLOPS | 31.2 TFLOPS | 8 |
| fc.gpu.ada.1 | 48 GB | 119 TFLOPS | 60 TFLOPS | |
| fc.gpu.ada.2 | 24 GB | 166 TFLOPS | 83 TFLOPS | |
| fc.gpu.ada.3 | 48 GB | 148 TFLOPS | 73.5 TFLOPS | |
| fc.gpu.hopper.1 | 96 GB | 148 TFLOPS | 44 TFLOPS | |
| fc.gpu.hopper.2 | 141 GB | 148 TFLOPS | 44 TFLOPS | |
| fc.gpu.blackwell.1 | 32 GB | 104.8 TFLOPS | 104.8 TFLOPS | |
| fc.gpu.xpu.1 | 96 GB | 123 TFLOPS | 61.5 TFLOPS | 16 |
Note
  • Setting the instance type to g1 is equivalent to fc.gpu.tesla.1.
  • Tesla series instances are available in: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Japan (Tokyo), US (Virginia), and Singapore.
  • Ada series instances are available in: China (Beijing), China (Hangzhou), China (Shanghai), China (Shenzhen), Singapore, and US (Virginia).

vCPU and memory configuration (per card)

For multi-card instances: total vCPUs = vCPUs per card × number of cards; total memory = memory per card × number of cards.

| Instance type | vCPU per card | Memory per card |
|---|---|---|
| fc.gpu.tesla.1 | 4 cores | 4–16 GB (4096–16384 MB), in 4 GB increments |
| fc.gpu.tesla.1 | 8 cores | 8–32 GB (8192–32768 MB) |
| fc.gpu.tesla.1 | 16 cores | 16–64 GB (16384–65536 MB) |
| fc.gpu.ampere.1 | 8 cores | 8–32 GB (8192–32768 MB) |
| fc.gpu.ampere.1 | 16 cores | 16–64 GB (16384–65536 MB) |
| fc.gpu.ada.1, fc.gpu.ada.2, fc.gpu.ada.3 | 4 cores | 16–32 GB (16384–32768 MB) |
| fc.gpu.ada.1, fc.gpu.ada.2, fc.gpu.ada.3 | 8 cores | 32–64 GB (32768–65536 MB) |
| fc.gpu.ada.1, fc.gpu.ada.2, fc.gpu.ada.3 | 16 cores | 64–120 GB (65536–122880 MB) |
| fc.gpu.hopper.1 | 4 cores | 16–32 GB (16384–32768 MB) |
| fc.gpu.hopper.1 | 8 cores | 32–64 GB (32768–65536 MB) |
| fc.gpu.hopper.1 | 16 cores | 64–96 GB (65536–98304 MB) |
| fc.gpu.hopper.1 | 24 cores | 96–120 GB (98304–122880 MB) |
| fc.gpu.hopper.2 | 4 cores | 16–32 GB (16384–32768 MB) |
| fc.gpu.hopper.2 | 8 cores | 32–64 GB (32768–65536 MB) |
| fc.gpu.hopper.2 | 16 cores | 64–128 GB (65536–131072 MB) |
| fc.gpu.hopper.2 | 24 cores | 96–248 GB (98304–253952 MB) |
| fc.gpu.blackwell.1 | 4 cores | 16–32 GB (16384–32768 MB) |
| fc.gpu.blackwell.1 | 8 cores | 32–64 GB (32768–65536 MB) |
| fc.gpu.blackwell.1 | 16 cores | 64–120 GB (65536–122880 MB) |
| fc.gpu.blackwell.1 | 24 cores | 96–184 GB (98304–188416 MB) |
| fc.gpu.xpu.1 | 4 cores | 16–48 GB (16384–49152 MB) |
| fc.gpu.xpu.1 | 8 cores | 32–96 GB (32768–98304 MB) |
| fc.gpu.xpu.1 | 12 cores | 48–120 GB (49152–122880 MB) |
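The per-card rule above can be sketched as a small helper; the tesla.1 figures in the usage example come from the tables in this section:

```python
def instance_totals(vcpu_per_card, mem_per_card_mb, cards):
    """Total resources for a multi-card GPU instance:
    totals scale linearly with the number of cards."""
    return vcpu_per_card * cards, mem_per_card_mb * cards

# e.g. an fc.gpu.tesla.1 instance with 4 cards at 8 cores / 16 GB per card:
instance_totals(8, 16384, 4)  # (32, 65536) -> 32 vCPUs, 64 GB
```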

Common GPU instance limits

| Max image size | Max execution duration (s) | Disk size | Max bandwidth (Gbps) |
|---|---|---|---|
| 15 GB (all ACR editions) | 86400 | 512 MB, or 10–200 GB in 10 GB increments | 5 |

GPU concurrency and regional quotas

An Ada.1 GPU has 48 GB of memory, and a Tesla series GPU has 16 GB of memory. Function Compute allocates the full memory of a GPU card to a single GPU container. With a default regional quota of 30 GPU cards:

  • At instance concurrency = 1, up to 30 inference requests run concurrently per region.

  • At instance concurrency = 5, up to 150 inference requests run concurrently per region.
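Because each GPU container is allocated a full card, the regional card quota caps the instance count, and instance concurrency multiplies the request ceiling. A minimal sketch of that arithmetic:

```python
def regional_max_requests(gpu_card_quota, instance_concurrency):
    """Upper bound on concurrent inference requests per region:
    one full GPU card per container, times per-instance concurrency."""
    return gpu_card_quota * instance_concurrency

# With the default regional quota of 30 GPU cards:
regional_max_requests(30, 1)  # 30
regional_max_requests(30, 5)  # 150
```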

Instance concurrency

Configure instance concurrency to run multiple tasks on a single instance, sharing CPU and memory resources. This improves resource utilization and reduces costs compared to running one task per instance. For configuration steps, see Configure instance concurrency.

How execution duration is measured

Execution duration for single-instance, single-concurrency

Single-instance, single request: Duration is measured from when the request arrives at the instance to when execution completes.


Execution duration for single-instance, multiple-concurrency

Single-instance, concurrent requests: Duration is measured from when the first request arrives to when the last request completes. Sharing a single instance across multiple requests reduces total billed duration.
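A worked example makes the savings concrete. The timings below are hypothetical: four requests, each taking 2 seconds, arriving at the same time.

```python
per_request_s = 2.0  # assumed execution time of each request
n_requests = 4       # assumed number of simultaneous requests

# Single-concurrency: each request occupies its own window,
# so billed duration is the sum of all request durations.
single_concurrency_billed = n_requests * per_request_s  # 8.0 s

# Multi-concurrency (instance concurrency >= 4): all four requests
# overlap on one instance; billing runs from the first arrival to the
# last completion, roughly one request's duration here.
multi_concurrency_billed = per_request_s  # 2.0 s
```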


What's next