Basic CPU instances are usually sufficient for general-purpose computing scenarios on Function Compute, such as web services and data processing. However, for scenarios that require large-scale parallel computing or deep learning, such as audio and video processing, artificial intelligence (AI) inference, and image editing, GPU-accelerated instances can significantly improve computing efficiency.
Function Compute offers three instance modes for GPU workloads: Elastic Instance, Provisioned Instance, and Mixed Mode. Choose the mode and specifications that fit your needs to balance resource utilization, performance, and stability.
Instance type selection
For CPU functions, only Elastic Instances are supported. For GPU Functions, you can choose the most suitable instance type based on your business's resource utilization, latency sensitivity, and cost stability requirements. You can switch between the three instance types at any time without service interruption.
You can bind Provisioned Instances only to GPU Functions that belong to the Ada, Ada.2, Ada.3, Hopper, or Xpu.1 series.
Elastic Instance
If you set the minimum number of instances for a function to 0, instances automatically scale based on the request volume and are released when there are no requests. This enables a pay-as-you-go model in which you are charged only for what you use, maximizing cost savings. The higher the request frequency, the better the resource utilization and the greater the cost savings compared with virtual machines.
Cold start behavior
Yes, cold starts can occur. For latency-sensitive workloads, you can mitigate cold starts by setting the minimum number of instances to 1 or higher. This pre-allocates elastic resources, allowing instances to be activated quickly to handle incoming requests.
Billing (Pay-as-you-go)
Function costs include charges for active Elastic Instances and Elastic Instances in Shallow Hibernation. If you set the minimum number of instances to 1 or more, we recommend enabling the Shallow Hibernation mode. In this state, you are not charged for vCPU resources, and GPU resources are billed at only one-fifth of the active rate, significantly lowering costs compared to active elastic instances.
For more information about the use cases for active and Shallow Hibernation states, see Elastic Instance.
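To make the pricing relationship concrete, the sketch below compares hourly costs for an active instance and one in Shallow Hibernation. The per-second rates are placeholders invented for illustration, not real Function Compute prices; only the rules (vCPU free in hibernation, GPU billed at one-fifth of the active rate) come from this document.

```python
# Placeholder rates, NOT real Function Compute prices.
# Replace them with the rates from your own bill.
GPU_RATE_ACTIVE = 0.00011   # per GPU-second (hypothetical)
VCPU_RATE_ACTIVE = 0.00009  # per vCPU-second (hypothetical)

def hourly_cost(state: str, gpus: int = 1, vcpus: int = 4) -> float:
    """Hourly cost of one instance in the given billing state."""
    seconds = 3600
    if state == "active":
        return (gpus * GPU_RATE_ACTIVE + vcpus * VCPU_RATE_ACTIVE) * seconds
    if state == "shallow_hibernation":
        # vCPU resources are not charged; GPU resources are billed
        # at one-fifth of the active rate.
        return gpus * GPU_RATE_ACTIVE / 5 * seconds
    raise ValueError(f"unknown state: {state}")

print(round(hourly_cost("active"), 4))
print(round(hourly_cost("shallow_hibernation"), 4))
```

With these sample rates, an hour in Shallow Hibernation costs a small fraction of an active hour, which is why it is recommended when the minimum number of instances is 1 or more.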
Provisioned Instance
This instance type applies only to GPU Functions. You purchase a Provisioned Resource Pool in advance and then allocate a specific number and type of provisioned instances to your function. This approach provides predictable, fixed costs and is ideal for workloads with high resource utilization, strict latency requirements, or stable billing requirements.
After purchasing a monthly provisioned resource pool, the platform allocates a certain quota of boost instances in addition to your subscription-based provisioned instances. This boost instance quota is not billed.
Cold start behavior
No, there are no cold starts. When you use Provisioned Instances, requests within your allocated capacity receive a real-time response. The maximum number of concurrent requests a function can handle is calculated as: (Number of allocated Provisioned Instances) × (Instance concurrency) + (Boost instance quota). Any requests that exceed this limit are throttled.
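The throttling threshold from this formula can be sketched as follows; the numbers in the example are illustrative, not defaults.

```python
def max_concurrent_requests(provisioned_instances: int,
                            instance_concurrency: int,
                            boost_quota: int) -> int:
    """Maximum concurrent requests before throttling:
    provisioned instances x instance concurrency + boost instance quota."""
    return provisioned_instances * instance_concurrency + boost_quota

# Example: 4 provisioned instances, instance concurrency 5,
# and a boost instance quota of 2 (all illustrative values).
print(max_concurrent_requests(4, 5, 2))  # 22
```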
Billing (Subscription)
The function cost is the total subscription fee for all purchased provisioned resource pools. The boost instance quota is not billed.
Provisioned Instance and Elastic Instance (Mixed Mode)
This mode applies only to GPU Functions. It combines the benefits of Provisioned Instances and Elastic Instances, making it ideal for workloads with significant traffic fluctuations. The system first uses the provisioned resource pool to handle steady-state traffic. When requests exceed the capacity of the provisioned pool, the system automatically scales out by launching elastic instances. This approach guarantees stable baseline capacity while effectively managing sudden traffic bursts.
Cold start behavior
Partially. Requests handled by the provisioned resource pool (up to the minimum number of instances) are processed in real time with no cold starts. However, when traffic triggers auto-scaling and new elastic instances are launched, those new instances experience a cold start.
Billing
The cost in Mixed Mode consists of both subscription and pay-as-you-go components:
Provisioned portion: Billed against your purchased Provisioned Resource Pool quota.
Elastic portion: Instances launched beyond the provisioned quota are billed on a pay-as-you-go basis, with the same rates as active and shallow hibernation elastic instances.
Instance specifications
CPU instances
| vCPU (cores) | Memory size (MB) | Maximum code package size (GB) | Maximum function execution duration (s) | Maximum disk size (GB) | Maximum bandwidth (Gbps) |
| --- | --- | --- | --- | --- | --- |
| 0.05 to 16, in multiples of 0.05 | 128 to 32768, in multiples of 64 | 10 | 86400 | 10. Valid values: 512 MB (default) or 10 GB | 5 |

Note: The ratio of vCPUs to memory size (in GB) must be between 1:1 and 1:4.
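A quick way to validate a CPU configuration against these constraints is sketched below; the function is illustrative, not part of any Function Compute SDK.

```python
def is_valid_cpu_config(vcpu: float, memory_mb: int) -> bool:
    """Check a CPU instance configuration against the documented limits."""
    # vCPU: 0.05 to 16 cores, in multiples of 0.05.
    # Work in hundredths of a core to avoid float rounding issues.
    centi_cores = round(vcpu * 100)
    if not (5 <= centi_cores <= 1600) or centi_cores % 5 != 0:
        return False
    # Memory: 128 MB to 32768 MB, in multiples of 64 MB.
    if not (128 <= memory_mb <= 32768) or memory_mb % 64 != 0:
        return False
    # The vCPU-to-memory (GB) ratio must be between 1:1 and 1:4.
    memory_gb = memory_mb / 1024
    return vcpu <= memory_gb <= 4 * vcpu

print(is_valid_cpu_config(2, 4096))   # True: 2 vCPUs, 4 GB is a 1:2 ratio
print(is_valid_cpu_config(2, 16384))  # False: 1:8 exceeds the 1:4 limit
```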
GPU instance hardware specifications
| Instance type | GPU memory | FP16 computing power | FP32 computing power | Max cards per instance |
| --- | --- | --- | --- | --- |
| fc.gpu.tesla.1 | 16 GB | 65 TFLOPS | 8 TFLOPS | 4 cards |
| fc.gpu.ampere.1 | 24 GB | 125 TFLOPS | 31.2 TFLOPS | 8 cards |
| fc.gpu.ada.1 | 48 GB | 119 TFLOPS | 60 TFLOPS | |
| fc.gpu.ada.2 | 24 GB | 166 TFLOPS | 83 TFLOPS | |
| fc.gpu.ada.3 | 48 GB | 148 TFLOPS | 73.5 TFLOPS | |
| fc.gpu.hopper.1 | 96 GB | 148 TFLOPS | 44 TFLOPS | |
| fc.gpu.hopper.2 | 141 GB | 148 TFLOPS | 44 TFLOPS | |
| fc.gpu.blackwell.1 | 32 GB | 104.8 TFLOPS | 104.8 TFLOPS | |
| fc.gpu.xpu.1 | 96 GB | 123 TFLOPS | 61.5 TFLOPS | 16 cards |
vCPU and memory configuration for GPU instances
Note: Multi-card resource formula: Total vCPUs = vCPUs per card × Number of cards; Total memory = Memory per card × Number of cards.

| Instance type | vCPU (per card) | Memory range (per card) | Memory increment |
| --- | --- | --- | --- |
| fc.gpu.tesla.1 | 4 cores | 4 to 16 GB (4096 to 16384 MB) | 4 GB (4096 MB) |
| fc.gpu.tesla.1 | 8 cores | 8 to 32 GB (8192 to 32768 MB) | 4 GB (4096 MB) |
| fc.gpu.tesla.1 | 16 cores | 16 to 64 GB (16384 to 65536 MB) | 4 GB (4096 MB) |
| fc.gpu.ampere.1 | 8 cores | 8 to 32 GB (8192 to 32768 MB) | 4 GB (4096 MB) |
| fc.gpu.ampere.1 | 16 cores | 16 to 64 GB (16384 to 65536 MB) | 4 GB (4096 MB) |
| fc.gpu.ada.1 / fc.gpu.ada.2 / fc.gpu.ada.3 | 4 cores | 16 to 32 GB (16384 to 32768 MB) | 4 GB (4096 MB) |
| fc.gpu.ada.1 / fc.gpu.ada.2 / fc.gpu.ada.3 | 8 cores | 32 to 64 GB (32768 to 65536 MB) | 4 GB (4096 MB) |
| fc.gpu.ada.1 / fc.gpu.ada.2 / fc.gpu.ada.3 | 16 cores | 64 to 120 GB (65536 to 122880 MB) | 4 GB (4096 MB) |
| fc.gpu.hopper.1 | 4 cores | 16 to 32 GB (16384 to 32768 MB) | 4 GB (4096 MB) |
| fc.gpu.hopper.1 | 8 cores | 32 to 64 GB (32768 to 65536 MB) | 4 GB (4096 MB) |
| fc.gpu.hopper.1 | 16 cores | 64 to 96 GB (65536 to 98304 MB) | 4 GB (4096 MB) |
| fc.gpu.hopper.1 | 24 cores | 96 to 120 GB (98304 to 122880 MB) | 4 GB (4096 MB) |
| fc.gpu.hopper.2 | 4 cores | 16 to 32 GB (16384 to 32768 MB) | 4 GB (4096 MB) |
| fc.gpu.hopper.2 | 8 cores | 32 to 64 GB (32768 to 65536 MB) | 4 GB (4096 MB) |
| fc.gpu.hopper.2 | 16 cores | 64 to 128 GB (65536 to 131072 MB) | 4 GB (4096 MB) |
| fc.gpu.hopper.2 | 24 cores | 96 to 248 GB (98304 to 253952 MB) | 4 GB (4096 MB) |
| fc.gpu.blackwell.1 | 4 cores | 16 to 32 GB (16384 to 32768 MB) | 4 GB (4096 MB) |
| fc.gpu.blackwell.1 | 8 cores | 32 to 64 GB (32768 to 65536 MB) | 4 GB (4096 MB) |
| fc.gpu.blackwell.1 | 16 cores | 64 to 120 GB (65536 to 122880 MB) | 4 GB (4096 MB) |
| fc.gpu.blackwell.1 | 24 cores | 96 to 184 GB (98304 to 188416 MB) | 4 GB (4096 MB) |
| fc.gpu.xpu.1 | 4 cores | 16 to 48 GB (16384 to 49152 MB) | 4 GB (4096 MB) |
| fc.gpu.xpu.1 | 8 cores | 32 to 96 GB (32768 to 98304 MB) | 4 GB (4096 MB) |
| fc.gpu.xpu.1 | 12 cores | 48 to 120 GB (49152 to 122880 MB) | 4 GB (4096 MB) |
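The multi-card resource formula in the note above can be sketched directly:

```python
def multi_card_resources(vcpus_per_card: int,
                         mem_gb_per_card: int,
                         cards: int) -> tuple[int, int]:
    """Total vCPUs = vCPUs per card x number of cards;
    total memory = memory per card x number of cards."""
    return vcpus_per_card * cards, mem_gb_per_card * cards

# Example: an fc.gpu.tesla.1 configuration with 4 cards,
# 8 vCPUs and 16 GB of memory per card (illustrative values).
total_vcpus, total_mem_gb = multi_card_resources(8, 16, 4)
print(total_vcpus, total_mem_gb)  # 32 64
```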
GPU-accelerated instances also support the following resource specifications.
| Image size (GB) | Maximum function execution duration (s) | Disk size (GB) | Maximum bandwidth (Gbps) |
| --- | --- | --- | --- |
| ACR Enterprise Edition (Standard, Premium, or Basic): 15; ACR Personal Edition (Free): 15 | 86400 | 512 MB, or 10 GB to 200 GB in 10 GB increments | 5 |
Note: Setting the instance type to g1 is equivalent to setting it to fc.gpu.tesla.1.
Tesla series GPU-accelerated instances are supported in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Japan (Tokyo), US (Virginia), and Singapore.
Ada series GPU-accelerated instances are supported in the following regions: China (Beijing), China (Hangzhou), China (Shanghai), China (Shenzhen), Singapore, and US (Virginia).
Relationship between GPU instance specifications and instance concurrency
An Ada.1 GPU has 48 GB of memory, and a Tesla series GPU has 16 GB of memory. Function Compute allocates the full memory of a GPU card to a single GPU container. Because the default GPU card quota is a maximum of 30 per region, a maximum of 30 GPU containers can run simultaneously in that region.
If the instance concurrency of a GPU function is 1, the function can process up to 30 inference requests concurrently in a region.
If the instance concurrency of a GPU function is 5, the function can process up to 150 inference requests concurrently in a region.
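Both examples follow the same calculation; a minimal sketch:

```python
def max_region_concurrency(gpu_card_quota: int,
                           instance_concurrency: int) -> int:
    """Maximum concurrent inference requests in a region.

    Each GPU container is allocated a full GPU card, so the regional
    card quota caps the number of simultaneously running containers.
    """
    return gpu_card_quota * instance_concurrency

# With the default quota of 30 cards per region:
print(max_region_concurrency(30, 1))  # 30
print(max_region_concurrency(30, 5))  # 150
```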
Single-instance concurrency
To improve resource utilization, you can configure single-instance concurrency based on your application's resource requirements. In this configuration, multiple tasks can run on a single instance and share CPU and memory resources, which improves overall resource utilization. For more information, see Configure instance concurrency.
Execution duration for single-instance, single-concurrency
When an instance executes a single request, the execution duration is measured from when the request arrives at the instance to when the request execution is complete.
Execution duration for single-instance, multiple-concurrency
When an instance executes multiple requests concurrently, the execution duration is measured from the time the first request arrives at the instance to the time the last request is completed. This resource reuse helps save costs.
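The two measurement rules can be illustrated with per-request arrival and completion times; this is a sketch of the measurement window, not an official billing formula.

```python
def execution_duration(requests: list[tuple[float, float]]) -> float:
    """requests: (arrival_s, completion_s) pairs handled by one instance.

    With multiple concurrent requests, the duration runs from the first
    arrival to the last completion; with a single request, this reduces
    to that request's own span.
    """
    first_arrival = min(arrival for arrival, _ in requests)
    last_completion = max(done for _, done in requests)
    return last_completion - first_arrival

# Single request: measured from its arrival to its completion.
print(execution_duration([(0.0, 2.0)]))                        # 2.0
# Three overlapping requests share one measurement window.
print(execution_duration([(0.0, 2.0), (0.5, 2.5), (1.0, 1.5)]))  # 2.5
```

Three overlapping requests are billed for one 2.5 s window rather than three separate spans, which is the cost saving from resource reuse.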
References
For more information about the billing methods and billable items of Function Compute, see Billing overview.
When you use an API to create a function, you can use the instanceType parameter to specify the instance type. For more information, see CreateFunction. To learn how to specify the instance type and specifications in the console, see Function creation.
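As an illustration, a CreateFunction request body might carry the instance type as follows. Only the instanceType parameter is taken from this document; every other field name and all values are placeholders, so verify them against the CreateFunction API reference before use.

```python
import json

# Minimal sketch of a CreateFunction request body.
# "instanceType" is the parameter described above; the remaining
# fields and all values are illustrative placeholders.
payload = {
    "functionName": "my-gpu-inference",  # placeholder name
    "runtime": "custom-container",       # placeholder runtime
    "instanceType": "fc.gpu.tesla.1",    # equivalent to the legacy g1 value
    "instanceConcurrency": 5,            # placeholder concurrency
}

print(json.dumps(payload, indent=2))
```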