
Function Compute:Instance types and specifications

Last Updated: Mar 21, 2026

Basic CPU instances are usually sufficient for general-purpose computing scenarios on Function Compute, such as web services and data processing. However, for scenarios that require large-scale parallel computing or deep learning, such as audio and video processing, artificial intelligence (AI) inference, and image editing, GPU-accelerated instances can significantly improve computing efficiency.

Function Compute offers three instance types for GPU workloads: Elastic Instance, Provisioned Instance, and Mixed Mode. Choose the type and specifications that fit your needs to balance resource utilization, performance, and stability.

Instance type selection

For CPU functions, only Elastic Instances are supported. For GPU Functions, you can choose the most suitable instance type based on your business's resource utilization, latency sensitivity, and cost stability requirements. You can switch between the three instance types at any time without service interruption.

Note

You can bind Provisioned Instances only to GPU Functions that belong to the Ada, Ada.2, Ada.3, Hopper, or Xpu.1 series.

Elastic Instance

If you set the minimum number of instances for a function to 0, instances automatically scale based on the request volume and are released when there are no requests. This enables a pay-as-you-go model in which you are charged only for what you use, maximizing cost savings. Higher request frequency leads to better resource utilization and greater cost savings compared to virtual machines.

Cold start behavior

Yes, cold starts can occur. For latency-sensitive workloads, you can mitigate cold starts by setting the minimum number of instances to 1 or higher. This pre-allocates elastic resources, allowing instances to be activated quickly to handle incoming requests.

Billing (Pay-as-you-go)

Function costs include charges for active Elastic Instances and Elastic Instances in Shallow Hibernation. If you set the minimum number of instances to 1 or more, we recommend enabling the Shallow Hibernation mode. In this state, you are not charged for vCPU resources, and GPU resources are billed at only one-fifth of the active rate, significantly lowering costs compared to active elastic instances.

For more information about the use cases for active and Shallow Hibernation states, see Elastic Instance.
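The one-fifth GPU rate for Shallow Hibernation can be illustrated with a short cost comparison. The unit price below is a placeholder, not a real rate; see Billing overview for actual prices.

```python
# Hypothetical unit price for illustration only -- not a real rate.
GPU_PRICE_PER_HOUR = 10.0

def daily_gpu_cost(active_hours: float, hibernated_hours: float) -> float:
    """Active GPU time bills at the full rate; Shallow Hibernation bills
    GPU resources at one-fifth of the active rate (vCPU is not charged)."""
    return (active_hours * GPU_PRICE_PER_HOUR
            + hibernated_hours * GPU_PRICE_PER_HOUR / 5)

# An instance that is busy 2 hours and hibernated 22 hours in a day
# costs far less than one kept active all 24 hours:
print(daily_gpu_cost(2, 22))  # 64.0
print(daily_gpu_cost(24, 0))  # 240.0
```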

Provisioned Instance

This instance type applies only to GPU Functions. You purchase a Provisioned Resource Pool in advance and then allocate a specific number and type of provisioned instances to your function. This approach provides predictable, fixed costs and is ideal for workloads with high resource utilization, strict latency requirements, or stable billing requirements.

After purchasing a monthly provisioned resource pool, the platform allocates a certain quota of boost instances in addition to your subscription-based provisioned instances. This boost instance quota is not billed.

Cold start behavior

No, there are no cold starts. When you use Provisioned Instances, requests within your allocated capacity are served in real time. The maximum number of concurrent requests a function can handle is calculated as: (Number of allocated Provisioned Instances × Instance concurrency) + Boost instance quota. Requests that exceed this limit are throttled.
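The throttling threshold described above can be sketched as a short calculation. This is an illustrative helper, not a platform API; it also assumes the boost instance quota is expressed in request slots, which the text does not specify.

```python
def max_concurrent_requests(provisioned_instances: int,
                            instance_concurrency: int,
                            boost_quota: int) -> int:
    """Requests above this threshold are throttled:
    provisioned instances x instance concurrency + boost instance quota."""
    return provisioned_instances * instance_concurrency + boost_quota

# Example: 4 provisioned instances with a per-instance concurrency of 5
# and a boost quota of 2 request slots (illustrative numbers).
print(max_concurrent_requests(4, 5, 2))  # 22
```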

Billing (Subscription)

The function cost is the total subscription fee for all purchased provisioned resource pools. The boost instance quota is not billed.

Provisioned Instance and Elastic Instance (Mixed Mode)

This mode applies only to GPU Functions. It combines the benefits of Provisioned Instances and Elastic Instances, making it ideal for workloads with significant traffic fluctuations. The system first uses the provisioned resource pool to handle steady-state traffic. When requests exceed the capacity of the provisioned pool, the system automatically scales out by launching elastic instances. This approach guarantees stable baseline capacity while effectively managing sudden traffic bursts.

Cold start behavior

Partially. Requests handled by the provisioned resource pool (up to the minimum number of instances) are processed in real time with no cold starts. However, when traffic triggers auto-scaling and new elastic instances are launched, those new instances experience a cold start.

Billing

The cost in Mixed Mode consists of both subscription and pay-as-you-go components:

  • Provisioned portion: Billed against your purchased Provisioned Resource Pool quota.

  • Elastic portion: Instances launched beyond the provisioned quota are billed on a pay-as-you-go basis, with the same rates as active and shallow hibernation elastic instances.
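The split between the two billing components follows from how Mixed Mode routes traffic: steady-state requests go to the provisioned pool, and only the overflow runs on elastic instances. The helper below is an illustrative sketch of that routing, not platform code.

```python
def split_traffic(concurrent_requests: int,
                  provisioned_capacity: int) -> tuple[int, int]:
    """Return (requests served by the provisioned pool,
    requests that spill over to pay-as-you-go elastic instances)."""
    provisioned = min(concurrent_requests, provisioned_capacity)
    elastic = max(0, concurrent_requests - provisioned_capacity)
    return provisioned, elastic

print(split_traffic(8, 10))   # (8, 0)   -- fully inside the provisioned pool
print(split_traffic(25, 10))  # (10, 15) -- 15 requests scale out to elastic
```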

Instance specifications

  • CPU instances

    | vCPU (core) | Memory size (MB) | Maximum code package size (GB) | Maximum function execution duration (s) | Maximum disk size (GB) | Maximum bandwidth (Gbps) |
    | --- | --- | --- | --- | --- | --- |
    | 0.05 to 16, in multiples of 0.05 | 128 to 32768, in multiples of 64 | 10 | 86400 | 512 MB (default) or 10 GB | 5 |

    Note

    The ratio of vCPUs to memory size (in GB) must be between 1:1 and 1:4.
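The CPU-instance constraints above can be checked programmatically. This validator is an illustrative sketch of the documented rules, not an official SDK helper.

```python
def is_valid_cpu_config(vcpu: float, memory_mb: int) -> bool:
    """Check a (vCPU, memory) pair against the documented CPU limits:
    vCPU in [0.05, 16] in steps of 0.05, memory in [128, 32768] MB in
    steps of 64 MB, and a vCPU-to-memory(GB) ratio between 1:1 and 1:4."""
    centi = round(vcpu * 100)                      # avoid float-precision issues
    if centi % 5 != 0 or not 5 <= centi <= 1600:   # multiple of 0.05, 0.05-16
        return False
    if memory_mb % 64 != 0 or not 128 <= memory_mb <= 32768:
        return False
    ratio = (memory_mb / 1024) / vcpu              # memory (GB) per vCPU
    return 1 <= ratio <= 4

print(is_valid_cpu_config(2, 4096))  # True  (2 vCPU : 4 GB = 1:2)
print(is_valid_cpu_config(1, 8192))  # False (1 vCPU : 8 GB exceeds 1:4)
```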

  • GPU instance hardware specifications

    | Instance type | GPU memory | FP16 computing power | FP32 computing power | Max cards per instance |
    | --- | --- | --- | --- | --- |
    | fc.gpu.tesla.1 | 16 GB | 65 TFLOPS | 8 TFLOPS | 4 |
    | fc.gpu.ampere.1 | 24 GB | 125 TFLOPS | 31.2 TFLOPS | 8 |
    | fc.gpu.ada.1 | 48 GB | 119 TFLOPS | 60 TFLOPS | 8 |
    | fc.gpu.ada.2 | 24 GB | 166 TFLOPS | 83 TFLOPS | 8 |
    | fc.gpu.ada.3 | 48 GB | 148 TFLOPS | 73.5 TFLOPS | 8 |
    | fc.gpu.hopper.1 | 96 GB | 148 TFLOPS | 44 TFLOPS | 8 |
    | fc.gpu.hopper.2 | 141 GB | 148 TFLOPS | 44 TFLOPS | 8 |
    | fc.gpu.blackwell.1 | 32 GB | 104.8 TFLOPS | 104.8 TFLOPS | 8 |
    | fc.gpu.xpu.1 | 96 GB | 123 TFLOPS | 61.5 TFLOPS | 16 |

  • vCPU and memory configuration for GPU instances

    Note

    Multi-card resource formula: Total vCPUs = vCPUs per card × Number of cards; Total memory = Memory per card × Number of cards. The memory increment is 4 GB (4096 MB).

    | Instance type | vCPU (per card) | Memory range (per card) |
    | --- | --- | --- |
    | fc.gpu.tesla.1 | 4 cores | 4 to 16 GB (4096 to 16384 MB) |
    | | 8 cores | 8 to 32 GB (8192 to 32768 MB) |
    | | 16 cores | 16 to 64 GB (16384 to 65536 MB) |
    | fc.gpu.ampere.1 | 8 cores | 8 to 32 GB (8192 to 32768 MB) |
    | | 16 cores | 16 to 64 GB (16384 to 65536 MB) |
    | fc.gpu.ada.1, fc.gpu.ada.2, fc.gpu.ada.3 | 4 cores | 16 to 32 GB (16384 to 32768 MB) |
    | | 8 cores | 32 to 64 GB (32768 to 65536 MB) |
    | | 16 cores | 64 to 120 GB (65536 to 122880 MB) |
    | fc.gpu.hopper.1 | 4 cores | 16 to 32 GB (16384 to 32768 MB) |
    | | 8 cores | 32 to 64 GB (32768 to 65536 MB) |
    | | 16 cores | 64 to 96 GB (65536 to 98304 MB) |
    | | 24 cores | 96 to 120 GB (98304 to 122880 MB) |
    | fc.gpu.hopper.2 | 4 cores | 16 to 32 GB (16384 to 32768 MB) |
    | | 8 cores | 32 to 64 GB (32768 to 65536 MB) |
    | | 16 cores | 64 to 128 GB (65536 to 131072 MB) |
    | | 24 cores | 96 to 248 GB (98304 to 253952 MB) |
    | fc.gpu.blackwell.1 | 4 cores | 16 to 32 GB (16384 to 32768 MB) |
    | | 8 cores | 32 to 64 GB (32768 to 65536 MB) |
    | | 16 cores | 64 to 120 GB (65536 to 122880 MB) |
    | | 24 cores | 96 to 184 GB (98304 to 188416 MB) |
    | fc.gpu.xpu.1 | 4 cores | 16 to 48 GB (16384 to 49152 MB) |
    | | 8 cores | 32 to 96 GB (32768 to 98304 MB) |
    | | 12 cores | 48 to 120 GB (49152 to 122880 MB) |
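The multi-card resource formula from the note above can be sketched as a small helper. It is illustrative only; the example numbers correspond to an fc.gpu.tesla.1 configuration.

```python
def multi_card_totals(vcpu_per_card: int, mem_mb_per_card: int,
                      cards: int) -> tuple[int, int]:
    """Return (total vCPUs, total memory in MB) for a multi-card instance:
    totals are simply the per-card values multiplied by the card count."""
    return vcpu_per_card * cards, mem_mb_per_card * cards

# Example: a 4-card fc.gpu.tesla.1 instance at 8 cores / 16 GB per card.
print(multi_card_totals(8, 16384, 4))  # (32, 65536)
```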

  • GPU-accelerated instances also support the following resource specifications.

    | Image size (GB) | Maximum function execution duration (s) | Disk size (GB) | Maximum bandwidth (Gbps) |
    | --- | --- | --- | --- |
    | ACR Enterprise Edition (Standard, Premium, or Basic): 15; ACR Personal Edition (Free): 15 | 86400 | 512 MB, or 10 GB to 200 GB in 10 GB increments | 5 |

    Note

    • Setting the instance type to g1 is equivalent to setting it to fc.gpu.tesla.1.

    • Tesla series GPU-accelerated instances are supported in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Japan (Tokyo), US (Virginia), and Singapore.

    • Ada series GPU-accelerated instances are supported in the following regions: China (Beijing), China (Hangzhou), China (Shanghai), China (Shenzhen), Singapore, and US (Virginia).

Relationship between GPU instance specifications and instance concurrency

An Ada.1 GPU has 48 GB of memory, and a Tesla series GPU has 16 GB of memory. Function Compute allocates the full memory of a GPU card to a single GPU container. Because the default GPU card quota is a maximum of 30 per region, a maximum of 30 GPU containers can run simultaneously in that region.

  • If the instance concurrency of a GPU function is 1, the function can process up to 30 inference requests concurrently in a region.

  • If the instance concurrency of a GPU function is 5, the function can process up to 150 inference requests concurrently in a region.
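The arithmetic behind both bullets is the same: the regional ceiling is the GPU card quota multiplied by the per-instance concurrency. This is an illustrative sketch, not a platform API.

```python
def region_max_requests(card_quota: int, instance_concurrency: int) -> int:
    """Maximum concurrent inference requests for a GPU function in a region,
    given one GPU card per container and a per-region card quota."""
    return card_quota * instance_concurrency

print(region_max_requests(30, 1))  # 30
print(region_max_requests(30, 5))  # 150
```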

Single-instance concurrency

To improve resource utilization, you can configure single-instance concurrency based on your application's resource requirements. In this configuration, multiple tasks can run on a single instance and share CPU and memory resources, which improves overall resource utilization. For more information, see Configure instance concurrency.

Execution duration for single-instance, single-concurrency

When an instance executes a single request, the execution duration is measured from when the request arrives at the instance to when the request execution is complete.


Execution duration for single-instance, multiple-concurrency

When an instance executes multiple requests concurrently, the execution duration is measured from the time the first request arrives at the instance to the time the last request is completed. This resource reuse helps save costs.
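The two measurement rules can be contrasted with a small worked example. The timestamps are illustrative (in seconds): with single-concurrency each request is measured separately, while with multi-concurrency one window spans from the first arrival to the last completion, so overlapping requests share it.

```python
# (arrival, completion) timestamps for 3 overlapping requests, in seconds.
requests = [(0, 4), (1, 5), (2, 6)]

# Single-instance, single-concurrency: each request measured on its own.
serial_duration = sum(end - start for start, end in requests)

# Single-instance, multi-concurrency: first arrival to last completion.
concurrent_duration = (max(end for _, end in requests)
                       - min(start for start, _ in requests))

print(serial_duration)      # 12
print(concurrent_duration)  # 6
```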


References

  • For more information about the billing methods and billable items of Function Compute, see Billing overview.

  • When you use an API to create a function, you can use the instanceType parameter to specify the instance type. For more information, see CreateFunction.

  • To learn how to specify the instance type and specifications in the console, see Function creation.
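As a rough sketch of where the instanceType parameter fits, the fragment below shows a minimal request body. Only instanceType is taken from this document; the other field names are hypothetical, so consult the CreateFunction API reference for the actual schema.

```python
# Hypothetical CreateFunction request fragment -- field names other than
# instanceType are illustrative placeholders, not the documented schema.
create_function_request = {
    "functionName": "image-inference",   # illustrative function name
    "instanceType": "fc.gpu.tesla.1",    # GPU instance type from the tables above
}

print(create_function_request["instanceType"])  # fc.gpu.tesla.1
```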