Function Compute: Instance types and usage modes

Last Updated: Feb 07, 2025

Function Compute provides CPU instances and GPU-accelerated instances. Both instance types can be used in on-demand and provisioned modes. On-demand instances are billed based on actual execution durations. You can use on-demand instances together with the instance concurrency feature to improve resource utilization. Billing of a provisioned instance begins when Function Compute starts the instance and ends when you release the instance. Provisioned instances can effectively mitigate cold starts. This topic describes the types, usage modes, billing methods, and specifications of function instances in Function Compute.

Instance types

  • CPU instances: the basic instance type of Function Compute. CPU instances are suitable for scenarios with traffic spikes or compute-intensive workloads.

  • GPU-accelerated instances: instances that use GPUs, such as Tesla series (Turing architecture) and Ada series GPUs, for acceleration. GPU-accelerated instances are mainly used to process audio and video files, AI workloads, and images. Instances of this type accelerate business by offloading computation to GPU hardware.

    For more information about the best practices for GPU-accelerated instances in different scenarios, see the corresponding best practice topics.

    Important
    • GPU-accelerated instances can only be deployed using container images.

    • When you use GPU-accelerated instances, you can join the DingTalk group (group ID: 64970014484) and provide the following information for technical support:

      • Your organization name, such as your company name.

      • The ID of your Alibaba Cloud account.

      • The region in which you want to use GPU-accelerated instances. Example: China (Shenzhen).

      • Your contact information, such as your mobile number, email address, or DingTalk account.

Instance modes

Both CPU instances and GPU-accelerated instances support on-demand mode and provisioned mode. This section describes the two modes.

On-demand mode

Introduction

On-demand instances are allocated and released by Function Compute. Function Compute automatically adjusts the number of instances in response to the volume of function invocations it receives: it creates instances when invocations increase and releases excess instances when invocations decrease. In other words, the creation of on-demand instances is triggered by requests. On-demand instances are destroyed if no requests are submitted for processing for a period of time (usually 3 to 5 minutes). The first request to a newly created on-demand instance must wait for the cold start of the instance to complete.

By default, each Alibaba Cloud account can run up to 100 instances in a region. The actual quota displayed on the General Quotas page in the Quota Center console prevails. You can also apply for a quota adjustment in the Quota Center console.

Billing method

The billing duration for an on-demand instance starts when a request is received and continues until the request has been completely processed. Each on-demand instance can process one or more requests at a time. For more information, see Configure instance concurrency.

No instances are allocated if no requests are submitted for processing, and therefore no fees are generated. You are charged only when your function is invoked. For more information about pricing and billing, see Billing overview.

Note

You can use the instance concurrency feature based on your business requirements to improve resource utilization. With this feature, multiple requests that are executed on one instance at the same time share the instance's CPU and memory resources, which improves resource utilization.

Instance concurrency = 1

In on-demand mode, the billing duration starts when a request arrives at an instance and ends when the request is completely processed.


Instance concurrency > 1

In this case, the measurement of an on-demand instance's execution duration starts when the first request is received and ends when the last request is completely processed. The instance concurrency feature reuses resources, helping save costs.

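To make these two measurement rules concrete, the following Python sketch compares the billed duration of on-demand instances under both settings. The request arrival and completion times are hypothetical.

```python
# Illustrative only: how billed duration is measured for on-demand instances.
# Each tuple is a hypothetical (arrival, completion) time in seconds.
requests = [(0.0, 2.0), (1.0, 3.5), (3.0, 4.0)]

# Instance concurrency = 1: overlapping requests run on separate instances,
# so the total billed duration is the sum of the per-request durations.
billed_sequential = sum(end - start for start, end in requests)

# Instance concurrency > 1: the requests share one instance, so billing runs
# from the first request's arrival to the last request's completion.
billed_concurrent = max(end for _, end in requests) - min(start for start, _ in requests)

print(billed_sequential)  # 5.5 instance-seconds across three instances
print(billed_concurrent)  # 4.0 instance-seconds on one shared instance
```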

Provisioned mode

Introduction

In provisioned mode, you control the allocation and release of function instances. Provisioned instances are retained until you release them. Invocation requests are preferentially distributed to provisioned instances. If the provisioned instances cannot process all requests, Function Compute allocates on-demand instances to process the excess requests. For more information about how to delete a provisioned instance, see Configure auto scaling rules.

Note

Provisioned instances help mitigate cold starts. You can specify a fixed number of provisioned instances based on your business budget. You can also configure scheduled auto scaling policies based on your service's traffic patterns, or use threshold-based scaling when your service does not exhibit distinct traffic patterns. Once provisioned instances are in use, the average cold start latency drops significantly.
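As a rough illustration of threshold-based scaling, the following Python sketch computes a desired provisioned-instance count from an observed concurrency level and a target utilization. The formula and numbers are illustrative assumptions, not Function Compute's internal algorithm.

```python
import math

def desired_provisioned_instances(observed_concurrency: float,
                                  target_utilization: float,
                                  min_instances: int,
                                  max_instances: int) -> int:
    """Illustrative target-tracking rule: keep observed load near the target
    utilization, clamped to the configured bounds."""
    desired = math.ceil(observed_concurrency / target_utilization)
    return max(min_instances, min(max_instances, desired))

# Example: 45 concurrent requests, 60% target utilization, bounds [2, 100].
print(desired_provisioned_instances(45, 0.6, 2, 100))  # 75
```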

Idle mode

CPU instances

The states of CPU instances are classified into the active state and the idle state based on whether vCPU resources are allocated to the instances. By default, the idle mode feature is enabled.

  • Active instances

    Instances are considered active if they are processing requests or if the idle mode feature is disabled for them. If you disable the idle mode feature, vCPUs are allocated to provisioned instances regardless of whether the instances are processing requests or not. This way, the instances are considered active at all times and can therefore continue processing background tasks.

  • Idle instances

    Provisioned instances for which the idle mode feature is enabled enter the idle state when they are not processing requests. Function Compute freezes the vCPUs of the instances when they are not processing requests. Instances in the idle state incur no charges, which saves costs. If a PreFreeze hook is configured for an instance, the instance enters the idle state after the PreFreeze hook is executed. Otherwise, the instance immediately enters the idle state when it finishes processing requests. For more information about instance states, see Function instance lifecycle.
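For illustration, the following is a minimal PreFreeze hook sketch for a Python runtime. It assumes the hook is registered in the function's instance lifecycle configuration (for example, as index.pre_freeze) and that lifecycle hooks receive only a context argument; check the Function instance lifecycle topic for the exact signature. flush_pending_work is a hypothetical helper.

```python
import logging

def pre_freeze(context):
    # Runs before the instance's vCPUs are frozen, so buffered work can be
    # flushed instead of being stalled while the instance is idle.
    logging.getLogger().info("PreFreeze: flushing buffered work before idle freeze")
    # flush_pending_work()  # hypothetical helper from your own code
```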

You can choose whether to enable the idle mode feature based on your business requirements.

  • Costs

    If you want to use provisioned instances to mitigate cold starts while keeping costs down, we recommend that you enable the idle mode feature. Not only does this feature minimize cold starts, it also lets you pay only for the memory and disk resources that provisioned instances consume once they enter the idle state.

  • Background tasks

    If your function needs to run background tasks, we recommend that you do not enable the idle mode feature. The following items provide example scenarios:

    • Some application frameworks rely on built-in schedulers or background features. Some dependent middleware needs to regularly report heartbeats.

    • Some asynchronous operations are performed by using goroutines in Go, asynchronous functions in Node.js, or asynchronous threads in Java. The sketch after this list shows how such a background task can stall on an idle instance.
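The Python sketch below shows the kind of background task that breaks under idle mode: a daemon thread that reports heartbeats. The thread body, interval, and print call are illustrative stand-ins for real middleware calls. While an idle instance's vCPUs are frozen, the loop makes no progress between requests, so the middleware may consider the client offline.

```python
import threading
import time

def report_heartbeat():
    # Hypothetical middleware heartbeat loop. While an idle-mode instance's
    # vCPUs are frozen, this loop makes no progress between requests.
    while True:
        print("heartbeat")  # stands in for a real report to middleware
        time.sleep(30)

# Started once per instance, for example during initialization.
threading.Thread(target=report_heartbeat, daemon=True).start()
```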

GPU-accelerated instances

The states of GPU-accelerated instances are classified into the active state and the idle state based on whether GPU resources are allocated to the instances. By default, the idle mode feature is enabled.

  • Active instances

    Instances are considered active if they are processing requests or if the idle mode feature is disabled for them. If you disable the idle mode feature, GPU resources remain allocated to provisioned instances at all times.

  • Idle instances

    Provisioned instances for which the idle mode feature is enabled enter the idle state when they are not processing requests. Function Compute freezes the GPUs of such instances while they are idle.

Billing method

  • Active instances

    The billing of provisioned instances starts when they are created and ends when they are released. Provisioned instances are requested and released by you. Therefore, until you release them, they continue to incur charges. Additionally, if the idle mode feature is not enabled, the charges are based on the unit prices of active instances at all times.

  • Idle instances

    If the idle mode feature is enabled, the provisioned instances enter the idle state when they are not processing requests. The prices of idle instances are much lower than those of active instances. For more information, see Conversion factors.
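The following Python sketch shows the shape of the savings with hypothetical unit prices; see the Billing overview and Conversion factors topics for the actual prices in your region.

```python
# HYPOTHETICAL unit prices, for illustration only.
active_price_per_second = 0.000110  # vCPU + memory + disk while active
idle_price_per_second = 0.000020    # memory + disk only while idle

busy_seconds = 600    # time spent actively processing requests in one hour
idle_seconds = 3000   # time spent provisioned but idle in the same hour

without_idle_mode = (busy_seconds + idle_seconds) * active_price_per_second
with_idle_mode = (busy_seconds * active_price_per_second
                  + idle_seconds * idle_price_per_second)

print(f"idle mode disabled: {without_idle_mode:.4f}")  # 0.3960
print(f"idle mode enabled:  {with_idle_mode:.4f}")     # 0.1260
```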

Instance specifications

  • CPU instances

    The following table describes the specifications of CPU instances. Configure your instances as needed. (A validation sketch follows this list.)

    | vCPU | Memory size (MB) | Maximum code package size (GB) | Maximum function execution duration (seconds) | Maximum disk size | Maximum bandwidth (Gbit/s) |
    | --- | --- | --- | --- | --- | --- |
    | 0.05 to 16. The value must be a multiple of 0.05. | 128 to 32768. The value must be a multiple of 64. | 10 | 86400 | 512 MB (default) or 10 GB | 5 |

    Note

    The ratio of vCPU to memory capacity (in GB) must be from 1:1 to 1:4.

  • GPU-accelerated instances

    The following table describes the specifications of GPU-accelerated instances. Configure your instances as needed.

    Note

    An fc.gpu.tesla.1 GPU-accelerated instance offers similar performance to an instance that uses NVIDIA T4 GPUs.

    | Instance type | Full GPU memory (GB) | FP16 computing power (TFLOPS) | FP32 computing power (TFLOPS) | vGPU memory (MB) | vGPU computing power | vCPU | Memory size (MB) | On-demand mode | Regular provisioned mode | Idle provisioned mode |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | fc.gpu.tesla.1 | 16 | 65 | 8 | 1024 to 16384. The value must be a multiple of 1024. | vGPU memory (GB)/16 × full-GPU computing power. The computing power is automatically allocated by Function Compute. | 0.05 to [vGPU memory (GB)/2]. The value must be a multiple of 0.05. | 128 to [vGPU memory (GB) × 2048]. The value must be a multiple of 64. | Supported | Supported | Supported |
    | fc.gpu.ada.1 | 48 | 119 | 60 | 49152. Only the 48 GB vGPU memory specification is supported. | The computing power of a full GPU is allocated by default. The computing power is automatically allocated by Function Compute. | 8. Only the 8-vCPU specification is supported. | 65536. Only the 64 GB memory specification is supported. | Not supported | Supported | Supported |

    For example, if you set the vGPU memory of an fc.gpu.tesla.1 instance to 5 GB, the maximum available vGPU computing power is 5/16 × the full-GPU computing power. For more information about valid vCPU and memory combinations, see GPU specifications.

  • The GPU-accelerated instances of Function Compute also support the following resource specifications.

    | Image size (GB) | Maximum function execution duration (seconds) | Maximum disk size (GB) | Maximum bandwidth (Gbit/s) |
    | --- | --- | --- | --- |
    | 15 for all supported Container Registry editions: Enterprise Edition (Standard, Advanced, or Basic Edition) and Personal Edition (free) | 86400 | 10 | 5 |

    Note
    • Setting the instance type to g1 achieves the same effect as setting the instance type to fc.gpu.tesla.1.

    • GPU-accelerated instances of Tesla series GPUs are supported in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Japan (Tokyo), US (Virginia), and Singapore.

    • GPU-accelerated instances of Ada series GPUs are supported in the following regions: China (Beijing), China (Hangzhou), China (Shanghai), and China (Shenzhen).
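To tie the CPU instance constraints above together, the following Python sketch validates a CPU instance configuration against the table's rules. It is an illustrative check, not an official API.

```python
def is_valid_cpu_spec(vcpu: float, memory_mb: int) -> bool:
    """Check a CPU instance configuration against the documented limits.
    Works in hundredths of a vCPU to avoid float rounding issues."""
    centi_vcpu = round(vcpu * 100)
    if not (5 <= centi_vcpu <= 1600 and centi_vcpu % 5 == 0):
        return False  # vCPU: 0.05 to 16, in multiples of 0.05
    if not (128 <= memory_mb <= 32768 and memory_mb % 64 == 0):
        return False  # memory: 128 to 32768 MB, in multiples of 64
    ratio = (memory_mb / 1024) / vcpu
    return 1 <= ratio <= 4  # vCPU-to-memory (GB) ratio from 1:1 to 1:4

print(is_valid_cpu_spec(2, 4096))   # True: 2 vCPUs with 4 GB is a 1:2 ratio
print(is_valid_cpu_spec(2, 16384))  # False: a 1:8 ratio is out of range
```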

GPU specifications

The following table describes the valid specification combinations of fc.gpu.tesla.1.

| vGPU memory (MB) | vCPU | Maximum memory size (GB) | Memory size (MB) |
| --- | --- | --- | --- |
| 1024 | 0.05–0.5 | 2 | 128–2048 |
| 2048 | 0.05–1 | 4 | 128–4096 |
| 3072 | 0.05–1.5 | 6 | 128–6144 |
| 4096 | 0.05–2 | 8 | 128–8192 |
| 5120 | 0.05–2.5 | 10 | 128–10240 |
| 6144 | 0.05–3 | 12 | 128–12288 |
| 7168 | 0.05–3.5 | 14 | 128–14336 |
| 8192 | 0.05–4 | 16 | 128–16384 |
| 9216 | 0.05–4.5 | 18 | 128–18432 |
| 10240 | 0.05–5 | 20 | 128–20480 |
| 11264 | 0.05–5.5 | 22 | 128–22528 |
| 12288 | 0.05–6 | 24 | 128–24576 |
| 13312 | 0.05–6.5 | 26 | 128–26624 |
| 14336 | 0.05–7 | 28 | 128–28672 |
| 15360 | 0.05–7.5 | 30 | 128–30720 |
| 16384 | 0.05–8 | 32 | 128–32768 |
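The rows of this table follow directly from the formulas in the specification table above. The following Python sketch (illustrative) derives the bounds for any valid vGPU memory value:

```python
FULL_GPU_MEMORY_GB = 16
FULL_GPU_FP16_TFLOPS = 65  # full-GPU FP16 computing power of fc.gpu.tesla.1

def tesla1_bounds(vgpu_memory_mb: int) -> dict:
    """Derive fc.gpu.tesla.1 resource bounds from the vGPU memory."""
    assert 1024 <= vgpu_memory_mb <= 16384 and vgpu_memory_mb % 1024 == 0
    vgpu_memory_gb = vgpu_memory_mb // 1024
    return {
        # vGPU computing power = vGPU memory (GB) / 16 x full-GPU power
        "fp16_tflops": vgpu_memory_gb / FULL_GPU_MEMORY_GB * FULL_GPU_FP16_TFLOPS,
        "max_vcpu": vgpu_memory_gb / 2,          # vCPU: 0.05 to vGPU memory (GB) / 2
        "max_memory_mb": vgpu_memory_gb * 2048,  # memory: 128 to vGPU memory (GB) x 2048
    }

print(tesla1_bounds(5120))
# {'fp16_tflops': 20.3125, 'max_vcpu': 2.5, 'max_memory_mb': 10240}
```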

Relationship between vGPU memory and region-level instance concurrency

  • A Tesla series GPU has a total memory capacity of 16 GB. If you set the vGPU memory to 1 GB, you can run 16 GPU containers simultaneously on one GPU of this series. By default, the total number of GPUs in a region is limited to 30. Therefore, at any given time, a maximum of 480 Tesla series GPU containers can run within a region.

    • If you set the instance concurrency of your GPU function to 1, a maximum of 480 inference requests can be concurrently processed by your function in a region.

    • If you set the instance concurrency of your GPU function to 5, a maximum of 2,400 inference requests can be concurrently processed by your function in a region.

  • An Ada series GPU has a total memory capacity of 48 GB, and can carry only one GPU container (the vGPU memory can only be set to 48 GB). By default, the total number of GPUs in a region is limited to 30. Therefore, at any given time, a maximum of 30 Ada series GPU containers can run within a region.

    • If you set the instance concurrency of your GPU function to 1, a maximum of 30 inference requests can be concurrently processed by your function in a region.

    • If you set the instance concurrency of your GPU function to 5, a maximum of 150 inference requests can be concurrently processed by your function in a region.
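The following Python sketch reproduces this arithmetic for both GPU series, using the default region quota of 30 GPUs.

```python
def max_concurrent_requests(gpu_memory_gb: int, vgpu_memory_gb: int,
                            gpus_per_region: int, instance_concurrency: int) -> int:
    """Region-level concurrency: containers per GPU x GPUs x per-instance concurrency."""
    containers_per_gpu = gpu_memory_gb // vgpu_memory_gb
    return containers_per_gpu * gpus_per_region * instance_concurrency

# Tesla series: 16 GB GPUs sliced into 1 GB vGPU containers, 30 GPUs per region.
print(max_concurrent_requests(16, 1, 30, 1))   # 480
print(max_concurrent_requests(16, 1, 30, 5))   # 2400
# Ada series: one 48 GB container per 48 GB GPU, 30 GPUs per region.
print(max_concurrent_requests(48, 48, 30, 1))  # 30
print(max_concurrent_requests(48, 48, 30, 5))  # 150
```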

References

  • You can enable the idle mode feature when you configure auto scaling rules. For more information, see Configure auto scaling rules.

  • For more information about the billing methods and billable items of Function Compute, see Billing overview.

  • When you call an API operation to create a function, you can use the instanceType parameter to specify the instance type. For more information, see CreateFunction.

  • For more information about how to specify the type and specifications of an instance in the Function Compute console, see Manage functions.