Usage modes and specifications of elastic instances and GPU-accelerated instances - Function Compute (2.0)

Function Compute provides elastic instances and GPU-accelerated instances. Both types of instances can be used in the on-demand mode and the provisioned mode. On-demand instances are billed based on actual execution duration. You can use on-demand instances together with the instance concurrency feature to improve resource utilization. Billing of a provisioned instance starts when Function Compute starts the provisioned instance and ends when you release the instance. Provisioned instances can effectively mitigate cold starts. This topic describes types, usage modes, billing methods, and specifications of function instances in Function Compute.

Instance types

Elastic instance: the basic instance type of Function Compute. Elastic instances are suitable for scenarios with bursty traffic or compute-intensive workloads.
GPU-accelerated instance: Instances that use the Ampere and Turing architectures for GPU acceleration. GPU-accelerated instances are mainly used in audio and video processing, Artificial Intelligence (AI), and image processing scenarios. Instances of this type accelerate business by offloading workloads to GPU hardware.
The following topics describe best practices for GPU-accelerated instances in different scenarios:
Important
- GPU-accelerated instances can be deployed only by using container images.
- For optimal user experience, join the DingTalk group 11721331 for technical support when you use GPU-accelerated instances. Provide the following information when you join the DingTalk group:
  - Your organization name, such as your company name.
  - The ID of your Alibaba Cloud account.
  - The region where you want to use GPU-accelerated instances. Example: China (Shenzhen).
  - Your contact information, such as your mobile number, email address, or DingTalk account.

Usage modes

GPU-accelerated instances and elastic instances support the on-demand mode and the provisioned mode. This section describes the two usage modes.

On-demand mode

Introduction

On-demand instances are allocated and released by Function Compute. Function Compute automatically scales instances based on the number of function invocations. Function Compute creates instances when the number of function invocations increases, and destroys excess instances when the number of function invocations decreases. On-demand instances are automatically created upon requests. On-demand instances are destroyed if no requests are sent for processing for a period of time (usually 3 to 5 minutes). The first time you invoke an on-demand instance, you must wait for the cold start of the instance to complete.

By default, a maximum of 300 on-demand instances can be created for an Alibaba Cloud account in each region. If you want to increase the upper limit, join the DingTalk group 11721331 for technical support.

Billing rules

Billing of an on-demand instance starts when requests are sent to the instance for processing and ends when the requests are processed. An instance can process one or more requests at a time. For more information, see Configure instance concurrency.

No instance is allocated if no request is submitted for processing, and therefore no fees are generated. In the on-demand mode, you are charged only when your function is invoked. For more information about pricing and billing, see Billing overview.

Note

You can use the instance concurrency feature based on your business requirements to improve resource utilization. In this case, multiple tasks preemptively share CPU and memory resources on your instance to improve resource utilization.

Instance concurrency = 1

Measurement of execution duration starts when requests arrive at an instance and ends when the requests are completely executed.

Instance concurrency > 1

Measurement of execution duration starts when the first request is submitted to an instance and ends when the last request is completely executed. You can reuse resources to concurrently process multiple requests. This way, resource costs can be reduced.

Provisioned mode

Introduction

In the provisioned mode, you can manage the allocation and release of function instances. Provisioned instances are retained unless you release them. Provisioned instances take precedence over on-demand instances. If provisioned instances are not enough to process all requests, Function Compute allocates on-demand instances to process the remaining requests. For more information about how to delete a provisioned instance, see Configure auto scaling rules.

Note

If cold starts are an issue for you, we recommend that you use provisioned instances. You can specify a fixed number of provisioned instances or configure a scheduled auto scaling policy or metric-based auto scaling policy based on factors such as your resource budget, traffic fluctuations of your business, and resource usage thresholds. The average cold start latency of instances is significantly reduced when provisioned instances are used.

Idle mode

Elastic instances

Elastic instances are classified into active instances and idle instances based on whether the instances are allocated vCPUs. By default, the idle mode feature is disabled.

Active instances
Instances are considered active if they are processing requests or if Idle Mode is disabled for them. If you disable Idle Mode, vCPUs are always allocated to provisioned instances regardless of whether the instances are processing requests or not. This way, provisioned instances are always active, even when no requests are being processed. Running of background tasks is not affected.
Idle instances
Provisioned instances for which Idle Mode is enabled are considered idle when they are not processing requests. Function Compute freezes vCPUs of provisioned instances when the instances are not processing requests. The instances enter the idle state. You are not charged for vCPU resources when the instances are in the idle state, which helps you save costs. If a PreFreeze hook is configured for an instance, the instance enters the idle state when the PreFreeze hook is executed. Otherwise, the instance immediately enters the idle state when it finishes processing requests. For more information about instance states, see Function instance lifecycle.

You can choose whether to enable the idle mode feature based on your business requirements.

Costs
If you want to use provisioned instances to eliminate cold starts and hope to save costs, we recommend that you enable the idle mode feature. This feature allows you to pay only for memory and disk resources of provisioned instances if provisioned instances are in the idle state, and requests can be responded without cold starts.
Background tasks
If your function needs to run background tasks, we recommend that you do not enable the idle mode feature. The following items provide example scenarios:
- Some application frameworks rely on the built-in scheduler or background features. Some dependent middleware needs to regularly report heartbeats.
- Some asynchronous operations are performed by using Goroutine lightweight threads that use Go, the async functions that use Node.js, or the asynchronous threads that use Java.

GPU-accelerated instances

States of GPU-accelerated instances can be classified into active and idle states based on whether they are allocated vCPU and GPU resources. By default, the idle mode feature is disabled for GPU-accelerated instances.

Active instances
Instances are considered active if they are processing requests or if Idle Mode is disabled for them. After you enable Idle Mode, Function Compute freezes vCPUs and GPUs of provisioned instances when the instances are not processing requests. The instances enter the idle state.
Idle instances
Provisioned instances for which Idle Mode is enabled are considered idle when they are not processing requests.
Note
To use the idle mode, your GPU-accelerated instance must be configured with 16 GB (T4) or 24 GB (A10) of GPU. You can join the DingTalk group 11721331 and request to use the idle mode feature.

Billing rules

Active instances
The billing of provisioned instances starts when the provisioned instances are created and ends when the provisioned instances are released, regardless of whether they have processed requests or not. For more information about the pricing and billing of Function Compute, see Billing overview.
Idle instances
- Elastic instances: You are not charged for vCPUs of instances in the idle state. For more information, see Billing overview.
- GPU instances: Refer to Billing overview.

Instance specifications

Elastic instances

The following table describes specifications of elastic instances. You can configure instance specifications based on your business requirements.

vCPUs

Memory size (MB)

Maximum code package size (GB)

Maximum function execution duration (seconds)

Maximum disk size (GB)

Maximum bandwidth (Gbit/s)

0.05 to 16.

The value must be a multiple of 0.05.

128 to 32768.

The value must be a multiple of 64.

86400

Valid values:

512 MB. This is the default value.
10 GB.

Note

The ratio of vCPU to memory capacity (in GB) is 1: N. N must be a value that ranges from 1 to 4.

GPU-accelerated instances

The following table describes specifications of GPU-accelerated instances. You can configure instance specifications based on your business requirements.

Instance specification

Card type

vGPU memory (MB)

vGPU computing power (card)

vCPUs

Memory size (MB)

fc.gpu.tesla.1

Tesla T4

Valid values: 1024 to 16384 (1 GB to 16 GB).

The value must be a multiple of 1,024 MB.

The value is calculated based on the following formula: vGPU memory (GB)/16. For example, if you set the vGPU memory to 5 GB, you can use up to 5/16 memory cards.

The computing power is automatically allocated by Function Compute and does not need to be manually allocated.

Valid values: 0.05 to the value of [vGPU memory (GB)/2].

The value must be a multiple of 0.05. For more information, see GPU specifications.

Valid values: 128 to the value of [vGPU memory (GB) x 2,048].

The value must be a multiple of 64. For more information, see GPU specifications.

fc.gpu.ampere.1

Ampere A10

Valid values: 1024 to 24576 (1 GB to 24 GB)

The value must be a multiple of 1,024 MB.

The value is calculated based on the following formula: vGPU memory (GB)/24. For example, if you set the vGPU memory to 5 GB, you can use up to 5/24 memory cards.

The computing power is automatically allocated by Function Compute and does not need to be manually allocated.

Valid values: 0.05 to the value of [vGPU memory (GB)/3].

The value must be a multiple of 0.05. For more information, see GPU specifications.

Valid values: 128 to the value of [vGPU memory (GB) x 4096)/3].

The value must be a multiple of 64. For more information, see GPU specifications.

GPU-accelerated instances of Function Compute also support the following resource specifications.

Image size (GB)

Maximum function execution duration (seconds)

Maximum disk size (GB)

Maximum bandwidth (Gbit/s)

Container Registry Enterprise Edition (Standard Edition): 10

Container Registry Enterprise Edition (Advanced Edition): 10

Container Registry Enterprise Edition (Basic Edition): 10

Container Registry Personal Edition (free): 10

86400

Note

Specifying the instance type as g1 achieves the same effect as selecting the fc.gpu.tesla.1 instance specification.
GPU-accelerated instances of the T4 type are supported in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Japan (Tokyo), US (Virginia), and Singapore.
GPU-accelerated instances of the A10 type are supported in the following regions: China (Hangzhou), China (Shanghai), Japan (Tokyo), and Singapore.

GPU specifications

Expand to view details of fc.gpu.tesla.1.

vGPU memory (MB)	vCPUs	Maximum memory size (GB)	Memory size (MB)
1024	0.05 to 0.5	2	128 to 2048
2048	0.05 to 1	4	128 to 4096
3072	0.05 to 1.5	6	128 to 6144
4096	0.05 to 2	8	128 to 8192
5120	0.05 to 2.5	10	128 to 10240
6144	0.05 to 3	12	128 to 12288
7168	0.05 to 3.5	14	128 to 14336
8192	0.05 to 4	16	128 to 16384
9216	0.05 to 4.5	18	128 to 18432
10240	0.05 to 5	20	128 to 20480
11264	0.05 to 5.5	22	128 to 22528
12288	0.05 to 6	24	128 to 24576
13312	0.05 to 6.5	26	128 to 26624
14336	0.05 to 7	28	128 to 28672
15360	0.05 to 7.5	30	128 to 30720
16384	0.05 to 8	32	128 to 32768

Expand to view details of fc.gpu.ampere.1.

vGPU memory (MB)	vCPUs	Maximum memory size (GB)	Memory size (MB)
1024	0.05 to 0.3	1.3125	128 to 1344
2048	0.05 to 0.65	2.625	128 to 2688
3072	0.05 to 1	4	128 to 4096
4096	0.05 to 1.3	5.3125	128 to 5440
5120	0.05 to 1.65	6.625	128 to 6784
6144	0.05 to 2	8	128 to 8192
7168	0.05 to 2.3	9.3125	128 to 9536
8192	0.05 to 2.65	10.625	128 to 10880
9216	0.05 to 3	12	128 to 12288
10240	0.05 to 3.3	13.3125	128 to 13632
11264	0.05 to 3.65	14.625	128 to 14976
12288	0.05 to 4	16	128 to 16384
13312	0.05 to 4.3	17.3125	128 to 17728
14336	0.05 to 4.65	18.625	128 to 19072
15360	0.05 to 5	20	128 to 20480
16384	0.05 to 5.3	21.3125	128 to 21824
17408	0.05 to 5.65	22.625	128 to 23168
18432	0.05 to 6	24	128 to 24576
19456	0.05 to 6.3	25.3125	128 to 25920
20480	0.05 to 6.65	26.625	128 to 27264
21504	0.05 to 7	28	128 to 28672
22528	0.05 to 7.3	29.3125	128 to 30016
23552	0.05 to 7.65	30.625	128 to 31360
24576	0.05 to 8	32	128 to 32768

Additional information

You can enable the idle mode feature when you configure auto scaling rules. For more information, see Configure auto scaling rules.
For more information about the billing methods and billable items of Function Compute, see Billing overview.
When you call an API operation to create a function, you can use the instanceType parameter to specify an instance type. For more information, see CreateFunction.
For more information about how to specify type and specifications of instances in the Function Compute console, see Manage functions.