Function Compute provides CPU instances and GPU-accelerated instances. Both instance types can be used in on-demand mode or provisioned mode. On-demand instances are billed based on actual execution duration. You can use on-demand instances together with the instance concurrency feature to improve resource utilization. Billing for a provisioned instance begins when Function Compute starts the instance and ends when you release the instance. Provisioned instances effectively mitigate cold starts. This topic describes the types, usage modes, billing methods, and specifications of function instances in Function Compute.
Instance types
CPU instances: the basic instance type of Function Compute. CPU instances are suitable for scenarios with traffic spikes or compute-intensive workloads.
GPU-accelerated instances: instances that use GPU hardware, such as the Turing and Ada architectures, for acceleration. GPU-accelerated instances are mainly used to process audio and video files, AI workloads, and images. Instances of this type accelerate workloads by offloading computation to GPU hardware.
For more information about the best practices for GPU-accelerated instances in different scenarios, see the best practice topics in the Function Compute documentation.
Important: GPU-accelerated instances can only be deployed using container images.
When you use GPU-accelerated instances, you can join the DingTalk group (group ID: 64970014484) and provide the following information for technical support:
Your organization name, such as your company name.
The ID of your Alibaba Cloud account.
The region in which you want to use GPU-accelerated instances. Example: China (Shenzhen).
Your contact information, such as your mobile number, email address, or DingTalk account.
Instance modes
Both CPU instances and GPU-accelerated instances support on-demand mode and provisioned mode. This section describes the two modes.
On-demand mode
Introduction
On-demand instances are allocated and released by Function Compute, which automatically adjusts the number of instances in response to the volume of function invocations: it creates instances when invocations increase and destroys excess instances when invocations decrease. In other words, the creation of on-demand instances is triggered by requests. An on-demand instance is destroyed if it receives no requests for a period of time (usually 3 to 5 minutes). The first request that triggers the creation of an on-demand instance must wait for the instance's cold start to complete.
By default, each Alibaba Cloud account can run up to 100 instances in a region. The actual quota displayed on the General Quotas page in the Quota Center console prevails. You can also apply for a quota adjustment in the Quota Center console.
Billing method
The billing duration for an on-demand instance starts when a request is received and continues until the request has been completely processed. Each on-demand instance can process one or more requests at a time. For more information, see Configure instance concurrency.
No instances are allocated if no requests are submitted for processing, and therefore no fees are generated. You are charged only when your function is invoked. For more information about pricing and billing, see Billing overview.
You can enable the instance concurrency feature based on your business requirements. When multiple requests are executed on one instance at the same time, they share the instance's CPU and memory, which improves resource utilization and reduces costs.
Instance concurrency = 1
In on-demand mode, the billing duration starts when a request arrives at an instance and ends when the request is completely processed.
Instance concurrency > 1
In this case, the measurement of an on-demand instance's execution duration starts when the first request is received and ends when the last request is completely processed. The instance concurrency feature reuses resources, helping save costs.
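The two billing rules above can be sketched as follows. This is an illustration of the documented rule, not the actual metering implementation, and the function name is made up:

```python
def billed_seconds(requests, concurrency_gt_1: bool) -> float:
    """Billed duration for requests handled by one on-demand instance.

    `requests` is a list of (arrival, completion) timestamps in seconds.
    """
    if concurrency_gt_1:
        # Concurrency > 1: one window from the first arrival to the
        # last completion; concurrent requests share the instance.
        return max(end for _, end in requests) - min(start for start, _ in requests)
    # Concurrency = 1: each request occupies an instance exclusively
    # and is billed from arrival to completion; overlapping requests
    # would run on separate instances.
    return sum(end - start for start, end in requests)

# Two overlapping 3-second requests:
reqs = [(0.0, 3.0), (1.0, 4.0)]
print(billed_seconds(reqs, concurrency_gt_1=True))   # 4.0
print(billed_seconds(reqs, concurrency_gt_1=False))  # 6.0
```

With concurrency enabled, the overlapping second is billed only once, which is why the feature helps save costs.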
Provisioned mode
Introduction
In provisioned mode, you control the allocation and release of function instances. Provisioned instances are retained until you release them. Invocation requests are preferentially distributed to provisioned instances. If the provisioned instances cannot handle all requests, Function Compute allocates on-demand instances to process the excess requests. For more information about how to release a provisioned instance, see Configure auto scaling rules.
Provisioned instances help mitigate cold starts. You can specify a fixed number of provisioned instances based on your business budget. Additionally, you can configure scheduled auto scaling policies based on your service's traffic patterns or choose threshold-based scaling when your service does not exhibit distinct traffic patterns. Once provisioned instances are used, the average cold start latency is significantly reduced.
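As a sketch of what a scheduled scaling policy for provisioned instances can look like, the following hypothetical configuration fragment keeps a small baseline, raises the provisioned target during business hours, and lowers it again at night. The field names follow the general shape of Function Compute's scheduled actions for provisioned instances but are illustrative and may differ by API version:

```json
{
  "target": 2,
  "scheduledActions": [
    {
      "name": "business-hours-scale-up",
      "scheduleExpression": "cron(0 0 9 * * *)",
      "target": 20
    },
    {
      "name": "night-scale-down",
      "scheduleExpression": "cron(0 0 21 * * *)",
      "target": 2
    }
  ]
}
```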
Idle mode
CPU instances
The states of CPU instances are classified into the active state and the idle state based on whether vCPU resources are allocated to the instances. By default, the idle mode feature is enabled.
Active instances
Instances are considered active if they are processing requests or if the idle mode feature is disabled for them. If you disable the idle mode feature, vCPUs are allocated to provisioned instances regardless of whether the instances are processing requests or not. This way, the instances are considered active at all times and can therefore continue processing background tasks.
Idle instances
Provisioned instances for which the idle mode feature is enabled enter the idle state when they are not processing requests. Function Compute freezes the vCPUs of the instances when they are not processing requests. Instances in the idle state incur no charges, which saves costs. If a PreFreeze hook is configured for an instance, the instance enters the idle state after the PreFreeze hook is executed. Otherwise, the instance immediately enters the idle state when it finishes processing requests. For more information about instance states, see Function instance lifecycle.
You can choose whether to enable the idle mode feature based on your business requirements.
Costs
If you want to use provisioned instances to mitigate cold starts and hope to save costs, we recommend that you enable the idle mode feature. Not only does this feature minimize cold starts, but it also allows you to pay only for the memory and disk resources consumed by the provisioned instances once they enter the idle state.
Background tasks
If your function needs to run background tasks, we recommend that you do not enable the idle mode feature. The following items provide example scenarios:
Some application frameworks rely on built-in schedulers or background features. Some dependent middleware needs to regularly report heartbeats.
Some asynchronous operations are performed using Goroutine lightweight threads in Go, asynchronous functions in Node.js, or asynchronous threads in Java.
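As an illustration of the kind of background work that breaks when an idle instance's vCPUs are frozen, the following sketch runs a daemon thread that reports a heartbeat on a fixed interval. While the instance is frozen, the thread makes no progress, so the heartbeat silently stops. All names are illustrative:

```python
import threading
import time

heartbeats = []

def heartbeat_loop(interval_s: float, stop: threading.Event):
    """Append a heartbeat timestamp on each interval until stopped."""
    while not stop.is_set():
        heartbeats.append(time.time())  # stand-in for a middleware ping
        stop.wait(interval_s)

stop = threading.Event()
t = threading.Thread(target=heartbeat_loop, args=(0.05, stop), daemon=True)
t.start()
time.sleep(0.2)  # the loop only makes progress while vCPUs are allocated
stop.set()
t.join()
```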
GPU-accelerated instances
The states of GPU-accelerated instances are classified into the active state and the idle state based on whether GPU resources are allocated to the instances. By default, the idle mode feature is enabled.
Active instances
Instances are considered active if they are processing requests or if the idle mode feature is disabled for them. If you disable the idle mode feature, GPUs remain allocated to provisioned instances at all times.
Idle instances
Provisioned instances for which the idle mode feature is enabled enter the idle state when they are not processing requests. Function Compute freezes the GPUs of these instances while they are idle.
Billing method
Active instances
The billing of provisioned instances starts when they are created and ends when they are released. Provisioned instances are requested and released by you. Therefore, until you release them, they continue to incur charges. Additionally, if the idle mode feature is not enabled, the charges are based on the unit prices of active instances at all times.
Idle instances
If the idle mode feature is enabled, the provisioned instances enter the idle state when they are not processing requests. The prices of idle instances are much lower than those of active instances. For more information, see Conversion factors.
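A back-of-the-envelope comparison of provisioned-instance costs with and without idle mode. The unit prices below are placeholders, not real Function Compute rates; see Billing overview for actual pricing:

```python
ACTIVE_PRICE = 1.0  # hypothetical cost units per instance-hour, active
IDLE_PRICE = 0.25   # hypothetical; idle instances bill at a lower rate

def provisioned_cost(total_hours, busy_hours, idle_mode_enabled):
    """Cost of one provisioned instance over its retained lifetime."""
    idle_hours = total_hours - busy_hours
    if not idle_mode_enabled:
        # Without idle mode, the instance bills at the active rate
        # for as long as it is retained.
        return total_hours * ACTIVE_PRICE
    return busy_hours * ACTIVE_PRICE + idle_hours * IDLE_PRICE

# 24 provisioned hours, 6 of them spent processing requests:
print(provisioned_cost(24, 6, idle_mode_enabled=False))  # 24.0
print(provisioned_cost(24, 6, idle_mode_enabled=True))   # 10.5
```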
Instance specifications
CPU instances
The following table describes the specifications of CPU instances. Configure your instances as needed.
vCPU: 0.05 to 16. The value must be a multiple of 0.05.
Memory size (MB): 128 to 32768. The value must be a multiple of 64.
Maximum code package size (GB): 10
Maximum function execution duration (seconds): 86400
Maximum disk size: 512 MB (default) or 10 GB
Maximum bandwidth (Gbit/s): 5
Note: The ratio of vCPU to memory capacity (in GB) must be from 1:1 to 1:4.
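The limits above can be checked with a small helper. This is a hypothetical client-side validation for illustration only; Function Compute performs its own validation when you create or update a function:

```python
def validate_cpu_spec(vcpu: float, memory_mb: int) -> bool:
    """Check a CPU instance spec against the documented limits."""
    # vCPU: 0.05 to 16, in multiples of 0.05 (checked in integer
    # hundredths to avoid floating-point surprises).
    units = round(vcpu * 100)
    if abs(units - vcpu * 100) > 1e-6 or units % 5 != 0 or not 5 <= units <= 1600:
        return False
    # Memory: 128 MB to 32768 MB, in multiples of 64 MB.
    if not 128 <= memory_mb <= 32768 or memory_mb % 64 != 0:
        return False
    # The vCPU-to-memory (GB) ratio must be between 1:1 and 1:4.
    memory_gb = memory_mb / 1024
    return 1 <= memory_gb / vcpu <= 4

print(validate_cpu_spec(1, 2048))   # True: 1 vCPU with 2 GB is a 1:2 ratio
print(validate_cpu_spec(2, 1024))   # False: 2 vCPUs with 1 GB is a 1:0.5 ratio
```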
GPU-accelerated instances
The following table describes the specifications of GPU-accelerated instances. Configure your instances as needed.
Note: An fc.gpu.tesla.1 GPU-accelerated instance offers performance similar to that of an instance that uses NVIDIA T4 GPUs.
fc.gpu.tesla.1
Full GPU size (GB): 16
Computing power of a full GPU (TFLOPS): 65 (FP16), 8 (FP32)
vGPU memory (MB): 1024 to 16384 (1 GB to 16 GB). The value must be a multiple of 1024.
vGPU computing power (GPU): calculated based on the following formula: vGPU computing power = vGPU memory (in GB)/16 × full-GPU computing power. For example, if you set the vGPU memory to 5 GB, the maximum available vGPU computing power is 5/16 × full-GPU computing power. The computing power is automatically allocated by Function Compute.
vCPU: 0.05 to the value of [vGPU memory (in GB)/2]. The value must be a multiple of 0.05. For more information, see GPU specifications.
Memory size (MB): 128 to the value of [vGPU memory (in GB) × 2048]. The value must be a multiple of 64. For more information, see GPU specifications.
On-demand mode: supported. Regular provisioned mode: supported. Idle provisioned mode: supported.
fc.gpu.ada.1
Full GPU size (GB): 48
Computing power of a full GPU (TFLOPS): 119 (FP16), 60 (FP32)
vGPU memory (MB): 49152 (48 GB). Only the 48 GB vGPU memory specification is supported.
vGPU computing power (GPU): the computing power of a full GPU is allocated by default. The computing power is automatically allocated by Function Compute.
vCPU: 8. Only the 8-vCPU specification is supported.
Memory size (MB): 65536 (64 GB). Only the 64 GB memory specification is supported.
On-demand mode: not supported. Regular provisioned mode: supported. Idle provisioned mode: supported.
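The vGPU computing power formula for fc.gpu.tesla.1 can be worked through as follows; the function name is illustrative:

```python
def tesla_vgpu_power(vgpu_memory_gb: float, full_gpu_tflops: float) -> float:
    """Maximum vGPU computing power for fc.gpu.tesla.1.

    Per the documented formula:
      vGPU computing power = vGPU memory (GB) / 16 x full-GPU power
    """
    FULL_GPU_MEMORY_GB = 16  # full GPU size of fc.gpu.tesla.1
    return vgpu_memory_gb / FULL_GPU_MEMORY_GB * full_gpu_tflops

# 5 GB of vGPU memory against the 65 TFLOPS (FP16) full GPU:
# 5/16 x 65 = 20.3125 TFLOPS
print(tesla_vgpu_power(5, 65))  # 20.3125
```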
The GPU-accelerated instances of Function Compute also support the following resource specifications.
Image size (GB): 15 for Container Registry Enterprise Edition (Standard, Advanced, and Basic editions) and for Container Registry Personal Edition (free)
Maximum function execution duration (seconds): 86400
Maximum disk size (GB): 10
Maximum bandwidth (Gbit/s): 5
Note: Setting the instance type to g1 achieves the same effect as setting the instance type to fc.gpu.tesla.1.
GPU-accelerated instances of Tesla series GPUs are supported in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Japan (Tokyo), US (Virginia), and Singapore.
GPU-accelerated instances of Ada series GPUs are supported in the following regions: China (Beijing), China (Hangzhou), China (Shanghai), and China (Shenzhen).
GPU specifications
References
You can enable the idle mode feature when you configure auto scaling rules. For more information, see Configure auto scaling rules.
For more information about the billing methods and billable items of Function Compute, see Billing overview.
When you call an API operation to create a function, you can use the instanceType parameter to specify the instance type. For more information, see CreateFunction. For more information about how to specify the type and specifications of an instance in the Function Compute console, see Manage functions.
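As an illustration, a CreateFunction request body that selects a GPU-accelerated instance type might look like the following. Only the instanceType parameter is taken from this topic; the other field names and values are hypothetical placeholders and may differ by API version:

```json
{
  "functionName": "my-gpu-function",
  "runtime": "custom-container",
  "handler": "index.handler",
  "instanceType": "fc.gpu.tesla.1",
  "memorySize": 8192
}
```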