Function Compute provides elastic instances and GPU-accelerated instances. This topic describes the instance types, specifications, usage notes, and usage modes of instances.
Instance types
- Elastic instances: The basic instance type of Function Compute. Elastic instances are suitable for scenarios with burst traffic and for compute-intensive workloads.
- GPU-accelerated instances: Instances that use the Ampere and Turing architectures for acceleration in GPU scenarios. GPU-accelerated instances are mainly used in scenarios such as audio and video processing, AI, and image processing, where hardware accelerators speed up service workloads to improve processing efficiency.
Important
- GPU-accelerated instances can be deployed only by using container images.
- For the best user experience, join the DingTalk group (ID: 11721331) and provide the following information:
- The name of the organization, such as the name of your company.
- The ID of your Alibaba Cloud account.
- The region where you want to use GPU-accelerated instances, such as China (Shenzhen).
- The contact information, such as your mobile number, email address, or DingTalk account.
Instance specifications
- Elastic instances
The following table describes the specifications of elastic instances. You can select instance specifications based on your business requirements.
vCPU | Memory size (MB) | Maximum code package size (GB) | Maximum function execution duration (s) | Maximum disk size (GB) | Maximum bandwidth (Gbit/s) |
---|---|---|---|---|---|
0.05 to 16. The value must be a multiple of 0.05. | 128 to 32768. The value must be a multiple of 64. | 10 | 86400 | Valid values: 512 MB (default) and 10 GB. | 5 |
Note: The ratio of vCPU capacity to memory capacity (in GB) ranges from 1:1 to 1:4.
- GPU-accelerated instances
The following table describes the specifications of GPU-accelerated instances. You can select instance specifications based on your business requirements.
Instance specifications | Card type | vGPU memory (GB) | vGPU computing power (card) | vCPU (core) | Memory size (MB) |
---|---|---|---|---|---|
fc.gpu.tesla.1 | Tesla T4 | 1 to 16. The value must be an integer. | Calculated as vGPU memory (GB)/16. For example, if you set the vGPU memory to 5 GB, you can use up to 5/16 of the computing power of one card. Function Compute allocates computing resources automatically; you do not need to allocate them manually. | 0.05 to [vGPU memory (GB)/2]. The value must be a multiple of 0.05. For more information, see GPU specifications. | 128 to [vGPU memory (GB) × 2048]. The value must be a multiple of 64. For more information, see GPU specifications. |
fc.gpu.ampere.1 | Ampere A10 | 1 to 24. The value must be an integer. | Calculated as vGPU memory (GB)/24. For example, if you set the vGPU memory to 5 GB, you can use up to 5/24 of the computing power of one card. Function Compute allocates computing resources automatically; you do not need to allocate them manually. | 0.05 to [vGPU memory (GB)/3]. The value must be a multiple of 0.05. For more information, see GPU specifications. | 128 to [vGPU memory (GB) × 4096/3]. The value must be a multiple of 64. For more information, see GPU specifications. |
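The vCPU and memory limits scale linearly with the vGPU memory you reserve, for both card types. The helper below is an illustrative sketch of that arithmetic only; `gpu_limits` is not part of any Function Compute SDK:

```python
def gpu_limits(vgpu_memory_gb, card_memory_gb):
    """Derive per-instance limits from the chosen vGPU memory.

    card_memory_gb is 16 for fc.gpu.tesla.1 (Tesla T4) and 24 for
    fc.gpu.ampere.1 (Ampere A10). Illustrative helper, not an SDK call.
    """
    if not (isinstance(vgpu_memory_gb, int) and 1 <= vgpu_memory_gb <= card_memory_gb):
        raise ValueError("vGPU memory must be an integer from 1 GB to the card size")
    # Fraction of one physical card's computing power (allocated automatically).
    computing_power = vgpu_memory_gb / card_memory_gb
    # Maximum vCPU: vGPU memory / 2 (T4) or / 3 (A10), i.e. memory * 8 / card size,
    # floored to a multiple of 0.05 (the epsilon guards against float error).
    max_vcpu = int(vgpu_memory_gb * 8 / card_memory_gb / 0.05 + 1e-9) * 0.05
    # Maximum memory: vGPU memory * 2048 (T4) or * 4096/3 (A10) MB, i.e.
    # memory * 32768 / card size, floored to a multiple of 64 MB.
    max_memory_mb = int(vgpu_memory_gb * 32768 / card_memory_gb / 64 + 1e-9) * 64
    return computing_power, round(max_vcpu, 2), max_memory_mb

# Example: 5 GB of vGPU memory on a Tesla T4 card (fc.gpu.tesla.1)
# yields 5/16 of a card, up to 2.5 vCPU, and up to 10240 MB of memory.
power, vcpu, memory_mb = gpu_limits(5, 16)
```

The same formula reproduces every row of the GPU specifications tables below, which is why the A10 memory ceilings are not round numbers: 4096/3 MB per GB of vGPU memory is floored to the nearest multiple of 64 MB.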
Function Compute GPU-accelerated instances also support the following resource specifications.
Image size (GB) | Maximum function execution duration (s) | Maximum disk size (GB) | Maximum bandwidth (Gbit/s) |
---|---|---|---|
Container Registry Enterprise Edition (Standard, Advanced, or Basic Edition): 10. Container Registry Personal Edition (free): 10. | 86400 | 10 | 5 |
Note
- Setting the instance type to g1 is equivalent to setting the instance type to fc.gpu.tesla.1.
- GPU-accelerated instances of the T4 type are supported in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Japan (Tokyo), and US (Virginia).
- GPU-accelerated instances of the A10 type are supported in the following regions: China (Hangzhou), China (Shanghai), China (Shenzhen), Japan (Tokyo), and US (Virginia).
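A deployment script may want to check card availability before choosing a region. The sketch below encodes the two region lists above in a lookup; the helper name is illustrative, and you should confirm current availability in the console, since region support can change:

```python
# Regions that support each GPU card type, per the notes above.
GPU_REGIONS = {
    "T4": {"China (Hangzhou)", "China (Shanghai)", "China (Beijing)",
           "China (Zhangjiakou)", "China (Shenzhen)", "Japan (Tokyo)",
           "US (Virginia)"},
    "A10": {"China (Hangzhou)", "China (Shanghai)", "China (Shenzhen)",
            "Japan (Tokyo)", "US (Virginia)"},
}

def supported_cards(region):
    """Return the GPU card types available in a region (sorted by name)."""
    return sorted(card for card, regions in GPU_REGIONS.items()
                  if region in regions)

# Example: China (Zhangjiakou) offers only T4 instances.
cards = supported_cards("China (Zhangjiakou)")
```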
GPU specifications
fc.gpu.tesla.1
vGPU memory (GB) | vCPU (core) | Maximum memory size (GB) | Memory size (MB) |
---|---|---|---|
1 | 0.05 to 0.5 | 2 | 128 to 2048 |
2 | 0.05 to 1 | 4 | 128 to 4096 |
3 | 0.05 to 1.5 | 6 | 128 to 6144 |
4 | 0.05 to 2 | 8 | 128 to 8192 |
5 | 0.05 to 2.5 | 10 | 128 to 10240 |
6 | 0.05 to 3 | 12 | 128 to 12288 |
7 | 0.05 to 3.5 | 14 | 128 to 14336 |
8 | 0.05 to 4 | 16 | 128 to 16384 |
9 | 0.05 to 4.5 | 18 | 128 to 18432 |
10 | 0.05 to 5 | 20 | 128 to 20480 |
11 | 0.05 to 5.5 | 22 | 128 to 22528 |
12 | 0.05 to 6 | 24 | 128 to 24576 |
13 | 0.05 to 6.5 | 26 | 128 to 26624 |
14 | 0.05 to 7 | 28 | 128 to 28672 |
15 | 0.05 to 7.5 | 30 | 128 to 30720 |
16 | 0.05 to 8 | 32 | 128 to 32768 |
fc.gpu.ampere.1
vGPU memory (GB) | vCPU (core) | Maximum memory size (GB) | Memory size (MB) |
---|---|---|---|
1 | 0.05 to 0.3 | 1.3125 | 128 to 1344 |
2 | 0.05 to 0.65 | 2.625 | 128 to 2688 |
3 | 0.05 to 1 | 4 | 128 to 4096 |
4 | 0.05 to 1.3 | 5.3125 | 128 to 5440 |
5 | 0.05 to 1.65 | 6.625 | 128 to 6784 |
6 | 0.05 to 2 | 8 | 128 to 8192 |
7 | 0.05 to 2.3 | 9.3125 | 128 to 9536 |
8 | 0.05 to 2.65 | 10.625 | 128 to 10880 |
9 | 0.05 to 3 | 12 | 128 to 12288 |
10 | 0.05 to 3.3 | 13.3125 | 128 to 13632 |
11 | 0.05 to 3.65 | 14.625 | 128 to 14976 |
12 | 0.05 to 4 | 16 | 128 to 16384 |
13 | 0.05 to 4.3 | 17.3125 | 128 to 17728 |
14 | 0.05 to 4.65 | 18.625 | 128 to 19072 |
15 | 0.05 to 5 | 20 | 128 to 20480 |
16 | 0.05 to 5.3 | 21.3125 | 128 to 21824 |
17 | 0.05 to 5.65 | 22.625 | 128 to 23168 |
18 | 0.05 to 6 | 24 | 128 to 24576 |
19 | 0.05 to 6.3 | 25.3125 | 128 to 25920 |
20 | 0.05 to 6.65 | 26.625 | 128 to 27264 |
21 | 0.05 to 7 | 28 | 128 to 28672 |
22 | 0.05 to 7.3 | 29.3125 | 128 to 30016 |
23 | 0.05 to 7.65 | 30.625 | 128 to 31360 |
24 | 0.05 to 8 | 32 | 128 to 32768 |
Usage notes
If you want to reduce the cold start duration or improve resource utilization, you can use the following solutions.
- Provisioned mode: the ideal solution to resolve the cold start issue. You can reserve a fixed number of instances based on your resource budget, reserve resources for a specified period of time based on business fluctuations, or select an auto scaling policy based on usage thresholds. The average cold start latency of instances is significantly reduced when the provisioned mode is used.
- High concurrency for a single instance: the ideal solution to improve instance resource utilization. We recommend that you configure instance concurrency based on the resource demands of your business. With this solution, multiple requests that run on one instance at the same time share its CPU and memory, which improves resource utilization.
Instance modes
GPU-accelerated instances and elastic instances support the following usage modes.
On-demand mode
In on-demand mode, Function Compute automatically allocates and releases instances for functions. In this mode, the billed execution duration starts when a request to execute a function arrives and ends when the request is completely executed. An on-demand instance can process one or more requests at a time. For more information, see Configure instance concurrency.
- Execution duration when a single instance processes one request at a time
When an on-demand instance processes a single request, the billed execution duration starts when the request arrives at the instance and ends when the request is completely executed.
- Execution duration when a single instance processes multiple requests concurrently
If an on-demand instance processes multiple requests concurrently, the billed execution duration starts when the first request arrives at the instance and ends when the last request is completely executed. Because concurrent requests reuse the same instance resources, resource costs are reduced.
Provisioned mode

Additional information
- For more information about the billing methods, see Billing overview.
- For more information about how to use SDKs to configure and change an instance for a function, see Set the instance type for a function.
- For more information about how to specify an instance type and the instance specifications in the Function Compute console, see Manage functions.