Basic CPU instances are usually sufficient for general-purpose computing scenarios on Function Compute, such as web services and data processing. However, for scenarios that require large-scale parallel computing or deep learning, such as audio and video processing, artificial intelligence (AI) inference, and image editing, GPU-accelerated instances can significantly improve computing efficiency.
For GPU-accelerated instances, Function Compute provides two instance types: elastic instances and provisioned instances. You can choose the instance type and specifications that best fit your business needs to maximize resource utilization and performance while ensuring your services run reliably.
Instance type selection
CPU functions support only elastic instances. For GPU functions, you can choose between elastic instances and provisioned instances as needed for resource utilization, latency, and cost stability. For a detailed selection guide, see the following flowchart.
You can bind provisioned instances only to GPU functions that use the fc.gpu.ada.1, fc.gpu.ada.2, fc.gpu.ada.3, fc.gpu.hopper.1, or fc.gpu.xpu.1 instance types.
Elastic instances
If you set the minimum number of instances for a function to 0, instances automatically scale based on the request volume and are released when there are no requests. This means you are billed based on usage and pay nothing when the function is not in use, which maximizes cost savings. The sparser the business requests, the greater the cost savings compared to keeping elastic virtual machines running.
Are there cold starts?
Yes. For latency-sensitive businesses, you can set the minimum number of instances to 1 or more to mitigate cold starts. This method pre-allocates elastic resources. When a request arrives, the instance is quickly activated to execute the request.
Billing (Pay-as-you-go)
The usage cost of a function is the sum of the fees for active elastic instances and shallow hibernation (formerly idle) elastic instances. If you set the minimum number of instances to 1 or more, you can enable the shallow hibernation mode. In the shallow hibernation state, vCPU usage is free, and GPU usage is billed at only 20% of the regular rate. This cost is much lower than that of active elastic instances.
For more information about the scenarios for active and shallow hibernation elastic instances, see Elastic instances.
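As a rough illustration of the billing rule above, the sketch below compares a mostly hibernated instance with an always-active one. The unit prices are hypothetical placeholders, not Function Compute's actual rates; only the free hibernated vCPU usage and the 20% GPU rate come from this document. See Billing overview for real prices.

```python
# Sketch of pay-as-you-go cost for an elastic GPU instance with shallow
# hibernation enabled. Unit prices are HYPOTHETICAL placeholders.
ACTIVE_GPU_RATE = 0.00011       # hypothetical price per GPU GB-second, active
HIBERNATED_GPU_DISCOUNT = 0.20  # shallow hibernation bills GPU at 20% of the regular rate
ACTIVE_VCPU_RATE = 0.00009      # hypothetical price per vCPU-second, active
# vCPU usage is free while the instance is in shallow hibernation.

def elastic_gpu_cost(gpu_gb, vcpu, active_seconds, hibernated_seconds):
    """Total cost = active-instance fees + shallow-hibernation fees."""
    active = (gpu_gb * ACTIVE_GPU_RATE + vcpu * ACTIVE_VCPU_RATE) * active_seconds
    hibernated = gpu_gb * ACTIVE_GPU_RATE * HIBERNATED_GPU_DISCOUNT * hibernated_seconds
    return active + hibernated

# A 16 GB GPU instance active 1 hour and hibernated 23 hours costs far less
# than one kept active for the full 24 hours.
mostly_idle = elastic_gpu_cost(16, 2, 3600, 23 * 3600)
all_day_active = elastic_gpu_cost(16, 2, 24 * 3600, 0)
print(mostly_idle < all_day_active)  # True
```

Whatever the real unit prices are, the hibernated portion is always cheaper than the same time spent active, which is why the minimum-instance setting pairs well with shallow hibernation.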
Provisioned instances
This instance type applies only to GPU functions. You can purchase a provisioned resource pool in advance and then allocate a specific number and type of provisioned instances to a function from the resource pool. This method provides predictable and fixed usage costs and is suitable for scenarios with high resource utilization, strict latency requirements, or a need for stable costs.
Are there cold starts?
No. When you use provisioned instances, the maximum number of requests that a function can process simultaneously is: number of allocated provisioned instances × instance concurrency. Requests that exceed this limit are throttled. Requests within the limit receive a real-time response, which completely eliminates cold starts.
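The throttling limit above can be expressed as a one-line helper (a sketch; the function name is illustrative, not part of any Function Compute SDK):

```python
# Max concurrent requests for a function backed by provisioned instances:
# allocated provisioned instances × instance concurrency.
def max_concurrent_requests(provisioned_instances: int, instance_concurrency: int) -> int:
    return provisioned_instances * instance_concurrency

# Example: 4 provisioned instances, each configured to handle 5 concurrent
# requests, can serve 20 requests at once; the 21st is throttled.
print(max_concurrent_requests(4, 5))  # 20
```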
Billing (Subscription)
The function cost is the total subscription fee for all purchased provisioned resource pools.
Instance specifications
CPU instances
CPU functions support the following resource specifications:
- vCPU (core): 0.05 to 16. The value must be a multiple of 0.05.
- Memory size (MB): 128 to 32768. The value must be a multiple of 64.
- Maximum code package size (GB): 10
- Maximum function execution duration (s): 86400
- Maximum disk size (GB): 10. Valid values: 512 MB (default) and 10 GB.
- Maximum bandwidth (Gbps): 5
Note: The ratio of vCPUs to memory size (in GB) must be between 1:1 and 1:4.
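A minimal sketch of how these constraints could be checked client-side before creating a function (the helper name is illustrative, not part of any Function Compute SDK):

```python
# Validate a CPU instance specification against the documented limits:
# vCPU in [0.05, 16] in multiples of 0.05, memory in [128, 32768] MB in
# multiples of 64, and a vCPU-to-memory (GB) ratio between 1:1 and 1:4.
def is_valid_cpu_spec(vcpu: float, memory_mb: int) -> bool:
    # Work in integer "ticks" of 0.01 vCPU to avoid float rounding issues.
    vcpu_ticks = round(vcpu * 100)
    if not (5 <= vcpu_ticks <= 1600 and vcpu_ticks % 5 == 0):
        return False
    if not (128 <= memory_mb <= 32768 and memory_mb % 64 == 0):
        return False
    memory_gb = memory_mb / 1024
    # A 1:1 to 1:4 ratio means memory (GB) is between 1x and 4x the vCPU count.
    return vcpu <= memory_gb <= 4 * vcpu

print(is_valid_cpu_spec(2, 4096))   # True: 2 vCPUs, 4 GB is a 1:2 ratio
print(is_valid_cpu_spec(2, 16384))  # False: 1:8 exceeds the 1:4 ratio
```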
GPU-accelerated instances
Note: The fc.gpu.tesla.1 instance type provides performance comparable to an NVIDIA T4 GPU.
For all instance types below, only full card vGPU memory is supported. If you purchase multiple cards, all resources are multiplied by the number of cards.

fc.gpu.tesla.1
- Supported instance types: elastic instances
- Full card GPU memory (GB): 16
- Full card computing power (TFLOPS): 65 (FP16), 8 (FP32)
- vGPU memory (MB): 16384 (16 GB)
- vGPU computing power (card): Full card computing power is allocated by default. The computing power is automatically allocated by Function Compute and does not need to be manually configured.
- vCPU (core): 0.05 to (vGPU memory in GB / 2). The value must be a multiple of 0.05.
- Memory size (MB): 128 to (vGPU memory in GB × 2048). The value must be a multiple of 64.

fc.gpu.ada.1
- Supported instance types: elastic and provisioned instances
- Full card GPU memory (GB): 48
- Full card computing power (TFLOPS): 119 (FP16), 60 (FP32)
- vGPU memory (MB): 49152 (48 GB)
- vCPU (core): 4, 8, or 16
- Memory size (MB): 32768, 65536, or 98304

fc.gpu.ada.2
- Supported instance types: elastic and provisioned instances
- Full card GPU memory (GB): 24
- Full card computing power (TFLOPS): 166 (FP16), 83 (FP32)
- vGPU memory (MB): 24576 (24 GB)
- vCPU (core): 8 or 16
- Memory size (MB): 32768 or 65536

fc.gpu.ada.3
- Supported instance types: elastic and provisioned instances
- Full card GPU memory (GB): 48
- Full card computing power (TFLOPS): 148 (FP16), 73.54 (FP32)
- vGPU memory (MB): 49152 (48 GB)
- vCPU (core): 8 or 16
- Memory size (MB): 65536 or 98304

fc.gpu.hopper.1
- Supported instance types: elastic and provisioned instances
- Full card GPU memory (GB): 96
- Full card computing power (TFLOPS): 148 (FP16), 44 (FP32)
- vGPU memory (MB): 98304 (96 GB)
- vCPU (core): 16
- Memory size (MB): 98304

fc.gpu.xpu.1
- Supported instance types: elastic and provisioned instances
- Full card GPU memory (GB): 96
- Full card computing power (TFLOPS): 123 (FP16), 61.5 (FP32)
- vGPU memory (MB): 98304 (96 GB)
- vCPU (core): 16
- Memory size (MB): 98304
GPU-accelerated instances also support the following resource specifications:
- Image size (GB): 15 for ACR Enterprise Edition (Standard, Premium, and Basic editions) and ACR Personal Edition (free)
- Maximum function execution duration (s): 86400
- Maximum disk size (GB): 10
- Maximum bandwidth (Gbps): 5
Note: Setting the instance type to g1 is equivalent to setting it to fc.gpu.tesla.1.
Tesla series GPU-accelerated instances are supported in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Japan (Tokyo), US (Virginia), and Singapore.
Ada series GPU-accelerated instances are supported in the following regions: China (Beijing), China (Hangzhou), China (Shanghai), China (Shenzhen), Singapore, and US (Virginia).
Relationship between GPU instance specifications and instance concurrency
An fc.gpu.ada.1 GPU has 48 GB of memory, and an fc.gpu.tesla.1 GPU has 16 GB of memory. Function Compute allocates the full memory of a GPU card to a single GPU container. Because the default GPU card quota is 30 cards per region, at most 30 GPU containers can run simultaneously in that region.
If the instance concurrency of a GPU function is 1, the function can process up to 30 inference requests concurrently in a region.
If the instance concurrency of a GPU function is 5, the function can process up to 150 inference requests concurrently in a region.
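The two examples above follow from a simple product of the card quota and the instance concurrency; a sketch, assuming the default quota of 30 cards per region (the helper name is illustrative):

```python
# Regional request ceiling for a GPU function with full-card allocation:
# one card backs one container, so concurrent containers are capped by the
# card quota, and requests scale with instance concurrency.
GPU_CARD_QUOTA_PER_REGION = 30  # default quota; can differ per account

def regional_request_ceiling(instance_concurrency: int,
                             card_quota: int = GPU_CARD_QUOTA_PER_REGION) -> int:
    return card_quota * instance_concurrency

print(regional_request_ceiling(1))  # 30 concurrent inference requests
print(regional_request_ceiling(5))  # 150 concurrent inference requests
```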
Single-instance concurrency
To improve resource utilization, you can configure single-instance concurrency based on your application's resource requirements. In this configuration, multiple tasks can run on a single instance and share CPU and memory resources, which improves overall resource utilization. For more information, see Configure instance concurrency.
Execution duration for single-instance, single-concurrency
When an instance executes a single request, the execution duration is measured from when the request arrives at the instance to when the request execution is complete.
Execution duration for single-instance, multiple-concurrency
When an instance executes multiple requests concurrently, the execution duration is measured from the time the first request arrives at the instance to the time the last request is completed. This resource reuse helps save costs.
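A small illustrative model of the two measurement rules (a sketch, not Function Compute's actual metering code): single concurrency measures each request separately, while multiple concurrency spans from the first arrival to the last completion, so overlapping requests share one window.

```python
def single_concurrency_duration(requests):
    """requests: list of (arrival, completion) times in seconds.
    Each request is measured on its own."""
    return sum(end - start for start, end in requests)

def multi_concurrency_duration(requests):
    """Measured from the first arrival to the last completion."""
    starts = [start for start, _ in requests]
    ends = [end for _, end in requests]
    return max(ends) - min(starts)

# Two overlapping 3-second requests on one instance:
reqs = [(0, 3), (1, 4)]
print(single_concurrency_duration(reqs))  # 6: measured separately
print(multi_concurrency_duration(reqs))   # 4: one shared window
```

The overlap (seconds 1 through 3 here) is counted once instead of twice, which is the cost saving the section describes.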
References
For more information about the billing methods and billable items of Function Compute, see Billing overview.
When you use an API to create a function, you can use the instanceType parameter to specify the instance type. For more information, see CreateFunction. To learn how to specify the instance type and specifications in the console, see Create a function.