Create a GPU function by using a container image to improve development and delivery efficiency - Function Compute

You can deploy applications that require GPU-accelerated instances as functions using container images. This approach is ideal for popular AI projects, such as Stable Diffusion WebUI, ComfyUI, retrieval-augmented generation (RAG), and TensorRT. Using container images to deliver functions improves development and delivery efficiency.

Create a function

Log on to the Function Compute console. In the navigation pane on the left, choose Function Management > Function List.
In the top menu bar, select a region. On the Function List page, click Create Function.
In the dialog box that appears, select GPU Function and then click Create GPU Function.

On the Create GPU Function page, set the following parameters and then click Create.

Basic Configurations: Enter a Function Name. The name must be unique within the same Alibaba Cloud account and region, and must follow the naming conventions.

Elastic Configurations: Select an instance type. You cannot use provisioned instances and on-demand instances at the same time. After the function is created, you cannot change the instance type.

On-demand instances

Configuration Item	Description	Example
Instance Type	Select On-demand Instance. Instances scale automatically based on request volume and are released when there are no requests. You are billed for what you use.	On-demand Instance
GPU Card Type	Select a GPU card type. For more information about the specifications supported by different card types, see Instance types and specifications.	Ada series
Specifications	Set the GPU Memory, vCPU, Memory, and Disk specifications for the function based on your business needs. Resource usage is calculated by multiplying each specification by the duration of use. For more information, see Billing overview. Note: All directories on the disk are writable, and the disk space is shared. The disk has the same lifecycle as the function instance: when the system reclaims the instance, data on the disk is lost. For persistent storage, mount a NAS file system or an OSS bucket. For more information, see Configure a NAS file system and Configure Object Storage Service.	GPU Memory: 48 GB, vCPU: 8 vCPU, Memory: 64 GB, Disk: 512 MB (not billed; Function Compute provides a free quota of 10 GB of disk space)
Minimum Instances	If your business is latency-sensitive, we recommend that you set the minimum number of instances to 1 or greater to reserve resources in advance and reduce cold start latency. Note: After you set Minimum Instances to 1 or more, if no elastic policy for the minimum number of instances is configured, or if no such policy is active during a period, the current minimum number of instances is the value you set here. If multiple elastic policies are configured, the system calculates the minimum number of instances required by each triggered policy and uses the highest value among the active policies as the current minimum number of instances. For more information, see How is the current minimum number of instances calculated?.	1
Concurrency Per Instance	You can configure multiple concurrent requests for a single GPU function instance. This means a single instance can process multiple requests simultaneously. For more information, see Configure concurrency per instance.
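The Note in the Minimum Instances row describes how the effective minimum is chosen. The sketch below restates that rule in Python so it is easy to reason about. `ElasticPolicy` and `current_min_instances` are hypothetical names, not part of any Function Compute SDK, and the sketch models only the behavior described above; the authoritative rules are in How is the current minimum number of instances calculated?.

```python
from dataclasses import dataclass


@dataclass
class ElasticPolicy:
    """Hypothetical model of an elastic policy for the minimum number of instances."""
    name: str
    min_instances: int
    active: bool  # whether the policy is in effect right now


def current_min_instances(configured_min: int, policies: list[ElasticPolicy]) -> int:
    """Return the effective minimum number of instances.

    Mirrors the documented rule: with no active policy, the value set in
    Minimum Instances applies; otherwise the highest value among the
    active policies is used.
    """
    active_values = [p.min_instances for p in policies if p.active]
    if not active_values:
        return configured_min
    return max(active_values)
```

For example, with Minimum Instances set to 1 and two active policies that require 5 and 3 instances, the sketch returns 5; when neither policy is active, it returns 1.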

Provisioned instances

Configuration Item	Description	Example
Instance Type	Select Provisioned Instance. Instances are allocated to the function from a pre-purchased provisioned resource pool. Provisioned instances are recommended for scenarios where predictable costs, low latency, and high resource utilization are important to ensure business stability.	Provisioned Instance
Provisioned Resource Pool	A provisioned resource pool is a pool of provisioned instances that can be allocated to the target function. If your provisioned resource pool has insufficient capacity, click Scale-out in the Actions column and follow the on-screen instructions to expand it. For more information, see Provisioned resource pools (subscription).	Provisioned Resource Pool: fc-pool-**, GPU Card Type: Ada
Specifications	Set the GPU Memory, vCPU, Memory, and Disk specifications for the function based on your business needs. Resource usage is calculated by multiplying each specification by the duration of use. For more information, see Billing overview. Note: All directories on the disk are writable, and the disk space is shared. The disk has the same lifecycle as the function instance: when the system reclaims the instance, data on the disk is lost. For persistent storage, mount a NAS file system or an OSS bucket. For more information, see Configure a NAS file system and Configure Object Storage Service.	GPU Memory: 48 GB, vCPU: 8 vCPU, Memory: 64 GB, Disk: 512 MB (not billed; Function Compute provides a free quota of 10 GB of disk space)
Number Of Provisioned Instances	Specify how many provisioned instances to allocate to the target function, based on the resources available in the provisioned resource pool.	1
Concurrency Per Instance	You can configure multiple concurrent requests for a single GPU function instance. This means a single instance can process multiple requests simultaneously. For more information, see Configure concurrency per instance.	20
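The Specifications rows state that usage is each specification multiplied by the duration of use. A minimal sketch of that arithmetic, using the example values from the tables, follows. It computes resource usage (for example, GB-seconds), not cost. `resource_usage` is a hypothetical helper; prices, billing granularity, and free quotas should be taken from Billing overview.

```python
def resource_usage(spec: dict[str, float], duration_s: float) -> dict[str, float]:
    """Multiply each resource specification by the duration of use (seconds).

    The result is in resource-seconds, e.g. GB-seconds or vCPU-seconds.
    No prices are applied here; see Billing overview for actual rates.
    """
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    return {name: amount * duration_s for name, amount in spec.items()}


# Example specification from the tables. Disk is left out because 512 MB
# falls within the free 10 GB disk quota.
spec = {"gpu_memory_gb": 48, "vcpu": 8, "memory_gb": 64}
usage = resource_usage(spec, duration_s=60)
print(usage)  # {'gpu_memory_gb': 2880, 'vcpu': 480, 'memory_gb': 3840}
```

One minute of the example instance therefore corresponds to 2,880 GB-seconds of GPU memory, 480 vCPU-seconds, and 3,840 GB-seconds of memory.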

Function Code: Configure the function's runtime environment and code.

Configuration Item	Description	Example
Runtime Environment	Use Sample Image: Select a sample image provided by Function Compute to quickly deploy an image-based function. Select the target image from the image list under the Container Image configuration item. Use Image from ACR: Under the Container Image configuration item, click Select Image From ACR. In the Select Container Image panel, select the created Container Registry instance and ACR image repository. Then, find the target image in the image area below and click Select in the Actions column. For more information, see Create a function that uses a custom image.	Custom Image > Use Sample Image
Container Image	Select the target image.	SpringBoot Web Application Sample Image
Startup Command	The startup command for the program. If you do not configure a startup command, the Entrypoint/CMD from the image is used by default.	None
Listener Port	The port that the HTTP server in your code listens on.	9000
Execution Timeout	Set the timeout period. The default Execution Timeout is 60 seconds, and the maximum is 86400 seconds.	60
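As a sketch of what the Startup Command and Listener Port settings expect, the following standard-library server listens on port 9000. It is a minimal illustration under assumptions, not a Function Compute sample image: a real GPU function would run model inference in `do_POST`. `ThreadingHTTPServer` handles each request on its own thread, which matters when Concurrency Per Instance is greater than 1. If Startup Command is left empty, the image's ENTRYPOINT/CMD must start this script, for example `python3 server.py` (a hypothetical file name).

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Must match the Listener Port configured for the function (9000 in the example).
LISTENER_PORT = 9000


class Handler(BaseHTTPRequestHandler):
    def _reply(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        # Simple response; replace with your own routes.
        self._reply(200, b"ok\n")

    def do_POST(self) -> None:
        # Read exactly Content-Length bytes of the request body (e.g. an inference request).
        length = int(self.headers.get("Content-Length", 0))
        data = self.rfile.read(length)
        self._reply(200, f"received {len(data)} bytes\n".encode())


def make_server(port: int = LISTENER_PORT) -> ThreadingHTTPServer:
    # 0.0.0.0 listens on all interfaces so traffic from outside the container
    # reaches the server; each request is handled in a separate thread.
    return ThreadingHTTPServer(("0.0.0.0", port), Handler)


if __name__ == "__main__":
    make_server().serve_forever()
```

To try it locally, run the script and send `curl -X POST --data 'hello' http://127.0.0.1:9000/`, which returns `received 5 bytes`.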

Instance Prefetch: In AI inference scenarios, you can configure instance prefetch to pre-warm the model. This eliminates the cold start latency for the first request.

Configuration Item	Description	Example
Instance Prefetch	Configure an Initializer hook to pre-warm the instance and optimize cold starts. The hook runs a specified script or calls an interface to load the model after the function instance starts but before it processes requests. For more information about Initializer hooks, see Configure the instance lifecycle.	Enabled
Timeout	Set the timeout period, in seconds, for the Initializer hook.	60
Prefetch Program Type	You can configure two types of Initializer hooks to pre-warm the model: Execute Instruction and Invoke Code.	Execute Instruction
Instruction Content	Specify the instruction to execute. You can choose the shell that runs it, such as `/bin/bash`, `/bin/sh`, `/bin/csh`, or `/bin/zsh`. Make sure the selected shell is available in the function's runtime environment.	See Callback method implementation

Permissions, Network, and Storage: Configure the function's access role, network settings, and storage mounts.

Parameter	Description	Example
Function Role	The Function Compute platform uses this RAM role to generate temporary keys for accessing Alibaba Cloud resources and passes them to the code. For more information, see Use a function role to grant Function Compute permissions to access other Alibaba Cloud services.	mytestrole
Allow Access To VPC	Enable this to allow the function to access resources in a VPC. For more information, see Configure network settings.	Enabled
VPC	Required if Allow Access To VPC is enabled. Create a new VPC or select a VPC ID from the drop-down list.	fc.auto.create.vpc.1632317****
VSwitch	Required if Allow Access To VPC is enabled. Create a new vSwitch or select a vSwitch ID from the drop-down list.	fc.auto.create.vswitch.vpc-bp1p8248****
Security Group	Required if Allow Access To VPC is enabled. Create a new security group or select a security group from the drop-down list.	fc.auto.create.SecurityGroup.vsw-bp15ftbbbbd****
Allow Default NIC To Access Public Network	Allow the function to access the public network through the default network interface card (NIC). Important: If you use a static public IP address, you must disable Allow Default NIC To Access Public Network. Otherwise, the static public IP address does not take effect. For more information, see Configure a static public IP address.	Enabled
Mount NAS File System	Mount a NAS file system to the function for persistent storage of shared data, such as models shared by multiple inference functions. For more information, see Configure a NAS file system. If you select automatic configuration, the system uses an existing General-purpose NAS file system named Alibaba-Fc-V3-Component-Generated. If a qualifying NAS file system does not exist in your account, the system creates one.	Enabled
Mount OSS Object Storage	Mount an OSS bucket to the function for persistent storage of logs, business files, and other data. For more information, see Configure Object Storage Service (OSS).	Enabled
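The Function Role row says the platform passes temporary keys to your code. The sketch below assumes they arrive as the environment variables `ALIBABA_CLOUD_ACCESS_KEY_ID`, `ALIBABA_CLOUD_ACCESS_KEY_SECRET`, and `ALIBABA_CLOUD_SECURITY_TOKEN`; confirm the exact mechanism for your runtime in the function role documentation before relying on it. `TemporaryCredentials` and `load_credentials` are hypothetical names. Pass the result to your Alibaba Cloud SDK client and never write these values to logs.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TemporaryCredentials:
    access_key_id: str
    access_key_secret: str
    security_token: str


# Assumed variable names; verify them for your runtime.
_ENV_NAMES = (
    "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
    "ALIBABA_CLOUD_SECURITY_TOKEN",
)


def load_credentials(env: Mapping[str, str] = os.environ) -> Optional[TemporaryCredentials]:
    """Return the role's temporary credentials, or None if any value is missing
    (for example, when no function role is configured)."""
    values = [env.get(name) for name in _ENV_NAMES]
    if not all(values):
        return None
    return TemporaryCredentials(*values)
```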

Logs And Tracing Analysis

Parameter	Description	Example
Log Feature	Persistently save the function's execution logs to Simple Log Service. This helps with code debugging, troubleshooting, and data analytics. For more information, see Configure the logging feature. Automatic Configuration: Automatically selects a log project that starts with serverless-<region_id>. Only one such log project is created in each region. If the system finds that this log project already exists in the current region, it uses the existing project. Custom Configuration: Manually specify the destination Log Project and Logstore.	Enabled
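With the log feature enabled, output that the function writes to standard output and standard error is typically what is collected into the Logstore; verify this in Configure the logging feature. A minimal Python sketch that sends application logs to stdout in a single-line format follows. `configure_logging` and the logger name `app` are illustrative choices.

```python
import logging
import sys


def configure_logging(stream=sys.stdout, level: int = logging.INFO) -> logging.Logger:
    """Send application logs to a stream (stdout by default), one line per record."""
    logger = logging.getLogger("app")
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers if called more than once
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # do not also emit through the root logger
    return logger


if __name__ == "__main__":
    log = configure_logging()
    log.info("model loaded in %.1f s", 12.3)
```

Single-line records are easier to search and analyze in Simple Log Service than multi-line output.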

More Configurations

Parameter	Description	Example
Time Zone	Select the time zone for the function. This automatically adds the TZ environment variable to the function with the selected time zone as its value.	UTC
Tags	Set tags for the function to group and manage functions. You must set both a tag key and a tag value.	key : value
Resource Group	Select the resource group for the function. Use resource groups to manage your functions in groups.	Default Resource Group
Environment Variables	Use environment variables to flexibly adjust the function's behavior without changing the code. For more information, see Configure environment variables.	`{ "BUCKET_NAME": "MY_BUCKET", "TABLE_NAME": "MY_TABLE" }`
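The sketch below shows how code inside the container might read the example environment variables and the TZ variable that the Time Zone option adds. It assumes TZ holds an IANA time zone name such as UTC or Asia/Shanghai and falls back to UTC if the name cannot be resolved. `Settings`, `load_settings`, and `local_zone` are hypothetical names.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone, tzinfo
from typing import Mapping
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


@dataclass(frozen=True)
class Settings:
    bucket_name: str
    table_name: str
    tz_name: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the example variables; TZ is set by the Time Zone option."""
    return Settings(
        bucket_name=env["BUCKET_NAME"],  # fail fast (KeyError) if a required variable is missing
        table_name=env["TABLE_NAME"],
        tz_name=env.get("TZ", "UTC"),
    )


def local_zone(tz_name: str) -> tzinfo:
    """Resolve an IANA time zone name, falling back to UTC if it cannot be resolved."""
    try:
        return ZoneInfo(tz_name)
    except (ZoneInfoNotFoundError, ValueError):
        return timezone.utc


if __name__ == "__main__":
    settings = load_settings()
    now = datetime.now(local_zone(settings.tz_name))
    print(settings.bucket_name, settings.table_name, now.isoformat())
```

Because the values come from the environment, changing the bucket or the time zone in the console changes the behavior without rebuilding the image.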

Edit a function

After a function is created, you can change its image by editing the runtime on the Configuration tab of the function details page.

For information about other modifications, such as changing environment variables or log storage settings, see Configure a function.

Delete a function

Log on to the Function Compute console. On the Function List page, find the function that you want to delete and click Delete in the Actions column. In the dialog box that appears, make sure that the function has no associated resources, such as triggers or elastic policies for minimum instances, and then confirm the deletion.

Get the function ARN

An Alibaba Cloud Resource Name (ARN) uniquely identifies an Alibaba Cloud resource. You can obtain the ARN of a function to reference the function in your code.

Log on to the Function Compute console. In the navigation pane on the left, choose Function Management > Function List.
In the top menu bar, select a region. Then, on the Function List page, click the name of the function.
On the Function Details page, click Copy ARN on the right to obtain the ARN of the target function.
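Once copied, an ARN can be split into its parts in code. The sketch below assumes the general Alibaba Cloud form `acs:<service>:<region>:<account_id>:<resource>` and, for functions, a resource part of `functions/<function_name>`; compare it with an ARN copied from the console before relying on it. `Arn`, `parse_arn`, and `function_name` are hypothetical names, and the account ID in the usage example is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arn:
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN of the assumed form acs:<service>:<region>:<account_id>:<resource>."""
    parts = arn.split(":", 4)  # maxsplit=4 keeps any ':' inside the resource part
    if len(parts) != 5 or parts[0] != "acs":
        raise ValueError(f"not an Alibaba Cloud ARN: {arn!r}")
    _, service, region, account_id, resource = parts
    return Arn(service, region, account_id, resource)


def function_name(arn: Arn) -> str:
    """Return the function name, assuming a resource part like 'functions/<name>'."""
    prefix = "functions/"
    if arn.service != "fc" or not arn.resource.startswith(prefix):
        raise ValueError("not a Function Compute function ARN")
    return arn.resource[len(prefix):]
```

For example, `parse_arn("acs:fc:cn-hangzhou:123456789:functions/my-gpu-func")` yields region `cn-hangzhou`, and `function_name` on the result returns `my-gpu-func`.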

References

Function Compute provides four function types for different scenarios: event functions, web functions, task functions, and GPU functions. For information about how to choose a function type for your scenario, see Technology selection guide.
In addition to the console, Function Compute provides APIs and the Serverless Devs tool to manage functions. For more information, see CreateFunction and Quick Start for Serverless Devs.
If a function execution times out, see What do I do if a "Function timeout" error occurs?.
Functions that are invoked infrequently may have longer invocation times. For the reason, see Why do infrequently used functions have long invocation times?. To reduce the impact of cold start latency, set the minimum number of instances to 1 or more.