Provisioning resident GPU capacity means paying for idle resources during off-peak hours and planning capacity before you know your actual load. Serverless GPU eliminates that trade-off: you get on-demand GPU computing without managing servers or committing to a fixed resource size. Function Compute allocates GPU resources when your workload runs and releases them when it stops—so you pay only for what you use.
How it differs from resident GPUs
With resident GPUs, you provision a fixed capacity and pay for it around the clock, including periods when resources sit idle. Serverless GPU removes that overhead:
| Aspect | Resident GPU | Serverless GPU |
|---|---|---|
| Capacity planning | Required upfront | Not required |
| Idle costs | Charged continuously | No charges at rest |
| Scale-out speed | Limited by provisioned capacity | Rapid, with instances started and stopped on demand |
| Resource management | You manage the infrastructure | Function Compute manages the infrastructure |
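The billing difference in the table can be made concrete with a small back-of-the-envelope calculation. The prices below are made-up assumptions for illustration only, not Function Compute's actual rates:

```python
# Illustrative cost comparison: resident vs serverless GPU billing.
# Both prices are hypothetical placeholders, not real Function Compute rates.

RESIDENT_PRICE_PER_HOUR = 2.0        # assumed resident GPU price, $/hour
SERVERLESS_PRICE_PER_SECOND = 0.001  # assumed serverless GPU price, $/second

def resident_cost(hours_provisioned):
    # A resident GPU bills for every provisioned hour, busy or idle.
    return RESIDENT_PRICE_PER_HOUR * hours_provisioned

def serverless_cost(busy_seconds):
    # Serverless GPU bills only the seconds the workload actually runs.
    return SERVERLESS_PRICE_PER_SECOND * busy_seconds

# A workload that is busy 2 hours out of a 24-hour day:
day_resident = resident_cost(24)            # 48.0 — idle hours are billed too
day_serverless = serverless_cost(2 * 3600)  # ~7.2 — only busy seconds billed
```

Under these assumed prices, the bursty workload pays for 22 idle hours on a resident GPU but only for its 2 busy hours on serverless GPU; the gap widens as the duty cycle drops.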
Use cases
| Use case | Description |
|---|---|
| AI model inference | Run inference on large models without reserving dedicated GPU capacity for variable traffic. |
| AI model training | Launch training jobs on demand and release resources immediately when training completes. |
| Audio and video acceleration | Process transcoding and production workloads with GPU acceleration, scaling up only when jobs are queued. |
| Graphics and image acceleration | Render or process images at scale with pay-as-you-go GPU resources. |
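For the inference use case, a GPU function follows the same handler shape as any Function Compute function: load the model once per instance, then reuse it across invocations so you pay GPU time only while requests run. The sketch below uses the Python runtime's `handler(event, context)` signature; the model itself is a hypothetical stub standing in for a real framework call (e.g. PyTorch or TensorRT loading weights onto the GPU):

```python
# Minimal sketch of an inference handler for a GPU function.
# The (event, context) signature follows Function Compute's Python runtime;
# load_model() is a placeholder for real GPU model loading.
import json

_model = None  # loaded once per instance, reused across warm invocations

def load_model():
    # Placeholder: a real GPU function would load model weights onto the GPU
    # here. This stub just echoes its input with a length field.
    return lambda text: {"input": text, "length": len(text)}

def handler(event, context):
    global _model
    if _model is None:
        _model = load_model()  # cold start: model loads on first invocation
    payload = json.loads(event) if isinstance(event, (str, bytes)) else event
    result = _model(payload.get("text", ""))
    return json.dumps(result)
```

Keeping the model in a module-level variable means only the first request on a new instance pays the load cost; once traffic stops and instances are released, no GPU charges accrue.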