
Function Compute: Introduction to serverless GPUs

Last Updated: Mar 31, 2026

Provisioning resident GPU capacity means paying for idle resources during off-peak hours and planning capacity before you know your actual load. Serverless GPU eliminates that trade-off: you get on-demand GPU computing without managing servers or committing to a fixed resource size. Function Compute allocates GPU resources when your workload runs and releases them when it stops—so you pay only for what you use.

How it differs from resident GPUs

With resident GPUs, you provision a fixed capacity and pay for it around the clock, including periods when resources sit idle. Serverless GPU removes that overhead:

| Aspect | Resident GPU | Serverless GPU |
| --- | --- | --- |
| Capacity planning | Required upfront | Not required |
| Idle costs | Charged continuously | No charges at rest |
| Scale-out speed | Limited by provisioned capacity | Rapid scale-out via optimized start and stop |
| Resource management | You manage the infrastructure | Function Compute manages the infrastructure |
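The idle-cost difference in the comparison above can be made concrete with a small back-of-the-envelope calculation. The rates below are hypothetical placeholders, not actual Function Compute or GPU instance prices; the point is only that for a bursty workload, serverless billing tracks busy hours while resident billing tracks the whole day.

```python
# Hedged cost sketch for a bursty GPU workload.
# Both prices are hypothetical assumptions, not real rates.
RESIDENT_PRICE_PER_HOUR = 2.0         # assumed resident GPU rate (USD/hour)
SERVERLESS_PRICE_PER_SECOND = 0.0011  # assumed serverless GPU rate (USD/second)

HOURS_IN_DAY = 24
BUSY_HOURS = 3  # the workload actually runs 3 hours per day

# Resident capacity is billed around the clock, idle or not.
resident_daily_cost = RESIDENT_PRICE_PER_HOUR * HOURS_IN_DAY

# Serverless GPU is billed only while the workload runs.
serverless_daily_cost = SERVERLESS_PRICE_PER_SECOND * BUSY_HOURS * 3600

print(f"resident:   ${resident_daily_cost:.2f}/day")
print(f"serverless: ${serverless_daily_cost:.2f}/day")
```

With these assumed rates, the resident GPU costs $48.00 per day regardless of utilization, while the serverless workload costs about $11.88 for the same three busy hours.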

Use cases

| Use case | Description |
| --- | --- |
| AI model inference | Run inference on large models without reserving dedicated GPU capacity for variable traffic. |
| AI model training | Launch training jobs on demand and release resources immediately when training completes. |
| Audio and video acceleration | Process transcoding and production workloads with GPU acceleration, scaling up only when jobs are queued. |
| Graphics and image acceleration | Render or process images at scale with pay-as-you-go GPU resources. |
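To illustrate the inference use case, the sketch below shows the shape of a Function Compute event handler in Python. The `handler(event, context)` signature is the standard Function Compute Python entry point; everything else (the stand-in model, the JSON payload shape, the `input`/`output` field names) is an assumption for illustration, since the real code depends on your model and framework.

```python
import json

# Module-level slot for a model loaded once per function instance. In a real
# GPU inference function this would be a framework model (e.g. a PyTorch
# model moved onto the GPU); here a trivial stand-in keeps the sketch
# self-contained.
_MODEL = None


def _load_model():
    # Assumption: stand-in for expensive model initialization. Loading at
    # instance scope lets warm invocations reuse the model instead of
    # paying the load cost on every request.
    return lambda text: text.upper()  # trivial stand-in "model"


def handler(event, context):
    """Function Compute Python entry point: handler(event, context)."""
    global _MODEL
    if _MODEL is None:
        _MODEL = _load_model()
    payload = json.loads(event)  # assume the event body is a JSON payload
    result = _MODEL(payload.get("input", ""))
    return json.dumps({"output": result})
```

Because instances are created on demand and released when idle, keeping model initialization outside the per-request path is what makes warm invocations fast; only the first request on a new instance pays the load cost.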

Next steps