Provisioning resident GPU capacity means paying for idle resources during off-peak hours and planning capacity before you know your actual load. Serverless GPU eliminates that trade-off: you get on-demand GPU computing without managing servers or committing to a fixed resource size. Function Compute allocates GPU resources when your workload runs and releases them when it stops—so you pay only for what you use.
How it differs from resident GPUs
With resident GPUs, you provision a fixed capacity and pay for it around the clock, including periods when resources sit idle. Serverless GPU removes that overhead:
| Aspect | Resident GPU | Serverless GPU |
|---|---|---|
| Capacity planning | Required upfront | Not required |
| Idle costs | Charged continuously | No charges at rest |
| Scale-out speed | Limited by provisioned capacity | Rapid, with instances started and stopped on demand |
| Resource management | You manage the infrastructure | Function Compute manages the infrastructure |
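The billing difference in the table can be made concrete with a small back-of-the-envelope calculation. The prices below are made-up assumptions for illustration only, not Function Compute's actual rates:

```python
# Illustrative cost comparison: resident vs serverless GPU billing.
# Both prices are hypothetical placeholders, not real Function Compute rates.

RESIDENT_PRICE_PER_HOUR = 2.0        # assumed resident GPU price, $/hour
SERVERLESS_PRICE_PER_SECOND = 0.001  # assumed serverless GPU price, $/second

def resident_cost(hours_provisioned):
    # A resident GPU bills for every provisioned hour, busy or idle.
    return RESIDENT_PRICE_PER_HOUR * hours_provisioned

def serverless_cost(busy_seconds):
    # Serverless GPU bills only the seconds the workload actually runs.
    return SERVERLESS_PRICE_PER_SECOND * busy_seconds

# A workload that is busy 2 hours out of a 24-hour day:
day_resident = resident_cost(24)            # 48.0 — idle hours are billed too
day_serverless = serverless_cost(2 * 3600)  # ~7.2 — only busy seconds billed
```

Under these assumed prices, the bursty workload pays for 22 idle hours on a resident GPU but only for its 2 busy hours on serverless GPU; the gap widens as the duty cycle drops.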
Use cases
| Use case | Description |
|---|---|
| AI model inference | Run inference on large models without reserving dedicated GPU capacity for variable traffic. |
| AI model training | Launch training jobs on demand and release resources immediately when training completes. |
| Audio and video acceleration | Process transcoding and production workloads with GPU acceleration, scaling up only when jobs are queued. |
| Graphics and image acceleration | Render or process images at scale with pay-as-you-go GPU resources. |
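For the inference use case, a GPU function follows the same handler shape as any Function Compute function: load the model once per instance, then reuse it across invocations so you pay GPU time only while requests run. The sketch below uses the Python runtime's `handler(event, context)` signature; the model itself is a hypothetical stub standing in for a real framework call (e.g. PyTorch or TensorRT loading weights onto the GPU):

```python
# Minimal sketch of an inference handler for a GPU function.
# The (event, context) signature follows Function Compute's Python runtime;
# load_model() is a placeholder for real GPU model loading.
import json

_model = None  # loaded once per instance, reused across warm invocations

def load_model():
    # Placeholder: a real GPU function would load model weights onto the GPU
    # here. This stub just echoes its input with a length field.
    return lambda text: {"input": text, "length": len(text)}

def handler(event, context):
    global _model
    if _model is None:
        _model = load_model()  # cold start: model loads on first invocation
    payload = json.loads(event) if isinstance(event, (str, bytes)) else event
    result = _model(payload.get("text", ""))
    return json.dumps(result)
```

Keeping the model in a module-level variable means only the first request on a new instance pays the load cost; once traffic stops and instances are released, no GPU charges accrue.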