Function Compute offers four function types, three runtime environments, and multiple instance types. This topic covers the key differences between each option to help you choose the right combination for your workload.
Quick selection guide
| Workload | Function type | Runtime |
|---|---|---|
| Web apps and REST APIs | Web Function | Custom Runtime |
| Event-driven file and stream processing | Event Function | Built-in Runtime |
| AI inference (computer vision, AIGC) | GPU Function | Custom Container |
| Long-running and scheduled tasks | Task Function | Built-in Runtime |
Built-in Runtime and Custom Runtime are both deployed as code packages and are best suited for lightweight applications. GPU Functions support only Custom Container.
Function type selection
| | Event Function | Web Function | Task Function | GPU Function |
|---|---|---|---|---|
| Description | Processes files and data streams triggered by cloud service events, such as OSS triggers, Kafka triggers, and SLS triggers. | Supports popular web frameworks. Accessible from a browser or directly via URL. | Processes asynchronous requests. Tracks and saves the state of each stage of an asynchronous invocation. | Runs container images from popular AI projects such as Stable Diffusion WebUI, ComfyUI, RAG, and TensorRT. |
| Use cases | Cloud service integration: Real-time file processing with OSS, log processing with Simple Log Service (SLS). ETL data processing: Database cleaning, message queue processing. | Popular web frameworks: SpringBoot, Express, Flask, and more. Migrate existing apps: HTML5 websites, REST APIs, Backend for Frontend (BFF), mobile apps, mini programs, game settlements, and more. | General-purpose tasks: Scheduled, periodic, and scripted tasks. Multimedia processing: Video transcoding, live recording, image processing. | Traditional inference: Computer vision (CV) and natural language processing (NLP). AIGC model inference: Text-to-text, text-to-image, and text-to-audio generation. |
| Recommended runtime | Built-in Runtime | Custom Runtime | Built-in Runtime | Custom Container only |
| Asynchronous Task | Disabled by default | Disabled by default | Enabled by default | Disabled by default |
| Best suited for | Integrating with Alibaba Cloud services via event triggers | Building and migrating web applications and APIs | Long-running jobs that need state tracking | AI/ML inference workloads requiring GPU acceleration |
Runtime environment selection
| | Built-in Runtime | Custom Runtime | Custom Container |
|---|---|---|---|
| Development workflow | Write a request handler using the interfaces that Function Compute defines. | Develop with a web framework template and view the result at a public endpoint instantly. | Upload a custom image to Alibaba Container Registry (ACR) and deploy, or use an existing image in ACR. |
| Supported instance types | CPU instances | CPU instances | CPU instances and GPU instances |
| Single-instance concurrency | Not supported | Supported | Supported |
| Cold start | Fastest — the code package excludes the runtime. | Fast — the code package (an HTTP server) is larger, but requires no image pull. | Slower — requires pulling an image on cold start. |
| Code package format | ZIP, JAR (Java), or a folder | — | Container Image |
| Code package size limit | 500 MB in select regions (such as Hangzhou); 100 MB in other regions. Use Layers to add dependencies and reduce package size. | — | CPU instance image: 10 GB uncompressed. GPU instance image: 15 GB uncompressed. For AI inference, store large models in NAS or OSS to reduce image size. |
| Supported languages | Node.js, Python, PHP, Java, C#, Go | No restrictions | No restrictions |
| Best suited for | Lightweight functions using supported languages with the fastest cold starts | Web apps and APIs using any framework | GPU workloads and containerized deployments |
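With a Built-in Runtime, you implement a request handler with the signature that Function Compute defines; for the Python runtime that is `handler(event, context)`, where `event` carries the raw trigger payload. A minimal sketch for an OSS-triggered function follows; the exact event layout shown is an assumption for illustration, so check the trigger documentation for the authoritative schema:

```python
import json

def handler(event, context):
    """Entry point invoked by Function Compute's built-in Python runtime.

    `event` is the raw trigger payload (bytes). For an OSS trigger it is a
    JSON document describing the object that changed; the field names used
    below are illustrative, not authoritative.
    """
    evt = json.loads(event)
    # Pull the bucket name and object key out of the (assumed) event shape.
    record = evt["events"][0]
    bucket = record["oss"]["bucket"]["name"]
    key = record["oss"]["object"]["key"]
    # Real code would fetch and process the object here.
    return json.dumps({"processed": f"{bucket}/{key}"})
```

Because the code package excludes the runtime itself, this style of function gets the fastest cold starts of the three runtime environments.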
Instance type selection
CPU functions support only Elastic Instances. GPU Functions support three instance types, which you can switch between at any time without service interruption.
Decision guide
Use the following questions to find the right instance type:
- Is your workload latency-sensitive and interactive (for example, a real-time chatbot or an image generation API)? If yes, use Provisioned Instances to eliminate cold starts and guarantee response times.
- Does your traffic follow a predictable baseline with occasional spikes? If yes, use Mixed Mode (Provisioned + Elastic Instances) to maintain stable baseline capacity while absorbing traffic bursts.
- Is your traffic variable, bursty, or low-frequency? If yes, use Elastic Instances and pay only for active usage.
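The decision questions above can be sketched as a small lookup; the boolean parameter names are mine, not Function Compute terminology:

```python
def pick_instance_type(latency_sensitive: bool,
                       predictable_baseline: bool) -> str:
    """Map the two decision-guide questions to an instance type.

    The inputs mirror the questions above; the returned labels match the
    instance types described in this topic.
    """
    if latency_sensitive:
        # Interactive workloads cannot tolerate cold starts.
        return "Provisioned Instance"
    if predictable_baseline:
        # Steady baseline traffic plus occasional spikes.
        return "Provisioned + Elastic (Mixed Mode)"
    # Variable, bursty, or low-frequency traffic.
    return "Elastic Instance"
```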
Instance type comparison
| | Elastic Instance | Provisioned Instance | Provisioned + Elastic (Mixed Mode) |
|---|---|---|---|
| Applies to | CPU functions (only option); GPU Functions | GPU Functions only | GPU Functions only |
| Cold start | Yes, if minimum instances = 0. Set minimum instances to 1 or more to pre-allocate resources and reduce cold starts. | None. All requests within allocated capacity get a real-time response. | Partial. Requests within the provisioned pool have no cold start; elastic scale-out instances do. |
| Billing model | Pay-as-you-go | Subscription | Subscription (provisioned portion) + pay-as-you-go (elastic portion) |
| Best suited for | Variable or low-frequency traffic; cost-sensitive workloads | Latency-sensitive or stable traffic workloads | Workloads with a predictable baseline and unpredictable traffic bursts |
Elastic Instance
Elastic Instances scale automatically with request volume and are released when idle. Setting the minimum number of instances to 0 gives you a pure pay-as-you-go model — you pay only for active usage.
Cold start behavior: Cold starts occur when instances scale from zero. To reduce cold start latency, set the minimum number of instances to 1 or more. This pre-allocates elastic resources so instances are ready to handle incoming requests quickly.
Billing: Costs include charges for instances in both the active and Shallow Hibernation states. In Shallow Hibernation, vCPU resources are not charged and GPU resources are billed at one-fifth of the active rate. If you set the minimum number of instances to 1 or more, enable Shallow Hibernation to reduce idle costs.
Use Elastic Instances when:
- Your traffic is variable, bursty, or low-frequency
- You want to pay only for actual usage
- Your workload can tolerate occasional cold start latency, or you mitigate it with a minimum instance count
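As a rough illustration of the Shallow Hibernation rule above (idle vCPU time is free, idle GPU time is billed at one-fifth of the active rate), here is a cost sketch; the per-second rate is a placeholder, not a real price:

```python
def gpu_cost(active_seconds: float, hibernated_seconds: float,
             active_rate_per_second: float) -> float:
    """Estimate GPU charges for an elastic instance with Shallow Hibernation.

    Per the billing rule in this topic, hibernated GPU time is billed at
    one-fifth of the active rate. The rate itself is a placeholder; see the
    official pricing page for real numbers.
    """
    return (active_seconds * active_rate_per_second
            + hibernated_seconds * active_rate_per_second / 5)

# Example: 600 s active plus 3000 s hibernated at a placeholder rate of
# 0.01/s costs 600 * 0.01 + 3000 * 0.01 / 5 = 6.0 + 6.0 = 12.0,
# versus 36.0 if the hibernated time were billed at the full active rate.
```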
Provisioned Instance
Provisioned Instances apply only to GPU Functions. Purchase a Provisioned Resource Pool in advance, then allocate a specific number and type of instances to your function. This eliminates cold starts within your allocated capacity and gives you predictable, fixed costs.
After you purchase a monthly Provisioned Resource Pool, the platform allocates a quota of boost instances in addition to your subscription-based Provisioned Instances. This boost instance quota is not billed.
Cold start behavior: None. All requests within your allocated capacity receive a real-time response. Maximum concurrent requests = (Number of allocated Provisioned Instances) × (Instance concurrency) + the boost instance quota. Requests that exceed this limit are throttled.
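Applying the throttling formula above to a hypothetical pool:

```python
def max_concurrent_requests(provisioned_instances: int,
                            instance_concurrency: int,
                            boost_quota: int) -> int:
    """Throttling threshold for a provisioned GPU function, per the formula
    in this topic: instances x per-instance concurrency + boost quota."""
    return provisioned_instances * instance_concurrency + boost_quota

# A hypothetical pool of 4 Provisioned Instances with an instance
# concurrency of 2 and a boost quota of 2 can serve up to
# 4 * 2 + 2 = 10 concurrent requests; the 11th is throttled.
```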
Billing: The total subscription fee for all purchased Provisioned Resource Pools. The boost instance quota is not billed.
Provisioned Instances are available only for GPU Functions in the Ada, Ada.2, Ada.3, Hopper, or Xpu.1 series.
Use Provisioned Instances when:
- Your workload is latency-sensitive and interactive (for example, a real-time chatbot or image generation API)
- Your traffic is steady and predictable
- You need guaranteed capacity and consistent response times
Provisioned + Elastic Instances (Mixed Mode)
Mixed Mode applies only to GPU Functions. It combines Provisioned and Elastic Instances: the provisioned pool handles steady-state traffic first, and elastic instances automatically scale out when requests exceed the provisioned capacity. This gives you a guaranteed baseline with the flexibility to absorb sudden traffic bursts.
Cold start behavior: Partial. Requests handled within the provisioned pool have no cold start. Requests that trigger auto-scaling to new elastic instances experience a cold start.
Billing: The provisioned portion is billed against your purchased Provisioned Resource Pool quota. Elastic instances launched beyond the provisioned quota are billed on a pay-as-you-go basis, at the same rates as active and Shallow Hibernation elastic instances.
Use Mixed Mode when:
- Your traffic has a predictable baseline but occasional spikes
- You want stable performance for normal load with the ability to handle burst traffic
- You need a balance between cost predictability and scaling flexibility