In Function Compute, you can use instances in the on-demand mode and provisioned mode. This topic describes the principles, billing methods, the idle billing feature, and the instance scaling limits of the two modes. This topic also describes how to configure provisioned instances and the auto scaling rules for provisioned instances in the Function Compute console.
On-demand mode
Introduction
In the on-demand mode, instances are allocated and released by Function Compute, which automatically scales instances based on the number of function invocations: instances are created when the number of invocations increases and destroyed when the number of requests decreases. During the entire process, instance creation is triggered by requests. On-demand instances are destroyed when no requests are processed for a period of time (usually 3 to 5 minutes). When an invocation requires a new instance, you must wait for the cold start of the instance to complete.
By default, a maximum of 300 on-demand instances can be created for an Alibaba Cloud account in each region. If you want to increase the upper limit, join DingTalk group 11721331 to contact technical support.
Billing method
You are charged only when your function is invoked. If no requests are sent to your function, no instances are allocated and you are not charged for any fees. For more information about the pricing and billing of Function Compute, see Billing overview.
Provisioned mode
Introduction
On-demand instances are automatically created when requests are sent. When an on-demand instance is invoked for the first time, you must wait for its cold start to complete. If you want to eliminate the impact of cold starts, you can use provisioned instances.
In the provisioned mode, you manage the allocation and release of function instances. Provisioned instances are retained until you release them and take precedence over on-demand instances. If the number of provisioned instances is insufficient to process all requests, Function Compute allocates on-demand instances to process the remaining requests.
Billing method
You are charged for provisioned instances from the time they are created to the time they are released, regardless of whether they process requests. For more information about the pricing and billing of Function Compute, see Billing overview.
Idle billing
By default, the idle billing feature is disabled. In this case, provisioned instances in Function Compute are always allocated CPU resources, even when no requests are being processed. This ensures that the instances can run background tasks when no requests are made. After the idle billing feature is enabled, Function Compute freezes the CPU resources of provisioned instances when they are not processing requests. In this case, the instances are in the idle state and you are charged based on the idle billing unit prices. The unit prices for idle instance resources are 10% of the unit prices of active instance resources. For more information, see Billing overview.
You can enable the idle billing feature based on your business requirements.
- Costs
If you want to use provisioned instances to eliminate cold starts and reduce costs, we recommend that you enable the idle billing feature. With idle billing enabled, you pay lower fees for provisioned instances when they are in the idle state, and requests are still served without cold starts.
- Background tasks
If your function needs to run background tasks, we recommend that you disable the idle billing feature. Example scenarios:
- Some application frameworks rely on built-in schedulers or background features, and some dependent middleware needs to regularly report heartbeats.
- Some asynchronous operations are performed by using goroutines in Go, async functions in Node.js, or asynchronous threads in Java.
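As a rough sketch of the cost trade-off, the following Python snippet compares the cost of one provisioned instance over a month with and without idle billing. The unit price is a placeholder, not an actual Function Compute price; only the 10% idle-to-active price ratio comes from the pricing rule above.

```python
# Rough cost comparison for a provisioned instance with and without idle
# billing. ACTIVE_PRICE_PER_HOUR is a hypothetical placeholder; idle
# resources are billed at 10% of the active unit price, as stated above.

ACTIVE_PRICE_PER_HOUR = 1.0   # hypothetical unit price per instance-hour
IDLE_DISCOUNT = 0.10          # idle price = 10% of the active price

def provisioned_cost(total_hours: float, idle_fraction: float,
                     idle_billing_enabled: bool) -> float:
    """Estimate the cost of one provisioned instance over total_hours,
    given the fraction of time it sits idle (processing no requests)."""
    idle_hours = total_hours * idle_fraction
    active_hours = total_hours - idle_hours
    if idle_billing_enabled:
        return (active_hours * ACTIVE_PRICE_PER_HOUR
                + idle_hours * ACTIVE_PRICE_PER_HOUR * IDLE_DISCOUNT)
    # Idle billing disabled: CPU stays allocated, full price the whole time.
    return total_hours * ACTIVE_PRICE_PER_HOUR

# An instance that is idle 90% of a 720-hour month:
print(round(provisioned_cost(720, 0.9, idle_billing_enabled=False), 2))  # 720.0
print(round(provisioned_cost(720, 0.9, idle_billing_enabled=True), 2))   # 136.8
```

The more idle time a provisioned instance has, the larger the saving from enabling idle billing, while a mostly busy instance sees little difference.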
Instance scaling limits
Scaling limits for on-demand instances
- Default upper limit for running instances per region: 300.
- The scale-out speed of the running instances is limited by the upper limit of burst instances and the growth rate of the instances. For limits for different regions, see Limits on scaling speeds of instances in different regions.
- Burst instances: the number of instances that can be created immediately. The default upper limit for burst instances is 100 or 300, depending on the region.
- Instance growth rate: the number of instances that can be added per minute after the upper limit for burst instances is reached. The default upper limit for the growth rate is 100 or 300 per minute, depending on the region.
If the scale-out speed cannot keep up with the growth of requests, or the upper limit for instances is reached, the excess requests are throttled and an error whose HTTP status code is 429 is returned. The following figure shows how Function Compute performs throttling in a scenario where the number of invocations increases rapidly.
- ①: Before the upper limit for burst instances is reached, Function Compute creates instances immediately. During this process, a cold start occurs but no throttling error is reported.
- ②: When the limit for burst instances is reached, the increase of instances is restricted by the growth rate. Throttling errors are reported for some requests.
- ③: When the upper limit of instances is reached, some requests are throttled.
By default, all functions within an Alibaba Cloud account in the same region share the preceding scaling limits. To configure the limit on the number of instances for a specific function, see Overview of configuring the maximum number of on-demand instances. After the configuration, Function Compute returns a throttling error when the total number of running instances for the function exceeds the configured limit.
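The scale-out ceiling described above can be sketched as follows: instances grow immediately up to the burst limit, then by the growth rate per minute, and never beyond the per-region total limit. The defaults below are for regions outside the listed high-limit regions (burst limit 100, growth rate 100 per minute, total limit 300); actual limits depend on your account and region.

```python
# Sketch of the on-demand scale-out ceiling under a sudden traffic spike.
# Defaults assume a region with burst limit 100, growth rate 100/minute,
# and a per-region total limit of 300 running instances.

def max_on_demand_instances(elapsed_minutes: float,
                            burst_limit: int = 100,
                            growth_rate_per_min: int = 100,
                            total_limit: int = 300) -> int:
    """Upper bound on running on-demand instances after a traffic spike.
    Requests that need instances beyond this bound are throttled (HTTP 429)."""
    ceiling = burst_limit + growth_rate_per_min * elapsed_minutes
    return int(min(ceiling, total_limit))

print(max_on_demand_instances(0))  # 100 -> burst instances only
print(max_on_demand_instances(1))  # 200 -> burst plus one minute of growth
print(max_on_demand_instances(5))  # 300 -> capped by the total limit
```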
Scaling limits for provisioned instances
- Default upper limit for provisioned instances per region: 300.
- Default upper limit for the scaling speed of provisioned instances per minute: 100 or 300, depending on the region. For more information, see Limits on scaling speeds of instances in different regions.
The following figure shows how Function Compute performs throttling when provisioned instances are configured in the same load scenario as the preceding figure.
- ①: Before the provisioned instances are fully used, the requests are processed immediately. During this process, no cold start occurs and no throttling error is reported.
- ②: When the provisioned instances are fully used, Function Compute creates instances immediately before the upper limit for burst instances is reached. During this process, a cold start occurs but no throttling error is reported.
Limits on scaling speeds of instances in different regions
Region | Number of burst instances | Instance growth rate |
---|---|---|
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen) | 300 | 300 per minute |
Others | 100 | 100 per minute |
- In the same region, the limits on scaling speeds of provisioned instances and on-demand instances are the same.
- If you want to increase the scaling speed, join DingTalk group 11721331 to contact technical support.
- The scaling speed of GPU-accelerated instances is lower than that of elastic instances. We recommend that you use GPU-accelerated instances together with the provisioned mode.
Configure provisioned instances
- Log on to the Function Compute console. In the left-side navigation pane, click Services & Functions.
- In the top navigation bar, select a region. On the Services page, click the desired service.
- On the Functions page, click the function that you want to modify.
- On the Function Details page, click the Auto Scaling tab and click Create Rule.
- On the page that appears, configure the following parameters and click Create.
Basic Configurations

Parameter | Description |
---|---|
Version or Alias | Select the version or alias for which you want to create provisioned instances. Note: You can create provisioned instances only for the LATEST version. |
Minimum Number of Instances | Enter the number of provisioned instances that you want to create. The minimum number of instances equals the number of provisioned instances to be created. Note: You can set the minimum number of function instances to reduce cold starts and the time needed to respond to function invocation requests. This helps improve the service performance for online applications that are sensitive to response latency. |
Idle Billing | Enable or disable the idle billing feature. By default, the idle billing feature is disabled. When this feature is enabled, provisioned instances are allocated CPU resources only when they are processing requests; the CPU resources are frozen when the instances are not processing requests. When this feature is disabled, provisioned instances are allocated CPU resources regardless of whether they are processing requests. |
Maximum Number of Instances | Enter the maximum number of instances. The maximum number of instances equals the number of provisioned instances plus the maximum number of on-demand instances that can be allocated. Note: You can set the maximum number of function instances to prevent a single function from using too many instances because of excessive invocations, protect backend resources, and prevent unexpected charges. If you do not configure this parameter, the maximum number of instances is determined by the resource limits for the current region and Alibaba Cloud account. |

(Optional) Scheduled Setting Modification: You can create scheduled scaling rules to flexibly configure provisioned instances. You can configure the number of provisioned instances to be automatically adjusted to a specified value at a scheduled time. This way, the number of provisioned instances can meet the concurrency requirement of your business.

Parameter | Description |
---|---|
Policy Name | Enter a custom policy name. |
Minimum Number of Instances | Enter the minimum number of provisioned instances. |
Schedule Expression (UTC) | Enter the expression of the schedule. Example: cron(0 0 20 * * *). For more information, see Parameters. |
Effective Time (UTC) | Set the time when the configurations of scheduled scaling take effect and expire. |

(Optional) Metric-based Setting Modification: Provisioned instances are scaled in or out every minute based on the metrics of instances or the concurrency utilization of provisioned instances.

Parameter | Description |
---|---|
Policy Name | Enter a custom policy name. |
Minimum Range of Instances | Specify the value range for the minimum number of provisioned instances. |
Utilization Type | This parameter is displayed only when GPU-accelerated instances are configured. Select the type of metrics based on which the auto scaling policy is configured. For more information about the auto scaling policies of GPU-accelerated instances, see Create an auto scaling policy for provisioned GPU-accelerated instances. |
Usage Threshold | Configure the scaling thresholds. Scale-in is performed if the metric values or the concurrency utilization of provisioned instances is lower than the specified values. Scale-out is performed if the metric values or the concurrency utilization of provisioned instances is higher than the specified values. |
Effective Time (UTC) | Specify the time when the configurations of metric-based auto scaling take effect and expire. |

After the auto scaling rule is created, you can go to the Auto Scaling page of the service to view the details of the rule.
You can modify or delete the provisioned instance configurations as prompted.
Create auto scaling rules for provisioned instances
In the provisioned mode, Function Compute creates the specified number of instances, but the instances may not be fully utilized. You can use the Scheduled Setting Modification and Metric-based Setting Modification features to make better use of provisioned instances.
Scheduled Setting Modification
- Definition: Scheduled scaling helps you flexibly configure provisioned instances. You can configure the number of provisioned instances to be automatically adjusted to a specified value at a specified time so that the number of instances can meet the concurrency of your business.
- Applicable scenarios: Functions work based on periodic rules or predictable traffic peaks. If the provisioned instances are insufficient to process all the function invocation requests, the remaining requests are processed by on-demand instances. For more information, see Instance types and instance modes.
- Sample configuration: The following figure shows two scheduled actions that are configured. The first scheduled action scales out the provisioned instances before the traffic peak, and the second scheduled action scales in the provisioned instances after the traffic peak.
{
  "ServiceName": "service_1",
  "FunctionName": "function_1",
  "Qualifier": "alias_1",
  "ScheduledActions": [
    {
      "Name": "action_1",
      "StartTime": "2022-11-01T10:00:00Z",
      "EndTime": "2022-11-30T10:00:00Z",
      "TargetValue": 50,
      "ScheduleExpression": "cron(0 0 20 * * *)"
    },
    {
      "Name": "action_2",
      "StartTime": "2022-11-01T10:00:00Z",
      "EndTime": "2022-11-30T10:00:00Z",
      "TargetValue": 10,
      "ScheduleExpression": "cron(0 0 22 * * *)"
    }
  ]
}
Parameter | Description |
---|---|
Name | The name of the scheduled scaling task. |
StartTime | The time when the configurations start to take effect. Specify the value in UTC. |
EndTime | The time when the configurations expire. Specify the value in UTC. |
TargetValue | The target value. |
ScheduleExpression | The expression that specifies when to run the scheduled task. Cron expressions in the format cron(Seconds Minutes Hours Day-of-month Month Day-of-week) are supported. The following tables describe the fields and special characters. |
Field name | Valid values | Allowed special characters |
---|---|---|
Seconds | 0 to 59 | None |
Minutes | 0 to 59 | , - * / |
Hours | 0 to 23 | , - * / |
Day-of-month | 1 to 31 | , - * ? / |
Month | 1 to 12 or JAN to DEC | , - * / |
Day-of-week | 1 to 7 or MON to SUN | , - * ? |
Character | Description | Example |
---|---|---|
* | Specifies any or each value. | In the Minutes field, * specifies that the task is run every minute. |
, | Specifies a value list. | In the Day-of-week field, MON, WED, FRI specifies every Monday, Wednesday, and Friday. |
- | Specifies a range. | In the Hours field, 10-12 specifies a time range from 10:00 to 12:00 in UTC. |
? | Specifies an uncertain value. | This special character is used together with other specified values. For example, if you specify a specific date, but you do not require the specified date to be a specific day of the week, you can use this special character in the Day-of-week field. |
/ | Specifies increments. n/m specifies an increment of m starting from the position of n. | In the Minutes field, 3/5 specifies that the task is run every 5 minutes starting from the third minute of each hour. |
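As a quick illustration of the / increment character, the following sketch checks which values in the Minutes field match an n/m expression. This is not the Function Compute parser, only the semantics described in the table above:

```python
# Tiny illustration of the "/" increment character in a cron field:
# n/m matches every m-th value starting at n.

def matches_increment(field_expr: str, value: int) -> bool:
    """Return True if `value` matches an n/m cron increment expression."""
    start_str, step_str = field_expr.split("/")
    start, step = int(start_str), int(step_str)
    return value >= start and (value - start) % step == 0

# 3/5 in the Minutes field: the 3rd minute of the hour, then every 5 minutes.
matching = [m for m in range(60) if matches_increment("3/5", m)]
print(matching)  # [3, 8, 13, 18, 23, 28, 33, 38, 43, 48, 53, 58]
```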
Metric-based Setting Modification
- Definition: Metric-based auto-scaling tracks metrics to dynamically scale provisioned instances.
- Scenario: Function Compute periodically collects the concurrency utilization of provisioned instances or the resource utilization metrics of instances, and compares these metrics with the thresholds that you specify to control the scaling of provisioned instances. This way, the number of provisioned instances can be adjusted to meet your business needs.
- Principle: Provisioned instances are scaled every minute based on the metric value.
- If the metric value exceeds the threshold that you configure, the system rapidly performs scale-out operations to adjust the number of provisioned instances to the target value.
- If the metric value is lower than the threshold that you configure, the system slowly performs scale-in operations to adjust the number of provisioned instances to the target value.
- Sample configuration: The following figure shows an example of metric-based scaling for provisioned instances.
- When the traffic increases and the threshold is triggered, provisioned instances start to be scaled out until the number of provisioned instances reaches the upper limit. Requests that cannot be processed by the provisioned instances are allocated to on-demand instances.
- When the traffic decreases and the threshold is triggered, provisioned instances start to be scaled in.
Only the statistics on provisioned instances are collected to calculate the concurrency utilization of provisioned instances. The statistics on on-demand instances are not included.
The metric is calculated based on the following formula: The number of concurrent requests to which provisioned instances are responding/The maximum number of concurrent requests to which all provisioned instances can respond. The metric value ranges from 0 to 1.
- Each instance processes a single request at a time: Maximum concurrency = Number of instances.
- Each instance concurrently processes multiple requests: Maximum concurrency = Number of instances × Number of requests concurrently processed by one instance.
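The utilization formula above can be expressed as a small function. Only provisioned instances count toward either side of the ratio; the per-instance concurrency parameter matters only if your instances are configured to process multiple requests concurrently:

```python
# Concurrency utilization of provisioned instances, per the formula above.
# On-demand instances are excluded from both the numerator and denominator.

def provisioned_concurrency_utilization(in_flight_requests: int,
                                        provisioned_instances: int,
                                        per_instance_concurrency: int = 1) -> float:
    """Concurrent requests being served by provisioned instances divided by
    the maximum concurrency all provisioned instances can handle (0 to 1)."""
    max_concurrency = provisioned_instances * per_instance_concurrency
    return in_flight_requests / max_concurrency

# 100 provisioned instances, each handling one request at a time,
# currently serving 30 concurrent requests:
print(provisioned_concurrency_utilization(30, 100))      # 0.3
# Same load, but each instance handles up to 10 concurrent requests:
print(provisioned_concurrency_utilization(30, 100, 10))  # 0.03
```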
- The target values for scaling are determined by the current metric value, the metric target, the current number of provisioned instances, and the scale-in factor.
- Calculation principle: The system scales in provisioned instances based on the scale-in factor, a system parameter whose value ranges from 0 (exclusive) to 1 and that is used to slow down the scale-in speed. You do not need to set the scale-in factor. The target values for scaling operations are the smallest integers that are greater than or equal to the following calculation results:
- Scale-out target value = Current provisioned instances × (Current metric value/Metric target)
- Scale-in target value = Current provisioned instances × Scale-in factor × (1 - Current metric value/Metric target)
- Example: If the current metric value is 80%, the metric target is 40%, and the number of provisioned instances is 100, the target value is calculated based on the following formula: 100 × (80%/40%) = 200. The number of provisioned instances is increased to 200 to keep the metric value near the 40% target.
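The two formulas can be sketched as follows. Note that the scale-in factor is a system parameter that you cannot configure; the 0.5 used here is only an illustrative assumption:

```python
# Scale-out and scale-in target values, per the formulas above. Targets are
# the smallest integers greater than or equal to the raw results, so ceil()
# is used. The scale-in factor of 0.5 is an illustrative assumption, not a
# documented value.
import math

def scale_out_target(current: int, metric: float, target: float) -> int:
    # Smallest integer >= current * (metric / target)
    return math.ceil(current * (metric / target))

def scale_in_target(current: int, metric: float, target: float,
                    scale_in_factor: float = 0.5) -> int:
    # Smallest integer >= current * factor * (1 - metric / target)
    return math.ceil(current * scale_in_factor * (1 - metric / target))

# The worked example above: metric 80%, target 40%, 100 provisioned instances.
print(scale_out_target(100, 0.80, 0.40))  # 200
# Low load: metric 20% against the same 40% target with 100 instances.
print(scale_in_target(100, 0.20, 0.40))   # 25
```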
{
  "ServiceName": "service_1",
  "FunctionName": "function_1",
  "Qualifier": "alias_1",
  "TargetTrackingPolicies": [
    {
      "Name": "action_1",
      "StartTime": "2022-11-01T10:00:00Z",
      "EndTime": "2022-11-30T10:00:00Z",
      "MetricType": "ProvisionedConcurrencyUtilization",
      "MetricTarget": 0.6,
      "MinCapacity": 10,
      "MaxCapacity": 100
    }
  ]
}
Parameter | Description |
---|---|
Name | The name of the configured metric-based task. |
StartTime | The time when the configurations start to take effect. Specify the value in UTC. |
EndTime | The time when the configurations expire. Specify the value in UTC. |
MetricType | The metric that is tracked. Set the parameter to ProvisionedConcurrencyUtilization. |
MetricTarget | The threshold for metric-based auto scaling. |
MinCapacity | The minimum number of provisioned instances for scale-in. |
MaxCapacity | The maximum number of provisioned instances for scale-out. |