Function Compute provides two modes for instance management: the on-demand mode and the provisioned mode. This topic describes the principles, billing methods, and instance scaling limits of the two modes, as well as the idle mode of provisioned instances. This topic also describes how to configure provisioned instances and auto scaling rules for provisioned instances in the Function Compute console.
On-demand mode
Overview
In on-demand mode, instances are allocated and released by Function Compute. Function Compute automatically scales instances based on the number of function invocations. Function Compute creates instances when the number of function invocations increases, and destroys excess instances when the number of function invocations decreases. During the entire process, instance creation is automatically triggered by requests. On-demand instances are destroyed when no requests are sent for processing within a period of time (usually 3 to 5 minutes). The first time you invoke an on-demand instance, you must wait for the cold start of the instance to complete.
By default, a maximum of 300 on-demand instances can be created for an Alibaba Cloud account in each region. If you want to increase the upper limit, join the DingTalk group (ID: 11721331) to contact technical support.
Billing
In on-demand mode, you are charged only when your function is invoked. If no requests are sent to your function, no instances are allocated and no fees are incurred. For more information about the pricing and billing of Function Compute, see Billing overview.
Provisioned mode
Overview
On-demand instances are automatically created when requests arrive, so the first invocation of an on-demand instance must wait for a cold start to complete. If you want to eliminate the impact of cold starts, use provisioned instances.
In provisioned mode, you can manage the allocation and release of function instances. The provisioned instances are retained unless you release them. Provisioned instances take precedence over on-demand instances. If the number of provisioned instances is not enough to process all requests, Function Compute allocates on-demand instances to process the remaining requests.
Billing
The billing of provisioned instances starts when the provisioned instances are created and ends when the provisioned instances are released. You are charged for provisioned instances regardless of whether they are used to process requests. For more information about the pricing and billing of Function Compute, see Billing overview.
Idle mode
Elastic instances
By default, the idle mode is not enabled. If the idle mode is not enabled, provisioned instances in Function Compute are always allocated with CPU resources even when no requests are being processed. This ensures that the instances can run background tasks even when no requests are made. After the idle mode is enabled, Function Compute freezes the vCPUs of provisioned instances when the provisioned instances do not process any request. This way, the instances enter the idle state. You are not charged for vCPU resources when the provisioned instances are in the idle state, which helps you save costs. For more information, see Billing overview.
You can choose whether to enable the idle mode based on your business requirements.
Costs
If you want to use provisioned instances to eliminate cold starts and also want to reduce costs, we recommend that you enable the idle mode. The idle mode allows you to pay lower fees for provisioned instances, and requests can still be responded to without cold starts.
Background tasks
If your function needs to run background tasks, we recommend that you do not enable the idle mode. Examples:
- Some application frameworks rely on built-in schedulers or background features, and some dependent middleware needs to regularly report heartbeats.
- Some asynchronous operations are performed by using goroutines in Go, async functions in Node.js, or asynchronous threads in Java.
GPU-accelerated instance (in public preview)
By default, the idle mode for GPU-accelerated instances is disabled. When the idle mode is enabled, Function Compute freezes the vCPUs and GPUs of a provisioned instance when the instance does not process any requests. Then, the vCPUs and GPUs become idle. Idle vCPUs are free of charge, and you are charged for idle GPUs based on the unit price of memory, which greatly reduces costs. For more information, see Billing overview.
The idle mode of GPU-accelerated instances is still in public preview. To use the feature, you must use a function that is configured with full GPU memory. That is, the GPU memory of the instance must be 16 GB (NVIDIA Tesla T4) or 24 GB (NVIDIA Ampere A10). To apply for trial use, click Apply for idle GPU-accelerated instance for public preview.
Limits on instance scaling
Scaling limits for on-demand instances
Function Compute preferentially uses existing instances to process requests. If the existing instances are fully loaded, Function Compute creates new instances to process requests. As the number of invocations increases, Function Compute continues to create new instances until enough instances are created to handle requests or the upper limit is reached. During instance scale-out, the following limits apply:
Default upper limit for running instances per region: 300.
The scale-out speed of running instances is limited by the upper limit of burstable instances and the growth rate of the instances. For limits on scale-out for different regions, see the "Limits on scaling speeds of instances in different regions" section of the Configure provisioned instances and auto scaling rules topic.
Burstable instances: the number of instances that can be immediately created. The default upper limit for burstable instances is 100 or 300 based on the region.
Instance growth rate: the number of instances that can be added per minute after the upper limit for burstable instances is reached. The default upper limit of growth rate is 100 or 300 based on the region.
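To make the interplay between the two limits concrete, the following sketch computes how many on-demand instances are available t minutes into a sustained traffic spike. This is an illustrative model only, using the default limits for regions outside China (burst limit 100, growth rate 100 per minute, total limit 300); the function name is not part of any Function Compute API.

```python
def max_allowed_instances(minutes_elapsed: float,
                          burst_limit: int = 100,
                          growth_rate_per_min: int = 100,
                          total_limit: int = 300) -> int:
    # Burst capacity is available immediately; after that, capacity grows
    # linearly at the growth rate, capped by the per-region total limit.
    allowed = burst_limit + growth_rate_per_min * minutes_elapsed
    return int(min(allowed, total_limit))
```

For example, one minute into a spike at most 200 instances can be running; the regional cap of 300 is reached after two minutes, and requests beyond that capacity are throttled.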
When the total number of instances or the scaling speed of the instances exceeds the limit, Function Compute returns a throttling error with HTTP status code 429. The following figure shows how Function Compute performs throttling in a scenario where the number of invocations rapidly increases.
1: Before the upper limit on burstable instances is reached, Function Compute immediately creates instances when the number of requests increases. During this process, a cold start occurs but no throttling error is reported.
2: When the limit on burstable instances is reached, the increase of instances is restricted by the growth rate. Throttling errors are reported for some requests.
3: When the upper limit of instances is reached, some requests are throttled.
By default, the preceding scaling limits take effect for all functions within an Alibaba Cloud account in the same region. To configure a limit on the number of instances for a specific function, see Overview of configuring the maximum number of on-demand instances. After you configure the maximum number of on-demand instances, Function Compute returns a throttling error when the total number of running instances for the function exceeds the configured limit.
Scaling limits for provisioned instances
When invocations surge, the creation of a large number of instances can be throttled, which causes request failures. The cold starts of new instances also increase request latency. To prevent these issues, you can use provisioned instances, which are reserved before invocations arrive. Provisioned instances are subject to their own limits on instance count and scaling speed:
Default upper limit for provisioned instances per region: 300.
Default upper limit for the scaling speed of provisioned instances per minute: 100 or 300, depending on the region. For more information, see the "Limits on scaling speeds of instances in different regions" section of the Configure provisioned instances and auto scaling rules topic.
The following figure shows how Function Compute performs throttling when provisioned instances are configured in the same load scenario as the preceding figure.
1: Before all the provisioned instances are used, the requests are processed immediately. During this process, no cold starts occur and no throttling errors are reported.
2: When all the provisioned instances are used, Function Compute creates instances immediately before the upper limit for burstable instances is reached. During this process, a cold start occurs but no throttling errors are reported.
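The routing behavior in the two phases above can be sketched as follows. This is an illustrative model, not Function Compute internals: concurrent requests first fill the provisioned instances, overflow goes to on-demand instances up to the currently allowed limit, and the remainder is throttled with HTTP status code 429.

```python
def dispatch(requests: int, provisioned: int, on_demand_allowed: int) -> dict:
    # Provisioned instances take precedence over on-demand instances.
    to_provisioned = min(requests, provisioned)
    # Overflow is served by on-demand instances, subject to the burst/total limits.
    to_on_demand = min(requests - to_provisioned, on_demand_allowed)
    # Anything beyond both capacities receives a throttling error (HTTP 429).
    throttled = requests - to_provisioned - to_on_demand
    return {"provisioned": to_provisioned,
            "on_demand": to_on_demand,
            "throttled_429": throttled}
```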
Limits on scaling speeds of instances in different regions
Region | Limits on burstable instances | Limits on instance growth rate |
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen) | 300 | 300 instances per minute |
Other | 100 | 100 instances per minute |
The limits on scaling speeds of provisioned instances and on-demand instances in the same region are the same.
If you want a higher scaling speed, join the DingTalk group (group number: 11721331) to contact technical support.
The scaling speed of GPU-accelerated instances is lower than that of elastic instances. We recommend that you use GPU-accelerated instances together with the provisioned mode.
Configure provisioned instances
- Log on to the Function Compute console. In the left-side navigation pane, click Services & Functions.
- In the top navigation bar, select a region. On the Services page, click the desired service.
- On the Functions page, click the function that you want to modify.
- On the function details page, click the Auto Scaling tab and click Create Rule.
- On the page that appears, configure the parameters and click Create. The following table describes the parameters.
Basic Settings

Parameter | Description |
Version or Alias | Select the version or alias for which you want to create provisioned instances. Note: Provisioned instances can be created only for a version or an alias, not for the LATEST version. |
Minimum Number of Instances | Enter the number of provisioned instances that you want to create. Note: You can set the minimum number of function instances to reduce cold starts and response latency. This helps improve service performance for online applications that are sensitive to response latency. |
Idle Mode | Enable or disable the idle mode based on your business requirements. The idle mode is disabled by default. When this feature is enabled, provisioned instances are allocated vCPU resources only while they process requests; the vCPUs are frozen when no requests are being processed. When this feature is disabled, provisioned instances are allocated vCPUs regardless of whether they are processing requests. |
Maximum Number of Instances | Enter the maximum number of instances, which equals the number of provisioned instances plus the maximum number of on-demand instances. Note: You can set the maximum number of function instances to prevent a single function from using a large number of instances due to excessive invocations, to protect backend resources, and to avoid unexpected costs. If you do not configure this parameter, the maximum number of instances is determined by the resource limits of the current region and Alibaba Cloud account. |
(Optional) Scheduled Setting Modification: You can create scheduled scaling rules to flexibly configure provisioned instances. You can configure the number of provisioned instances to be automatically adjusted to a specified value at a scheduled point in time. This way, the number of provisioned instances can meet the concurrency requirement of your business.
Parameter | Description |
Policy Name | Enter a policy name. |
Minimum Number of Instances | Enter the minimum number of provisioned instances. |
Schedule Expression (UTC) | Enter the schedule expression. Example: cron(0 0 20 * * *). For more information, see Parameters. |
Effective Time (UTC) | Specify the time when the scheduled scaling configurations take effect and expire. |
(Optional) Metric-based Setting Modification: Provisioned instances are scaled in or out every minute based on the metrics of instances or concurrency utilization of provisioned instances.
Parameter | Description |
Policy Name | Enter a policy name. |
Minimum Range of Instances | Specify the value range for the minimum number of provisioned instances. |
Utilization Type | This parameter is displayed only when GPU-accelerated instances are configured. Select the type of metric on which the auto scaling policy is based. For more information about the auto scaling policies of GPU-accelerated instances, see Create an auto scaling policy for provisioned GPU-accelerated instances. |
Concurrency Usage Threshold | Configure the threshold. Scale-in is performed if the metric values or the concurrency utilization of provisioned instances is lower than the specified value. Scale-out is performed if the metric values or the concurrency utilization of provisioned instances is higher than the specified value. |
Effective Time (UTC) | Specify the time when the metric-based auto scaling configurations take effect and expire. |
After the auto scaling rule is created, you can go to the Auto Scaling tab of the service and view details of the rule.
You can modify the number of provisioned instances or delete the rule as prompted.
To delete provisioned instances, set the Minimum Number of Instances parameter to 0.
Create an auto scaling rule for provisioned instances
In provisioned mode, Function Compute creates a specified number of instances, but the instances may not be fully used. You can use the Scheduled Setting Modification and Metric-based Setting Modification parameters to make better use of provisioned instances.
Scheduled Setting Modification
Definition: Scheduled scaling helps you flexibly configure provisioned instances. You can configure the number of provisioned instances to be automatically adjusted to a specified value at a specified point in time so that the number of instances can meet the concurrency for your business.
Suitable scenarios: Functions work based on periodic rules or predictable traffic peaks. If the provisioned instances are insufficient to process all the function invocation requests, the remaining requests are processed by on-demand instances. For more information, see the "On-demand mode" section of the Instance types and instance modes topic.
Example: The following figure shows two scheduled actions for instance scaling. The first scheduled action scales out the provisioned instances before the traffic peak, and the second scheduled action scales in the provisioned instances after the traffic peak.
The following information describes the configuration details. In this example, a function named function_1 in a service named service_1 is configured to automatically scale in and out instances. The configurations take effect from 10:00:00 on November 1, 2022 to 10:00:00 on November 30, 2022. The number of provisioned instances is adjusted to 50 at 20:00 and to 10 at 22:00 every day.
{
"ServiceName": "service_1",
"FunctionName": "function_1",
"Qualifier": "alias_1",
"ScheduledActions": [
{
"Name": "action_1",
"StartTime": "2022-11-01T10:00:00Z",
"EndTime": "2022-11-30T10:00:00Z",
"TargetValue": 50,
"ScheduleExpression": "cron(0 0 20 * * *)"
},
{
"Name": "action_2",
"StartTime": "2022-11-01T10:00:00Z",
"EndTime": "2022-11-30T10:00:00Z",
"TargetValue": 10,
"ScheduleExpression": "cron(0 0 22 * * *)"
}
]
}
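Before submitting a configuration like the one above, it can be useful to sanity-check it locally. The following sketch is a hypothetical helper, not part of any Function Compute SDK; it only verifies that each scheduled action has the required fields, that EndTime is later than StartTime, and that TargetValue is a non-negative integer.

```python
from datetime import datetime

REQUIRED_FIELDS = ("Name", "StartTime", "EndTime", "TargetValue", "ScheduleExpression")
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def validate_scheduled_actions(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means the config looks sane."""
    errors = []
    for action in config.get("ScheduledActions", []):
        name = action.get("Name", "<unnamed>")
        missing = [f for f in REQUIRED_FIELDS if f not in action]
        if missing:
            errors.append(f"{name}: missing fields {missing}")
            continue
        start = datetime.strptime(action["StartTime"], TIME_FORMAT)
        end = datetime.strptime(action["EndTime"], TIME_FORMAT)
        if end <= start:
            errors.append(f"{name}: EndTime must be later than StartTime")
        if not isinstance(action["TargetValue"], int) or action["TargetValue"] < 0:
            errors.append(f"{name}: TargetValue must be a non-negative integer")
    return errors
```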
The following table describes the parameters in the preceding sample code.
Parameter | Description |
Name | The name of the scheduled scaling task. |
StartTime | The time when the configurations start to take effect. Specify the value in UTC. |
EndTime | The time when the configurations expire. Specify the value in UTC. |
TargetValue | The target number of provisioned instances after the scheduled action runs. |
ScheduleExpression | The expression that specifies when to run the scheduled scaling task. The following formats are supported: at(yyyy-mm-ddThh:mm:ss), which runs the task only once at the specified time, and cron(Seconds Minutes Hours Day-of-month Month Day-of-week), which runs the task on the specified schedule. Times are in UTC. |
The following table describes the fields of the cron expression in the Seconds Minutes Hours Day-of-month Month Day-of-week format.
Field | Value range | Allowed special character |
Seconds | 0–59 | N/A |
Minutes | 0–59 | , - * / |
Hours | 0–23 | , - * / |
Day-of-month | 1–31 | , - * ? / |
Month | 1–12 or JAN–DEC | , - * / |
Day-of-week | 1–7 or MON–SUN | , - * ? |
Character | Definition | Example |
* | Specifies any or each. | In the Minutes field, * indicates every minute. |
, | Specifies a value list. | In the Day-of-week field, MON,FRI indicates Monday and Friday. |
- | Specifies a range. | In the Hours field, 10-12 indicates 10:00, 11:00, and 12:00 (UTC). |
? | Specifies an uncertain value. | This special character is used together with other specified values. For example, if you specify a specific day of the month but do not require the date to fall on a specific day of the week, you can use this character in the Day-of-week field. |
/ | Specifies increments. n/m specifies an increment of m starting from the position of n. | In the Minutes field, 0/15 indicates every 15 minutes, starting from the 0th minute. |
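To illustrate the field semantics above, the following sketch evaluates a cron(...) expression against a UTC timestamp. It is a simplified, assumption-laden model rather than the parser Function Compute uses: it handles *, ?, value lists, ranges, and n/m increments for numeric fields, and omits the named JAN-DEC and MON-SUN forms for brevity.

```python
from datetime import datetime

def _field_matches(expr: str, value: int) -> bool:
    # A field may be a comma-separated list of *, ?, single values,
    # lo-hi ranges, or n/m increments.
    for part in expr.split(","):
        if part in ("*", "?"):
            return True
        if "/" in part:
            start, step = part.split("/")
            base = 0 if start in ("*", "") else int(start)
            if value >= base and (value - base) % int(step) == 0:
                return True
        elif "-" in part:
            lo, hi = map(int, part.split("-"))
            if lo <= value <= hi:
                return True
        elif int(part) == value:
            return True
    return False

def cron_matches(expression: str, t: datetime) -> bool:
    """True if the UTC timestamp t matches a cron(Sec Min Hour Dom Mon Dow) expression."""
    body = expression.removeprefix("cron(").removesuffix(")")
    sec, minute, hour, dom, month, dow = body.split()
    return (_field_matches(sec, t.second)
            and _field_matches(minute, t.minute)
            and _field_matches(hour, t.hour)
            and _field_matches(dom, t.day)
            and _field_matches(month, t.month)
            and _field_matches(dow, t.isoweekday()))  # 1-7 = MON-SUN
```

For example, cron(0 0 20 * * *) matches 20:00:00 UTC on any day, which is how the scheduled action in the sample configuration below fires once per day.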
Metric-based Setting Modification
Definition: Metric-based auto-scaling tracks metrics to dynamically scale provisioned instances.
Scenario: Function Compute regularly collects the concurrency utilization of provisioned instances or the resource utilization metrics of instances, and compares these metrics with the thresholds that you specify to control the scaling of provisioned instances. This way, the number of provisioned instances is adjusted to meet your business requirements.
Principle: Provisioned instances are scaled every minute based on the metric value.
If the metric value exceeds the threshold that you configure, the system rapidly performs scale-out operations to adjust the number of provisioned instances to the target value.
If the metric value is lower than the threshold that you configure, the system gradually performs scale-in operations to adjust the number of provisioned instances to the specified value.
If the maximum and minimum numbers of provisioned instances are configured, the system scales the provisioned instances between the maximum and minimum numbers. If the number of instances reaches the maximum or minimum number, the scaling stops.
Example: The following figure shows an example of metric-based scaling for provisioned instances.
When the traffic increases and the threshold is triggered, provisioned instances start to be scaled out until the number of provisioned instances reaches the upper limit. Requests that cannot be processed by the provisioned instances are sent to on-demand instances.
When the traffic decreases and the threshold is triggered, provisioned instances start to be scaled in.
Only the statistics on provisioned instances are collected to calculate the concurrency utilization of provisioned instances. The statistics on on-demand instances are not included.
The metric is calculated based on the following formula: The number of concurrent requests to which provisioned instances are responding/The maximum number of concurrent requests to which all provisioned instances can respond. The metric value ranges from 0 to 1.
The maximum number of concurrent requests to which provisioned instances can respond is calculated based on the instance concurrency settings. For more information, see Configure instance concurrency.
Each instance processes a single request at a time: Maximum concurrency = Number of instances.
Each instance concurrently processes multiple requests: Maximum concurrency = Number of instances × Number of requests concurrently processed by one instance.
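Putting the formula and the two concurrency cases together, a minimal sketch of the metric calculation (the function and parameter names are illustrative, not a Function Compute API):

```python
def provisioned_concurrency_utilization(in_flight_requests: int,
                                        provisioned_instances: int,
                                        concurrency_per_instance: int = 1) -> float:
    # Metric = concurrent requests being handled by provisioned instances
    #          / maximum concurrency of all provisioned instances, clamped to [0, 1].
    max_concurrency = provisioned_instances * concurrency_per_instance
    return min(in_flight_requests / max_concurrency, 1.0)
```

With 100 provisioned instances that each process a single request at a time, 80 in-flight requests yield a utilization of 0.8; with 10 instances at an instance concurrency of 16, the same 80 requests yield 0.5.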
Target values for scaling:
The values are determined by the current metric value, metric target, number of provisioned instances, and scale-in factor.
Calculation principle: The system scales in provisioned instances based on the scale-in factor. The factor value ranges from 0 (excluded) to 1. The scale-in factor is a system parameter that is used to slow down the scale-in speed. You do not need to set the scale-in factor. The target values for scaling operations are the smallest integers that are greater than or equal to the following calculation results:
Scale-out target value = Current number of provisioned instances × (Current metric value/Metric target)
Scale-in target value = Current number of provisioned instances × (1 – Scale-in factor × (1 – Current metric value/Metric target))
Example: If the current metric value is 80%, the metric target is 40%, and the number of provisioned instances is 100, the scale-out target value is calculated as follows: 100 × (80%/40%) = 200. The number of provisioned instances is increased to 200 so that the metric remains near the 40% target.
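The worked example can be reproduced with the sketch below. The scale-out formula follows the text directly; the scale-in line assumes the damped form N × (1 – factor × (1 – metric/target)), and the scale-in factor value used here is an illustrative assumption, since the real factor is an internal system parameter that you cannot set.

```python
import math

def scale_out_target(current: int, metric: float, metric_target: float) -> int:
    # Smallest integer >= current instances scaled by how far the metric
    # exceeds its target.
    return math.ceil(current * (metric / metric_target))

def scale_in_target(current: int, metric: float, metric_target: float,
                    scale_in_factor: float = 0.5) -> int:
    # Assumed damped scale-in: a factor in (0, 1] slows down the reduction.
    return math.ceil(current * (1 - scale_in_factor * (1 - metric / metric_target)))
```

With the example numbers, scale_out_target(100, 0.8, 0.4) returns 200.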
The following information describes the configuration details. In this example, a function named function_1 in a service named service_1 is configured to automatically scale in and out based on the ProvisionedConcurrencyUtilization metric. The configurations take effect from 10:00:00 on November 1, 2022 to 10:00:00 on November 30, 2022. When the concurrency utilization exceeds 60%, provisioned instances are scaled out, up to a maximum of 100 instances. When the concurrency utilization falls below 60%, provisioned instances are scaled in, down to a minimum of 10 instances.
{
"ServiceName": "service_1",
"FunctionName": "function_1",
"Qualifier": "alias_1",
"TargetTrackingPolicies": [
{
"Name": "action_1",
"StartTime": "2022-11-01T10:00:00Z",
"EndTime": "2022-11-30T10:00:00Z",
"MetricType": "ProvisionedConcurrencyUtilization",
"MetricTarget": 0.6,
"MinCapacity": 10,
"MaxCapacity": 100,
}
]
}
The following table describes the parameters in the preceding sample code.
Parameter | Description |
Name | The name of the metric-based scaling task. |
StartTime | The time when the configurations start to take effect. Specify the value in UTC. |
EndTime | The time when the configurations expire. Specify the value in UTC. |
MetricType | The metric that is tracked. Set the parameter to ProvisionedConcurrencyUtilization. |
MetricTarget | The threshold value for metric-based auto scaling. |
MinCapacity | The minimum number of provisioned instances for scale-in. |
MaxCapacity | The maximum number of provisioned instances for scale-out. |