Function Compute provides two modes for instance management: the on-demand mode and the provisioned mode. This topic describes the principles, billing methods, and instance scaling limits of the two modes, as well as the idle mode of provisioned instances. This topic also describes how to configure provisioned instances and auto scaling rules for provisioned instances in the Function Compute console.
On-demand mode
Overview
In on-demand mode, instances are allocated and released by Function Compute. Function Compute automatically scales instances based on the number of function invocations. Function Compute creates instances when the number of function invocations increases, and destroys excess instances when the number of function invocations decreases. During the entire process, instance creation is automatically triggered by requests. On-demand instances are destroyed when no requests are sent for processing within a period of time (usually 3 to 5 minutes). The first time you invoke an on-demand instance, you must wait for the cold start of the instance to complete.
By default, a maximum of 300 on-demand instances can be created for an Alibaba Cloud account in each region. If you want to increase the upper limit, join the DingTalk group (ID: 11721331) to contact technical support.
Billing
In on-demand mode, you are charged only when your function is invoked. If no requests are sent to your function, no instances are allocated and no fees are incurred. For more information about the pricing and billing of Function Compute, see Billing overview.
Provisioned mode
Overview
On-demand instances are automatically created when requests arrive, so the first invocation of an on-demand instance must wait for a cold start to complete. If you want to eliminate the impact of cold starts, use provisioned instances.
In provisioned mode, you can manage the allocation and release of function instances. The provisioned instances are retained unless you release them. Provisioned instances take precedence over on-demand instances. If the number of provisioned instances is not enough to process all requests, Function Compute allocates on-demand instances to process the remaining requests.
Billing
The billing of provisioned instances starts when the provisioned instances are created and ends when the provisioned instances are released. You are charged for provisioned instances regardless of whether they are used to process requests. For more information about the pricing and billing of Function Compute, see Billing overview.
Idle mode
Elastic instances
By default, the idle mode is not enabled. If the idle mode is not enabled, provisioned instances in Function Compute are always allocated with CPU resources even when no requests are being processed. This ensures that the instances can run background tasks even when no requests are made. After the idle mode is enabled, Function Compute freezes the vCPUs of provisioned instances when the provisioned instances do not process any request. This way, the instances enter the idle state. You are not charged for vCPU resources when the provisioned instances are in the idle state, which helps you save costs. For more information, see Billing overview.
You can choose whether to enable the idle mode based on your business requirements.
Costs
If you want to use provisioned instances to eliminate cold starts and also want to reduce costs, we recommend that you enable the idle mode. The idle mode allows you to pay lower fees for provisioned instances, and requests can still be responded to without cold starts.
Background tasks
If your function needs to run background tasks, we recommend that you do not enable the idle mode. Examples:
- Some application frameworks rely on built-in schedulers or background features, and some dependent middleware needs to regularly report heartbeats.
- Some asynchronous operations are performed by using goroutines in Go, async functions in Node.js, or asynchronous threads in Java.
GPU-accelerated instance (in public preview)
By default, the idle mode for GPU-accelerated instances is disabled. When the idle mode is enabled, Function Compute freezes the vCPUs and GPUs of a provisioned instance when the instance does not process any requests. Then, the vCPUs and GPUs become idle. Idle vCPUs are free of charge, and you are charged for idle GPUs based on the unit price of memory, which greatly reduces costs. For more information, see Billing overview.
The idle mode of GPU-accelerated instances is still in public preview. To use the feature, you must use a function that is configured with full GPU memory. That is, the GPU memory of the instance must be 16 GB (NVIDIA Tesla T4) or 24 GB (NVIDIA Ampere A10). To apply for trial use, click Apply for idle GPU-accelerated instance for public preview.
Limits on instance scaling
Scaling limits for on-demand instances
Function Compute preferentially uses existing instances to process requests. If the existing instances are fully loaded, Function Compute creates new instances to process requests. As the number of invocations increases, Function Compute continues to create new instances until enough instances are created to handle requests or the upper limit is reached. During instance scale-out, the following limits apply:
Default upper limit for running instances per region: 300.
The scale-out speed of running instances is limited by the upper limit of burstable instances and the growth rate of the instances. For limits on scale-out for different regions, see the "Limits on scaling speeds of instances in different regions" section of the Configure provisioned instances and auto scaling rules topic.
Burstable instances: the number of instances that can be immediately created. The default upper limit for burstable instances is 100 or 300 based on the region.
Instance growth rate: the number of instances that can be added per minute after the upper limit for burstable instances is reached. The default upper limit of growth rate is 100 or 300 based on the region.
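To make the interplay between the two limits concrete, the following sketch computes how many on-demand instances are available t minutes into a sustained traffic spike. This is an illustrative model only, using the default limits for regions outside China (burst limit 100, growth rate 100 per minute, total limit 300); the function name is not part of any Function Compute API.

```python
def max_allowed_instances(minutes_elapsed: float,
                          burst_limit: int = 100,
                          growth_rate_per_min: int = 100,
                          total_limit: int = 300) -> int:
    # Burst capacity is available immediately; after that, capacity grows
    # linearly at the growth rate, capped by the per-region total limit.
    allowed = burst_limit + growth_rate_per_min * minutes_elapsed
    return int(min(allowed, total_limit))
```

For example, one minute into a spike at most 200 instances can be running; the regional cap of 300 is reached after two minutes, and requests beyond that capacity are throttled.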
When the total number of instances or the scaling speed of the instances exceeds the limit, Function Compute returns a throttling error with HTTP status code 429. The following figure shows how Function Compute performs throttling in a scenario where the number of invocations rapidly increases.
1: Before the upper limit on burstable instances is reached, Function Compute immediately creates instances when the number of requests increases. During this process, a cold start occurs but no throttling error is reported.
2: When the limit on burstable instances is reached, the increase of instances is restricted by the growth rate. Throttling errors are reported for some requests.
3: When the upper limit of instances is reached, some requests are throttled.
By default, the preceding scaling limits take effect for all functions within an Alibaba Cloud account in the same region. To configure a limit on the number of instances for a specific function, see Overview of configuring the maximum number of on-demand instances. After you configure the maximum number of on-demand instances, Function Compute returns a throttling error when the total number of running instances for the function exceeds the configured limit.
Scaling limits for provisioned instances
When invocations surge, the creation of a large number of instances can be throttled, which causes request failures. The cold starts of new instances also increase request latency. To prevent these issues, you can use provisioned instances, which are reserved before invocations arrive. Provisioned instances are subject to their own limits on instance count and scaling speed:
Default upper limit for provisioned instances per region: 300.
Default upper limit for the scaling speed of provisioned instances per minute: 100 or 300, depending on the region. For more information, see the "Limits on scaling speeds of instances in different regions" section of the Configure provisioned instances and auto scaling rules topic.
The following figure shows how Function Compute performs throttling when provisioned instances are configured in the same load scenario as the preceding figure.
1: Before all the provisioned instances are used, the requests are processed immediately. During this process, no cold starts occur and no throttling errors are reported.
2: When all the provisioned instances are used, Function Compute creates instances immediately before the upper limit for burstable instances is reached. During this process, a cold start occurs but no throttling errors are reported.
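The routing behavior in the two phases above can be sketched as follows. This is an illustrative model, not Function Compute internals: concurrent requests first fill the provisioned instances, overflow goes to on-demand instances up to the currently allowed limit, and the remainder is throttled with HTTP status code 429.

```python
def dispatch(requests: int, provisioned: int, on_demand_allowed: int) -> dict:
    # Provisioned instances take precedence over on-demand instances.
    to_provisioned = min(requests, provisioned)
    # Overflow is served by on-demand instances, subject to the burst/total limits.
    to_on_demand = min(requests - to_provisioned, on_demand_allowed)
    # Anything beyond both capacities receives a throttling error (HTTP 429).
    throttled = requests - to_provisioned - to_on_demand
    return {"provisioned": to_provisioned,
            "on_demand": to_on_demand,
            "throttled_429": throttled}
```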
Limits on scaling speeds of instances in different regions
Region | Limits on burstable instances | Limits on instance growth rate |
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen) | 300 | 300 instances per minute |
Other | 100 | 100 instances per minute |
The limits on scaling speeds of provisioned instances and on-demand instances in the same region are the same.
If you want a higher scaling speed, join the DingTalk group (group number: 11721331) to contact technical support.
The scaling speed of GPU-accelerated instances is lower than that of elastic instances. We recommend that you use GPU-accelerated instances together with the provisioned mode.
Configure provisioned instances
- Log on to the Function Compute console. In the left-side navigation pane, click Services & Functions.
- In the top navigation bar, select a region. On the Services page, click the desired service.
- On the Functions page, click the function that you want to modify.
- On the function details page, click the Auto Scaling tab and click Create Rule.
- On the page that appears, configure the parameters and click Create. The following table describes the parameters.
Basic Settings

Parameter | Description |
Version or Alias | Select the version or alias for which you want to create provisioned instances. Note: Provisioned instances can be created only for a version or an alias, not for the LATEST version. |
Minimum Number of Instances | Enter the number of provisioned instances that you want to create. Note: You can set the minimum number of function instances to reduce cold starts and response latency. This helps improve service performance for online applications that are sensitive to response latency. |
Idle Mode | Enable or disable the idle mode based on your business requirements. The idle mode is disabled by default. When this feature is enabled, provisioned instances are allocated vCPU resources only while they process requests; the vCPUs are frozen when no requests are being processed. When this feature is disabled, provisioned instances are allocated vCPUs regardless of whether they are processing requests. |
Maximum Number of Instances | Enter the maximum number of instances, which equals the number of provisioned instances plus the maximum number of on-demand instances. Note: You can set the maximum number of function instances to prevent a single function from using a large number of instances due to excessive invocations, to protect backend resources, and to avoid unexpected costs. If you do not configure this parameter, the maximum number of instances is determined by the resource limits of the current region and Alibaba Cloud account. |
(Optional) Scheduled Setting Modification: You can create scheduled scaling rules to flexibly configure provisioned instances. You can configure the number of provisioned instances to be automatically adjusted to a specified value at a scheduled point in time. This way, the number of provisioned instances can meet the concurrency requirement of your business.
Parameter | Description |
Policy Name | Enter a policy name. |
Minimum Number of Instances | Enter the minimum number of provisioned instances. |
Schedule Expression (UTC) | Enter the schedule expression. Example: cron(0 0 20 * * *). For more information, see Parameters. |
Effective Time (UTC) | Specify the time when the scheduled scaling configurations take effect and expire. |
(Optional) Metric-based Setting Modification: Provisioned instances are scaled in or out every minute based on the metrics of instances or concurrency utilization of provisioned instances.
Parameter | Description |
Policy Name | Enter a policy name. |
Minimum Range of Instances | Specify the value range for the minimum number of provisioned instances. |
Utilization Type | This parameter is displayed only when GPU-accelerated instances are configured. Select the type of metric on which the auto scaling policy is based. For more information about the auto scaling policies of GPU-accelerated instances, see Create an auto scaling policy for provisioned GPU-accelerated instances. |
Concurrency Usage Threshold | Configure the threshold. Scale-in is performed if the metric values or the concurrency utilization of provisioned instances is lower than the specified value. Scale-out is performed if the metric values or the concurrency utilization of provisioned instances is higher than the specified value. |
Effective Time (UTC) | Specify the time when the metric-based auto scaling configurations take effect and expire. |
After the auto scaling rule is created, you can go to the Auto Scaling tab of the service and view details of the rule.
You can modify the number of provisioned instances or delete the rule as prompted.
To delete provisioned instances, set the Minimum Number of Instances parameter to 0.
Create an auto scaling rule for provisioned instances
In provisioned mode, Function Compute creates a specified number of instances, but the instances may not be fully used. You can use the Scheduled Setting Modification and Metric-based Setting Modification parameters to make better use of provisioned instances.
Scheduled Setting Modification
Definition: Scheduled scaling helps you flexibly configure provisioned instances. You can configure the number of provisioned instances to be automatically adjusted to a specified value at a specified point in time so that the number of instances can meet the concurrency for your business.
Suitable scenarios: Functions work based on periodic rules or predictable traffic peaks. If the provisioned instances are insufficient to process all the function invocation requests, the remaining requests are processed by on-demand instances. For more information, see the "On-demand mode" section of the Instance types and instance modes topic.
Example: The following figure shows two scheduled actions for instance scaling. The first scheduled action scales out the provisioned instances before the traffic peak, and the second scheduled action scales in the provisioned instances after the traffic peak.
The following information describes the configuration details. In this example, a function named function_1 in a service named service_1 is configured to automatically scale in and out instances. The configurations take effect from 10:00:00 on November 1, 2022 to 10:00:00 on November 30, 2022. The number of provisioned instances is adjusted to 50 at 20:00 and to 10 at 22:00 every day.
{
"ServiceName": "service_1",
"FunctionName": "function_1",
"Qualifier": "alias_1",
"ScheduledActions": [
{
"Name": "action_1",
"StartTime": "2022-11-01T10:00:00Z",
"EndTime": "2022-11-30T10:00:00Z",
"TargetValue": 50,
"ScheduleExpression": "cron(0 0 20 * * *)"
},
{
"Name": "action_2",
"StartTime": "2022-11-01T10:00:00Z",
"EndTime": "2022-11-30T10:00:00Z",
"TargetValue": 10,
"ScheduleExpression": "cron(0 0 22 * * *)"
}
]
}
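Before submitting a configuration like the one above, it can be useful to sanity-check it locally. The following sketch is a hypothetical helper, not part of any Function Compute SDK; it only verifies that each scheduled action has the required fields, that EndTime is later than StartTime, and that TargetValue is a non-negative integer.

```python
from datetime import datetime

REQUIRED_FIELDS = ("Name", "StartTime", "EndTime", "TargetValue", "ScheduleExpression")
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def validate_scheduled_actions(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means the config looks sane."""
    errors = []
    for action in config.get("ScheduledActions", []):
        name = action.get("Name", "<unnamed>")
        missing = [f for f in REQUIRED_FIELDS if f not in action]
        if missing:
            errors.append(f"{name}: missing fields {missing}")
            continue
        start = datetime.strptime(action["StartTime"], TIME_FORMAT)
        end = datetime.strptime(action["EndTime"], TIME_FORMAT)
        if end <= start:
            errors.append(f"{name}: EndTime must be later than StartTime")
        if not isinstance(action["TargetValue"], int) or action["TargetValue"] < 0:
            errors.append(f"{name}: TargetValue must be a non-negative integer")
    return errors
```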
The following table describes the parameters in the preceding sample code.
Parameter | Description |
Name | The name of the scheduled scaling task. |
StartTime | The time when the configurations start to take effect. Specify the value in UTC. |
EndTime | The time when the configurations expire. Specify the value in UTC. |
TargetValue | The target number of provisioned instances after the scheduled action runs. |
ScheduleExpression | The expression that specifies when to run the scheduled scaling task. The following formats are supported: at(yyyy-mm-ddThh:mm:ss), which runs the task only once at the specified time, and cron(Seconds Minutes Hours Day-of-month Month Day-of-week), which runs the task on the specified schedule. Times are in UTC. |
The following table describes the fields of the cron expression in the Seconds Minutes Hours Day-of-month Month Day-of-week format.
Field | Value range | Allowed special character |
Seconds | 0–59 | N/A |
Minutes | 0–59 | , - * / |
Hours | 0–23 | , - * / |
Day-of-month | 1–31 | , - * ? / |
Month | 1–12 or JAN–DEC | , - * / |
Day-of-week | 1–7 or MON–SUN | , - * ? |
Character | Definition | Example |
* | Specifies any or each. | In the Minutes field, * indicates every minute. |
, | Specifies a value list. | In the Day-of-week field, MON,FRI indicates Monday and Friday. |
- | Specifies a range. | In the Hours field, 10-12 indicates 10:00, 11:00, and 12:00 (UTC). |
? | Specifies an uncertain value. | This special character is used together with other specified values. For example, if you specify a specific day of the month but do not require the date to fall on a specific day of the week, you can use this character in the Day-of-week field. |
/ | Specifies increments. n/m specifies an increment of m starting from the position of n. | In the Minutes field, 0/15 indicates every 15 minutes, starting from the 0th minute. |
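To illustrate the field semantics above, the following sketch evaluates a cron(...) expression against a UTC timestamp. It is a simplified, assumption-laden model rather than the parser Function Compute uses: it handles *, ?, value lists, ranges, and n/m increments for numeric fields, and omits the named JAN-DEC and MON-SUN forms for brevity.

```python
from datetime import datetime

def _field_matches(expr: str, value: int) -> bool:
    # A field may be a comma-separated list of *, ?, single values,
    # lo-hi ranges, or n/m increments.
    for part in expr.split(","):
        if part in ("*", "?"):
            return True
        if "/" in part:
            start, step = part.split("/")
            base = 0 if start in ("*", "") else int(start)
            if value >= base and (value - base) % int(step) == 0:
                return True
        elif "-" in part:
            lo, hi = map(int, part.split("-"))
            if lo <= value <= hi:
                return True
        elif int(part) == value:
            return True
    return False

def cron_matches(expression: str, t: datetime) -> bool:
    """True if the UTC timestamp t matches a cron(Sec Min Hour Dom Mon Dow) expression."""
    body = expression.removeprefix("cron(").removesuffix(")")
    sec, minute, hour, dom, month, dow = body.split()
    return (_field_matches(sec, t.second)
            and _field_matches(minute, t.minute)
            and _field_matches(hour, t.hour)
            and _field_matches(dom, t.day)
            and _field_matches(month, t.month)
            and _field_matches(dow, t.isoweekday()))  # 1-7 = MON-SUN
```

For example, cron(0 0 20 * * *) matches 20:00:00 UTC on any day, which is how the scheduled action in the sample configuration below fires once per day.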
Metric-based Setting Modification
Definition: Metric-based auto-scaling tracks metrics to dynamically scale provisioned instances.
Scenario: Function Compute regularly collects the concurrency utilization of provisioned instances or the resource utilization metrics of instances, and compares these metrics with the thresholds that you specify to control the scaling of provisioned instances. This way, the number of provisioned instances is adjusted to meet your business requirements.
Principle: Provisioned instances are scaled every minute based on the metric value.
If the metric value exceeds the threshold that you configure, the system rapidly performs scale-out operations to adjust the number of provisioned instances to the target value.
If the metric value is lower than the threshold that you configure, the system gradually performs scale-in operations to adjust the number of provisioned instances to the specified value.
If the maximum and minimum numbers of provisioned instances are configured, the system scales the provisioned instances between the maximum and minimum numbers. If the number of instances reaches the maximum or minimum number, the scaling stops.
Example: The following figure shows an example of metric-based scaling for provisioned instances.
When the traffic increases and the threshold is triggered, provisioned instances start to be scaled out until the number of provisioned instances reaches the upper limit. Requests that cannot be processed by the provisioned instances are sent to on-demand instances.
When the traffic decreases and the threshold is triggered, provisioned instances start to be scaled in.
Only the statistics on provisioned instances are collected to calculate the concurrency utilization of provisioned instances. The statistics on on-demand instances are not included.
The metric is calculated based on the following formula: The number of concurrent requests to which provisioned instances are responding/The maximum number of concurrent requests to which all provisioned instances can respond. The metric value ranges from 0 to 1.
The maximum number of concurrent requests to which provisioned instances can respond is calculated based on the instance concurrency settings. For more information, see Configure instance concurrency.
Each instance processes a single request at a time: Maximum concurrency = Number of instances.
Each instance concurrently processes multiple requests: Maximum concurrency = Number of instances × Number of requests concurrently processed by one instance.
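Putting the formula and the two concurrency cases together, a minimal sketch of the metric calculation (the function and parameter names are illustrative, not a Function Compute API):

```python
def provisioned_concurrency_utilization(in_flight_requests: int,
                                        provisioned_instances: int,
                                        concurrency_per_instance: int = 1) -> float:
    # Metric = concurrent requests being handled by provisioned instances
    #          / maximum concurrency of all provisioned instances, clamped to [0, 1].
    max_concurrency = provisioned_instances * concurrency_per_instance
    return min(in_flight_requests / max_concurrency, 1.0)
```

With 100 provisioned instances that each process a single request at a time, 80 in-flight requests yield a utilization of 0.8; with 10 instances at an instance concurrency of 16, the same 80 requests yield 0.5.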
Target values for scaling:
The values are determined by the current metric value, metric target, number of provisioned instances, and scale-in factor.
Calculation principle: The system scales in provisioned instances based on the scale-in factor. The factor value ranges from 0 (excluded) to 1. The scale-in factor is a system parameter that is used to slow down the scale-in speed. You do not need to set the scale-in factor. The target values for scaling operations are the smallest integers that are greater than or equal to the following calculation results:
Scale-out target value = Current number of provisioned instances × (Current metric value/Metric target)
Scale-in target value = Current number of provisioned instances × (1 – Scale-in factor × (1 – Current metric value/Metric target))
Example: If the current metric value is 80%, the metric target is 40%, and the number of provisioned instances is 100, the scale-out target value is calculated as follows: 100 × (80%/40%) = 200. The number of provisioned instances is increased to 200 so that the metric remains near the 40% target.
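The worked example can be reproduced with the sketch below. The scale-out formula follows the text directly; the scale-in line assumes the damped form N × (1 – factor × (1 – metric/target)), and the scale-in factor value used here is an illustrative assumption, since the real factor is an internal system parameter that you cannot set.

```python
import math

def scale_out_target(current: int, metric: float, metric_target: float) -> int:
    # Smallest integer >= current instances scaled by how far the metric
    # exceeds its target.
    return math.ceil(current * (metric / metric_target))

def scale_in_target(current: int, metric: float, metric_target: float,
                    scale_in_factor: float = 0.5) -> int:
    # Assumed damped scale-in: a factor in (0, 1] slows down the reduction.
    return math.ceil(current * (1 - scale_in_factor * (1 - metric / metric_target)))
```

With the example numbers, scale_out_target(100, 0.8, 0.4) returns 200.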
The following information describes the configuration details. In this example, a function named function_1 in a service named service_1 is configured to automatically scale in and out based on the ProvisionedConcurrencyUtilization metric. The configurations take effect from 10:00:00 on November 1, 2022 to 10:00:00 on November 30, 2022. When the concurrency utilization exceeds 60%, provisioned instances are scaled out, up to a maximum of 100 instances. When the concurrency utilization falls below 60%, provisioned instances are scaled in, down to a minimum of 10 instances.
{
"ServiceName": "service_1",
"FunctionName": "function_1",
"Qualifier": "alias_1",
"TargetTrackingPolicies": [
{
"Name": "action_1",
"StartTime": "2022-11-01T10:00:00Z",
"EndTime": "2022-11-30T10:00:00Z",
"MetricType": "ProvisionedConcurrencyUtilization",
"MetricTarget": 0.6,
"MinCapacity": 10,
"MaxCapacity": 100,
}
]
}
The following table describes the parameters in the preceding sample code.
Parameter | Description |
Name | The name of the metric-based scaling task. |
StartTime | The time when the configurations start to take effect. Specify the value in UTC. |
EndTime | The time when the configurations expire. Specify the value in UTC. |
MetricType | The metric that is tracked. Set the parameter to ProvisionedConcurrencyUtilization. |
MetricTarget | The threshold value for metric-based auto scaling. |
MinCapacity | The minimum number of provisioned instances for scale-in. |
MaxCapacity | The maximum number of provisioned instances for scale-out. |