Configure auto scaling rules for provisioned instances to put resources into full use - Function Compute

Provisioned instances help reduce request latencies caused by instance cold starts during peak hours. In addition, you can configure an auto scaling policy, such as a scheduled scaling policy or water-level scaling policy, for provisioned instances to improve resource utilization and prevent resource waste.

Limits

The following table shows the limits on the scale-out rate of provisioned instances in different regions.

Region	Upper limit of burst instances	Upper limit of instance growth rate
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen)	300	300 per minute
Other regions	100	100 per minute

Note

If you want to increase the upper limit on the speed of instance scale-out, join the DingTalk user group 11721331 for technical support.
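
The limits in the table above can be turned into a rough estimate of how long a scale-out takes. The following is a minimal sketch, not an official formula: it assumes that the burst quota is available immediately and that instances are then added at the per-minute growth rate. The function name `minutes_to_reach` and the region grouping are hypothetical helpers introduced for illustration.

```python
import math

# Regions listed in the limits table with the higher quota.
HIGH_QUOTA_REGIONS = {
    "China (Hangzhou)",
    "China (Shanghai)",
    "China (Beijing)",
    "China (Zhangjiakou)",
    "China (Shenzhen)",
}


def scale_out_limits(region: str) -> tuple[int, int]:
    """Return (burst instances, growth per minute) for a region, per the table."""
    return (300, 300) if region in HIGH_QUOTA_REGIONS else (100, 100)


def minutes_to_reach(target: int, region: str) -> int:
    """Estimate the minutes needed to reach `target` provisioned instances.

    Assumption (not stated in this topic): the burst quota is granted at once,
    and the remainder grows linearly at the per-minute rate.
    """
    if target < 0:
        raise ValueError("target must be non-negative")
    burst, per_minute = scale_out_limits(region)
    if target <= burst:
        return 0
    return math.ceil((target - burst) / per_minute)


if __name__ == "__main__":
    print(minutes_to_reach(1000, "China (Hangzhou)"))
    print(minutes_to_reach(250, "Singapore"))
```

Under these assumptions, a target that exceeds the burst quota should be scheduled a few minutes ahead of the expected traffic peak.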

Configure provisioned instances

Step 1: Create a provisioned instance policy

You can create a provisioned instance policy by using one of the following methods:

Configure a provisioned instance policy on the Function Details tab of the function details page. This method is used in this topic.
Choose Advanced Features > Auto Scaling and create a provisioned instance policy on the Provisioned Instance Policy tab.

Log on to the Function Compute console. In the left-side navigation pane, click Functions.
In the top navigation bar, select a region. On the Functions page, click the name of the function that you want to manage.
On the Function Details tab, click the Configuration tab.
In the left-side navigation tree, click the Provisioned Instances tab and click Create Provisioned Instance Policy.

In the Create Provisioned Instance Policy panel, configure the parameters and click OK.

Parameter	Description
Version or Alias	Select the version or alias for which you want to create a provisioned instance policy. Note You can create a provisioned instance policy only for the LATEST version.
Provisioned Instances	Specify the number of provisioned instances. Note Keeping a minimum number of provisioned instances helps Function Compute respond quickly to function invocation requests, reduces cold starts, and improves service performance for online applications that are sensitive to response latency. You are charged for these instances until you release them, even if they do not process any requests.
Idle Mode	Options: Enable and Disable. Note By default, idle mode is enabled for common instances. You need to manually enable idle mode for GPU functions. If you want to enable idle mode for GPU-accelerated instances, submit a ticket or join the DingTalk group 64970014484 for technical support.
(Optional) Scheduled Scaling: You can select this option to configure a scheduled scaling policy which scales function instances at specified points in time. For more information about the scenario and configuration example, see Scheduled scaling.
Policy Name	Enter a custom policy name.
Provisioned Instances	Specify the number of instances that you want to scale out. Note After you configure this parameter, its value overrides the Provisioned Instances value configured earlier in this section.
Trigger Mode	You can select At Time Points or Custom CRON expression. At Time Points: Specify the Time (UTC), Date (UTC), and Week Day (UTC) parameters as prompted. Custom CRON expression: Specify the Schedule Expression (UTC) parameter. In this example, cron(0 0 4 * * *) is configured to trigger scaling at 12:00 (UTC+8) every day. Note Because the schedule expression must be specified in UTC, the hour is set to 4 in this example.
Effective Time (UTC)	Set the time when the scaling configurations start to take effect and the time when the scaling configurations expire.
(Optional) Water-level Scaling: You can select this option to scale function instances every minute based on the metric utilization or concurrency utilization of provisioned instances. For more information about the scenario and configuration example of water-level scaling, see Water-level scaling.
Policy Name	Enter a custom policy name.
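Minimum Number of Provisioned Instances	Specify the lower limit on the number of provisioned instances. Scale-in does not reduce the number of instances below this value.
Maximum Number of Provisioned Instances	Specify the upper limit on the number of provisioned instances. Scale-out does not increase the number of instances above this value.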
Utilization Type	Note This parameter is displayed only when GPU-accelerated instances are configured. Select a metric based on which instances are scaled.
Concurrency Usage Threshold/Usage Threshold	Configure a usage threshold. If the metric usage of instances or the concurrency usage falls below the configured threshold, a scale-in is triggered. If the metric usage or the concurrency usage reaches or exceeds the configured threshold, a scale-out is triggered.
Effective Time (UTC)	Set the time when the scaling configurations start to take effect and the time when the scaling configurations expire.
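
Because the Schedule Expression (UTC) parameter takes UTC time, a local trigger time has to be converted before it is entered. The following minimal sketch performs that conversion for a daily schedule. It assumes the six-field form cron(seconds minutes hours day month weekday), which is consistent with the 04:00 UTC example in the table. The helper name `daily_cron_utc` is hypothetical, and the sketch does not handle weekday-specific schedules, where the conversion can also shift the day.

```python
def daily_cron_utc(local_hour: int, local_minute: int, utc_offset_hours: float) -> str:
    """Build a daily CRON expression in UTC from a local wall-clock time.

    Assumed field order: seconds minutes hours day-of-month month day-of-week.
    `utc_offset_hours` is the local offset from UTC, for example 8 for UTC+8.
    """
    if not (0 <= local_hour < 24 and 0 <= local_minute < 60):
        raise ValueError("local time out of range")
    offset_minutes = round(utc_offset_hours * 60)
    # Wrap around midnight so the result always stays within one day.
    total = (local_hour * 60 + local_minute - offset_minutes) % (24 * 60)
    hour, minute = divmod(total, 60)
    return f"cron(0 {minute} {hour} * * *)"


if __name__ == "__main__":
    # 12:00 in UTC+8 is 04:00 UTC, matching the example in the table.
    print(daily_cron_utc(12, 0, 8))
```

A local time earlier than the offset wraps to the previous UTC day. For example, 07:30 (UTC+8) becomes 23:30 UTC.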

After the policy is created, you can view the policy in the policy list of the function.
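
The water-level scaling parameters described in the table can be read as a simple per-minute decision rule. The following sketch models only the direction of scaling, based on the threshold description above: usage at or above the threshold triggers a scale-out, usage below it triggers a scale-in, and the minimum and maximum numbers of provisioned instances bound the result. The function `scaling_decision` is a hypothetical illustration. How many instances Function Compute adds or removes in each step is not modeled.

```python
def scaling_decision(
    usage: float,
    threshold: float,
    current: int,
    minimum: int,
    maximum: int,
) -> str:
    """Return "scale_out", "scale_in", or "hold" for one evaluation cycle.

    `usage` and `threshold` are fractions between 0 and 1, for example 0.6
    for 60% concurrency usage. This is an illustrative model, not the
    service's actual algorithm.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in the range (0, 1]")
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    if usage >= threshold:
        # Scale out only while there is headroom below the maximum.
        return "scale_out" if current < maximum else "hold"
    # Usage is below the threshold: scale in, but never below the minimum.
    return "scale_in" if current > minimum else "hold"


if __name__ == "__main__":
    print(scaling_decision(usage=0.75, threshold=0.6, current=10, minimum=5, maximum=50))
    print(scaling_decision(usage=0.30, threshold=0.6, current=5, minimum=5, maximum=50))
```

In this model, a policy whose current instance count already equals the configured maximum holds steady even when usage stays above the threshold, which is one thing to check if a policy does not appear to scale out.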

Step 2: Verify the policy

To check whether the configured policy takes effect, view the number of provisioned instances in the monitoring data after the scaling conditions of the policy are met.

On the Function Details tab, click the Monitoring tab.
On the Function Metrics tab, view the data in the Function Provisioned Instances (count) card to check whether the policy takes effect.

Modify or delete a provisioned instance policy

On the Configuration tab of the Function Details tab, click the Provisioned Instances tab in the left-side navigation tree to view created policies. Click Modify or Delete in the Actions column to modify or delete the corresponding policy.

References

For more information about the basic concepts and billing methods of on-demand mode and provisioned mode, see Instance types and usage modes.
For more information about the limits, behaviors, and scaling rules of function instances in on-demand mode and provisioned mode, see Limits and rules of instance scaling.
By default, all functions within an Alibaba Cloud account in the same region share the preceding scaling limits. To limit the number of instances for a function, you can configure the maximum number of concurrent instances. For more information, see Specify the maximum number of concurrent instances. After you configure the maximum number of concurrent instances, Function Compute returns a throttling error if the total number of running instances for the function reaches the specified limit.