Cut Cold Starts with Minimum Instances in Function Compute

When a function receives its first request after a period of inactivity, Function Compute must initialize a new instance. This process, known as a cold start, adds latency. Setting the minimum number of instances to a value greater than 0 keeps pre-allocated instances warm, so requests served by those instances avoid cold start latency. This is useful for latency-sensitive workloads. To handle traffic fluctuations automatically, configure elastic policies that scale the minimum number of instances up or down based on a schedule or metric thresholds.

Important

Pre-allocated instances are billed regardless of whether they are processing requests or idle. Active instances are charged at the active elastic instance rate; idle instances are charged at the idle elastic instance rate. For details, see Billing overview.

Elastic policies can only be configured for a function alias or the LATEST version.

Set the minimum number of instances

Log on to the Function Compute console. In the left-side navigation pane, click Functions.
In the top navigation bar, select a region. On the Functions page, click Create Function.
On the Create Function page, go to the Elastic Configuration section and set the Minimum Number Of Instances parameter. Configure the remaining required parameters, then click Create.

Configure elastic policies

Elastic policies automatically adjust the minimum number of instances without manual intervention. Two policy types are supported: Scheduled Scaling and Threshold-based Scaling.

Note

When multiple elastic policies are active simultaneously, the system uses the highest Minimum Number Of Instances value across all active policies. For details, see How is the current minimum number of instances calculated?
While any elastic policy is active, the initial Minimum Number Of Instances value is ignored. When no policy is active, the minimum number of instances reverts to that initial value.
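
The two rules in this note can be sketched in a few lines of Python. This is only an illustration of the selection logic as described here, not Function Compute code; the function name `effective_min_instances` and its parameters are hypothetical.

```python
from typing import Iterable


def effective_min_instances(initial_min: int, active_policy_mins: Iterable[int]) -> int:
    """Return the minimum number of instances currently in effect.

    initial_min: the Minimum Number Of Instances value set on the function.
    active_policy_mins: the values requested by policies that are active now.
    """
    mins = list(active_policy_mins)
    if not mins:
        # No policy is active: the initial value applies.
        return initial_min
    # At least one policy is active: the initial value is ignored
    # and the highest value across active policies wins.
    return max(mins)
```

For example, with an initial value of 20 and a single active policy that requests 5, the result is 5, because the initial value is ignored while any policy is active.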

To configure an elastic policy:

On the function details page, click the Elastic Configuration tab. In the Elastic Policies section, click Edit in the row of the target policy.
In the Edit Elastic Policy panel, configure a Scheduled Scaling or Threshold-based Scaling policy.

Scheduled scaling

Scheduled scaling suits functions with predictable traffic patterns or known peak periods. At the scheduled times, the system adjusts the minimum number of instances; concurrent invocations beyond that minimum are handled automatically by on-demand elastic instances. For background on how scheduled scaling works, see Scheduled scaling.

In this example, the policy uses the Asia/Shanghai (UTC+8) time zone and runs indefinitely. It scales out to a minimum of 50 instances at 10:00 on weekdays (Monday through Friday) and scales in to a minimum of 5 instances at 22:00.
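
To make the schedule concrete, the following Python sketch computes which minimum the example schedule would have in effect at a given moment, once the schedule has been running for at least a day. It models only the example values (50 and 5, 10:00 and 22:00, weekdays); the function name `scheduled_min_instances` is hypothetical and nothing here calls a Function Compute API.

```python
from datetime import datetime, time, timedelta, timezone

# Asia/Shanghai does not observe daylight saving time,
# so a fixed UTC+8 offset is enough for this sketch.
SHANGHAI = timezone(timedelta(hours=8))

SCALE_OUT_AT = time(10, 0)  # scale out to 50 on weekdays
SCALE_IN_AT = time(22, 0)   # scale in to 5


def scheduled_min_instances(now: datetime) -> int:
    """Return the minimum set by the example schedule at `now`.

    `now` must be timezone-aware; it is converted to UTC+8 first.
    """
    local = now.astimezone(SHANGHAI)
    is_weekday = local.weekday() < 5  # Monday=0 ... Friday=4
    in_peak_window = SCALE_OUT_AT <= local.time() < SCALE_IN_AT
    return 50 if (is_weekday and in_peak_window) else 5
```

Passing a UTC timestamp works as well: 03:00 UTC on a Wednesday is 11:00 in UTC+8, which falls inside the weekday peak window.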

Threshold-based scaling

The system periodically samples metrics and scales the minimum number of instances when a threshold is crossed. For details, see Threshold-based scaling.

The metrics available depend on the function type:

CPU functions: monitor Instance Concurrency Utilization and Memory Utilization
GPU functions: monitor Instance Concurrency Utilization and GPU-related resource utilization metrics

In this example, the policy uses the Asia/Shanghai (UTC+8) time zone, is active from 00:00 on July 15, 2025, to 00:00 on July 31, 2025, and tracks Instance Concurrency Utilization. When utilization exceeds 60%, the system scales out, up to a maximum of 100 instances. When utilization falls below 60%, the system scales in, down to a minimum of 10 instances.
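
The effect of the threshold and the two bounds can be illustrated with a short Python sketch. This page does not describe the exact algorithm Function Compute uses, so the sketch assumes a proportional target-tracking rule (the same shape as the Kubernetes Horizontal Pod Autoscaler formula) purely for illustration. The function name `next_min_instances` and its parameters are hypothetical.

```python
import math


def next_min_instances(
    current: int,
    utilization_pct: float,
    target_pct: float = 60,
    lower: int = 10,
    upper: int = 100,
) -> int:
    """Suggest a new minimum number of instances from one metric sample.

    Assumption: resize proportionally so that utilization would return
    to the target. The real service may use a different rule.
    """
    desired = math.ceil(current * utilization_pct / target_pct)
    # Clamp to the policy bounds from the example (10 to 100 instances).
    return max(lower, min(upper, desired))
```

With 20 instances at 90% utilization, the sketch suggests 30 instances. With 80 instances at 90%, the proportional result of 120 is capped at the maximum of 100.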

Periodic scaling with a CRON expression

For functions with regular, repeating patterns, use a CRON expression to define the scaling schedule precisely.

In this example, the policy uses the Asia/Shanghai (UTC+8) time zone. The minimum number of instances scales out to 10 at 10:00 every Monday and scales in to 1 at 22:00 every Friday.
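
As a rough illustration, the two trigger times can be written in standard five-field crontab notation (minute, hour, day of month, month, day of week) as `0 10 * * 1` and `0 22 * * 5`. The field layout that the Function Compute console expects is not shown on this page and may differ, for example by including a seconds field, so treat these strings as an assumption. The Python sketch below checks whether a moment matches such an expression; it supports only `*` and single numbers, and `cron_matches` is a hypothetical helper.

```python
from datetime import datetime

# (expression, minimum number of instances) pairs for the example above.
# Five-field crontab notation is assumed: minute hour day-of-month month day-of-week.
SCHEDULE = [
    ("0 10 * * 1", 10),  # 10:00 every Monday: scale out to 10
    ("0 22 * * 5", 1),   # 22:00 every Friday: scale in to 1
]


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` matches a five-field cron expression.

    Only "*" and single integers are supported in this sketch.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five fields")
    # crontab numbers Sunday as 0, while isoweekday() numbers it as 7.
    actual = (when.minute, when.hour, when.day, when.month, when.isoweekday() % 7)
    return all(f == "*" or int(f) == v for f, v in zip(fields, actual))
```

Here `when` is taken to be local time in the policy's time zone. July 14, 2025 is a Monday, so 10:00 on that day matches the first expression.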

Modify or delete an elastic policy

Log on to the Function Compute console. In the left-side navigation pane, choose Function Management > Elastic Policies. Find the target policy, then click Edit or Delete in the Actions column.

Important

Deleting an elastic policy for an alias releases all pre-allocated instances for that alias. The function then falls back to on-demand scaling, which may involve cold starts. For CPU-based functions, a cold start typically takes hundreds of milliseconds, depending on application startup speed. For GPU-based functions, it can take several minutes, depending on model size and loading speed.

What's next

To cap the total number of running instances for a function, configure function quotas. If the number of running instances exceeds the configured limit, Function Compute returns a throttling error.