In Function Compute, you can use instances in the on-demand mode and provisioned mode. This topic describes the principles, billing methods, the idle billing feature, and the instance scaling limits of the two modes. This topic also describes how to configure provisioned instances and the auto scaling rules for provisioned instances in the Function Compute console.

On-demand mode

Introduction

In the on-demand mode, instances are allocated and released by Function Compute. Function Compute automatically scales instances based on the number of function invocations. Function Compute creates instances when the number of function invocations increases, and destroys instances when the number of requests decreases. During the entire process, instance creation is automatically triggered by requests. On-demand instances are destroyed when no requests are sent for processing for a period of time (usually 3 to 5 minutes). When you invoke an on-demand instance for the first time, you must wait for the cold start of the instance to complete.

By default, a maximum of 300 on-demand instances can be created for an Alibaba Cloud account in each region. If you want to increase the upper limit, join DingTalk group 11721331 to contact technical support.

Billing method

You are charged only when your function is invoked. If no requests are sent to your function, no instances are allocated and you are not charged any fees. For more information about the pricing and billing of Function Compute, see Billing overview.

Provisioned mode

Introduction

On-demand instances are automatically created when requests are sent. When you invoke on-demand instances for the first time, you must wait for the cold start of the instance to complete. If you want to eliminate the impact of cold starts, you can use provisioned instances.

In provisioned mode, you manage the allocation and release of function instances. Provisioned instances are retained until you release them. Provisioned instances take precedence over on-demand instances. If the number of provisioned instances is not enough to process all requests, Function Compute allocates on-demand instances to process the remaining requests.
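
If the provisioned instances cannot absorb all in-flight requests, the overflow is handled by on-demand instances. The following sketch is only an illustration of this allocation order; the function and the parameter values are hypothetical.
import math


def on_demand_instances_needed(in_flight_requests: int,
                               provisioned_instances: int,
                               per_instance_concurrency: int = 1) -> int:
    """Estimate how many on-demand instances are needed for the overflow.

    Provisioned instances are used first; on-demand instances only process
    the requests that exceed the provisioned capacity.
    """
    provisioned_capacity = provisioned_instances * per_instance_concurrency
    overflow = max(0, in_flight_requests - provisioned_capacity)
    return math.ceil(overflow / per_instance_concurrency)


# 10 provisioned instances, 1 request per instance, 25 concurrent requests:
# 15 on-demand instances are allocated to process the remaining requests.
print(on_demand_instances_needed(25, 10))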

Billing method

You are charged for provisioned instances from when you create them to when you release them, regardless of whether they have processed any requests. For more information about the pricing and billing of Function Compute, see Billing overview.

Idle billing

By default, the idle billing feature is disabled. In this case, the provisioned instances in Function Compute are always allocated with CPU resources even when no requests are being processed. This ensures that the instances can run background tasks when no requests are made. After the idle billing feature is enabled, Function Compute freezes the CPU resources of the provisioned instances when the provisioned instances are not processing any request. This way, the instances are in the idle state and you are charged based on the idle billing unit prices. The unit prices for idle instance resources are 10% of the unit prices of active instance resources. For more information, see Billing overview.
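
The following sketch is a rough cost comparison for one provisioned instance that is based only on the rule stated above: idle instance resources are billed at 10% of the unit price of active instance resources. The unit price and the usage pattern are placeholder assumptions; see Billing overview for the actual unit prices.
ACTIVE_UNIT_PRICE = 1.0                      # hypothetical cost per instance-hour when active
IDLE_UNIT_PRICE = ACTIVE_UNIT_PRICE * 0.10   # idle unit price is 10% of the active unit price

HOURS_PER_DAY = 24
busy_hours = 6                               # hypothetical: requests are processed 6 hours per day
idle_hours = HOURS_PER_DAY - busy_hours

cost_without_idle_billing = HOURS_PER_DAY * ACTIVE_UNIT_PRICE
cost_with_idle_billing = busy_hours * ACTIVE_UNIT_PRICE + idle_hours * IDLE_UNIT_PRICE

print(f"Idle billing disabled: {cost_without_idle_billing:.2f} units per day")
print(f"Idle billing enabled:  {cost_with_idle_billing:.2f} units per day")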

You can enable the idle billing feature based on your business requirements.

  • Costs

    If you want to use provisioned instances to eliminate cold starts and save costs, we recommend that you enable the idle billing feature. The idle billing feature allows you to pay lower fees for provisioned instances when they are in the idle state, and requests can still be responded to without cold starts.

  • Background tasks
    If your function needs to run background tasks, we recommend that you disable the idle billing feature. Example scenarios:
    • Some application frameworks rely on the built-in scheduler or background features. Some dependent middleware needs to regularly report heartbeats.
    • Some asynchronous operations are performed by using goroutines in Go, async functions in Node.js, or asynchronous threads in Java.

Instance scaling limits

Scaling limits for on-demand instances

Function Compute preferentially uses existing instances to process requests. If the existing instances are fully loaded, Function Compute creates new instances to process requests. As the number of invocations increases, Function Compute continues to create new instances until enough instances are created to handle the requests or the upper limit is reached. During instance scale-out, the following limits apply:
  • Default upper limit for running instances per region: 300.
  • The scale-out speed of running instances is limited by the upper limit for burst instances and the instance growth rate. For the limits in different regions, see Limits on scaling speeds of instances in different regions. A sketch that combines these limits appears after the figure notes below.
    • Burst instances: the number of instances that can be created immediately. The default upper limit for burst instances is 100 or 300, depending on the region.
    • Instance growth rate: the number of instances that can be added per minute after the upper limit for burst instances is reached. The default growth rate is 100 or 300 instances per minute, depending on the region.
When the total number of instances or the scaling speed of the instances exceeds the limit, Function Compute returns a throttling error, for which the HTTP status code is 429. The following figure shows how Function Compute performs throttling in a scenario where the number of invocations increases rapidly.
  • ①: Before the upper limit for burst instances is reached, Function Compute creates instances immediately. During this process, a cold start occurs but no throttling error is reported.
  • ②: When the limit for burst instances is reached, the increase of instances is restricted by the growth rate. Throttling errors are reported for some requests.
  • ③: When the upper limit of instances is reached, some requests are throttled.
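
The following sketch is only an illustration of the limits described above. It estimates the maximum number of on-demand instances that can be running a given number of minutes into a burst. The default values correspond to the limits of most regions (100 burst instances, a growth rate of 100 instances per minute, and 300 running instances in total); adjust them to the limits of your region and account.
def max_on_demand_instances(minutes_elapsed: float,
                            burst_limit: int = 100,
                            growth_rate_per_minute: int = 100,
                            region_instance_limit: int = 300) -> int:
    """Upper bound on running on-demand instances during a traffic burst.

    Instances up to the burst limit can be created immediately. After that,
    capacity grows by the per-minute growth rate until the regional instance
    limit is reached. Requests that exceed this bound are throttled with
    HTTP status code 429.
    """
    capacity = burst_limit + growth_rate_per_minute * minutes_elapsed
    return int(min(capacity, region_instance_limit))


# 0 min -> 100 instances, 1 min -> 200 instances, 2 min and later -> 300 instances.
for minutes in (0, 1, 2, 3):
    print(minutes, max_on_demand_instances(minutes))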

By default, all functions within an Alibaba Cloud account in the same region share the preceding scaling limits. To configure the limit on the number of instances for a specific function, see Overview of configuring the maximum number of on-demand instances. After the configuration, Function Compute returns a throttling error when the total number of running instances for the function exceeds the configured limit.

Scaling limits for provisioned instances

When the number of burst invocations is large, the creation of a large number of instances is throttled, which results in request failures. The cold starts of instances also increase the latencies of requests. To avoid these issues, you can use provisioned instances in Function Compute. Provisioned instances are those reserved in advance of invocations. The upper limits on the number of provisioned instances and the scaling speed of provisioned instances are independent of those limits on on-demand instances.
  • Default upper limit for provisioned instances per region: 300.
  • Default upper limit for the scaling speed of provisioned instances per minute: 100 or 300. The limit varies based on the region. For more information, see Limits on scaling speeds of instances in different regions. The following figure shows how Function Compute performs throttling when provisioned instances are configured in the same load scenario as the preceding figure.
    • ①: Before the provisioned instances are fully used, the requests are processed immediately. During this process, no cold start occurs and no throttling error is reported.
    • ②: When the provisioned instances are fully used, Function Compute creates instances immediately before the upper limit for burst instances is reached. During this process, a cold start occurs but no throttling error is reported.

Limits on scaling speeds of instances in different regions

Region | Number of burst instances | Instance growth rate
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen) | 300 | 300 per minute
Other regions | 100 | 100 per minute
Note
  • In the same region, the limits on scaling speeds of provisioned instances and on-demand instances are the same.
  • If you want to increase the scaling speed, join DingTalk group 11721331 to contact technical support.
  • The scaling speed of GPU-accelerated instances is lower than that of elastic instances. We recommend that you use GPU-accelerated instances together with the provisioned mode.

Configure provisioned instances

  1. Log on to the Function Compute console. In the left-side navigation pane, click Services & Functions.
  2. In the top navigation bar, select a region. On the Services page, click the desired service.
  3. On the Functions page, click the function that you want to modify.
  4. On the Function Details page, click the Auto Scaling tab and click Create Rule.
  5. On the page that appears, configure the following parameters and click Create.
    Parameter | Description
    Basic Configurations
    Version or Alias | Select the version or alias for which you want to create provisioned instances.
      Note: You cannot create provisioned instances for the LATEST version.
    Minimum Number of Instances | Enter the number of provisioned instances that you want to create. The minimum number of instances equals the number of provisioned instances to be created.
      Note: You can set the minimum number of function instances to reduce cold starts and the time needed to respond to function invocation requests. This helps improve the service performance of online applications that are sensitive to response latency.
    Idle Billing | Enable or disable the idle billing feature. By default, the idle billing feature is disabled. Take note of the following items:
      • When this feature is enabled, provisioned instances are allocated CPU resources only when they are processing requests. The CPU resources of the instances are frozen when the instances are not processing any requests.
      • When this feature is disabled, provisioned instances are allocated CPU resources regardless of whether they are processing requests.
    Maximum Number of Instances | Enter the maximum number of instances. The maximum number of instances equals the number of provisioned instances plus the maximum number of on-demand instances that can be allocated.
      Note:
      • You can set the maximum number of function instances to prevent a single function from using a large number of instances because of excessive invocations, protect backend resources, and prevent unexpected charges.
      • If you do not configure this parameter, the maximum number of instances is determined by the resource limits of the current region and Alibaba Cloud account.
    (Optional) Scheduled Setting Modification: You can create scheduled scaling rules to flexibly configure provisioned instances. You can configure the number of provisioned instances to be automatically adjusted to a specified value at a scheduled time. This way, the number of provisioned instances can meet the concurrency requirement of your business.
    Policy Name | Enter a custom policy name.
    Minimum Number of Instances | Enter the minimum number of provisioned instances.
    Schedule Expression (UTC) | Enter the expression of the schedule. Example: cron(0 0 20 * * *). For more information, see Parameters.
    Effective Time (UTC) | Set the time when the configurations of scheduled scaling start to take effect and expire.
    (Optional) Metric-based Setting Modification: Provisioned instances are scaled in or out every minute based on the metrics of instances or concurrency utilization of provisioned instances.
    Policy Name | Enter a custom policy name.
    Minimum Range of Instances | Specify the value range for the minimum number of provisioned instances.
    Utilization Type | This parameter is displayed only when GPU-accelerated instances are configured. Select the type of metric based on which the auto scaling policy is configured. For more information about the auto scaling policies of GPU-accelerated instances, see Create an auto scaling policy for provisioned GPU-accelerated instances.
    Usage Threshold | Configure the scaling range. Scale-in is performed if the metric value or the concurrency utilization of provisioned instances is lower than the specified threshold. Scale-out is performed if the metric value or the concurrency utilization of provisioned instances is higher than the specified threshold.
    Effective Time (UTC) | Specify the time when the configurations of metric-based auto scaling start to take effect and expire.
    After the auto scaling rule is created, you can go to the Auto Scaling page of the service and view details of the rule.

You can modify the number of provisioned instances or delete the configuration as prompted.

Note To delete provisioned instances, set the Minimum Number of Instances parameter to 0.

Create an auto scaling rule for a provisioned instance

In provisioned mode, Function Compute creates a specified number of instances, but the instances may not be fully used. You can use Scheduled Setting Modification and Metric-based Setting Modification to make better use of provisioned instances.

Scheduled Setting Modification

  • Definition: Scheduled scaling helps you flexibly configure provisioned instances. You can configure the number of provisioned instances to be automatically adjusted to a specified value at a specified time so that the number of instances can meet the concurrency of your business.
  • Applicable scenarios: Functions whose workloads follow periodic rules or have predictable traffic peaks. If the provisioned instances are insufficient to process all the function invocation requests, the remaining requests are processed by on-demand instances. For more information, see Instance types and instance modes.
  • Sample configuration: The following figure shows two scheduled actions that are configured. The first scheduled action scales out the provisioned instances before the traffic peak, and the second scheduled action scales in the provisioned instances after the traffic peak.
Example: In this example, a function named function_1 in a service named service_1 is configured to automatically scale in and out. The configurations are set to be effective from 10:00:00 on November 1, 2022 to 10:00:00 on November 30, 2022. The number of provisioned instances is adjusted to 50 at 20:00 (UTC) and to 10 at 22:00 (UTC) every day.
{
  "ServiceName": "service_1",
  "FunctionName": "function_1",
  "Qualifier": "alias_1",
  "ScheduledActions": [
    {
      "Name": "action_1",
      "StartTime": "2022-11-01T10:00:00Z",
      "EndTime": "2022-11-30T10:00:00Z",
      "TargetValue": 50,
      "ScheduleExpression": "cron(0 0 20 * * *)"
    },
    {
      "Name": "action_2",
      "StartTime": "2022-11-01T10:00:00Z",
      "EndTime": "2022-11-30T10:00:00Z",
      "TargetValue": 10,
      "ScheduleExpression": "cron(0 0 22 * * *)"
    }
  ]
}
The following table describes the parameters.
Parameter | Description
Name | The name of the scheduled scaling task.
StartTime | The time when the configurations start to take effect. Specify the value in UTC.
EndTime | The time when the configurations expire. Specify the value in UTC.
TargetValue | The target value.
ScheduleExpression | The expression that specifies when to run the scheduled task. The following formats are supported:
  • At expressions - "at(yyyy-mm-ddThh:mm:ss)": runs the scheduled task only once. Specify the value in UTC. For example, at(2021-04-01T12:00:00) specifies that the scaling is performed at 12:00:00 on April 1, 2021 (UTC).
  • Cron expressions - "cron(0 0 4 * * *)": runs the scheduled task multiple times. Specify the value in the standard crontab format in UTC. For example, to run the scheduled task at 20:00 (UTC+8), that is, 12:00 (UTC), every day, set the parameter to cron(0 0 12 * * *). A conversion sketch appears after the tables below.
The following table describes the fields of the cron expression in the format of Seconds Minutes Hours Day-of-month Month Day-of-week.
Table 1. Field description
Field name | Valid values | Allowed special characters
Seconds | 0 to 59 | None
Minutes | 0 to 59 | , - * /
Hours | 0 to 23 | , - * /
Day-of-month | 1 to 31 | , - * ? /
Month | 1 to 12 or JAN to DEC | , - * /
Day-of-week | 1 to 7 or MON to SUN | , - * ?
Table 2. Special characters
Character | Description | Example
* | Specifies any or each value. | In the Minutes field, * specifies that the task is run every minute.
, | Specifies a value list. | In the Day-of-week field, MON,WED,FRI specifies every Monday, Wednesday, and Friday.
- | Specifies a range. | In the Hours field, 10-12 specifies a time range from 10:00 to 12:00 in UTC.
? | Specifies an uncertain value and is used together with other specified values. | If you specify a specific date but do not require the date to fall on a specific day of the week, you can use this character in the Day-of-week field.
/ | Specifies increments. n/m specifies an increment of m starting from the position of n. | In the Minutes field, 3/5 specifies that the operation is performed every 5 minutes starting from the third minute within an hour.
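
The Schedule Expression and Effective Time values are evaluated in UTC, so a local peak time must be converted before it is written into the cron expression. The following sketch is a minimal illustration that assumes Python 3.9 or later for the zoneinfo module; it converts a 20:00 peak in UTC+8 to the UTC-based expression cron(0 0 12 * * *). The date is only needed to build a complete datetime and is otherwise arbitrary.
from datetime import datetime
from zoneinfo import ZoneInfo

# 20:00 local time in UTC+8 (Asia/Shanghai).
local_peak = datetime(2022, 11, 1, 20, 0, tzinfo=ZoneInfo("Asia/Shanghai"))
utc_peak = local_peak.astimezone(ZoneInfo("UTC"))

# Cron fields: Seconds Minutes Hours Day-of-month Month Day-of-week.
cron_expression = f"cron({utc_peak.second} {utc_peak.minute} {utc_peak.hour} * * *)"
print(cron_expression)  # cron(0 0 12 * * *)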

Metric-based Setting Modification

  • Definition: Metric-based auto-scaling tracks metrics to dynamically scale provisioned instances.
  • Scenario: Function Compute regularly collects the concurrency utilization of provisioned instances or the resource utilization metrics of instances, and uses these metrics together with the thresholds that you specify to control the scaling of provisioned instances. This way, the number of provisioned instances can be adjusted to meet your business needs.
  • Principle: Provisioned instances are scaled every minute based on the metric value.
    • If the metric value exceeds the threshold that you configure, the system rapidly performs scale-out operations to adjust the number of provisioned instances to the target value.
    • If the metric value is lower than the threshold that you configure, the system gradually performs scale-in operations to adjust the number of provisioned instances to the specified value.
    If the maximum and minimum numbers of provisioned instances are configured, the system scales the provisioned instances between the maximum and minimum numbers. If the number of instances reaches the maximum or minimum number, scaling stops.
  • Sample configuration: The following figure shows an example of metric-based scaling for provisioned instances.
    • When the traffic increases and the threshold is triggered, provisioned instances start to be scaled out until the number of provisioned instances reaches the upper limit. Requests that cannot be processed by the provisioned instances are allocated to on-demand instances.
    • When the traffic decreases and the threshold is triggered, provisioned instances start to be scaled in.

Only the statistics on provisioned instances are collected to calculate the concurrency utilization of provisioned instances. The statistics on on-demand instances are not included.

The metric is calculated based on the following formula: Concurrency utilization = Number of concurrent requests that provisioned instances are processing/Maximum number of concurrent requests that all provisioned instances can process. The metric value ranges from 0 to 1.

The maximum number of concurrent requests that provisioned instances can process depends on the instance concurrency setting. For more information, see Configure instance concurrency.
  • Each instance processes a single request at a time: Maximum concurrency = Number of instances.
  • Each instance concurrently processes multiple requests: Maximum concurrency = Number of instances × Number of requests concurrently processed by one instance.
Target values for scaling:
  • The values are determined by the current metric value, metric target, number of provisioned instances, and scale-in factor.
  • Calculation principle: The system scales in provisioned instances based on the scale-in factor. The factor value ranges from 0 (excluded) to 1. The scale-in factor is a system parameter that is used to slow down the scale-in speed. You do not need to set the scale-in factor. The target values for scaling operations are the smallest integers that are greater than or equal to the following calculation results:
    • Scale-out target value = Current provisioned instances × (Current metric value/Metric target)
    • Scale-in target value = Current provisioned instances × Scale-in factor × (1 - Current metric value/Metric target)
  • Example: If the current metric value is 80%, the metric target is 40%, and the number of provisioned instances is 100, the scale-out target value is calculated based on the following formula: 100 × (80%/40%) = 200. The number of provisioned instances is increased to 200 so that the metric value returns to around the 40% target. A sketch of these calculations appears after this list.
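
The following sketch only restates the formulas above for illustration. It computes the concurrency utilization metric and the scale-out and scale-in target values; the scale-in factor is a system-managed parameter, and the MinCapacity/MaxCapacity clamp values are hypothetical.
import math


def concurrency_utilization(in_flight_requests: int,
                            provisioned_instances: int,
                            per_instance_concurrency: int = 1) -> float:
    """Concurrency utilization of provisioned instances, between 0 and 1.

    Only requests handled by provisioned instances are counted. The maximum
    concurrency is the number of instances multiplied by the number of
    requests that one instance can process concurrently.
    """
    max_concurrency = provisioned_instances * per_instance_concurrency
    return in_flight_requests / max_concurrency


def scale_out_target(current_instances: int, metric_value: float,
                     metric_target: float) -> int:
    """Smallest integer >= current instances x (metric value / metric target)."""
    return math.ceil(current_instances * (metric_value / metric_target))


def scale_in_target(current_instances: int, metric_value: float,
                    metric_target: float, scale_in_factor: float) -> int:
    """Smallest integer >= current instances x factor x (1 - metric value / metric target)."""
    return math.ceil(current_instances * scale_in_factor *
                     (1 - metric_value / metric_target))


# Example from the text: 100 provisioned instances, metric value 80%, target 40%.
target = scale_out_target(100, 0.80, 0.40)   # 200
# The result is then kept within the configured capacity range, for example
# MinCapacity=10 and MaxCapacity=300 (hypothetical values).
clamped = max(10, min(target, 300))
print(target, clamped)
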
Example: In this example, a function named function_1 in a service named service_1 is configured to automatically scale in and out based on the ProvisionedConcurrencyUtilization metric. The configurations are set to be effective from 10:00:00 on November 1, 2022 to 10:00:00 on November 30, 2022. When the concurrency utilization exceeds 60%, provisioned instances are scaled out, and the number of provisioned instances can be up to 100. When the concurrency utilization is lower than 60%, provisioned instances are scaled in, and the number of provisioned instances can be reduced to 10.
{
  "ServiceName": "service_1",
  "FunctionName": "function_1",
  "Qualifier": "alias_1",
  "TargetTrackingPolicies": [
    {
      "Name": "action_1",
      "StartTime": "2022-11-01T10:00:00Z",
      "EndTime": "2022-11-30T10:00:00Z",
      "MetricType": "ProvisionedConcurrencyUtilization",
      "MetricTarget": 0.6,
      "MinCapacity": 10,
      "MaxCapacity": 100,
    }
  ]
}
The following table describes the parameters.
Parameter | Description
Name | The name of the metric-based scaling task.
StartTime | The time when the configurations start to take effect. Specify the value in UTC.
EndTime | The time when the configurations expire. Specify the value in UTC.
MetricType | The metric that is tracked. Set the parameter to ProvisionedConcurrencyUtilization.
MetricTarget | The threshold for metric-based auto scaling.
MinCapacity | The minimum number of provisioned instances for scale-in.
MaxCapacity | The maximum number of provisioned instances for scale-out.