In Function Compute, you can use instances in on-demand mode or provisioned mode. This topic describes the principles, billing methods, and scaling limits of the two modes, and explains how to configure provisioned instances and auto scaling rules for provisioned instances in the Function Compute console.

On-demand mode

In on-demand mode, instances are allocated and released by Function Compute. Function Compute automatically scales instances based on the number of function invocations: it creates instances when the number of invocations increases and destroys instances when the number of requests decreases. The on-demand mode helps improve resource utilization and simplify resource management. By default, a maximum of 300 on-demand instances can be created for an Alibaba Cloud account in each region. To increase the upper limit, Submit a ticket.

Billing method: You are charged only when function invocations are processed. When no invocation requests are sent, no instances are allocated and you are not charged. For more information about pricing and billing, see Billing overview.

Provisioned mode

In on-demand mode, instances are automatically created when function invocation requests arrive, so the first invocation incurs a cold start. If you want to reduce the impact of cold start latency, you can use the provisioned mode. In provisioned mode, you manage the allocation and release of function instances, and provisioned instances are retained until you release them. Provisioned instances are assigned a higher priority than on-demand instances. If the number of provisioned instances is insufficient to process all requests, Function Compute allocates on-demand instances to process the remaining requests, as illustrated in the sketch below.
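The following minimal sketch illustrates only this overflow rule; it is not Function Compute code, and the function name, parameters, and numbers are assumptions made for this example.

    # Illustrative model only: provisioned instances are used first, and any overflow
    # is handled by on-demand instances. Names and numbers are assumptions for this sketch.
    def split_concurrency(concurrent_requests, provisioned_instances, per_instance_concurrency=1):
        provisioned_capacity = provisioned_instances * per_instance_concurrency
        on_provisioned = min(concurrent_requests, provisioned_capacity)
        on_demand = concurrent_requests - on_provisioned
        return on_provisioned, on_demand

    # Example: 120 concurrent requests against 100 provisioned instances (1 request each).
    print(split_concurrency(120, provisioned_instances=100))  # (100, 20): 20 requests overflow to on-demand instances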

Billing method: Billing of a provisioned instance starts when the instance is created and ends when the instance is released. You are charged for provisioned instances until you release them, regardless of whether the instances are processing requests. For more information about pricing and billing, see Billing overview.

Instance scaling limits

Configure scaling limits for on-demand instances

Function Compute preferentially uses existing instances to process function invocation requests. If the existing instances are fully loaded, Function Compute creates new instances to process the requests. As the number of invocation requests increases, Function Compute continues to create new instances until enough instances are created to handle the requests or the upper limit is reached. The following limits apply when instances are scaled out:
  • Default upper limit for running instances per region: 300.
  • The scale-out speed of the running instances is limited by the upper limit of burst instances and the growth rate of the instances. For the limits for different regions, see Limits on scaling speeds of instances in different regions.
    • Burst instances: the number of instances that can be created immediately. The default upper limit for burst instances is 100 or 300, depending on the region.
    • Instance growth rate: the number of instances that can be added per minute after the upper limit for burst instances is reached. The default growth rate is 100 or 300 instances per minute, depending on the region.
When the total number of instances or the scaling speed of the instances exceeds the limit, Function Compute returns a throttling error with HTTP status code 429. The following figure shows how Function Compute performs throttling in a scenario where the number of invocations increases rapidly. (A simplified model of these limits is also sketched after the following list.)
  • Before the upper limit for burst instances is reached, Function Compute creates instances immediately. During this process, a cold start occurs and no throttling error is reported. This process is marked by 1 in the preceding figure.
  • When the limit for burst instances is reached, the increase of instances is restricted by the growth rate. Throttling errors are reported for some requests. This process is marked by 2 in the preceding figure.
  • After the upper limit for the total number of instances is reached, throttling errors are reported for the remaining requests. This process is marked by 3 in the preceding figure.
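The following is a minimal, illustrative model of the scale-out limits described above. It is not Function Compute's actual implementation, and the default values (100 burst instances, 100 instances per minute, 300 total) are example numbers for a region that uses the lower limits; it also assumes one concurrent request per instance.

    import math

    # Illustrative model of the documented scale-out limits, not Function Compute internals.
    # burst_limit: instances that can be created immediately.
    # growth_rate: additional instances allowed per minute after the burst limit is reached.
    # total_limit: per-region, per-account cap on running instances.
    def max_allowed_instances(minutes_since_burst_exhausted, burst_limit=100, growth_rate=100, total_limit=300):
        grown = burst_limit + math.floor(minutes_since_burst_exhausted) * growth_rate
        return min(total_limit, grown)

    def throttled_requests(requested_instances, minutes_since_burst_exhausted):
        allowed = max_allowed_instances(minutes_since_burst_exhausted)
        # Requests that cannot be served by allowed instances receive HTTP status code 429.
        return max(0, requested_instances - allowed)

    print(throttled_requests(250, minutes_since_burst_exhausted=0))  # 150 requests throttled at first
    print(throttled_requests(250, minutes_since_burst_exhausted=2))  # 0 once the growth allowance catches up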

By default, all functions within an Alibaba Cloud account in the same region share the preceding scaling limits. To configure the limit on the number of instances for a specific function, see Overview of configuring the maximum number of on-demand instances. After the configuration, Function Compute returns a throttling error when the total number of running instances for the function exceeds the configured limit.

Configure scaling limits for provisioned instances

When a large number of burst invocations arrive, the creation of many instances is throttled, which results in request failures, and the cold starts of new instances increase request latencies. To avoid these issues, you can use provisioned instances in Function Compute. Provisioned instances are reserved before invocations arrive. The upper limits on the number and scaling speed of provisioned instances are independent of the limits on on-demand instances.
  • Default upper limit for provisioned instances per region: 300.
  • Default upper limit for the scaling speed of provisioned instances per minute: 100 or 300. The limit varies based on the region. The following figure shows how Function Compute performs throttling when provisioned instances are configured in the same load scenario as the preceding figure.
    • Before the provisioned instances are fully used, the requests are processed immediately. During this process, no cold start occurs and no throttling error is reported. This process is marked by 1 in the preceding figure.
    • When the provisioned instances are fully used, Function Compute creates instances immediately before the upper limit for burst instances is reached. During this process, a cold start occurs and no throttling error is reported. This process is marked by 2 in the preceding figure.

Limits on scaling speeds of instances in different regions

Region-specific limits (a small lookup sketch follows the note below):
  • China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen): up to 300 burst instances, with an instance growth rate of 300 instances per minute.
  • Other regions: up to 100 burst instances, with an instance growth rate of 100 instances per minute.
Note
  • In the same region, the limits on scaling speeds of provisioned instances and on-demand instances are the same.
  • If you want to increase the upper limits, Submit a ticket.
  • The scaling speed of performance instances is lower than that of elastic instances. We recommend that you use performance instances together with the provisioned mode.
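For quick reference, the following hypothetical helper mirrors the limits above. The region IDs are standard Alibaba Cloud region IDs and are assumptions of this sketch rather than values stated in this topic.

    # Hypothetical helper mirroring the region limits above; region IDs are assumptions.
    HIGH_LIMIT_REGIONS = {"cn-hangzhou", "cn-shanghai", "cn-beijing", "cn-zhangjiakou", "cn-shenzhen"}

    def default_scaling_limits(region_id):
        n = 300 if region_id in HIGH_LIMIT_REGIONS else 100
        return {"burst_instances": n, "instance_growth_rate_per_minute": n}

    print(default_scaling_limits("cn-hangzhou"))  # {'burst_instances': 300, 'instance_growth_rate_per_minute': 300}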

Configure provisioned instances

  1. Log on to the Function Compute console.
  2. In the left-side navigation pane, click Services and Functions.
  3. In the top navigation bar, select the region where the service resides.
  4. On the Services page, find the desired service and click Functions in the Actions column.
  5. On the Functions page, click the function that you want to modify.
  6. On the Function Details page, click the Auto Scaling tab and click Create Rule.
  7. On the page that appears, configure the following parameters and click Create.
    Parameter Description
    Basic Settings
    Version or Alias Select the version or alias for which you want to create provisioned instances.
    Note You cannot create provisioned instances for the LATEST version.
    Function Name Select the function whose invocation requests are to be processed by the provisioned instances.
    Minimum Number of Instances Enter the number of provisioned instances that you want to create. The minimum number of instances equals the number of provisioned instances to be created.
    Note You can set the minimum number of function instances to reduce cold starts and the time needed to respond to function invocation requests. This helps improve the service performance for online applications that are sensitive to response latency.
    Maximum Number of Instances Enter the maximum number of instances. The maximum number of instances equals the number of provisioned instances plus the maximum number of on-demand instances that can be allocated.
    Note
    • You can set the maximum number of function instances to prevent a large number of instances from being occupied by a single function due to excessive invocations, protect backend resources, and prevent unexpected charges.
    • If you do not set this parameter, the maximum number of instances is determined by the resource limit for the current region and Alibaba Cloud account.
    (Optional) Scheduled Setting Modification: You can create scheduled scaling rules to flexibly configure provisioned instances. You can configure the number of provisioned instances to be automatically adjusted to a specified value at a scheduled time. This way, the number of provisioned instances can meet the concurrency of your business.
    Policy Name Enter a policy name.
    Minimum Number of Instances Enter the minimum number of provisioned instances.
    Schedule Expression (UTC) Enter the expression of the schedule. Example: cron(0 0 20 * * *). For more information, see Parameters.
    Effective Time (UTC) Set the time when the configurations of scheduled scaling start to take effect and expire.
    (Optional) Metric-based Setting Modification: Provisioned instances are scaled in or out every minute based on the concurrency utilization of provisioned instances.
    Policy Name Enter a policy name.
    Minimum Range of Instances Specify the value range for the minimum number of provisioned instances.
    Usage Threshold Set the expected utilization rate. If resource utilization is lower than the value of this parameter, the system scales in provisioned instances. If resource utilization is higher than the value of this parameter, the system scales out provisioned instances.
    Effective Time (UTC) Specify the time when the configurations of metric-based auto scaling start to take effect and expire.
    After the auto scaling rule is created, you can go to the Auto Scaling page of the service and view details of the rule.

Modify a provisioned instance

  1. Log on to the Function Compute console.
  2. In the left-side navigation pane, click Services and Functions.
  3. In the top navigation bar, select the region where the service resides.
  4. On the Services page, find the desired service and click Functions in the Actions column.
  5. On the Functions page, click the function that you want to modify.
  6. On the Function Details page, click the Auto Scaling tab, find the rule that you want to manage, and then click Modify in the Actions column.
    Note To delete provisioned instances, set the Minimum Number of Instances parameter to 0.
  7. On the page that appears, modify the parameters in the Basic Settings section and the policy information and click Save.

Create an auto scaling rule for a provisioned instance

In provisioned mode, Function Compute creates a specified number of instances, but the instances may not be fully used. You can enable the scheduled scaling or metric-based auto scaling feature to make better use of provisioned instances.

Scheduled scaling

  • Definition: Scheduled scaling helps you flexibly configure provisioned instances. You can configure the number of provisioned instances to be automatically adjusted to a specified value at a specified time so that the number of instances can meet the concurrency of your business.
  • Scenario: You can enable the scheduled scaling feature to reserve instances in advance of periodic or predicted traffic peaks for functions. If the provisioned instances are insufficient to process all the function invocation requests, the remaining requests are processed by on-demand instances. For more information, see Instance types and instance modes.
  • Sample configuration: The following figure shows two scheduled actions. The first scheduled action scales out the provisioned instances before the traffic peak, and the second scheduled action scales in the provisioned instances after the traffic peak.
Sample code:
  • In this example, a function named function_1 in a service named service_1 is configured to automatically scale in and out. The configuration takes effect from 10:00:00 on November 1, 2020 to 10:00:00 on November 30, 2020 (UTC). The number of provisioned instances is adjusted to 50 at 20:00 (UTC) and to 10 at 22:00 (UTC) every day.
  • {
      "ServiceName": "service_1",
      "FunctionName": "function_1",
      "Qualifier": "alias_1",
      "ScheduledActions": [
        {
          "Name": "action_1",
          "StartTime": "2020-11-01T10:00:00Z",
          "EndTime": "2020-11-30T10:00:00Z",
          "TargetValue": 50,
          "ScheduleExpression": "cron(0 0 20 * * *)"
        },
        {
          "Name": "action_2",
          "StartTime": "2020-11-01T10:00:00Z",
          "EndTime": "2020-11-30T10:00:00Z",
          "TargetValue": 10,
          "ScheduleExpression": "cron(0 0 22 * * *)"
        }
      ]
    }
  • The following table describes the parameters.
    Parameter Description
    Name The name of the scheduled scaling task.
    StartTime The time when the configuration starts to take effect. Specify the value in UTC.
    EndTime The time when the configuration expires. Specify the value in UTC.
    TargetValue The target value.
    ScheduleExpression The expression that specifies when to run the scheduled task. The following formats are supported:
    • At expressions - "at(yyyy-mm-ddThh:mm:ss)": specifies that the scheduled task runs only once. Specify the value in UTC. For example, at(2021-04-01T12:00:00) specifies that the scaling is triggered at 12:00:00 on April 1, 2021 (UTC).
    • Cron expressions - "cron(0 0 4 * * *)": specifies that the scheduled task runs multiple times. Specify the value in the standard crontab format in UTC. For example, if you want to run the scheduled task at 20:00 (UTC+8), which is 12:00 (UTC), on a daily basis, set the parameter to cron(0 0 12 * * *). A small time zone conversion sketch follows Table 2.
    The following table describes the fields of the cron expression in the format of Seconds Minutes Hours Day-of-month Month Day-of-week.
    Table 1. Field description
    Field Value range Allowed special character
    Seconds 0 to 59 No
    Minutes 0 to 59 , - * /
    Hours 0 to 23 , - * /
    Day-of-month 1 to 31 , - * ? /
    Month 1 to 12 or JAN to DEC , - * /
    Day-of-week 1 to 7 or MON to SUN , - * ?
    Table 2. Special characters
    Character Definition Examples
    * Specifies any or every value. In the Minutes field, * specifies that the task is run every minute.
    , Specifies a value list. In the Day-of-week field, MON, WED, FRI specifies every Monday, Wednesday, and Friday.
    - Specifies a range. In the Hours field, 10-12 specifies a time range from 10:00 to 12:00 in UTC.
    ? Specifies an uncertain value. This special character is used together with other specified values. For example, if you specify a specific date, but you do not require the specified date to be a specific day of the week, you can use this special character in the Day-of-week field.
    / Specifies increments. n/m specifies an increment of m starting from the position of n. In the minute field, 3/5 indicates that the operation is performed every 5 minutes starting from the third minute within an hour.
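Because schedule expressions are interpreted in UTC, it can help to convert a local trigger time before writing the cron expression. The following sketch uses Python's standard zoneinfo module; the helper name and the sample time zone are assumptions of this example.

    from datetime import datetime
    from zoneinfo import ZoneInfo  # Python 3.9+

    # Convert a daily local trigger time to the UTC-based cron expression described above.
    def daily_cron_utc(hour_local, minute_local=0, tz="Asia/Shanghai"):
        local = datetime(2021, 1, 1, hour_local, minute_local, tzinfo=ZoneInfo(tz))
        utc = local.astimezone(ZoneInfo("UTC"))
        return f"cron({utc.second} {utc.minute} {utc.hour} * * *)"

    print(daily_cron_utc(20))  # cron(0 0 12 * * *): 20:00 UTC+8 corresponds to 12:00 UTC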

Metric-based auto scaling

  • Definition: Metric-based auto-scaling tracks metrics to dynamically scale provisioned instances.
  • Scenario: Function Compute collects the concurrency utilization of provisioned instances on a regular basis, and uses this metric together with the threshold values that you specify for scaling operations, to control the scaling of provisioned instances. This way, the number of provisioned instances can be adjusted to meet your business needs.
  • Principle: Provisioned instances are scaled every minute based on the metric value.
    • If the metric value exceeds the threshold that you configure, the system rapidly performs scale-out operations to adjust the number of provisioned instances to the target value.
    • If the metric value is lower than the threshold that you configure, the system gradually performs scale-in operations to adjust the number of provisioned instances to the target value.
    If the maximum and minimum numbers of provisioned instances are configured, the system scales the provisioned instances between the maximum and minimum numbers. If the number of instances reaches the maximum or minimum number, scaling stops.
  • Examples:
    • When the traffic increases and the threshold (80% in this example) is triggered, provisioned instances start to be scaled out until the number of provisioned instances reaches the upper limit (100 in this example). Requests that cannot be processed by the provisioned instances are allocated to on-demand instances.
    • When the traffic decreases and the threshold (60% in this example) is triggered, provisioned instances start to be scaled in.
Only the statistics on provisioned instances are collected to calculate the concurrency utilization of provisioned instances; statistics on on-demand instances are not included. The metric is calculated based on the following formula: the number of concurrent requests that provisioned instances are processing divided by the maximum number of concurrent requests that all provisioned instances can process. The metric value ranges from 0 to 1. The maximum number of concurrent requests that provisioned instances can process depends on the instance concurrency, as shown in the sketch after the following list. For more information, see Set the request concurrency in a single instance.
  • Each instance processes a single request at a time: Maximum concurrency = Number of instances.
  • Each instance concurrently processes multiple requests: Maximum concurrency = Number of instances × Number of requests concurrently processed by one instance.
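The following minimal sketch computes the concurrency utilization metric according to the formula above; the function name and sample numbers are assumptions made for illustration.

    # Concurrency utilization of provisioned instances, per the formula above.
    # Only requests running on provisioned instances are counted.
    def provisioned_concurrency_utilization(in_flight_requests, provisioned_instances, per_instance_concurrency=1):
        max_concurrency = provisioned_instances * per_instance_concurrency
        return in_flight_requests / max_concurrency

    # 80 in-flight requests on 100 provisioned instances, each handling one request at a time:
    print(provisioned_concurrency_utilization(80, 100))     # 0.8
    # The same 80 requests when each instance handles up to 4 concurrent requests:
    print(provisioned_concurrency_utilization(80, 100, 4))  # 0.2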
Target values for scaling:
  • The values are determined by the current metric value, metric target, number of provisioned instances, and scale-in factor.
  • Calculation principle: The system scales in provisioned instances based on the scale-in factor. The factor value ranges from 0 (excluded) to 1. The scale-in factor is a system parameter that is used to slow down the scale-in speed; you do not need to set it. The target values for scaling operations are the smallest integers that are greater than or equal to the following calculation results:
    • Scale-out target value = Current provisioned instances × (Current metric value/Metric target)
    • Scale-in target value = Current provisioned instances × (1 - Scale-in factor × (1 - Current metric value/Metric target))
  • Example: If the current metric value is 80%, the metric target is 40%, and the number of provisioned instances is 100, the scale-out target value is calculated as 100 × (80%/40%) = 200. The number of provisioned instances is increased to 200 to keep the metric near the target of 40%. Both formulas are worked through in the sketch below.
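The following sketch applies the two formulas above. The scale-in factor is an internal system parameter whose value is not documented here, so the 0.75 used below is purely an assumed example value.

    import math

    # Scale-out target, per the formula above: ceil(current * metric / target).
    def scale_out_target(current_provisioned, metric_value, metric_target):
        return math.ceil(current_provisioned * (metric_value / metric_target))

    # Scale-in target, per the formula above; scale_in_factor is an internal parameter
    # (0.75 here is an assumed example value, not a documented default).
    def scale_in_target(current_provisioned, metric_value, metric_target, scale_in_factor=0.75):
        return math.ceil(current_provisioned * (1 - scale_in_factor * (1 - metric_value / metric_target)))

    print(scale_out_target(100, 0.80, 0.40))  # 200, matching the example above
    print(scale_in_target(100, 0.20, 0.40))   # 63: instances are reduced gradually rather than halved at once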
Sample code:
  • In this example, a function named function_1 in a service named service_1 is configured to automatically scale in and out based on the ProvisionedConcurrencyUtilization metric. The configuration takes effect from 10:00:00 on November 1, 2020 to 10:00:00 on November 30, 2020 (UTC). When the concurrency utilization exceeds 60%, provisioned instances are scaled out, up to a maximum of 100 instances. When the concurrency utilization is lower than 60%, provisioned instances are scaled in, down to a minimum of 10 instances.
  • {
      "ServiceName": "service_1",
      "FunctionName": "function_1",
      "Qualifier": "alias_1",
      "TargetTrackingPolicies": [
        {
          "Name": "action_1",
          "StartTime": "2020-11-01T10:00:00Z",
          "EndTime": "2020-11-30T10:00:00Z",
          "MetricType": "ProvisionedConcurrencyUtilization",
          "MetricTarget": 0.6,
          "MinCapacity": 10,
          "MaxCapacity": 100,
        }
      ]
    }
  • The following table describes the parameters.
    Parameter Description
    Name The name of the metric-based auto scaling policy.
    StartTime The time when the configuration starts to take effect. Specify the value in UTC.
    EndTime The time when the configuration expires. Specify the value in UTC.
    MetricType The metric that is tracked. Set the parameter to ProvisionedConcurrencyUtilization.
    MetricTarget The threshold for metric-based auto scaling.
    MinCapacity The minimum number of provisioned instances after scale-in.
    MaxCapacity The maximum number of provisioned instances after scale-out.