Serverless cold starts vs. cost: finding the optimal balance

Chances are you have faced a similar technology selection yourself.

Xiao Wang is a programmer at a company whose applications run on servers in its own data center. All underlying services and O&M are handled in-house, so every upgrade and capacity expansion brings considerable operational pressure. To be able to scale out in time, the company also keeps many idle machines on standby, driving up hardware costs. The company recently started developing two new application systems, and Xiao Wang is in charge of technology selection. He plans to embrace cloud computing: deploy the new applications on the cloud and design a highly elastic, low-cost, low-maintenance architecture that can easily absorb sudden spikes in business traffic, letting him devote more energy to business development and less to O&M.

These two applications have several common characteristics:

• Both are online applications with relatively strict requirements on call latency and service stability.

• Application traffic varies greatly with the business, and it is hard to predict in advance how much business volume will grow, so high elasticity is required.

• There are obvious off-peak periods with relatively low call volume, expected mainly at night.

• Long application startup time: one is a Java Spring Boot order system, the other an AI image recognition system packaged in a large container image, with a startup time of nearly one minute.

Xiao Wang's needs can be summarized into three aspects:

• First, worry-free O&M: after delivering a JAR package or container image, he only wants to configure the application and have it run, without spending extra effort on operations, monitoring, and alerting.

• Second, good elasticity: capacity should scale out automatically and promptly when business traffic rises, and scale back in when traffic falls.

• Third, better resource utilization: cloud computing should be used to deliver a clear cost advantage.

Let's break it down and see how Xiao Wang carried out the technology selection step by step.

Highly integrated services: O&M-free and highly elastic

When making the technology selection, Xiao Wang considered three architectures: a traditional architecture (SLB + cloud servers + elastic scaling), a K8s architecture, and a Function Compute (FC) architecture.

A traditional architecture requires setting up SLB load balancing yourself, configuring the elastic scaling service and repeatedly tuning it to find appropriate scaling policies, and collecting logs on your own to build alerting and monitoring dashboards. The O&M and deployment cost of such a system is not low at all. Is there a more convenient solution?

Xiao Wang then investigated the K8s architecture. K8s Services and Ingress rules manage application-layer access, removing the need to set up SLB load balancing yourself, while HPA scales horizontally based on application load. This looks promising, but actual testing showed that HPA scaling operates at the minute level. Slow scale-in is not a problem, but when traffic rises rapidly, scale-out always lags by a few minutes, leading to increased request latency or failed requests and hurting service availability. Lowering the metric threshold that triggers scale-out solves this, but it also reduces resource utilization and raises costs considerably. In addition, log collection, alerting, and dashboards still have to be built in-house, which carries significant O&M cost. Moreover, Xiao Wang had never worked with K8s before, and learning its many concepts is itself a substantial cost.
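The trade-off above follows directly from the HPA scaling formula documented by Kubernetes (desired replicas = ceil(current replicas × current metric / target metric)). A quick sketch in Python, purely for illustration, shows why lowering the target threshold buys headroom at the price of utilization:

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Kubernetes HPA formula:
    desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# With a 70% CPU target, a spike to 90% on 10 pods asks for 13 pods,
# but those pods take minutes to arrive while requests queue up.
print(hpa_desired_replicas(10, 90, 70))  # -> 13

# Lowering the target to 40% keeps more standing headroom,
# so steady-state utilization (and cost efficiency) drops.
print(hpa_desired_replicas(10, 90, 40))  # -> 23
```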

The FC-based architecture solves all of the above problems. First, FC supports a reserved mode and automatic scaling based on instance metrics; this enables more sensitive and rapid scaling and keeps request latency stable while scaling. Second, FC is highly integrated, with many out-of-the-box features that provide a smooth, hassle-free experience: for example, it offers HTTP triggers, eliminating the work of integrating with gateways and SLB, and the console provides full observability, making it easy to view requests, instance status, and run logs. Finally, FC charges only for calls and for the resources actively used during calls, with no fees when there are no calls, which maximizes resource utilization and reduces costs.

Below, we introduce the reserved mode in detail and show how idle billing reduces the cost of using reservations.

Reserved mode: a complete answer to cold starts

FC supports two usage modes: on-demand and reserved. In on-demand mode, instances are created and scaled out automatically by incoming requests: instances are created when call volume rises and destroyed after requests taper off. On-demand mode maximizes resource utilization, but for applications like Xiao Wang's, with long startup times, instance creation incurs a noticeable cold start. To address this, FC provides a reserved mode. After the user configures a reservation, FC creates the specified number of reserved instances and keeps them resident until the user updates the reservation configuration to release them. Incoming requests are scheduled to reserved instances first; when the reserved instances are fully occupied, new requests trigger the creation of on-demand instances. To better align reserved capacity with the business curve, FC also provides scheduled scaling and metric-based scaling for reserved instances, improving their utilization.

This approach not only solves the problem of long application cold starts, but also keeps reserved instances at a relatively high utilization level. Even when traffic fluctuates sharply, on-demand instances can be temporarily scaled out to serve requests, preserving service quality as best as possible during rapid traffic increases.
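The scheduling priority described above can be sketched as a toy routing function. This is an illustrative model with hypothetical names, not the FC API:

```python
def route_request(active_requests: int, reserved_capacity: int) -> str:
    """Sketch of the scheduling rule: requests go to reserved
    (warm) instances first; once those are fully occupied,
    overflow triggers on-demand instances, which may cold-start."""
    if active_requests < reserved_capacity:
        return "reserved"    # warm instance, no cold start
    return "on-demand"       # scale-out path, possible cold start

# Ten reserved instances absorb the first 10 concurrent requests;
# the 11th and 12th spill over to on-demand instances.
assignments = [route_request(n, reserved_capacity=10) for n in range(12)]
```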

Idle billing: a powerful lever for cutting costs

In real-world scenarios, keeping request latency low requires maintaining a certain number of reserved instances even when there are no requests, which drives up costs. Is there a way to achieve both low latency and low cost? To help users reduce costs in this scenario, Function Compute has introduced idle billing for reserved instances. Let's take a closer look.

Idle billing

Based on whether a reserved instance is processing requests, we distinguish two instance states, idle and active, each with its own billing unit price. The active unit price is the same as the original resource-usage unit price, while the idle unit price is 20% of the active unit price. Enabling idle billing can therefore save a great deal of cost.

By default, idle billing is disabled. In that case, regardless of whether a reserved-mode instance is processing requests, FC allocates CPU to it and keeps it active at all times, ensuring it can still run background tasks normally when there are no requests. After idle billing is enabled, when a reserved-mode instance has no requests, FC freezes its CPU and the instance enters the idle state.

With idle billing, reserved instances, too, pay only for the CPU resources actually used. While a reserved instance is idle, it costs just 20% of the active price to keep cold starts at bay, which significantly reduces the cost of using reserved instances. Users can also worry less about reserved-instance utilization and adopt reserved instances with confidence.

As an example, assume the reserved instances have a utilization rate of 60% and the original usage cost is 1. With idle billing, the cost becomes 60% × 1 + 40% × 20% × 1 = 0.68, a 32% cost reduction.
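The arithmetic generalizes to any utilization rate. A minimal helper, illustrative only and normalized so that always-active cost is 1.0, reproduces the figure above:

```python
def reserved_cost(utilization: float, idle_price_ratio: float = 0.2) -> float:
    """Relative cost of a reserved instance with idle billing enabled.
    Active time is billed at the full unit price (1.0); idle time is
    billed at `idle_price_ratio` of the active unit price."""
    return utilization * 1.0 + (1.0 - utilization) * idle_price_ratio

cost = reserved_cost(0.60)  # the article's example: 60% utilization
# 0.6 * 1 + 0.4 * 0.2 = 0.68, i.e. a 32% saving vs. always-active billing
```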

How to configure it

You can configure reserved instances and idle billing through the console or the SDK.

Log in to the Function Compute console, and on the Home -> Elastic Management page, create a rule to configure idle billing. You can also configure it through the SDK, which supports multiple languages including Java, Go, and Node.js; for details, see the API online debugging documentation.

After enabling idle billing, you can check the idle resource usage fees for elastic instances and performance instances under Expense Center -> Billing Details -> Detailed Billing (bills are typically delayed by 3 to 6 hours).


Function Compute (FC) is committed to providing users with a highly elastic, O&M-free, low-cost, fully managed compute service. The release of idle billing helps users further reduce the cost of reserved instances, so that they pay only for the reserved resources they actually use. Function Compute will continue to unlock more of serverless's technical dividends, delivering ever-better performance, cost, and experience to users.
