Four Secrets of Serverless Application Optimization

Resource assessment remains important

Although the Serverless architecture is billed on a pay-as-you-go basis, that does not necessarily make it cheaper than renting traditional servers. If we do not assess our own project accurately and configure some parameters unreasonably, the costs incurred by the Serverless architecture can be enormous.

Generally, the fees charged by the FaaS platform are directly related to three indicators:

The configured memory specification;

The time consumed by the program;

And the resulting traffic costs.

The time consumed by a program generally depends on the memory specification and on the business logic the program handles. The traffic cost depends on the size of the data packets exchanged between the program and the client. Of these three indicators, the one most likely to cause a large billing deviation through misconfiguration is therefore the memory specification. Taking Alibaba Cloud Function Compute as an example, assume a Hello World program that is executed 10,000 times a day. We can calculate the costs incurred by instances of different specifications (excluding network costs):

[Table: Alibaba Cloud Function Compute costs for instances of different memory specifications]

As can be seen from the table above, when the program runs normally in 128MB of memory, mistakenly setting the memory specification to 3072MB may make the monthly cost skyrocket by 25 times! Therefore, before launching a Serverless application, we need to assess its resource requirements and choose a reasonable configuration to keep costs down.
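The effect of the memory setting can be sketched with a simplified billing model in which the compute fee is proportional to memory × duration × invocations. This is an assumption for illustration only: real price lists also include request fees, free tiers, and tiered rates. The unit price and the 0.1s duration below are placeholders, not Alibaba Cloud figures. Under this model the ratio between 3072MB and 128MB is 3072 / 128 = 24, the same order of magnitude as the increase described above.

```python
def monthly_compute_cost(memory_mb, duration_s, invocations_per_day,
                         price_per_gb_second, days=30):
    """Estimate the monthly compute fee under a pure GB-second model.

    price_per_gb_second is a hypothetical unit price supplied by the caller.
    """
    gb_seconds = (memory_mb / 1024) * duration_s * invocations_per_day * days
    return gb_seconds * price_per_gb_second


# Placeholder inputs: 10,000 invocations a day, 0.1s each, made-up unit price.
PRICE = 0.00002  # hypothetical price per GB-second
small = monthly_compute_cost(128, 0.1, 10_000, PRICE)
large = monthly_compute_cost(3072, 0.1, 10_000, PRICE)
ratio = large / small
print(f"cost ratio 3072MB vs 128MB: {ratio:.1f}x")
```

Because the unit price cancels out, the ratio depends only on the two memory settings as long as the execution time stays the same.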

Reasonable code package size

Every cloud vendor's FaaS platform limits the size of code packages. Leaving those limits aside, the impact of code package size can be seen from the function's cold start process:

Function startup includes a code-loading step. If the uploaded code package is too large, or contains so many files that decompression is slow, code loading takes longer, which in turn lengthens the cold start.

Imagine two compressed code packages, one only 100KB and the other 200MB, both downloaded over a gigabit intranet link under ideal conditions (that is, ignoring disk speed and similar factors). Even at the maximum speed of 125MB/s, the former downloads in less than 0.01s, while the latter needs 1.6s. On top of the download time comes the decompression time, so the cold start times of the two may differ by 2s.
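The download-time arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the ideal case: it assumes the full 125MB/s of a gigabit link is available and ignores decompression and disk speed.

```python
LINK_SPEED_MB_PER_S = 125  # gigabit link: 1000 Mbit/s / 8 = 125 MB/s


def download_seconds(package_size_mb, speed_mb_per_s=LINK_SPEED_MB_PER_S):
    """Ideal transfer time for a code package of the given size."""
    return package_size_mb / speed_mb_per_s


small_pkg = download_seconds(100 / 1000)  # 100KB expressed in MB
large_pkg = download_seconds(200)         # 200MB
print(f"100KB: {small_pkg:.4f}s, 200MB: {large_pkg:.1f}s")
```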

For a traditional Web interface, a response time of more than 2s is generally unacceptable for many businesses. Therefore, when packaging code, keep the compressed package as small as possible. Taking a Node.js project as an example, tools such as Webpack can be used at packaging time to shrink the dependencies, reducing the overall code package size and improving the function's cold start efficiency.

Making reasonable use of instance reuse

To mitigate the cold start problem and use resources more efficiently, the FaaS platforms of the various cloud vendors reuse "instances". Instance reuse means that when an instance finishes a request, it is not released but enters a "silent" state. If a new request is assigned to it within a certain time window, the corresponding method is called directly without re-initializing resources, which greatly reduces how often functions cold start. To verify this, we can create two functions:

Function 1:

# -*- coding: utf-8 -*-

def handler(event, context):
    # Runs on every invocation, whether or not the instance is reused
    print("Test")
    return 'hello world'

Function 2:

# -*- coding: utf-8 -*-

# Runs only when a new instance is initialized (cold start)
print("Test")

def handler(event, context):
    return 'hello world'

We click the "Test" button in the console several times for each function and check whether "Test" appears in the log each time. We can count the results:

From these results, we can see that instance reuse does exist: "Function 2" does not execute the statements outside the entry function on every invocation. Building on "Function 1" and "Function 2", suppose the print("Test") statement were instead a database connection being initialized or a deep learning model being loaded. Written as in "Function 1", that work would run on every request, whereas written as in "Function 2", the existing object could be reused.

Therefore, in actual projects, some initialization operations can be implemented in the style of "Function 2", such as:

• In machine learning scenarios, load the model during initialization so that it is not reloaded every time the function is triggered, which improves response times when instances are reused;

• For databases and similar services, create the connection object during initialization to avoid creating a new connection on every request;

• For other scenarios where files must be downloaded and loaded on first use, doing this work during initialization makes instance reuse more effective.
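The initialization pattern in the list above can be sketched as follows. The `load_model` helper and its return value are hypothetical stand-ins for an expensive step such as loading a model or opening a database connection; the counter exists only to show how often that step runs.

```python
INIT_COUNT = 0  # counts how many times the expensive initialization runs


def load_model():
    """Hypothetical expensive setup (model load, DB connection, file download)."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"weights": [0.5, 1.5, 2.0]}


# Module level: executed once when the instance is initialized (cold start),
# then shared by every request that reuses the instance.
MODEL = load_model()


def handler(event, context):
    # Only the per-request work happens here; MODEL is reused as-is.
    x = event.get("x", 1)
    return {"score": sum(MODEL["weights"]) * x}


# Simulate three requests landing on the same (reused) instance.
responses = [handler({"x": i}, None) for i in range(3)]
print(responses, "initializations:", INIT_COUNT)
```

If `load_model()` were called inside `handler` instead, the setup would run once per request, as in "Function 1".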

Making good use of platform features

Each cloud vendor's FaaS platform has some "platform features". These are capabilities that are not necessarily specified or described in the "CNCF WG Serverless Whitepaper v1.0". They are features that a cloud platform has identified and implemented from the user's perspective, based on its own business development and demands, and they may be available on only one or a few cloud platforms. Used properly, such features can significantly improve business performance.

1. Pre-freeze & Pre-stop

Taking Alibaba Cloud Function Compute as an example, the following user pain points surfaced as the platform developed (especially around smoothly migrating traditional applications to the Serverless architecture):

• Delayed or lost asynchronous background metrics: if metric data is not sent successfully during the request, it may be delayed until the next request, or the data points may be discarded.

• Synchronously sending metrics increases latency: calling a Flush-like interface after every request not only adds latency to each request but also puts unnecessary pressure on back-end services.

• Graceful function shutdown: when an instance shuts down, the application needs to clean up connections, stop processes, and report status. In Function Compute, developers cannot know when an instance will go offline, and there is no webhook to notify them of instance shutdown events.

To address these pain points, runtime extensions were released. This feature extends the existing HTTP server programming model by adding PreFreeze and PreStop webhooks. Extension developers implement HTTP handlers to listen for function instance lifecycle events, as shown in the following figure:

• PreFreeze: Each time the Function Compute service decides to freeze the current function instance, it first calls the HTTP GET /pre-freeze path. The extension developer is responsible for implementing the logic that completes any necessary work before the instance is frozen, such as waiting for metrics to be sent successfully. The duration of the InvokeFunction call does not include the execution time of the PreFreeze hook.

• PreStop: Each time the Function Compute service decides to stop the current function instance, it first calls the HTTP GET /pre-stop path. The extension developer is responsible for implementing the logic that completes any necessary work before the instance is released, such as closing database connections and reporting or updating status.
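The two hooks described above could be handled along the following lines. This is a minimal sketch using only Python's standard `http.server` module, assuming the platform calls `GET /pre-freeze` and `GET /pre-stop` on the function's HTTP server. The metrics buffer, the `flush_metrics` helper, and the loopback address are hypothetical; the port the platform expects and the real cleanup logic depend on the runtime configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PENDING_METRICS = []  # hypothetical buffer of metrics not yet sent
EVENTS = []           # records which lifecycle hooks have fired


def flush_metrics():
    """Stand-in for sending buffered metrics to a backend; returns the count."""
    sent = len(PENDING_METRICS)
    PENDING_METRICS.clear()
    return sent


class LifecycleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/pre-freeze":
            flush_metrics()              # finish pending work before the freeze
            EVENTS.append("pre-freeze")
            self._reply(200, b"OK")
        elif self.path == "/pre-stop":
            flush_metrics()              # final flush, then release resources
            EVENTS.append("pre-stop")
            self._reply(200, b"OK")
        else:
            self._reply(404, b"not found")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet; real code would log


def make_server(port=0):
    """Create (but do not start) the server; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), LifecycleHandler)
```

Calling `make_server(port).serve_forever()` would start listening; the same server would normally also handle the function's regular invocation requests.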

2. Single Instance Multiple Concurrency

The function services of most vendors typically isolate at the request level: when a client initiates three requests at the same time, three instances are theoretically created to respond. This may involve cold starts, and it may also raise the question of how to share state between requests. However, some cloud vendors (such as Alibaba Cloud Function Compute) let a single instance handle multiple concurrent requests. This capability allows users to set an Instance Concurrency for a function, that is, how many requests a single function instance can handle simultaneously.

As shown in the figure below, suppose three requests need to be processed at the same time. When the instance concurrency is set to 1, Function Compute needs to create three instances, each processing one request. When the instance concurrency is set to 10 (that is, one instance can process 10 requests at the same time), Function Compute only needs to create one instance to process all three requests.

Single Instance Multiple Concurrency Effect Diagram
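The instance counts in this example follow from a simple ceiling division. The sketch below is only the theoretical minimum; actual scheduling on a given platform may create more instances.

```python
import math


def instances_needed(concurrent_requests, instance_concurrency):
    """Theoretical minimum number of instances for simultaneous requests."""
    if instance_concurrency < 1:
        raise ValueError("instance_concurrency must be at least 1")
    return math.ceil(concurrent_requests / instance_concurrency)


print(instances_needed(3, 1))   # concurrency 1: one instance per request
print(instances_needed(3, 10))  # concurrency 10: one instance serves all three
```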

The advantages of single-instance multiple concurrency are as follows:

• Reduced execution time and lower costs. For example, I/O-bound functions can process requests concurrently within one instance, which reduces the number of instances and thus the total execution time.

• State can be shared between requests. Multiple requests can share a database connection pool within one instance, reducing the number of connections to the database.

• Lower cold start probability. Because multiple requests can be processed within one instance, new instances are created less often, so cold starts become less likely.

• Reduced VPC IP usage. Under the same load, handling multiple concurrent requests per instance reduces the total number of instances, and therefore the number of VPC IPs used.

Single-instance multiple concurrency applies to a fairly broad range of scenarios. It is a good fit, for example, when a function spends most of its time waiting for responses from downstream services. It is not suitable for every scenario, however: it should not be used when a function has shared state that cannot be accessed concurrently, or when a single request consumes a large amount of CPU and memory.
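Why I/O-bound functions benefit can be illustrated with a small `asyncio` sketch. It simulates three requests that each wait 0.1s on a downstream service inside one process; the wait time and the `handle_request` function are illustrative assumptions, not platform code.

```python
import asyncio
import time


async def handle_request(request_id, wait_s=0.1):
    # Simulates waiting on a downstream service (I/O-bound work).
    await asyncio.sleep(wait_s)
    return f"request {request_id} done"


async def serve_concurrently(n_requests):
    start = time.perf_counter()
    # All requests wait at the same time inside the same "instance".
    results = await asyncio.gather(
        *(handle_request(i) for i in range(n_requests))
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(serve_concurrently(3))
print(results, f"elapsed: {elapsed:.2f}s")
```

The three waits overlap, so the elapsed time is close to a single wait rather than the sum of all three. A CPU-bound handler would not overlap in this way, which is why such workloads are a poor fit.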
