How to speed up container startup

Abstract

Serverless computing (Function as a Service, FaaS) is a cloud computing paradigm that lets customers focus solely on their business logic, while virtualization, resource management, and elastic scaling are delegated to the cloud provider. Serverless computing supports the container ecosystem and unlocks a variety of business scenarios. However, because container images are large and complex and FaaS workloads are highly dynamic and unpredictable, many industry-leading products and techniques do not transfer well to FaaS platforms. Efficient container distribution therefore remains a challenge on FaaS platforms.

In this paper we design and propose FaaSNet, a lightweight, highly scalable system middleware that uses an image acceleration format for container distribution. Its target scenario is large-scale container image startup (function cold start) under bursty traffic in FaaS. The core component of FaaSNet is the Function Tree (FT), a decentralized, self-balancing binary tree topology in which all nodes are equivalent.

We integrated FaaSNet into Function Compute (FC). Experimental results show that under highly concurrent requests, FaaSNet starts containers 13.4 times faster than stock FC. Moreover, when bursty requests destabilize end-to-end latency, FaaSNet restores latency to normal levels in 75.2% less time than FC.

Paper Introduction

1. Background and Challenges

FC launched support for custom container images in September 2020 (https://developer.aliyun.com/article/772788), and AWS Lambda followed with container image support in December of the same year, showing that FaaS is embracing the container ecosystem. In February 2021, FC launched its image acceleration feature (https://developer.aliyun.com/article/781992). Together, these two features unlock more FaaS application scenarios: users can seamlessly migrate their containerized business logic to Function Compute and start GB-scale images in seconds.

When the FC backend receives a large burst of requests that triggers many function cold starts, even with image acceleration enabled, enormous pressure is put on the container registry's bandwidth. Many machines pull image data from the same container registry at once, the registry's bandwidth becomes a bottleneck or gets throttled, and the time to pull and download image data grows (even in the accelerated image format). A more direct approach is to increase the bandwidth of FC's backend registry, but that does not solve the fundamental problem and adds extra cost.

1) Workload analysis

We first analyzed online data from the two major regions of FC (Beijing and Shanghai):

Figure (a) shows the latency of image pulls during function cold starts in the FC system: in Beijing and Shanghai, roughly 80% and 90% of image pulls take longer than 10 seconds, respectively;

Figure (b) shows the share of the cold start spent pulling the image: for 80% of functions in the Beijing region and 90% in the Shanghai region, the image pull accounts for more than 60% of the total cold-start time;

The workload analysis shows that the vast majority of a function's cold-start time is spent fetching container image data, so optimizing this step can greatly improve cold-start performance.

According to online operations history, one representative large customer instantly spins up 4,000 function containers. Their image is 1.8GB before decompression and 3-4GB after. The moment this high-traffic burst arrives and containers start pulling the image, the container registry service raises throttling alarms, some requests suffer extended latency, and in severe cases container startup fails outright. These are exactly the problem scenarios we urgently need to solve.

2) State-of-the-art comparison

Several related technologies from academia and industry can accelerate image distribution, for example:

DADI

DADI provides an efficient image acceleration format that supports on-demand reads (FaaSNet also uses an accelerated container format). For image distribution, DADI organizes nodes into tree topologies at the granularity of image layers: each layer corresponds to one tree, so each VM participates in multiple logical trees. DADI's P2P distribution relies on several root nodes with larger specifications (CPU, bandwidth) to fetch data back from the origin and to act as managers of the peers in the topology. DADI's tree structure is also relatively static; since container provisioning usually does not last long, by default DADI's root nodes dissolve the topology after 20 minutes rather than maintaining it continuously.

Dragonfly

Dragonfly is also a P2P-based image and file distribution network. Its components include the Supernode (master node) and dfget (peer node). Similar to DADI, Dragonfly relies on several large-specification Supernodes to support the entire cluster. Dragonfly manages and maintains a fully connected topology through the central Supernode (multiple dfget nodes contribute different pieces of the same file to achieve point-to-point transmission to a target node), so Supernode performance is a potential bottleneck for the throughput of the whole cluster.

Kraken

Kraken's origin and tracker nodes act as central nodes that manage the entire network, with an agent on every peer node. Kraken's trackers only manage the organization of peer connections in the cluster; peers communicate and transfer data among themselves. However, Kraken is also a layer-based container image distribution network, and its networking logic can degenerate into a more complex fully connected mode.

Examining these three industry-leading technologies, we can see several common points:

First, all three use image layers as the distribution unit. This networking granularity is too fine, leaving each peer node with many active data connections at the same time;

Second, all three rely on central nodes to manage the networking logic and coordinate the peers in the cluster. The central nodes of DADI and Dragonfly are also responsible for back-to-origin data fetching. In production, this design requires deploying several high-specification machines to sustain very high traffic, plus parameter tuning to reach the expected performance.

We design under the FC ECS architecture with the above constraints in mind. Each machine in the FC ECS pool has 2 CPU cores, 4GB of memory, and 1Gbps of internal network bandwidth, and these machines have unreliable lifecycles: they may be reclaimed at any time.

This leads to three serious problems:

Limited internal network bandwidth makes bandwidth contention much more likely under full connectivity, degrading data transfer performance. Moreover, a fully connected topology is not function-aware, which creates security risks in FC: each machine executing function logic is untrusted by FC's system components, leaving an opening for tenant A to intercept tenant B's data;

CPU and bandwidth specifications are limited. Because of FC's pay-as-you-go billing model, machine lifecycles in our cluster are unreliable, so we cannot set aside machines from the pool as central nodes to manage the cluster: their cost would become a major burden, their reliability cannot be guaranteed, and a failure of such a machine would take the topology down with it. What FC needs is a technology that inherits the pay-as-you-go property and can form a network instantly, on demand;

The multi-function problem. None of the three systems above has a function-awareness mechanism. In DADI's P2P, for example, a single node holding too many images can become a hotspot and degrade performance. Worse, multi-function pulls are inherently unpredictable: when multiple functions pull at full bandwidth simultaneously, other concurrent downloads from remote services, such as code packages and third-party dependencies, are also affected, causing availability problems for the whole system.

With these issues in mind, we will elaborate on the FaaSNet design scheme in the next section.

2. Design Plan - FaaSNet

As discussed above, none of the three mature industry P2P solutions is function-aware, and their in-cluster topologies are mostly fully connected meshes, which places real demands on machine performance. These prerequisites do not fit FC's ECS-based system. We therefore propose the Function Tree (hereinafter FT), a function-aware, function-level logical tree topology.

1) FaaSNet architecture

The gray parts in the figure are the components we modified for FaaSNet; the white modules carry over FC's existing system architecture. Notably, all Function Trees in FaaSNet are managed by the FC scheduler. On each VM, a VM agent communicates with the scheduler over gRPC to receive upstream and downstream messages, and the agent is also responsible for fetching image data from its upstream node and distributing it downstream.

2) Decentralized function/image-level self-balancing tree topology

To address the three issues above, we first raise the topology to the function/image level, which effectively reduces the number of network connections on each VM. We then design a tree topology based on the AVL tree. Next, we describe our Function Tree design in detail.

Function Tree

Decentralized self-balancing binary tree topology

The design of FT is inspired by the AVL tree algorithm. FT currently has no notion of node weight: all nodes are equivalent, including the root. Whenever a node joins or leaves the tree, the tree keeps itself balanced, ensuring that the height difference between any node's left and right subtrees is at most 1. On insertion or deletion, FT adjusts its own shape (via left/right rotations) to restore balance. In the right-rotation example in the figure below, node 6 is about to be reclaimed, which unbalances the subtree whose parent is node 1; a right rotation restores balance, and state 2 shows the final tree, with node 2 as the new root. Note: every node represents an ECS machine in FC.
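To make the rotation mechanics concrete, below is a minimal Go sketch of an AVL-style self-balancing tree of peers. It is illustrative only: it orders peers by an ID key for simplicity (FT balances by shape and has no key ordering), and names such as Node and Insert are ours, not from the FaaSNet implementation.

    package ft

    // Node is one peer (an ECS VM) in a Function Tree.
    type Node struct {
        ID          string
        Left, Right *Node
        height      int
    }

    func h(n *Node) int {
        if n == nil {
            return 0
        }
        return n.height
    }

    func update(n *Node) *Node {
        n.height = 1 + max(h(n.Left), h(n.Right)) // builtin max (Go >= 1.21)
        return n
    }

    // rotateRight is the operation from the figure: when the right subtree
    // shrinks (e.g. a reclaimed VM) past a height gap of 1, the left child
    // is lifted to become the new subtree root.
    func rotateRight(y *Node) *Node {
        x := y.Left
        y.Left = x.Right
        x.Right = update(y)
        return update(x)
    }

    func rotateLeft(x *Node) *Node { // mirror case
        y := x.Right
        x.Right = y.Left
        y.Left = update(x)
        return update(y)
    }

    // rebalance restores the invariant |h(left) - h(right)| <= 1.
    func rebalance(n *Node) *Node {
        update(n)
        bf := h(n.Left) - h(n.Right)
        if bf > 1 {
            if h(n.Left.Left) < h(n.Left.Right) {
                n.Left = rotateLeft(n.Left) // left-right case
            }
            return rotateRight(n)
        }
        if bf < -1 {
            if h(n.Right.Right) < h(n.Right.Left) {
                n.Right = rotateRight(n.Right) // right-left case
            }
            return rotateLeft(n)
        }
        return n
    }

    // Insert adds a joining VM and returns the (possibly new) root;
    // removal rebalances the same way on the path back to the root.
    func Insert(root *Node, id string) *Node {
        if root == nil {
            return &Node{ID: id, height: 1}
        }
        if id < root.ID {
            root.Left = Insert(root.Left, id)
        } else {
            root.Right = Insert(root.Right, id)
        }
        return rebalance(root)
    }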

All nodes in an FT are equivalent, and their main responsibilities are to: 1. pull data from the upstream node; 2. distribute data to the two downstream child nodes. (Note that FT does not designate a special root node: the only difference between the root and other nodes is that the root's upstream is the source, and the root is not responsible for any metadata management. The next section explains how metadata is managed.) A sketch of this per-node data path follows.
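Below is a minimal Go sketch of that relay loop, assuming a hypothetical HTTP chunk endpoint (/chunk/<n>) between agents; this is our simplification, not FC's actual VM-agent protocol.

    package agent

    import (
        "bytes"
        "fmt"
        "io"
        "net/http"
    )

    // fetchChunk pulls one chunk from the upstream peer. For the root node,
    // "upstream" would be the source (the container registry).
    func fetchChunk(upstream string, n int) ([]byte, error) {
        resp, err := http.Get(fmt.Sprintf("http://%s/chunk/%d", upstream, n))
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        return io.ReadAll(resp.Body)
    }

    // relay streams chunks [0, total) from the upstream node and fans each
    // one out to at most two children (leaf nodes pass an empty slice).
    func relay(upstream string, children []string, total int) error {
        for n := 0; n < total; n++ {
            data, err := fetchChunk(upstream, n)
            if err != nil {
                return err
            }
            for _, child := range children {
                // At most two downstream connections per node, so the
                // per-VM connection count is constant at any tree size.
                url := fmt.Sprintf("http://%s/chunk/%d", child, n)
                resp, err := http.Post(url, "application/octet-stream",
                    bytes.NewReader(data))
                if err != nil {
                    return err
                }
                resp.Body.Close()
            }
        }
        return nil
    }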

Overlap of multiple FTs on multiple peer nodes

A peer node will inevitably host different functions of the same user, so a peer will inevitably belong to multiple FTs. As the figure above shows, the instance participates in three FTs, belonging to func 0-2. Because each FT is managed independently, even with overlapping transfers each node can still find its correct upstream node in each tree.

In addition, we cap the number of functions a single machine can hold, which provides function awareness and further tames the problem of uncontrollable multi-function data pulls, as sketched below.
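A minimal sketch of such a cap; the bookkeeping and the limit value (maxFuncsPerVM) are made up here, as FC's real ones are not public.

    package scheduler

    // vmState tracks which functions' FTs a VM already participates in.
    type vmState struct {
        funcs map[string]struct{} // function IDs hosted on this VM
    }

    // maxFuncsPerVM is a hypothetical per-VM cap, not a published FC value.
    const maxFuncsPerVM = 20

    // canJoin reports whether funcID's tree node may be placed on vm,
    // bounding how many functions can pull images on one VM at once.
    func canJoin(vm *vmState, funcID string) bool {
        if _, ok := vm.funcs[funcID]; ok {
            return true // already in this function's tree
        }
        return len(vm.funcs) < maxFuncsPerVM
    }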

Discussion of design correctness

Looking at the integration with FC: because all nodes in an FT are equivalent, we do not depend on any central node;

The topology manager does not live in the cluster; it is maintained by an FC system component (the scheduler) and shipped to each peer node over gRPC along with the create-container request (see the sketch after this list);

FT adapts naturally to the high dynamism of FaaS workloads: whenever nodes join or leave, at any scale, the tree updates its shape automatically;

Networking at the coarser granularity of the function, with a binary tree as the underlying data structure, greatly reduces the number of network connections on each peer node;

Using the function as the isolation boundary for networking naturally yields function awareness, improving the security and stability of the system.
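As referenced above, here is a sketch of the metadata the scheduler might piggyback on the create-container RPC. The field names are our assumption; FC's real gRPC schema is not public.

    package scheduler

    // CreateContainerRequest sketches the create-container RPC payload with
    // FT placement piggybacked on it (illustrative, not FC's actual schema).
    type CreateContainerRequest struct {
        FunctionID string   // which Function Tree this placement belongs to
        ImageRef   string   // container image to provision
        Upstream   string   // parent VM address; empty means pull from the registry
        Children   []string // at most two downstream VM addresses to serve
    }

Because the placement rides on a request the scheduler already sends, no in-cluster manager or extra control-plane round trip is needed.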

3. Performance evaluation

For the experiments we chose an image from an Alibaba Cloud database (DAS) application scenario, built on a Python base image; the container image is 700MB+ before decompression and has 29 layers. We interpret the stress-test portion here; please see the original paper for the full results. As baselines we compared against Alibaba's DADI, Dragonfly, and Uber's open-source Kraken.

1) Stress testing

The latency reported in the stress test is the average end-to-end cold-start latency perceived by the user. First, image acceleration clearly improves end-to-end latency over traditional FC. However, as concurrency grows, more machines pull data from the central container registry simultaneously, network bandwidth is contended, and end-to-end latency rises (orange and purple bars). In FaaSNet, by contrast, thanks to the decentralized design only the root node pulls data from the origin and distributes it downward, regardless of the concurrency pressure on the origin. FaaSNet therefore scales extremely well, and its average latency does not grow as concurrency pressure increases.

At the end of the stress test, we explored performance when multiple functions with different images are placed on the same VM. Here we compared FaaSNet with FC (DADI+P2P), i.e., FC with image acceleration enabled and DADI P2P installed.

The vertical axis in the figure above is normalized end-to-end latency. As the number of functions with distinct images grows, DADI P2P suffers: it networks per layer, the layer count is large, and each ECS in FC has a small specification, so per-VM bandwidth comes under too much pressure and performance degrades, stretching end-to-end latency beyond 200%. FaaSNet, which establishes connections at the image level, keeps far fewer connections than DADI P2P's per-layer trees and thus maintains good performance.

Summary

High scalability and fast image distribution let FaaS providers better unlock custom-container-image scenarios. FaaSNet uses a lightweight, decentralized, self-balancing Function Tree to avoid the performance bottlenecks of central nodes, introduces no additional system overhead, and fully reuses FC's existing system components and architecture. FaaSNet networks on the fly in response to workload dynamics to achieve function awareness, with no need for upfront workload analysis or preprocessing.

FaaSNet's target scenario is not limited to FaaS. In many cloud-native settings, such as Kubernetes and Alibaba SAE, it can help absorb bursty traffic and relieve the pain of excessive cold starts hurting user experience, fundamentally addressing slow container cold starts.

FaaSNet is the first work by a Chinese cloud vendor published at a top international conference on accelerating container startup under bursty traffic in serverless scenarios. We hope this work opens new opportunities for container-based FaaS platforms, fully opening the door to the container ecosystem and unlocking more application scenarios such as machine learning and big-data analytics.
