Recently, a paper on serverless scheduling written by the Alibaba Cloud Function Compute product team was accepted by the ACM Symposium on Cloud Computing (SoCC).
Last year, the Alibaba Cloud Function Compute team proposed a decentralized fast image distribution technology for FaaS scenarios. That paper was accepted by USENIX ATC'21, a top conference in the field of computer systems that is on the list of Class A international conferences recommended by the China Computer Federation (CCF). This year, the team's paper on a scheduling algorithm based on function profiles was accepted by ACM SoCC, the premier international conference on cloud computing. The algorithm keeps performance stable while improving the resource utilization of functions.
The ACM Symposium on Cloud Computing (SoCC) is an academic conference sponsored by the Association for Computing Machinery (ACM). It focuses on cloud computing technology and is regarded as the premier conference in the field. It brings together researchers, developers, users, and practitioners interested in cloud computing. It is the only conference jointly sponsored by the Special Interest Group on Management of Data (SIGMOD) and the Special Interest Group on Operating Systems (SIGOPS). The conference has flourished in recent years and aims to bring together scholars from the database and computer systems fields to jointly advance the research and development of cloud computing technology.
The accepted paper is titled Owl: Performance-Aware Scheduling for Resource-Efficient Function-as-a-Service Cloud.
The paper grew out of work on Function Compute, the Function-as-a-Service (FaaS) product in Alibaba Cloud's serverless portfolio. Alibaba Cloud Function Compute (FC) is a fully managed, event-driven computing service that allows users to focus on writing and uploading code without managing infrastructure (such as servers). Function Compute prepares computing resources for you, runs your code elastically and reliably, and provides features such as log query, performance monitoring, and alerting. At this stage, it covers business scenarios such as event-driven processing, audio and video processing, games, IoT, new retail, and AI. It serves multiple businesses and projects, such as Alibaba Cloud, Amap, Alipay, Taobao, and CBU.
The preceding figure shows a classic FaaS scheduling system architecture. The scheduler places different function instances onto nodes in the cluster. Because FaaS products run a large number of functions that are fine-grained and short-lived, the resource utilization of nodes is low. Simply scheduling more instances onto the same node can improve resource utilization to a certain extent, but it causes resource contention and performance degradation.
To address this problem, the paper proposes a scheduling algorithm based on function profiles that maintains good performance stability while improving resource utilization.
To evaluate the algorithm, the paper abstracts ten functions from the typical function load of the production environment. They cover different programming languages, resource consumption levels, execution durations, and external dependencies.
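The trade-off described above can be illustrated with a small packing sketch. It is not the scheduler used by Function Compute; the `Node` class, the `first_fit` helper, the `overcommit_ratio` parameter, and the workload numbers are all assumptions made up for illustration. It only shows why admitting instances beyond their nominal reservations raises node utilization.

```python
from dataclasses import dataclass


@dataclass
class Node:
    capacity_mb: int
    reserved_mb: int = 0  # sum of memory reservations placed on this node
    used_mb: int = 0      # sum of memory the instances actually use


def first_fit(instances, node_capacity_mb, overcommit_ratio=1.0):
    """Place (reserved_mb, used_mb) instances on nodes with first-fit.

    A node accepts an instance while its total reservations stay within
    capacity * overcommit_ratio. Returns the list of nodes that were opened.
    """
    nodes = []
    limit = node_capacity_mb * overcommit_ratio
    for reserved, used in instances:
        for node in nodes:
            if node.reserved_mb + reserved <= limit:
                break
        else:
            # No existing node has room, so open a new one.
            node = Node(node_capacity_mb)
            nodes.append(node)
        node.reserved_mb += reserved
        node.used_mb += used
    return nodes


def utilization(nodes):
    """Fraction of total node capacity that is actually in use."""
    return sum(n.used_mb for n in nodes) / sum(n.capacity_mb for n in nodes)


# Hypothetical workload: 64 instances, each reserving 512 MB but using 128 MB.
instances = [(512, 128)] * 64
plain = first_fit(instances, node_capacity_mb=4096)
packed = first_fit(instances, node_capacity_mb=4096, overcommit_ratio=2.0)
print(len(plain), utilization(plain))    # 8 nodes, 25% utilized
print(len(packed), utilization(packed))  # 4 nodes, 50% utilized
```

In this toy workload the overcommitted nodes still have headroom, so nothing contends. With burstier instances the same ratio could push actual usage past node capacity, which is the contention risk the text describes.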
The experimental results show that the Owl scheduling algorithm can save 43.8% of resources at a scale of 100 nodes without a significant increase in function execution latency.
There is no significant increase in scheduling latency.
The function profile capability of Owl has been applied to Function Compute online environments with good results. The paper's acceptance by ACM SoCC marks another innovation for Alibaba Cloud in the field of serverless scheduling.
Title of the Paper:
Owl: Performance-Aware Scheduling for Resource-Efficient Function-as-a-Service Cloud
Authors: Tian Huangshi, Li Suyi, Wang Ao, Wang Wei, Wu Tianlong, Yang Haoran
Abstract: Function-as-a-Service (FaaS) is gaining increasing popularity in cloud computing. All major cloud providers have FaaS platforms. It commences with our observation that memory and CPU are under-utilized in most FaaS sandboxes. A natural solution is to overcommit VM resources when allocating sandboxes, whereas the ensuing contention may cause performance degradation and compromise user experience. To complicate matters, the degradation in FaaS can arise from external factors, such as failed dependencies of user functions.
We design Owl to achieve both high utilization and performance stability. It introduces a customizable rule system for users to specify their toleration of degradation, and overcommits resources with a dual approach. (1) For less-invoked functions, it allocates resources to the sandboxes with a usage-based heuristic, keeps monitoring their performance, and remedies any detected degradation. It differentiates whether a degraded sandbox is affected externally by separating a contention-free environment and migrating the affected sandbox into there as a comparison baseline. (2) For frequently-invoked functions, Owl profiles the interference patterns among collocated sandboxes and places the sandboxes under the guidance of profiles. The collocation profiling is designed to tackle the constraints that profiling has to be conducted in production. Owl further consolidates idle sandboxes to reduce resource waste. We prototype Owl in our production system and implement a representative benchmark suite to evaluate it. The results demonstrate that the prototype could reduce VM cost by 43.80% and effectively mitigate latency degradation, with negligible overhead incurred.
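Two ideas from the abstract, a user-specified tolerance rule and profile-guided placement, can be sketched in a few lines. This is a loose illustration under stated assumptions, not Owl's implementation: `ToleranceRule`, `collocation_key`, `place`, the `safe_profiles` set, and the function names are all hypothetical, and a collocation profile is modeled here simply as a set of function combinations assumed to have been observed without unacceptable degradation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToleranceRule:
    """Hypothetical rule: latency may rise to baseline * max_slowdown."""
    max_slowdown: float

    def is_degraded(self, baseline_ms, observed_ms):
        return observed_ms > baseline_ms * self.max_slowdown


@dataclass
class Node:
    name: str
    sandboxes: Counter = field(default_factory=Counter)  # function -> count


def collocation_key(sandboxes):
    """Canonical, hashable form of a collocation (a multiset of functions)."""
    return tuple(sorted(sandboxes.items()))


def place(function, nodes, safe_profiles):
    """Place one sandbox on the first node whose resulting collocation
    appears in safe_profiles. Returns the node, or None if none qualifies
    (a real scheduler would then fall back, e.g. by adding a node)."""
    for node in nodes:
        candidate = node.sandboxes + Counter({function: 1})
        if collocation_key(candidate) in safe_profiles:
            node.sandboxes = candidate
            return node
    return None


# Assumed profiling results: collocations taken to be free of degradation.
safe_profiles = {
    collocation_key(Counter({"resize": 1})),
    collocation_key(Counter({"resize": 2})),
    collocation_key(Counter({"resize": 2, "thumbnail": 1})),
    collocation_key(Counter({"thumbnail": 1})),
}
nodes = [Node("node-a", Counter({"resize": 2})), Node("node-b")]

chosen = place("thumbnail", nodes, safe_profiles)
print(chosen.name)  # node-a

rule = ToleranceRule(max_slowdown=1.2)
print(rule.is_degraded(baseline_ms=100, observed_ms=130))  # True
```

Keying profiles on the whole collocation, rather than on pairs of functions, keeps the lookup trivial but needs one profile entry per combination. How Owl actually represents and collects its profiles is described in the paper itself.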