Serverless Scheduling Paper Accepted by ACM SoCC

Recently, a paper on serverless scheduling written by Alibaba Cloud's Function Compute product team was accepted by ACM SoCC, an international conference in the field of cloud computing.

Last year, Alibaba Cloud's Function Compute team was the first to propose a decentralized fast image distribution technique for the FaaS scenario. The team's paper was accepted by USENIX ATC '21, a top conference in the field of computer systems that is on the list of Class A international conferences recommended by the China Computer Federation (CCF). This year, the team has continued to make progress in function computing: its paper on a scheduling algorithm based on function profiles has been accepted by ACM SoCC, a premier international conference on cloud computing. The algorithm aims to improve the utilization of function resources while maintaining high performance and stability.

The ACM Symposium on Cloud Computing (hereinafter referred to as SoCC) is an academic conference organized by the Association for Computing Machinery (ACM) that focuses on cloud computing technology, and it is a leading conference in the field. It brings together researchers, developers, users and practitioners who are interested in cloud computing. It is the only conference jointly sponsored by SIGMOD (Special Interest Group on Management of Data) and SIGOPS (Special Interest Group on Operating Systems). The conference has flourished in recent years. It aims to bring together scholars from the database and computer systems communities and to advance cloud computing research and its adoption in industry.

The accepted paper is Owl: Performance-Aware Scheduling for Resource-Efficient Function-as-a-Service Cloud.

This paper is inspired by Function Compute, Alibaba Cloud's serverless Function-as-a-Service product. Function Compute is an event-driven, fully managed computing service. With Function Compute, you do not need to manage infrastructure such as servers; you only write code and upload it. Function Compute prepares computing resources for you, runs your code in an elastic and reliable manner, and provides log query, performance monitoring, alerting and other features. It currently covers business scenarios such as event-driven processing, audio and video processing, games, Internet of Things, new retail and AI, and it serves Alibaba Cloud, Gaode, Alipay, Taobao, CBU and other businesses or projects.

The figure above shows the architecture of a classic FaaS scheduling system. The scheduler places instances of different functions on nodes in the cluster. Because a FaaS product hosts a large number of functions that are fine-grained and short-running, the resource utilization of nodes is low. Simply scheduling more instances onto the same node can improve resource utilization to some extent, but it also causes resource contention and performance degradation.

To address this problem, the paper proposes a scheduling algorithm based on function profiles, which achieves better performance stability while improving resource utilization:

1. For functions that are called frequently, the scheduler profiles how different function instances perform when they are co-located on the same node, and uses the result to guide the scheduling of function instances;

2. For functions that are called infrequently, the scheduler measures the actual resource consumption during execution and uses it to guide the scheduling of function instances. At the same time, the scheduler monitors the execution latency of the function and, when the latency rises, mitigates the degradation by isolating the affected instances;

3. The scheduler also migrates idle instances from nodes with low utilization to nodes with high utilization so that idle nodes can be released.
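
The three rules above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class names, the `safe_pairs` set of co-location results, the CPU-only resource model, the busiest-node-first placement order and the latency tolerance are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    function: str
    cpu: float          # measured CPU use of one instance, in cores (assumed metric)
    idle: bool = False


@dataclass
class Node:
    name: str
    capacity: float     # CPU cores available on the node
    instances: list = field(default_factory=list)

    def used(self) -> float:
        return sum(i.cpu for i in self.instances)

    def utilization(self) -> float:
        return self.used() / self.capacity


def can_place(node, inst, frequent, safe_pairs):
    """Decide whether `inst` may join `node`."""
    # Rules 1 and 2: measured resource use must fit on the node.
    if node.used() + inst.cpu > node.capacity:
        return False
    if inst.function in frequent:
        # Rule 1: a frequently called function may only share a node with
        # frequently called functions it was profiled as safe to run beside.
        return all(
            frozenset((inst.function, other.function)) in safe_pairs
            for other in node.instances
            if other.function in frequent
        )
    return True


def needs_isolation(baseline_ms, observed_ms, tolerance=1.5):
    """Rule 2: flag a function whose latency rose past the assumed tolerance."""
    return observed_ms > baseline_ms * tolerance


def schedule(nodes, inst, frequent, safe_pairs):
    """Place `inst` on the busiest node that accepts it; return None if none does."""
    for node in sorted(nodes, key=Node.utilization, reverse=True):
        if can_place(node, inst, frequent, safe_pairs):
            node.instances.append(inst)
            return node
    return None


def consolidate(nodes, frequent, safe_pairs):
    """Rule 3: move idle instances to busier nodes; return the nodes left empty."""
    for source in sorted(nodes, key=Node.utilization):
        for inst in [i for i in source.instances if i.idle]:
            targets = [
                n for n in nodes
                if n is not source and n.utilization() > source.utilization()
            ]
            for target in sorted(targets, key=Node.utilization, reverse=True):
                if can_place(target, inst, frequent, safe_pairs):
                    source.instances.remove(inst)
                    target.instances.append(inst)
                    break
    return [n for n in nodes if not n.instances]
```

In this sketch, `schedule` would be called for each new instance and `consolidate` periodically. A production scheduler would also have to account for memory, external dependencies and the cost of migration, which are left out here.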

To evaluate the algorithm, the paper derives 10 benchmark functions from typical function workloads in the production environment. They cover different programming languages, resource consumption levels, execution times and external dependencies. The list is as follows:

The experimental results show that, at a scale of 100 nodes, the Owl scheduling algorithm saves 43.8% of resources without a significant increase in function execution latency:

The scheduling latency does not increase significantly either:

Owl's function profiling capability has also been applied in the production environment of Function Compute, where it has achieved good results. The paper's acceptance by ACM SoCC marks another innovation by Alibaba Cloud in the field of serverless scheduling.

Paper information

• Title:

Owl: Performance-Aware Scheduling for Resource-Efficient Function-as-a-Service Cloud

• Authors: Tian Huangshi, Li Suyi, Wang Ao, Wang Wei, Wu Tianlong, Yang Haoran

• Paper overview:

In cloud computing, FaaS is a very popular product form, and the major cloud providers all offer corresponding platforms. As a platform builder, we observed that the CPU and memory utilization of most function instances is low, which leads to low utilization of cluster nodes. A simple remedy is to place more function instances on each node, but this may lead to resource contention and performance degradation. In addition, a function's external dependencies may also degrade its performance. In this paper, we design the Owl scheduling system to solve these problems and achieve both high resource utilization and performance stability. For functions that are called infrequently, the scheduler measures the actual resource consumption during execution and uses it to guide the scheduling of function instances. At the same time, the scheduler monitors the execution latency of the function and, when the latency rises, mitigates the degradation by means of isolation. For functions that are called frequently, the scheduler profiles how different function instances perform when they are co-located on the same node, and uses the result to guide the scheduling of function instances. The scheduler also migrates idle instances from nodes with low utilization to nodes with high utilization so that idle nodes can be released. We implemented a prototype of Owl and built a benchmark suite based on the workloads of the production environment. Experimental results show that the Owl scheduling system can reduce resource consumption by 43.8% and effectively alleviate performance degradation.
