Overcoming Industry Obstacles: Alibaba Cloud Releases Seven Technological Breakthroughs for Function Compute

This article discusses seven technological innovations and breakthroughs in Function Compute (FC) that are intended to accelerate the upgrade of modern application architectures.


By Aliware

The essence of Serverless is to let developers focus on business logic by hiding the underlying computing resources. However, the more abstract the upper layers become, the more complex the underlying implementation becomes for cloud vendors. Function Compute (FC) splits services down to the granularity of functions, which inevitably brings new challenges to development, O&M, and delivery. For example, how can functions be debugged jointly across local and cloud environments? How can functions be made observable and debuggable? How can the cold start of GB-level images be optimized? Issues that were not a problem at service granularity have become obstacles to running enterprises' core production workloads on Serverless at scale.

At the Apsara Conference 2021, Ding Yu (Shutong), a researcher at Alibaba and General Manager of the Cloud-Native Application Platform of Alibaba Cloud Intelligence, unveiled seven major technological innovations and breakthroughs of FC. These innovations and breakthroughs are intended to accelerate the upgrades of modern application architectures.

1. Serverless Devs 2.0: First Desktop in the Industry, Supporting End-Cloud Joint Debugging and Multi-Environment Deployment

Serverless Devs 2.0, the Serverless developer platform, was officially released nearly a year after the project was open-sourced. Compared with version 1.0, it improves performance and user experience across the board. The release also includes Serverless Desktop, the first desktop client of its kind in the industry, which offers stronger enterprise-level service capabilities and a design that balances aesthetics and practicality.

Serverless Devs is the industry's first cloud-native full-lifecycle management platform that supports mainstream Serverless services and frameworks, and it aims to give developers an all-in-one application development experience. Serverless Devs 2.0 introduces multi-mode debugging: an end-cloud joint debugging scheme that bridges online and offline environments, a local debugging scheme for debugging directly during development, and online and remote debugging schemes for debugging during cloud-side O&M. The new version also adds multi-environment deployment and supports all-in-one deployment of more than 30 frameworks, including Django, Express, Koa, Egg, Flask, Zblog, and WordPress.

2. First Instance-Level Observability and Debugging in the Industry

An instance is the smallest schedulable unit of function resources, similar to a Pod in the container world. Because Serverless abstracts heterogeneous underlying resources so heavily, the resulting black box is a major obstacle to large-scale adoption. Similar products in the industry neither expose the concept of an instance nor surface metrics such as CPU and memory usage. Yet observability is the eyes of developers: without it, high availability is not possible.

FC now offers instance-level observability. The feature monitors function instances in real time, collects their performance data, and presents it visually, giving developers an end-to-end path for monitoring and troubleshooting function instances. Instance-level metrics cover core indicators such as CPU and memory usage, instance network connections, and the number of requests within an instance, which makes the black box far less opaque. FC also supports debugging by opening up login access to function instances.
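To illustrate why instance-level metrics matter, the following minimal Python sketch compares a function-level average with a per-instance view. The InstanceSample record, its field names, and the 80% threshold are hypothetical and chosen only for illustration; they are not the FC metrics schema or API.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class InstanceSample:
    """One metrics sample for one function instance (hypothetical schema)."""
    instance_id: str
    cpu_percent: float
    memory_mb: float
    connections: int
    requests: int


def function_level_cpu(samples: List[InstanceSample]) -> float:
    """Average CPU across all instances: the coarse, function-level view."""
    return mean(s.cpu_percent for s in samples)


def hot_instances(samples: List[InstanceSample],
                  cpu_threshold: float = 80.0) -> List[str]:
    """IDs of instances whose CPU usage is at or above the threshold."""
    return [s.instance_id for s in samples if s.cpu_percent >= cpu_threshold]


samples = [
    InstanceSample("i-1", cpu_percent=10.0, memory_mb=120.0, connections=3, requests=40),
    InstanceSample("i-2", cpu_percent=12.0, memory_mb=130.0, connections=4, requests=45),
    InstanceSample("i-3", cpu_percent=95.0, memory_mb=480.0, connections=90, requests=900),
]

print(function_level_cpu(samples))  # the average looks healthy
print(hot_instances(samples))       # the per-instance view reveals the outlier
```

In this made-up data set, the function-level average hides one saturated instance, which is the kind of problem a per-instance view is meant to surface.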

3. First Release of Instance Reservation Strategy with Fixed Quantity, Timing, and Auto Scaling of Resource Usage in the Industry

FC cold start time is affected by multiple factors, including code and image size, container startup, language runtime initialization, process initialization, and execution logic. Reducing it requires optimization by both users and cloud service providers. Providers assign an appropriate number of instances to each function and optimize cold start on the platform side. However, some online businesses are highly sensitive to latency, and providers cannot perform deeper business-level optimization on users' behalf, such as streamlining code or dependencies, choosing a programming language, tuning process initialization, or optimizing algorithms.

Similar products in the industry generally reserve a fixed number of instances: the user configures a concurrency value N, and the N allocated instances do not scale unless adjusted manually. This approach only removes cold start latency for some business peaks while significantly increasing O&M and resource costs. It is a poor fit for businesses with irregular peaks and troughs, such as red envelope promotions.

FC is therefore the first to hand part of the instance scheduling control over to users. Users can reserve an appropriate number of function instances through multi-dimensional reservation policies: fixed quantity, scheduled scaling, scaling by resource usage, and hybrid scaling. These policies cover different scenarios, including relatively stable business curves (such as AI/ML), clear peak and trough periods (such as gaming and entertainment, online education, and new retail), unpredictable traffic bursts (such as e-commerce promotions and advertising), and mixed workloads (such as web backends and data processing). This reduces the impact of cold start on latency-sensitive businesses and balances elasticity with performance.
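The four policy types can be pictured as small functions that each return a number of instances to reserve. The Python sketch below is illustrative only: the class and function names, the target-tracking formula used for usage-based scaling, and the choice to combine policies by taking the maximum are all assumptions, not FC's actual configuration model or scheduling algorithm.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass
class ScheduledRule:
    """Reserve `target` instances between `start` (inclusive) and `end` (exclusive)."""
    start: time
    end: time
    target: int


def scheduled_target(rules: List[ScheduledRule], now: time, default: int) -> int:
    """Scheduled scaling: use the first rule whose window contains `now`."""
    for rule in rules:
        if rule.start <= now < rule.end:
            return rule.target
    return default


def usage_target(current: int, observed_pct: int, target_pct: int,
                 lo: int, hi: int) -> int:
    """Scaling by resource usage (assumed target-tracking rule).

    desired = ceil(current * observed / target), clamped to [lo, hi].
    Integer percentages keep the arithmetic exact.
    """
    desired = (current * observed_pct + target_pct - 1) // target_pct
    return max(lo, min(hi, desired))


def hybrid_target(*targets: int) -> int:
    """Hybrid scaling (assumption): honor the most demanding policy."""
    return max(targets)


# Hypothetical configuration: a fixed floor of 2 instances, a business-hours
# rule, and usage-based scaling that aims for 50% utilization.
rules = [ScheduledRule(time(9, 0), time(18, 0), target=20)]
fixed = 2
by_schedule = scheduled_target(rules, time(10, 30), default=fixed)
by_usage = usage_target(current=10, observed_pct=75, target_pct=50, lo=1, hi=50)
print(hybrid_target(fixed, by_schedule, by_usage))
```

Taking the maximum is only one possible way to merge policies; it favors latency over cost, which matches the latency-sensitive scenarios described above.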

4. First to Provide GPU Instances in the Industry

FC provides two types of instances: elastic instances and performance instances. Elastic instances range from 128 MB to 3 GB. Their isolation granularity is the finest in the cloud ecosystem, and they can reach 100% resource utilization in general-purpose scenarios. Performance instances come in 4 GB, 8 GB, 16 GB, and 32 GB sizes. With a higher resource ceiling, they mainly suit compute-intensive scenarios such as audio and video processing, AI modeling, and enterprise-level Java applications. As domain-specific hardware has advanced, GPU manufacturers have introduced dedicated ASICs for video encoding and decoding. NVIDIA, for example, has integrated a dedicated video encoding circuit since the Kepler architecture and a dedicated video decoding circuit since the Fermi architecture.

FC has officially launched GPU instances based on the Turing architecture, allowing Serverless developers to offload video encoding and decoding workloads to GPU hardware. This significantly speeds up video production and transcoding.

5. Deliver up to 20,000 Instances per Minute

Serverless does not mean that applications run without servers. It means that users do not need to care about the status, resources (such as CPU, memory, disk, and network), or number of the underlying servers while their applications run. The cloud service provider dynamically supplies the computing resources that the applications need. Users nevertheless care about how quickly the provider can deliver resources and how well it can handle the access fluctuations that insufficient resources cause under burst traffic.

Backed by Alibaba Cloud's cloud infrastructure, FC draws on both a bare metal resource pool and an ECS resource pool to deliver up to 20,000 instances per minute during peak business hours. This strengthens FC's ability to deliver resources for customers' core businesses.
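As a rough back-of-the-envelope illustration of what this delivery rate means for a traffic burst, the Python sketch below estimates how many instances a burst needs and how long they would take to deliver at the peak rate. Only the figure of 20,000 instances per minute comes from this article. The traffic numbers, the one-request-per-instance default, and the Little's law estimate of in-flight requests are assumptions for illustration.

```python
# Peak delivery rate quoted in the article.
PEAK_DELIVERY_PER_MINUTE = 20_000


def instances_for_burst(requests_per_second: int, avg_latency_ms: int,
                        concurrency_per_instance: int = 1) -> int:
    """Estimate the instances needed to absorb a burst.

    Uses Little's law (in-flight requests = arrival rate * latency) and
    rounds up. Integer arithmetic avoids floating-point rounding surprises.
    """
    in_flight_x1000 = requests_per_second * avg_latency_ms
    denominator = 1000 * concurrency_per_instance
    return (in_flight_x1000 + denominator - 1) // denominator


def seconds_to_deliver(instances: int,
                       rate_per_minute: int = PEAK_DELIVERY_PER_MINUTE) -> float:
    """Time to deliver `instances` at a constant delivery rate."""
    return instances * 60 / rate_per_minute


# Hypothetical burst: 50,000 requests per second at 200 ms average latency.
needed = instances_for_burst(requests_per_second=50_000, avg_latency_ms=200)
print(needed, seconds_to_deliver(needed))
```

The model assumes a constant delivery rate and ignores per-instance cold start time, so it gives a lower bound on ramp-up time rather than a prediction.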

6. Optimization of VPC Network Connection: From 10 Seconds to 200 Milliseconds

When functions need to access resources in a VPC, such as RDS or NAS, the function network must be connected to the VPC. FaaS products in the industry generally do this by dynamically mounting an ENI: they create an ENI in the VPC and mount it to the machine that executes the functions. This approach makes it simple to reach backend cloud services, but mounting an ENI generally takes more than ten seconds, which adds significant overhead in latency-sensitive scenarios.

FC decouples computing from networking by turning the VPC gateway into a service. Scaling compute nodes is no longer limited by ENI mounting capacity. In this design, the gateway service is responsible for ENI mounting and for the high availability and auto scaling of gateway nodes, while FC focuses on scheduling compute nodes. With the VPC connection established this way, function cold start time drops to 200 milliseconds.

7. GB-Level Image Startup: From Minute-Level to Second-Level

FC released container image deployment for functions in August 2020. AWS announced container image support for Lambda at re:Invent 2020 in December, and some cloud vendors in China released container support for FaaS in June 2021. Cold start has always been a pain point for FaaS, and container images, which are dozens of times larger than compressed code packages, increase cold start latency further.

To address this, FC developed Serverless Caching, which co-optimizes software and hardware to build a data-driven, intelligent, and efficient cache system based on the characteristics of different storage services. This improves the Custom Container experience and brings image acceleration to a high level. For a horizontal comparison, we selected four public FC cases (see https://github.com/awesome-fc ) and adapted them to several large cloud service providers inside and outside China. We invoked the image-based functions every three hours and repeated the process several times.

Experiments show that FC has reduced the cold start time from minute-level to second-level in the scenario of GB-level image cold start.

8. One Step Ahead, Aiming for the Distant Future

In 2009, Berkeley put forward six predictions on the (then) emerging cloud computing. The predictions included the possibility of pay-as-you-go services and a large increase in physical hardware utilization. Over the past 12 years, all the predictions have come true.

In 2019, Berkeley once again predicted that Serverless computing would become the default computing paradigm in the cloud era and replace the Serverful (traditional cloud) computing model.

Measured against the 12 years it took the cloud computing predictions to play out, Serverless is in the third year since Berkeley's prediction, about a quarter of the way along. Over these three years, Serverless has moved from a vision to Serverless First in practice, backed by large-scale investment from cloud service providers. Enterprise users, in turn, have used the advantages of Serverless to optimize their existing architectures while facing the obstacles to adopting it at scale in core businesses. Today, technological innovations and breakthroughs are helping the industry resolve these common pain points. Doing so takes the courage to move first and the resolve to aim for the distant future.
