By Xu Xiaobin, a Senior Middleware Expert at Alibaba and author of Maven in Practice. He is now responsible for the serverless R&D and O&M platform in the Alibaba Group. He previously maintained the Maven central repository.
In an article titled "The Sound and the Fury of Serverless", I used that title as a metaphor for the state of serverless in the industry. Serverless is not straightforward: everyone talks about it, nobody really knows how to do it, and everyone thinks everyone else is doing it, so everyone claims they are doing it.
Although it's been half a year since I wrote that article, I don't think the situation has changed much. Many developers and managers have a one-sided or even erroneous understanding of serverless technology. If new technologies are launched without sufficient knowledge of application architecture evolution, cloud infrastructure capabilities, and risks, their business value cannot be realized, efforts are wasted, and technical risks are introduced.
In this article, I will attempt to analyze the charm and core concepts of serverless systems from the perspective of application architecture and summarize the challenges that must be faced to implement serverless systems.
To help you better understand serverless systems, let's look back at the evolution of application architectures. More than a decade ago, the mainstream architecture was a monolithic application deployed on a single server with a database. Under this architecture, O&M personnel carefully maintained the server to ensure service availability. As a business grows, this simplest architecture soon runs into two problems. First, only one server is available. If it fails due to hardware damage or a similar problem, the whole service becomes unavailable. Second, as business volume grows, a single server's resources soon cannot handle all the traffic. The most direct way to solve both problems is to add a Server Load Balancer (SLB) at the traffic entry and deploy the same application on multiple servers. This removes the server as a single point of failure (SPOF) and allows the monolithic application to scale out.
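As a minimal sketch of the load-balancing idea above, the following Python snippet rotates requests across several copies of one application and skips any server marked as failed. The class name `RoundRobinBalancer` and the server names are hypothetical and chosen only for illustration. A real SLB also performs its own health checks, which this sketch leaves to the caller.

```python
from itertools import count


class RoundRobinBalancer:
    """Rotates requests across servers, skipping ones marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._counter = count()

    def mark_down(self, server):
        # Called when a health check (not modeled here) reports a failure.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # Try each slot at most once, continuing from the previous position.
        for _ in range(len(self.servers)):
            candidate = self.servers[next(self._counter) % len(self.servers)]
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_round = [balancer.pick() for _ in range(3)]

# Simulate a hardware failure on one server: traffic keeps flowing to the rest.
balancer.mark_down("app-2")
after_failure = [balancer.pick() for _ in range(4)]
```

With one server down, requests are still served by the remaining two, which is the property that removes the single point of failure.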
As the business continues to grow, more R&D personnel must be hired to develop features for the monolithic application. At this point, the code in the monolith has no clear physical boundaries, and different parts of the code soon conflict with each other. Resolving this requires manual coordination and a large number of merge-conflict fixes, which sharply decreases R&D efficiency. In this case, the monolithic application is split into microservices that are independently developed, tested, and deployed. Services communicate with each other through APIs over protocols such as HTTP, gRPC, and Dubbo. Microservices split along the Bounded Contexts of domain-driven design greatly improve the R&D efficiency of medium and large teams. To learn more about Bounded Context, consult books about domain-driven design.
In the evolution from monolithic applications to microservices, a distributed architecture becomes the default from the physical perspective, and architects have to meet the new challenges it brings. Typically, distributed services, frameworks, and distributed tracing systems are adopted first, for example, the cache service Redis, the configuration service Application Configuration Management (ACM), the state coordination service ZooKeeper, the message service Kafka, and communication frameworks such as gRPC or Dubbo. In addition to the challenges of distributed environments, the microservice model gives rise to new O&M challenges. Previously, developers only needed to maintain one application, but now they need to maintain ten or more. The workload involved in security patch upgrades, capacity assessments, and troubleshooting has therefore multiplied. As a result, application distribution standards, lifecycle standards, observation standards, and auto scaling are increasingly important.
Now let's talk about the term "cloud-native." Put simply, whether an architecture is cloud-native depends on whether it evolved in the cloud. "Evolving in the cloud" is not simply about using services at the infrastructure as a service (IaaS) layer of the cloud, such as Elastic Compute Service (ECS), Object Storage Service (OSS), and other basic computing and storage services. Rather, it means using distributed services, such as Redis and Kafka, in the cloud. These services directly affect the business architecture. As mentioned earlier, distributed services are necessary for a microservice architecture. Originally, we developed such services ourselves or maintained them based on open-source versions. In the cloud-native era, businesses use the cloud services directly.
Two other technologies need to be mentioned: Docker and Kubernetes. Docker defines the application distribution standard. Applications, whether written with Spring Boot or Node.js, are all distributed as images. Building on Docker, Kubernetes defines a unified standard for the application lifecycle, covering startup, launch, health checks, and deprecation. With distribution and lifecycle standards in place, the cloud provides standard web app services, including application version management, release, post-release observation, and self-recovery. For example, for stateless applications, the failure of an underlying physical node does not affect R&D at all, because the web app service automatically moves the application containers from the faulty physical node to a new one based on the application lifecycle. Cloud-native provides even greater advantages.
On this basis, the web app service collects runtime data from applications, such as business traffic concurrency, CPU load, and memory usage. Businesses can then configure auto scaling rules based on these metrics, and the platform executes the rules to increase or decrease the number of containers as traffic changes. This is the most basic implementation of auto scaling. It helps you avoid idle resources during off-peak hours, reduce costs, and improve O&M efficiency.
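A metric-based scaling rule of this kind can be sketched in a few lines. The snippet below uses proportional scaling, the same shape as the formula documented for the Kubernetes Horizontal Pod Autoscaler, and clamps the result to a configured range. The function name, the default bounds, and the sample numbers are assumptions made for illustration, not any platform's actual rule engine.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Return the container count that brings the per-container metric to target.

    Proportional rule: ceil(current_replicas * current_metric / target_metric),
    clamped to [min_replicas, max_replicas].
    """
    if current_replicas <= 0 or target_metric <= 0:
        raise ValueError("current_replicas and target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Four containers averaging 90% CPU with a 60% target: scale out.
scale_out = desired_replicas(4, current_metric=90, target_metric=60)

# The same four containers averaging 20% CPU during off-peak hours: scale in.
scale_in = desired_replicas(4, current_metric=20, target_metric=60)

# A traffic spike that would need 20 containers is capped by max_replicas.
capped = desired_replicas(4, current_metric=300, target_metric=60)
```

The clamp matters in practice: the lower bound keeps a baseline of capacity available, and the upper bound limits cost if a metric misbehaves.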
As the architecture evolves, R&D and O&M personnel gradually shift their focus from physical machines and want the machines to be managed by the platform system without human intervention. This is a simple understanding of serverless.
As we all know, serverless does not actually mean the disappearance of servers. More precisely, serverless means that developers do not need to care about servers. This is just like modern programming languages such as Java and Python, in which developers do not need to allocate and release memory manually. The memory still exists; it is simply managed by a garbage collector. Calling a platform that manages servers for you a "serverless" platform is like calling Java and Python "memoryless" languages.
In today's cloud era, a narrow understanding of serverless as simply not caring about servers is not enough. In addition to the basic computing, network, and storage resources contained in the servers, cloud resources also include various types of higher-level resources, such as databases, caches, and messages.
In February 2019, the University of California, Berkeley published a paper titled Cloud Programming Simplified: A Berkeley View on Serverless Computing. This paper provides a very clear and vivid metaphor:
In the context of the cloud, serverful computing is like programming in a low-level assembly language, while serverless computing is like programming in a high-level language such as Python. Take the simple expression c = a + b as an example. If you describe it in an assembly language, you must first select several registers, load the values into registers, perform mathematical calculations, and then store the results. This is like today's serverful computing in the cloud environment. Developers first need to allocate or find available resources, then load code and data, perform calculations, store the calculation results, and finally release the resources.
The paper argues that serverful computing is still the most common way of using the cloud today, but it should not be how we use the cloud in the future. In my opinion, the serverless vision should be stated as "Write locally, compile to the cloud". This means that code only cares about the business logic, and resources are managed by tools and the cloud. Now that we have a general but abstract knowledge of serverless, let's look at the features of serverless platforms.
Managing one or two servers may not be a hassle. However, managing thousands or even tens of thousands of servers is not that simple. Any server may fail, so the serverless platform must be able to automatically detect the fault, remove the faulty instance, and apply operating system security patches without affecting businesses. In addition, it must be integrated with the logging and monitoring systems by default, and automatically configure security policies for the system to avoid risks. When resources are insufficient, the serverless platform must be able to allocate resources and run relevant code and configurations automatically.
Today's internet applications are designed to be scalable. When business traffic has obvious peaks and valleys, or when a business requires extra resources for a short-term event (such as a marketing activity), the serverless platform must implement auto scaling in a timely, stable manner. To do so, the platform needs powerful resource scheduling capabilities and a keen perception of application metrics such as load and concurrency.
In serverful mode, cloud resources are billed by the amount of resources reserved for a business instead of the resources actually used. For example, assume a user buys three ECS instances on the cloud. No matter how much of the CPU and memory of the three instances is actually used, the user has to pay the total cost of the three ECS instances. In serverless mode, the user is charged for the resources actually used. For example, a request might actually use a 1-core 2-GB resource for 100 ms. In this case, the user only pays the unit price of this specification multiplied by the duration of use. Similarly, users of a serverless database only pay for the resources actually consumed by queries and the resources used for data storage.
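The difference between the two billing models is simple arithmetic, sketched below. All prices and the request volume are hypothetical placeholders rather than actual cloud prices. Only the three reserved instances and the 100 ms request on a 1-core 2-GB specification come from the example above.

```python
def reserved_cost(instances, price_per_instance_hour, hours):
    """Serverful billing: pay for reserved instances regardless of utilization."""
    return instances * price_per_instance_hour * hours


def pay_per_use_cost(requests, seconds_per_request, price_per_second):
    """Serverless billing: pay only for the time each request actually runs."""
    return requests * seconds_per_request * price_per_second


HOURS_PER_MONTH = 730            # average month length in hours
INSTANCE_PRICE_PER_HOUR = 0.05   # hypothetical price of one reserved instance
SPEC_PRICE_PER_SECOND = 0.00003  # hypothetical price of a 1-core 2-GB slot
REQUEST_SECONDS = 0.1            # each request runs for 100 ms

# Three instances reserved for a whole month.
monthly_reserved = reserved_cost(3, INSTANCE_PRICE_PER_HOUR, HOURS_PER_MONTH)

# Five million requests in the same month, billed per use.
monthly_on_demand = pay_per_use_cost(5_000_000, REQUEST_SECONDS,
                                     SPEC_PRICE_PER_SECOND)

# Request volume at which both models cost the same under these prices.
break_even_requests = monthly_reserved / (REQUEST_SECONDS * SPEC_PRICE_PER_SECOND)
```

Under these assumed prices, pay-per-use is cheaper until the monthly request count reaches the break-even point. Beyond that, the comparison depends on how well the reserved instances are utilized.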
In a serverless architecture, code often uses backend services to separate data and state management from the code. In a function as a service (FaaS) architecture, the code runtime is also managed by the platform. This means, for the same application, much less code is required in serverless mode than in serverful mode. This accelerates both distribution and startup. The serverless platform also provides mature features such as code building, release, and version switching to accelerate delivery.
Despite the many advantages of serverless platforms, it is not easy to implement serverless platforms on a large scale in mainstream scenarios. Below are the key challenges of serverless platform implementation.
To enable auto scaling and pay-as-you-go, the platform must be able to scale out business instances in seconds or even milliseconds. This is a challenge to the infrastructure and imposes high requirements on businesses, especially large business applications. If it takes 10 minutes to distribute and start an application, auto scaling cannot promptly respond to changes in business traffic. We can solve this problem in many ways, for example, by dividing a large application into microservices. However, FaaS architectures split applications into finer-grained functions by using a brand new application architecture. This makes applications more lightweight but requires extensive business transformations. The modules introduced in Java 9 and the native image technology of GraalVM can help reduce the size of Java applications and shorten the startup time.
Once the instances of a serverless application or function can be scaled out in seconds, or even in milliseconds, the related infrastructure soon faces great pressure. The most commonly affected infrastructure includes service discovery and log monitoring systems. Originally, the instances in a cluster changed several times per hour, but now they change several times per second. If these systems cannot keep up with the speed at which instances change, the user experience will suffer significantly. For example, the container instances of a business may be scaled out in 2 seconds, but the change may take 10 seconds to be synchronized to the service discovery system, so the new instances cannot receive traffic until well after they are ready.
The serverless platform relies on standardized application lifecycles to implement features such as automatic container removal and application self-recovery. In the standard container- and Kubernetes-based system, the platform only controls the container lifecycle. Therefore, the business process lifecycle must be consistent with the container lifecycle, including the startup, stop, readiness probe, and liveness probe specifications. Although many businesses have been transformed into container-based businesses, the containers include both the main business process and many auxiliary processes. This leads to lifecycle inconsistencies between business processes and containers.
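To make the lifecycle contract concrete, here is a minimal sketch of an application whose reported state tracks the container lifecycle: the readiness probe passes only after startup completes, and a stop signal drains in-flight requests before the process exits. The class, the state names, and the use of HTTP-style status codes are assumptions for illustration. Real probes are configured on the platform side and usually call an HTTP endpoint or command exposed by the application.

```python
from enum import Enum


class State(Enum):
    STARTING = "starting"
    READY = "ready"
    DRAINING = "draining"
    STOPPED = "stopped"


class AppLifecycle:
    """Models the states a business process reports to the platform."""

    def __init__(self):
        self.state = State.STARTING
        self.in_flight = 0

    def finish_startup(self):
        if self.state is State.STARTING:
            self.state = State.READY

    def liveness(self):
        # Alive as long as the process has not stopped.
        return 200 if self.state is not State.STOPPED else 503

    def readiness(self):
        # Accept traffic only when fully started and not shutting down.
        return 200 if self.state is State.READY else 503

    def begin_request(self):
        if self.state is not State.READY:
            raise RuntimeError("not accepting traffic")
        self.in_flight += 1

    def end_request(self):
        self.in_flight -= 1
        self._maybe_stop()

    def on_stop_signal(self):
        # Stop taking new traffic, but let in-flight requests finish.
        if self.state is not State.STOPPED:
            self.state = State.DRAINING
        self._maybe_stop()

    def _maybe_stop(self):
        if self.state is State.DRAINING and self.in_flight == 0:
            self.state = State.STOPPED


app = AppLifecycle()
probes_at_start = (app.liveness(), app.readiness())
app.finish_startup()
probes_when_ready = (app.liveness(), app.readiness())
app.begin_request()
app.on_stop_signal()
probes_while_draining = (app.liveness(), app.readiness())
app.end_request()
final_state = app.state
```

When auxiliary processes in the same container follow a different lifecycle, the platform's view (the container) and the application's real state diverge, which is the inconsistency described above.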
To troubleshoot a fault in the production environment in serverful mode, users naturally log on to the server to run Linux commands, search for logs, analyze processes, or even dump the memory. In serverless mode, users do not need to worry about servers and cannot see the server by default. But what if the system has a fault and the platform cannot automatically recover from the fault? In this case, users need a variety of troubleshooting tools to observe traffic, system metrics, and dependent services in order to quickly and accurately diagnose faults. If the observability in serverless mode is insufficient, users will not feel secure.
When deploying their own applications for the first time, almost all developers deploy them on a single server or at a single IP address. This is a hard habit to break. Today, many applications are still stateful, and their instances cannot be replaced automatically. In addition, many change and deployment operations are tied to IP addresses. For example, beta testing is run on a specific machine, and many release systems do not replace instances during rolling updates. The related O&M systems are then built around these assumptions. As a serverless platform is gradually adopted, developers have to change their thinking and adapt to the fact that an IP address may change at any time. As a result, developers need to operate and maintain systems in terms of service versions and traffic.
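One way to picture operating by version and traffic instead of by machine is a weighted split between service versions, sketched below. Each request is mapped to a version by hashing a request identifier, so the beta share is a percentage of traffic and not a particular IP address. The function name, the version labels, and the 90/10 split are hypothetical. Real platforms typically expose this as routing configuration, not application code.

```python
import hashlib


def pick_version(request_id, weights):
    """Deterministically map a request to a service version by integer weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Hash the request ID so the same request always lands on the same version.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for version, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return version
    raise AssertionError("unreachable: bucket is always below total")


# Send roughly 10% of traffic to the new version, wherever its instances run.
weights = {"v1": 90, "v2": 10}
routed = [pick_version(f"request-{i}", weights) for i in range(10_000)]
v2_share = routed.count("v2") / len(routed)
```

Because the split is expressed in weights, instances of either version can be replaced or rescheduled at any time without changing the rollout plan.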
Let's return to the wonderful metaphor in the paper "Cloud Programming Simplified: A Berkeley View on Serverless Computing": Today, we use the cloud just as we used to write code in assembly languages. In my opinion, this situation will improve in the future. Ideally, the packages delivered by users to the platform for deployment should contain only the code that describes their businesses. Although this is not yet the case, many technologies, such as Service Mesh, Dapr.io, and cloudstate.io, separate the logic that is irrelevant to the business but necessary for a distributed architecture from the business runtime and entrust its management to the platform. This trend has become more apparent over the past year. Bilgin Ibryam gave a good summary of this situation in his article Multi-Runtime Microservices Architecture.
As discussed in this article, the development of serverless infrastructure imposes new requirements on application architecture, continuous delivery, service governance, O&M, and monitoring. It also places higher responsiveness requirements on lower-level infrastructure such as computing, storage, and networks. Therefore, serverless represents a comprehensive technological evolution that involves the application, platform, and infrastructure levels.