Exploration and Practice of Serverless Elasticity in the Cloud-Native Ecosystem

The Advent of the Serverless Era

As the name implies, Serverless is a "serverless" architecture: it shields developers from the operational complexity of servers so they can devote their energy to designing and implementing business logic. Under a Serverless architecture, developers focus only on the upper-layer application logic, while complex server-related operations such as resource provisioning, environment setup, load balancing, and scaling are handled by the platform. The Cloud Native Architecture White Paper summarizes the features of Serverless as follows:

• Fully managed compute services: customers only need to write code to build applications, without attending to the development, operations, security, and high availability of tedious, undifferentiated server-based infrastructure;

• Universality: capable of supporting all important types of applications on the cloud;

• Automatic elastic scaling: users no longer need to plan capacity for resource usage in advance;

• Pay-as-you-go billing: enterprises can effectively reduce costs because they no longer pay for idle resources.

Looking back at the development of Serverless, we can see that from the first introduction of the concept in 2012 to the launch of AWS Lambda, attention to Serverless grew explosively, and the expectations and imagination around it gradually ignited the entire industry. The road to production adoption, however, has not been smooth: there are gaps between the Serverless concept and production practice, and it challenges users' ingrained habits and experience.

Alibaba Cloud firmly believes that Serverless will be the definitive direction of development after cloud native. It has successively launched cloud products such as FC and SAE that bring Serverless technology to different domains and different types of application workloads, and it continues to promote the adoption and development of the Serverless concept as a whole.

In terms of the current overall Serverless market landscape, Alibaba Cloud's Serverless product capabilities rank first in China and among the leaders worldwide. Last year's Forrester Wave evaluation clearly shows Alibaba Cloud on par with AWS in the Serverless field. Alibaba Cloud also has the largest Serverless user base in China: in the 2020 China Cloud Native User Survey Report, 66% of Serverless users were on Alibaba Cloud. Surveys on Serverless adoption further indicate that more and more developers and enterprise users have applied, or plan to apply, Serverless technology to their core businesses.

Exploring Serverless Elasticity

As one of the core capabilities of the cloud, elasticity addresses the contradiction between capacity planning and actual cluster load. If resources are provisioned according to a pre-made plan, the mismatch between the amount of resources prepared and the amount actually needed leads either to waste or to shortage, driving up costs or even harming the business. What we expect from extreme elasticity is that the prepared resources almost exactly match the required resources, so that the application's overall resource utilization is high and cost rises and falls with the business, while availability is never affected by capacity issues. This is the value of elasticity.

In terms of implementation, elasticity can be divided into scalability and fault tolerance. Scalability means that the underlying resources adapt to changes in certain metrics, while fault tolerance means keeping the application's instances healthy through elastic self-healing. The benefit of these capabilities is lower cost combined with higher application availability: on the one hand, resource usage tracks the application's actual consumption; on the other hand, availability at peak load improves, allowing the application to adapt flexibly as the market evolves.

The following describes and analyzes three currently popular elastic scaling modes.

The first is IaaS elastic scaling, represented by the auto scaling services of the major cloud vendors. For example, Alibaba Cloud ESS can trigger ECS scale-out or scale-in operations via CloudMonitor alert rules; it supports dynamically adding and removing SLB backend servers and RDS whitelist entries to ensure availability, and achieves elastic self-healing through health checks. ESS defines the scaling group as the basic unit of elastic scaling: a collection of ECS instances serving the same application scenario, associated with SLB and RDS. It supports multiple kinds of scaling rules, such as simple rules, step rules, target tracking rules, and predictive rules. The user workflow is to create a scaling group and scaling configuration, create scaling rules, and then monitor and review elastic executions.

The second is Kubernetes elastic scaling, which here focuses mainly on horizontal elasticity (HPA), as represented by K8s and its managed cloud offerings such as Alibaba Cloud Container Service. As application-oriented infrastructure and a platform for platform operations, K8s's built-in capabilities revolve around container-level management and orchestration, and its elasticity centers on dynamic horizontal scaling of the underlying Pods. The K8s HPA polls the monitoring data of Pods, compares it with the target value, computes the desired number of replicas with a real-time algorithm, and then scales the workload's replica count up or down. In practice, users create and configure the metric source, the elasticity rules, and the corresponding workload, and can review elastic executions through events.
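The HPA's core decision reduces to one documented formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), with a tolerance band to suppress jitter. A minimal Python sketch (the function name is ours; the 10% default tolerance mirrors the upstream HPA default):

```python
import math

def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, tolerance: float = 0.1) -> int:
    """Replicate the core HPA formula: ceil(current * ratio).

    A tolerance band (default 10%, matching the HPA default) suppresses
    scaling when the observed metric is close enough to the target.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: no change
    return math.ceil(current_replicas * ratio)

# 4 replicas averaging 90% CPU against a 50% target -> scale out to 8
print(hpa_desired_replicas(4, 90.0, 50.0))  # 8
```

Because the result uses a ceiling, scale-out is slightly aggressive while scale-in is conservative, which already biases the loop toward availability.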

Finally, there is application-profile elastic scaling, used mainly inside Internet companies, such as Alibaba's ASI capacity platform. The capacity platform provides capacity prediction and capacity change decision services, guides underlying capacity change components such as AHPA/VPA to perform elastic scaling, and corrects the capacity profile based on the results. Elasticity here is driven primarily by profiles and secondarily by metrics, and scaling risk is reduced through ahead-of-time scale-out and real-time correction. The whole process uses ODPS and machine learning to process data such as instance monitoring and to generate application profiles (baseline profiles, elasticity profiles, big-promotion profiles), and uses the capacity platform to complete operations such as profile injection, change control, and circuit breaking. The user workflow is: onboard the application, generate a capacity profile from historical data and experience, correct the profile with real-time metrics, and monitor and review elastic executions.

Comparing the three, the elastic scaling function of each product is, in the abstract, basically the same: it consists of a trigger source, an elastic decision, and a trigger action. The trigger source generally relies on an external monitoring system to collect and process node and application metrics. The elastic decision is generally based on periodic polling and an algorithmic decision; some are based on analysis and prediction of historical data, or on user-defined scheduled policies. The trigger action scales instances horizontally and provides change records and external notifications. On this foundation, each product competes on scenario richness, efficiency, and stability, and improves the transparency of the elastic system through observability, which helps troubleshooting and guides elasticity tuning while improving user experience and stickiness.
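The shared trigger-source / decision / action shape can be sketched as one control loop. All names below are illustrative, not any vendor's real API:

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass
class ScalingLoop:
    """Minimal sketch of the common autoscaler shape described above:
    a metric source (trigger), a decision step, and an action callback."""
    fetch_metric: Callable[[], float]      # trigger source: per-replica average metric
    target: float                          # desired metric value per replica
    apply_replicas: Callable[[int], None]  # trigger action: scale the workload
    replicas: int = 1

    def reconcile(self) -> None:
        # elastic decision: one polling cycle of the loop
        observed = self.fetch_metric()
        desired = max(1, math.ceil(self.replicas * observed / self.target))
        if desired != self.replicas:
            self.replicas = desired
            self.apply_replicas(desired)  # also the place to emit change records

history = []
loop = ScalingLoop(fetch_metric=lambda: 150.0, target=50.0,
                   apply_replicas=history.append, replicas=2)
loop.reconcile()
print(loop.replicas, history)  # 6 [6]
```

Real systems layer prediction, scheduled policies, and notification channels onto exactly these three seams.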

There are also clear differences between the elastic scaling models. IaaS elastic scaling is a long-established capability: it has matured over a long time, its functionality is powerful and rich, and the offerings of different cloud vendors have become largely homogeneous; its elastic efficiency, however, is limited compared with containers, and it is strongly bound to the underlying IaaS resources. Kubernetes, as an open source product, keeps optimizing its elasticity and best practices through community effort and better matches the demands of most developers and operators; its elastic behavior and APIs are highly abstracted, but their extensibility is limited when it comes to custom requirements. Application-profile elastic scaling carries the characteristics of the company it was built in: it is designed around that company's applications and elasticity demands, focusing on pain points such as resource pool budget and cost optimization, scale-in risk, and complexity, and it is hard to replicate and generalize, especially for external small and medium customers.

From their end goals, it is clear that public clouds and Internet enterprises head in different directions:

• Internet companies, because their internal applications have pronounced traffic patterns, heavy startup dependencies, and slow startup, and because of organizational demands around overall resource pool capacity and utilization, inventory and financial management, and online/offline colocation, tend to rely on capacity profiles for ahead-of-time scale-out, corrected in real time by capacity data computed from metrics. Their goal is to make the capacity profile accurate enough to reach the expected resource utilization.

• Public cloud vendors serve external customers, so they provide more general, broadly applicable capabilities and meet differentiated user needs through extensibility. In the Serverless scenario in particular, the emphasis is on the application's ability to handle sudden traffic. The goal is that, through metric monitoring combined with extreme elasticity, application resources are used nearly on demand and the service remains available throughout, with no capacity planning required.

Implementing Serverless Elasticity

As the best practice of cloud computing, the direction of cloud-native development, and the trend of future evolution, Serverless's core value lies in rapid delivery, intelligent elasticity, and lower cost.

Against this background, SAE emerged. SAE is an application-oriented Serverless PaaS platform that supports mainstream development frameworks such as Spring Cloud and Dubbo. Users can deploy applications to SAE with zero code changes and use it on demand, paying as they go. This fully exploits the advantages of Serverless, saving customers the cost of idle resources while providing a fully managed, operations-free experience: users focus only on core business development, while application lifecycle management, microservice governance, logging, monitoring, and other functions are handled by SAE.

The competitiveness of elasticity lies mainly in scenario richness, efficiency, and stability. Let's start with SAE's optimizations for elastic efficiency.

By collecting statistics on and visually analyzing the full lifecycle of SAE applications (scheduling, init container creation, pulling the user image, creating the user container, and starting the user container and application), and simplifying the time proportions into a schematic, we can see that the lifecycle's time is concentrated in scheduling, pulling the user image, and the application cold start. In the scheduling phase, the main cost is that SAE currently establishes connectivity to the user's VPC; because this step is strongly coupled to scheduling, it takes a long time and suffers from long-tail creation timeouts and failed retries, lengthening the overall scheduling link.

The questions that follow are: can scheduling be made faster, and can the scheduling phase be skipped altogether? Pulling the user image includes the time to download and to decompress it, which matters especially for large images; the optimizations to consider are whether downloading can make better use of caches and whether decompression can be sped up. As for the application cold start, SAE hosts a large number of monolithic and microservice Java applications, which often have heavy startup dependencies, slow configuration loading, and long initialization, so cold starts frequently take minutes. The direction of optimization is to avoid the cold start process as much as possible while keeping users unaware and requiring no application changes.

First, SAE adopted in-place upgrade. Initially, SAE used the K8s-native Deployment rolling upgrade strategy for releases: first create new-version Pods, then destroy old-version Pods. In-place upgrade instead updates only one or more container versions within a Pod, without affecting the Pod object as a whole or its other containers. The principle is to upgrade the container in place through the K8s patch capability, and to keep traffic lossless during the upgrade through the K8s readinessGates capability.
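Mechanically, an in-place image update is a strategic-merge patch that names one container and changes only its image. A sketch of building such a patch body (container and image names are hypothetical):

```python
import json

def in_place_image_patch(container_name: str, new_image: str) -> str:
    """Build the strategic-merge patch body that updates a single
    container's image in an existing Pod, leaving the Pod object and
    its sidecar containers untouched (the mechanism in-place upgrade
    relies on)."""
    patch = {
        "spec": {
            "containers": [
                {"name": container_name, "image": new_image}
            ]
        }
    }
    return json.dumps(patch)

# Could be applied with: kubectl patch pod my-pod -p '<patch body>'
print(in_place_image_patch("app", "registry.example.com/app:v2"))
```

Because only `spec.containers[].image` changes, the kubelet restarts just that container on the same node, with the same IP and volumes.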

In-place upgrade brings SAE a great deal of value, the most important being that it avoids rescheduling and avoids rebuilding the sidecar containers (ARMS, SLS, AHAS), shrinking deployment time from the cost of a full Pod lifecycle to just pulling and creating the business container. And because no rescheduling is needed, the new image can be cached on the node in advance, further improving elastic efficiency. SAE uses the CloneSet provided by Alibaba's open source OpenKruise project as its new application workload; with CloneSet's in-place upgrade capability, overall elastic efficiency improved by 42%.

SAE also adopts image warm-up, which comes in two forms. The first is warm-up before scheduling: SAE caches the common base images on nodes everywhere to avoid repeatedly pulling them from the remote registry. The second, for batch release scenarios, is warm-up during scheduling: with CloneSet's in-place upgrade, the node distribution of instances is known during the upgrade, so while the first batch deploys the new image, the nodes hosting the later batches can prefetch it, allowing scheduling and user-image pulling to proceed in parallel. This technique improved SAE's elastic efficiency by 30%.

The optimization just described covers pulling the image. As for decompression: traditional container startup requires downloading and unpacking the full image data, yet startup may only touch part of the content, which makes container startup slow. Through image acceleration technology, SAE automatically converts standard images into an accelerated format that supports random reads, enabling download-free startup and online decompression of image data and greatly improving application distribution efficiency. In addition, the P2P distribution capability provided by ACR EE can further reduce image distribution time.

For the pain point of slow Java cold starts, SAE, together with Dragonwell 11, provides an enhanced AppCDS startup acceleration strategy. AppCDS (Application Class Data Sharing) captures the class list at application startup and dumps the shared classes into an archive file; when the application starts again, it starts from that archive, effectively reducing cold start time. Mapped onto SAE's deployment scenario, a cache file is generated on shared NAS after the application starts, and the next release starts from that cache file. Overall cold start efficiency improved by 45%.

Beyond optimizing the efficiency of the application lifecycle, SAE has also optimized elastic scaling itself. The elastic scaling process has three parts: obtaining elasticity metrics, making decisions on them, and executing the scaling operation. For metric acquisition, basic monitoring metrics are now collected at second-level granularity; for layer-7 application monitoring metrics, SAE plans to adopt a transparent traffic interception scheme to keep metric acquisition real-time. In the decision stage, the elasticity component runs multiple queues for concurrent Reconcile and monitors queue backlog and latency in real time.

SAE elastic scaling includes a powerful metric matrix, rich policy configuration, a complete notification and alerting mechanism, and comprehensive observability. It supports multiple data sources: the native MetricsServer and MetricsAdapter, Prometheus, the cloud products SLS, CMS, and SLB, and external gateway routes. It supports multiple metric types: CPU, memory, QPS, RT, TCP connection count, bytes in/out, disk usage, Java threads, and custom metrics such as GC count. After metrics are captured and preprocessed, elasticity policies can be configured to fit the application's specific scenario: fast scale-out with fast scale-in, fast scale-out with slow scale-in, scale-out only, scale-in only, DRYRUN, adaptive scaling, and so on.

Finer-grained elasticity parameters can also be configured, such as instance count upper and lower bounds, metric intervals, step sizes and percentage ranges, cool-down and warm-up times, metric collection periods, aggregation logic, and CRON expressions; event-driven capability will be supported in the future. After elasticity triggers, the corresponding scale-out or scale-in is executed, with traffic kept lossless through graceful traffic switching, and users can be informed in real time through comprehensive notification and alerting channels (DingTalk, webhook, phone, email, and SMS). Elastic scaling also provides full observability, clearly showing decision times and decision context, tracing instance status, and monitoring instance SLAs.

SAE's elasticity is also competitive in scenario richness. Here we focus on the four scenarios SAE currently supports:

• Scheduled elasticity: configured when the application's traffic cycle is known, scaling the instance count on a schedule by time of day, day of week, or date. For example, keep 10 instances for daytime traffic from 8:00 a.m. to 8:00 p.m., and keep 2 instances, or even scale to 0, for the low-traffic remainder of the day. It suits applications with periodic resource utilization and is common in securities, healthcare, government, education, and similar industries.

• Metric elasticity: configure the desired monitoring metric rules, and SAE keeps the application's metrics stable within those rules, defaulting to fast scale-out and slow scale-in to ensure stability. For example, set the application's CPU target to 60%, QPS to 1000, and the instance range to 2-50. This suits bursty as well as typically periodic traffic and is common in Internet, gaming, and social platform industries.

• Hybrid elasticity: combine scheduled and metric elasticity by configuring different metric rules for different times, days of week, and dates, flexibly meeting complex scenarios. For example, from 8:00 a.m. to 8:00 p.m., set the CPU target to 60% with an instance range of 10-50; for the rest of the day, reduce the instance range to 2-5. This suits applications with both periodic resource utilization and bursty, typically periodic traffic, common in Internet, education, and catering industries.

• Adaptive elasticity: SAE has optimized for traffic surge scenarios, using a surge window to decide whether the current metric reflects a traffic surge, and adding some redundancy to the computed scale-out size according to the surge's intensity. In surge mode, scale-in is not allowed.
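The adaptive mode above can be illustrated with a small sketch; the window size, surge threshold, and redundancy formula here are our assumptions, not SAE's actual internals:

```python
import math
from collections import deque

class AdaptiveScaler:
    """Illustrative sketch of surge-aware (adaptive) scaling: compare
    the latest per-replica metric against a sliding-window baseline,
    add redundancy proportional to the surge intensity, and forbid
    scale-in while the surge lasts."""

    def __init__(self, target: float, window: int = 5, surge_ratio: float = 2.0):
        self.target = target
        self.samples = deque(maxlen=window)  # the traffic-surge window
        self.surge_ratio = surge_ratio       # assumed surge threshold

    def desired(self, current_replicas: int, metric: float) -> int:
        baseline = sum(self.samples) / len(self.samples) if self.samples else metric
        self.samples.append(metric)
        surging = metric >= self.surge_ratio * baseline
        desired = math.ceil(current_replicas * metric / self.target)
        if surging:
            intensity = metric / baseline
            # assumed redundancy: 10% extra capacity per unit of intensity
            desired = math.ceil(desired * (1 + 0.1 * intensity))
            desired = max(desired, current_replicas)  # surge mode: no scale-in
        return desired

s = AdaptiveScaler(target=50.0)
print(s.desired(2, 50.0))   # steady traffic: stay at 2
print(s.desired(2, 200.0))  # 4x surge: scale out with redundancy
```

The key property is that a surge yields more replicas than the plain proportional formula would, buying headroom while later batches start.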

Stability is a crucial part of building SAE's elastic capability. The focus is ensuring that, during the elastic process, user applications scale out and in as expected and remain available throughout. SAE elastic scaling follows the overall principle of fast scale-out and slow scale-in, guaranteeing execution stability through multiple levels of smoothing and anti-jitter, and using the adaptive capability to scale out ahead of time in metric-surge scenarios. SAE currently supports four levels of smoothing configuration to ensure stability:

• Level 1 smoothing: configure the metric collection period, the time window of a single collection, and the metric aggregation logic

• Level 2 smoothing: configure the tolerance of metric values and interval-based elasticity

• Level 3 smoothing: configure the scaling step size, percentage, and upper/lower bounds per unit time

• Level 4 smoothing: configure the scaling cool-down window and instance warm-up time
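How these levels compose can be sketched in one decision function; the parameter names and defaults are illustrative assumptions, not SAE's configuration keys (level 1, metric aggregation, is assumed to have happened upstream):

```python
import time

def smoothed_decision(current: int, desired: int, *,
                      tolerance: float = 0.1,       # level 2: tolerance band
                      max_step: int = 4,            # level 3: step per cycle
                      min_replicas: int = 1,        # level 3: lower bound
                      max_replicas: int = 50,       # level 3: upper bound
                      last_scale_ts: float = 0.0,   # level 4: last scale time
                      cooldown_s: float = 300.0,    # level 4: cool-down window
                      now=None) -> int:
    """Apply smoothing levels 2-4 to a raw desired-replica value."""
    now = time.time() if now is None else now
    # level 4: still inside the cool-down window -> hold
    if now - last_scale_ts < cooldown_s:
        return current
    # level 2: change is within tolerance of current capacity -> hold
    if current and abs(desired - current) / current <= tolerance:
        return current
    # level 3: clamp the step size, then the absolute bounds
    step = max(-max_step, min(max_step, desired - current))
    return max(min_replicas, min(max_replicas, current + step))

# raw decision says 10 -> 20, but the step limit yields 14 this cycle
print(smoothed_decision(10, 20, last_scale_ts=0.0, now=1000.0))  # 14
```

Each layer can only make the change smaller or later, never larger, which is what makes the composition safe.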

Serverless Elasticity Best Practices

SAE elastic scaling effectively solves the problem of automatically scaling out when an instantaneous traffic peak arrives and scaling back in after it passes; its high reliability, freedom from operations, and low cost keep applications running smoothly. The following best practices are recommended when configuring elasticity.

• Configure health checks and lifecycle management

It is recommended to configure application health checks to ensure overall availability during elastic scaling, so that your application receives traffic only once it has started, is running, and is ready. It is also recommended to configure the PreStop lifecycle hook so that your application goes offline gracefully during scale-in.

• Adopt exponential retry mechanism

To avoid service invocation failures caused by elasticity or application startup lagging behind, or by an application that does not come online or go offline gracefully, it is recommended that callers use an exponential retry mechanism for service invocations.
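A common form of this recommendation is exponential backoff with full jitter on the caller side; a minimal sketch (function name and defaults are ours):

```python
import random
import time

def call_with_backoff(fn, max_attempts: int = 5,
                      base_delay: float = 0.1, cap: float = 10.0):
    """Retry fn() with exponential backoff and full jitter: the
    caller-side pattern for riding out instances that are still
    starting up during a scale-out."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            delay = min(cap, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))  # full jitter

# Example: a call that fails twice while new instances warm up
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("instance still starting")
    return "ok"

print(call_with_backoff(flaky, base_delay=0.01))  # ok
```

Jitter matters here: without it, many callers retry in lockstep and re-create the very spike the scale-out is trying to absorb.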

• Application startup speed optimization

To improve elastic efficiency, it is recommended that you optimize the creation speed of applications by considering the following aspects:

• Software package optimization: optimize the application's startup time, reducing the excessive startup cost of external dependencies such as class loading and caching

• Image optimization: reduce the image size to cut the image pull time during instance creation. The open source tool Dive can analyze image layer information to guide targeted reductions

• Java application startup optimization: with SAE and Dragonwell 11, Java 11 users get application startup acceleration

• Elastic scaling metric configuration

SAE supports combining multiple metrics from basic monitoring and application monitoring. You can choose flexibly based on the current application's characteristics (CPU-sensitive, memory-sensitive, or IO-sensitive).

You can estimate metric target values by reviewing the historical data of the corresponding basic-monitoring and application-monitoring metrics (for example, the peak, P99, and P95 values over the past 6 hours, 12 hours, 1 day, or 7 days). You can also use load testing tools such as PTS to learn how many concurrent requests the application can serve, how much CPU and memory it needs, and how it responds under high load, in order to assess the application's peak capacity.
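Turning a historical metric series into the P95/P99 estimates mentioned above only needs a percentile; a nearest-rank sketch (the sample numbers are illustrative):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: enough to summarize a historical
    metric series into the peak/P99/P95 values used for targets."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

# e.g. hourly CPU% peaks over the past day (illustrative numbers)
history = [30, 35, 40, 42, 45, 48, 50, 55, 70, 90]
print(percentile(history, 50), percentile(history, 95))  # 45 90
```

Sizing against P95 rather than the absolute peak trades a small availability margin for a noticeably smaller steady-state footprint.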

The metric target value is a trade-off between availability and cost; for example:

• Availability-first strategy: configure the metric target at 40%

• Availability/cost balanced strategy: configure the metric target at 50%

• Cost-first strategy: configure the metric target at 70%
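The three strategies translate directly into instance counts once you know the measured peak load and per-instance capacity; a back-of-envelope sketch (the QPS figures are hypothetical):

```python
import math

def max_instances(peak_metric_total: float, per_instance_capacity: float,
                  target_utilization: float) -> int:
    """Instances needed so that, at the measured peak, each instance
    runs at the chosen target utilization (40/50/70% for the three
    strategies above)."""
    return math.ceil(peak_metric_total / (per_instance_capacity * target_utilization))

# Hypothetical workload: 10,000 QPS peak, 500 QPS per instance at full load
for strategy, target in [("availability", 0.4), ("balanced", 0.5), ("cost", 0.7)]:
    print(strategy, max_instances(10_000, 500, target))
```

A lower target utilization buys headroom for surges at the cost of more standing instances, which is exactly the trade-off the three strategies encode.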

Elasticity configuration should also take into account upstream and downstream services, middleware, databases, and other dependencies, with corresponding elasticity rules or throttling and degradation measures configured for them, to ensure the availability of the whole call chain during scale-out.

After configuring the rules, continuously monitor and tune them so that capacity tracks the application's actual load more closely.

• Memory metric configuration

Regarding memory metrics: some runtimes manage memory dynamically (for example, the Java JVM's memory management, or glibc's malloc and free), and the application's idle memory is not returned to the operating system promptly, so the physical memory consumed by an instance does not drop in time. Adding instances then fails to lower average memory consumption, and scaling cannot be triggered correctly. For such applications, memory metrics are therefore not recommended.

• Java application runtime optimization: release physical memory to strengthen the correlation between the memory metric and business load

With the Dragonwell runtime, enabling the ElasticHeap capability via JVM parameters supports dynamic elastic scaling of the Java heap, saving the physical memory actually consumed by the Java process.

• Minimum instance count configuration

It is recommended to set the minimum instance count for elastic scaling to at least 2 and to configure vSwitches across multiple availability zones. This prevents the application from stopping when instances are evicted due to underlying node failures or when an availability zone has no available instances, ensuring overall high availability.

• Maximum instance count configuration

When configuring the maximum instance count for elastic scaling, make sure the availability zone has enough IP addresses; otherwise, new instances cannot be created. You can view the available IPs of the application's vSwitch in the console; if few remain, consider switching to or adding a vSwitch.

• Reaching the maximum instance count

In the application overview, you can see which applications have elastic scaling enabled, promptly spot those whose instance count has hit its peak, and reassess whether their configured maximum is reasonable. If the expected maximum exceeds the product limit (currently 50 instances per application), you can have it raised by submitting a ticket.

• Availability zone rebalancing

After elastic scaling resizes an application, instances may become unevenly distributed across availability zones. You can check each instance's availability zone in the instance list; if the zones are unbalanced, restarting the application rebalances them.

• Automatically restore elastic configuration

When a change order such as an application deployment is executed, SAE stops the application's current elastic scaling configuration to avoid the two operations conflicting. If you want the elastic configuration restored once the change order completes, select the system's automatic restore option during deployment.

• Elastic History

The effect of SAE's elastic behavior can currently be reviewed through events, which show the times and actions of each scale-out and scale-in together with real-time and historical decision records and decision-context visualization, so you can measure the effectiveness of the scaling policy and adjust it when necessary.

• Elastic event notification

Combining multiple notification channels such as DingTalk, webhook, SMS, and phone calls makes it easy to learn of elasticity triggers in time.

Finally, here is a customer case using SAE's elastic scaling. During the COVID-19 pandemic in 2020, an online education customer's business traffic surged 7-8x, putting hardware cost and business stability at serious risk. Had they adopted a traditional ECS architecture, they would have had to upgrade their infrastructure in a very short time, a major challenge in cost and effort. With SAE, they enjoyed the technical dividend of Serverless at zero migration cost. Combining SAE's multi-scenario elasticity policies, adaptive elasticity, and real-time observability guaranteed the applications' SLA during peak periods, while extreme elastic efficiency cut hardware costs by up to 35%.

In summary, in the evolution of elasticity, and in the Serverless scenario especially, the emphasis is on handling sudden traffic. The goal is that metric monitoring combined with extreme elasticity yields near on-demand use of application resources and end-to-end service availability, with no capacity planning required. SAE keeps optimizing its elasticity components and the full application lifecycle toward second-level elasticity, and it is competitive in elastic efficiency, scenario richness, and stability, making it the best choice for moving traditional applications to Serverless.
