Operating System Optimization Based on Cloud Infrastructure
Alibaba Cloud offers the Alibaba Cloud Linux operating system image, a distribution built on the OpenAnolis community's Anolis OS and compatible with the RHEL/CentOS ecosystem, designed to provide a secure, stable, and high-performance runtime environment for cloud applications. It is deeply optimized for Alibaba Cloud's infrastructure to continuously improve system boot speed and runtime performance, and it has been hardened through large-scale use across Alibaba and Alibaba Cloud products, giving it excellent stability. It offers optimizations in the following areas:
Kernel: Alibaba Cloud Linux 2 is customized based on the Linux 4.19 LTS kernel; Alibaba Cloud Linux 3 is based on the Linux 5.10 LTS kernel. The system continually adds features suited to cloud scenarios, improves kernel performance, fixes major defects, and provides kernel boot parameters and system configuration parameters customized for the ECS instance environment.
Boot speed: Startup is significantly optimized for the ECS instance environment; in actual tests, boot time is reduced by about 60% compared with other operating systems.
Runtime performance: The scheduling, memory, and I/O subsystems are optimized, improving performance by approximately 10% to 30% over other operating systems in some open-source benchmarks.
ECS Instance Family Options Optimized for Specific Business Scenarios
ECS provides a variety of choices for different scenario requirements. Its product families include general-purpose computing, heterogeneous computing, and high-performance computing, with enhanced instance types for vertical scenarios such as network-enhanced, storage-enhanced, memory-enhanced, security-enhanced, big data, high-frequency, and heterogeneous computing instances, offering cost-effective options. For specific high-performance scenarios, users can choose according to their business needs. Typical examples include:
Elastic Bare Metal Instance: A computing server product built on Alibaba Cloud's next-generation virtualization technology that combines the elasticity of a virtual machine with the performance and features of a physical machine. Compared with the previous generation, the next-generation virtualization technology retains the elastic experience of general-purpose cloud servers while preserving the performance and characteristics of physical machines, and it fully supports nested virtualization. Through Alibaba Cloud's independently developed Virtualization 2.0 technology, business applications can access the processor and memory of an Elastic Bare Metal Instance directly, without any virtualization overhead. Elastic Bare Metal Instances expose the complete processor features of a physical machine (such as Intel VT-x) and offer physical-machine-level resource isolation, making them particularly suitable for moving traditionally non-virtualized workloads to the cloud.
GPU Cloud Server: GPU cloud servers are computing servers that pair GPUs with CPUs. GPUs have unique advantages in complex mathematical and geometric computation, especially floating-point and parallel workloads, where they can deliver up to hundreds of times the computing capability of CPUs. GPUs feature a large number of arithmetic logic units (ALUs) for large-scale concurrent computation, high-throughput multi-threaded execution, and relatively simple control logic. They are suitable for video transcoding, image rendering, AI training, AI inference, and cloud-based graphics workstations.
Super Computing Cluster (SCC): Built on Elastic Bare Metal Servers, SCC adds high-speed RDMA (Remote Direct Memory Access) interconnects that significantly improve network performance and the speedup ratio of large-scale clusters. SCC therefore provides a high-bandwidth, low-latency network while retaining all the advantages of Elastic Bare Metal Servers. It is mainly used for high-performance computing, artificial intelligence, machine learning, scientific computing, engineering computing, data analysis, and audio and video processing. Nodes within the cluster are interconnected over the RDMA network, meeting the high-parallelism requirements of HPC, AI, and machine learning applications. At the same time, the RoCE (RDMA over Converged Ethernet) network delivers performance approaching that of InfiniBand while supporting a wider range of Ethernet-based applications.
Reasonable Use of Elastic Resources
Cloud computing products provide flexible elasticity capabilities and policies that adapt well to the performance requirements of both irregular and regular business volume fluctuations. Elastic resources primarily include the following types:
Elastic Scaling: Elastic Scaling Service (ESS, also known as Auto Scaling) automatically adjusts computing capacity (that is, the number of instances) according to business requirements and policies. The instance type can be an ECS instance or an ECI instance. As a widely used cloud capability, Elastic Scaling offers multiple scaling modes, including fixed-quantity, health, timed, custom, and dynamic modes, and provides additional flexibility through features such as lifecycle hooks and cooldown periods.
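As a rough illustration of how a dynamic scaling rule behaves, the following conceptual Python sketch computes a desired instance count from a CPU-utilization metric and honors a cooldown period. The function names, thresholds, and bounds are hypothetical illustrations, not the ESS API.

```python
import math
import time

# Conceptual sketch of a dynamic (target-tracking) scaling rule.
# All names and numbers here are illustrative, not the ESS API.

TARGET_CPU = 50.0        # target average CPU utilization (%)
MIN_INSTANCES = 2        # scaling group lower bound
MAX_INSTANCES = 20       # scaling group upper bound
COOLDOWN_SECONDS = 300   # ignore new scaling activities during cooldown

_last_scaling_time = 0.0

def desired_instances(current_instances: int, current_cpu: float) -> int:
    """Target tracking: scale so average CPU moves toward TARGET_CPU."""
    desired = math.ceil(current_instances * current_cpu / TARGET_CPU)
    return max(MIN_INSTANCES, min(MAX_INSTANCES, desired))

def maybe_scale(current_instances: int, current_cpu: float) -> int:
    """Apply the rule unless we are still inside the cooldown window."""
    global _last_scaling_time
    now = time.time()
    if now - _last_scaling_time < COOLDOWN_SECONDS:
        return current_instances          # cooldown: keep current capacity
    desired = desired_instances(current_instances, current_cpu)
    if desired != current_instances:
        _last_scaling_time = now          # a scaling activity starts the cooldown
    return desired
```

For example, with 4 instances averaging 80% CPU against a 50% target, the rule requests ceil(4 × 80 / 50) = 7 instances; the cooldown then suppresses further changes while the new capacity takes effect.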
ACK Elastic Scaling for Container Service: Elasticity is a commonly adopted ACK feature, with typical scenarios including elastic online business, large-scale computing and training, GPU or shared-GPU deep learning training and inference, and periodic load changes. Elastic scaling is divided into two dimensions:
Scheduling-layer elasticity, which adjusts the scheduling capacity occupied by workloads. Components include HPA, VPA, CronHPA, and Elastic-Workload. HPA, for example, is a typical scheduling-layer component: it adjusts the number of application replicas, which changes the scheduling capacity the workload occupies and thereby achieves scheduling-layer scaling (see the sketch after this list).
Resource-layer elasticity, which provisions additional ECS or ECI resources to supplement scheduling capacity when the cluster's planned capacity cannot meet its scheduling demand. Components include cluster-autoscaler, virtual-node, and virtual-kubelet-autoscaler.
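To make the scheduling-layer mechanism concrete, the sketch below reproduces the replica formula that the Kubernetes HPA controller documents, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue); the surrounding harness and bounds are illustrative only.

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         min_replicas: int = 1,
                         max_replicas: int = 10) -> int:
    """Replica count per the documented HPA algorithm:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
    clamped to the [min_replicas, max_replicas] range."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 5 replicas averaging 90% CPU against a 60% target -> scale out to 8.
print(hpa_desired_replicas(5, 90.0, 60.0))  # 8
```

When the resulting replica count exceeds what the cluster can schedule, resource-layer components such as cluster-autoscaler take over and add nodes.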
In practice, the two are often combined, and high-performance scenarios place higher demands on pod scale-out speed. For scheduling-layer elasticity, the ack-autoscaling-placeholder component can provide a buffer for the cluster's automatic scale-out. For resource-layer elasticity, Alicloud Image Builder can build images automatically and, combined with the custom image feature of ACK cluster node pools, allow nodes to be scaled out quickly.
Function Compute: Function Compute is inherently elastic; instances scale out and in automatically with function invocations, with instances created as requests increase and destroyed as they decrease. Instance creation is triggered automatically by requests, and an instance that processes no requests for a certain period is destroyed automatically. This on-demand model reduces the difficulty of managing application resources, but it also introduces performance issues such as cold-start latency. A cold start covers steps in the invocation chain such as code download, function instance container startup, runtime initialization, and code initialization. Once the cold start completes, the function instance is ready and subsequent requests execute directly. High-performance scenarios often need measures to mitigate cold-start latency, and users and the platform can cooperate on this. Function Compute has already heavily optimized the system side of cold starts; for the user side, optimization is recommended in the following areas:
Slim down the code package: Minimize the code package and remove unnecessary dependencies, for example by running npm prune for Node.js or autoflake for Python. Some third-party libraries also ship test-case source code, unused binaries, and data files; deleting useless files reduces the time spent downloading and decompressing function code.
Choose the right language: Owing to differences in runtime design, the Java runtime's cold-start time is usually longer than that of other languages. For applications sensitive to cold-start latency, lightweight languages such as Python can greatly reduce tail latency while showing no significant difference in warm-start latency.
Choose the right memory size: At a given concurrency level, the more memory a function is allocated, the more CPU resources it receives, and the better its cold-start performance.
Reduce the probability of cold starts:
Preheat the function using a timed trigger.
Use the initializer callback: Function Compute calls the initialization interface asynchronously, which keeps code-initialization time off the request path; cold starts that occur during Function Compute system upgrades or function updates are then invisible to you (a sketch of this pattern follows the list).
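As a sketch of the initializer pattern in the Function Compute Python runtime, the handler below moves one-time, heavyweight setup out of the request path; load_model and MODEL_PATH are hypothetical placeholders for your own initialization logic.

```python
# -*- coding: utf-8 -*-
# Sketch of the initializer pattern in the Function Compute Python runtime.
# load_model() and MODEL_PATH are hypothetical placeholders.

model = None
MODEL_PATH = '/code/model.bin'   # hypothetical path inside the code package

def load_model(path):
    # Placeholder for expensive one-time setup (model load, client init, ...).
    with open(path, 'rb') as f:
        return f.read()

def initializer(context):
    # Called once per instance before it serves its first request,
    # so the expensive setup stays off the request path.
    global model
    model = load_model(MODEL_PATH)

def handler(event, context):
    # Per-request logic can assume the model is already loaded.
    return 'model size: %d bytes' % len(model)
```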
In practice, user-side cold starts are generally difficult to eliminate entirely, for example in deep learning inference, where large model files must be loaded, or when a function must interact with a legacy system through a client that takes a long time to initialize. In such scenarios, if the function is very latency-sensitive, you can configure reserved mode instances for the function, or use reserved mode and on-demand mode instances together in the same function.
You manage the allocation and release of reserved mode instances and are billed for their running time. When the load's resource demand exceeds the capacity of the reserved mode instances, the system automatically uses on-demand mode instances, balancing performance and resource utilization. With reserved mode instances, you can allocate computing resources in advance according to the function's load pattern; while on-demand instances are still scaling out, requests can be served by reserved instances, completely eliminating cold-start delay.
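As a back-of-the-envelope illustration of how reserved and on-demand capacity combine, the sketch below splits incoming concurrency between a fixed reserved pool and on-demand overflow; the numbers and names are illustrative, not a Function Compute API.

```python
def split_load(concurrent_requests: int, reserved_instances: int,
               per_instance_concurrency: int = 1):
    """Route requests to reserved capacity first; overflow goes on-demand."""
    reserved_capacity = reserved_instances * per_instance_concurrency
    served_reserved = min(concurrent_requests, reserved_capacity)
    overflow = concurrent_requests - served_reserved
    return served_reserved, overflow

# 10 reserved instances at concurrency 1 absorb a base load of 8 requests;
# a burst to 25 sends 15 requests to on-demand (cold-start-prone) instances.
print(split_load(8, 10))   # (8, 0)  - no cold starts
print(split_load(25, 10))  # (10, 15)
```

Sizing the reserved pool to the steady-state load keeps routine traffic free of cold starts, while on-demand instances absorb only the bursts.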