LifseaOS: a customizable, container-dedicated OS for the cloud | OpenAnolis Technology - Alibaba Cloud Developer Community

Introduction

At the Apsara (Yunqi) Conference in October 2021, the cloud-native OS Lifsea was officially released and integrated into the managed node pools of Alibaba Cloud Container Service ACK Pro as an optional operating system.

Not long ago, the core code of LifseaOS was officially open-sourced in the OpenAnolis (dragon lizard) community. Users can build and customize their own container-specific OS based on the LifseaOS open source code.

Why LifseaOS?

When it comes to LifseaOS, we have to mention its main scenario: containers.

From the earliest UNIX chroot to Linux LXC, early container runtime technology based on cgroups and namespaces evolved continuously but without a decisive breakthrough. That changed in 2013, when Docker drove the rapid popularization of containers. After a few years of development, containers became a mainstream IT infrastructure technology and are now widely used. Docker played an important role in this rapid development, yet looking back at its initial work, it introduced no disruptive new technology. Its core innovations were mainly two: the container image and the RESTful API.

These two innovations revolutionized development, integration, and deployment. First, images give DevOps a convenient vehicle: developers control the entire runtime environment during development and can deploy their results directly to the production platform without worrying about environmental factors such as operating system compatibility and library dependencies, which realizes Docker's slogan "Build, Ship and Run Any App, Anywhere". Second, the RESTful API makes container lifecycle management more convenient. By managing containers through orchestration tools, SREs can deploy, upgrade, and take applications offline quickly and uniformly, making the qualitative leap from "pets" to "cattle" in application management.

As containers developed, related fields such as container orchestration, container storage, and container networking grew out of them. These fields combine closely to form the "cloud-native" ecosystem, and since 2015 a complete "cloud-native operating system" has gradually formed around Kubernetes (K8s). K8s lets you deploy containers in a distributed cluster quickly and efficiently without paying attention to complex cluster resource allocation and container scheduling. To fully support Kubernetes, cloud vendors have implemented a large number of integrations, providing CNI (Container Network Interface) and CSI (Container Storage Interface) plugins that are compatible with their IaaS-layer infrastructure, along with components such as cluster-autoscaler, so that K8s can manage the vendor's storage, network, and compute resources.

In the move to cloud-native infrastructure, one infrastructure component has been slow to follow: the operating system (OS). Although it keeps a low profile, the OS is the underlying software that connects hardware with upper-layer services. It quietly provides applications with single-machine resource management and a runtime environment, and it plays an important role. In cloud-native scenarios, however, traditional operating systems have gradually shown several kinds of "maladaptation":

  • Bloat: to be compatible with different scenarios, traditional operating systems include a wide variety of hardware drivers, software packages, system libraries, and system services, so they run many background services and are large. In the cloud-native container scenario, most necessary services have been containerized and are deployed to nodes as containers, where versioning and configuration management are handled through containers, gradually replacing system services on the traditional OS. Meanwhile, cloud hardware is simplified by the cloud vendor's virtualization layer, so the OS does not need to support a wide variety of hardware, and container images are self-contained at runtime. Many traditional OS capabilities therefore become heavy and redundant. They also slow down OS startup and consume considerable system resources (CPU, memory, and so on).
  • Scattered versions: to meet different demands, the operating system provides a wide variety of software and manages versions at the package level. Each package has its own functions, code, and version number, and you can add or remove packages as needed. The OS state on each host is therefore the combination of a large number of package versions, and daily maintenance usually targets individual packages. In cloud-native scenarios, clusters have more and more compute nodes. In production, bug fixes and troubleshooting may lead someone to manage a package on a single node (upgrading it, changing its configuration, and so on). Without a complete cluster-wide OS O&M mechanism, the OS state across the cluster is likely to become inconsistent. If nodes then carry different versions of a dependent component during a phased release, the entire release can be blocked, which causes great difficulty for O&M personnel.
  • Security risks: on the one hand, traditional operating systems contain many software packages and system services that cloud-native scenarios do not need, which enlarges the attack surface. On the other hand, most O&M personnel log on to traditional operating systems through SSH and operate from the command line. This process is hard to trace, and a mistake can easily have disastrous consequences.
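
To make the "scattered versions" problem concrete, here is a minimal sketch that reports packages whose version differs between nodes. The inventory format, node names, and version strings are hypothetical and chosen only for illustration; a real cluster would collect this data from its own tooling.

```python
from collections import defaultdict


def find_version_drift(nodes):
    """Return {package: {version: [node, ...]}} for packages whose
    version is not identical on every node that has the package."""
    versions = defaultdict(lambda: defaultdict(list))
    for node, packages in nodes.items():
        for package, version in packages.items():
            versions[package][version].append(node)
    return {
        package: {v: sorted(ns) for v, ns in by_version.items()}
        for package, by_version in versions.items()
        if len(by_version) > 1  # more than one version in use means drift
    }


# Hypothetical inventory: node-2 received a one-off hotfix of a single package.
inventory = {
    "node-1": {"containerd": "1.4.8", "runc": "1.0.1"},
    "node-2": {"containerd": "1.4.9", "runc": "1.0.1"},
    "node-3": {"containerd": "1.4.8", "runc": "1.0.1"},
}
drift = find_version_drift(inventory)
print(drift)
```

With per-package management, every package is a separate axis along which nodes can diverge, so this kind of check has to run over the whole package set.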

These problems mainly show up in O&M. Looking back, application O&M personnel faced similar problems before Docker appeared: how to keep the application runtime environment consistent across different conditions, and how to manage applications conveniently and quickly. Docker solved these problems at the application layer. Can we apply Docker's ideas to OS O&M?

In fact, the industry already has several container-optimized operating systems, commonly referred to as ContainerOS, including AWS Bottlerocket, Red Hat Fedora CoreOS, and Rancher RancherOS. Most of them share the following characteristics:

  • Lightweight: the operating system contains only the software packages and system services needed to run containers, which greatly reduces the attack surface and enables fast startup.
  • Atomic upgrade and rollback: following the principle of immutable infrastructure, the root file system is read-only so that the system cannot be maliciously tampered with. The operating system is managed as an image and ships no package manager such as YUM. The entire system is upgraded and rolled back image by image. Bottlerocket implements atomic image upgrades with an A/B dual-partition scheme, CoreOS uses rpm-ostree to manage OS versions much like a Git repository, and RancherOS containerizes all system services and manages the operating system image through containers.
  • Cloud-native components integrated by default: components such as Docker, containerd, and Kubernetes are preinstalled, so the operating system works out of the box without additional installation.
  • Controlled O&M channels: the system removes the sshd service and does not allow direct login for O&M. Instead, it provides APIs for host O&M, plus a dedicated O&M container as a last-resort way to log on to the system.
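
The A/B dual-partition idea mentioned in the list can be sketched as a toy model. This is an illustration of the general technique only, not Bottlerocket's actual implementation; the class name, slot labels, and image names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ABSlots:
    """Toy model of a two-slot (A/B) image layout."""
    images: dict = field(default_factory=lambda: {"A": "os-1.0", "B": None})
    active: str = "A"

    def inactive(self):
        return "B" if self.active == "A" else "A"

    def stage(self, image):
        # Write the new image to the inactive slot; the running system is untouched.
        self.images[self.inactive()] = image

    def switch(self):
        # The atomic step: a single pointer flip that takes effect on the next boot.
        target = self.inactive()
        if self.images[target] is None:
            raise RuntimeError("inactive slot is empty; nothing to boot")
        self.active = target

    def running(self):
        return self.images[self.active]


slots = ABSlots()
slots.stage("os-1.1")   # download and write the new image in the background
slots.switch()          # upgrade: boot from the newly written slot
print(slots.running())
slots.switch()          # rollback: flip back to the previous, untouched slot
print(slots.running())
```

Because the old image is never modified during an upgrade, rollback is the same cheap operation as the upgrade itself.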

These features confirm our thinking: images can solve the problem of scattered versions, and APIs can solve the problem of cluster O&M. We further asked: if the O&M API is simple enough, can the OS itself become a resource that K8s manages, so that K8s manages the OS the way it manages containers?

LifseaOS: an operating system created for the cloud

Based on the above considerations, we launched LifseaOS, a cloud-native OS.

LifseaOS follows the rpm-ostree technical approach of CoreOS and selects its software packages from Anolis OS, the operating system released by the OpenAnolis (dragon lizard) community.

LifseaOS uses the rpm-ostree functionality to implement atomic upgrade and rollback of images, allowing users to perform rolling upgrades of OS images at the cluster level and manage the operating system of the entire cluster like cattle. It has also been heavily trimmed and optimized to make the overall OS lighter, faster, and safer.

We also provide a small OS O&M tool (its functions are still being extended) that abstracts and consolidates routine OS O&M. Together with Alibaba Cloud Assistant or the automated O&M orchestration service, users perform O&M on the OS by calling this tool, which reduces open-ended operations on the operating system and allows them to be audited.

The more important role of API-based O&M is to pull OS O&M in the cloud-native direction. A K8s controller can connect to the O&M API and, combined with the OS image versioning described above, let K8s manage a host OS just as it manages containers.
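
As an illustration of consolidating O&M behind a small audited interface, the following sketch exposes a fixed set of operations and records every call. The operation names, return values, and log format are hypothetical and are not the interface of the actual LifseaOS tool.

```python
import time

# Hypothetical allow-list: only these operations are exposed to callers.
ALLOWED_OPERATIONS = {
    "get_os_version": lambda: "lifsea-example-1.0",  # placeholder value
    "restart_containerd": lambda: "containerd restarted (simulated)",
}

audit_log = []


def run_operation(user, name):
    """Run an allow-listed operation and record the attempt either way."""
    allowed = name in ALLOWED_OPERATIONS
    audit_log.append(
        {"time": time.time(), "user": user, "operation": name, "allowed": allowed}
    )
    if not allowed:
        raise PermissionError(f"operation {name!r} is not exposed by the O&M tool")
    return ALLOWED_OPERATIONS[name]()


print(run_operation("alice", "get_os_version"))
```

Unlike an interactive shell session, every request here passes through one choke point, so both accepted and rejected operations leave a trace.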
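
The controller idea can be sketched as a reconcile step that compares each node's OS image version with the desired one and upgrades only a bounded number of nodes per round. This is a plain-Python simulation under stated assumptions: the node records, the `max_unavailable` parameter, and the instant "upgrade" are hypothetical, and no Kubernetes client library is involved.

```python
def reconcile(nodes, desired_version, max_unavailable=1):
    """Pick the nodes to upgrade in this round.

    nodes maps node name -> {"version": str, "ready": bool}.
    Nodes that are already not ready consume the unavailability budget.
    """
    unavailable = sum(1 for n in nodes.values() if not n["ready"])
    budget = max(0, max_unavailable - unavailable)
    outdated = sorted(
        name
        for name, n in nodes.items()
        if n["ready"] and n["version"] != desired_version
    )
    return outdated[:budget]


# Hypothetical three-node cluster, all on the old image.
cluster = {f"node-{i}": {"version": "image-1", "ready": True} for i in range(1, 4)}
rounds = 0
while True:
    batch = reconcile(cluster, "image-2", max_unavailable=1)
    if not batch:
        break
    for name in batch:
        cluster[name]["version"] = "image-2"  # simulated upgrade, completes instantly
    rounds += 1
print(rounds)  # 3
```

Because the whole OS is one versioned image, the desired state is a single string per node, which is what makes this kind of declarative reconciliation practical.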

Of course, LifseaOS offers more than the image versioning and API-based O&M described above. Its name directly states the traits of an OS born for the cloud and for containers: lightweight, fast, secure, and atomic.

Lightweight

By default, LifseaOS integrates containerd and Kubernetes components and retains only the system services and software packages required to run Kubernetes pods. The entire system has only about 200 software packages. Compared with traditional operating systems (Alibaba Cloud Linux 2/3 and CentOS), which ship more than 500, that is a reduction of about 60% ((500 - 200) / 500).

The heavy cloud-init suite (a cloud host metadata management component commonly used by cloud vendors) is replaced with Ignition from CoreOS, with many unneeded functions trimmed away. Only basic disk expansion, hostname configuration, chronyd time synchronization server configuration, and user-data script execution are retained. Unnecessary kernel modules, systemd services (such as systemd-logind and systemd-resolved), and many rarely used utilities bundled with systemd are also removed.

Fast

LifseaOS is positioned as an operating system for virtual machines on the cloud, so it needs few hardware drivers. The necessary kernel driver modules are built into the kernel, initramfs is eliminated, and the udev rules are greatly simplified, which significantly speeds up startup. On an ECS instance of the g7.large specification, for example, the first boot of LifseaOS takes about 2 seconds.

For a traditional operating system, taking Alibaba Cloud Linux 3 as an example, the first boot takes more than 1 minute.

Security

The root file system of LifseaOS is read-only. Only the /etc and /var directories are writable, to meet basic system configuration requirements. This design conforms to the principle of immutable infrastructure in cloud-native scenarios and also prevents escaped containers from tampering with the host file system. Python is not included, but the shell is retained for now, because ACK initializes nodes with a series of shell scripts during cluster deployment. It will be removed later.

In addition, LifseaOS removes the sshd service and prohibits users from logging on to the system directly to perform operations that may not be traceable. For special or emergency O&M needs, LifseaOS provides a dedicated O&M container. The O&M container is started on demand through an API and is disabled by default.
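
The write policy described above (read-only root, writable /etc and /var) can be expressed as a small path check. This is a sketch of the rule as stated in the text, not how LifseaOS enforces it; on a real system the enforcement comes from how the file systems are mounted.

```python
from pathlib import PurePosixPath

# Writable locations as described in the text; everything else is read-only.
WRITABLE_ROOTS = (PurePosixPath("/etc"), PurePosixPath("/var"))


def is_writable(path):
    """Return True if the path falls under one of the writable directories."""
    p = PurePosixPath(path)
    if not p.is_absolute() or ".." in p.parts:
        raise ValueError("expected a normalized absolute path")
    return any(p == root or root in p.parents for root in WRITABLE_ROOTS)


print(is_writable("/etc/hostname"))        # True
print(is_writable("/usr/bin/containerd"))  # False
```

Matching on whole path components matters here: a path such as /etcd must not be treated as being inside /etc.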

Atomic

LifseaOS does not support installing, upgrading, or uninstalling individual RPM packages and does not provide yum. Accordingly, the rpm-ostree package used in Fedora CoreOS is removed and only ostree is retained (the former adds RPM package-based management, while the latter manages only files). Updating and rolling back whole images keeps the software package versions and system configuration consistent across every node in the cluster. Each image goes online only after strict internal testing. Compared with the uncertainty introduced by upgrading a single RPM package in a traditional operating system, image-based tested releases better ensure the stability of the upgraded system.
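
Why a whole-image version implies consistency can be illustrated with a content-addressed identifier: if the ID is derived from every file in the tree, two nodes reporting the same ID hold the same files. The sketch below is loosely in the spirit of content-addressed commits and does not reproduce ostree's actual object format; the file paths and contents are hypothetical.

```python
import hashlib


def image_id(files):
    """Compute a hex SHA-256 ID over a file tree.

    files maps path -> bytes content. Paths are processed in sorted
    order so the ID does not depend on insertion order.
    """
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(hashlib.sha256(files[path]).digest())
    return digest.hexdigest()


# Hypothetical trees: node_b differs from node_a in a single file.
node_a = {"/usr/bin/containerd": b"binary-v1", "/usr/lib/os-release": b"ID=example"}
node_b = {"/usr/bin/containerd": b"binary-v2", "/usr/lib/os-release": b"ID=example"}

print(image_id(node_a) == image_id(dict(node_a)))  # True
print(image_id(node_a) == image_id(node_b))        # False
```

A cluster-wide consistency check then reduces to comparing one identifier per node instead of hundreds of package versions.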

Summary

Finally, you are welcome to join the container OS SIG of the OpenAnolis community and help build a dedicated container operating system for cloud native.

Links

OpenAnolis community SIG:

https://openanolis.cn/sig/container-os

LifseaOS open source code link:

https://gitee.com/anolis/lifsea-config

https://gitee.com/anolis/lifsea-assembler


Join the OpenAnolis community

Join the WeChat group: add the community assistant, OpenAnolis Xiaolong (WeChat ID: openanolis_assis), with the note [dragon lizard], and you will be invited to the group. Join the DingTalk group: scan the DingTalk group QR code. Developers and users are welcome to join the OpenAnolis community to jointly promote its development and create an active and healthy open-source operating system ecosystem!

About the OpenAnolis community

The OpenAnolis (dragon lizard) community is a non-profit open-source community formed by enterprises, public institutions, universities, research institutes, non-profit organizations, and individuals on a voluntary, equal, open-source, and collaborative basis. It was established in September 2020 to build an open-source, neutral, and open upstream Linux distribution community and innovation platform.

The short-term goal is to develop Anolis OS as a CentOS alternative and rebuild a distribution compatible with mainstream Linux vendors. The medium- and long-term goal is to explore and build a future-oriented operating system, establish a unified open-source operating system ecosystem, incubate innovative open-source projects, and help the open-source ecosystem prosper.

Anolis OS 8.4 has been released. It supports the x86_64, ARM64, and LoongArch architectures, is fully adapted to Intel, Phytium, Hygon, Zhaoxin, Kunpeng, and Loongson chips, and provides full-stack support for Chinese national cryptographic (SM) algorithms.

Welcome to download: https://openanolis.cn/download

Join us to build an open-source operating system for the future!

https://openanolis.cn
