By Liu Bo
When it comes to hiring talent, startups and large corporations each have their own advantages. So what do large tech firms have to offer? Well-established Silicon Valley companies such as Facebook and Google are well known for providing free food to their employees. More importantly, though, cutting-edge technology and infrastructure play a bigger role in raising the "happiness index" of employees, especially application developers. These technologies and innovations aim to answer a common question: are there any applications in the world that are easy to develop? Perhaps the emerging cloud-native ecosystem can shed some light on this question.
In the cloud-native ecosystem, a container service consists of images and container engines. As the core artifact of a cloud-native application, a container image packages the complete operating system and application runtime environment. Because this architecture is immutable, application iteration becomes simpler and more frequent.
This article will focus on container images and share the related knowledge, ideas, and practices in this industry.
The container image is often likened to a shipping container: it follows a standard specification and is itself immutable, while its contents differ from one image to another.
For images, the immutable part contains everything required to run an application such as MySQL. Developers can use tools such as the Dockerfile to build container images, sign them, and publish them online. Users who need to run the software can then specify a name, such as _example.com/my-app_, to download, verify, and run the image.
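To illustrate how such a name breaks down, here is a minimal Python sketch that splits an image reference into registry, repository, tag, and digest. It is a simplification under stated assumptions: the first path component is treated as a registry host only if it contains a dot or colon or is `localhost`, the fallback registry `docker.io` is an assumed default, and the sketch neither adds the `library/` prefix that Docker Hub uses for official images nor validates the full reference grammar.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRef:
    registry: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_ref(ref: str, default_registry: str = "docker.io") -> ImageRef:
    """Split an image name into registry, repository, tag, and digest (simplified)."""
    name, digest = ref, None
    if "@" in name:
        name, digest = name.split("@", 1)
    tag = None
    # A ':' after the last '/' introduces a tag; an earlier ':' is a registry port.
    if name.rfind(":") > name.rfind("/"):
        name, tag = name.rsplit(":", 1)
    first, _, rest = name.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = default_registry, name
    if tag is None and digest is None:
        tag = "latest"  # assumed default when neither tag nor digest is given
    return ImageRef(registry, repository, tag, digest)


print(parse_ref("example.com/my-app"))
# ImageRef(registry='example.com', repository='my-app', tag='latest', digest=None)
```

Pinning by digest (`name@sha256:...`) rather than by tag is what lets the "verify" step compare downloaded content against a fixed hash.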
Before the OCI Image Format project, there were two widely used image format specifications: AppC and Docker 2.2. Over time, however, the two specifications gradually converged. The OCI therefore launched the OCI Image Format Spec (based on Docker 2.2) to standardize what a compliant image looks like. With it, developers can package and sign a container once and run it on any compliant container engine.
This specification defines an OCI image:
This specification defines an OCI Image, consisting of a manifest, an image index (optional), a set of filesystem layers, and a configuration.
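A minimal sketch of how these pieces reference each other: the manifest points at the configuration and at each layer through a descriptor holding the blob's media type, SHA-256 digest, and size. The media type strings are those defined by the OCI Image Spec; the config and layer bytes below are placeholders rather than a real image configuration and real gzip-compressed tar layers, and the optional image index is omitted.

```python
import hashlib
import json


def descriptor(media_type: str, blob: bytes) -> dict:
    """Build an OCI content descriptor: media type, sha256 digest, and size."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def build_manifest(config: bytes, layers: list[bytes]) -> dict:
    """Assemble an image manifest referencing one config blob and ordered layer blobs."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": descriptor("application/vnd.oci.image.config.v1+json", config),
        "layers": [
            descriptor("application/vnd.oci.image.layer.v1.tar+gzip", layer)
            for layer in layers  # base layer first
        ],
    }


# Placeholder content; a real config also lists rootfs diff_ids, history, etc.
config = json.dumps({"architecture": "amd64", "os": "linux"}).encode()
manifest = build_manifest(config, [b"layer-1 bytes", b"layer-2 bytes"])
print(json.dumps(manifest, indent=2))
```

Because every reference is a digest, changing any byte of a layer changes its descriptor, and therefore the manifest's own digest as well.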
A typical container workflow starts with the developers creating a container image (BUILD in the figure), uploading it to the image storage center (SHIP in the figure), and finally, deploying it in a cluster (RUN in the figure).
The design of the container image is brilliant. First, it embodies the excellent idea that a complete operating system can be a package, which frees everyone from thinking in terms of installation packages. It also introduced killer features that improve the developer experience, such as the Dockerfile, and uses a layered structure to save time and space.
However, an excellent design does not necessarily equal excellent practice. The following section describes the specific problems.
Container startup is generally slow when users start a large container image, because container preparation requires three steps. The following is an example using OverlayFS:
In these steps, the entire image must be downloaded, which means data files cannot be loaded on demand. In addition, image download is limited by network bandwidth. When a container image reaches gigabytes in size, the download takes a relatively long time and undermines the otherwise excellent container user experience.
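As a rough sketch, assume the common OverlayFS flow in which each layer is downloaded in full, unpacked into its own directory, and only then stacked into a root filesystem. The hypothetical helper below (its name and arguments are illustrative) builds the `lowerdir`/`upperdir`/`workdir` option string and refuses to proceed until every layer listed in the manifest is present, which is why a multi-gigabyte image delays startup.

```python
def overlay_mount_options(layer_digests: list[str], unpacked: dict[str, str],
                          upper: str, work: str) -> str:
    """Build OverlayFS mount options for a container root filesystem.

    layer_digests: layer digests in manifest order (base layer first).
    unpacked: hypothetical map from layer digest to the directory it was
              extracted into; a layer appears here only after it has been
              fully downloaded and decompressed.
    """
    missing = [d for d in layer_digests if d not in unpacked]
    if missing:
        # Nothing can be mounted until every layer is local and unpacked.
        raise RuntimeError(f"layers not downloaded/unpacked yet: {missing}")
    # OverlayFS lists lower directories from the topmost layer down to the base.
    lower = ":".join(unpacked[d] for d in reversed(layer_digests))
    return f"lowerdir={lower},upperdir={upper},workdir={work}"


print(overlay_mount_options(
    ["sha256:base", "sha256:app"],
    {"sha256:base": "/var/lib/layers/base", "sha256:app": "/var/lib/layers/app"},
    upper="/run/c1/upper",
    work="/run/c1/work",
))
# lowerdir=/var/lib/layers/app:/var/lib/layers/base,upperdir=/run/c1/upper,workdir=/run/c1/work
```

The resulting string would be passed to a command such as `mount -t overlay overlay -o <options> <rootfs>`. The manifest order is reversed because OverlayFS treats the leftmost `lowerdir` entry as the top layer.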
The smallest unit that can be shared among different images is the image layer. One drawback of this is inefficient deduplication, for the following reasons:
Therefore, when containers using different images are scheduled onto the same machine, the local filesystem space consumed by those images is a cost that cannot be ignored.
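A small Python sketch makes the cost concrete. It compares how many bytes must be stored when the whole layer is the sharing unit versus when fixed-size chunks are. The tiny 4-byte chunk size and the two sample layers are illustrative assumptions, not values from any real system: two layers that differ in a single byte share nothing at layer granularity, while most of their chunks are still shared.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, deliberately tiny so the effect is visible


def layer_level_bytes(layers: list[bytes]) -> int:
    """Storage needed when a whole layer is the smallest shareable unit."""
    unique = {hashlib.sha256(layer).digest(): len(layer) for layer in layers}
    return sum(unique.values())


def chunk_level_bytes(layers: list[bytes], chunk_size: int = CHUNK_SIZE) -> int:
    """Storage needed when fixed-size chunks are the smallest shareable unit."""
    unique: dict[bytes, int] = {}
    for layer in layers:
        for i in range(0, len(layer), chunk_size):
            chunk = layer[i:i + chunk_size]
            unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return sum(unique.values())


# Two layers that differ only in their final byte, e.g. one patched file.
layer_a = b"AAAABBBBCCCCDDDD"
layer_b = b"AAAABBBBCCCCDDDE"
print(layer_level_bytes([layer_a, layer_b]))  # 32: nothing is shared
print(chunk_level_bytes([layer_a, layer_b]))  # 20: three chunks are stored once for both
```

Fixed-size chunking is used only for brevity. An insertion near the start of a file shifts every later chunk boundary, which is why content-defined chunking is a common alternative.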
The provider refers to the image center of the container service.
The causes are listed below:
Although the image center supports garbage collection for deduplication, the layer is still its unit. Therefore, deduplication happens only between layers with the same hash value.
As the software supply chain evolves, attacks against it are also multiplying. Security is a very important part of the software supply chain, covering both the software itself and the chain that delivers it. Because container images come with the application runtime environment preinstalled, image security, including vulnerability scanning and image signing, has become a must for container service providers.
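Image signing builds on content addressing: a signature typically covers the manifest digest, and every blob the manifest references is identified by its own digest. The minimal sketch below shows only the integrity half, checking that downloaded bytes match the `sha256:` digest recorded in a descriptor. Verifying who signed the manifest requires a key and a signing tool and is not shown; the function name is illustrative.

```python
import hashlib


def verify_blob(blob: bytes, expected_digest: str) -> bool:
    """Return True if `blob` hashes to `expected_digest` (e.g. "sha256:<hex>")."""
    algorithm, sep, expected_hex = expected_digest.partition(":")
    if not sep or algorithm != "sha256":
        # This sketch only handles SHA-256, the algorithm OCI images commonly use.
        raise ValueError(f"unsupported digest: {expected_digest!r}")
    return hashlib.sha256(blob).hexdigest() == expected_hex


layer = b"example layer bytes"
descriptor_digest = "sha256:" + hashlib.sha256(layer).hexdigest()
print(verify_blob(layer, descriptor_digest))         # True
print(verify_blob(layer + b"!", descriptor_digest))  # False: content was altered
```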
To address the problems above, companies large and small across the industry have rallied to the task. Here are a few typical projects:
FUSE is used to load the required data remotely and on-demand.
By designing an image benchmark, this work establishes several interesting theoretical findings:
Slacker improves container startup speed by 5 to 20 times through on-demand loading and a reduction in the number of image layers.
Oracle uses Linux SquashFS instead of tar.gz to store the contents of container image layers, eliminating the tar unpacking step.
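To make the FUSE-based on-demand loading idea above concrete without a real FUSE mount, the sketch below models a file whose content is fetched chunk by chunk only when a read touches it, and is cached afterwards. `LazyFile` and its `fetch_chunk` callback are hypothetical; the callback stands in for a remote range request, and a real implementation would sit behind a FUSE read handler.

```python
from typing import Callable, Dict


class LazyFile:
    """Read-only file whose bytes are fetched chunk by chunk on first access.

    fetch_chunk(index) is a hypothetical callback standing in for a remote
    range request (for example, an HTTP Range GET against a registry blob).
    """

    def __init__(self, size: int, chunk_size: int, fetch_chunk: Callable[[int], bytes]):
        self.size = size
        self.chunk_size = chunk_size
        self.fetch_chunk = fetch_chunk
        self.cache: Dict[int, bytes] = {}

    def read(self, offset: int, length: int) -> bytes:
        end = min(offset + length, self.size)
        out = bytearray()
        pos = offset
        while pos < end:
            index = pos // self.chunk_size
            if index not in self.cache:
                self.cache[index] = self.fetch_chunk(index)  # network access happens only here
            chunk = self.cache[index]
            start = pos - index * self.chunk_size
            piece = chunk[start:start + (end - pos)]
            if not piece:
                break  # remote returned less data than expected
            out += piece
            pos += len(piece)
        return bytes(out)


remote = bytes(range(256)) * 4   # stand-in for a 1 KiB file stored remotely
fetched = []                     # records which chunks were "downloaded"


def fetch(index: int) -> bytes:
    fetched.append(index)
    return remote[index * 64:(index + 1) * 64]


f = LazyFile(len(remote), 64, fetch)
first = f.read(100, 10)          # touches only chunk 1 (bytes 64..127)
```

Only one of the sixteen chunks is transferred for that first read; a container that touches a small fraction of its image pays for only that fraction.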
Complaints about images began to grow in 2019 and continued for more than a year. Starting in June 2020, the OCI community spent more than a month discussing the defects of the current OCI image format specification and the requirements for an OCIv2 image format (*).
(*): OCIv2 is used here only as a convenient label. It refers to an improvement of the current OCI image format specification, not a completely new one.
The discussion identified two main defects:
The tar format does not define an order for archive entries. As a result, building the same image on different machines can produce different images, because the filesystems in use may list files in different orders. For example, if foo is added to the archive before bar on filesystem A, but bar is added before foo on filesystem B, the two resulting images differ.
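The ordering problem is easy to reproduce with Python's standard `tarfile` module. In this sketch, two dictionaries stand in for the directory listings of filesystems A and B. Sorting entry names before archiving is shown as one simple way (an assumption here, not a remedy prescribed by the text) to make the archive, and therefore its digest, independent of the source filesystem.

```python
import hashlib
import io
import tarfile


def build_tar(files: dict[str, bytes], sort_entries: bool) -> bytes:
    """Pack files (name -> content) into an in-memory tar archive.

    The iteration order of `files` stands in for the order in which a
    filesystem happens to return directory entries.
    """
    names = sorted(files) if sort_entries else list(files)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = 0  # held constant so that only entry order varies
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


fs_a = {"foo": b"foo\n", "bar": b"bar\n"}  # filesystem A lists foo first
fs_b = {"bar": b"bar\n", "foo": b"foo\n"}  # filesystem B lists bar first

print(digest(build_tar(fs_a, False)) == digest(build_tar(fs_b, False)))  # False
print(digest(build_tar(fs_a, True)) == digest(build_tar(fs_b, True)))    # True
```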
Once a tar archive is compressed with gzip, it no longer supports seeking. Therefore, tar.gz image layers must be fully downloaded and decompressed before the container can run, and on-demand loading of data files is not possible.
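With a single gzip stream, reading a byte near the end requires inflating everything before it. The sketch below contrasts that with one possible seekable layout: each fixed-size chunk is compressed as a separate gzip member, and an index records where each member starts. The 1 KiB chunk size and the index layout are hypothetical. Because concatenated gzip members still form a valid gzip file, a standard decompressor can read the whole blob, while a reader holding the index can inflate only the members it needs.

```python
import gzip

CHUNK = 1024  # hypothetical uncompressed chunk size


def compress_chunked(data: bytes, chunk: int = CHUNK):
    """Compress each chunk as its own gzip member and index where members start."""
    blob = bytearray()
    index = []  # (uncompressed_offset, compressed_offset, compressed_length)
    for off in range(0, len(data), chunk):
        member = gzip.compress(data[off:off + chunk])
        index.append((off, len(blob), len(member)))
        blob += member
    return bytes(blob), index


def read_at(blob: bytes, index, offset: int, length: int, chunk: int = CHUNK) -> bytes:
    """Random-access read that decompresses only the members covering the range."""
    out = bytearray()
    pos, end = offset, offset + length
    while pos < end:
        i = pos // chunk
        if i >= len(index):
            break  # past the end of the data
        u_off, c_off, c_len = index[i]
        plain = gzip.decompress(blob[c_off:c_off + c_len])  # inflate just this member
        if pos >= u_off + len(plain):
            break  # offset falls beyond the last, shorter chunk
        out += plain[pos - u_off:end - u_off]
        pos = u_off + len(plain)
    return bytes(out)


data = bytes(i % 251 for i in range(10_000))
blob, index = compress_chunked(data)
print(len(index))                                          # 10
print(read_at(blob, index, 1000, 100) == data[1000:1100])  # True, spans two chunks
print(gzip.decompress(blob) == data)                       # True, still plain gzip
```

Smaller chunks make random access cheaper but compress less well, so the chunk size is a trade-off.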
The Layer as the Basic Image Unit
This round of discussion began with a single e-mail and a shared document, and it led to a number of online OCI community meetings. The outcome was encouraging. The shared document describes the requirements for the OCIv2 image format in detail, classified as follows:
|Efficiency|Ease of Use|Security|
|---|---|---|
|Less duplicate data|Changeable image format|Image content list|
|Unambiguous, less filesystem metadata (*)|Filesystem format that can be mounted|Verifiable and/or repairable|
|On-demand loading of image data| |Works on untrusted storage|
|Less uploaded data| | |
(*): Metadata that is only meaningful on a specific machine, such as file timestamps, does not need to be stored in the image.
These requirements make it clear that container images should focus on ease of use, efficiency, and security, optimizing the build, ship, and run phases together.
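The footnote's point can be checked directly: the same file packed with two different modification times yields two different layer digests, while clearing machine-specific fields makes the results identical. The sketch uses Python's `tarfile`; which fields to clear (here `mtime`, `uid`/`gid`, and owner names) is an illustrative assumption, not a list taken from the OCIv2 requirements.

```python
import hashlib
import io
import tarfile


def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Clear metadata that only describes the build machine (illustrative field choice)."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def layer_digest(name: str, content: bytes, mtime: int, strip_metadata: bool) -> str:
    """Pack one file into a tar archive and return the archive's SHA-256 digest."""
    info = tarfile.TarInfo(name)
    info.size = len(content)
    info.mtime = mtime  # e.g. whenever the file happened to be written on this host
    if strip_metadata:
        info = normalize(info)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.addfile(info, io.BytesIO(content))
    return hashlib.sha256(buf.getvalue()).hexdigest()


same_file = ("app/config.yaml", b"port: 8080\n")
built_a = layer_digest(*same_file, 1_600_000_000, False)
built_b = layer_digest(*same_file, 1_700_000_000, False)
print(built_a == built_b)  # False: only the timestamp differs

clean_a = layer_digest(*same_file, 1_600_000_000, True)
clean_b = layer_digest(*same_file, 1_700_000_000, True)
print(clean_a == clean_b)  # True
```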
Alibaba Cloud has been actively promoting and developing the cloud-native ecosystem. Its infrastructure service, Alibaba Cloud Container Registry (ACR), serves as the first stop on users' path to cloud-native containerization. ACR provides management and distribution services for container images and OCI Artifacts such as Helm Charts. Meanwhile, Alibaba Cloud is deepening its understanding of container image formats based on the current state of its container business, with the aim of continually distilling image format requirements that meet development needs. These requirements are summarized below. The new image format needs to:
Unlike the community discussion, Alibaba Cloud focuses on designing a complete, optimized end-to-end image solution that customers can put into production.
After these requirements were clarified during technical development, a new image format named Rafs was designed for Alibaba Cloud sandbox containers. At the same time, the container image service Nydus was introduced into the Dragonfly project under CNCF to reduce image download time. In addition, by providing end-to-end consistency validation of image data, it lets users manage container applications securely and quickly.
Rafs divides a container image into two layers: metadata and data.
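As a hypothetical illustration of a metadata/data split (not the actual Rafs on-disk format), the sketch below keeps a small metadata table that maps each path to its size and an ordered list of chunk digests, plus a separate content-addressed data store. A reader could load the metadata up front and fetch chunks only when a file is read; identical chunks are stored once. The class name and the 8-byte chunk size are assumptions for the example.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 8  # hypothetical; real formats use much larger chunks


@dataclass
class SplitImage:
    # Metadata part: small; describes the file tree and where each file's data lives.
    metadata: dict = field(default_factory=dict)  # path -> {"size": int, "chunks": [digest, ...]}
    # Data part: content-addressed chunks, shared by every file that contains them.
    data: dict = field(default_factory=dict)      # digest -> bytes

    def add_file(self, path: str, content: bytes) -> None:
        digests = []
        for i in range(0, len(content), CHUNK_SIZE):
            chunk = content[i:i + CHUNK_SIZE]
            d = hashlib.sha256(chunk).hexdigest()
            self.data.setdefault(d, chunk)  # store each distinct chunk only once
            digests.append(d)
        self.metadata[path] = {"size": len(content), "chunks": digests}

    def read_file(self, path: str) -> bytes:
        entry = self.metadata[path]
        return b"".join(self.data[d] for d in entry["chunks"])


img = SplitImage()
img.add_file("/etc/hostname", b"sandbox\n")
img.add_file("/app/a.txt", b"shared-chunk-ABCD")
img.add_file("/app/b.txt", b"shared-chunk-EFGH")
print(len(img.data))  # 6: the "shared-c" chunk is stored once for both files
```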
In addition to the Rafs image format, Nydus includes a FUSE user-space filesystem process responsible for parsing container images.
Nydus serves requests over the FUSE or virtiofs protocol so that it can support both traditional runc containers and Alibaba Cloud sandbox containers. Container registries, Object Storage Service (OSS), Network Attached Storage (NAS), and Dragonfly supernodes and peer nodes can all serve as Nydus image data sources. Nydus also maintains a local cache so that it does not have to pull data from a remote source on every startup.
Based on this architecture, Nydus provides optimizations in four areas: build, ship, run, and compatibility.
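A minimal sketch of the local-cache idea, under assumptions: data is addressed by SHA-256 chunk digests, the `Backend.get_chunk` interface is hypothetical and merely stands in for any of the data sources listed above, and the cache is a plain directory of files named by digest. Verifying each chunk before caching it also illustrates end-to-end validation. This is not Nydus's actual code or configuration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Protocol


class Backend(Protocol):
    """Hypothetical interface for a remote data source (registry, OSS, NAS, P2P peer)."""
    def get_chunk(self, digest: str) -> bytes: ...


class CachedReader:
    """Serve chunks from a local cache directory, falling back to a remote backend."""

    def __init__(self, backend: Backend, cache_dir: Path):
        self.backend = backend
        self.cache_dir = cache_dir

    def get_chunk(self, digest: str) -> bytes:
        cached = self.cache_dir / digest
        if cached.exists():
            return cached.read_bytes()         # hit: no network traffic
        data = self.backend.get_chunk(digest)  # miss: pull from the remote source
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"chunk {digest} failed verification")
        tmp = cached.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.replace(cached)                    # rename into place so readers never see partial data
        return data


class InMemoryBackend:
    """Stand-in remote that counts how often it is contacted."""

    def __init__(self, chunks: dict[str, bytes]):
        self.chunks = chunks
        self.calls = 0

    def get_chunk(self, digest: str) -> bytes:
        self.calls += 1
        return self.chunks[digest]


payload = b"chunk of image data"
key = hashlib.sha256(payload).hexdigest()
remote = InMemoryBackend({key: payload})
cache_dir = Path(tempfile.mkdtemp())

CachedReader(remote, cache_dir).get_chunk(key)  # first start: fetched remotely
CachedReader(remote, cache_dir).get_chunk(key)  # later start: served from the cache
print(remote.calls)  # 1
```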
|Build|Ship|Run|Compatibility|
|---|---|---|---|
|Block-level image data deduplication minimizes storage resources for users.|Image data can be stored in different backends: the image repository, NAS, or object storage such as Simple Storage Service (S3).|Container images are downloaded on demand, so users can start containers without downloading the complete image.|Compatible with the OCI distribution and artifacts standards, with ready-to-use capabilities.|
| |Integrates well with Dragonfly's P2P capabilities.|The image contains only the data that is ultimately available, so there is no need to save or download expired data.| |
| | |End-to-end data consistency validation provides users with better data protection.| |
At the beginning of the design, Nydus chose a file-based design instead of a block-based design. Why?
The main reason is that additional container features were meant to be built on top of image acceleration, and those features rely on file metadata from the image. A block-based design works only with disk LBAs (logical block addresses) and cannot see information from the layer above it, the filesystem.
With file metadata, the following additional functions are realized easily:
Although the layered mechanism of OCI images makes development easier, it has many deficiencies when running in large-scale clusters. To address them, the OCI community is now working to make images faster, more secure, and more economical by making the format aware of image content. On this basis, and with the goal of benefiting customers, Alibaba Cloud has put forward additional requirements for public cloud images, such as stability and prefetching. Furthermore, it has developed image acceleration solutions for Alibaba Cloud sandbox containers that unify and optimize the full build, ship, and run procedure. With a better user experience, users can benefit from the development of cloud-native infrastructure.