Container Technology: Container Image

By Liu Bo

Preface

When it comes to hiring talent, startups and large corporations each have their advantages, but what are the advantages of large tech firms? Well-established Silicon Valley companies such as Facebook and Google are well known for providing free food for their employees. More importantly, cutting-edge technology and infrastructure do more to improve the "happiness index" of employees, especially application developers. These technologies and innovations aim to answer a common question: is there an easy way to develop applications? Perhaps the emerging cloud-native ecosystem can shed some light on this problem.

In the cloud-native ecosystem, a container service consists of images and container engines. As the core cloud-native application artifact, a container image packages a complete operating system and application runtime environment. Because this architecture is immutable, application iteration becomes simpler and more frequent.

This article will focus on container images and share the related knowledge, ideas, and practices in this industry.

The Concept of Container Image

Container Images

The official analogy compares a container image to a shipping container. Containers come in different specifications, but the container itself is immutable; only the contents differ.

For images, the immutable parts include all the elements required to run an application, such as MySQL. Developers can use tools such as Dockerfile to build container images, sign them, and upload them to the Internet. People who need to run the software can then specify a name, such as _example.com/my-app_, to download, verify, and run these container images.

Open Container Initiative (OCI) Image Format Specification

Before the OCI Image Format project, there were two widely used image format specifications, AppC and Docker 2.2. Over time, the two specifications gradually converged. The OCI organization therefore launched the OCI Image Format Spec (based on Docker 2.2), which defines what a compliant image is. As a result, developers can package and sign a container once and run it on all container engines.

The specification defines an OCI image as follows:

This specification defines an OCI Image, consisting of a manifest, an image index (optional), a set of filesystem layers, and a configuration.
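As a rough illustration of how these parts relate, the sketch below assembles a minimal manifest that points to a configuration and two layers by digest. The blobs are placeholders invented for this example, not real image content; only the field names and media types follow the OCI Image Format Spec.

```python
import hashlib
import json


def descriptor(media_type: str, blob: bytes) -> dict:
    """Build an OCI content descriptor: media type, sha256 digest, and size."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


# Placeholder blobs (assumptions for illustration only). A real image would
# carry a full JSON configuration and tar+gzip layer archives.
config_blob = json.dumps({"architecture": "amd64", "os": "linux"}).encode()
layer_blobs = [b"placeholder bytes for layer 0", b"placeholder bytes for layer 1"]

manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": descriptor("application/vnd.oci.image.config.v1+json", config_blob),
    "layers": [
        descriptor("application/vnd.oci.image.layer.v1.tar+gzip", blob)
        for blob in layer_blobs
    ],
}

print(json.dumps(manifest, indent=2))
```

Because every part is referenced by the digest of its bytes, any change to a layer or to the configuration changes the manifest as well.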

Container Workflow

A typical container workflow starts with the developers creating a container image (BUILD in the figure), uploading it to the image storage center (SHIP in the figure), and finally, deploying it in a cluster (RUN in the figure).

Problems in the Development of Container Image Technology

The design of the container image is brilliant. First, it embodies the excellent idea that a complete operating system is the package, which frees everyone from the constraints of traditional installation packages. It also introduced killer features that improve the developer experience, such as Dockerfile, and it uses a layered structure to save time and space.

However, an excellent design does not necessarily equal excellent practice. The following section describes the specific problems.

Problems for Container Image Users

Problem 1: Slow Container Startup

Slow container startup generally occurs when users start a container from a large image, because preparing the container takes three steps. Taking OverlayFS as an example:

Download the image.
Unpack the image.
Use OverlayFS to combine the writable layer of the container with the read-only layers of the image to provide the container runtime environment.

In the download step, the entire image must be downloaded; data files cannot be loaded on demand. In addition, the download is limited by network bandwidth. When the image size reaches gigabytes, the download takes a relatively long time, which undermines the otherwise excellent user experience of containers.

Problem 2: High Local Storage Costs

The smallest unit that can be shared among different images is the layer. Its disadvantage is inefficient deduplication, for the following reasons:

Duplicate data exists inside a layer.
A large amount of duplicate data may exist between layers; even a small difference makes two layers count as different layers.
Because of how the OCI Image Spec handles deleted files and hard links, files that have been deleted by an upper layer may still exist in a lower layer and remain part of the image.

Therefore, when containers with different images are scheduled onto the same machine, the local storage space occupied by the images becomes a cost that cannot be ignored.
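The third cause can be made concrete with a small simulation. The sketch below models layers as plain dictionaries, which is an assumption made only for illustration, and applies the OCI whiteout convention (a ".wh." prefix marks a deleted path). The deleted file disappears from the container's view but still has to be shipped and stored as part of the lower layer.

```python
WHITEOUT_PREFIX = ".wh."  # OCI layers mark a deleted path with a ".wh.<name>" entry


def merged_view(layers):
    """Apply layers bottom-to-top and return the files visible to the container."""
    visible = {}
    for layer in layers:
        # First pass: whiteouts in this layer hide files from the layers below.
        for path in layer:
            directory, _, name = path.rpartition("/")
            if name.startswith(WHITEOUT_PREFIX):
                hidden = name[len(WHITEOUT_PREFIX):]
                target = f"{directory}/{hidden}" if directory else hidden
                visible.pop(target, None)
        # Second pass: regular files in this layer are added or overwritten.
        for path, data in layer.items():
            name = path.rpartition("/")[2]
            if not name.startswith(WHITEOUT_PREFIX):
                visible[path] = data
    return visible


def stored_bytes(layers):
    """Bytes that must still be downloaded and kept on disk for all layers."""
    return sum(len(data) for layer in layers for data in layer.values())


# Hypothetical layers: the upper layer deletes app/big.bin from the base layer.
base = {"app/big.bin": b"x" * 1_000_000, "app/run.sh": b"#!/bin/sh\n"}
upper = {"app/.wh.big.bin": b""}

view = merged_view([base, upper])
print("visible files:", sorted(view))
print("visible bytes:", sum(len(data) for data in view.values()))
print("stored bytes: ", stored_bytes([base, upper]))
```

In this toy case the container sees only the small script, while the image still carries the full megabyte of the deleted file.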

Problems for Container Image Providers

The provider refers to the image center of the container service.

Problem 1: Huge Storage Waste

There are a large number of similar images. The causes are listed below:

The layer-related disadvantages described above lead to similar images in the container image center.
The OCI image uses the tar and gzip formats to express the layers in the image, but the tar format does not define the order of archive entries. As a result, if a user builds the same image on different machines, different images may be obtained because the filesystems differ. After the user uploads the images, the image center may hold several different images with the same content.

Image deduplication is also inefficient. Although the image center supports garbage collection for deduplication, the layer is still its unit. Therefore, deduplication only happens between layers with the same hash value.
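To see why layer-granularity deduplication wastes space, consider two layers that differ in a single byte. The sketch below compares storing whole layers by digest with storing fixed-size chunks by digest. The 4 KiB chunk size and the synthetic layer contents are assumptions chosen for illustration, not a description of how any particular registry works.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size, chosen only for illustration


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def chunks(data: bytes, size: int = CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


# Synthetic 64 KiB layer made of pseudo-random blocks, plus a copy whose
# final byte is flipped.
layer_a = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(2048))
layer_b = layer_a[:-1] + bytes([layer_a[-1] ^ 0xFF])

# Layer-level deduplication: the digests differ, so both layers are stored in full.
layer_store = {digest(layer): len(layer) for layer in (layer_a, layer_b)}

# Chunk-level deduplication: identical chunks are stored only once.
chunk_store = {}
for layer in (layer_a, layer_b):
    for chunk in chunks(layer):
        chunk_store[digest(chunk)] = len(chunk)

print("layer-level bytes stored:", sum(layer_store.values()))
print("chunk-level bytes stored:", sum(chunk_store.values()))
```

With layers as the unit, the one-byte difference doubles the stored data; with chunks as the unit, only the single changed chunk is stored twice.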

Problem 2: New Requirements from Cloud-Native Software Supply Chains

As the software supply chain develops, attacks against it are also emerging. Security protection is a very important part of the software supply chain, and it covers hardening both the software and the supply chain itself. Container images come with the application runtime environment preinstalled. As a result, the security of container images, including vulnerability scanning and image signing, becomes a must for container service providers.

Thoughts and Discussions on Container Images

Attempts by the Industry

In response to the problems mentioned above, companies large and small across the industry have been working on solutions. Here are a few typical projects:

CernVM-FS

FUSE is used to load the required data remotely and on-demand.

Slacker

By designing an image benchmark, the Slacker project established several interesting findings:

Containers take a long time to start.
The amplification factor of data reading and writing during startup is very high, and only 6% of the data is used during startup.
After analyzing the number of layers in 57 Docker images, it is found that the number of layers in more than half of the images is greater than 9.

Slacker improves the startup speed by 5 to 20 times through on-demand loading and a reduction in the number of image layers.

SquashFS

Oracle uses Linux SquashFS instead of tar.gz to store image layer content, which eliminates the tar unpacking step.

Discussions in OCI Community

Since 2019, complaints about the images have increased gradually and lasted for more than a year. Starting from June 2020, the OCI community took more than a month to discuss the defects of the current OCI image format specification and the requirements for the OCIv2 image format (*).

(*): OCIv2 is just a name for promotion here. OCIv2 is an improvement of the current OCI image format specification, not a completely new one.

Defects of the OCI Image Format Specification

The discussion identified two main defects:

The tar Format

The tar format does not define the order of archive entries. As a result, if a user builds the same image on different machines, different images may be obtained because the filesystems differ. For example, if foo enters the tar archive before bar on filesystem A, and bar enters before foo on filesystem B, the two images are different.
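The foo and bar example can be reproduced with Python's standard tarfile module. This is a minimal sketch: the file contents are made up, and timestamps and ownership are fixed so that entry order is the only thing that varies between the two archives.

```python
import hashlib
import io
import tarfile


def build_tar(files, names_in_order):
    """Write the given files into an uncompressed tar archive, in the order given."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as archive:
        for name in names_in_order:
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = 0  # fixed timestamp so that only the entry order varies
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


def sha256(data):
    return hashlib.sha256(data).hexdigest()


files = {"foo": b"hello\n", "bar": b"world\n"}

tar_a = build_tar(files, ["foo", "bar"])  # order seen on "filesystem A"
tar_b = build_tar(files, ["bar", "foo"])  # order seen on "filesystem B"
print("same digest, different order:", sha256(tar_a) == sha256(tar_b))

# Sorting entries before archiving makes the result independent of listing order.
canonical_a = build_tar(files, sorted(["foo", "bar"]))
canonical_b = build_tar(files, sorted(["bar", "foo"]))
print("same digest, sorted order:   ", sha256(canonical_a) == sha256(canonical_b))
```

The first comparison is expected to print False and the second True: the content is identical in all four archives, and only the entry order decides whether the digests match.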

Once a tar archive is compressed with gzip, seeking is no longer supported. Therefore, tar.gz image layers must be fully downloaded and decompressed before the container runs, and data files cannot be loaded on demand.
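The lack of seeking can be observed directly with the standard zlib module. In the sketch below, the payload is synthetic data standing in for a layer tarball (an assumption for illustration). Decoding from the first byte succeeds, while an attempt to start decoding in the middle of the same gzip stream is rejected.

```python
import gzip
import hashlib
import zlib

# Deterministic synthetic payload standing in for a layer tarball.
payload = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(4096))
compressed = gzip.compress(payload)


def decompress_from(offset: int) -> bytes:
    """Try to decode the gzip stream starting at a byte offset."""
    decoder = zlib.decompressobj(wbits=31)  # wbits=31 selects gzip framing
    return decoder.decompress(compressed[offset:])


# Starting at the beginning works and yields the full payload.
assert decompress_from(0) == payload

# Starting in the middle fails: the decoder needs the stream from byte 0.
try:
    decompress_from(len(compressed) // 2)
    mid_stream_ok = True
except zlib.error:
    mid_stream_ok = False

print("can decode from the middle:", mid_stream_ok)
```

This is why a reader that only needs one file near the end of a layer still has to fetch and decompress everything that precedes it.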

The Layer as the Basic Image Unit
- Content Redundancy: Identical content in different layers is transmitted and stored redundantly, and the redundancy cannot be detected without reading the content.
- Non-Concurrency: Each layer is handled as a single unit, so one layer cannot be transmitted or extracted concurrently.
- No Verification of Small Pieces of Data: Data can only be verified after the entire layer has been downloaded.
- Other Problems: Cross-layer data deletion is difficult to handle perfectly.

Format Requirements for Next-Generation Images

This round of discussion on the image format started from one e-mail and a shared document, and it led to many online OCI community meetings. The conclusion is very encouraging. The shared document describes the requirements for the OCIv2 image format in detail, which are classified below:

Efficiency
- Less duplicate data
- Unambiguous, reduced filesystem metadata (*)
- On-demand loading of image data
- Less uploaded data

Ease of Use
- Changeable image format
- Filesystem format that can be mounted
- Scalability

Security
- Image content list
- Verifiable and/or repairable
- Works on untrusted storage

(*): Metadata that is only meaningful on a specific machine, such as file timestamps, does not need to exist in images.

These requirements make it clear that container images should focus on user-friendliness, efficiency, and security to achieve collaborative optimization in the build, ship, and run phases.

Thoughts of Alibaba Cloud on Container Images

Alibaba Cloud has been actively promoting and developing the cloud-native ecosystem, providing Alibaba Cloud Container Registry (ACR) as the infrastructure that serves as the first stop for users' cloud-native containerization. ACR provides management of container images and OCI Artifacts, such as Helm Chart management, as well as distribution services. Meanwhile, Alibaba Cloud is deepening its understanding of container image formats based on the status quo of its container business, in order to keep refining what an image format must offer to meet development needs. In summary, a new image format needs to:

Meet the container concept of build once, run anywhere
Achieve efficient use of storage resources on the image center and container running nodes
Run faster on the full procedure (build, ship, run) of a container image than on the existing OCI image format
Improve its performance in security
Provide maximum compatibility with existing infrastructure and benefit most users

Image Acceleration of Alibaba Cloud Sandbox Containers

Different from the discussion in the community, Alibaba Cloud focuses on the design of a set of optimized full-procedure image solutions for customers to put into production.

With these requirements clarified during technological development, a new image format named Rafs was designed for Alibaba Cloud sandbox containers. At the same time, a container image service was introduced into the Dragonfly project under CNCF to reduce image download time. In addition, consistent end-to-end validation of image data lets users manage container applications securely and quickly.

Rafs: Image Format

Rafs divides a container image into two layers, metadata and data:

Metadata Layer: The metadata layer is a self-verifying hash tree. Each file and directory is a node in the hash tree with a hash value attached. The hash value of a file node is determined by the file's data, and the hash value of a directory node is determined by the hash values of all files and directories under it.
Data Layer: The data of each file is sliced into fixed-size chunks and saved to the data layer. Chunks can be shared by different files in the same image or in different images.
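The two-layer idea can be sketched in a few lines. The code below is not the actual Rafs on-disk format; the tree layout, the tiny 4-byte chunk size, and the way hashes are combined are all assumptions made to illustrate a hash tree over files and directories combined with a shared store of fixed-size chunks.

```python
import hashlib

CHUNK_SIZE = 4  # tiny chunk size for illustration; real formats use far larger chunks


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_node(tree, chunk_store):
    """Return the hash of a file (bytes) or directory (dict) node.

    File data is cut into fixed-size chunks that are stored once in chunk_store.
    """
    if isinstance(tree, bytes):
        chunk_ids = []
        for start in range(0, len(tree), CHUNK_SIZE):
            chunk = tree[start:start + CHUNK_SIZE]
            chunk_id = sha256(chunk)
            chunk_store[chunk_id] = chunk  # identical chunks collapse to one entry
            chunk_ids.append(chunk_id)
        # File hash: derived from the file's data via its ordered chunk hashes.
        return sha256("".join(chunk_ids).encode())
    # Directory hash: derived from the names and hashes of all children.
    entries = [
        f"{name}:{build_node(child, chunk_store)}"
        for name, child in sorted(tree.items())
    ]
    return sha256("\n".join(entries).encode())


# Hypothetical image content: two files that share the chunk b"BBBB".
image = {"bin": {"app": b"AAAABBBB"}, "etc": {"conf": b"BBBBCCCC"}}
store = {}
root = build_node(image, store)
print("unique chunks stored:", len(store))

# Changing a single byte anywhere changes the root hash.
tampered = {"bin": {"app": b"AAAABBBB"}, "etc": {"conf": b"BBBBCCCD"}}
print("tampering detected:", build_node(tampered, {}) != root)
```

The four chunks of the two files collapse to three stored chunks, and a one-byte change in any file propagates up to the root hash, which is what makes such a tree self-verifying.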

Nydus: Container Image Service of Dragonfly

In addition to using the Rafs image format, Nydus includes a FUSE user-space filesystem process responsible for parsing container images.

Nydus parses the FUSE or virtiofs protocol to support the traditional runc container or the Alibaba Cloud sandbox container. Container repositories, Object Storage Service (OSS), Network Attached Storage (NAS), and the supernodes and peer nodes of Dragonfly can all serve as image data sources for Nydus. Moreover, Nydus keeps a local cache to avoid pulling data from a remote data source at every startup.
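The combination of on-demand loading and a local cache can be sketched as follows. This is not Nydus code: the LazyImage class, the fetch_chunk callback, and the in-memory "remote" are hypothetical stand-ins that only show how reads can be served chunk by chunk, with each chunk fetched from the data source at most once.

```python
class LazyImage:
    """Serve reads chunk by chunk, fetching from a remote backend only on first use."""

    def __init__(self, fetch_chunk, chunk_size, total_size):
        self.fetch_chunk = fetch_chunk  # callable(index) -> bytes; hypothetical backend
        self.chunk_size = chunk_size
        self.total_size = total_size
        self.cache = {}                 # stands in for the local cache
        self.remote_fetches = 0

    def _chunk(self, index):
        if index not in self.cache:
            self.cache[index] = self.fetch_chunk(index)
            self.remote_fetches += 1
        return self.cache[index]

    def read(self, offset, length):
        end = min(offset + length, self.total_size)
        out = bytearray()
        while offset < end:
            index, within = divmod(offset, self.chunk_size)
            piece = self._chunk(index)[within:within + (end - offset)]
            out += piece
            offset += len(piece)
        return bytes(out)


# A fake remote data source: 100 chunks of 1 KiB each.
CHUNK = 1024
remote = bytes(i % 251 for i in range(100 * CHUNK))
image = LazyImage(lambda i: remote[i * CHUNK:(i + 1) * CHUNK], CHUNK, len(remote))

data = image.read(1500, 1000)   # spans chunks 1 and 2
again = image.read(1500, 1000)  # served entirely from the local cache
print(data == remote[1500:2500], image.remote_fetches)
```

Only the two chunks touched by the read are fetched out of the hundred available, and repeating the read causes no further remote traffic.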

Based on this architecture, Nydus provides optimizations in build, ship, run, and compatibility, respectively:

Build
- Block-level image data deduplication minimizes storage resources for users.

Ship
- With different image backends, image data can be stored in the image repository, NAS, and object storage like Simple Storage Service.
- Integrates well with Dragonfly's P2P capabilities.

Run
- Container images are downloaded on demand, so users can start containers without downloading the complete images.
- The image only contains the data that is finally available, so there is no need to save or download expired data.
- End-to-end data consistency validation provides users with better data protection.

Compatibility
- Compatible with the OCI distribution and artifacts standards, with ready-to-use capabilities.

Reasons for the Design Based on Files

At the beginning of the design, Nydus chose a file-based design instead of a block-based design. Why?

The main reason is that additional container functions were to be built on top of image acceleration, and these functions depend on obtaining file metadata from the images. A block-based design only sees disk LBAs (logical block addresses) and cannot obtain information from the layer above it, the filesystem.

With file metadata, the following additional functions are realized easily:

Image Optimization Suggestion: When building the container, users are informed of the files that have never been accessed, which helps them optimize the images.
Pre-Reading: When running the container, files that are likely to be read are predicted and loaded before the read operation, which speeds up access.
Security Audit: When running the container, if a container's access pattern to the image content differs significantly from that of other containers, it may indicate a security risk.
Detection of Change Risks: When running the container, if the content access pattern of an image after an upgrade differs significantly from the previous one, either the program has changed intentionally or bugs have been introduced. In either case, the developers need to be informed of the change.
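A small sketch shows how file-level access information could feed these functions. The AccessLog class, the file paths, and the use of Jaccard distance to compare access patterns are all assumptions made for illustration; the source does not describe how Nydus implements them.

```python
from collections import Counter


class AccessLog:
    """Record which image files a container reads, in first-access order."""

    def __init__(self, image_files):
        self.image_files = set(image_files)
        self.counts = Counter()
        self.first_access_order = []

    def record(self, path):
        if path not in self.counts:
            self.first_access_order.append(path)
        self.counts[path] += 1

    def never_accessed(self):
        """Candidates to drop from the image (image optimization suggestion)."""
        return sorted(self.image_files - set(self.counts))

    def prefetch_list(self):
        """Files to load ahead of time on the next start (pre-reading)."""
        return list(self.first_access_order)


def access_distance(baseline, current):
    """Jaccard distance between two sets of accessed files (0 = same, 1 = disjoint)."""
    union = baseline | current
    return 1 - len(baseline & current) / len(union) if union else 0.0


# Hypothetical image content and access trace.
log = AccessLog(["/bin/app", "/etc/app.conf", "/usr/share/doc/README"])
for path in ["/etc/app.conf", "/bin/app", "/etc/app.conf"]:
    log.record(path)

print("never accessed:", log.never_accessed())
print("prefetch order:", log.prefetch_list())

# A large distance from the usual pattern could be flagged for audit or review.
usual = {"/bin/app", "/etc/app.conf"}
observed = {"/bin/app", "/etc/shadow"}
print("distance from usual pattern:", access_distance(usual, observed))
```

None of this is possible with block addresses alone, since a block number says nothing about which file or directory it belongs to.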

Summary

Although the layered image mechanism of the OCI image facilitates development, it also has many deficiencies when running in large-scale clusters. To address them, the OCI community is working to make images faster, more secure, and more economical, with awareness of image content. On this basis, and in line with the principle of benefitting customers, Alibaba Cloud has put forward requirements for public cloud images, such as stability and pre-reading. Furthermore, image acceleration solutions for Alibaba Cloud sandbox containers have been developed to unify and optimize the full procedure of build, ship, and run. With a better user experience, users can benefit from cloud-native infrastructure development.
