Getting Started with Kubernetes | Basic Concepts of Kubernetes Containers

By Fuwei (Yuge), Senior Development Engineer at Alibaba Cloud

1) Containers and Images

What Is a Container?

Before getting into a detailed definition of a container, let's briefly review how an operating system (OS) manages processes. We can see processes by logging on to the OS and performing operations such as running the ps command. These processes include system services and users' application processes. Let's take a look at the characteristics of these processes.

First, these processes are visible to and can communicate with each other.
Second, these processes use the same file system and therefore can read and write the same files.
Third, these processes use the same system resources.

Next, consider the problems that these characteristics may cause.

Because these processes are visible to and can communicate with each other, a process with elevated permissions may attack other processes.
Because these processes use the same file system, any of them can add, delete, modify, and query existing data. A process with elevated permissions may delete the data of other processes and disrupt their normal operation. In addition, the dependencies of different processes may conflict, which makes operations and maintenance (O&M) more complex.
Because these processes use the resources of the same host, applications may preempt each other's resources. When one application consumes most of the CPU and memory resources, other applications may fail to provide services properly.

How can we provide an independent running environment for these processes to address these three issues?

To solve the problems caused by sharing one file system, Linux and UNIX systems provide the chroot system call, which turns a subdirectory into the root directory of a process and thereby achieves view-level isolation. With chroot, processes have separate file systems, so adding, deleting, modifying, or querying files in one file system does not affect other processes.
To solve the problems caused by processes being visible to and able to communicate with each other, namespaces isolate processes at the resource view level. Combined, chroot and namespaces allow processes to run in an independent environment.
However, processes in this independent environment still share the resources of the same OS, and some processes may exhaust the resources of the entire system. To reduce the impact that processes have on each other, Cgroups can limit the resource utilization of certain processes, for example by setting the maximum number of CPUs and the maximum memory size.
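
As a rough illustration of the resource limits mentioned above, the sketch below translates a CPU count and a memory size into the values that would be written to cgroup control files. It assumes the cgroup v2 interface (the `cpu.max` and `memory.max` files); the article does not say which Cgroups version is meant, and the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: cgroup v2, where cpu.max holds "<quota> <period>" in microseconds.
CPU_PERIOD_US = 100_000


def cgroup_v2_limits(cpus: float, memory_mib: int) -> dict[str, str]:
    """Translate human-friendly limits into cgroup v2 control-file contents."""
    quota_us = int(cpus * CPU_PERIOD_US)
    return {
        "cpu.max": f"{quota_us} {CPU_PERIOD_US}",
        "memory.max": str(memory_mib * 1024 * 1024),  # limit in bytes
    }


def apply_limits(cgroup_dir: Path, limits: dict[str, str]) -> None:
    """Write each limit into its control file (needs root on a real cgroup)."""
    for filename, value in limits.items():
        (cgroup_dir / filename).write_text(value)


if __name__ == "__main__":
    # Limit a group of processes to 2 CPUs and 512 MiB of memory.
    print(cgroup_v2_limits(cpus=2, memory_mib=512))
```

On a real host, `apply_limits` would be pointed at a directory under the cgroup file system; here it only shows which files carry which values.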

Next, let's define such a set of processes.

A container is a set of processes that is isolated at the view level, restricted in resource utilization, and equipped with a separate file system. "View-level isolation" means that the container sees only some of the processes and has its own hostname. The restriction on resource utilization may cover memory size and CPU count. A container is isolated from the other resources of the system and has its own resource view.

A container also has a separate file system. Because the container uses the resources of the host system, including its kernel, this file system does not need kernel-related code or tools. It only needs to provide the binary files, configuration files, and dependencies that the container requires. The container can run as long as this file collection is available.

What Is an Image?

A container image is the collection of all files required for running a container. Generally, a Dockerfile is used to build an image. The Dockerfile provides convenient syntactic sugar for describing each build step. Each build step operates on the existing file system and changes its content; these changes are referred to as a changeset. A complete image is obtained by sequentially applying the changesets to an empty folder. Changesets are layered and reusable, which provides the following benefits:

First, distribution efficiency improves. Because a large image is divided into small blocks, the blocks can be downloaded concurrently.
Second, because image data is shared, only the data that is not available locally needs to be downloaded. For example, the Golang (Go) image is based on the Alpine image. If the Alpine image is already available locally, downloading the Go image requires only the data that is not included in the Alpine image.
Third, sharing image data saves a lot of disk space. Assume that an Alpine image of 5 MB and a Go image of 300 MB are stored locally. Without sharing, the total disk usage is 305 MB. With sharing, the total disk usage drops to 300 MB.
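
The changeset idea and the disk-space arithmetic can be modeled in a few lines. The following is a simplified sketch, not how any real image store is implemented: layers are identified by made-up digests, file contents are plain strings, and the 5 MB and 295 MB layer sizes are chosen only to match the Alpine and Go example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    digest: str   # hypothetical content identifier of one changeset
    size_mb: int


def apply_changesets(changesets: list[dict[str, str | None]]) -> dict[str, str]:
    """Replay changesets in order onto an empty file tree; None deletes a path."""
    tree: dict[str, str] = {}
    for changes in changesets:
        for path, content in changes.items():
            if content is None:
                tree.pop(path, None)
            else:
                tree[path] = content
    return tree


def disk_usage_mb(images: dict[str, list[Layer]]) -> int:
    """Total local disk usage when identical layers are stored only once."""
    unique = {layer for layers in images.values() for layer in layers}
    return sum(layer.size_mb for layer in unique)


def layers_to_download(image: list[Layer], local: dict[str, list[Layer]]) -> list[Layer]:
    """Layers of `image` that are not already present locally."""
    present = {layer.digest for layers in local.values() for layer in layers}
    return [layer for layer in image if layer.digest not in present]


alpine_base = Layer("sha256:aaa", 5)
go_toolchain = Layer("sha256:bbb", 295)

alpine = [alpine_base]
golang = [alpine_base, go_toolchain]  # the Go image builds on the Alpine layer

print(disk_usage_mb({"alpine": alpine, "golang": golang}))   # shared storage
print(layers_to_download(golang, {"alpine": alpine}))        # only the new layer
```

With both images stored locally, the shared Alpine layer is counted once, so the model reports 300 MB rather than 305 MB.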

How to Build an Image

As shown in the following figure, a Dockerfile describes how a Go application is built.

1) The FROM line specifies the base image on which the build steps are performed. As mentioned earlier, images can be reused.
2) The WORKDIR line specifies the directory in which the subsequent build steps are performed. It works like the cd command in a shell.
3) The COPY line copies files from the host into the container image.
4) The RUN line executes a command on the image's file system. Running this command produces the application.
5) The CMD line specifies the default program to run in the image.
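
To make the five instructions concrete, the sketch below embeds a hypothetical Dockerfile for a small Go application and lists its instructions in order. The base image tag, paths, and binary name are illustrative assumptions and are not taken from the article's figure; the parser is deliberately minimal and ignores line continuations.

```python
from __future__ import annotations

# Hypothetical Dockerfile: image tag, paths, and binary name are assumptions.
DOCKERFILE = """\
FROM golang:1.22-alpine
WORKDIR /src
COPY . .
RUN go build -o /usr/local/bin/app .
CMD ["app"]
"""


def parse_instructions(text: str) -> list[tuple[str, str]]:
    """Split Dockerfile text into (INSTRUCTION, arguments) pairs, in order."""
    steps = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        instruction, _, arguments = line.partition(" ")
        steps.append((instruction.upper(), arguments.strip()))
    return steps


for instruction, arguments in parse_instructions(DOCKERFILE):
    print(f"{instruction:<8} {arguments}")
```

Each parsed pair corresponds to one build step, and therefore to one changeset applied on top of the base image.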

Once the Dockerfile is available, build the image that contains the application by running the docker build command. The build result is stored locally. Generally, an image is built on a dedicated build machine or in another isolated environment.

The next question is how these images run in a production or test environment. This requires a broker or central storage, known as the Docker registry or image repository, which stores all generated image data. To push a local image to the image repository, run the docker push command. The production or test environment can then download the image with the docker pull command and run it.

How to Run a Container

To run a container, you need to complete three steps:

Step 1: Download the image from the image repository.
Step 2: After the image is downloaded, view the local images by running the docker images command, and select the desired image from the returned list.
Step 3: Execute the docker run command on the selected image to obtain a running container. Running the command several times produces multiple containers.

An image is equivalent to a template, and a container is a running instance of an image. This is why an image can be built once and run everywhere.
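
The path from building an image to running a container can be summarized as a sequence of Docker CLI calls. The sketch below only assembles and prints the commands by default; the image name `registry.example.com/demo/app:v1` and the container name `app-1` are placeholders, and the helper functions are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess


def image_workflow(image: str, container_name: str) -> list[list[str]]:
    """Commands for the build, push, pull, and run path, as argument lists."""
    return [
        ["docker", "build", "-t", image, "."],   # build from ./Dockerfile
        ["docker", "push", image],                # upload to the registry
        ["docker", "pull", image],                # download on the target host
        ["docker", "images"],                     # list local images
        ["docker", "run", "--name", container_name, image],  # start a container
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; also execute it when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Placeholder names; nothing is executed in dry-run mode.
run_all(image_workflow("registry.example.com/demo/app:v1", "app-1"))
```

In practice the build and push usually happen on the build machine, while the pull and run happen on the production or test host.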

Summary

A container is a set of processes that is isolated from the rest of the system, such as other processes, network resources, and file systems. An image is the collection of all files required by a container and is built just once for running everywhere.

2) Container Lifecycle

Lifecycle of Container Runtime

A container is a set of isolated processes. When running the docker run command, you specify an image, which provides the separate file system, and the program to run. The specified program is referred to as the initial process. The container starts when the initial process starts and exits when the initial process exits.
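
The coupling between a container and its initial process can be illustrated with an ordinary child process. This is only an analogy: the sketch provides no isolation and does not use a container runtime, and the function name is hypothetical. It shows the caller blocking until the initial process exits and receiving that process's exit status.

```python
from __future__ import annotations

import subprocess
import sys


def run_until_init_exits(init_argv: list[str]) -> int:
    """Start an initial process and block until it exits, returning its status."""
    init = subprocess.Popen(init_argv)
    return init.wait()


# The initial process here is a short Python program that exits with status 3.
status = run_until_init_exits([sys.executable, "-c", "import sys; sys.exit(3)"])
print("exited with status", status)
```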

The lifecycle of a container is therefore the same as that of its initial process. A container may, of course, contain more than the initial process: the initial process can spawn sub-processes, and O&M operations started with the docker exec command also run inside the container. When the initial process exits, all of its sub-processes exit as well, which prevents resource leaks.

This behavior creates a problem, however. Applications are usually stateful and may generate important data. After a container exits and is deleted, its data is lost, which is unacceptable for the application. Important data generated by containers must therefore be persisted. To meet this need, a container can persist data directly to a specified directory, which is referred to as a volume.

The most prominent feature of a volume is that its lifecycle is independent of the container. Container operations such as creation, running, stopping, and deletion do not affect the volume, because the volume is a special directory that exists to persist the container's data. When a volume is mounted to the container, the container writes data to the corresponding directory, and the exit of the container does not cause data loss. Generally, a volume is managed in two ways:

The first method is to directly mount a directory of the host to the container through a bind mount. This method is simple but increases O&M costs: because the volume depends on a host directory, these directories must be managed on every host.
The other method is to hand directory management over to the runtime engine.
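
The two ways of managing a volume map onto two forms of the `-v` option of docker run. The sketch below assembles the commands without executing them; the image name, paths, and volume name are placeholders, and the helper functions are hypothetical.

```python
from __future__ import annotations

import shlex


def run_with_bind_mount(image: str, host_dir: str, container_dir: str) -> list[str]:
    """docker run with a host directory bind-mounted into the container."""
    return ["docker", "run", "-v", f"{host_dir}:{container_dir}", image]


def run_with_named_volume(image: str, volume: str, container_dir: str) -> list[list[str]]:
    """Let the engine manage the directory: create a volume, then mount it."""
    return [
        ["docker", "volume", "create", volume],
        ["docker", "run", "-v", f"{volume}:{container_dir}", image],
    ]


# Placeholder names for illustration only.
print(shlex.join(run_with_bind_mount("demo/app:v1", "/data/app", "/var/lib/app")))
for argv in run_with_named_volume("demo/app:v1", "app-data", "/var/lib/app"):
    print(shlex.join(argv))
```

In the bind-mount form the operator must ensure that the host directory exists on every host; in the named-volume form the engine chooses and manages the storage location.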

3) Architecture of a Container Project

Architecture of the Moby Container Engine

Moby is the most popular container management engine available today. The Moby daemon provides management of containers, images, networks, and volumes to the upper layers. The most important component on which the Moby daemon relies is containerd. The containerd component is a container runtime management engine that is independent of the Moby daemon and provides container and image management to the upper layers.

At the lower layer, containerd relies on a containerd shim module, which works like a daemon process. This design exists for the following reasons:

First, flexible plug-in-based management is required, because containerd needs to manage the lifecycle of containers that may be created by different container runtimes. A shim is developed for each container runtime, so the shim is decoupled from containerd and managed as a plug-in.
Second, the plug-in form of a shim allows containerd to take over the shim dynamically. Without this capability, when the Moby daemon or containerd daemon exits unexpectedly, the containers are left unmanaged and exit together with the daemon, which affects the applications running in them.
Lastly, Moby or containerd may need to be upgraded at any time. Without the shim mechanism, they could not be upgraded in place or without affecting running services. This is why the containerd shim is so important.

This article provides a general introduction to Moby. For a detailed description, refer to subsequent articles in this series.

4) Comparison Between a Container and a VM

Differences Between a Container and a VM

A virtual machine (VM) uses hypervisor-based virtualization to simulate hardware resources such as the CPU and memory, so that a guest OS can be installed on the host. This process is referred to as installing a VM.

Each guest OS, such as Ubuntu, CentOS, or Windows, has an independent kernel. Applications run inside these guest operating systems and are independent of each other, so a VM provides better isolation. However, this isolation requires part of the computing resources to be spent on virtualization, which wastes existing computing resources. In addition, each guest OS consumes a large amount of disk space. For example, the Windows OS consumes 10 GB to 30 GB of disk space, and the Ubuntu OS consumes 5 GB to 6 GB. A VM also starts up slowly. These weaknesses drove the emergence of container technology.

Containers are oriented to processes and therefore require no guest OS. A container needs only a separate file system that provides the required file collection, and all isolation is at the process level. As a result, a container starts up faster and requires less disk space than a VM. However, process-level isolation is not as strong as the isolation that VMs provide.

In a nutshell, both containers and VMs have pros and cons; currently, container technology is developing towards a stronger isolation capability.

Conclusion

This article gives a quick walkthrough of the following basic concepts of Kubernetes containers:

A container is a set of processes and has its unique view.
An image is a collection of all files required for running a container, and can be built once and run everywhere.
The lifecycle of a container is the same as that of an initial process.
Both containers and VMs have their own pros and cons; currently, container technology is developing towards stronger isolation.
