By Deng Qinglin (Qingling), Alibaba Technical Expert
This article was reproduced from an internal document produced by Deng Qinglin (Qingling), an Alibaba technical expert. He was transferred from the Alibaba Cloud Console team to the ECI R&D team, which is the foundation for Serverless Kubernetes. Afterward, he began to study Kubernetes and spoke about his thoughts on the emergence and operations of Kubernetes from a business development perspective.
In the second half of 2019, I was transferred to a new job and began to learn about Kubernetes. Although my understanding of Kubernetes is still incomplete, I would like to share some of what I have learned. Hopefully, this article can help you get started with Kubernetes. If you find any errors in this article, please kindly let me know so that I can correct them.
Many articles introduce Kubernetes, and the official Kubernetes documentation is also easy to understand. Here, I want to discuss the emergence of Kubernetes and how it works from a business development perspective.
As living standards in China continue to rise, almost every household now has a car. Mr. Wang foresees the car disposal business developing rapidly over the next five years. In 2019, the government issued the new policy "Measures for the Management of End-of-Life Vehicle Recycling", which eliminated the "special industry" status of vehicle scrapping and recycling and will open up the industry to market-based competition.
Mr. Wang felt that this was a good opportunity to start a business. So, he found a few interested partners and created a platform called Taoche.com.
Taoche.com was initially an all-in-one Java application deployed on a physical server. (Well, Mr. Wang, you really should learn more about Alibaba Cloud before the deployment!) As the business grew, the server could no longer cope, so they upgraded from a 64c256g server (64 cores, 256 GB of memory) to a 160c1920g model (160 cores, 1,920 GB of memory). The cost was a bit high, but the upgraded server could provide the necessary system resources.
After one year of business development, the 160c1920g model was no longer sufficient. The company had to break up their services for distributed deployment. To solve the problems they encountered in the distributed transformation process, the company introduced a series of middleware products, such as HSF, TDDL, Tair, Diamond, and MetaQ. After this difficult business architecture transformation, they successfully split the all-in-one Java application into multiple smaller applications, repeating the development history of Alibaba's middleware, and our move away from IBM, Oracle, and EMC infrastructure.
After the transition to a distributed architecture, Mr. Wang's team had to manage more servers, with a confusing variety of server batches, hardware specifications, and operating system versions. This led to various problems in application runtime and O&M.
Virtual machine technology can shield applications from differences in the underlying hardware and software: although the hardware differs, it looks the same from the application's perspective. At that time, however, virtualization carried a heavy performance cost.
What about Docker? Docker builds on native Linux kernel features, such as cgroups and namespaces, and shields the underlying differences with little performance impact. This made it a really good choice. In addition, service delivery based on Docker images makes continuous integration and continuous delivery (CI/CD) very easy.
As the number of Docker containers grew, the business faced a new challenge: scheduling containers and managing communication between them. Taoche.com was no longer a small company. It was running thousands of Docker containers, and if the business continued on its course, it would soon have more than 10,000.
They needed a system that could automatically manage their servers (including checking server health, free memory, and CPU resources) and select the best server on which to run each container based on its CPU and memory requirements. The system also had to control communication between containers. For example, we would not want the containers of one department to be able to access the internal service containers of another department. This kind of system is called a container orchestration system.
First, Taoche had to answer a question, "How do we implement a container orchestration system when dealing with many servers?"
Assume that they have already implemented an orchestration system. A portion of their servers will be used to run the orchestration system, and the remaining servers will run business containers. The servers running the orchestration system are called master nodes, and the servers running the business containers are called worker nodes.
The master node is responsible for cluster management and it must provide required management interfaces. Among these interfaces, one helps administrators perform O&M operations on the cluster, and another is responsible for interactions with worker nodes, such as resource allocation and network management.
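We call the component that provides these management interfaces on the master node the kube-apiserver. Two clients are required to interact with the API server: kubectl, which O&M administrators use to operate the cluster, and kubelet, which runs on each worker node and interacts with the master on its behalf.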
Now, the O&M administrators, master node, and worker nodes of the cluster can interact with each other. For example, an O&M administrator can use kubectl to issue the command "create 1,000 containers from the Taoche User Center v2.0 image". After receiving this request, the master node makes scheduling decisions based on the resource information of the worker nodes in the cluster. It works out which worker nodes should host the 1,000 containers and sends the creation requests to those workers. The component responsible for scheduling is called the kube-scheduler.
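To make the scheduling idea concrete, here is a minimal sketch in Python. It is a toy model, not the real kube-scheduler algorithm: the `Node` class, `pick_node`, and `schedule` are hypothetical names, and the "most free resources left" scoring rule is just one simple assumption about what "optimal" could mean.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A worker node as the master sees it (hypothetical model)."""
    name: str
    free_cpu: int      # free CPU cores
    free_mem: int      # free memory in MiB
    healthy: bool = True


def pick_node(nodes, cpu, mem):
    """Filter nodes that can fit the container, then pick the one
    that would have the most resources left after placement."""
    candidates = [
        n for n in nodes
        if n.healthy and n.free_cpu >= cpu and n.free_mem >= mem
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda n: (n.free_cpu - cpu, n.free_mem - mem))


def schedule(nodes, replicas, cpu, mem):
    """Place up to `replicas` containers and return {node name: count}.
    Stops early if no node has enough free resources."""
    placements = {}
    for _ in range(replicas):
        node = pick_node(nodes, cpu, mem)
        if node is None:
            break
        node.free_cpu -= cpu
        node.free_mem -= mem
        placements[node.name] = placements.get(node.name, 0) + 1
    return placements
```

Calling `schedule(nodes, 1000, cpu=1, mem=1024)` would play the role of the "create 1,000 containers" request above. Unhealthy nodes are filtered out before scoring, so they never receive containers.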
How does the master know the resource consumption and running statuses of the containers on each worker? Simply put, we can use the kubelet clients on workers to actively report node resource and container running conditions on a regular basis. Then, the master stores this data and can use it for subsequent scheduling and container management. To store this data, we can write it to files or a database. There is also an open-source storage system called etcd that can meet data consistency and high availability requirements. It is easy to install and offers good performance.
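The reporting flow can be sketched as follows. This is only an illustration: `ClusterStore` is an in-memory stand-in for a store such as etcd (it does not use the etcd API), and the `/nodes/<name>` key layout, `report_status`, and `ready_nodes` are hypothetical names chosen for this example.

```python
import time


class ClusterStore:
    """In-memory key-value store standing in for etcd (assumption)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def keys(self, prefix):
        return sorted(k for k in self._data if k.startswith(prefix))


def report_status(store, node_name, free_cpu, free_mem, containers, now=None):
    """What a kubelet-like client would do periodically: write the
    node's resources and running containers under a per-node key."""
    store.put(f"/nodes/{node_name}", {
        "free_cpu": free_cpu,
        "free_mem": free_mem,
        "containers": list(containers),
        "last_seen": time.time() if now is None else now,
    })


def ready_nodes(store, timeout, now=None):
    """Nodes whose last report is at most `timeout` seconds old."""
    now = time.time() if now is None else now
    return [
        key.rsplit("/", 1)[1]
        for key in store.keys("/nodes/")
        if now - store.get(key)["last_seen"] <= timeout
    ]
```

A node that stops reporting simply ages out of `ready_nodes`, which is one simple way for the master to notice that a worker has become unavailable.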
Once we have the operational data for all the worker nodes and containers, we can do a lot with it. In the preceding example, 1,000 containers were created from the Taoche User Center v2.0 image. Assume five of these containers are running on worker node A. If node A suddenly encounters a hardware fault and becomes unavailable, the master node removes node A from the list of available worker nodes. Next, the five User Center v2.0 containers that were originally running on this node need to be reassigned to other available worker nodes. This allows the system to restore the number of User Center v2.0 containers to 1,000. After that, we need to adjust the network communication configuration for the containers so they can communicate correctly. Here, the series of components involved in this process are called "controllers". These controllers include node controllers, replication controllers, and endpoint controllers. The system also provides a centralized runtime component for these controllers called the kube-controller-manager.
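The "restore the count to 1,000" behavior is essentially a reconciliation step: compare the desired state with the observed state and act on the difference. Below is a minimal sketch under simplifying assumptions. The function name `reconcile` is hypothetical, placement just picks the least-loaded available node, and scaling down is not handled.

```python
def reconcile(desired, placements, available_nodes):
    """Return new placements ({node name: container count}) after
    dropping unavailable nodes and re-creating the lost containers.

    desired         -- target number of containers
    placements      -- current {node name: container count}
    available_nodes -- names of nodes still considered available
    """
    # Containers on nodes that are no longer available are treated as lost.
    current = {node: placements.get(node, 0) for node in sorted(available_nodes)}
    if not current:
        return current  # nowhere to place anything
    running = sum(current.values())
    # Re-create missing containers one by one on the least-loaded node.
    while running < desired:
        target = min(current, key=current.get)
        current[target] += 1
        running += 1
    return current
```

For instance, if node A held 5 of the 1,000 containers and then failed, calling `reconcile(1000, placements, available_nodes)` with A left out of `available_nodes` would spread those 5 containers over the remaining nodes. A real controller would run this kind of comparison continuously rather than once.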
Now, let's look at how the master implements and manages network communication between containers. First, each container must have a unique IP address that is used to communicate with other containers. The containers that need to communicate with each other may run on different worker nodes. Such communication will involve network communication between worker nodes. Therefore, each worker node must have a unique IP address. Since containers communicate with each other through their container IP addresses and do not perceive the IP addresses of worker nodes, the worker nodes must provide routing information for container IP addresses. For this purpose, we can use technologies such as iptables and IPVS. If the container IP addresses or the number of containers change, the iptables or IPVS configuration must be updated accordingly. We need a special component on the worker nodes that is responsible for listening to and adjusting the route forwarding configuration. This component is called the kube-proxy.
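The "listen and adjust" job can be pictured as diffing the forwarding rules a node currently has against the rules it should have. The sketch below uses a plain dictionary of container IP to worker node IP as a stand-in for iptables or IPVS rules. The function names and the IP addresses are made up for illustration, and this is not how kube-proxy is actually implemented.

```python
def diff_routes(current, desired):
    """Compare two {container IP: worker node IP} tables.

    Returns (to_add, to_remove): rules that are new or changed,
    and container IPs whose rules should be deleted.
    """
    to_add = {
        ip: node for ip, node in desired.items()
        if current.get(ip) != node
    }
    to_remove = sorted(ip for ip in current if ip not in desired)
    return to_add, to_remove


def apply_routes(current, desired):
    """Update `current` in place so that it matches `desired`."""
    to_add, to_remove = diff_routes(current, desired)
    for ip in to_remove:
        del current[ip]
    current.update(to_add)
    return current
```

Computing only the difference means unchanged rules are left alone, which matters when a node carries forwarding rules for thousands of containers.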
So far, we have solved network communication between containers. However, when coding, we should call a service through a domain name or VIP, rather than through a container IP address that may change dynamically. We need a service abstraction layered on top of the container IP addresses, exposed as a VIP or domain name within the cluster. To do this, we need an internal DNS resolution service.
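Here is a small sketch of why a stable name helps. The `ServiceRegistry` class, the domain name, and the addresses are all hypothetical, and round-robin is just one simple way to choose a backend. The point is that callers only ever see the name and the VIP, while the container IPs behind them can be replaced freely.

```python
import itertools


class ServiceRegistry:
    """Toy model of an internal DNS plus a VIP in front of containers."""

    def __init__(self):
        self._vips = {}      # service name -> VIP
        self._backends = {}  # VIP -> round-robin iterator over container IPs

    def register(self, name, vip, container_ips):
        """Create or update a service; the name and VIP stay stable
        while the set of container IPs may change."""
        ips = list(container_ips)
        if not ips:
            raise ValueError("a service needs at least one container IP")
        self._vips[name] = vip
        self._backends[vip] = itertools.cycle(ips)

    def resolve(self, name):
        """DNS-style lookup: service name -> VIP."""
        return self._vips[name]

    def pick_backend(self, vip):
        """Choose the next container IP behind the VIP (round-robin)."""
        return next(self._backends[vip])
```

If the containers behind a service are rescheduled, calling `register` again with the new IP addresses is enough. Code that calls the service by name does not need to change.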
Although we already have kubectl and can easily interact with the master, an additional web management interface would make things easier. We may also want to view the container resource or operational logs of components related to the entire cluster.
Components such as the DNS, web management interface, container resource information, and cluster logs that can improve the user experience are collectively referred to as plug-ins.
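Now, we have successfully built a container orchestration system. To briefly summarize the components mentioned in this article:
kube-apiserver: provides the management interfaces on the master node.
kubectl: the client that O&M administrators use to interact with the API server.
kubelet: the client on each worker node that reports node resources and container status to the master.
kube-scheduler: selects worker nodes for new containers based on resource information.
etcd: stores cluster data with consistency and high availability.
kube-controller-manager: runs the controllers, such as the node, replication, and endpoint controllers.
kube-proxy: maintains the route forwarding configuration (iptables or IPVS) on each worker node.
Plug-ins: the DNS, web management interface, container resource information, and cluster logs.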
These are the major components in Kubernetes. As parts of the Kubernetes production-level container orchestration system, each of these components deserves a more detailed discussion.
Although Taoche had successfully implemented a container orchestration system and they were happy with it, Taoche's President Wang (no longer simply Mr. Wang) thought that the orchestration system's R&D and O&M costs were too high and wanted to find a way to reduce these costs. President Wang was looking for an orchestration system that would allow employees to focus on business development instead of cluster O&M management. When he and the company's technical team learned about the serverless concept, they were immediately interested and wondered if they could build a serverless container orchestration system.
As luck would have it, they discovered a product called Serverless Kubernetes, but that's another story for another day.