36 Counts of R&D Efficiency Improvement
In this article, we propose the concept of immutable builds, working backward from the desired end state of software delivery. In the process of software development, how can we enjoy the dividends of an industrial ecosystem and standardize the software delivery process? What should the "containers" of software delivery look like?
Column Planning｜Ya Chun
Volunteer Editors｜Feng Chaokai, Orange Bee
Earlier, we cited an example from Containers Change the World (author: Marc Levinson). The book notes that in the 1950s and 1960s, the use of shipping containers cut overall freight costs by about 95%, and most dockworkers faced unemployment.
This seems like a small thing, but it had an enormous impact on economic globalization. It is a large part of why American companies could later place orders in China, and why China became the "world's factory". Behind the container lie standardization and an industry chain built on unified standards. Two points matter here: one is standardization, and the other is immutability.
So, in the process of software development, how can we enjoy the dividends of an industrial ecosystem and standardize the software delivery process? What should the "containers" of software delivery look like?
How to ensure the standardization of the software delivery process
Over the past decade, the form of software delivery has changed dramatically: from buying physical machines and building server rooms, to virtual machines, and now to containers. Why did this change happen?
The underlying technologies of containers are namespaces and cgroups, but both appeared ten or twenty years ago. The earliest adopters of these technologies were cloud vendors, which had clear demands for resource utilization and isolation. For example, Alibaba Cloud does not want workloads from different users running on the same machine to interfere with each other. The best way is to limit each user's resources, such as CPU and memory. With this demand, LXC and similar mechanisms were used to isolate and limit resources. But this still did not produce the container as we know it. Why? Because each cloud vendor could only do this internally and could not distribute the result externally. So the great thing about Docker is not that it made much innovation at the bottom layer, but that it provided a container image that can be distributed externally.
Container images are a form of distribution. We can distribute container images to others, or let others build on top of (inherit from) our images. Docker also provides the Dockerfile, which lets us describe an image in the form of a file. Once an image can be defined in a file, collaboration becomes possible. With this capability, containers were quickly accepted. So the adoption of containers looks like a process of technological development, but it is really an inevitable result of the growth of cloud native and the cloud market.
What many of us call a "container" is often assumed to be a Docker container. Kubernetes supports many container runtimes, and Docker containers are used in most cases. The advantages of Docker containers are the two points just mentioned: images and Dockerfiles. These two points allow Docker images to be distributed the way shipping containers are.
In addition, containers provide good resource isolation at a relatively fine granularity. Virtual machines also isolate, but at a much coarser granularity. Containers also provide a very elastic way to manage resources, a great improvement over both virtual machines and physical machines. In essence, a container is just a process on a physical machine; that is the essential difference between a container and a virtual machine.
After understanding what a software container is, let's learn about the composition of a container image.
The figure above vividly shows the internal structure of a container image. When we run docker build to build an image ourselves, we will find that the output log contains many hash values, layer by layer. An image in fact consists of many layers. We create a container process through LXC or another technology, and this process uses namespaces and cgroups to isolate and limit resources. Container images have a base image. We know that running a program requires an operating system environment, such as dependent libraries. If a program is deployed arbitrarily across physical and virtual machines, its behavior will vary with each machine's environment, which introduces risk. So the container image provides a base image and packages that environment inside. Above that are the "Add emacs" and "Add apache" layers, which we write in the Dockerfile. At the top is the writable layer, the only layer we can actually write to while the container is running.
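The layer structure described above can be sketched as a Dockerfile. This is a hypothetical illustration (the base image and packages are just the classic layering example, not a recommendation):

```dockerfile
# Hypothetical sketch mirroring the layers described above.
FROM debian:bookworm                                # base image layer: OS environment and libraries
RUN apt-get update && apt-get install -y emacs      # "Add emacs" layer
RUN apt-get update && apt-get install -y apache2    # "Add apache" layer
CMD ["apachectl", "-D", "FOREGROUND"]
# When a container runs from this image, a thin writable layer is added on top.
```

Each instruction produces a new read-only layer identified by a hash; those hashes are exactly what docker build prints layer by layer in its log.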
So what are the characteristics of container images? They are layered, and each layer can be reused. If many containers on a machine share the same base image, that base image only needs to be pulled once. The size of an image is the sum of all its layers: the less you pile on, the smaller the image. There is a minimal image called scratch, the most primitive base image. It contains almost nothing, and a very, very small container built from it may be only a few megabytes. But if you build on the CentOS base image, the result may be on the order of a gigabyte.
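As a sketch of how small a scratch-based image can be (the binary name here is hypothetical, and it must be statically linked, since scratch contains no libraries at all):

```dockerfile
# A minimal image: nothing but one statically linked binary.
FROM scratch
COPY hello /hello        # "hello" is an assumed statically linked executable
ENTRYPOINT ["/hello"]
```

The image size is essentially the binary size, a few megabytes, versus hundreds of megabytes or more when starting from a full-distribution base image like CentOS.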
Container images come with a very important convention: "one process per container" (container life cycle = process life cycle). We can think of a container as a process on Kubernetes: if Kubernetes is the operating system, then a container is a process running on it, and the life cycle of a process can be managed. Although containers have many advantages, they also bring many problems in actual use.
Let's talk about some common problems and practical suggestions for container images.
Common problems and practical suggestions for container images
Common problems with container images:
•Put everything in a container and use the container as a virtual machine.
•Setting ENTRYPOINT to systemd: the state of processes managed by systemd is inconsistent with the state of the container. The process inside may be dead or crashed while systemd is still alive, so from the outside the container looks fine.
•Shipping a pile of exported image tar packages for private (on-premises) deployment. A tarball is distributed as a whole; the layered structure inside cannot be reused.
•Every time the base image is distributed to the entire cluster, the network becomes particularly congested.
Our practical recommendations are:
•Try to use a lightweight base image and pin a specific image version.
•Reuse image content by layering to avoid repeated pulling.
•Avoid using systemd, supervisord, or similar daemon-management services as the ENTRYPOINT.
•Use a local docker registry or similar to copy images offline at layer-level granularity.
•To avoid a large number of simultaneous pulls, use P2P methods (such as Dragonfly) to improve the efficiency of image distribution.
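For the systemd point above, a minimal sketch of the recommended shape is to run the service itself in the foreground as the container's main process (nginx is just an assumed example here):

```dockerfile
# The container's lifecycle tracks PID 1. Run the service directly in the
# foreground instead of hiding it behind systemd or supervisord.
FROM nginx:1.25
CMD ["nginx", "-g", "daemon off;"]
```

If nginx crashes, the container exits and the orchestrator can observe and restart it; with systemd as PID 1, the crash would be invisible from outside.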
Container images can standardize the software delivery process. Standardization is a means, not an end: it is a technique that helps us reuse more efficiently.
Returning to the end state of software delivery, our aim is to provide a stable and predictable system.
The premise of achieving this goal is a deterministic operating environment and deterministic software artifacts. A deterministic environment means that the code (and its dependencies), the build environment, the build script, and the expected output artifact are all determined. How to achieve this will be shared later; let's first look at how to ensure the consistency of software artifacts.
How to ensure the consistency of software artifacts
To ensure consistency, software artifacts should have a definite format and a unique version, and should be traceable both back to the source code and through their production and consumption process, so that continuous delivery can better serve enterprise product management and development.
Some problems are often encountered during artifact builds. For example: the application's code base has no Makefile, package.json, or go.mod, so its dependencies cannot be determined; or the artifact builds successfully but is missing a few dependencies; or the application runs fine in the developer's own environment but bugs appear in production. The root cause of these problems is that the build itself is mutable, and a mutable build brings a whole pile of problems. To fix this, we need immutable builds that make the artifact exactly as expected.
To implement immutable builds, we need to ensure that we have:
•the same code
•the same build environment
•the same build script
Same code
For example, if developers do not pin dependency versions in the dependency description file (go.mod, package-lock.json, pom.xml, requirements.txt, etc.), the latest version is used by default. The output artifacts then change as dependencies are updated, which brings completely unexpected risks.
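As a hypothetical illustration in a package.json, pin an exact version instead of a floating range (the package name and version are assumptions):

```json
{
  "dependencies": {
    "express": "4.18.2"
  }
}
```

A range like "^4.18.2" would silently float to the latest 4.x release on each install; an exact "4.18.2", together with a committed lock file (package-lock.json), makes dependency resolution deterministic. The same idea applies to go.mod, requirements.txt with `==`, and fixed versions in pom.xml.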
Same build environment
For the build environment, a Dockerfile can describe the environment on a container platform, giving us a consistent environment for producing the artifact. Often we do not need the build environment's many dependencies at runtime, and the size of a build image can be quite staggering. So we should separate the build environment from the running environment to obtain the lightest possible image artifact.
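One common way to separate the two is a multi-stage Dockerfile. This is a sketch assuming a Go application (image names and versions are illustrative):

```dockerfile
# Stage 1: heavy build environment with the full toolchain.
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/app .

# Stage 2: minimal runtime environment; the build toolchain is left behind.
FROM alpine:3.19
COPY --from=build /bin/app /bin/app
ENTRYPOINT ["/bin/app"]
```

Only the final stage ships, so the delivered image is a few megabytes instead of the several-hundred-megabyte build image, while the build environment itself stays fully described and reproducible.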
Same build script
Correspondingly, it is also very important to use the same build script, independent of the code implementation, and to pin the versions of environment dependencies in the Dockerfile.
The resulting software artifact is the same only when the code (and its dependencies), the build environment description, and the build script are all the same. The emphasis here is that all three must be consistent. If all three are the same, the artifacts produced are the same, even when built at different times.
To do immutable infrastructure well, we must first standardize the form of the final delivered artifact and clarify how that delivery form is operated and maintained. To ensure immutability, we must first have immutable builds, which then give us consistent software artifacts.
NOTE: Building accurately is always more important than building fast. If the build information of the artifact is inaccurate, the build outputs are inconsistent, versions become uncontrollable, and all subsequent work is wasted.
How to improve build efficiency
One point that deserves attention in builds is how to improve build efficiency. Let's first look at a simple calculation. Suppose, purely for illustration, that 100 developers each trigger 5 builds a day and each build wastes 10 avoidable minutes: that is more than 80 person-hours lost every day. This is a very large amount of waste. Many projects suffer from low engineering efficiency simply because builds are too slow: long build times make product iteration very slow, and feature updates and bug fixes are also delayed.
So how do we improve the efficiency of the build? Here are some of our practical recommendations:
1 basic principle: ensure build accuracy. Build accuracy always takes precedence over build efficiency; improving efficiency only makes sense once accuracy is ensured.
5 suggestions:
•Application slimming: check the application's dependencies. Is the application package too large? Are there too many dependencies? Can unnecessary ones be removed, and can a smaller image be built?
•Layered builds: build the lower layers first and reuse them in the upper layers, so builds can be incremental.
•Build cache : Pulling dependencies during the build process is time-consuming, and repeated pulls should be avoided.
•Network optimization: mainly ensure low network latency between the code, the build machines, and the artifact repository. Are the code and the build machine on the same low-latency link? For example, building code hosted on GitHub with Cloud Effects will have much higher latency than building on the intranet.
•Repository mirroring: mirrors can greatly reduce the time to pull dependencies. In the domestic (China) network environment, fetching a dependency from the upstream repository may involve very high latency; a mirror can reduce it. For example, Node.js developers often use Taobao's npm mirror, while Python developers use Tsinghua's mirror. Enterprises can also build their own mirror repositories to improve reliability and reduce latency. Cloud Effects also uses mirror repositories to reduce pull time.
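As a small config sketch, the npm mirror mentioned above can be set per project in an .npmrc file (the registry URL is the one commonly published for the Taobao/npmmirror registry; verify it before use):

```ini
; .npmrc — route dependency pulls through a regional mirror
registry=https://registry.npmmirror.com
```

pip has an analogous setting: an `index-url` entry in pip.conf pointing at a mirror such as the Tsinghua PyPI mirror.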
(Editor's note: Flow is a pipeline tool for the cloud-native era. It uses container technology to free enterprises from dependence on virtual-machine build environments; you can even use different build environments within the same pipeline as needed. The Cloud Effects pipeline Flow also provides container environments for various languages to cover different build scenarios. Click "Read the original" at the end of the article for details.)
Summary
In this article, we proposed the concept of immutable builds, working backward from the end state of software delivery. The hope is that: same source code + same build environment + same build script => consistent software artifacts. All of these are stored with the source code, so the management of the source code is very important.
Knowledge Base Team