Four steps to large-scale container adoption
Large-scale container adoption has become a "required course" for enterprise development
The pandemic has accelerated enterprise digitalization. Low-latency, high-concurrency online scenarios now appear routinely in daily operations, and the demand for business innovation keeps pushing enterprises toward emerging technologies. Kubernetes has gradually become the infrastructure of the cloud-native era, and container technology is widely used in scenarios such as artificial intelligence, big data, blockchain, and edge computing. As a lightweight, agile compute carrier, containers bring high flexibility to ever more scenarios. Under the dual pressures of daily operations and business innovation, more and more enterprises are moving from small-scale trials to fully embracing large-scale container adoption to keep their businesses healthy over the long term.
According to the "2020 China Cloud Native User Survey Report" by the China Academy of Information and Communications Technology (CAICT), more than 60% of users have applied container technology in production, nearly 80% need clusters of 1,000 nodes or more to meet production demands, more than 13% run container clusters exceeding 5,000 nodes, and 9% run clusters larger than 10,000 nodes. As cloud-native technology spreads further, more and more enterprises are moving core business onto containers, and production container clusters are growing explosively. Large-scale container adoption has become a "required course" for enterprise development, yet the current open-source version of Kubernetes supports at most 5,000 nodes and 150,000 Pods per cluster, which cannot keep up with growing business needs.
What makes large-scale container adoption difficult?
Large-scale container clusters offer greater business load capacity, better absorption of traffic bursts, and more efficient cluster management. As a practitioner and leader in the cloud-native field, Alibaba Cloud was the first to break through to 10,000 nodes and 1 million Pods in a single cluster, a 6.7-fold increase over the open-source Pod ceiling. Drawing on experience serving millions of customers, Alibaba Cloud has distilled a "four-step" approach to large-scale container adoption that helps enterprises overcome the difficulties along the way and cope with ever-growing demands on scale.
Step 1: How do you judge whether you need to scale up your container cluster?
Enterprises typically consider cluster expansion when facing requirements such as bursty traffic, complex computation, or the need to further improve operations and maintenance efficiency, and the capacity of a single cluster has become the bottleneck to development. For example, workloads such as genomic computing and online flash sales generate a large amount of load in a short period, which severely challenges the compute resources a single cluster can accommodate and demands that it support large numbers of nodes running Pods in batches. However, pursuing large clusters is not a silver bullet. Enterprises should optimize cluster capability according to the characteristics of their own business; blindly chasing scale only enlarges the failure domain.
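The judgment above can be made with a back-of-the-envelope check. The two ceilings below are the open-source Kubernetes limits cited earlier (5,000 nodes, 150,000 Pods); the workload numbers are illustrative assumptions, not figures from the article.

```python
# Back-of-the-envelope check: does peak demand fit in one open-source
# Kubernetes cluster? The ceilings are the upstream limits cited above;
# the workload numbers below are illustrative assumptions.
MAX_NODES = 5_000        # open-source Kubernetes node ceiling
MAX_PODS = 150_000       # open-source Kubernetes Pod ceiling

def needs_scale_up(peak_pods: int, pods_per_node: int) -> bool:
    """Return True if a burst workload exceeds single-cluster limits."""
    nodes_needed = -(-peak_pods // pods_per_node)  # ceiling division
    return peak_pods > MAX_PODS or nodes_needed > MAX_NODES

# A hypothetical flash-sale burst of 200,000 Pods at 30 Pods per node
# exceeds the Pod ceiling; a steady 60,000-Pod workload does not.
print(needs_scale_up(200_000, 30))
print(needs_scale_up(60_000, 30))
```

If the check comes back False, optimizing within a single cluster is usually the safer path, for the failure-domain reason given above.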
Step 2: Scaling up containers is not simply adding nodes. How do you optimize the whole system from the bottom up and unblock it end to end?
As the operating system of the cloud-native era, Kubernetes itself and the cloud environment it runs in are complex and vast. Scaling containers is therefore a complete optimization system, from underlying cloud resources up to the applications on top. Enterprise users need to focus on three levels: 1. at the cloud-product level, lift cloud resource quota restrictions; 2. at the cluster-component level, raise the resource-scale ceiling; 3. at the Kubernetes-resource level, optimize cluster configuration policies to ensure resource scalability.
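A small sketch of why the Kubernetes-resource level (point 3) needs its own tuning: per-node Pod density and the cluster-wide Pod ceiling constrain each other. The kubelet's default `max-pods` is 110, and the cluster-wide limits are the upstream ceilings cited earlier; the scenario numbers are illustrative assumptions.

```python
# Cluster Pod capacity is bounded by BOTH the per-node limit (kubelet
# max-pods, default 110) and the cluster-wide Pod ceiling cited above.
MAX_PODS_CLUSTER = 150_000
KUBELET_DEFAULT_MAX_PODS = 110

def effective_pod_ceiling(nodes: int, max_pods_per_node: int) -> int:
    """Whichever limit is lower is the binding constraint."""
    return min(nodes * max_pods_per_node, MAX_PODS_CLUSTER)

# At 5,000 nodes the cluster-wide ceiling binds (5,000 * 110 = 550,000
# would otherwise be possible); at 1,000 nodes the per-node limit binds.
print(effective_pod_ceiling(5_000, KUBELET_DEFAULT_MAX_PODS))
print(effective_pod_ceiling(1_000, KUBELET_DEFAULT_MAX_PODS))
```

This is why raising one limit in isolation (more nodes, or a higher per-node density) does not by itself deliver more capacity; the three levels have to be tuned together.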
Step 3: It is hard to guarantee performance is not degraded after scaling up. How do you go further and improve it, becoming an "agile giant"?
Enlarging a container cluster by a factor of N poses huge challenges to storage, cluster networking, and application distribution. For example, data-center network traffic in a large cluster is usually heavy, and network latency and jitter are magnified, affecting the transmission efficiency and stability of the cluster network. Another common scenario is releasing and updating applications in batches: an instantaneous image pull across 10,000 nodes hits the network hard, putting enormous pressure on the image service and on network bandwidth. The original intention of scaling up is stronger technical support, so the goal is not only to preserve existing performance but to further improve it. Enterprise users can focus on four aspects: node and Pod scaling efficiency, network efficiency (throughput and latency), DNS resolution efficiency, and image acceleration.
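The image-pull pressure described above can be quantified with simple arithmetic, which also shows why staged rollouts and on-demand image loading help. The 10,000-node figure comes from the text; the image size and batch count are illustrative assumptions.

```python
# Rough estimate of registry load when a large cluster pulls an image,
# illustrating why staged batches reduce peak pressure. Image size and
# batch count are illustrative assumptions; 10,000 nodes is from the text.
def pull_traffic_gb(nodes: int, image_size_gb: float) -> float:
    """Total registry egress if every node pulls the full image at once."""
    return nodes * image_size_gb

def batched_peak_gb(nodes: int, image_size_gb: float, batches: int) -> float:
    """Peak egress per wave when the rollout is split into equal batches."""
    per_batch = -(-nodes // batches)  # ceiling division
    return per_batch * image_size_gb

# 10,000 nodes pulling a 1 GB image simultaneously vs. in 20 batches:
print(pull_traffic_gb(10_000, 1.0))      # 10,000 GB hitting the registry at once
print(batched_peak_gb(10_000, 1.0, 20))  # 500 GB peak per wave
```

On-demand loading attacks the same problem from the other direction, shrinking the effective `image_size_gb` each node must fetch before a container can start.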
Step 4: After scaling up, the most perilous challenge is stability
If reaching scale is the first step, keeping a cluster of tens of thousands of nodes running stably is even more perilous. For a huge system, the most important thing is to contain the fault domain and prevent avalanches. Stability matters even more than scale itself, because recovering a large cluster cannot be solved by a simple restart: once an avalanche starts, overall collapse is hard to avoid, seriously affecting business continuity. For enterprises, the stability of a large cluster is the safety of online business. Enterprise users need to prepare emergency "stop-the-bleeding" plans, optimize resource indexing and system components, and monitor all nodes so that the self-healing process can start at any time.
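The monitor-and-self-heal loop mentioned above can be sketched as follows. This is a minimal in-memory simulation, not code against a real Kubernetes API; the node names, heartbeat timestamps, and 40-second threshold are illustrative assumptions.

```python
# Minimal sketch of a monitor-and-self-heal loop: nodes that miss their
# heartbeat window are cordoned so no new Pods land on them. All names
# and thresholds below are illustrative assumptions (simulated in memory).
HEARTBEAT_TIMEOUT = 40.0  # seconds without a heartbeat before acting

def find_unhealthy(last_heartbeat: dict, now: float) -> list:
    """Nodes whose last heartbeat is older than the timeout."""
    return [n for n, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT]

def self_heal(unhealthy: list, cordoned: set) -> set:
    """Cordon unhealthy nodes (simulated) to contain the fault domain."""
    return cordoned | set(unhealthy)

now = 1_000.0
heartbeats = {"node-1": 990.0, "node-2": 950.0, "node-3": 995.0}
bad = find_unhealthy(heartbeats, now)
print(self_heal(bad, set()))  # only node-2 missed its heartbeat window
```

Cordoning first, rather than restarting en masse, reflects the point above: containing the fault domain prevents a localized failure from cascading into an avalanche.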
Alibaba Cloud provides one-stop support for large-scale container adoption
To address the difficulties enterprises face in adopting large clusters, Alibaba Cloud provides enterprise-grade cluster management based on ACK Pro, with extensive performance optimizations in the APIServer and scheduler that break resource-scale limits, raise the performance ceiling, and safeguard cluster stability. Its self-developed high-performance container network Terway cuts Pod latency by 30% and reduces the performance overhead of large-scale services; it not only removes the network bottleneck of large clusters but also delivers near-native network performance in the cloud, making clusters more responsive. The enterprise-grade image registry ACR EE supports dedicated storage and on-demand image loading, reducing startup time by 60% and solving slow image pulls across large numbers of nodes. Integrating Alibaba Cloud's storage, network, and security capabilities, Alibaba Cloud offers enterprises one-stop, best-in-class performance for running containers at scale: more efficient network forwarding, more scalable storage, more efficient application and image distribution, and more stable and secure management of large clusters.
It is worth noting that at the recent 2020 Cloud Native Industry Conference, Alibaba Cloud became the first cloud provider to pass CAICT's large-scale container performance test, earning the highest certification level, "excellent". In that evaluation, Alibaba Cloud Container Service led all participating vendors in full-load stress testing, network latency, and network performance loss.
On this basis, Alibaba Cloud has enough flexible "service capability headroom" to tailor container cluster services to each enterprise's business, and has exported years of large-scale container technology, honed around Double Eleven, to many ecosystem partners and ISVs. By supporting container clouds across industries worldwide, Alibaba Cloud Container Service has accumulated cloud-native application hosting capabilities that support unitized, globalized, and elastic architectures, manages more than 10,000 container clusters, and provides enterprise-grade reliable service.
Alibaba Cloud operates the largest container clusters in China and offers the richest cloud-native product family and the most comprehensive open-source contributions, with more than 100 innovative products spanning cloud-native bare-metal servers, cloud-native databases, data warehouses, data lakes, containers, microservices, DevOps, Serverless, and more, covering new retail, government, healthcare, transportation, education, and other fields. Alibaba Cloud Container Service is the only Chinese vendor selected in both Gartner's 2019 and 2020 "Competitive Landscape: Public Cloud Container Service" reports.
Knowledge Base Team