Container service has fully entered the era of intelligence

Container technology has crossed the chasm and is now widely used in finance, communications, manufacturing, transportation and other industries. The workloads supported by Kubernetes have also grown from the Internet applications of the early days to databases, AI, big data and more, and they span diverse and dynamic environments such as public cloud, private cloud and edge cloud.

On November 5, at the 2022 Apsara Conference (Yunqi Conference) in Hangzhou, Yi Li, Alibaba researcher and head of container technology for the cloud native application platform at Alibaba Cloud Intelligence, delivered a keynote speech at the Cloud Native Summit. He announced a comprehensive intelligent upgrade of Alibaba Cloud container services, aimed at helping enterprises use the cloud in a lean way, reduce costs and develop high-quality IT architecture on the cloud.

Container services help enterprises innovate digitally

After seven years of development, the Alibaba Cloud container service product line has become a cloud native operating system for enterprises. Based on the Alibaba Cloud container platform, Alibaba Group has moved 100% of its business to the cloud.

In 2021, Alibaba Cloud released ACK Anywhere to further extend the product range, covering all scenarios from public cloud to edge cloud to on-premises data centers, so that every place that needs cloud capabilities can be built on a unified container infrastructure.

Thanks to the large-scale container practice of Alibaba Group and Alibaba Cloud, the capabilities of Alibaba Cloud container products have been widely recognized by the industry. In the first quarter of 2022, in the global public cloud container platform report released by the consulting firm Forrester, ACK was placed in the global leader quadrant, the first time a Chinese technology company had entered it. In the second quarter of 2022, in the global container management solution report released by Omdia, ACK was named a global leader thanks to its complete product portfolio across public cloud, private cloud, hybrid cloud and other environments, with product capability and scale leading in China. In August 2022, in the CSDN 2022 survey of Chinese developers, 52% of domestic developers chose the Alibaba Cloud container platform.

In the past few years, reducing costs and increasing efficiency has become an important issue for many enterprise IT managers. Enterprises have entered the era of lean cloud use, in which improving resource efficiency, R&D efficiency and IT management efficiency is the key.

Four new upgrades: Alibaba Cloud container service enters the era of intelligence

Intelligence is the inevitable trend in the development of container platforms. Drawing on 10 years of experience running containers at large scale, Alibaba Cloud has used data and intelligent algorithms to upgrade its container service ACK across four dimensions: the infrastructure layer, the container orchestration layer, the application architecture layer and operations governance. With this upgrade, ACK enters a new stage of intelligence.

Upgrade 1: New computing power

At the infrastructure level, new computing power optimized for cloud native workloads is used to improve computing efficiency.

In 2021, Alibaba Cloud released Yitian 710, a new generation of cloud native CPU based on the ARMv9 architecture. It has been deployed at large scale in Alibaba's e-commerce business and on Alibaba Cloud, achieving excellent cost performance.

Compared with x86 chips, cost performance is 50% higher for typical Web applications and 80% higher for video codec applications. The Yitian chip is optimized for cloud native workloads: each vCPU uses an independent physical core, so there is none of the performance contention found in hyper-threaded architectures, and performance is more deterministic.

Through topology-aware scheduling optimized for the chip microarchitecture, ACK helps improve Web application throughput by 20% compared with the open-source K8s implementation.

To better support AI, HPC and other I/O-intensive applications, ACK now officially supports the eRDMA high-performance container network. This network implementation, optimized across software and hardware, provides higher bandwidth and lower latency: AI training is accelerated by 20%, and microservice throughput is increased by 10%.

ACK supports the efficient reuse of eRDMA devices in multiple containers, meeting the requirements of container application deployment density.

In order to better support stateful application containerization, Alibaba Cloud has released a new generation of container network file system CNFS 2.0, which uses full-link acceleration technology to achieve:

• Access from container applications to back-end storage systems is parallelized to improve the utilization of network bandwidth. The throughput of remote NAS storage can be increased by 100% to meet the needs of high-performance AI training and genomic computing.

• Using a metadata cache and a unique release mechanism, metadata access performance for remote file storage is improved by 18 times, which suits scenarios such as Web applications and CI/CD that need to access massive numbers of small files.

• Transparent lifecycle management of files is supported. Cold data that is accessed infrequently can be placed automatically on low-cost NAS low-frequency media or OSS, reducing storage costs by more than 50%.

CNFS 2.0 also supports full-link observability across NAS, CPFS and OSS, helping developers better diagnose and optimize I/O performance problems.
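The lifecycle management described above can be pictured as a rule that maps a file's last access time to a storage tier. The following Python sketch is illustrative only: the names `choose_tier` and `plan_moves`, the tier labels and the 14-day threshold are assumptions made for this example and are not CNFS settings.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: files untouched for this long count as cold.
COLD_AFTER = timedelta(days=14)


def choose_tier(last_access: datetime, now: datetime,
                cold_after: timedelta = COLD_AFTER) -> str:
    """Return the tier a file would live in under an age-based policy."""
    if now - last_access >= cold_after:
        return "low-frequency"  # cheaper media for rarely read data
    return "standard"           # recently read data stays on fast storage


def plan_moves(files: dict, now: datetime) -> list:
    """List (sorted) the files that the policy would move to the cold tier.

    `files` maps a file name to its last access time.
    """
    return sorted(name for name, ts in files.items()
                  if choose_tier(ts, now) == "low-frequency")
```

A real system would apply such a rule transparently in the storage layer; the sketch only shows the decision itself.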

Enterprises and individuals are increasingly concerned about data privacy protection, and confidential computing technology has emerged in response. One of its key techniques is protecting data through a chip's Trusted Execution Environment (TEE). Applications running in a TEE do not need to worry about threats from other applications, other tenants or the platform provider.

To further promote the adoption of confidential computing, the Alibaba and Ant teams have worked with companies such as Red Hat and Intel in the Kata Containers community to combine container computing with trusted execution environments, and have launched the confidential containers project. The project provides a consistent container interface across different TEE implementations such as Intel® SGX and Intel® TDX.

Based on the new generation of confidential container architecture, developers can ensure that applications are built and distributed through a trusted software supply chain. The container application runs in a trusted execution environment with a smaller attack surface, and all in-memory data is encrypted and integrity-protected. Applications access data through a trusted, encryption-based data storage service.

Confidential containers can provide efficient privacy-enhanced computing power in scenarios that require processing of private data, such as financial risk control and healthcare.

Upgrade 2: New platform

At the container orchestration layer, resource utilization and operation and maintenance efficiency are improved through a new platform that combines intelligence with cloud-edge integration.

K8s has become the operating system of the cloud era. We hope to make full use of peak shaving and valley filling across different application workloads to improve the resource utilization of K8s clusters. This is the capability we often call "co-location".

Alibaba started research and development of cloud native co-location technology as early as 2016. After several rounds of architecture upgrades and the test of Double 11, Alibaba now runs cloud native co-location for businesses totaling more than 10 million cores, with daily CPU utilization of about 50%.

This year Alibaba Cloud open-sourced the cloud native co-location project Koordinator, which includes three core capabilities:

• Differentiated SLO guarantee: provides a QoS-oriented resource scheduling mechanism on Kubernetes for workloads such as delay-sensitive online tasks and preemptible computing tasks. By scheduling applications with different QoS levels appropriately, resource utilization can be improved while application stability is preserved.

• QoS-aware scheduling: includes fine-grained scheduling capabilities such as CPU and GPU topology awareness, resource profiling and hotspot spreading, helping applications optimize runtime performance and improve stability.

• Task scheduling: provides scheduling for big data and AI tasks, such as batch scheduling, priority preemption and elastic quota, so that computing tasks are supported more efficiently.

The Koordinator project is fully compatible with standard K8s and requires no intrusive modifications. ACK also has built-in product support for Koordinator:

• Through co-location scheduling, resource utilization can be improved by 100% in typical scenarios;

• Through differentiated SLO guarantee, the impact of low-priority tasks on delay-sensitive tasks is kept below 5% while resource utilization is improved.
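The idea of sharing a node between delay-sensitive and low-priority work can be sketched in a few lines of Python. This is a toy model, not Koordinator's algorithm: the `Pod` class, the functions `unallocated_cpu`, `reclaimable_cpu` and `admit_batch`, and the 10% safety margin are all hypothetical. It shows how cores that online pods have requested but are not using could be lent to batch tasks.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    requested_cpu: float  # cores reserved for the pod
    used_cpu: float       # cores the pod is consuming right now


def unallocated_cpu(capacity, online_pods):
    """Cores left if batch tasks may only use what nobody has requested."""
    return max(0.0, capacity - sum(p.requested_cpu for p in online_pods))


def reclaimable_cpu(capacity, online_pods, safety_margin=0.1):
    """Cores that low-priority tasks may borrow from the node.

    Actual usage of online pods plus a safety margin (an assumed
    fraction of node capacity) is kept off limits.
    """
    used = sum(p.used_cpu for p in online_pods)
    return max(0.0, capacity - used - capacity * safety_margin)


def admit_batch(capacity, online_pods, batch_requests, safety_margin=0.1):
    """Admit (name, cores) batch tasks in order while the budget lasts."""
    budget = reclaimable_cpu(capacity, online_pods, safety_margin)
    admitted = []
    for name, cpu in batch_requests:
        if cpu <= budget:
            admitted.append(name)
            budget -= cpu
    return admitted
```

On a hypothetical 32-core node whose online pods request 24 cores but use only 8, request-based accounting leaves 8 cores for batch work, while the usage-based budget in this sketch is about 20.8 cores. A production system would also throttle or evict the borrowed tasks when online usage rises, which this sketch omits.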

The complexity of Kubernetes is an important factor that keeps many customers from adopting it. To address this, ACK has released the AIOps suite, which uses intelligent methods to prevent faults and locate them quickly.

Built on Alibaba's 10 years of experience in large-scale container operation and maintenance, the suite combines an expert system with AI algorithms to provide three major functions: full-stack inspection, upgrade checks and intelligent diagnosis.

Intelligent diagnosis currently includes more than 200 diagnostic items, covering 90% of common problem scenarios. Take container network problems as an example: the troubleshooting path is long, complex and time-consuming.

In the DOW business scenario, intelligent diagnosis can quickly locate occasional jitter caused by network stack exceptions. Another example is e-sign: with intelligent diagnosis, full-link troubleshooting across Ingress, container network, application and OS can be completed within minutes.

ACK One is a distributed container platform for multiple regions and clusters, which provides unified management of K8s clusters in the central cloud, local cloud, edge cloud and customer IDCs. This year, Alibaba Cloud released the following functions on top of ACK One:

• A hosted ArgoCD service, with which developers can deliver applications across regions automatically through GitOps;

• An elasticity-aware scheduler that provides flexible elastic computing power for hybrid cloud scenarios;

• Unified management of multi-cluster security policies, ensuring a consistent security baseline across enterprise systems.

Let's look at a specific case. During Zhaopin's peak recruitment season, with the help of the ACK One elastic scheduling strategy, tens of thousands of cores of ECS, ECI and other computing resources can be scaled out within a few minutes to supplement the online service cluster in the IDC, effectively absorbing the traffic surge.
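The bursting pattern in this case, filling on-premises capacity first and overflowing to cloud resources, can be sketched as follows. The function `burst_placement`, the pool names and the capacities are hypothetical and only illustrate the ordering rule; they do not describe the ACK One scheduler.

```python
def burst_placement(replicas, pools):
    """Spread replicas over pools in priority order.

    `pools` is an ordered list of (name, free_slots) pairs. A value of
    None for free_slots marks a pool that is treated as unbounded.
    """
    placement = {}
    remaining = replicas
    for name, free in pools:
        if remaining == 0:
            break
        take = remaining if free is None else min(remaining, free)
        if take > 0:
            placement[name] = take
            remaining -= take
    if remaining:
        raise RuntimeError(f"{remaining} replicas could not be placed")
    return placement
```

With the assumed pools `[("idc", 200), ("ecs", 600), ("eci", None)]`, a request for 150 replicas stays entirely in the IDC, while a request for 1200 spills over to both cloud pools.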

ACK@Edge is a container application platform for cloud, edge and device collaboration, built on the OpenYurt project open-sourced by Alibaba Cloud. ACK@Edge has been upgraded as follows:

In the cloud-edge collaboration scenario, Alibaba Cloud has launched network-enhanced edge node pools, providing a secure and stable cloud-edge network interconnection solution. Lilith, a well-known domestic game company, uses network-enhanced edge node pools to securely connect its overseas multi-region servers with its cloud VPC, reducing network resource costs by 30% compared with leased lines.

In the edge-device collaboration scenario, Alibaba Cloud has launched a lightweight access function that makes it possible to manage container applications on resource-constrained devices through K8s. Yuan Rong Qixing, an autonomous driving startup, uses ACK@Edge to manage applications on in-vehicle devices, reducing the cost of access resources by 50% and improving release and operations efficiency by more than 60%.

Upgrade 3: New architecture

At the application architecture level, new architectures such as the service mesh are used to improve the agility, elasticity and resilience of applications.

The service mesh has become the network infrastructure of cloud native applications. Alibaba Cloud Service Mesh (ASM) has been upgraded in four dimensions:

• Support for a variety of service governance frameworks, enabling interconnection and smooth migration between microservice applications built on Spring Cloud or Apache Dubbo and service mesh applications;

• A uniform identity definition for application services, which simplifies building and enforcing zero trust policies;

• An out-of-the-box Envoy plug-in marketplace that extends the service mesh to more scenarios, including identity authentication and AI Serving;

• More accurate elastic scaling based on service SLA indicators.
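Scaling on a service-level indicator can be illustrated with a proportional rule. The article does not describe ASM's actual algorithm; the sketch below borrows the rule documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × observed / target), as a stand-in. The function name `desired_replicas` and the replica bounds are assumptions for this example.

```python
import math


def desired_replicas(current_replicas, observed, target,
                     min_replicas=1, max_replicas=100):
    """Proportional scaling on one indicator, clamped to replica bounds.

    `observed` and `target` are values of the same per-replica metric,
    for example requests per second or latency in milliseconds.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current_replicas * observed / target)
    return max(min_replicas, min(max_replicas, raw))
```

For example, four replicas observing 300 against a target of 200 would scale to six, and would scale back to two once the observed value drops to 100.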

Zhenkunxing Industrial Supermarket is a digital industrial supplies service platform that has supported many enterprises in resuming work and production. With the help of ASM, the performance of the platform has improved: ASM's software and hardware optimization improves TLS handshake performance by 75% and QPS by more than 30%. For technical details, refer to the technical white paper jointly prepared by Intel and Alibaba Cloud.

Focusing on the digitalization of the catering industry, Hekuo Smart Cloud has switched 100% of the core production systems in its business center to ASM, improving application release efficiency by 70% and reducing the cost of troubleshooting anomalies by 80%.

Upgrade 4: New practice

In the field of operations governance, a series of best practices built into products improves enterprise IT management efficiency in areas such as cost management and security governance.

To help enterprises use and manage the cloud well, Alibaba Cloud has released the cloud native landing zone, which provides best practices for cloud native use of the cloud. It comprises eight modules, among them architecture planning, security management, financial management, and automated operation and maintenance.

The best practices of the cloud native landing zone have helped many domestic and foreign enterprise customers build cloud architectures that meet their business demands for security, stability and cost.

For the financial management part of the landing zone, Alibaba Cloud has launched the ACK FinOps suite, combining the practice of business-finance integration with the concept of FinOps. It helps enterprises achieve cost visualization, optimization and control through digital and intelligent methods.

As a leader in the domestic Internet finance industry, China Insurance has used the ACK FinOps suite to shorten its corporate IT cost governance cycle from quarterly to daily and to reduce its resource idle rate from 30% to less than 10%.

The identification team has improved cluster resource utilization by 10% by applying co-location and elasticity optimizations, and has reduced overall computing cost by more than 20%.
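The resource idle rate quoted above can be read as the share of provisioned capacity that is not in use. The short Python sketch below makes that arithmetic explicit; the function names `idle_rate` and `idle_cost` and all numbers in the usage note are hypothetical and are not customer data.

```python
def idle_rate(provisioned_cores, used_cores):
    """Fraction of provisioned capacity that sits unused."""
    if provisioned_cores <= 0:
        raise ValueError("provisioned_cores must be positive")
    return (provisioned_cores - used_cores) / provisioned_cores


def idle_cost(total_cost, provisioned_cores, used_cores):
    """Portion of the bill attributable to idle capacity."""
    return total_cost * idle_rate(provisioned_cores, used_cores)
```

For an assumed cluster of 1,000 cores with 700 in use, the idle rate is 30%; if that cluster costs 50,000 per month, about 15,000 of it pays for idle capacity. Raising usage to 900 cores brings the idle rate down to 10%.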

For the security protection part of the landing zone, ACK and ACR provide complete DevSecOps product capabilities that give enterprises a safe and reliable software supply chain.

This year, Alibaba Cloud launched a cluster container security overview, which gives security administrators a clearer view of the security posture of a cluster and helps them detect and handle security risks in cluster configuration, application images and container runtime in a timely manner.

Salesforce, the world's leading SaaS vendor, provides advanced CRM service applications on Alibaba Cloud. Based on cloud native DevSecOps capabilities, it has intercepted and blocked thousands of risky images and blocked 10,000 workload deployments by policy within half a year. With a fully automated software supply chain security process, the efficiency of secure application delivery has increased by three times.

In the future, we hope that more enterprises can work with Alibaba Cloud to make use of cloud native technology to improve efficiency and reduce costs, and carry out business innovation in the cloud.
