Three Evolution Directions of Cloud Application Architecture

Cai Junjie, the Elastic Computing Chief Architect of Alibaba Cloud Intelligence, shares his industry experience in the cloud architecture field.

By Alibaba Cloud ECS


On December 10, at the 2021 ECS CloudBuild Summit, Cai Junjie, the Elastic Computing Chief Architect of Alibaba Cloud Intelligence, delivered a speech entitled Reliable, Agile, and Intelligent – Evolution of Cloud Application Architecture. He shared his industry experience in the cloud architecture field. The following are the highlights of his speech.

1. Face Opportunities and Challenges and Use Technology to Drive Business Innovation

Today, every industry is facing different opportunities and challenges.


People's lifestyles and the production mode of society have experienced comprehensive digital transformation. Traditional enterprises and Internet enterprises have seen an online transformation in their production systems, office systems, commercial sales, and customer interactions. For example, transactions in takeout platforms or travel transportation systems can be completed online.

The external environment faced by enterprises is changing rapidly. For example, the preferences and needs of consumers change continuously with consumption levels and the broader environment. Many retail enterprises need to launch products faster and improve the core competitiveness of those products. The same is true for consumer-facing (To C) Internet enterprises. At the Apsara Conference in October 2021, Inke, one of Alibaba Cloud's customers, shared that it launched new applications up to once a week.

What's more, the competition for application update speed is becoming fiercer. These challenges are accompanied by the unpredictable pandemic, regulatory policies, geopolitics, and other factors.

New technologies, such as artificial intelligence, 5G, and big data, have also given enterprises capabilities and tools to innovate and realize transformation and development.

The digitization of an enterprise means that its key businesses all run on the IT system. Therefore, the system needs to be stable and reliable. The rapidly changing market and the fiercely competitive environment also require higher R&D efficiency and a more agile application architecture to support business innovation, thus helping enterprises gain advantages in market competition. It is also important to make good use of new technologies, such as AI and big data, to optimize business management and operations and realize intelligent business upgrades.

All of the above need an excellent IT architecture. An excellent architecture should be reliable, agile, and intelligent.


1.1 How Does the Cloud Help Enterprises Build an Excellent IT Architecture?

Each architecture model has its suitable scenarios. The different stages of the enterprise, the amount of human resources, and the level of developer skills will affect the choice of architecture. There is no such thing as the best architecture, only the most suitable architecture. Besides, the architecture needs continuous evolution to support business development.


In terms of business requirements and technology trends, architecture development should be more reliable, agile, and intelligent.

More enterprises are paying attention to building highly available architectures. They use various methods to improve application reliability, such as active-active, multi-zone, and multi-region deployment and chaos engineering. Microservices and Serverless have also been hot topics in recent years. They are more agile than previous IT architectures and, to some extent, more reliable as well. At the same time, the emergence of the mobile Internet and the Internet of Things has led to an explosion of data. Scenarios that demand large amounts of computing power, such as big data and AI, are also increasing.

1.2 How Can Cloud Computing Help Customers Build a Reliable, Agile, and Intelligent Architecture?

1.2.1 Reliability

The reliability can be divided into the reliability of the infrastructure layer and the reliability of the application layer.


Only on a solid foundation can something more reliable be built. This is a popular saying among architects. Alibaba Cloud provides reliable basic resources. It is the first cloud vendor in the world to provide an availability SLA of 99.975% for a single instance and 99.995% for multi-zone, multi-instance deployments. These figures are made possible by Apsara, which is exclusive to Alibaba Cloud and incorporates a large number of technological innovations, such as intelligent fault prediction with an accuracy rate of more than 70% and hot (live) migration that is imperceptible to 95% of customers.
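To make the two SLA figures more concrete, the following minimal sketch converts an availability percentage into a downtime budget. The 30-day measurement window and the function name are assumptions made for illustration; the actual SLA terms define their own measurement period and exclusions.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Return the downtime budget in minutes for an availability percentage.

    The 30-day default window is an assumption for illustration only.
    """
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * days * MINUTES_PER_DAY


if __name__ == "__main__":
    figures = [
        ("single instance", 99.975),
        ("multi-zone, multi-instance", 99.995),
    ]
    for label, pct in figures:
        budget = allowed_downtime_minutes(pct)
        print(f"{label}: {budget:.2f} minutes per 30 days")
```

Under this assumption, 99.975% corresponds to about 10.8 minutes of downtime per 30 days, and 99.995% to about 2.2 minutes.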

Alibaba Cloud's in-house X-Dragon architecture also allows a smooth and stable performance over the entire system. Customers can enjoy high performances when using Alibaba Cloud products.

The availability of a single instance and a single zone can only guarantee the high reliability of the current region but cannot resist regional failures caused by extreme weather or optical fiber breakage. Therefore, customers need to use disaster recovery solutions, such as multi-zone and multi-region deployment, to ensure high availability at the application layer.

We recommend using mature managed products, such as databases and middleware products. Alibaba Cloud invested a lot in these products, which are more reliable and convenient than self-built products. Ultimately, a highly available application must be a comprehensive architecture and a combination of highly reliable basic resources, highly stable hosting, and highly available design for the application itself.


In addition to its high reliability, the infrastructure needs to be transparent and open. After migrating to the cloud, many customers feel that the infrastructure layer has become a black box. Therefore, they require providers to tell them what is happening in the underlying infrastructure to conduct better active O&M.

This requirement is quite reasonable. Therefore, ECS will open the information at the infrastructure layer to users as much as possible by encapsulating it into different interfaces and events. For example, users can obtain the latest information about cloud servers, operating systems, and other infrastructure at any time. If the system anticipates that the customer's machine may be down or detects that the CPU and memory usage reaches the warning threshold, it will send an event. The customer can choose to subscribe.
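The subscription model described above can be sketched as a small publish/subscribe dispatcher. This is an in-memory illustration, not the ECS API: the `EventBus` class, the event type names, and the instance ID are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventBus:
    """Tiny in-memory publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a handler for one event type."""
        self._handlers[event_type].append(handler)

    def publish(self, event: Event) -> int:
        """Deliver an event to its subscribers; return how many were notified."""
        handlers = self._handlers.get(event["type"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)


def demo() -> List[str]:
    """Subscribe to two hypothetical event types and publish one event."""
    actions: List[str] = []
    bus = EventBus()
    # Hypothetical event type names, chosen for this sketch only.
    bus.subscribe(
        "instance.failure_predicted",
        lambda e: actions.append(f"drain {e['instance_id']}"),
    )
    bus.subscribe(
        "instance.cpu_threshold_reached",
        lambda e: actions.append(f"scale out near {e['instance_id']}"),
    )
    bus.publish({"type": "instance.failure_predicted", "instance_id": "i-example-001"})
    return actions


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the customer decides which events matter and what to do about them, for example draining traffic from an instance before a predicted failure instead of reacting after it goes down.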

According to users' feedback, the most attractive feature that drives them to use Alibaba Cloud products for a long time is that Alibaba Cloud provides comprehensive and diverse interfaces. Alibaba Cloud's interfaces are by far the most comprehensive, abundant, and detailed in China and among the best worldwide.

1.2.2 Agility

The world is changing too fast. The only way to deal with changes is to become faster than changes.

This requires an extremely agile architecture. Similarly, enterprises need to implement agility at two layers: agility at the application layer and agility at the infrastructure layer.


Enterprises are all making efforts to build agile and flexible organizational and software architectures. One of their goals is to reduce dependencies among R&D teams and make iterative evolution easier. So, enterprises are starting to adopt the microservice model, which is also an application architecture in line with cloud-native trends.

The agility of application architectures relies on the agility of infrastructure. The more agile the architecture, the faster it can respond to unusual business peaks, and the smoother the processing. It can reduce costs and provide an unmatched customer experience. The resources needed to absorb instantaneous traffic, such as that of 12306 or Weibo trending topics, can hardly be delivered and deployed in offline data centers in a short time, no matter how agile the upper-layer architecture is.

Agile infrastructure requires two things: agile delivery and efficient management.


As a leading infrastructure vendor, the first thing that Alibaba Cloud needs to achieve is fast delivery.

Alibaba Cloud elastic computing provides a variety of out-of-the-box basic resources. Cloud servers come in hundreds of specifications and provide extreme elasticity. In July 2020, Alibaba Cloud became the first (and only) cloud vendor to pass the performance test of large-scale cloud platforms held by the China Academy of Information and Communications Technology (CAICT). The CAICT staff watched Alibaba Cloud scale out by 10,000 cloud servers in 18 minutes, and that was not its fastest time.

At the Apsara Conference in October 2021, the R&D staff of Alibaba Cloud Elastic Container Instance (ECI) demonstrated scaling out 3,000 pods in six seconds. The powerful auto scaling capability of Alibaba Cloud elastic computing allows customers to quickly deliver and deploy underlying resources to cope with peak traffic or scale out for new businesses.

Alibaba Cloud provides a wide range of billing modes for different resource delivery methods to balance flexibility and cost-effectiveness.


If customers want to be truly agile, they need to be efficient when managing and using computing resources. This requires various automation capabilities. Alibaba Cloud provides a complete set of automation tools for migration, deployment, O&M, and capacity management, covering the entire lifecycle of resources.

For example, Alibaba Cloud Resource Orchestration Service (ROS) can automate the deployment of tens of thousands of cloud servers. During the pandemic, DingTalk used ROS to complete the deployment of more than 10,000 cloud servers in just two hours, ensuring smooth business operation during the traffic peak.

1.2.3 Intelligence

All-round intelligence includes intelligence at the business application layer and intelligence at the infrastructure layer.


At the business layer, enterprises need to introduce machine learning, big data, and other necessary technologies according to their businesses' needs to realize intelligent customer service, autonomous driving, and other capabilities. These capabilities require mass data and computing power as the basis. To this end, Alibaba Cloud elastic computing provides tailor-made big data and local disk instances and GPU and NPU instances for these scenarios, providing the most suitable infrastructure for upper-layer business innovations.

Alibaba Cloud provides a wide range of AI services, machine learning, and big data frameworks at the PaaS layer. Customers can build upper-layer application intelligence with ease.

At the infrastructure layer, Alibaba Cloud's scheduling, fault prediction, and O&M systems make wide use of artificial intelligence technologies, making Alibaba Cloud the world's leading IaaS technology platform. At the same time, Alibaba Cloud uses artificial intelligence technology to provide customers with a smarter infrastructure and improve the user experience.


The latest prediction mode of Alibaba Cloud Auto Scaling can build a model from the CPU usage and the inbound and outbound internal network traffic of a user's scaling group over the last 14 days. It then uses machine learning algorithms to predict the overall usage for the next two days and performs scaling operations automatically. With this feature, regular, recurring scaling is no longer a problem for customers.
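The idea of forecasting from 14 days of history and planning capacity for the next two days can be illustrated with a deliberately simple stand-in. The sketch below uses an hour-of-day average rather than a machine learning model, so it is not the algorithm Auto Scaling uses. The function names, the 60% target utilization, and the assumption that the group size stayed constant during the history window are all hypothetical.

```python
import math
from typing import List

HOURS_PER_DAY = 24


def forecast_hourly_cpu(history: List[float], horizon_days: int = 2) -> List[float]:
    """Forecast CPU % per hour by averaging the same hour of day across history.

    `history` holds hourly average CPU % for whole days, oldest first.
    """
    if not history or len(history) % HOURS_PER_DAY != 0:
        raise ValueError("history must contain whole days of hourly samples")
    days = len(history) // HOURS_PER_DAY
    profile = [
        sum(history[d * HOURS_PER_DAY + h] for d in range(days)) / days
        for h in range(HOURS_PER_DAY)
    ]
    # Repeat the daily profile for each day in the forecast horizon.
    return profile * horizon_days


def planned_capacity(
    forecast_cpu: List[float],
    current_instances: int,
    target_cpu: float = 60.0,
    min_instances: int = 1,
) -> List[int]:
    """Instances needed per hour so the forecast load runs near target_cpu.

    Assumes the history was recorded with `current_instances` instances.
    """
    plan = []
    for cpu in forecast_cpu:
        needed = math.ceil(current_instances * cpu / target_cpu)
        plan.append(max(min_instances, needed))
    return plan


if __name__ == "__main__":
    # Synthetic history: 14 days, busy (90%) from 09:00 to 17:59, quiet (30%) otherwise.
    one_day = [90.0 if 9 <= h < 18 else 30.0 for h in range(HOURS_PER_DAY)]
    history = one_day * 14
    forecast = forecast_hourly_cpu(history, horizon_days=2)
    plan = planned_capacity(forecast, current_instances=4)
    print(plan[:HOURS_PER_DAY])
```

With this synthetic history and four instances, the plan asks for six instances during the busy hours and two during the quiet hours, which is the kind of recurring pattern predictive scaling is meant to handle ahead of time.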

Another feature is Alibaba Cloud's intelligent diagnosis and self-service repair tools. When users encounter ECS-related problems, they can submit tickets or contact customer service. The process takes a long time, which affects the user experience to a certain extent. The instance health diagnosis tool, which uses the NLP and other AI capabilities of the backend, can help users locate possible problems inside and outside ECS. It also provides corresponding repair solutions, shortening the problem-solving cycle from 24 hours to several minutes. Alibaba Cloud is the first cloud vendor in the cloud server field to provide a full-coverage diagnosis capability for users.

Another example is Alibaba Cloud's intelligent resource optimization service. It can identify resource mismatches for users based on their resource usage and recommend suitable instances for users based on their business loads. If the customer's resource utilization rate is low for a long time, the costs will increase. If the CPU load continues to be high, business instability may occur. This service will recommend upgrading the configuration or adding new resources.
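A rule-based version of this kind of recommendation might look like the sketch below. It is an illustration only: the thresholds, the "sustained" share of samples, and the function name are hypothetical and do not describe how the Alibaba Cloud service makes its decisions.

```python
from typing import List


def recommend(
    cpu_samples: List[float],
    low: float = 10.0,
    high: float = 80.0,
    sustained: float = 0.9,
) -> str:
    """Classify an instance from its CPU utilization samples (percent).

    Returns "upgrade" when CPU stays above `high` for at least `sustained`
    of the samples, "downsize" when it stays below `low` for at least
    `sustained` of the samples, and "keep" otherwise.
    """
    if not cpu_samples:
        raise ValueError("need at least one sample")
    total = len(cpu_samples)
    high_share = sum(1 for s in cpu_samples if s > high) / total
    low_share = sum(1 for s in cpu_samples if s < low) / total
    if high_share >= sustained:
        return "upgrade"
    if low_share >= sustained:
        return "downsize"
    return "keep"


if __name__ == "__main__":
    print(recommend([5.0] * 24))   # long-term idle
    print(recommend([92.0] * 24))  # continuously high load
    print(recommend([45.0] * 24))  # reasonable match
```

Requiring the condition to hold for most of the samples, rather than looking at a single peak or the average alone, keeps short bursts from triggering a resize.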


On the whole, Alibaba Cloud elastic computing is not just a platform providing computing resources but has evolved into a cloud platform that supports services for the full application lifecycle. Alibaba Cloud helps customers build reliable, agile, and intelligent cloud architectures through strong, reliable, and full-scenario cloud servers, efficient and intelligent automated O&M suites, and flexible and elastic resource supply. In 2021, Alibaba Cloud launched Wuying cloud computer for office scenarios and Compute Nest to help partners migrate services to the cloud.


Making good use of the cloud to build an excellent application architecture brings considerable value to the business. It ensures business continuity and smooth operation, reduces business risk and costs, and improves the efficiency and happiness of teams.

Cai Junjie shared two practical cases at the end of his speech.


Originally, Shentong Express used physical data centers as its computing and data storage platform. Demand for resources surged during Double 11, and after the promotion, the resources sat idle. After the cloud migration, almost all resources are purchased on a pay-as-you-go basis and released after Double 11. This is true out-of-the-box usage without wasting resources. Compared with the traditional IDC architecture used during Double 11 in 2019, the X-Dragon bare metal server + container service solution used during Double 11 in 2020 helped Shentong Express significantly increase business volume while reducing IT investment by 30%.

By using the X-Dragon Super Computing Cluster (SCC), Geely Automobile improved efficiency by 20% on the same hardware. The task queuing time was shortened by about three times, and the cluster could scale automatically with the business. In the end, simulation efficiency improved by nearly 30%, and the time from model design to market launch was shortened by several months.

2. From Cloud Migration to Making Good Use of the Cloud - Grasp the Technology Dividend


Cloud Migration Has Become a Consensus in the Industry.

Cloud computing has been developing for more than ten years, but it is still in its early stages. We have observed that many customers have not taken full advantage of the cloud. For example, O&M is the field where the cloud has the greatest impact, but many customers are still in a semi-manual, semi-automated stage. Therefore, the focus of many enterprises has shifted from cloud migration to making good use of the cloud. We believe making good use of the cloud will unleash huge technological dividends for enterprises over the next ten years.
