Application Architecture Evolution in the Cloud Native Era

Background

Looking back at the evolution of application service architecture, and judging by how service callers and providers handle requests, we can divide it into three stages.

The first stage is centralized load balancing: the service caller reaches the corresponding service provider through an external load balancer. The advantages are clear. It is non-invasive to the application, it supports applications written in multiple languages and frameworks, and because load balancing is managed centrally, deployment is simple. The disadvantages are just as significant: centralization limits scalability, and the service governance capabilities of a centralized load balancer are relatively weak.

The second stage is distributed governance of microservices: governance capabilities are built into the service caller and integrated into the application as SDK libraries. This brings good overall scalability and strong service governance, but it also has drawbacks: it intrudes into the application itself, dependence on the SDK makes multiple languages hard to support, and distributed management makes deployment more complex.

The third stage is today's service mesh technology. By moving these service governance capabilities into a sidecar, the mesh decouples service governance from the application itself. This gives better support for multiple programming languages, and the sidecar capabilities do not depend on any specific technical framework. The sidecar proxies form a mesh-shaped data plane through which traffic between all services is processed and observed, and the control plane manages these sidecar proxies in a unified way. The price is a certain degree of added complexity.

The following diagram shows the architecture of the service mesh. As mentioned earlier, each application service instance is accompanied by a sidecar proxy, and the business code is unaware of it. The sidecar proxy intercepts application traffic and provides three major functions: traffic governance, security, and observability.
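To make the sidecar idea concrete, here is a minimal Python sketch, not taken from ASM or Envoy. It wraps an unmodified handler with a hypothetical Sidecar class that applies a caller allow-list (security) and keeps simple counters (observability). All names are assumptions made for illustration, and a real sidecar intercepts traffic at the network level rather than wrapping a function.

```python
from dataclasses import dataclass
from typing import Callable, Set

# Hypothetical handler type: takes a request path, returns a response string.
Handler = Callable[[str], str]


@dataclass
class Sidecar:
    """Illustrative stand-in for a sidecar proxy (not an Envoy or ASM API)."""
    app: Handler                 # business code, unaware of the proxy
    allowed_callers: Set[str]    # security: which callers may reach the app
    request_count: int = 0       # observability: total requests seen
    denied_count: int = 0        # observability: requests rejected by policy

    def handle(self, caller: str, path: str) -> str:
        # Every request passes through the proxy before reaching the app.
        self.request_count += 1
        if caller not in self.allowed_callers:
            self.denied_count += 1
            return "403 forbidden"
        return self.app(path)


def business_logic(path: str) -> str:
    # The application contains no governance code at all.
    return f"200 ok {path}"
```

A call such as Sidecar(business_logic, {"frontend"}).handle("frontend", "/orders") reaches the application, while a caller outside the allow-list is rejected before the business code runs.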

In the cloud native application model, an application may contain several services, and each service consists of several instances. The sidecar proxies of hundreds or thousands of applications together form the data plane, shown as the data plane layer in the figure.

Managing these sidecar proxies uniformly is the problem that the control plane solves. The control plane is the brain of the service mesh: it pushes configuration to the sidecar proxies in the data plane, manages how data plane components behave, and gives mesh users a unified API for operating the mesh's management capabilities.
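The relationship between control plane and data plane can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the protocol a real mesh uses to distribute configuration: the ControlPlane and SidecarProxy classes and their methods are hypothetical, and configuration is reduced to a versioned dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SidecarProxy:
    """Hypothetical data plane proxy that accepts versioned configuration."""
    name: str
    config: Dict[str, str] = field(default_factory=dict)
    version: int = 0

    def receive(self, version: int, config: Dict[str, str]) -> None:
        # Ignore stale or duplicate pushes; only newer versions are applied.
        if version > self.version:
            self.version = version
            self.config = dict(config)


@dataclass
class ControlPlane:
    """Hypothetical control plane holding the desired state for all proxies."""
    proxies: List[SidecarProxy] = field(default_factory=list)
    desired: Dict[str, str] = field(default_factory=dict)
    version: int = 0

    def register(self, proxy: SidecarProxy) -> None:
        # A newly started proxy immediately receives the current state.
        self.proxies.append(proxy)
        proxy.receive(self.version, self.desired)

    def apply(self, changes: Dict[str, str]) -> None:
        # One declarative change is fanned out to every registered proxy.
        self.desired.update(changes)
        self.version += 1
        for proxy in self.proxies:
            proxy.receive(self.version, self.desired)
```

The point of the design is that users change the desired state once, through one API, and every proxy converges on it, including proxies that join later.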

Generally speaking, once the service mesh is enabled, developers, operations staff, and SRE teams can address application service management issues in a unified, declarative manner.

Cloud Native Application Infrastructure Supported by Service Mesh

As a core technology for managing communication between application services, the service mesh provides secure, reliable, fast, and application-transparent traffic routing, security, and observability for calls between services.

The cloud native application infrastructure supported by the service mesh brings important advantages, which fall into six areas.

Advantage 1: Unified governance of heterogeneous services

• Interoperability and governance across languages and frameworks, with a dual-mode architecture that integrates with traditional microservice systems

• Fine-grained multi-protocol traffic control, with unified management of east-west and north-south traffic

• Automated service discovery across a unified heterogeneous computing infrastructure

Advantage 2: End-to-end observability

• Integrated intelligent operations combining logging, monitoring, and tracing

• Intuitive, easy-to-use visual mesh topology with color-based health indicators

• Built-in best practices and self-service mesh diagnostics

Advantage 3: Zero trust security

• End-to-end mTLS encryption and attribute-based access control (ABAC)

• OPA declarative policy engine and globally unique workload identity

• Complete audit history and insight analysis with dashboards

Advantage 4: Combined software and hardware performance optimization

• The first service mesh platform to use Intel Multi-Buffer technology to accelerate TLS encryption and decryption

• NFD (Node Feature Discovery) automatically detects hardware features and adaptively enables features such as AVX instruction sets and QAT acceleration

• Among the first to pass the advanced-level Trusted Cloud certification for service mesh platforms and performance evaluation

Advantage 5: SLO-driven application elasticity

• Service Level Objective (SLO) policies

• Automatic elastic scaling of application services based on observability data

• Automatic failover and fault tolerance across multiple clusters during traffic bursts

Advantage 6: Out-of-the-box extensions and ecosystem compatibility

• Out-of-the-box EnvoyFilter plug-in marketplace and WebAssembly plug-in lifecycle management

• Unified integration with Proxyless mode, supporting both SDK and kernel eBPF approaches

• Compatible with the Istio ecosystem, with support for Serverless/Knative and AI Serving/KServe

The following figure shows the current architecture of the Alibaba Cloud Service Mesh (ASM) product. As the industry's first fully managed, Istio-compatible service mesh product, ASM has kept its architecture consistent with community and industry trends from the beginning. The control plane components are hosted on the Alibaba Cloud side, independent of the user clusters on the data plane side. ASM is customized and built on open source Istio, and it provides the components needed for fine-grained traffic management and security management on the managed control plane side. The managed mode decouples the lifecycle of the Istio components from that of the managed Kubernetes clusters, which makes the architecture more flexible and the system more scalable.

As the infrastructure for unified management of multiple heterogeneous types of computing services, the managed service mesh ASM provides unified traffic management, unified service security, unified service observability, and unified proxy extensibility based on WebAssembly, and on these it builds enterprise-level capabilities.

Where Service Mesh Technology Goes Next

To sum up, integrating the Sidecar Proxy and Proxyless modes means that one control plane supports different forms of data plane. One control plane means that the ASM managed-side components serve as the single, standard control entry point. This control plane runs on the Alibaba Cloud side in fully managed mode.

The data plane supports both the Sidecar Proxy and Proxyless modes. Although the data plane components are not hosted, they are still managed: ASM handles their lifecycle uniformly, including distribution to the data plane, upgrades, and uninstallation.

Specifically, in Sidecar Proxy mode, in addition to the current standard Envoy proxy, our architecture can readily support other sidecars such as the Dapr sidecar. Microsoft OSM with Dapr currently uses this dual-sidecar mode.

In Proxyless mode, an SDK approach can be used to raise QPS and reduce latency. For example, gRPC already supports xDS clients, and our Dubbo team is moving in the same direction. I expect that the Northern Latitude team and I can make some breakthroughs on this together this year.

Another Proxyless mode combines kernel eBPF with a node-level proxy. This is a fundamental change from the sidecar model: each node runs only one proxy, and work can be offloaded to the node. We will also launch some products in this area this year.

Around service mesh technology, the industry has built a series of application-centric ecosystems. Alibaba Cloud's managed service mesh ASM supports the following ones:

Lifecycle Management and DevOps Innovation in Modern Software Development

The core principles of the service mesh (security, reliability, and observability) support lifecycle management and DevOps innovation in modern software development, and bring flexibility, scalability, and testability to architecture design, development, automated deployment, and operations in a cloud computing environment. The service mesh therefore provides a solid foundation for modern software development, and any team building and deploying applications on Kubernetes should seriously consider adopting one.

An important part of DevOps is building continuous integration and continuous deployment (CI/CD) pipelines that deliver containerized application code to production faster and more reliably. Enabling canary or blue-green deployments in the CI/CD pipeline allows more robust testing of new application versions in production, backed by a safe rollback strategy. The service mesh helps here by controlling traffic for canary deployments in production. Alibaba Cloud Service Mesh ASM currently supports integration with ArgoCD, Argo Rollouts, KubeVela, Alibaba Cloud Yunxiao (Cloud Efficiency), Flagger, and other systems to achieve blue-green or canary releases, as follows:

ArgoCD [1] monitors changes to the application orchestration definitions in a Git repository, compares them with the actual running state of the applications in the cluster, and synchronizes the changes into the deployment cluster either automatically or manually. Integrating ArgoCD with Alibaba Cloud Service Mesh ASM for application release and update reduces operation and maintenance costs.

Argo Rollouts [2] provides more powerful blue-green and canary deployment capabilities. In practice, it can be combined with ArgoCD to provide progressive delivery based on GitOps.

KubeVela [3] is an out-of-the-box, modern application delivery and management platform. Combining ASM with KubeVela enables progressive canary releases, so that applications can be upgraded smoothly.

Alibaba Cloud Yunxiao (Cloud Efficiency) Flow pipelines [4] provide blue-green releases of Kubernetes applications based on Alibaba Cloud Service Mesh ASM.

Flagger [5] is another progressive delivery tool that automates the release process for applications running on Kubernetes. It reduces the risk of introducing new software versions into production by gradually shifting traffic to the new version while measuring metrics and running conformance tests. ASM already supports this progressive release capability through Flagger.
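The progressive delivery pattern that these tools share can be sketched as a small control loop. The following Python function is an illustration only and is not how Flagger or Argo Rollouts are implemented: the function name, the step size, the error-rate threshold, and the measure_error_rate callback are all assumptions chosen for the example.

```python
from typing import Callable, List, Tuple


def progressive_rollout(
    measure_error_rate: Callable[[int], float],
    step: int = 20,
    max_error_rate: float = 0.01,
) -> Tuple[str, List[int]]:
    """Shift traffic to the new version in steps, rolling back on bad metrics.

    measure_error_rate is a hypothetical callback that returns the observed
    error rate of the new version at the given traffic weight (0-100).
    Returns the final outcome and the sequence of canary weights applied.
    """
    weights: List[int] = []
    weight = 0
    while weight < 100:
        weight = min(100, weight + step)
        weights.append(weight)
        if measure_error_rate(weight) > max_error_rate:
            # Metrics breached the threshold: send all traffic back.
            weights.append(0)
            return "rolled_back", weights
    return "promoted", weights
```

With a healthy new version the weights go 20, 40, 60, 80, 100 and the release is promoted. If the error rate rises once the canary receives 60 percent of traffic, the sequence ends with 0 and the release is rolled back.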

Microservice framework compatibility [6]

ASM supports the seamless migration of Spring Boot and Spring Cloud applications to the service mesh for unified management and governance, and it addresses typical problems encountered during integration, including common scenarios such as connecting services inside and outside the container cluster and connecting services written in different languages.

Serverless containers and automatic scaling based on traffic patterns [7]

Serverless and service mesh are two popular cloud native technologies, and customers are exploring how to create value from them. As we work through these solutions with customers, questions often come up about where the two technologies intersect and how they complement each other. Can we use a service mesh to secure, observe, and expose Knative serverless applications? ASM supports Knative-based serverless containers on the managed service mesh platform, together with automatic scaling based on traffic patterns. The managed mesh takes over the complexity of maintaining the underlying infrastructure, so users can easily build their own serverless platform.

AI Serving [8]

Kubeflow Serving is a Google-led community project that supports machine learning serving on Kubernetes. Its next generation has been renamed KServe. The project aims to support different machine learning frameworks in a cloud native way, and to implement traffic control and model version updates and rollbacks on top of the service mesh.

Zero Trust Security and Policy As Code [9]

On top of using Kubernetes Network Policy for Layer 3 network security control, ASM provides peer authentication and request authentication, Istio authorization policies, and more fine-grained policy control based on OPA (Open Policy Agent).

Specifically, building a zero trust security system on the service mesh involves the following aspects:

• The foundation of zero trust: workload identity. To give cloud native workloads a unified identity, ASM provides a simple, easy-to-use identity definition for every workload in the mesh, offers customization mechanisms to extend the identity system for specific scenarios, and remains compatible with the community SPIFFE standard.

• The carrier of zero trust: security certificates. ASM provides mechanisms to issue certificates, manage their lifecycle, and rotate them. Identity is established through X.509 TLS certificates, which every proxy uses, and both certificates and private keys are rotated.

• The engine of zero trust: policy execution. A policy-based trust engine is the core of zero trust. In addition to Istio RBAC authorization policies, ASM provides more fine-grained authorization policies based on OPA.

• The insight of zero trust: visualization and analysis. ASM provides observability mechanisms for monitoring policy execution logs and metrics, so that the execution of each policy can be assessed.
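How workload identity and policy execution fit together can be sketched as follows. The sketch assumes the SPIFFE ID layout used by Istio (spiffe://trust-domain/ns/namespace/sa/service-account) and a deny-by-default rule check. The WorkloadIdentity and AllowRule types and the is_allowed function are hypothetical and are not the Istio or OPA policy model.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple
from urllib.parse import urlparse


@dataclass(frozen=True)
class WorkloadIdentity:
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(spiffe_id: str) -> WorkloadIdentity:
    """Parse an ID of the form spiffe://<trust-domain>/ns/<ns>/sa/<sa>."""
    parsed = urlparse(spiffe_id)
    parts = parsed.path.strip("/").split("/")
    if (parsed.scheme != "spiffe" or not parsed.netloc
            or len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa"):
        raise ValueError(f"unsupported SPIFFE ID: {spiffe_id}")
    return WorkloadIdentity(parsed.netloc, parts[1], parts[3])


@dataclass(frozen=True)
class AllowRule:
    """Hypothetical allow rule keyed on the caller identity and the request."""
    namespace: str
    service_account: str
    methods: Tuple[str, ...]
    path_prefix: str


def is_allowed(identity: WorkloadIdentity, method: str, path: str,
               rules: Iterable[AllowRule]) -> bool:
    # Deny by default: a request passes only if some rule matches it fully.
    for rule in rules:
        if (identity.namespace == rule.namespace
                and identity.service_account == rule.service_account
                and method in rule.methods
                and path.startswith(rule.path_prefix)):
            return True
    return False
```

The identity comes from the certificate presented by the calling proxy, so the decision depends on who the workload is rather than on its network address, which is the core of the zero trust approach described above.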

Becoming a cloud native application brings a lot of business value. One example is elastic scaling, which copes better with traffic peaks and troughs and so reduces costs and improves efficiency. ASM generates telemetry data for communication between application services in a non-intrusive way, so collecting metrics requires no changes to the application logic.
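To show how mesh telemetry can drive scaling, the sketch below uses the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The article does not state which rule ASM applies, so treat the choice of rule, the function name, and the replica bounds as assumptions.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_value: float,
    target_value: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Proportional scaling rule in the style of the Kubernetes HPA.

    current_value is the observed per-replica metric (for example requests
    per second reported by the mesh); target_value is the per-replica goal.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    raw = math.ceil(current_replicas * current_value / target_value)
    # Clamp to the configured bounds (hypothetical defaults).
    return max(min_replicas, min(max_replicas, raw))
```

For example, two replicas each handling 150 requests per second against a target of 100 scale out to three replicas, and low traffic scales in toward the minimum.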

Following the four golden signals of monitoring (latency, traffic, errors, and saturation), ASM generates a series of metrics for the services it manages, across multiple protocols including HTTP, HTTP/2, gRPC, and TCP.
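As a simplified illustration of the four signals, the following Python sketch derives them from a list of request records. The RequestRecord type, the choice of the 95th percentile for latency, the treatment of 5xx status codes as errors, and the in-flight over capacity definition of saturation are assumptions for the example. They are not the metric definitions that ASM or Istio emit.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    duration_ms: float
    status_code: int


def golden_signals(records: List[RequestRecord], window_seconds: float,
                   in_flight: int, capacity: int) -> Dict[str, float]:
    """Compute latency, traffic, errors, and saturation for one time window."""
    if not records or window_seconds <= 0 or capacity <= 0:
        raise ValueError("need records, a positive window, and capacity")
    durations = sorted(r.duration_ms for r in records)
    # Nearest-rank 95th percentile.
    p95_index = math.ceil(0.95 * len(durations)) - 1
    errors = sum(1 for r in records if r.status_code >= 500)
    return {
        "latency_p95_ms": durations[p95_index],
        "traffic_rps": len(records) / window_seconds,
        "error_rate": errors / len(records),
        "saturation": in_flight / capacity,
    }
```

Because the proxy sees every request, values like these can be produced without any instrumentation in the application code.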

In addition, the service mesh has more than 20 built-in monitoring labels, supports all Envoy proxy metric attribute definitions and the Common Expression Language (CEL), and supports customizing the metrics that Istio generates.

At the same time, we are exploring new scenarios that the service mesh can drive. Here is an example of AI Serving [10].

This requirement also comes from real customers, whose scenario is to run KServe on top of the service mesh to deliver AI services. KServe runs smoothly on the service mesh and gains capabilities such as blue-green and canary deployment of model services and traffic distribution between revisions. It supports deploying automatically scaling serverless inference workloads, with high scalability and intelligent, concurrency-based load routing.
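Traffic distribution between two model revisions can be illustrated with a percentage split. The sketch below assigns each request to a bucket by hashing a request ID, which keeps the decision deterministic and repeatable. This is one possible technique chosen for the example, not the mechanism KServe or the mesh is stated to use, and the function and revision names are hypothetical.

```python
import hashlib


def pick_revision(request_id: str, canary_percent: int) -> str:
    """Route a request to the canary or stable revision by hashed bucket."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in the range 0-99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percent else "stable"
```

Setting canary_percent to 10 sends roughly one request in ten to the new model revision, and raising it step by step completes the canary release.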

Summary

As the industry's first fully managed, Istio-compatible service mesh product, Alibaba Cloud ASM has kept its architecture consistent with community and industry trends from the beginning. The control plane components are hosted on the Alibaba Cloud side, independent of the user clusters on the data plane side. ASM is customized and built on community Istio, and it provides the components needed for fine-grained traffic management and security management on the managed control plane side. The managed mode decouples the lifecycle of the Istio components from that of the managed Kubernetes clusters, which makes the architecture more flexible and the system more scalable.

Since April 1, 2022, Alibaba Cloud Service Mesh ASM has been available as an official commercial version, offering richer capabilities, larger-scale support, and more complete technical support to better meet the needs of different customer scenarios.
