
The Practice of Service Governance on Dubbo 3.0.0

This article introduces Dubbo 3.0.0 and walks through the service governance capabilities that EDAS, MSE, and SAE provide for Dubbo applications.

By Shimian

1. An Introduction to Dubbo 3.0.0

Since Apache Dubbo was open-sourced in 2011, many large Internet and IT companies have accumulated years of production experience with it. Dubbo has become one of the mainstream RPC frameworks inside and outside China, recognized for its ease of use, rich features, and strong service governance capabilities for Java.

However, the cloud-native era poses new challenges to Java microservice governance systems such as Apache Dubbo and Spring Cloud. Applications are expected to start faster, communication protocols should pass more easily through network infrastructure such as gateways and proxies, and more languages need to be supported. For example, Spring launched the GraalVM-based Spring Native Beta in 2021, which offers millisecond-level startup and higher processing performance.

These challenges place two major requirements on the next generation of Apache Dubbo:

1) It should retain the benefits brought by the existing out-of-the-box features and implementation practices. This is expected by many developers.

2) It should follow cloud-native ideas as much as possible to reuse the underlying cloud-native infrastructure and fit the cloud-native microservices model.

Dubbo 3.0.0 was born in the cloud-native era. Microservices built with Dubbo follow cloud-native ideas. They can reuse the underlying cloud-native infrastructure and fit the cloud-native microservices model better. This is reflected in the following aspects.

  • Services can be deployed on containers and Kubernetes platforms, and the service lifecycle can be aligned with the platform scheduling cycle.
  • The classic Service Mesh architecture is supported, and a Proxyless Mesh architecture is introduced to simplify implementation and reduce the cost of migrating to Mesh. This gives users more flexible choices.
  • As a bridge layer, Dubbo supports interoperability and interconnection with heterogeneous microservice systems, such as Spring Cloud and gRPC.

Against this background, Apache Dubbo 3.0.0 fully embraces cloud-native. It upgrades the Dubbo architecture and introduces a new service discovery model, a next-generation RPC protocol, and adaptation to cloud-native infrastructure.

2. Dubbo 3.0.0 Commercial Edition

I will introduce three cloud services related to microservice governance on Alibaba Cloud below: EDAS, MSE, and SAE.

  • Enterprise Distributed Application Service (EDAS)

EDAS is an aPaaS service of Alibaba Cloud and an all-in-one deployment and release platform. It provides a series of capabilities, such as microservice governance, monitoring, stress testing, and throttling and degradation. It is also an AIOps platform.

EDAS 3.0 is the preferred online hosting platform for distributed architectures, enterprise digitalization, and cloud migration. It provides intelligent and automated solutions for user applications in multiple dimensions, such as microservice governance, application release and change, and intelligent O&M.

"At the PaaS level, we always embrace open-source technologies and remain compatible with the community edition in a timely manner. In terms of enterprise features, such as service governance and application monitoring, we provide a stable and mature service to lower the threshold for enterprises to build Internet-based applications. EDAS 3.0 is such a service," said Jiang Jiangwei, partner of Alibaba and Senior Researcher of the Intelligent Basic Products Division of Alibaba Cloud.


  • Microservice Engine (MSE)

MSE is an all-in-one microservice platform for the mainstream open-source microservice ecosystem. It helps users use open-source microservice technology to build microservice systems more stably, conveniently, and at a lower cost. MSE provides a registry center, fully hosted configuration center (compatible with Nacos, ZooKeeper, and Eureka), gateways (compatible with Zuul, Kong, and Spring Cloud Gateway), and enhanced service governance capabilities with non-intrusive open-source features.


  • Serverless Application Engine (SAE)

SAE is the first application-oriented Serverless PaaS service, providing an all-in-one application hosting solution with lower costs and higher efficiency. SAE supports zero transformation for Spring Cloud, Dubbo, and HSF applications to migrate to the cloud. It provides capabilities, such as monitoring, diagnosis, automatic image building, Java complete-process acceleration, multiple release policies, and automatic elasticity within seconds. It also supports the deployment of applications through Jenkins, Apsara DevOps, or plug-ins, and applications in any language can be deployed through Docker images.


All the service governance capabilities of the three cloud services above work out of the box and support all open-source Dubbo and Spring Cloud versions released in the past five years, including Dubbo 3.0.0. You do not have to modify any code or configuration to use these capabilities. You only need to connect your Dubbo 3.0.0 applications to EDAS, MSE, or SAE.

2.1 The Complete Details about Service Contracts

Service contracts are the basis for all governance functions. To test a service, we need to know which methods it provides so that the request template can be filled in automatically from the method parameters. To configure traffic rules, we need to know the method parameters so that rules can match traffic characteristics. To configure service degradation and service authentication, we need the exact method names, parameter types, and return values so that different policies can be applied to different methods.

Open-source Swagger works well for this, but the service contract of MSE is simpler and more efficient. With Swagger, you need to add a dependency, add @Api annotations to your code, and start a Swagger server to see the details. In addition, Swagger has not been adapted to Dubbo.

Users should be able to use service contracts without modifying any code or configuration, and application code and images of earlier versions must also be supported. If extra development were required, the intrusiveness would slow down iteration. We therefore chose an agent-based approach: once the application is connected to One Agent, the details of its microservices appear on the MSE console, with no code or configuration changes.


If the application is already connected to Swagger, we also ensure good compatibility. The list below shows what you can do with service contracts, which now support Spring Cloud and Dubbo applications:

  • Check which services are registered by the application and the consumers of these services in detail
  • Check all the microservice methods provided by the application in detail
  • Check return values and parameters of all methods provided by the application in detail
  • Service testing, service degradation, and service authentication can directly obtain service contract data for subsequent governance rule configuration.
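To make the idea of a service contract concrete, the sketch below uses plain Java reflection to list the methods, parameter types, and return types of a service interface. The `CartService` interface and the `ContractExtractor` class are hypothetical names written for this example, and reflection is only one simple way to obtain this data. It is not a description of how the MSE agent collects contracts.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical service interface, used only for illustration.
interface CartService {
    boolean addItemToCart(String userId, String productId, int quantity);
    List<String> viewCart(String userId);
}

final class ContractExtractor {
    // Returns one line per method in the form "returnType name(paramType, ...)".
    static List<String> describe(Class<?> serviceInterface) {
        List<String> lines = new ArrayList<>();
        for (Method m : serviceInterface.getMethods()) {
            StringBuilder sb = new StringBuilder();
            sb.append(m.getReturnType().getSimpleName())
              .append(' ')
              .append(m.getName())
              .append('(');
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(params[i].getSimpleName());
            }
            lines.add(sb.append(')').toString());
        }
        // getMethods() returns methods in no particular order, so sort for stable output.
        Collections.sort(lines);
        return lines;
    }

    public static void main(String[] args) {
        describe(CartService.class).forEach(System.out::println);
    }
}
```

A governance console could use data of this shape to pre-fill test requests or to offer method names when a degradation or authentication rule is configured.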

2.2 Traffic Control of the Complete Process

Currently, open-source frameworks do not support complete-process traffic control in microservice scenarios, that is, making traffic hit the grayscale version of each application precisely along the entire call chain. However, the following scenarios are common and make complete-process traffic control necessary.

2.2.1 Isolate the Project and Test Environment

First, we create new project environments and give each one a unique project label. Traffic that carries this label is routed to the matching project environment. All other traffic is routed to the backbone environment. The benefit of project environment isolation is that each developer can have their own project environment, so developers do not interfere with one another.

2.2.2 Achieve Complete-Process Grayscale

First, we set aside grayscale machines and deploy the grayscale version of each online application on them. All grayscale traffic enters the grayscale version, and normal traffic enters the production version. The grayscale version only handles grayscale traffic, which reduces risks. When N applications are in gray release at the same time, grayscale traffic must be routed among the grayscale versions of all N applications.

The following figure shows that traffic control allows our development personnel to deploy their applications in development environment 1 and environment 2. This achieves environment isolation and complete-process grayscale.


Without complete-process traffic control, various development, grayscale, and production environments must be isolated logically or physically, which requires the deployment of N complete sets of microservices architectures. The costs will be very high.


As we can see, the solution based on complete-process traffic control shown in the figure above is more feasible. We provide the capability to control complete-process traffic with out-of-the-box features. Now, I will introduce the complete-process traffic control function by taking the scenario of placing orders in the e-commerce architecture as an example.

After a customer places an order, the traffic comes in from the ingress application (or microservice gateway). The ingress application calls the transaction center, the transaction center calls the commodity center, and the commodity center calls the downstream inventory center. The transaction center and the commodity center are both running new versions, V1.0 and V2.0, which need grayscale verification. Request traffic that meets specific traffic control rules at the ingress application (or microservice gateway) should be routed to the new versions. All remaining traffic is routed to the online applications, which run the official version.
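The hop-by-hop behavior in this scenario can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the EDAS implementation: the class name `FullLinkRouter`, the application names, and the `base` label for the official version are all hypothetical. The request carries one tag along the whole chain, and each hop uses the tagged version if the application has one and falls back to the official version otherwise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FullLinkRouter {
    // Hypothetical label for the official (online) version.
    static final String BASE = "base";

    // Application name -> versions currently deployed for it.
    private final Map<String, Set<String>> deployed = new HashMap<>();

    void deploy(String app, String... versions) {
        deployed.put(app, new HashSet<>(Arrays.asList(versions)));
    }

    // Picks the version of one application for a request carrying `tag` (may be null).
    String pick(String app, String tag) {
        Set<String> versions = deployed.get(app);
        if (versions == null) {
            throw new IllegalArgumentException("unknown application: " + app);
        }
        return (tag != null && versions.contains(tag)) ? tag : BASE;
    }

    // Routes a request along the call chain; the tag is passed through unchanged at every hop.
    List<String> route(List<String> chain, String tag) {
        List<String> path = new ArrayList<>();
        for (String app : chain) {
            path.add(app + ":" + pick(app, tag));
        }
        return path;
    }

    public static void main(String[] args) {
        FullLinkRouter router = new FullLinkRouter();
        router.deploy("transaction-center", BASE, "v1.0", "v2.0");
        router.deploy("commodity-center", BASE, "v1.0", "v2.0");
        router.deploy("inventory-center", BASE);
        List<String> chain = Arrays.asList(
                "transaction-center", "commodity-center", "inventory-center");
        System.out.println(router.route(chain, "v2.0"));
        System.out.println(router.route(chain, null));
    }
}
```

Because the inventory center has no new version in this sketch, tagged requests fall back to its official version at that hop while still carrying the tag for any application further downstream.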


We only need to create the following complete-process traffic control rules in the EDAS console:


We also provide a monitoring dashboard for traffic control, which allows you to view the QPS metric of each application in real-time to check whether the traffic trend meets expectations.


2.3 Tag Routing

EDAS/MSE service governance provides tag routing for traffic control. Each Pod or ECS instance can be tagged. Once a tag is recognized, it is displayed on the console, where we can set proportion rules and content rules for it.


We can set the traffic proportion for each tag:


We can set the content traffic rules for each tag:


If there is a complete-process demand, the "Whether to pass through" switch above can be used to pass through tags.
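As an illustration of a proportion rule, the following sketch maps each request to a bucket between 0 and 99 and walks the cumulative percentages to choose a tag. The class `TagProportionRouter` and the tag names are hypothetical, and the console may implement proportions differently. The point is only to show how a percentage per tag translates into a routing decision.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

final class TagProportionRouter {
    // Tag -> percentage of traffic; insertion order is preserved.
    private final Map<String, Integer> percentByTag = new LinkedHashMap<>();
    private int totalPercent = 0;

    TagProportionRouter add(String tag, int percent) {
        if (percent < 0 || totalPercent + percent > 100) {
            throw new IllegalArgumentException("percentages must stay within 0..100 in total");
        }
        percentByTag.put(tag, percent);
        totalPercent += percent;
        return this;
    }

    // Maps a bucket in [0, 100) to a tag by walking the cumulative percentages.
    // Returns null when the bucket falls outside every tag's share,
    // in which case the caller would use untagged instances.
    String tagForBucket(int bucket) {
        if (bucket < 0 || bucket >= 100) {
            throw new IllegalArgumentException("bucket must be in [0, 100)");
        }
        int upper = 0;
        for (Map.Entry<String, Integer> entry : percentByTag.entrySet()) {
            upper += entry.getValue();
            if (bucket < upper) {
                return entry.getKey();
            }
        }
        return null;
    }

    // Picks a tag for a new request using a random bucket.
    String pick() {
        return tagForBucket(ThreadLocalRandom.current().nextInt(100));
    }
}
```

With `new TagProportionRouter().add("gray", 20).add("blue", 30)`, buckets 0 to 19 go to `gray`, buckets 20 to 49 go to `blue`, and the remaining half of the traffic stays untagged.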

2.4 Service Testing with Out-of-the-Box Features

Service testing provides users with a Postman-like tool for private networks on the cloud, so they can call their own services easily. Users do not need to care about the service protocol, build their own testing tools, or understand the complex network topology on the cloud. They only need to call services from the console. Service testing supports the Dubbo 3.0.0 framework and Triple, the main protocol of Dubbo 3.0.0.


2.5 Removal of Outlier Instances

In microservice architecture, service calls are affected if consumers cannot perceive the exceptions on the application instances of a provider in a timely manner. This affects the performance and availability of the services provided for consumers.


As shown in the figure above, the system includes applications A, B, C, and D, and application A calls applications B, C, and D. If application A cannot perceive the abnormal instances of applications B, C, or D, some of its calls fail. In the figure, application B has one abnormal instance, and applications C and D have two abnormal instances. If the business code does not handle these failures well, the performance of application A and the availability of the whole system are affected.

Here, we mainly introduce the removal of outlier instances. What is an outlier instance? In a microservice cluster, individual machines jitter intermittently: the load becomes extremely high, the CPU fails for a short time, or the thread pool fills up. The jitter of these individual nodes lowers the service quality of the whole cluster. This situation is common on the cloud, especially for large customers, so the ability to remove outlier instances is extremely important. To improve business stability, we need a solution that removes outlier instances automatically and puts them back into the cluster once they return to normal.

In short, outlier instance removal gives the business a self-healing capability for single-point exceptions.

We only need to select the framework type and the application, and then configure the lower limit of the error rate, above which an instance is treated as an outlier.
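The following sketch shows one simple way such a rule could work on the consumer side: count calls per provider instance within a statistics window, skip instances whose error rate reaches the configured limit, and let them rejoin when the window is reset. All names here (`OutlierEjector`, `minRequests`, `resetWindow`) are assumptions made for this example, and the actual detection and recovery logic of EDAS/MSE is not described in this article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OutlierEjector {
    private static final class Stats {
        int total;
        int failed;
    }

    private final double errorRateLimit; // for example, 0.5 means 50%
    private final int minRequests;       // hypothetical guard: ignore instances with too few calls
    private final Map<String, Stats> statsByInstance = new LinkedHashMap<>();

    OutlierEjector(double errorRateLimit, int minRequests) {
        this.errorRateLimit = errorRateLimit;
        this.minRequests = minRequests;
    }

    // Records the outcome of one call to the given provider instance.
    void record(String instance, boolean success) {
        Stats stats = statsByInstance.computeIfAbsent(instance, key -> new Stats());
        stats.total++;
        if (!success) {
            stats.failed++;
        }
    }

    // An instance is an outlier when it has enough samples and its error rate reaches the limit.
    boolean isOutlier(String instance) {
        Stats stats = statsByInstance.get(instance);
        if (stats == null || stats.total < minRequests) {
            return false;
        }
        return (double) stats.failed / stats.total >= errorRateLimit;
    }

    // Instances that should still receive traffic.
    List<String> callable() {
        List<String> result = new ArrayList<>();
        for (String instance : statsByInstance.keySet()) {
            if (!isOutlier(instance)) {
                result.add(instance);
            }
        }
        return result;
    }

    // Starts a new statistics window, which gives removed instances a chance to rejoin.
    void resetWindow() {
        for (Stats stats : statsByInstance.values()) {
            stats.total = 0;
            stats.failed = 0;
        }
    }
}
```

A production implementation would usually also cap the share of instances that can be removed at once, so that a cluster-wide problem does not empty the call list. That cap is omitted here to keep the sketch short.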


2.6 Service Authentication

Compared with the open-source Dubbo 3.0.0, MSE provides out-of-the-box service authentication capabilities to protect sensitive businesses and control the permissions for service calls.

As our business develops, our service must also meet the demand of permission control. For example, an application in the coupon department includes a coupon query interface and a coupon issuance interface. For the coupon query interface, all applications within the company have permission to call it by default. However, only some applications in the customer service and operation departments have permission to call the coupon issuance interface.

As shown in the following figure, MSE users can manage the permissions of their services. Let's take Dubbo as an example. The configuration in the following figure shows the addItemToCart method of the com.alibabacloud.hipstershop.CartService service published by cartservice can only be called by the frontend application.


Accurate permission management allows you to manage the permissions of microservice calls better. This ensures business compliance and data security.
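Conceptually, a rule like the one above is an allow list keyed by service and method. The sketch below is a hypothetical illustration of that check, not the MSE implementation: `ServiceAuthRules` is an invented class, and `checkoutservice` is an invented caller used only to show a rejected call. Methods without a rule stay open to every caller, which matches the default described for the coupon query interface.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ServiceAuthRules {
    // "service#method" -> applications allowed to call it.
    private final Map<String, Set<String>> allowedCallers = new HashMap<>();

    // Restricts the given method to the listed caller applications.
    void allow(String service, String method, String... callerApps) {
        allowedCallers
                .computeIfAbsent(service + "#" + method, key -> new HashSet<>())
                .addAll(Arrays.asList(callerApps));
    }

    // A method without any rule is callable by everyone.
    boolean isAllowed(String callerApp, String service, String method) {
        Set<String> callers = allowedCallers.get(service + "#" + method);
        return callers == null || callers.contains(callerApp);
    }

    public static void main(String[] args) {
        ServiceAuthRules rules = new ServiceAuthRules();
        String service = "com.alibabacloud.hipstershop.CartService";
        rules.allow(service, "addItemToCart", "frontend");
        System.out.println(rules.isAllowed("frontend", service, "addItemToCart"));
        System.out.println(rules.isAllowed("checkoutservice", service, "addItemToCart"));
    }
}
```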

2.7 Service Mock

Compared with the Mock capability of the open-source Dubbo 3.0.0 service, MSE provides a complete out-of-the-box solution.

During peak hours, a downstream service provider may run into a performance bottleneck that can affect the business. You can use the service degradation feature to degrade some service consumers through service Mock. Unimportant service consumers do not perform real calls with this feature. Instead, mocked responses are returned. Valuable resources of the downstream service provider are reserved for important service consumers. As such, the overall service stability is improved.

Existing open-source circuit breaking frameworks, such as Sentinel and Hystrix, break the circuit for unstable calls to weakly dependent services. This prevents local instability from causing an overall system failure. Circuit breaking is a self-protective feature that is usually configured on service consumers.

The service degradation feature can be enabled whether service calls are normal or abnormal. It protects service providers and ensures that their limited resources go to important service consumers.

During development, many of us have seen our own progress delayed because a downstream dependency was developed slowly. With the Mock function of microservice governance, a specific return value is mocked, so development no longer depends on the progress of downstream parties. Mock rules can also be changed flexibly on the console, which speeds up development.
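The effect of a Mock rule on the consumer side can be sketched as follows. This is a simplified illustration with hypothetical names (`MockingInvoker`, the `consumerApp|method` key format), not the MSE agent's code: when a rule matches, the canned response is returned and the real call is never made, so the provider spends no resources on that consumer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class MockingInvoker {
    // "consumerApp|method" -> canned response returned instead of calling the provider.
    private final Map<String, Object> mockRules = new HashMap<>();

    void mock(String consumerApp, String method, Object response) {
        mockRules.put(consumerApp + "|" + method, response);
    }

    void removeMock(String consumerApp, String method) {
        mockRules.remove(consumerApp + "|" + method);
    }

    // Returns the mocked value if a rule matches; otherwise performs the real call.
    Object invoke(String consumerApp, String method, Supplier<Object> realCall) {
        String key = consumerApp + "|" + method;
        if (mockRules.containsKey(key)) {
            return mockRules.get(key);
        }
        return realCall.get();
    }
}
```

The same mechanism covers both uses described above: degrading unimportant consumers during peak hours, and returning fixed values while a downstream service is still under development.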

As shown in the following figure, after the application is connected to MSE, Mock rules can be configured on the console:


2.8 Service Monitoring

The online monitoring and diagnosis capabilities of Dubbo applications are essential. We provide the following complete and out-of-the-box application monitoring capabilities to make application O&M easy and efficient:

  • Application Details


  • Application-Dependent Services and Statistics of Application Instance and Status Code


  • Application System Information and Slow Call Monitoring


  • Statistical Analysis of Application Data


  • Topology Analysis of Application Calls


2.9 Summary

The service governance center of EDAS, MSE, and SAE is a commercial version of Dubbo Admin, but it is also more than that. We have enhanced all versions of Dubbo, Spring Cloud, and other frameworks on the market in a non-intrusive way, providing an all-in-one solution with complete microservice governance capabilities.

3. More Than Dubbo 3.0.0

Service governance of EDAS, MSE, and SAE also integrates some excellent designs and capabilities of Dubbo 3.0.0 into Dubbo 2.x and Spring Cloud frameworks as non-intrusive service governance capabilities.

3.1 Lifecycle Alignment between Microservice and Kubernetes

The lifecycle of a Pod is closely related to service scheduling. If a microservice deployed on Kubernetes does not implement the interfaces that this lifecycle relies on, errors that affect the service occur when the application is scaled in, scaled out, restarted, or released in a new version. Therefore, microservice health checks must be configured properly in the Kubernetes environment.

Health checks alone are not enough because there may be many reasons for the scenarios above:

1) When an application is being disabled, errors and delays may occur in any of the following steps: the provider receives the kill signal, the provider processes in-transit requests and stops, the registry perceives that the provider is disabled, the consumer receives the disabling notification, and the consumer refreshes its call list.

2) Problems may also occur while the application is launching. For example, the service has not yet registered and subscribed when the Pod health check passes; heavy traffic arrives before Dubbo is ready; initial requests fail because connections to the database or Redis are still being established; a lock in JVM class loading slows down startup; or a faulty health check leaves no healthy node during a rolling release.

These problems need to be solved or avoided. Some can be solved through open-source methods, such as adjusting the registry configuration, the connection pool configuration, or the image packaging file, or by modifying code to handle in-transit requests. Alternatively, we can adopt the MSE plan, which requires no code changes: we only need to connect to MSE once, and the process takes five minutes or less.

3.1.1 Readiness Check

MSE provides a Readiness interface. The returned status of the interface turns into 200 after the microservice is fully prepared. Otherwise, it will become 503.

3.1.2 Liveness Check

MSE provides a Liveness interface. After the interface determines that the microservice is ready and the service status is healthy, the returned status turns into 200. Otherwise, it turns into 503.

We only need to point the readiness and liveness probes provided by Kubernetes at these interfaces.
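To show the 200/503 behavior described in 3.1.1 and 3.1.2, here is a minimal sketch built on the HTTP server that ships with the JDK. The paths `/readiness` and `/liveness`, the class name `HealthEndpoints`, and the two flags are assumptions for this example; the article does not state the actual paths or ports that MSE exposes.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

final class HealthEndpoints {
    private final AtomicBoolean ready = new AtomicBoolean(false);
    private final AtomicBoolean healthy = new AtomicBoolean(true);
    private HttpServer server;

    // Call once the microservice is fully prepared, e.g. after registration and subscription.
    void markReady() { ready.set(true); }

    void markUnhealthy() { healthy.set(false); }

    // Starts the server on an ephemeral local port and returns that port.
    int start() {
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/readiness", exchange -> reply(exchange, ready.get()));
        server.createContext("/liveness", exchange -> reply(exchange, ready.get() && healthy.get()));
        server.start();
        return server.getAddress().getPort();
    }

    void stop() { server.stop(0); }

    // Sends 200 when the check passes and 503 otherwise.
    private static void reply(HttpExchange exchange, boolean ok) throws IOException {
        byte[] body = (ok ? "OK" : "UNAVAILABLE").getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(ok ? 200 : 503, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    // Helper that plays the role of a probe: returns the HTTP status of a GET request.
    static int probe(int port, String path) {
        try {
            HttpURLConnection connection = (HttpURLConnection)
                    URI.create("http://127.0.0.1:" + port + path).toURL().openConnection();
            connection.setRequestMethod("GET");
            int status = connection.getResponseCode();
            connection.disconnect();
            return status;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A Kubernetes readiness or liveness probe of type `httpGet` would then be pointed at the corresponding path and port.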

3.2 Lossless Enabling and Disabling

If your applications do not support lossless disabling, each release causes short-term service unavailability, and a large number of I/O exceptions are reported. If your business does not handle transactions properly, releases also cause data inconsistency, and you need to correct the wrong data manually. For each release, you need to notify users about the service shutdown. Otherwise, they run into unavailable services, which hurts the user experience.

Online applications must keep serving requests normally throughout the period from when they are stopped for upgrade and deployment to when they are restarted and recovered. In other words, the entire process must be imperceptible to clients. Existing open-source frameworks do not handle this problem very well.

After your application is connected to MSE, EDAS, or SAE, the lossless disabling capability for Dubbo and Spring Cloud traffic is enhanced automatically in non-intrusive mode. The microservice governance center integrates lossless disabling into the Kubernetes lifecycle. When you deploy, roll back, and scale out applications in an ACK cluster, lossless disabling is achieved automatically.
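The core idea of lossless disabling on the provider side can be sketched as: leave the registry first, refuse new requests, and wait for in-transit requests to finish before stopping. The class `LosslessShutdown` and its `deregister` hook are hypothetical, and this is not the MSE implementation. A real implementation would typically also wait for consumers to refresh their call lists before refusing requests.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class LosslessShutdown {
    private final AtomicBoolean accepting = new AtomicBoolean(true);
    private final AtomicInteger inFlight = new AtomicInteger();
    // Hypothetical hook that removes this instance from the registry.
    private final Runnable deregister;

    LosslessShutdown(Runnable deregister) {
        this.deregister = deregister;
    }

    // Called at the start of each request; returns false once shutdown has begun.
    boolean tryBegin() {
        // Count first, then check, so shutdown() never misses a request that was admitted.
        inFlight.incrementAndGet();
        if (!accepting.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    // Called when a request that was admitted by tryBegin() has finished.
    void end() {
        inFlight.decrementAndGet();
    }

    // Deregisters, refuses new requests, and waits for in-transit requests to finish.
    // Returns true if all of them completed within the timeout.
    boolean shutdown(long timeoutMillis) {
        deregister.run();
        accepting.set(false);
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (inFlight.get() > 0 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return inFlight.get() == 0;
    }
}
```

In a Kubernetes deployment, `shutdown` would be triggered from a pre-stop step or a JVM shutdown hook, with the timeout kept below the Pod's termination grace period.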

3.3 Parallel Registration and Subscription of Service

By default, Dubbo registers and subscribes to services serially. When a Dubbo application has many services, this process takes a long time, which increases the startup time of the application and poses stability risks. MSE enhances the microservice framework in a non-intrusive way to make application startup faster: turning on a switch enables parallel registration and subscription of services, which reduces the startup time significantly.
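The difference between serial and parallel registration can be illustrated with a small sketch. `FakeRegistry` is a stand-in written for this example and performs no network calls, and `ParallelRegistration` is a hypothetical helper, not Dubbo or MSE code. With a real registry, each `register` call involves a network round trip, which is why issuing the calls concurrently shortens startup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ParallelRegistration {
    // Stand-in for a registry client; a real call would involve network round trips.
    static final class FakeRegistry {
        final Set<String> registered = ConcurrentHashMap.newKeySet();

        void register(String service) {
            registered.add(service);
        }
    }

    // Serial baseline: one registration after another.
    static void registerSerially(FakeRegistry registry, List<String> services) {
        for (String service : services) {
            registry.register(service);
        }
    }

    // Registers all services concurrently and returns once every registration has finished.
    static void registerInParallel(FakeRegistry registry, List<String> services, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (String service : services) {
                futures.add(CompletableFuture.runAsync(() -> registry.register(service), pool));
            }
            // join() blocks until all registrations are done and rethrows the first failure.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

Waiting for all futures before returning matters: the application should only report itself as ready after every service has been registered.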

4. Summary

Apache Dubbo 3.0.0 is a milestone release since Dubbo was donated to Apache. It shows that Apache Dubbo is fully embracing cloud-native.

The service governance capabilities of EDAS, MSE, and SAE are continuously enriched as cloud-native microservices develop and Dubbo evolves. As customers migrate to the cloud on a large scale, pain points of microservices in cloud-native scenarios keep emerging. We are committed to solving these pain points by enhancing non-intrusive microservice governance, so that customers' businesses on the cloud are always online and cloud-native microservice architecture upgrades become easier.
