This use case shows how the application monitoring solution based on Application Real-Time Monitoring Service (ARMS) resolves pain points in monitoring distributed Java applications.

The rapid growth of Internet businesses has put increasing pressure on traffic handling, and business logic has become increasingly complicated. Against this background, traditional single-machine applications can no longer meet customer needs, and more and more websites have adopted a distributed deployment architecture. Basic development frameworks such as Spring Cloud and Dubbo have matured. More enterprises split their website architectures vertically by business module and adopt the microservice architecture, which is better suited to collaborative development among teams and quick iterations.

The microservice-based distributed architecture improves development efficiency. However, it creates huge challenges for traditional monitoring, O&M, and diagnosis technologies. For example, we encountered the following challenges when we applied the microservice-based distributed architecture to www.taobao.com:

  • Difficult to troubleshoot

    When customers reported problems with buying items, the customer service center passed the feedback to the technical support engineers for troubleshooting. In the microservice-based distributed architecture, a website request always passes through multiple services and nodes before a result is returned. When an error occurred, the engineers usually had to go through the logs over and over to make a preliminary diagnosis. Multiple teams were often involved in troubleshooting even a simple problem.

  • Difficult to locate the bottleneck

    When a customer reports that a website is slow or unresponsive, it is difficult to quickly locate the bottleneck. Is the network between the user terminal and the server at fault? Is the server overloaded, or is the database under high pressure? Even after the cause is identified, it is still difficult to quickly locate the faulty code.

  • Difficult to get a clear picture of the architecture

    As the business logic becomes more complicated, it is difficult to work out from the code which downstream services an application depends on, for example, databases, HTTP APIs, or caches, and which external services call the application. This makes it even more difficult to sort out the business logic, manage the architecture, and plan the capacity. For example, during the preparations for "Double 11" promotion campaigns, the number of servers required for each application is hard to predict.

ARMS-based application monitoring solution

The ARMS application monitoring function originates from Alibaba EagleEye, a distributed tracing and monitoring system. It solves the preceding problems without requiring changes to the existing code.

View the call topology

You can view the call topology of an application in ARMS, including the upstream services that depend on the application and the downstream services that the application depends on. As shown in the figure, a bottleneck occurs when an unknown application calls the monitored application: the average response time of these calls is more than 3,000 ms.
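
To make the idea behind a call topology concrete, the following is a minimal Java sketch of how per-edge average latency could be derived from raw call records. It is not how ARMS computes its topology. The `CallRecord` type, the service names, and the 3,000 ms threshold passed to `slowEdges` are assumptions chosen only to mirror the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: CallRecord and its fields are hypothetical, not ARMS data types.
class CallTopologySketch {

    /** One observed call from a caller service to a callee service. */
    static final class CallRecord {
        final String caller;
        final String callee;
        final long durationMs;

        CallRecord(String caller, String callee, long durationMs) {
            this.caller = caller;
            this.callee = callee;
            this.durationMs = durationMs;
        }
    }

    /** Groups calls by "caller -> callee" edge and returns the average duration of each edge. */
    static Map<String, Double> averageLatencyByEdge(List<CallRecord> records) {
        Map<String, long[]> totals = new LinkedHashMap<>(); // edge -> {sum of durations, call count}
        for (CallRecord r : records) {
            long[] t = totals.computeIfAbsent(r.caller + " -> " + r.callee, k -> new long[2]);
            t[0] += r.durationMs;
            t[1]++;
        }
        Map<String, Double> averages = new LinkedHashMap<>();
        for (Map.Entry<String, long[]> e : totals.entrySet()) {
            averages.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    /** Returns the edges whose average duration is above the given threshold. */
    static List<String> slowEdges(Map<String, Double> averages, double thresholdMs) {
        List<String> slow = new ArrayList<>();
        for (Map.Entry<String, Double> e : averages.entrySet()) {
            if (e.getValue() > thresholdMs) {
                slow.add(e.getKey());
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two slow upstream calls and two fast downstream calls.
        List<CallRecord> records = Arrays.asList(
            new CallRecord("unknown-app", "order-service", 3200),
            new CallRecord("unknown-app", "order-service", 3400),
            new CallRecord("order-service", "mysql", 40),
            new CallRecord("order-service", "redis", 2));
        System.out.println(slowEdges(averageLatencyByEdge(records), 3000));
    }
}
```

With the sample data, only the edge from `unknown-app` to `order-service` exceeds the threshold, which corresponds to the kind of bottleneck the topology view highlights.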

Generate a report on slow services and slow SQL statements

You can open the SQL analysis report of an application to quickly find its slow SQL statements and slow services.
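
As a rough illustration of what a slow SQL report aggregates, the following Java sketch groups SQL executions by statement, computes the call count and average duration, and keeps only the statements at or above a threshold, slowest first. The `SqlExecution` and `ReportRow` types, the sample statements, and the 500 ms threshold are all hypothetical. The sketch makes no claim about how ARMS builds its report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these types are hypothetical, not part of any ARMS API.
class SlowSqlReportSketch {

    /** One execution of a SQL statement and how long it took. */
    static final class SqlExecution {
        final String statement;
        final long durationMs;

        SqlExecution(String statement, long durationMs) {
            this.statement = statement;
            this.durationMs = durationMs;
        }
    }

    /** One row of the report: a statement, its call count, and its average duration. */
    static final class ReportRow {
        final String statement;
        final long calls;
        final double avgMs;

        ReportRow(String statement, long calls, double avgMs) {
            this.statement = statement;
            this.calls = calls;
            this.avgMs = avgMs;
        }
    }

    /** Returns statements whose average duration is at least thresholdMs, slowest first. */
    static List<ReportRow> slowStatements(List<SqlExecution> executions, double thresholdMs) {
        Map<String, long[]> totals = new HashMap<>(); // statement -> {sum of durations, call count}
        for (SqlExecution e : executions) {
            long[] t = totals.computeIfAbsent(e.statement, k -> new long[2]);
            t[0] += e.durationMs;
            t[1]++;
        }
        List<ReportRow> rows = new ArrayList<>();
        for (Map.Entry<String, long[]> entry : totals.entrySet()) {
            double avg = (double) entry.getValue()[0] / entry.getValue()[1];
            if (avg >= thresholdMs) {
                rows.add(new ReportRow(entry.getKey(), entry.getValue()[1], avg));
            }
        }
        rows.sort(Comparator.comparingDouble((ReportRow r) -> r.avgMs).reversed());
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        List<SqlExecution> executions = Arrays.asList(
            new SqlExecution("SELECT * FROM orders WHERE user_id = ?", 900),
            new SqlExecution("SELECT * FROM orders WHERE user_id = ?", 1100),
            new SqlExecution("SELECT name FROM items WHERE id = ?", 5),
            new SqlExecution("UPDATE stock SET qty = qty - 1 WHERE id = ?", 2500));
        for (ReportRow row : slowStatements(executions, 500)) {
            System.out.println(row.statement + " | calls=" + row.calls + " | avg=" + row.avgMs + " ms");
        }
    }
}
```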

Query distributed trace

You can click the interface snapshot of a slow SQL statement to find a request that includes the SQL call. Then, you can view the method call stack of that request and locate the problematic code.
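
The general idea of a distributed trace can be sketched as follows: every span of a request carries the same trace ID and a reference to its parent span, so the spans can be reassembled into a call tree. The `Span` type, its field names, and the sample operations below are assumptions for illustration and do not reflect the actual ARMS or EagleEye data model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: Span and its fields are hypothetical.
class TraceTreeSketch {

    /** One timed operation in a request; parentSpanId is null for the entry span. */
    static final class Span {
        final String traceId;
        final String spanId;
        final String parentSpanId;
        final String operation;
        final long durationMs;

        Span(String traceId, String spanId, String parentSpanId, String operation, long durationMs) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.operation = operation;
            this.durationMs = durationMs;
        }
    }

    /** Renders the spans of one trace as indented lines, parents before children. */
    static List<String> renderTrace(List<Span> spans, String traceId) {
        Map<String, List<Span>> childrenByParent = new HashMap<>();
        List<Span> roots = new ArrayList<>();
        for (Span s : spans) {
            if (!s.traceId.equals(traceId)) {
                continue; // span belongs to another request
            }
            if (s.parentSpanId == null) {
                roots.add(s);
            } else {
                childrenByParent.computeIfAbsent(s.parentSpanId, k -> new ArrayList<>()).add(s);
            }
        }
        List<String> lines = new ArrayList<>();
        for (Span root : roots) {
            appendSpan(root, 0, childrenByParent, lines);
        }
        return lines;
    }

    private static void appendSpan(Span span, int depth,
                                   Map<String, List<Span>> childrenByParent, List<String> lines) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  "); // two spaces per nesting level
        }
        line.append(span.operation).append(" (").append(span.durationMs).append(" ms)");
        lines.add(line.toString());
        for (Span child : childrenByParent.getOrDefault(span.spanId, new ArrayList<>())) {
            appendSpan(child, depth + 1, childrenByParent, lines);
        }
    }

    public static void main(String[] args) {
        // Hypothetical spans from two requests, in arbitrary order.
        List<Span> spans = Arrays.asList(
            new Span("t1", "s3", "s2", "SELECT orders", 3000),
            new Span("t1", "s1", null, "GET /buy", 3100),
            new Span("t2", "s1", null, "GET /home", 20),
            new Span("t1", "s2", "s1", "OrderService.create", 3050));
        for (String line : renderTrace(spans, "t1")) {
            System.out.println(line);
        }
    }
}
```

Reading the rendered tree from the top down shows which nested call accounts for most of the request time, which is the same reasoning you apply when you inspect a call stack to find a slow SQL call.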

Whether you look at the system as a whole or at a single call, ARMS addresses your pain points in monitoring distributed Java applications. In addition to application monitoring, ARMS supports browser monitoring and business monitoring, so it provides all-around protection for your sites, from key business metrics and customer experience to application performance.