This use case shows how the ARMS-based application monitoring solution resolves pain points in monitoring distributed Java applications.
The rapid growth of Internet businesses has put increasing pressure on traffic handling, and business logic has become increasingly complicated. Against this background, traditional single-machine applications can no longer meet customer needs, and more and more websites have adopted a distributed deployment architecture. Meanwhile, basic development frameworks such as Spring Cloud and Dubbo have matured. More enterprises now split their website architectures vertically by business module and adopt the microservice architecture (MSA), which is better suited to collaborative development among teams and quick iterations.
The distributed MSA improves development efficiency, but it poses huge challenges for traditional monitoring, operation and maintenance, and diagnosis technologies. For example, we encountered the following challenges while applying the distributed MSA to www.taobao.com:
Difficult to troubleshoot
The customer service center forwarded customer feedback about problems in buying items to the technical support engineers for troubleshooting. In the distributed MSA, a website request usually passes through multiple services and nodes before a result is returned. When an error occurs, the engineers often have to go through the logs over and over just to get a preliminary idea of the issue. Multiple teams were often involved in troubleshooting even a simple problem.
Difficult to find out the bottleneck
When a customer reports that a website is unresponsive, it is difficult to quickly find the bottleneck. Is the network between the user terminal and the server at fault? Is the server overloaded, or is the database under heavy pressure? Even after the cause is identified, it is still difficult to quickly locate the faulty code.
Difficult to get a clear picture of the architecture
As the business logic becomes more complicated, it is difficult to work out from the code which downstream services an application depends on, such as databases, HTTP APIs, and caches, and which external services depend on the application. It is even more difficult to sort out the business logic, manage the architecture, and plan capacity. For example, during the preparations for the “Double 11” promotion campaign, it is hard to plan the number of servers required for each application.
The ARMS Application Monitoring feature originated from Alibaba's EagleEye distributed tracing and monitoring system. It resolves the preceding problems without requiring changes to the existing code.
View the call topology
You can view the call topology of an application in ARMS, including the services that depend on the application and the downstream services that the application depends on. As shown in Figure 1, the applications monitored by ARMS depend on Redis, a MySQL database, and some external HTTP services. The dependency on the MySQL database is the bottleneck, with an average time consumption of over 1700 ms.
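To make the idea of a call topology concrete, the following minimal sketch aggregates call records into edges with a call count and an average duration, and then picks the slowest dependency. It is only an illustration of the concept, not how ARMS builds its topology. The record format, the application name order-app, and all durations are made-up assumptions.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, duration in ms).
# Names and values are invented for illustration only.
calls = [
    ("order-app", "MySQL", 1800.0),
    ("order-app", "MySQL", 1700.0),
    ("order-app", "Redis", 2.0),
    ("order-app", "Redis", 4.0),
    ("order-app", "HTTP:payment-api", 120.0),
]


def build_topology(records):
    """Aggregate call records into edges with call count and average duration."""
    totals = defaultdict(lambda: [0, 0.0])  # (caller, callee) -> [count, total_ms]
    for caller, callee, duration_ms in records:
        entry = totals[(caller, callee)]
        entry[0] += 1
        entry[1] += duration_ms
    return {
        edge: {"count": count, "avg_ms": total / count}
        for edge, (count, total) in totals.items()
    }


def slowest_dependency(topology):
    """Return the edge (caller, callee) with the highest average duration."""
    return max(topology, key=lambda edge: topology[edge]["avg_ms"])


topology = build_topology(calls)
bottleneck = slowest_dependency(topology)
for (caller, callee), stats in topology.items():
    print(f"{caller} -> {callee}: {stats['count']} calls, avg {stats['avg_ms']:.0f} ms")
print("Slowest dependency:", bottleneck)
```

In this made-up data set, the MySQL edge stands out in the same way the MySQL dependency does in the topology described above.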
Generate reports on slow services and slow SQL statements
Go to the SQL Analysis report for an application. You can easily find the slow SQL statements and slow services.
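As a rough illustration of what a slow SQL report contains, the sketch below groups recorded SQL executions by statement and keeps those whose average duration exceeds a threshold. It is an assumption-laden sketch, not the ARMS SQL Analysis implementation. The statements, the durations, and the 500 ms threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical SQL execution records: (statement, duration in ms).
sql_calls = [
    ("SELECT * FROM orders WHERE user_id = ?", 1650.0),
    ("SELECT * FROM orders WHERE user_id = ?", 1850.0),
    ("SELECT name FROM items WHERE id = ?", 12.0),
    ("UPDATE stock SET qty = qty - 1 WHERE item_id = ?", 640.0),
]


def slow_sql_report(records, threshold_ms=500.0):
    """Return per-statement stats for statements whose average duration
    is at or above threshold_ms, slowest first."""
    by_statement = defaultdict(list)
    for statement, duration_ms in records:
        by_statement[statement].append(duration_ms)
    rows = [
        {
            "statement": statement,
            "calls": len(durations),
            "avg_ms": mean(durations),
            "max_ms": max(durations),
        }
        for statement, durations in by_statement.items()
        if mean(durations) >= threshold_ms
    ]
    return sorted(rows, key=lambda row: row["avg_ms"], reverse=True)


report = slow_sql_report(sql_calls)
for row in report:
    print(f"{row['avg_ms']:.0f} ms avg, {row['calls']} calls: {row['statement']}")
```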
Query distributed invocation trace
Click the interface snapshot of a slow SQL statement to find a request that includes the SQL call. You can then view the method call stack and identify the problematic code.
Whether from a global perspective or from the perspective of a single call, ARMS addresses your pain points in monitoring distributed Java applications. In addition to application monitoring, ARMS supports browser monitoring and business monitoring, covering your sites from key business metrics and customer experience to application performance.
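The general idea behind a distributed invocation trace can be sketched as follows: each unit of work is recorded as a span that carries a trace ID and a parent span ID, and the spans of one request are reassembled into a call tree. This is a simplified, hypothetical model for illustration. The field names, span names, and durations are assumptions and do not reflect the ARMS or EagleEye data format.

```python
from collections import defaultdict

# Hypothetical spans collected from several services. Values are made up.
spans = [
    {"trace_id": "t1", "span_id": "1", "parent_id": None, "name": "GET /buy", "ms": 1900.0},
    {"trace_id": "t1", "span_id": "2", "parent_id": "1", "name": "OrderService.create", "ms": 1820.0},
    {"trace_id": "t1", "span_id": "3", "parent_id": "2", "name": "MySQL SELECT orders", "ms": 1750.0},
    {"trace_id": "t1", "span_id": "4", "parent_id": "1", "name": "Redis GET session", "ms": 3.0},
    {"trace_id": "t2", "span_id": "1", "parent_id": None, "name": "GET /items", "ms": 25.0},
]


def render_trace(all_spans, trace_id):
    """Rebuild the call tree of one trace and return it as indented lines."""
    children = defaultdict(list)  # parent span ID -> child spans
    for span in all_spans:
        if span["trace_id"] == trace_id:
            children[span["parent_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        # Depth-first walk so that each callee appears under its caller.
        for span in children[parent_id]:
            lines.append(f"{'  ' * depth}{span['name']} ({span['ms']:.0f} ms)")
            walk(span["span_id"], depth + 1)

    walk(None, 0)
    return lines


trace_lines = render_trace(spans, "t1")
print("\n".join(trace_lines))
```

Reading the tree from top to bottom shows which nested call accounts for most of the request time, which is the kind of information a call stack view is used for.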