Application Real-Time Monitoring Service

Build real-time business monitoring on top of front-end monitoring, application monitoring, and custom business monitoring capabilities.


Overview

Application Real-Time Monitoring Service (ARMS) is an Alibaba Cloud monitoring product for Application Performance Management (APM). Using the service's front-end monitoring, application monitoring, or custom business monitoring capabilities, you can quickly build business monitoring that responds in real time.


Benefits

An all-in-one Multifunctional Monitoring Platform.

  • Application Performance and Exception Monitoring: ARMS implements APM on distributed applications, including performance exception monitoring and call link queries.

  • Front-end Experience Monitoring: ARMS reports how users experience your web pages, broken down by region, channel, link, or other dimensions that you set.

  • Advanced Custom Monitoring: Customize real-time monitoring alarms and dashboards based on your specific business needs.

  • Centralized Alarm and Report Platform: Custom monitoring, front-end monitoring, and application monitoring are integrated into a centralized alarm and report platform.

Product Details

ARMS is an Alibaba Cloud monitoring product for APM. You can easily and quickly construct business monitoring capabilities with real-time response based on the front-end monitoring, application monitoring, or custom business monitoring capabilities of this service.


Features

Real-Time, Highly Efficient Monitoring for Distributed Applications

  • Application topology self-discovery: ARMS automatically maps the call relationships among distributed applications based on dynamic analysis and intelligent computing of RPCs.

  • Drill-down analysis of metrics for common diagnosis scenarios: ARMS performs drill-down analysis on metrics such as the application response time, request count, and error rate. You can view the analysis results by application, event, or database.

  • Capture of exceptional and slow transactions: ARMS analyzes time-outs and exceptions based on traces and associates them with the related calls, such as SQL statements and message queue (MQ) operations.

  • Transaction snapshot query: ARMS collects trace-based snapshots of problematic transactions so that you can identify the source of an exception or error from the detailed data.
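
The topology discovery and drill-down metrics described above can be illustrated with a minimal Python sketch. It does not show how ARMS is implemented: the `Span` record, its fields, and the sample data are hypothetical, and the code only demonstrates the general idea of deriving call edges, together with their request count, average response time, and error rate, from RPC records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One RPC record (hypothetical shape, for illustration only)."""
    caller: str
    callee: str
    duration_ms: float
    error: bool = False


def build_topology(spans):
    """Group spans by (caller, callee) edge and compute per-edge metrics."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[(span.caller, span.callee)].append(span)

    topology = {}
    for edge, items in grouped.items():
        count = len(items)
        topology[edge] = {
            "request_count": count,
            "avg_ms": sum(s.duration_ms for s in items) / count,
            "error_rate": sum(1 for s in items if s.error) / count,
        }
    return topology


def slowest_edge(topology):
    """Return the edge with the highest average response time."""
    return max(topology, key=lambda edge: topology[edge]["avg_ms"])


# Made-up sample data, not taken from any real system.
SAMPLE_SPANS = [
    Span("web", "order-service", 120.0),
    Span("order-service", "mysql", 1800.0),
    Span("order-service", "mysql", 1600.0, error=True),
    Span("order-service", "redis", 2.0),
]

if __name__ == "__main__":
    topo = build_topology(SAMPLE_SPANS)
    for (caller, callee), metrics in topo.items():
        print(f"{caller} -> {callee}: {metrics}")
    print("slowest edge:", slowest_edge(topo))
```

Keying the metrics by edge rather than by application keeps the drill-down simple: the same dictionary answers both "what does this application call" and "which dependency is slow".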

Front-end Monitoring for User Experience

  • High Time-efficiency: ARMS measures response time and error rates in real time for the websites that users actually visit.

  • Page Exception Monitoring: ARMS monitors and diagnoses the performance and success rate of a large number of asynchronous data calls for applications.

  • Multidimensional Monitoring Analysis: ARMS analyzes user access rates and errors by region, operator, or browser.

Advanced Custom Monitoring for Business

  • Rich Data Sources: ARMS supports various types of real-time data sources, such as logs, SDKs, Message Queue, and Loghub.

  • Flexible Real-Time Computing and Storage Orchestration: ARMS allows you to orchestrate real-time computing and storage modes based on specified dimensions and computing methods.

  • Flexible Alarm and Dashboard Interconnection: ARMS quickly connects the monitored datasets to the ARMS alarm and dashboard platform to provide monitoring capabilities in various scenarios.

Integrated Alarm and Dashboard Platform

  • Flexible Alarm Configuration: Configure alarm policies based on custom comparison results and on metrics from various datasets.
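
As a rough illustration of a comparison-based alarm policy, the Python sketch below evaluates a set of rules against the latest metric values of a dataset. The `AlarmRule` structure, the metric names, and the thresholds are assumptions made for this example and do not reflect the ARMS configuration format.

```python
import operator
from dataclasses import dataclass

# Supported comparison operators for a rule (illustrative subset).
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class AlarmRule:
    """Hypothetical alarm rule: fire when `metric <comparator> threshold`."""
    metric: str
    comparator: str
    threshold: float


def triggered_rules(rules, metrics):
    """Return the rules whose comparison holds for the given metric values.

    Rules that refer to a metric missing from `metrics` are skipped.
    """
    fired = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue
        if COMPARATORS[rule.comparator](value, rule.threshold):
            fired.append(rule)
    return fired


# Made-up rules and metric values.
SAMPLE_RULES = [
    AlarmRule("error_rate", ">", 0.05),
    AlarmRule("avg_response_ms", ">=", 1000.0),
    AlarmRule("request_count", "<", 10),
]
SAMPLE_METRICS = {"error_rate": 0.08, "avg_response_ms": 450.0}

if __name__ == "__main__":
    for rule in triggered_rules(SAMPLE_RULES, SAMPLE_METRICS):
        print(f"ALARM: {rule.metric} {rule.comparator} {rule.threshold}")
```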

Scenarios

Java Application Monitoring and Diagnosis Solution

In this use case, the ARMS-based application monitoring solution is adopted to resolve pain points in monitoring distributed Java applications.

Business pain points

Rapid growth of Internet businesses has brought increasing traffic pressure, and business logic has also become more complicated. In this context, traditional single-machine applications can no longer satisfy customer needs, and more and more websites have adopted a distributed deployment architecture. Meanwhile, basic development frameworks such as Spring Cloud and Dubbo have gradually matured. More enterprises now split their website architectures vertically by business module and adopt a microservice architecture (MSA), which is better suited to collaborative development among teams and quick iterations.

The distributed MSA improves development efficiency, but it poses major challenges for traditional monitoring, operation and maintenance (O&M), and diagnosis technologies. For example, we confronted the following challenges when applying the distributed MSA to www.taobao.com:

  • Difficulty in identifying the problem: The customer service center submits customer feedback about problems with buying items to technical support engineers for troubleshooting. A website request in the distributed MSA always passes through multiple services and nodes before a result is returned. When an error occurs, the engineers usually have to go through the logs over and over to identify possible problems, and even simple problems end up involving multiple teams.

  • Difficulty in identifying the bottleneck: When a website freezes, it is difficult to identify the bottleneck quickly. The cause may be a fault in the network between the user terminal and the server, server overload, or high database pressure. Even when the cause is identified, it is still difficult to quickly find the error in the code.

  • Difficulty in organizing the architecture: As the business logic becomes more complicated, it is hard to work out from the code which downstream services an application depends on (such as databases, HTTP APIs, or caches) and which external services depend on the application. This makes it even harder to organize the business logic, manage the architecture, and plan capacity. For example, during preparations for "Double 11" promotion campaigns, it is hard to predict the number of servers required for each application.

ARMS-based application monitoring solution

The application monitoring function provided by ARMS originates from the Alibaba EagleEye distributed tracing and monitoring system. It resolves the preceding problems without requiring you to modify the existing code.

  • Call topology: You can view the call topology of an application on ARMS, including the services that depend on the application and the downstream services that the application depends on. In the example topology, the application monitored by ARMS depends on Redis, MySQL databases, and some external HTTP services. The dependency on the MySQL database is the bottleneck, with an average time consumption of over 1700 ms.

  • Slow services and SQL reports: This application's SQL analysis reports clearly display slow SQL statements and slow services.

  • Distributed call link queries: Click the interface snapshot of a slow SQL statement to retrieve a request that includes the SQL call, view the call stack of the method, and then identify problems in the code.

Whether you look at the system as a whole or at a single call, ARMS addresses the pain points of monitoring distributed Java applications. In addition to application monitoring, ARMS supports browser monitoring and business monitoring, so that your sites are covered from key business metrics and customer experience through to application performance.

User Experience Monitoring

User experience business pain points

A user's access to a service can be divided into three phases: page production (server status), page loading, and page running. To keep online services stable, the running status of services is monitored on the server. Server-side monitoring systems have matured, but monitoring of page loading and page running has fallen short. Front-end monitoring has received less emphasis because server-side monitoring can partially stand in for it. As a result, you cannot see what actually happens when users access a live system, and front-end problems that online users encounter only occasionally are difficult to locate.

Business pain points

  • Difficulty in identifying the performance bottleneck: When a user reports slow page loading, it is difficult to find the bottleneck quickly. The cause may lie in the network connection, resource loading, or page DOM parsing. Is it related to the province or country where the user is located, or to the browser and device that the user is using? Such problems cannot be reproduced quickly enough to identify the cause.

  • Inability to view error reports during user access: After a system goes live, a large number of JavaScript errors generated during user access can make it unusable. If the error details cannot be acquired in time, how many users are lost? If a user reports a problem with a page, can the case be reproduced immediately, or can the error details be acquired for a quick fix? Developers currently struggle to answer these questions.

  • Unknown asynchronous API call details: Even if every API call returns the HTTP status code 200, the API is not necessarily working correctly. If the business logic fails, can the failure be detected in time? If every call returns a normal status code but the whole process takes a long time, how can you see the overall picture and optimize it? Without this visibility, problems cannot be discovered in time, and the customer experience cannot be improved.
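
The last pain point can be made concrete with a small Python sketch that classifies API calls by more than their HTTP status code. Everything here is an assumption for illustration: the `ApiCall` record, the `business_ok` flag (imagined as parsed from the response body), and the 1000 ms slowness threshold are hypothetical and are not part of ARMS.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCall:
    """One observed API call (hypothetical shape, for illustration only)."""
    url: str
    http_status: int
    business_ok: bool  # assumed to be parsed from the response body
    duration_ms: float


def classify(call, slow_threshold_ms=1000.0):
    """Label a call as http_error, business_error, slow, or ok."""
    if not 200 <= call.http_status < 300:
        return "http_error"
    if not call.business_ok:
        # HTTP 200, but the business logic reported a failure.
        return "business_error"
    if call.duration_ms > slow_threshold_ms:
        return "slow"
    return "ok"


def summarize(calls, slow_threshold_ms=1000.0):
    """Count calls per label and compute the overall success rate.

    A call counts as successful when it is neither an HTTP error nor a
    business error; slow calls are successful but tracked separately.
    """
    counts = Counter(classify(c, slow_threshold_ms) for c in calls)
    total = len(calls)
    succeeded = counts["ok"] + counts["slow"]
    success_rate = succeeded / total if total else 0.0
    return {"counts": dict(counts), "success_rate": success_rate}


# Made-up sample data.
SAMPLE_CALLS = [
    ApiCall("/api/cart", 200, True, 80.0),
    ApiCall("/api/order", 200, False, 95.0),   # 200 but business failure
    ApiCall("/api/search", 200, True, 2400.0),  # succeeded but slow
    ApiCall("/api/pay", 500, False, 30.0),
]

if __name__ == "__main__":
    print(summarize(SAMPLE_CALLS))
```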

ARMS-based front-end monitoring solution

The front-end monitoring function is built on the massive real-time log analysis and processing services provided by ARMS. It monitors the access conditions of all real online users and resolves the preceding problems.

  • Finding exceptions in the application overview: You can view the overall information on the application using the front-end monitoring function, including application satisfaction, JavaScript error rate, access speed, API request success rate, and PV details. In the example, the average JavaScript error rate is 3.78%, an increase of 56.25% compared to the same period last week.

  • Performance data trend and waterfall chart: On the Access Speed page, you can view detailed metrics related to the page performance and the page loading waterfall chart. Then, you can locate the performance bottleneck based on the detailed data.

  • View page stability based on JavaScript error rates and error clustering: On the JavaScript Stability page, you can view a ranking of error rates from high to low and a ranking of error clusters. That is, you can intuitively see which pages have higher JavaScript error rates and which errors are most common.

  • Analysis on API access based on API requests: On the API Request page, you can view data on API success rates and time consumption, and fully grasp interface conditions.

  • Access details: Click Details and go to the Access Details page. There, you can view the access details. For example, you can identify an error according to the error information, including File, Stack, Line, and Col.
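
The week-over-week figure in the application overview example can be checked with a short calculation. The Python sketch below assumes that the 56.25% increase is a relative change (not a difference in percentage points); under that assumption it recovers the implied error rate for the same period of the previous week. The function names are hypothetical.

```python
def relative_change(current, previous):
    """Relative change from `previous` to `current`, e.g. 0.5 for +50%."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous


def implied_previous(current, change):
    """Previous value implied by `current` and a relative `change`."""
    return current / (1.0 + change)


if __name__ == "__main__":
    current_rate = 3.78  # JavaScript error rate this week, in percent
    change = 0.5625      # +56.25%, assumed to be a relative increase
    previous_rate = implied_previous(current_rate, change)
    print(f"implied rate last week: {previous_rate:.4f}%")
    print(f"check: {relative_change(current_rate, previous_rate):.4%}")
```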

Retail Industry Real-Time Monitoring Solution

In this use case, a leader in the apparel industry has adopted an ARMS-based hybrid cloud solution to build a real-time monitoring system for its retail business.

Business pain points

  • The customer used a traditional commercial OLAP database as its monitoring platform, which carried high license costs.

  • The monitoring platform cannot meet business needs for horizontal scalability and real-time monitoring.

ARMS-based business monitoring solution

  • Transaction logs are uploaded in real time to the Loghub of Alibaba Cloud Log Service using the Logtail Agent.

  • ARMS connects to Loghub as a data source for computing and storage, and then analyzes and displays sales data on its interactive dashboard in real time. It provides the following functions:

  • Computing orchestration and storage: ARMS extracts detailed data of each transaction from logs, including data on the total price and number of items, and then aggregates the data according to multiple dimensions, such as the transaction location, sales firm name, and customer membership information.

  • Interactive presentation: ARMS presents sales status and supports drill-down analysis along multiple dimensions, such as region, outlet, member, and product category.

  • The data generated by ARMS is delivered to the downstream DataV component for dashboard presentation.
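
The computing orchestration step above (extract per-transaction fields from logs, then aggregate by chosen dimensions) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the JSON log format, the field names (`region`, `outlet`, `total_price`, `items`), and the sample lines are hypothetical, and the sketch is not the ARMS computing engine.

```python
import json
from collections import defaultdict

# Made-up transaction log lines in an assumed JSON format.
SAMPLE_LOG_LINES = [
    '{"region": "East", "outlet": "Store-1", "total_price": 120.5, "items": 3}',
    '{"region": "East", "outlet": "Store-2", "total_price": 80.0, "items": 2}',
    '{"region": "West", "outlet": "Store-3", "total_price": 200.0, "items": 5}',
]


def aggregate(log_lines, dimensions):
    """Sum price and item counts per combination of the given dimensions.

    Returns a dict keyed by a tuple of dimension values, for example
    ("East",) for dimensions ["region"].
    """
    totals = defaultdict(
        lambda: {"total_price": 0.0, "items": 0, "transactions": 0}
    )
    for line in log_lines:
        record = json.loads(line)
        key = tuple(record[dim] for dim in dimensions)
        bucket = totals[key]
        bucket["total_price"] += record["total_price"]
        bucket["items"] += record["items"]
        bucket["transactions"] += 1
    return dict(totals)


if __name__ == "__main__":
    print(aggregate(SAMPLE_LOG_LINES, ["region"]))
    print(aggregate(SAMPLE_LOG_LINES, ["region", "outlet"]))
```

Passing the dimensions as a list means the same function serves both the coarse view (by region) and the drill-down view (by region and outlet).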

Business value of the ARMS-based IT O&M monitoring system

  • ARMS reduces the total cost of ownership (TCO) by a factor in the hundreds and meets the need for timely multidimensional analysis. It not only helps you track sales conditions in real time, but also helps you respond to challenges by adjusting sales and inventory configuration policies.

  • ARMS presents monitoring results in different ways. The DataV dashboard presents the overall monitoring data, while the interactive dashboard presents data for advanced troubleshooting.

Getting Started

Application Monitoring

Start using application monitoring in three steps:

  • Download the application monitoring probe package.

  • Install the probe package.

  • Start the application and begin monitoring.

Front-end Monitoring

Start using front-end monitoring in three steps:

  • Copy the front-end monitoring code snippet.

  • Paste the code snippet to the front-end web page.

  • Publish the page and begin monitoring.

Custom Monitoring

Start using custom monitoring in four steps:

  • Define the data source.

  • Define real-time data rules.

  • Define dataset aggregation rules.

  • Formulate reports and monitoring.


FAQs

1. How are ARMS Free Edition and Premium Edition different?

For more information about the differences between the Free Edition and Premium Edition of ARMS, see the ARMS product catalog.

2. Which languages does ARMS Application Monitoring support? What is planned?

ARMS Application Monitoring currently supports Java, and support for PHP and C# is planned. We also plan to provide OpenAPI so that developers working in other languages can build their own integrations.

3. Does ARMS support regions other than Hangzhou and Singapore?

Currently, Browser Monitoring covers only the Hangzhou region in Mainland China and the Singapore region internationally. Because Browser Monitoring collects its statistical data over the public network, regionalizing the service offers little benefit, so we do not plan to support regions other than Hangzhou and Singapore for the time being. Alibaba Cloud users in other regions can use the Browser Monitoring service provided in these two regions directly, without compromising performance or functions.

4. How do I expand ARMS Custom Monitoring?

If an on-screen message prompts you to expand ARMS Custom Monitoring due to insufficient computing resources, contact ARMS online customer service for manual expansion. In principle, the expansion takes only a few minutes.