Application Real-Time Monitoring Service: Benefits

Last Updated: Dec 09, 2024

The Application Monitoring sub-service of Application Real-Time Monitoring Service (ARMS) is an application performance management (APM) service. After you install an ARMS agent for your application, you can comprehensively monitor the application without modifying your code. You can also track the status of the application, quickly locate abnormal and slow interfaces, identify performance bottlenecks, and recover request parameters. This greatly improves the efficiency of error diagnostics. Application Monitoring provides the following benefits.

Out-of-the-box use

  • Application Monitoring uses an agent to enhance bytecode in the application runtime environment and manage application performance. Therefore, integrating with Application Monitoring does not require changes to your business code.

  • Applications deployed in Container Service for Kubernetes (ACK) and applications deployed on Elastic Compute Service (ECS) can be automatically injected into the ARMS integration center to further reduce access costs. The ARMS agent is automatically upgraded.

  • You can implement closed-loop observability by using capabilities such as agent installation, data computing, data storage, data visualization, and alert integration, without the need to build additional components.

  • Applications deployed in data centers and third-party cloud services can be quickly integrated into Application Monitoring.

Guaranteed stability

  • The data collection, processing, and storage components can be scaled out by deploying multiple replicas, which ensures high availability of core data connections.

  • Each version of the ARMS agent is fully tested before it is released to ensure high stability. A service-level agreement (SLA) guarantee is provided.

  • By using methods such as lazy loading, lossless calculation, trace throttling, sampling protection, automatic URL convergence, long text compression and encoding, and memory control, Application Monitoring ensures that the ARMS agent remains stable and that its impact on application performance stays controllable.

  • Application Monitoring makes full use of distributed cloud storage capabilities to ensure the stability of data reporting and query.

Unlimited scale

  • Application Monitoring allows you to connect more than 100,000 application instances from an ultra-large microservice system at the same time.
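
The automatic URL convergence mentioned among the stability measures can be pictured with a small sketch. This is not the ARMS agent's actual algorithm, which the page does not describe. It only illustrates the general idea of collapsing path segments that vary per request (numeric IDs, UUIDs, long hexadecimal tokens) so that similar URLs are reported under one interface name. The function name `converge_url`, the `{id}` placeholder, and the matching pattern are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for path segments that change on every request:
# pure digits, UUIDs, or long hexadecimal tokens.
_VARIABLE_SEGMENT = re.compile(
    r"^(\d+"
    r"|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    r"|[0-9a-fA-F]{16,})$"
)


def converge_url(path: str, placeholder: str = "{id}") -> str:
    """Collapse variable path segments so similar URLs share one name."""
    path = path.split("?", 1)[0]  # ignore the query string
    segments = path.split("/")
    return "/".join(
        placeholder if _VARIABLE_SEGMENT.match(segment) else segment
        for segment in segments
    )


print(converge_url("/api/orders/12345/items/42?x=1"))
# -> /api/orders/{id}/items/{id}
```

Grouping URLs this way keeps the number of distinct interface names bounded, which is one reason convergence helps control memory use on the agent side.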

Advanced diagnostic capability

  • Based on the site reliability engineering (SRE) experience accumulated in business scenarios, intelligent insight capabilities are built to troubleshoot complex issues as well as traffic and latency spikes.

  • Continuous profiling is provided to effectively detect bottlenecks caused by CPU, memory, and I/O in Java programs. You can query data by method name, class name, or line number to troubleshoot problems.

  • Thread analysis capabilities are provided. Local method stack information related to slow calls is automatically saved. This helps you analyze performance bottlenecks that occur during the execution of local method stacks.

  • Integrated with Arthas, a production-environment diagnostics tool, Application Monitoring uses bytecode enhancement to display runtime details of an application, such as method parameters, exceptions, and return values, without restarting processes.

Integration capability and open source compatibility

  • Integrated with the Alert Management sub-service of ARMS, Application Monitoring supports multi-channel alert push, alert workflow, grouping, compression, and denoising capabilities to help you complete the closed loop of IT service management.

  • In accordance with OpenTelemetry specifications, Application Monitoring can connect traces across multiple languages and heterogeneous technology stacks.

  • Application metrics collected and processed by Application Monitoring are stored in Managed Service for Prometheus instances that belong to your Alibaba Cloud account. Default Grafana dashboards are provided. You can use Prometheus Query Language (PromQL) to customize and develop the dashboards.

Cost-effectiveness

  • Observability components are fully managed and O&M-free.

  • You can start or stop using Application Monitoring at any time. Billing starts or stops at the same time.

  • With the end-side pre-aggregation and adaptive sampling technologies, Application Monitoring ensures that the accuracy of data collection is not affected by the sampling rate. Therefore, Application Monitoring has a definite cost advantage in large-scale scenarios.

  • If you use Application Monitoring to monitor applications deployed in Container Service for Kubernetes (ACK), you get a discount of at least 50 percent. In addition, resource plans offer a discount of up to 80 percent to help you further reduce costs.
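
The claim that the sampling rate does not affect data accuracy follows from the order of operations: metrics are aggregated from every request on the agent side, and sampling only decides which detailed traces are kept. The sketch below illustrates that idea under stated assumptions. It is not ARMS code, the class `PreAggregatingCollector` and its fields are hypothetical, and a fixed `sample_rate` stands in for adaptive sampling, which would adjust the rate at run time.

```python
import random
from collections import defaultdict


class PreAggregatingCollector:
    """Illustrative agent-side collector: aggregate first, sample second."""

    def __init__(self, sample_rate: float, seed=None):
        self.sample_rate = sample_rate  # fixed here; adaptive in a real agent
        self._rng = random.Random(seed)
        self.stats = defaultdict(
            lambda: {"count": 0, "errors": 0, "total_ms": 0.0}
        )
        self.sampled_traces = []

    def record(self, interface: str, duration_ms: float, error: bool = False):
        # Step 1: pre-aggregation. Every request updates the metrics,
        # regardless of the sampling decision.
        entry = self.stats[interface]
        entry["count"] += 1
        entry["errors"] += int(error)
        entry["total_ms"] += duration_ms

        # Step 2: sampling. Only the detailed trace is subject to sampling.
        if self._rng.random() < self.sample_rate:
            self.sampled_traces.append(
                {"interface": interface, "duration_ms": duration_ms, "error": error}
            )


collector = PreAggregatingCollector(sample_rate=0.1, seed=0)
for i in range(1000):
    collector.record("/api/orders", 10.0, error=(i % 10 == 0))

print(collector.stats["/api/orders"]["count"])  # exact request count: 1000
print(len(collector.sampled_traces))            # only a fraction of traces kept
```

Because the counters never pass through the sampler, request counts, error counts, and average latency stay exact even when only a small share of traces is stored. This is where the cost advantage at large scale comes from.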

Comparison between Application Monitoring and open source APM services

| Item | Application Monitoring | Open source APM service |
| --- | --- | --- |
| Resource purchase and system construction | Resources are fully managed by Alibaba Cloud. | You must purchase related resources and deploy systems on your own. |
| O&M cost | No O&M operations are required. | Routine O&M operations are required. |
| Application integration | Applications deployed in ACK or ECS can be integrated into Application Monitoring with simple configurations. The ARMS agent can be automatically upgraded. | Applications are manually integrated, and the agent is manually upgraded. This requires a heavy workload. |
| Performance overhead | The performance overhead is less than 5%. By using methods such as lazy loading, lossless calculation, trace throttling, sampling protection, automatic URL convergence, long text compression and encoding, and memory control, Application Monitoring ensures the persistent stability of the ARMS agent. | In high-throughput scenarios, the performance overhead exceeds 10%, and the stability cannot be guaranteed. |
| SLA guarantee | A service availability of 99.5% is provided based on the SLA. This is backed by measures such as multi-zone disaster recovery, service level objective (SLO) monitoring and alerting, and emergency response rotation. | Not supported. |
| Performance and horizontal scaling | Automatic horizontal scaling is supported. A maximum of 100,000 nodes can be added. | Distributed horizontal scaling capabilities are not supported. |
| Application and instance tags | You can query the topology, monitoring data, and trace data by tag. | Not supported. |
| Dubbo instrumentation | The durations of routing, addressing, and encoding are recorded in detail. | Instrumentation is more coarse-grained. |
| Lossless calculation | The end-side pre-aggregation and adaptive sampling technologies are used to collect the traces of applications. This ensures that the sampling rate does not affect the accuracy of data collection. | Not supported. You can only rely on sampling. |
| Service interface monitoring | You can construct service requests in a visualized manner without modifying the service code. A wide range of performance metrics and diagnostic capabilities that fit your business are provided. | You need to modify the service code. |
| Interface name convergence | Automatic convergence and manual convergence based on regular expressions can be directly configured without restarting the application. | You must manually modify the configuration file and restart the application. |
| Local method stack analysis | Local method stack information related to slow calls is automatically saved. This helps you analyze performance bottlenecks that occur during the execution of local method stacks. | You can manually save local method stack information only for specific services. |
| Thread profiling | Thread-specific statistics of CPU time consumption and the number of threads for each type are provided to simulate the code execution process. | Not supported. |
| Thread pool monitoring and connection pool monitoring | You can monitor specific thread pools, such as Tomcat and Dubbo, and specific connection pools, such as Druid. | Not supported. |
| Exception analysis and error analysis | Exception analysis and error analysis views are provided. | Not supported. |
| End-to-end trace query | Integrated with the Browser Monitoring sub-service of ARMS, Application Monitoring connects the user interface to the server application. End-to-end trace query is supported. | Not supported. |
| Insights | Based on the SRE experience accumulated in business scenarios, intelligent insight capabilities are built to troubleshoot complex issues as well as traffic and latency spikes. | Not supported. |
| Memory snapshot | You can create and analyze memory snapshots to troubleshoot memory issues such as memory leaks and memory waste. | Not supported. |
| Arthas integration | Application Monitoring uses bytecode enhancement to display runtime details of an application, such as method parameters, exceptions, and return values, without restarting processes. | Not supported. |
| Alert rule | Application Monitoring provides more than 50 preset alert rules for metrics about JVMs, hosts, and interfaces. You can configure common operators, perform period-over-period comparison, and specify threshold values in the ARMS console. | You must manually modify the configuration file. Only basic operators such as equal (=), less than (<), and greater than (>) are supported. |
| Alert notification | Integrated with the Alert Management sub-service of ARMS, Application Monitoring supports multi-channel alert push, alert workflow, grouping, compression, and denoising capabilities to help you complete the closed loop of IT service management. | You must manually build components to configure alerting, which cannot effectively prevent false positives or alert storms. |
| Prometheus integration | Application metrics collected and processed by Application Monitoring are stored in Managed Service for Prometheus instances that belong to your Alibaba Cloud account. Default Grafana dashboards are provided. You can use PromQL to customize and develop the dashboards. | Not supported. |
| Cost | You can start or stop using Application Monitoring at any time. Billing starts or stops at the same time. If you use Application Monitoring to monitor applications deployed in ACK and purchase resource plans, you can get a discount and further reduce costs. | You need to build a complete set of components and properly manage the capacity. If a large number of requests are initiated, complete dependence on sampling results in huge costs. |
| Technical support | You can use the ticket system to obtain technical support from SRE experts. | Not supported. |
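
The alert rule row mentions period-over-period comparison alongside plain threshold operators. As a rough illustration of the difference, the sketch below combines a static threshold with a relative change against the previous period. It does not reflect how ARMS evaluates alert rules. The function names and the parameters `threshold` and `max_increase` are assumptions chosen for this example.

```python
def period_over_period_change(current: float, previous: float) -> float:
    """Relative change of the current period against the comparison period."""
    if previous == 0:
        # No baseline: treat any positive value as an unbounded increase.
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous


def should_alert(current: float, previous: float,
                 threshold: float, max_increase: float) -> bool:
    """Fire when the value exceeds a static threshold, or when it rose by
    more than max_increase (for example 0.3 for 30%) versus the previous period."""
    return (
        current > threshold
        or period_over_period_change(current, previous) > max_increase
    )


# Latency rose from 300 ms to 450 ms: below the 1000 ms threshold,
# but a 50% increase over the previous period.
print(should_alert(450, 300, threshold=1000, max_increase=0.3))  # True
```

A pure threshold rule would stay silent in this case, while the period-over-period condition catches the regression.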