Benefits of ARMS - Application Real-Time Monitoring Service

The Application Monitoring sub-service of Application Real-Time Monitoring Service (ARMS) is an application performance management (APM) service. By installing an ARMS agent for your application, you can comprehensively monitor the application without the need to modify your code. You can also keep track of the status of the application, quickly locate abnormal and slow interfaces, identify performance bottlenecks, and restore request parameters. This greatly improves the efficiency of error diagnostics. Application Monitoring provides the following benefits.

Out-of-the-box use	Guaranteed stability	Unlimited scale
Application Monitoring uses an agent to enhance bytecode in application runtime environments and manage application performance. Therefore, integrating with Application Monitoring requires no changes to your business code. Applications deployed in Container Service for Kubernetes (ACK) and applications deployed on Elastic Compute Service (ECS) can be automatically injected into the ARMS integration center to further reduce integration costs. The ARMS agent is automatically upgraded. You can implement closed-loop observability by using capabilities such as agent installation, data computing, data storage, data visualization, and alert integration, without the need to build additional components. Applications deployed in data centers and third-party cloud services can be quickly integrated into Application Monitoring.	The data collection, processing, and storage components can be scaled out by deploying multiple replicas, which ensures high availability of core data connections. Each version of the ARMS agent is fully tested before it is released to ensure high stability. A service-level agreement (SLA) guarantee is provided. By using methods such as lazy loading, lossless calculation, trace throttling, sampling protection, automatic URL convergence, long text compression and encoding, and memory control, Application Monitoring keeps the ARMS agent stable over time and its impact on application performance under control.	Application Monitoring makes full use of distributed cloud storage to ensure the stability of data reporting and queries. Application Monitoring allows you to connect more than 100,000 application instances from an ultra-large microservice system at the same time.

Advanced diagnostic capability	Integration capability and open source compatibility	Cost-effectiveness
Based on the site reliability engineering (SRE) experience accumulated in business scenarios, intelligent insight capabilities are built to troubleshoot complex issues as well as traffic and latency spikes. Continuous profiling is provided to effectively detect bottlenecks caused by CPU, memory, and I/O in Java programs. You can query data by method name, class name, or line number to troubleshoot problems. Thread analysis capabilities are provided. Local method stack information related to slow calls is automatically saved. This helps you analyze performance bottlenecks that occur during the execution of local method stacks. Integrated with Arthas, a production-environment diagnostics tool, Application Monitoring uses bytecode enhancement to display application runtime details, such as method parameters, exceptions, and return values, without restarting processes.	Integrated with the Alert Management sub-service of ARMS, Application Monitoring supports multi-channel alert push, alert workflow, grouping, compression, and denoising capabilities to help you complete the closed loop of IT service management. In accordance with OpenTelemetry specifications, Application Monitoring is able to connect traces among multiple languages and heterogeneous technology stacks. Application metrics collected and processed by Application Monitoring are stored in Managed Service for Prometheus instances that belong to your Alibaba Cloud account. Default Grafana dashboards are provided. You can use Prometheus Query Language (PromQL) to customize and develop the dashboards.	Observability components are fully managed and O&M-free. You can start or stop using Application Monitoring at any time, and billing starts or stops at the same time. With the end-side pre-aggregation and adaptive sampling technologies, Application Monitoring ensures that the accuracy of data collection is not affected by the sampling rate. Therefore, Application Monitoring has a clear cost advantage in large-scale scenarios. If you use Application Monitoring to monitor applications deployed in Container Service for Kubernetes (ACK), you can get a discount of at least 50 percent. In addition, resource plans offer up to an 80 percent discount to help you further reduce costs.
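
The claim that sampling does not affect data accuracy can be illustrated with a minimal sketch. It is not the ARMS agent's implementation: the class `PreAggregatingAgent`, the function `simulate`, and the interface name `/api/orders` are all hypothetical, and a fixed sample rate stands in for adaptive sampling. The idea shown is only that metrics are aggregated from every request on the application side, while detailed traces are kept for a sampled subset.

```python
import random
from collections import defaultdict


class PreAggregatingAgent:
    """Hypothetical agent: aggregates every request, keeps only sampled traces."""

    def __init__(self, sample_rate, seed=0):
        self.sample_rate = sample_rate
        self._rng = random.Random(seed)
        # Per-interface aggregates computed from ALL requests (pre-aggregation).
        self.stats = defaultdict(lambda: {"count": 0, "errors": 0, "total_ms": 0.0})
        # Detailed traces, stored only for the sampled subset of requests.
        self.traces = []

    def record(self, interface, duration_ms, error=False):
        agg = self.stats[interface]
        agg["count"] += 1
        agg["errors"] += int(error)
        agg["total_ms"] += duration_ms
        # Fixed-rate sampling (an assumption; a stand-in for adaptive sampling).
        if self._rng.random() < self.sample_rate:
            self.traces.append((interface, duration_ms, error))


def simulate(sample_rate, n=10_000):
    """Feed n synthetic requests to an agent with the given sample rate."""
    agent = PreAggregatingAgent(sample_rate)
    for i in range(n):
        agent.record("/api/orders", duration_ms=20.0 + (i % 5), error=(i % 100 == 0))
    return agent


if __name__ == "__main__":
    for rate in (0.01, 0.5):
        agent = simulate(rate)
        print(rate, dict(agent.stats["/api/orders"]), len(agent.traces))
```

In this sketch, the request count, error count, and total duration are the same at both sample rates. Only the number of stored traces changes, which is where the storage cost comes from.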

Comparison between Application Monitoring and open source APM services

Item	Application Monitoring	Open source APM service
Resource purchase and system construction	Resources are fully managed by Alibaba Cloud.	You must purchase related resources and deploy systems on your own.
O&M cost	No O&M operations are required.	Routine O&M operations are required.
Application integration	Applications deployed in ACK or ECS can be integrated into Application Monitoring with simple configurations. The ARMS agent can be automatically upgraded.	Applications are manually integrated, and the agent is manually upgraded. This requires a heavy workload.
Performance overhead	The performance overhead is less than 5%. By using methods such as lazy loading, lossless calculation, trace throttling, sampling protection, automatic URL convergence, long text compression and encoding, and memory control, Application Monitoring ensures the persistent stability of the ARMS agent.	In high-throughput scenarios, the performance overhead exceeds 10%, and the stability cannot be guaranteed.
SLA guarantee	A service availability of 99.5% is provided based on the SLA. Measures such as multi-zone disaster recovery, service level objective (SLO) monitoring and alerting, and emergency response rotation are in place.	Not supported.
Performance and horizontal scaling	Automatic horizontal scaling is supported. A maximum of 100,000 nodes can be added.	Distributed horizontal scaling capabilities are not supported.
Application and instance tags	You can query the topology, monitoring data, and trace data by tag.	Not supported.
Dubbo instrumentation	The durations of routing, addressing, and encoding are recorded in detail.	Instrumentation is more coarse-grained.
Lossless calculation	The end-side pre-aggregation and adaptive sampling technologies are used to collect the traces of applications. This ensures that the sampling rate does not affect the accuracy of data collection.	Not supported. You can only rely on sampling.
Service interface monitoring	You can construct service requests in a visualized manner without modifying the service code. A wide range of performance metrics and diagnostic capabilities that fit your business are provided.	You need to modify the service code.
Interface name convergence	Automatic convergence and manual convergence based on regular expressions can be directly configured without restarting the application.	You must manually modify the configuration file and restart the application.
Local method stack analysis	Local method stack information related to slow calls is automatically saved. This helps you analyze performance bottlenecks that occur during the execution of local method stacks.	You can manually save local method stack information only for specific services.
Thread profiling	Thread-specific statistics of CPU time consumption and the number of threads for each type are provided to simulate the code execution process.	Not supported.
Thread pool monitoring and connection pool monitoring	You can monitor specific thread pools, such as Tomcat and Dubbo, and specific connection pools, such as Druid.	Not supported.
Exception analysis and error analysis	Exception analysis and error analysis views are provided.	Not supported.
End-to-end trace query	Integrated with the Browser Monitoring sub-service of ARMS, Application Monitoring connects the user interface to the server application. End-to-end trace query is supported.	Not supported.
Insights	Based on the SRE experience accumulated in business scenarios, intelligent insight capabilities are built to troubleshoot complex issues as well as traffic and latency spikes.	Not supported.
Memory snapshot	You can create and analyze memory snapshots to troubleshoot memory issues such as memory leaks and memory waste.	Not supported.
Arthas integration	Application Monitoring uses bytecode enhancement to display application runtime details, such as method parameters, exceptions, and return values, without restarting processes.	Not supported.
Alert rule	Application Monitoring provides more than 50 preset alert rules for metrics about JVMs, hosts, and interfaces. You can configure common operators, perform period-over-period comparison, and specify threshold values in the ARMS console.	You must manually modify the configuration file. Only basic operators such as equal (=), less than (<), and greater than (>) are supported.
Alert notification	Integrated with the Alert Management sub-service of ARMS, Application Monitoring supports multi-channel alert push, alert workflow, grouping, compression, and denoising capabilities to help you complete the closed loop of IT service management.	You must manually build components to configure alerting, and these components cannot effectively prevent false positives or alert storms.
Prometheus integration	Application metrics collected and processed by Application Monitoring are stored in Managed Service for Prometheus instances that belong to your Alibaba Cloud account. Default Grafana dashboards are provided. You can use PromQL to customize and develop the dashboards.	Not supported.
Cost	You can start or stop using Application Monitoring at any time, and billing starts or stops at the same time. If you use Application Monitoring to monitor applications deployed in ACK and purchase resource plans, you can get a discount and further reduce costs.	You need to build a complete set of components and properly manage the capacity. If a large number of requests are initiated, complete dependence on sampling results in huge costs.
Technical support	You can use the ticket system to obtain technical support from SRE experts.	Not supported.
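
The interface name convergence item in the table above can be illustrated with a minimal sketch of regex-based convergence. The rule format, the `RULES` list, and the `converge` function are hypothetical and do not reflect how convergence rules are configured in ARMS. The sketch only shows the general idea: variable path segments are collapsed so that many concrete URLs are reported under one interface name.

```python
import re

# Hypothetical convergence rules: (compiled pattern, replacement).
# These are illustrative only and are not the ARMS rule syntax.
RULES = [
    # A UUID path segment, for example /files/123e4567-e89b-12d3-a456-426614174000
    (
        re.compile(
            r"/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
            r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?=/|$)"
        ),
        "/{uuid}",
    ),
    # A purely numeric path segment, for example /users/123
    (re.compile(r"/\d+(?=/|$)"), "/{id}"),
]


def converge(path):
    """Collapse variable path segments so similar URLs share one interface name."""
    for pattern, replacement in RULES:
        path = pattern.sub(replacement, path)
    return path


if __name__ == "__main__":
    for raw in ("/users/123/orders/456", "/v2/items", "/health"):
        print(raw, "->", converge(raw))
```

Without convergence, each distinct user or order ID would appear as a separate interface, which inflates the number of time series and makes per-interface statistics hard to read.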