Application performance management - Application Monitoring - Application Real-Time Monitoring Service

Application Monitoring, a sub-service of Application Real-Time Monitoring Service (ARMS), is an application performance management (APM) solution that provides full visibility into your application's health, performance, and dependencies without code changes.

After you install an ARMS agent, Application Monitoring automatically instruments your application to capture traces, detect bottlenecks, surface errors, and recover request parameters in real time. Whether you need to pinpoint a slow database query or trace a failed request across distributed services, Application Monitoring provides the data to diagnose and resolve issues quickly.

How it works

Install the ARMS agent -- The agent instruments your application through bytecode enhancement in its runtime environment. No changes to your business code are required.
Auto-discover dependencies -- The agent detects upstream and downstream services, middleware (MySQL, Redis, RocketMQ), and framework calls (Spring Cloud, Dubbo) to build a complete application topology.
Collect metrics and traces -- Interface call counts, response times, errors, and exceptions are captured automatically for the HTTP and RPC frameworks that the agent discovers.
Analyze and alert -- View traces, diagnose slow queries, get root cause analysis from intelligent algorithms, and receive alerts through more than 50 preset alert rules.

For applications deployed in Container Service for Kubernetes (ACK) or on Elastic Compute Service (ECS), the ARMS agent can be injected automatically through the ARMS integration center -- no manual agent installation needed.

Key capabilities

Application topology

The ARMS agent automatically discovers how your services connect and interact. It captures traces from applications that use RPC and HTTP frameworks (such as Dubbo and Spring Cloud) and renders a topology map that shows upstream and downstream dependencies across your application stack, including common middleware such as MySQL, Redis, and RocketMQ.

Use the topology map to:

Identify which downstream service is causing latency spikes
Spot abnormal call patterns between services
Understand the full request path before debugging
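
To make the idea of a topology map concrete, here is a minimal sketch of how call records could be aggregated into edges between services. It does not reflect how ARMS stores or computes its topology: the `Call` record, the function names, and the sample service names are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One observed call between two components (hypothetical record shape)."""
    caller: str
    callee: str
    duration_ms: float


def build_topology(calls):
    """Aggregate calls into edges: (caller, callee) -> call count and mean latency."""
    totals = defaultdict(lambda: [0, 0.0])  # edge -> [count, summed duration]
    for call in calls:
        entry = totals[(call.caller, call.callee)]
        entry[0] += 1
        entry[1] += call.duration_ms
    return {
        edge: {"calls": count, "avg_ms": total / count}
        for edge, (count, total) in totals.items()
    }


def slowest_downstream(topology, service):
    """Return the downstream dependency of `service` with the highest mean latency."""
    edges = {
        callee: stats
        for (caller, callee), stats in topology.items()
        if caller == service
    }
    if not edges:
        return None
    return max(edges, key=lambda callee: edges[callee]["avg_ms"])


# Illustrative data only; the service names are made up.
sample_calls = [
    Call("gateway", "order-service", 120.0),
    Call("order-service", "mysql", 80.0),
    Call("order-service", "mysql", 100.0),
    Call("order-service", "redis", 2.0),
]
topology = build_topology(sample_calls)
```

With this sample data, `slowest_downstream(topology, "order-service")` picks `mysql`, which mirrors the first use case in the list above: finding the downstream dependency that contributes the most latency.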

Interface monitoring

Application Monitoring automatically discovers and monitors HTTP and RPC frameworks in your code. For each interface, it collects four core metrics:

Call count -- Request volume over time
Response time -- Latency per interface
Error count -- Failed requests
Exception count -- Unhandled exceptions

Combine interface monitoring with the trace view to follow a single request end-to-end and isolate the exact interface causing a performance issue.
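
The four metrics can be illustrated with a small aggregation over request records. This is a sketch under stated assumptions, not the agent's actual logic: the `Request` shape is hypothetical, and treating a 5xx status as a "failed request" is an assumption chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """One completed request (hypothetical record shape)."""
    interface: str
    duration_ms: float
    status_code: int
    exception: Optional[str] = None


def summarize(requests):
    """Return per-interface call count, average response time, errors, and exceptions."""
    rows = defaultdict(lambda: {"calls": 0, "total_ms": 0.0, "errors": 0, "exceptions": 0})
    for req in requests:
        row = rows[req.interface]
        row["calls"] += 1
        row["total_ms"] += req.duration_ms
        if req.status_code >= 500:  # assumption: a 5xx status marks a failed request
            row["errors"] += 1
        if req.exception is not None:
            row["exceptions"] += 1
    return {
        name: {
            "calls": row["calls"],
            "avg_ms": row["total_ms"] / row["calls"],
            "errors": row["errors"],
            "exceptions": row["exceptions"],
        }
        for name, row in rows.items()
    }


# Illustrative data only.
sample = [
    Request("GET /orders", 40.0, 200),
    Request("GET /orders", 60.0, 500, exception="TimeoutError"),
    Request("POST /pay", 120.0, 200),
]
metrics = summarize(sample)
```

Keeping errors and exceptions as separate counters matches the list above: a request can fail without an exception being recorded, and the two counts can diverge.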

Trace analysis

Filter and aggregate traces in real time using multiple dimensions. Trace analysis helps you answer questions like:

Which slow calls exceed a specific latency threshold, and when do they occur?
How are abnormal requests distributed across machines?
How has traffic from VIP customers changed over time?
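
The first two questions amount to filtering traces and grouping them by a dimension. The sketch below shows that pattern on in-memory data. The `TraceSummary` fields are hypothetical and are not the ARMS trace schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TraceSummary:
    """Summary of one trace (hypothetical fields, for illustration)."""
    trace_id: str
    host: str
    started_at: datetime
    duration_ms: float
    is_error: bool


def slow_calls_by_hour(traces, threshold_ms):
    """Count traces slower than threshold_ms, bucketed by the hour they started."""
    slow = (t for t in traces if t.duration_ms > threshold_ms)
    return Counter(t.started_at.strftime("%Y-%m-%d %H:00") for t in slow)


def errors_by_host(traces):
    """Count abnormal (error) traces per machine."""
    return Counter(t.host for t in traces if t.is_error)


# Illustrative data only.
traces = [
    TraceSummary("t1", "host-a", datetime(2024, 1, 1, 9, 5), 1200.0, False),
    TraceSummary("t2", "host-a", datetime(2024, 1, 1, 9, 40), 300.0, True),
    TraceSummary("t3", "host-b", datetime(2024, 1, 1, 10, 15), 2500.0, True),
    TraceSummary("t4", "host-b", datetime(2024, 1, 1, 10, 20), 50.0, True),
]
slow_per_hour = slow_calls_by_hour(traces, threshold_ms=1000)
errors_per_host = errors_by_host(traces)
```

The third question follows the same shape: filter on a customer attribute attached to the trace, then bucket by time.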

Slow SQL analysis

For relational databases (MySQL, PostgreSQL) and NoSQL databases (Redis, MongoDB), slow SQL analysis identifies queries that degrade transaction performance. Use it to detect slow transactions and drill down to the specific query causing the problem.
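
One common way to drill down to a problem query is to group executions by statement template, so that the same query with different literal values is counted once. The sketch below illustrates that general technique for SQL text only. It is not a description of how ARMS implements slow SQL analysis, and the regular expressions cover only simple string and numeric literals.

```python
import re
from collections import defaultdict

# Simplified literal patterns; real SQL needs a proper tokenizer.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def normalize(statement):
    """Replace literal values with ? so that similar statements share one template."""
    statement = _STRING_LITERAL.sub("?", statement)
    statement = _NUMBER_LITERAL.sub("?", statement)
    return " ".join(statement.split())  # collapse repeated whitespace


def slow_statements(executions, threshold_ms):
    """Return (template, run count, slowest run) for templates that exceed threshold_ms."""
    stats = defaultdict(lambda: {"count": 0, "max_ms": 0.0})
    for statement, duration_ms in executions:
        row = stats[normalize(statement)]
        row["count"] += 1
        row["max_ms"] = max(row["max_ms"], duration_ms)
    slow = [
        (template, row["count"], row["max_ms"])
        for template, row in stats.items()
        if row["max_ms"] > threshold_ms
    ]
    return sorted(slow, key=lambda item: item[2], reverse=True)


# Illustrative data only: (statement, duration in milliseconds).
executions = [
    ("SELECT * FROM orders WHERE id = 42", 12.0),
    ("SELECT * FROM orders WHERE id = 7", 950.0),
    ("SELECT * FROM users WHERE name = 'bob'", 30.0),
]
slow = slow_statements(executions, threshold_ms=500)
```

Here the two `orders` lookups collapse into one template, and only that template is reported because its slowest run exceeds the 500 ms threshold chosen for the example.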

Intelligent insight

When response times spike or error rates surge, intelligent insight automatically investigates the issue using historical application data and intelligent algorithms. It delivers:

Root cause analysis -- Pinpoints the likely source of the problem
Actionable suggestions -- Recommends steps to resolve it
Alert subscription -- Notifies you proactively so you can respond before users are affected

Continuous profiling

Continuous profiling attributes CPU utilization and memory usage to individual methods, classes, and line numbers -- with minimal performance overhead. Use profiling data to:

Optimize hot code paths to reduce latency
Identify memory-intensive methods to lower resource costs
Increase throughput by eliminating inefficient operations

Alerting

ARMS provides more than 50 preset alert rules covering JVM, host, and interface metrics. Customize and combine rules to match your operational needs. Through the Alert Management sub-service, configure:

Alert convergence -- Reduce noise by grouping related alerts
Notification -- Route alerts to the right team through your preferred channel
Escalation -- Automatically escalate unresolved alerts
Collaborative processing -- Coordinate incident response across teams
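
As an illustration of alert convergence, the sketch below groups alerts that share an application and rule and arrive within a time window of the first alert in the group. The grouping key, the window semantics, and the `Alert` fields are assumptions made for this example. The actual convergence options are configured in Alert Management and may differ.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """One fired alert (hypothetical fields, for illustration)."""
    app: str
    rule: str
    timestamp: float  # seconds since epoch


def converge(alerts, window_seconds):
    """Group alerts with the same app and rule that fall within window_seconds
    of the first alert in their group."""
    groups = []
    latest_group = {}  # (app, rule) -> index of the most recent group for that key
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.app, alert.rule)
        idx = latest_group.get(key)
        if idx is not None and alert.timestamp - groups[idx][0].timestamp <= window_seconds:
            groups[idx].append(alert)
        else:
            latest_group[key] = len(groups)
            groups.append([alert])
    return groups


# Illustrative data only.
alerts = [
    Alert("shop", "high-response-time", 0),
    Alert("shop", "high-response-time", 30),
    Alert("shop", "high-response-time", 400),
    Alert("pay", "error-rate", 10),
]
groups = converge(alerts, window_seconds=300)
```

With a 300-second window, the four sample alerts collapse into three notifications: the first two `shop` alerts are grouped, while the alert at 400 seconds starts a new group.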

OpenTelemetry and open-source integration

Application Monitoring follows OpenTelemetry specifications, enabling trace correlation across multiple languages and heterogeneous technology stacks. Application metrics are stored in Managed Service for Prometheus instances under your Alibaba Cloud account. Default Grafana dashboards are included out of the box, and you can build custom dashboards using Prometheus Query Language (PromQL).
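
For custom dashboards or ad hoc checks, a PromQL expression can also be sent to a Prometheus-compatible HTTP API. The sketch below only builds the request URL for the standard instant-query endpoint (`/api/v1/query`). It assumes your instance exposes that endpoint. The base URL, the metric name `app_requests_total`, and the `service` label are hypothetical placeholders, so check the metric names your instance actually exposes.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical metric and label names, used only to show the query shape:
# per-service request rate over the last five minutes.
promql = 'sum by (service) (rate(app_requests_total{service="order-service"}[5m]))'
url = instant_query_url("https://prometheus.example.com/", promql)
```

The same expression can be pasted into a Grafana panel. Building the URL separately is mainly useful for scripts that poll a metric outside the dashboard.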