Alibaba Cloud CloudMonitor 2.0 is a unified observability platform that integrates Simple Log Service (SLS), CloudMonitor (CMS), and Application Real-Time Monitoring Service (ARMS). It consolidates metrics, traces, logs, and events into a single view. Using the UModel observability framework and observability graph, CloudMonitor 2.0 combines visualization and alerting to automatically associate resources and perform intelligent diagnostics. It delivers full-stack, end-to-end observability—from infrastructure to applications—so you can quickly detect and resolve issues and improve operations and maintenance (O&M) efficiency. It supports complex environments such as microservices, containers, and cloud services.
CloudMonitor 2.0 uses AI-enhanced cross-domain insights to analyze and predict system performance in real time. It detects anomalies early and provides intelligent fault diagnosis and optimization suggestions. This helps enterprises build a smarter, more efficient, and lower-cost full-stack observability system in the AI-native era—ensuring business stability and security.
Try in Playground
Alibaba Cloud Playground provides a demo environment where you can try the main features of CloudMonitor 2.0.
Visit the Playground Demo. You enter a workspace by default.
Benefits
Unified observability
CloudMonitor 2.0 deeply integrates the core capabilities of CMS, SLS, and ARMS. It unifies metrics, logs, traces, and events into one platform. You no longer need to deploy or maintain multiple standalone monitoring tools. Instead, you can obtain full-stack, end-to-end visibility—from infrastructure to applications—in one place. This reduces complexity and management costs.
Unified data modeling
UModel (Universal Observability Model) connects data silos—including metrics, logs, traces, and configuration changes—to build a complete digital view of your IT systems. People, programs, and AI can all understand and analyze observable data. This enables true full-stack observability and accelerates issue detection.
AI-powered intelligent analysis
Using the unified observability data model, CloudMonitor 2.0 applies machine learning for deep pattern recognition and association analysis. It delivers precise anomaly detection, trend forecasting, and intelligent alert noise reduction. It also uses large language models to transform complex observable data into actionable insights. This enables “conversational O&M”: you can interact with an intelligent assistant in plain language to quickly locate, analyze, and fix problems.
Open and compatible with mainstream ecosystems
CloudMonitor 2.0 fully embraces open-source technology. It natively supports Prometheus, Grafana, OpenTelemetry, Elasticsearch, and other industry standards and tools. Your existing monitoring assets and technology stack migrate smoothly. Whether you run cloud-native applications or hybrid cloud environments, you get seamless, unified monitoring. This gives you a flexible, open, vendor-neutral solution.
Terms
Before using CloudMonitor 2.0, learn these basic concepts.
Term | Description |
Workspace | A workspace is an abstraction layer in CloudMonitor 2.0 that groups a set of resources. It gives teams unified management and isolates data by resource group. A workspace's data and configuration are stored in the region you select. With workspaces, you can create multiple independent resource environments. Each environment has its own set of objects, such as cloud services, infrastructure, server-side and frontend applications, and middleware. Resources in each group are isolated from the other groups, which prevents resource conflicts and improves security.
|
App | An app is a lightweight carrier for reading and writing data sources within a workspace. You can show or hide apps in the workspace. An app usually represents domain-specific observability knowledge for a particular scenario. Key traits:
|
Entity | An entity is an observable object—such as a container cluster or an ECS instance. |
Model (UModel) | UModel is a specification for defining observability data models. It defines models for logs, metrics, traces, entities, and their relationships, which enables unified definition and management of observable data. |
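To make the Entity and UModel terms more concrete, the following Python sketch models entities, their relationships, and the logs, metrics, and traces attached to them as a small in-memory graph. It is only an illustration of the idea: the class names (`Entity`, `ObservabilityGraph`), the method names, and the `contains` relation are hypothetical and are not part of the UModel specification or any CloudMonitor 2.0 API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """An observable object, such as a container cluster or an ECS instance."""
    entity_id: str
    entity_type: str


@dataclass
class ObservabilityGraph:
    """Hypothetical in-memory graph; not the real UModel data structure."""
    entities: dict = field(default_factory=dict)   # entity_id -> Entity
    relations: list = field(default_factory=list)  # (source_id, relation, target_id)
    telemetry: dict = field(default_factory=dict)  # entity_id -> {kind: [records]}

    def add_entity(self, entity):
        # Register the entity and give it empty slots for each data type.
        self.entities[entity.entity_id] = entity
        self.telemetry.setdefault(
            entity.entity_id, {"metrics": [], "logs": [], "traces": []}
        )

    def relate(self, source_id, relation, target_id):
        # Relationships are only allowed between registered entities.
        if source_id not in self.entities or target_id not in self.entities:
            raise KeyError("both entities must be registered first")
        self.relations.append((source_id, relation, target_id))

    def attach(self, entity_id, kind, record):
        # kind is one of "metrics", "logs", or "traces".
        self.telemetry[entity_id][kind].append(record)

    def neighbors(self, entity_id):
        """Entities directly reachable from entity_id via outgoing relations."""
        return [self.entities[t] for s, _, t in self.relations if s == entity_id]


if __name__ == "__main__":
    graph = ObservabilityGraph()
    graph.add_entity(Entity("cluster-1", "container_cluster"))
    graph.add_entity(Entity("i-001", "ecs_instance"))
    graph.relate("cluster-1", "contains", "i-001")
    graph.attach("i-001", "metrics", {"name": "cpu_usage", "value": 0.72})
    print([e.entity_id for e in graph.neighbors("cluster-1")])
```

Keeping metrics, logs, and traces keyed by the same entity ID is what lets a query start from one resource and follow relationships to related data, which is the kind of association the observability graph is described as providing.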
Features
Feature | Description |
Full-stack data collection and monitoring |
|
Intelligent analysis and diagnostics |
|
Visualization and reporting |
|
Alerting and notification management |
|
Openness and integration |
|
Security and high availability |
|
Cost optimization |
|
Cross-region unified management | Monitor and manage resources across multiple regions from one place. Simplify O&M workflows. |
Scenarios
Scenario | Description | Advantages |
Scenario 1: Unified full-stack monitoring and real-time observability graph | Enterprises must monitor physical servers, container clusters, microservice applications, and databases across hybrid cloud environments. Fragmented tools reduce O&M efficiency. CloudMonitor 2.0 collects metrics (such as CPU and memory usage), traces (such as API call chains), logs (such as error logs), and events (such as configuration changes) in one place. It builds an end-to-end observability graph for global, cross-resource, cross-service visibility. |
|
Scenario 2: Intelligent anomaly detection and failure prediction | During traffic spikes or in complex architectures, manually identifying hidden failures is difficult. CloudMonitor 2.0 uses machine learning models to analyze historical data. It predicts risks—such as capacity bottlenecks and service latency—and triggers early warnings. |
|
Scenario 3: End-to-end full-stack tracing from client to server (APM) | In microservice architectures, a single user request may involve dozens of service calls and frontend or backend interactions. Pinpointing performance bottlenecks is challenging. CloudMonitor 2.0 combines full-stack tracing with code-level diagnostics. It links upstream to user experience and downstream to infrastructure—building a full-stack observability graph to precisely analyze slow queries and deadlocks. |
|
Scenario 4: Security compliance and threat insights | Enterprises need real-time monitoring of security events—such as abnormal logins and data breaches—and must meet compliance audit requirements. CloudMonitor 2.0 uses real-time log analysis and behavior pattern recognition to detect threats quickly. |
|
Scenario 5: Resource optimization and cost management | Opaque cloud resource usage leads to waste. CloudMonitor 2.0 analyzes resource utilization and recommends elastic scaling policies and idle resource release plans. |
|
Scenario 6: Intelligent alerting and automated O&M | Traditional alerting often causes false positives or information overload. CloudMonitor 2.0 uses alert noise reduction, dynamic thresholds, and tiered notifications to improve accuracy—and supports automated remediation actions. |
|
Scenario 7: Managed open-source observability components and intelligent O&M | Enterprises widely use open-source observability tools—such as Prometheus, Grafana, and OpenTelemetry—in hybrid or multicloud environments. But they face three challenges:
|
|
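The dynamic thresholds mentioned in Scenario 6 can be illustrated with a small Python sketch. Instead of comparing a metric against a fixed limit, it flags a value as anomalous when it deviates from the mean of a recent window by more than `k` standard deviations. This is a generic rolling-statistics technique shown under stated assumptions; the source does not describe the algorithms CloudMonitor 2.0 actually uses, and the class name `DynamicThreshold` and its parameters (`window`, `k`, `min_samples`) are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class DynamicThreshold:
    """Rolling mean and standard deviation detector (illustrative only)."""

    def __init__(self, window=30, k=3.0, min_samples=5):
        self.history = deque(maxlen=window)  # most recent observations
        self.k = k                           # allowed deviation in std devs
        self.min_samples = min_samples       # warm-up before flagging anything

    def observe(self, value):
        """Return True if value is anomalous relative to the recent window."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            # Floor sigma so a perfectly flat history does not divide the
            # world into "identical" and "anomalous" by rounding error alone.
            sigma = max(pstdev(self.history), 1e-9)
            anomalous = abs(value - mu) > self.k * sigma
        # The value joins the window either way, so the baseline adapts.
        self.history.append(value)
        return anomalous


if __name__ == "__main__":
    detector = DynamicThreshold(window=30, k=3.0, min_samples=5)
    cpu_usage = [50, 51, 49, 50, 52, 50, 95, 51]
    for sample in cpu_usage:
        if detector.observe(sample):
            print(f"anomaly: cpu_usage={sample}")
```

Because the threshold follows the recent baseline, a metric that is normally high does not alert constantly, while a sudden jump on a normally quiet metric does. A production system would typically also account for seasonality and would decide whether anomalous samples should be excluded from the baseline.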
Observability apps
App type | App name | Description |
Persistent | Alert Center | Manage all alert information in one place |
Persistent | All Features | Manage all apps and related services in one place |
Persistent | Integration Center | Integrate and manage observability objects and data |
Persistent | Entity Explorer | Explore the status and performance of monitored objects |
Persistent | Cloud Service Monitoring | Query and alert on basic monitoring metrics for Alibaba Cloud services |
Application observability | Application Monitoring | Monitor application performance and diagnose faults in real time |
Application observability | Real User Monitoring | Monitor web applications, mobile apps, and mini programs |
Application observability | AI Application Observability | Deliver full-stack, integrated observability for AI applications |
O&M monitoring | Prometheus Service | A fully managed Prometheus cloud service for high-performance monitoring |
O&M monitoring | Incident Response | Group alert events into incidents and manage them |
O&M monitoring | Synthetic Monitoring | Simulate user requests to proactively monitor network quality, service availability, and user experience |
O&M monitoring | Database Observability | Deliver one-stop observability for database services |
O&M monitoring | Log Audit | Record and review operation logs |
Cloud service insights | PAI Insights | Deliver full-stack, one-stop observability for Platform for AI (PAI) |
Cloud service insights | Container Insights | Analyze the operational status of Kubernetes clusters in depth |
Cloud service insights | ECS Insights | Advanced monitoring for Elastic Compute Service (ECS) |
Intelligent exploration and analysis | UModel Explorer | Entity and UModel debugging tool |
Intelligent exploration and analysis | Data Explorer | Explore and analyze monitoring metrics and data |
Intelligent exploration and analysis | Event Hub | Manage all types of event information in one place |
Intelligent exploration and analysis | Dashboard | Dashboard showing key metrics |
Intelligent exploration and analysis | Log Explorer | Provide log data exploration and analysis services |