Alibaba Cloud CloudMonitor 2.0 is a one-stop observability platform that integrates Simple Log Service (SLS), CloudMonitor (CMS), and Application Real-Time Monitoring Service (ARMS). It unifies metrics, traces, logs, and events into a single view. Using UModel modeling and an observability graph, combined with visualization and alerting capabilities, CloudMonitor 2.0 automatically associates resources and performs intelligent diagnostics. This provides full-stack, end-to-end observability from the infrastructure to the application layer, which lets you quickly find and resolve potential issues to improve O&M efficiency. It is widely used in complex scenarios that involve microservices, containers, and cloud services.
CloudMonitor 2.0 uses AI-enhanced, cross-domain insights to analyze and predict system performance in real time. It identifies anomalies in advance and provides intelligent fault diagnosis and optimization suggestions. This helps businesses build a full-stack observability system for the AI-native era that is smarter, more efficient, and more cost-effective, ensuring business stability and security.
Try in Playground
Alibaba Cloud Playground provides a demo environment where you can try the main features of CloudMonitor 2.0.
Go to the Playground Demo. You will be directed to the workspace by default.
Benefits
One-stop unified observability
CloudMonitor 2.0 integrates the core capabilities of CloudMonitor (CMS), Simple Log Service (SLS), and Application Real-Time Monitoring Service (ARMS), unifying multiple data sources, such as metrics, logs, traces, and events. This eliminates the need to deploy and maintain separate monitoring tools. You can achieve comprehensive, end-to-end observability from the infrastructure to the application layer on a single platform, which significantly reduces the complexity and management costs of your monitoring system.
Unified data modeling
Based on the Universal Observability Model (UModel), CloudMonitor 2.0 breaks down data silos between metrics, logs, traces, and changes to build a panoramic digital view of your IT system. This model allows people, programs, and AI to understand and analyze observable data, helping you build true full-stack observability and improve the efficiency of issue resolution.
AI-driven intelligent analysis
Building on the unified observability data model, CloudMonitor 2.0 uses machine learning for deep pattern recognition and association analysis, which enables precise anomaly detection, trend prediction, and intelligent alert denoising. It also uses large language models (LLMs) to turn complex observability data into deep insights. This supports "conversational O&M": you interact with an intelligent assistant in natural language to quickly locate, analyze, and resolve issues.
Open and compatible with mainstream ecosystems
CloudMonitor 2.0 is fully compatible with the open source technology ecosystem. It natively supports industry-standard tools such as Prometheus, Grafana, OpenTelemetry, and Elasticsearch. This allows for a smooth migration and integration of your existing monitoring assets and technology stacks. You can achieve seamless, unified monitoring for both native applications and hybrid cloud environments. This provides a flexible, open solution without vendor lock-in.
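As a small illustration of what Prometheus compatibility means at the data level, the sketch below renders one sample in the Prometheus text exposition format. It is a generic, standard-library example. The function name `render_prometheus_sample` and the metric and label names are made up for illustration, and this page does not describe how such data is sent to CloudMonitor 2.0.

```python
def render_prometheus_sample(name, labels, value):
    """Render one sample as a line of Prometheus text exposition format."""

    def escape(label_value):
        # Label values escape backslash, double quote, and newline.
        return (
            label_value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )

    if labels:
        # Sort labels so the output is deterministic.
        label_text = ",".join(
            f'{key}="{escape(val)}"' for key, val in sorted(labels.items())
        )
        return f"{name}{{{label_text}}} {value}"
    return f"{name} {value}"


# Hypothetical metric and labels, for illustration only.
line = render_prometheus_sample(
    "http_requests_total", {"method": "GET", "status": "200"}, 1027
)
print(line)  # http_requests_total{method="GET",status="200"} 1027
```

Any system that reads this format can consume the same line, which is why existing Prometheus exporters and dashboards can usually be reused during a migration.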
Terms
Before using CloudMonitor 2.0, you need to understand the following basic concepts.
Term | Description |
Workspace | A workspace is an abstract layer in CloudMonitor 2.0 that represents a collection of resources. It provides teams with unified management and data isolation for resource groups. The selected region is used to store the data and configuration information accessed by the workspace. Using workspaces, you can create multiple independent resource environments. Each environment can have its own set of objects, such as cloud services, infrastructure, server-side and frontend applications, and middleware. The resources within each group are isolated from each other. This prevents resource contention between different groups and improves the security of resource usage. |
Application (App) | An observable App is a medium for reading and writing data from data sources within a workspace. You can show or hide Apps in a workspace. An App typically represents the observability knowledge for a specific scenario. It has the following features: |
Entity | An entity is an observable object, such as a container cluster or an ECS server. |
Model (UModel) | UModel is a specification for defining observability data models. It is used to define models for various observable objects, including logs, metrics, traces, and entities, along with the relationships between them. This achieves unified definition and management of observable data. |
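To make the Entity and UModel terms more concrete, here is a minimal Python sketch of entities, the relationships between them, and the telemetry attached to each entity. It is not the UModel specification. All class names, field names, and type labels (`Entity`, `Relation`, `ObservabilityModel`, `runs_on`, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    """An observable object, such as a container cluster or an ECS server."""
    entity_id: str
    entity_type: str  # hypothetical label, not an official UModel type name


@dataclass(frozen=True)
class Relation:
    """A directed link between two entities, for example 'runs_on'."""
    source_id: str
    relation_type: str
    target_id: str


@dataclass
class ObservabilityModel:
    """Holds entities, their relations, and the data sets attached to them."""
    entities: Dict[str, Entity] = field(default_factory=dict)
    relations: List[Relation] = field(default_factory=list)
    # entity_id -> signal type ("metrics", "logs", "traces") -> data set names
    telemetry: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity

    def relate(self, source_id: str, relation_type: str, target_id: str) -> None:
        # Reject links that point at entities the model does not know about.
        if source_id not in self.entities or target_id not in self.entities:
            raise KeyError("both entities must be registered first")
        self.relations.append(Relation(source_id, relation_type, target_id))

    def attach(self, entity_id: str, signal: str, dataset: str) -> None:
        if entity_id not in self.entities:
            raise KeyError(entity_id)
        self.telemetry.setdefault(entity_id, {}).setdefault(signal, []).append(dataset)

    def neighbors(self, entity_id: str) -> List[Entity]:
        """Return the entities that entity_id links to directly."""
        return [
            self.entities[r.target_id]
            for r in self.relations
            if r.source_id == entity_id
        ]


# Example: an application that runs on an ECS server.
model = ObservabilityModel()
model.add_entity(Entity("app-checkout", "application"))
model.add_entity(Entity("ecs-001", "ecs_instance"))
model.relate("app-checkout", "runs_on", "ecs-001")
model.attach("ecs-001", "metrics", "cpu_utilization")
print([e.entity_id for e in model.neighbors("app-checkout")])  # ['ecs-001']
```

Because metrics, logs, and traces all hang off the same entity identifiers, a question such as "which servers does this application depend on, and what data do we have for them" becomes a simple lookup over one structure rather than a join across separate tools.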
Features
Feature | Description |
Full-stack data collection and monitoring | |
Intelligent analysis and diagnostics | |
Visualization and reporting | |
Alert and notification management | |
Openness and integration capabilities | |
Security and high availability | |
Cost optimization features | |
Cross-region unified management | Supports centralized monitoring and management of resources distributed across multiple regions, simplifying O&M workflows. |
Scenarios
Scenario | Scenario description | Benefits |
Scenario 1: Full-stack unified monitoring and real-time observability graph | A business needs to monitor resources such as physical servers, container clusters, microservice applications, and databases in a hybrid cloud environment. Traditional tools are often fragmented, which leads to low O&M efficiency. CloudMonitor 2.0 builds an end-to-end observability graph by unifying the collection of metrics (such as CPU and memory), traces (such as API call chains), logs (such as error logs), and events (such as configuration changes). This provides a global, visualized view of the state across all resources and services. | |
Scenario 2: Intelligent anomaly detection and fault prediction | It is difficult to manually identify potential faults during traffic spikes or in complex architectures. CloudMonitor 2.0 uses machine learning models to analyze historical data, predict risks such as system capacity bottlenecks and service response delays in real time, and trigger early warnings. | |
Scenario 3: End-to-end full-stack tracing from client to server (APM) | In a microservices architecture, a single user request can involve dozens of service invocations and frontend-backend calls, which makes performance bottlenecks difficult to trace. CloudMonitor 2.0 combines full-stack tracing with code-level diagnostics, linking user experience with the underlying infrastructure. It builds a full-stack observability graph to accurately analyze issues such as slow queries and deadlocks. | |
Scenario 4: Security, compliance, and threat insights | Businesses need to monitor security events, such as abnormal logons and data breaches, in real time while meeting compliance audit requirements. CloudMonitor 2.0 quickly detects potential threats through real-time log analysis and behavior pattern recognition. | |
Scenario 5: Resource optimization and cost management | A lack of transparency in cloud resource usage can easily lead to waste. CloudMonitor 2.0 analyzes resource utilization and recommends Auto Scaling policies and solutions for releasing idle resources. | |
Scenario 6: Intelligent alerting and automated O&M | Traditional alerting is prone to false positives or information overload. CloudMonitor 2.0 improves alert accuracy through alert denoising, dynamic thresholds, and tiered notification mechanisms. It also supports automated remediation actions. | |
Scenario 7: Managed services for open source observability components and intelligent O&M | Businesses widely use open source observability tools (such as Prometheus, Grafana, and OpenTelemetry) in hybrid or multicloud environments, but face the following challenges: | |
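Scenario 6 mentions dynamic thresholds without describing how they are computed. The sketch below shows one common generic approach, a rolling mean plus a multiple of the rolling standard deviation, so that the alert threshold adapts to recent behavior instead of being a fixed number. This is an assumption for illustration, not the algorithm that CloudMonitor 2.0 uses. The function name, parameters, and sample data are hypothetical.

```python
from statistics import mean, stdev


def dynamic_threshold_alerts(values, window=5, k=3.0):
    """Return the indices of points above mean + k * stdev of the preceding window."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # the `window` points just before index i
        threshold = mean(history) + k * stdev(history)
        if values[i] > threshold:
            alerts.append(i)
    return alerts


# Hypothetical latency samples in milliseconds with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 250, 101]
print(dynamic_threshold_alerts(latency_ms))  # [6]
```

A fixed threshold of, say, 200 ms would also catch this spike, but it would need manual retuning for every service. The rolling version derives the threshold from each series itself, which is one way to reduce false positives. Note that after a spike enters the window, the threshold rises sharply for the next few points, so production systems typically use more robust statistics.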
List of observable applications
Application type | Application name | Description |
Resident | Alert Center | Manages all alert information centrally. |
Resident | Application Center | Manages all applications and their related services centrally. |
Resident | Integration Center | Provides integration and management for various observable objects and data. |
Resident | Entity Explorer | Explores the status and performance of different observable objects. |
Resident | Cloud Service Monitoring | Provides basic monitoring metric queries and alerting services for Alibaba Cloud services. |
Application Observability | Application Monitoring | Provides real-time monitoring and fault diagnosis for application performance. |
Application Observability | Real User Monitoring | Focuses on monitoring for web, mobile app, and miniapp scenarios. |
Application Observability | AI Application Observability | Provides one-stop, full-stack observability for AI applications. |
Service Monitoring | Prometheus Service | A fully managed cloud service for Prometheus to build a high-performance monitoring system. |
Service Monitoring | Incident Response | Aggregates alert events into incidents for management. |
Service Monitoring | Synthetic Monitoring | Simulates user requests to proactively monitor network quality, service availability, and user experience. |
Service Monitoring | Database Observability | Provides one-stop observability for database services. |
Service Monitoring | Log Audit | Records and audits operation logs. |
Cloud Service Insights | PAI Insights | Provides one-stop, full-stack observability for Platform for AI (PAI). |
Cloud Service Insights | Container Insights | Provides in-depth analysis of the running status of Kubernetes clusters. |
Cloud Service Insights | ECS Insights | Provides advanced monitoring features for Elastic Compute Service (ECS). |
Intelligent Exploration and Analysis | UModel Explorer | A debugging tool for entities and UModel. |
Intelligent Exploration and Analysis | Data Explorer | Explores and analyzes various monitoring metrics and data. |
Intelligent Exploration and Analysis | Event Center | Manages various types of event information centrally. |
Intelligent Exploration and Analysis | Dashboard | A comprehensive dashboard that displays key metrics. |
Intelligent Exploration and Analysis | Log Explorer | Provides log data exploration and analysis services. |