Intelligent O&M Assistant - Cloud Monitor - Alibaba Cloud Documentation Center

Intelligent O&M Assistant is a smart assistant for observability scenarios in Cloud Monitor 2.0. Powered by a large language model (LLM), it uses your UModel observable data, allowing you to explore data through simple, natural language interactions. It also helps you efficiently locate and analyze issues.

Intelligent O&M Assistant feature overview

The Intelligent O&M Assistant can access all data within the Cloud Monitor 2.0 platform, covering the core observability scenarios. You can therefore ask observability-related questions directly in natural language.
You do not need to know Structured Process Language (SPL) syntax or how to format queries. Simply describe what you need in natural language. For example, you can ask to "find logs for Elastic Compute Service (ECS) instances with CPU usage over 80% in the last 24 hours" or "count the number of 5xx error requests for an application in the last hour". The Intelligent O&M Assistant interprets your request and automatically generates an SPL search statement, which you can use to retrieve data directly from the observability platform. Because you do not write the query yourself, getting from a question to the underlying data takes far less time.
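
To make the second example request concrete, the following minimal Python sketch shows what "count the number of 5xx error requests in the last hour" computes. It is not SPL and not the statement the assistant generates. The record layout (`time`, `status`) and the function name `count_5xx` are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone


def count_5xx(records, now, window=timedelta(hours=1)):
    """Count records with an HTTP 5xx status inside the window ending at `now`."""
    start = now - window
    return sum(
        1
        for r in records
        if start <= r["time"] <= now and 500 <= r["status"] <= 599
    )


# Hypothetical in-memory request records; real data lives in the platform.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"time": now - timedelta(minutes=5), "status": 502},
    {"time": now - timedelta(minutes=30), "status": 200},
    {"time": now - timedelta(minutes=59), "status": 500},
    {"time": now - timedelta(hours=2), "status": 503},  # outside the window
]

print(count_5xx(records, now))  # 2
```

The request combines two conditions, a time range and a status filter, followed by a count. A generated search statement would need to express the same three parts.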
You do not need to manually switch between different tools on the observability platform, such as Log Service, Application Monitoring, the Alert Center, or CloudLens. The Intelligent O&M Assistant automatically detects your intent and invokes the correct observability tool based on your scenario. For example, if you ask for the cause of an alert, the assistant finds the upstream and downstream relationships of the relevant entity. It then uses the root cause analysis tool to pull relevant metric fluctuations and abnormal log data. If you want to view an application's latency trend, it directly opens the Application Monitoring tool and displays the latency metric dashboard for that application. This simplifies the workflow and reduces manual effort.
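
The idea of routing a question to a tool can be illustrated with a deliberately simple Python sketch. The assistant itself relies on an LLM to detect intent, and its internal logic is not documented here. The keyword lists, the tool identifiers, and the `route` function below are all hypothetical.

```python
# Hypothetical tool identifiers and keywords, for illustration only.
# Order matters: the first route with a matching keyword wins.
ROUTES = [
    ("root_cause_analysis", ("cause", "why", "reason")),
    ("application_monitoring", ("latency", "response time", "trace")),
    ("log_service", ("log",)),
]


def route(question: str) -> str:
    """Return the name of the first tool whose keywords appear in the question."""
    text = question.lower()
    for tool, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return tool
    return "general_qa"  # fallback when nothing matches


print(route("What is the cause of this alert?"))          # root_cause_analysis
print(route("Show the latency trend for my application"))  # application_monitoring
```

Keyword matching is far cruder than LLM-based intent detection, but it shows the shape of the step: one question in, one tool selected, with no manual switching by the user.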

Procedure

Find and click the Intelligent O&M Assistant icon to open the conversation sidebar.
At the bottom of the conversation sidebar, find the Intelligent O&M Assistant input box. Enter your question in natural language directly into the input box. For example:
- Find the error logs for the fraud-detection application from the past hour.
- What is the CPU usage trend for ECS instance i-xxx?
After you enter your question, press Enter or click the send button to the right of the input box. The Intelligent O&M Assistant generates an answer from the platform's observable data and displays the result in the sidebar.
At the top of the conversation sidebar, click History to view all past conversations with the Intelligent O&M Assistant. To clear the context of the current session and start over, click Clear Session.

Entity context association

Within the Cloud Monitor 2.0 platform, you can provide context to the Intelligent O&M Assistant by sending it an entity from tabs such as "Application List", "K8s Cluster", "ECS List", or "RDS List". This focuses your questions on the observable data for that specific entity.
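
Conceptually, sending an entity to the assistant attaches a scope to every question that follows. The Python sketch below models that idea only. The `EntityContext` type, its fields, and the `scoped_question` function are hypothetical and do not describe the platform's actual data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntityContext:
    """Hypothetical description of the entity sent to the assistant."""
    entity_type: str  # for example "ECS instance"; values are illustrative
    entity_id: str


def scoped_question(question: str, ctx: Optional[EntityContext] = None) -> str:
    """Prefix the question with the entity scope when a context is attached."""
    if ctx is None:
        return question
    return f"[{ctx.entity_type}: {ctx.entity_id}] {question}"


ctx = EntityContext(entity_type="ECS instance", entity_id="i-xxx")
print(scoped_question("What is the CPU usage trend?", ctx))
# [ECS instance: i-xxx] What is the CPU usage trend?
```

With a context attached, a short question such as "What is the CPU usage trend?" no longer needs to name the instance.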

Intelligent O&M Assistant Q&A

The Intelligent O&M Assistant supports natural language queries for observability topics, including logs, Application Monitoring, infrastructure data, and Real User Monitoring (RUM). You can ask the Intelligent O&M Assistant about the health of your services, check for abnormalities, and retrieve information about relevant upstream and downstream resources.
You can also ask the Intelligent O&M Assistant for help with operations in Cloud Monitor 2.0. For example, you can ask "How do I create an alert rule and configure a notification policy?".

Example questions:

Entity data query	What are the average response time and P95 value for the xxx API over the last 24 hours?
	What is the success rate trend for application xxx when calling service xx (hourly data for the last 3 days)?
	Find the peak CPU usage and the time it occurred for ECS instance xxx over the last 3 days.
	Retrieve the real-time GPU memory usage and GPU usage for all GPU instances.
	List the names and CPU usage percentages of the top five containers by CPU usage on node xxx.
	Which deployments in the cluster have had memory usage over 80% in the last hour?
In-depth entity analysis	Find the latency metrics for application xxx and predict the future trend.
	Is the resource usage for application xxx normal?
	Identify pods with abnormal memory consumption.
Decision support	What is the main reason that calls to the xxx API take longer than 3 seconds?
	Analyze if any pods under a specific deployment have restarted and find the cause.
	List the 10 ECS instances with the highest CPU, memory, disk, and network load. Generate an inspection report that includes their CPU, memory, disk, and network metrics.
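
The first example question asks for an average response time and a P95 value. For readers unfamiliar with the second term, the Python sketch below computes both over a hypothetical list of latency samples. It uses the nearest-rank definition of a percentile, which is an assumption: the platform may use a different definition, such as one based on interpolation.

```python
def p95(values):
    """Nearest-rank 95th percentile: smallest sample with >= 95% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in milliseconds.
latencies_ms = [120, 95, 110, 480, 105, 98, 130, 101, 99, 115,
                102, 97, 125, 108, 96, 100, 112, 104, 900, 107]

average = sum(latencies_ms) / len(latencies_ms)
print(average)            # 165.2
print(p95(latencies_ms))  # 480
```

In this sample, two slow requests pull the average up to 165.2 ms even though most requests finish in about 100 ms, while the P95 of 480 ms shows how slow the slowest 5% of requests are. This is why the two values are often requested together.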