First Step of Alibaba Cloud O&M: Out-of-the-Box Monitoring

This article provides a simple introduction to observability on Alibaba Cloud, specifically how to start using CloudMonitor to monitor and troubleshoot cloud resources.

By Zhongyang

Cloud computing is now used across nearly every industry. For many customers, however, migrating to the cloud still involves a steep learning curve: with hundreds of products available, it can be hard to know which ones to choose and how to use them.

Today, we give new users a simple introduction to getting started with observability on Alibaba Cloud.

Understand Layers of Observability

[Figure 1]

In the observability field, the technology stack is usually divided into three pillars: metrics, tracing, and logging. From a business perspective, we divide it into four layers: experience, business, applications, and resources. The higher a layer sits in the pyramid, the closer it is to the user's real experience. For instance, if a customer runs 100 servers and one goes down, their end users may not be affected at all. However, if an issue is detected through network monitoring, end users have most likely already been impacted. On the other hand, problems in the lower layers of the pyramid are more specific and can be resolved more quickly. In the same example, a faulty ECS instance can be handled by restarting it or scaling out, whereas an outage at the experience layer can have many possible causes and takes longer to troubleshoot.

Our goal is to help new Alibaba Cloud customers quickly build a comprehensive cloud resource monitoring solution. As cloud applications continue to evolve, you'll find more observability solutions waiting for you.

First Use of CloudMonitor

CloudMonitor is designed to solve a range of problems in the observability field. Metrics, tracing, and logging are the three pillars of observability, and on top of them sit general-purpose capabilities such as dashboards, alerts, and APIs. CloudMonitor focuses primarily on metrics for cloud resources.

[Figure 2]

First, go to the CloudMonitor product page, which shows five major functional modules: cloud resource monitoring, network analysis and monitoring, Dashboard, alert service, and event center. This article focuses on cloud resource monitoring, which works out of the box: after you purchase a resource on Alibaba Cloud, you can view a monitoring dashboard for its metrics.

[Figure 3]

To make good use of the cloud, you need to keep it both visible and audible. Dashboards make the status of your cloud resources visible. Alerts make it audible: the system keeps watching your resources automatically, including during peak hours, and sends notifications through the alert system.

[Figure 4]

How to Be "Visible" (Visualized) in CloudMonitor

ECS is the foundation of Alibaba Cloud computing and one of the primary products supported by CloudMonitor. It is also the most prominent entry in the CloudMonitor menu, as shown in the figure above. After you purchase an ECS instance, you can find it in the Host Monitoring menu. The host monitoring view focuses on CPU, memory, load, network, and disk metrics, and also shows monitoring data for the top 5 processes.

[Figure 5]
[Figure 6]

CloudMonitor provides monitoring data not only for ECS but for over 100 cloud products. With CloudMonitor, you can view monitoring data for virtually any Alibaba Cloud resource.

[Figure 7]

Skilled users have two more options:

1) Customize a dashboard to view cloud resource monitoring from your own perspective.

2) Use the API to pull metric data and integrate it with your own monitoring system.

This article focuses on the first step of monitoring O&M, so these two advanced features are not covered in detail.
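For readers curious about the second option, here is a minimal Python sketch of what the integration side might look like once metric data has been pulled. It does not call the API. Instead it works on a made-up sample that is assumed to resemble a CloudMonitor metric query response (such as DescribeMetricList), in which `Datapoints` is a JSON-encoded string of per-instance samples. The field names `instanceId` and `Average`, the instance IDs, and both helper functions are assumptions for illustration, so check them against the API reference before relying on them.

```python
import json

# Hypothetical sample shaped like a metric query response; all values are made up.
SAMPLE_RESPONSE = {
    "Code": "200",
    "Datapoints": json.dumps([
        {"timestamp": 1700000000000, "instanceId": "i-aaa", "Average": 35.0},
        {"timestamp": 1700000060000, "instanceId": "i-aaa", "Average": 45.0},
        {"timestamp": 1700000000000, "instanceId": "i-bbb", "Average": 80.0},
        {"timestamp": 1700000060000, "instanceId": "i-bbb", "Average": 90.0},
    ]),
}


def average_by_instance(response):
    """Return {instanceId: mean of the 'Average' field} for one response."""
    datapoints = json.loads(response["Datapoints"])  # assumed: JSON string
    totals = {}
    for point in datapoints:
        total, count = totals.get(point["instanceId"], (0.0, 0))
        totals[point["instanceId"]] = (total + point["Average"], count + 1)
    return {iid: total / count for iid, (total, count) in totals.items()}


def busiest(response, n=5):
    """Return the n instances with the highest mean value, highest first."""
    averages = average_by_instance(response)
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    for instance_id, cpu in busiest(SAMPLE_RESPONSE):
        print(f"{instance_id}: {cpu:.1f}%")
```

A self-built monitoring system would typically run a step like this on a schedule and store or display the aggregated values.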

How to Be "Audible" (Notified) in CloudMonitor

Anyone who has worked in O&M knows that observability is a means, not an end. What matters most to customers is keeping their applications (services) highly available over the long term. Observability is the first step toward that goal: you must first be able to see the status of the system, and then take appropriate O&M measures when it is not functioning properly, such as scaling out, restarting, migrating, or throttling.

Skilled users likewise have many ways to achieve this goal, but in this article we want to give new users a simple, easy-to-use introduction, similar to a Hello World. Once you have the fundamentals, the advanced features are not difficult to master.

How to be "audible" (notified)?

  • Phone, SMS, or email? Not enough.
  • DingTalk, WeChat Work, or Lark (Feishu)?
  • Auto Scaling, serverless Function Compute (FC), or MNS?
  • Webhook: PagerDuty, Slack, and Teams.

If none of these channels works for you, let us know in the comments.
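To make the webhook channel more concrete, here is a minimal Python sketch of a small relay step that formats an alert as a Slack incoming-webhook message. Slack incoming webhooks accept a JSON body with a `text` field. Everything else is an assumption: the alert dictionary and its keys (`level`, `rule_name`, `metric`, `value`, `instance_id`) are hypothetical and do not describe the body CloudMonitor actually sends to a callback URL.

```python
import json
import urllib.request


def build_slack_payload(alert):
    """Turn a hypothetical alert dict into a Slack incoming-webhook body."""
    text = (
        f"[{alert['level']}] {alert['rule_name']}: "
        f"{alert['metric']} = {alert['value']} on {alert['instance_id']}"
    )
    return {"text": text}


def post_json(url, payload, timeout=5):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Made-up alert used only to show the formatting step.
    sample_alert = {
        "level": "CRITICAL",
        "rule_name": "cpu-high",
        "metric": "CPUUtilization",
        "value": 93.5,
        "instance_id": "i-aaa",
    }
    print(json.dumps(build_slack_payload(sample_alert)))
    # To send it, call post_json() with your own webhook URL.
```

Keeping the formatting separate from the HTTP call makes it easy to add another target, such as Teams, by writing one more payload builder.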

This raises another question: what exactly do we need to be notified about? For alert threshold rules, skilled users can choose advanced features such as dynamic thresholds, combined alerts, and expression-based alerts. So what should new users do? Two things: initiative alert and monitoring governance.

Initiative alert: CloudMonitor has summarized the most important metrics and reasonable thresholds for them. New users simply need to enable these alert rules to be notified promptly when cloud resources encounter problems.
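As an illustration of what a simple static threshold rule does, here is a minimal Python sketch. It assumes a common rule shape in which an alert fires only when the metric exceeds the threshold for several consecutive periods, which avoids reacting to a single spike. The function name, the default of three periods, and the sample values are hypothetical and are not CloudMonitor's built-in settings.

```python
def should_alert(values, threshold, consecutive=3):
    """Return True if the most recent `consecutive` samples all exceed `threshold`.

    `values` is a list of metric samples ordered oldest to newest.
    """
    if consecutive <= 0:
        raise ValueError("consecutive must be a positive integer")
    if len(values) < consecutive:
        return False  # not enough data yet to make a decision
    return all(value > threshold for value in values[-consecutive:])


if __name__ == "__main__":
    cpu_samples = [50, 91, 92, 95]  # made-up CPU utilization percentages
    print(should_alert(cpu_samples, threshold=90))
```

A rule that requires more consecutive periods produces fewer false alarms but notifies you later, so the value is a trade-off between noise and reaction time.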

[Figure 8]

Monitoring governance: If you are not sure whether you are using CloudMonitor appropriately, don't worry. Run the one-click diagnostics and follow the prompts step by step.

[Figure 9]

Afterword

Cloud resource monitoring is a subset of observability, and monitoring is the driving force behind O&M. Building observability is a long-term process that spans Real User Monitoring (RUM), synthetic monitoring (dial tests), stress testing (PTS), Application Performance Monitoring (APM), and logs. Develop the corresponding solutions based on how important each part of your business is.

Today, we have introduced the most basic out-of-the-box capability of CloudMonitor. With minimal dependencies, users can obtain basic monitoring and alerting assurance on Alibaba Cloud.

In the future, there will be a series of articles to introduce the detailed concepts and capabilities of CloudMonitor. Stay tuned.

