A complete guide to observability monitoring solutions: SLS full-stack monitoring

Preface

Monitoring has been a necessary part of every company's IT systems since the early days of computing. Over decades of development, IT technology has changed greatly: development models, system architectures, deployment models, and infrastructure have all gone through several generations. Today the mainstream technologies are microservices, containerization, cloud, and DevOps.

These architectural changes have made systems more complex: development depends on more people and departments, and deployment modes and runtime environments are more dynamic and uncertain. As a result, the IT industry has moved toward more systematic observation of its systems. Monitoring systems are changing accordingly, evolving toward cloud native, data fusion, and intelligence.

Development history of monitoring systems

Looking back on the development of IT monitoring, I think it can be divided into four stages: the Unix era, the data center era, the distributed era, and the cloud native era.

• Unix era: The spread of Unix and Linux produced the first real IT systems. In the 1980s and 1990s, applications were usually simple and deployed on a single machine. To help locate problems in these applications, Unix exposed many metrics, such as CPU, memory, and I/O usage, and Unix/Linux provided command-line tools such as top, vmstat, and iostat to read them quickly. Graphical tools were also available for desktop users, which was the earliest use of line charts in IT monitoring. At this stage people paid little attention to performance or user experience; they mostly cared about availability, that is, whether the service worked at all.

• Data center era: In the 1990s, more and more companies began to build their own data centers, growing from a few machines to hundreds or thousands. Dedicated IT operations staff appeared at this time. To manage these machines, SNMP (Simple Network Management Protocol) was developed to manage and monitor the status of each machine in the data center. The monitoring system itself mostly ran on a single machine and collected network and hardware information from each host over SNMP. Cross-host applications and web applications serving external users also appeared at this stage, so monitoring systems began to track network latency, though not yet the latency of actual user requests.

• Distributed era: In the 21st century the Internet became ubiquitous and application scenarios kept expanding. A single machine could no longer keep up with growing demand, so layered distributed architectures became popular. Monitoring systems became layered as well, with host monitoring, network monitoring, middleware monitoring, application monitoring, and so on. Application monitoring was a new category: it had to cover not only application availability but also the detection and resolution of performance problems. The monitoring system itself also became distributed. Its back end consisted of multiple machines and modules, such as data processing, storage, and alerting, and each module could in turn be distributed, for example distributed stream processing or a distributed database.

• Cloud native era: As cloud computing and container technology matured, many companies began to build applications with containers and microservices and to deploy them on public or private clouds. In cloud native scenarios, virtualization is more thorough and the environment is more dynamic, so some traditional monitoring methods no longer fit. A monitoring system is needed that can integrate with Kubernetes, microservices, and cloud resources. The goal of monitoring also moves up the stack, toward the actual user experience and the efficiency of troubleshooting. Besides collecting more monitoring data, the system must therefore support correlation analysis with other observability data (such as logs and traces) to locate problems quickly. AI techniques are also being introduced to detect, locate, and repair anomalies automatically.

Monitoring solutions in the cloud native era

Beyond the progress required of the monitoring solution itself, its capabilities and results in the cloud native era must also reach a higher level. We summarize the required features as follows:

1. Wide coverage: support for infrastructure, containers/K8s, cloud vendors, middleware, databases, and more

2. Unified view: data from different layers can be ingested and viewed in one place

3. Unified alerting: alerting is an important part of monitoring and must also be managed in one place, with advanced features such as intelligent noise reduction, dynamic on-call schedules, and alert merging/routing, to reduce management and usage costs

4. Intelligence: an enterprise IT system involves a huge number of components, so static alert rules are hard to apply. Heuristic AIOps time-series anomaly detection methods are needed to find abnormal curves automatically and raise alerts

5. Data fusion analysis: metrics can be conveniently correlated with other observability data such as traces, logs, and events, to locate and resolve problems quickly
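
To illustrate why item 4 considers static rules hard to apply across many components, here is a minimal sketch that compares a fixed threshold with a rolling z-score check. It is a generic textbook heuristic, not the algorithm SLS uses; the function names, window size, and limits are assumptions chosen for the example.

```python
from statistics import mean, stdev
from typing import List, Sequence


def static_threshold_alerts(values: Sequence[float], threshold: float) -> List[int]:
    """Return the indices of points that exceed one fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def rolling_zscore_alerts(
    values: Sequence[float],
    window: int = 5,          # assumed history length, in samples
    z_limit: float = 3.0,     # assumed deviation limit, in standard deviations
    min_sigma: float = 1e-6,  # floor that avoids division by zero on flat series
) -> List[int]:
    """Return the indices of points that deviate strongly from their recent history."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = max(stdev(history), min_sigma)
        if abs(values[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts


# Two hypothetical CPU usage series (percent) with different normal baselines.
lightly_loaded = [20, 21, 19, 20, 22, 21, 20, 60, 21, 20]  # one spike at index 7
heavily_loaded = [80, 81, 79, 80, 82, 81, 80, 81, 79, 80]  # steady, no anomaly

for name, series in [("lightly_loaded", lightly_loaded), ("heavily_loaded", heavily_loaded)]:
    print(name, "static:", static_threshold_alerts(series, 70),
          "rolling:", rolling_zscore_alerts(series))
```

With a single threshold of 70, the static rule misses the spike on the lightly loaded host and fires on every sample of the heavily loaded one, while the rolling check adapts to each series' own baseline. Production detectors also have to handle seasonality and trend, which this sketch ignores.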

SLS full stack monitoring

SLS is Alibaba's observability data engine. It provides one-stop collection and storage of observability data such as logs, metrics, distributed traces, and events. To help users quickly onboard and monitor their business systems, SLS provides a full-stack monitoring app that brings various monitoring data into a single instance for unified management and monitoring. Full-stack monitoring builds on SLS capabilities for monitoring data collection, storage, analysis, visualization, alerting, and AIOps. Its functions are as follows:

• Real-time monitoring of various systems, including host monitoring, Kubernetes monitoring, database monitoring, middleware monitoring, etc.

• One-click installation on ECS and K8s, with graphical management of monitoring configuration, so there is no need to log in to hosts to configure collection items.

• Built-in reports that distill years of experience from veteran operations engineers, including dozens of reports such as resource overview, water level monitoring, hotspot analysis, and detailed metrics.

• Custom analysis with PromQL, SQL92, and other query syntaxes.

• AIOps metric inspection, which uses machine learning to discover abnormal metrics automatically.

• Custom alert configuration. Alert notifications can be sent directly to the message center, SMS, email, voice (phone), and DingTalk, and custom webhooks are also supported.
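
As a rough idea of what the receiving side of a custom webhook might look like, the sketch below accepts a JSON POST and turns it into a one-line notification. The payload fields (`alert_name`, `severity`, `resource`) are hypothetical and do not reflect the actual SLS alert payload, which should be checked in the SLS documentation; only the Python standard library is used.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def format_alert(body: bytes) -> str:
    """Build a one-line message from a JSON alert payload.

    The field names are hypothetical placeholders, not the real SLS schema.
    """
    payload = json.loads(body)
    name = payload.get("alert_name", "unknown alert")
    severity = str(payload.get("severity", "unknown")).upper()
    resource = payload.get("resource", "unknown resource")
    return f"[{severity}] {name} on {resource}"


class AlertHandler(BaseHTTPRequestHandler):
    """Accept POSTed alerts and print a formatted line for each one."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            message = format_alert(self.rfile.read(length))
        except (ValueError, AttributeError):
            # Body was not valid JSON, or was not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        print(message)  # a real receiver would forward this to chat, a pager, etc.
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted; the port is an arbitrary choice."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

Calling `serve()` starts the receiver; the formatting logic is kept in a separate function so it can be checked without opening a socket.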

Overview of full-stack monitoring functions

Host monitoring

Kubernetes monitoring

Database monitoring

Middleware monitoring

Coming soon

At this stage, full-stack monitoring provides host monitoring, K8s monitoring, database monitoring, and middleware monitoring. Further horizontal and vertical expansion is planned, for example:

1. Cloud resource monitoring, including metrics from Alibaba Cloud and from other clouds such as AWS and Azure

2. More host functions, such as process-level monitoring, kernel monitoring, and process/kernel profiling

3. K8s monitoring of performance, changes, and service topology; diagnosis and plan monitoring for databases; support for more middleware types

4. Monitoring related to user experience and applications, such as dial tests (synthetic probing), front-end monitoring, and mobile monitoring
