Non-intrusive monitoring - Alibaba Cloud Documentation Center

Fast-evolving microservices, DevOps, and cloud-native technologies let developers build, deploy, and iterate on applications more efficiently. At the same time, developers have increasingly high requirements for application observability. For example, they need to customize observation methods for different programming languages, middleware components, and dynamic Kubernetes environments. Simple Log Service and Alibaba Cloud OpenAnolis have jointly developed the non-intrusive monitoring feature, which provides a transparent, high-performance, and non-intrusive kernel observation capability for cloud developers.

Data collection

The non-intrusive monitoring feature of Logtail provides two workspaces for data collection: Kernel Space and User Space.

Kernel Space is used to extract and preprocess data.
- Data extraction: The Kernel Hook module intercepts network data based on a KProbe definition. Kernel functions such as connect, accept, and write can be used to intercept data.
- Data preprocessing: The preprocessing module filters the intercepted data and infers protocols based on the user mode configurations. Only data that meets the configured requirements is passed to the SendToUserSpace module. Other data is discarded. The SendToUserSpace module then transmits the remaining data from kernel mode to user mode by using eBPF maps.
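The protocol inference step can be illustrated with a small user-space sketch. The actual preprocessing runs in kernel mode. The Python below only mimics the idea of guessing a protocol from the leading bytes of a payload and dropping anything that does not match an enabled protocol. The function names, the heuristics, and the `enabled` set are assumptions made for illustration and are not Logtail's actual rules.

```python
from typing import Optional, Set

# Request-line prefixes used as a rough HTTP heuristic (illustrative, not exhaustive).
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ", b"OPTIONS ", b"PATCH ")


def infer_protocol(payload: bytes) -> Optional[str]:
    """Guess the application protocol from the leading bytes of a payload."""
    if payload.startswith(HTTP_METHODS) or payload.startswith(b"HTTP/1."):
        return "http"
    # RESP, the Redis wire protocol, starts each frame with a type byte
    # and terminates lines with CRLF.
    if payload[:1] in (b"*", b"+", b"-", b":", b"$") and b"\r\n" in payload:
        return "redis"
    return None


def should_forward(payload: bytes, enabled: Set[str]) -> bool:
    """Keep only payloads whose inferred protocol is enabled; drop the rest."""
    return infer_protocol(payload) in enabled
```

With `enabled = {"http"}`, an HTTP request line is forwarded while a Redis `PING` frame is discarded, which mirrors how only data that meets the configured requirements reaches user mode.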
User Space is used to analyze, aggregate, and manage data.
- Data analysis: The Process module continuously processes the network data stored in eBPF maps. The Process module performs fine-grained protocol analysis based on the protocol type that is inferred by the preprocessing module. For example, the Process module analyzes the SQL statements of the MySQL protocol and the status codes of the HTTP protocol. The Correlate Meta module binds container metadata to the data processed by the Process module. This is because Kernel Space passes only process metadata such as PID and FD, but the observation of Kubernetes clusters requires pod metadata and container metadata, which are more informative.
- Data aggregation: After the container metadata is bound, the Aggregate module aggregates the data for deduplication. For example, if an SQL statement is called 1,000 times during an aggregation period, the Aggregate module abstracts and uploads the final data in the XSQL:1000 format.
- Data management: The eBPF program interacts with a large amount of process data and connection data. The lifecycles of the eBPF objects in the eBPF program must match the status of your machine. If a process or connection is released, the Connection Management module and the Garbage Collection module release the eBPF objects that are used by the process or connection.
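The aggregation step described above can be sketched as follows. This is a minimal illustration and not the Aggregate module's implementation. It assumes that statements are first normalized by replacing literal values with `?`, and that the result is emitted as `pattern:count`. The helper names `normalize_sql` and `aggregate` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

_STRING = re.compile(r"'[^']*'")   # single-quoted string literals
_NUMBER = re.compile(r"\b\d+\b")   # standalone integer literals


def normalize_sql(sql: str) -> str:
    """Replace literal values with '?' so calls that differ only in arguments collapse."""
    return _NUMBER.sub("?", _STRING.sub("?", sql)).strip()


def aggregate(statements: Iterable[str]) -> List[str]:
    """Count identical statement patterns seen in one aggregation period."""
    counts = Counter(normalize_sql(s) for s in statements)
    return [f"{pattern}:{count}" for pattern, count in counts.most_common()]
```

For 1,000 calls of `SELECT * FROM users WHERE id = <n>` with different IDs, `aggregate` returns a single entry that ends in `:1000`, so one record is uploaded instead of 1,000.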

Figure: Non-intrusive monitoring

In the runtime environment of a program, virtual machines or Kubernetes nodes may host processes or local network calls that you do not need to observe. The non-intrusive monitoring feature provides a variety of options for metadata collection. You can set these options in user mode to adjust the scope of collection in kernel mode.

Option	Description
Protocol processing	You can enable protocol parsing.
Protocol processing	You can specify the protocols that you want to parse.
Connection filtering	You can query the data of Unix domain sockets.
Connection filtering	You can query the interactive process data.
Host process filtering	You can use a cmdline regular expression to specify the processes that you want to monitor.
Host process filtering	You can use a cmdline regular expression to specify the processes that you do not want to monitor.
Kubernetes cluster process filtering	You can use a pod-name regular expression to specify the processes that you want to monitor.
Kubernetes cluster process filtering	You can use a pod-name regular expression to specify the processes that you do not want to monitor.
Kubernetes cluster process filtering	You can use a namespace-name regular expression to specify the processes that you want to monitor.
Kubernetes cluster process filtering	You can use a namespace-name regular expression to specify the processes that you do not want to monitor.
Kubernetes cluster process filtering	You can use a label regular expression to specify the processes that you want to monitor.
Kubernetes cluster process filtering	You can use a label regular expression to specify the processes that you do not want to monitor.
Kubernetes cluster process filtering	You can use an environment variable regular expression to specify the processes that you want to monitor.
Kubernetes cluster process filtering	You can use an environment variable regular expression to specify the processes that you do not want to monitor.
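The include and exclude options in the table can be pictured as pairs of regular expressions that are evaluated per process. The sketch below is illustrative only. The `RegexFilter` class, its field names, and the rule that an exclude match takes precedence over an include match are assumptions and are not the documented configuration schema.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegexFilter:
    """Hypothetical include/exclude pair; None means 'no constraint'."""
    include: Optional[str] = None
    exclude: Optional[str] = None

    def allows(self, value: str) -> bool:
        # Assumption: an exclude match always wins over an include match.
        if self.exclude is not None and re.search(self.exclude, value):
            return False
        if self.include is not None:
            return re.search(self.include, value) is not None
        return True


def should_monitor(pod_name: str, namespace: str,
                   pod_filter: RegexFilter, namespace_filter: RegexFilter) -> bool:
    """A process is monitored only if every configured filter allows it."""
    return pod_filter.allows(pod_name) and namespace_filter.allows(namespace)
```

For example, `RegexFilter(include=r"^frontend-", exclude=r"-canary$")` combined with a namespace filter that excludes `kube-system` keeps `frontend-7c9d` in `default` and drops canary pods and system pods. Label and environment variable filters could be added in the same way.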

Data analysis

The non-intrusive monitoring feature is integrated into Kubernetes Monitoring in Full-stack Monitoring. This section describes the analysis capabilities of non-intrusive monitoring in Kubernetes scenarios.

Analysis of Layer 4 network traffic

In cloud-native scenarios, complex topology relationships exist among multi-language and multi-protocol services, which makes it difficult to discover the hotspot services that generate large traffic in Kubernetes clusters. The non-intrusive monitoring feature helps you handle this challenge. The following figure shows the real network traffic topology among the upstream and downstream services of a frontend service. The topology shows direct interactions, DNS requests, and calls to external IP addresses. You can analyze the traffic plans or traffic volumes of Layer 4 to identify hotspot services and bottleneck issues.

The non-intrusive monitoring feature allows you to analyze Layer 4 network traffic based on various application layer protocols, such as HTTP, Redis, MySQL, DNS, and PostgreSQL.

Figure: Non-intrusive monitoring
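Identifying hotspot services from Layer 4 data comes down to summing traffic per edge or per destination in the topology. The following sketch shows the idea on in-memory records. The `FlowRecord` layout (source service, destination service, bytes) and the function names are hypothetical and do not reflect the schema of the collected data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (source service, destination service, bytes transferred); hypothetical layout.
FlowRecord = Tuple[str, str, int]


def traffic_by_edge(flows: Iterable[FlowRecord]) -> Dict[Tuple[str, str], int]:
    """Sum the bytes transferred on each (source, destination) edge."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for src, dst, nbytes in flows:
        totals[(src, dst)] += nbytes
    return dict(totals)


def hotspots(flows: Iterable[FlowRecord], top: int = 3) -> List[Tuple[str, int]]:
    """Rank destination services by total inbound bytes, largest first."""
    inbound: Dict[str, int] = defaultdict(int)
    for _, dst, nbytes in flows:
        inbound[dst] += nbytes
    return sorted(inbound.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

The per-edge totals correspond to the links drawn in a traffic topology, and the ranked list highlights the services that receive the most traffic.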

Analysis of Layer 7 network traffic

In most troubleshooting cases, you cannot identify root causes by only analyzing Layer 4 network traffic. You also need to analyze Layer 7 network traffic.

For example, the client of a Spring Cloud RESTful project uses the HTTP protocol to send requests, and the server uses thread pools and BlockingQueue to respond to the requests. The response time calculated by the server (Time1) and the response time calculated by the client (Time2) may differ because the server-side and client-side instrumentation points are different. If the traffic is excessively large or a large number of upstream requests are blocked, subsequent requests are also blocked, and Time1 and Time2 may differ significantly. If you configure instrumentation for only Service B, Time1 and Time2 may be within a normal range even if bottleneck issues exist. In this case, you may be unable to discover the bottleneck issues in a timely manner.
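The gap between Time1 and Time2 can be reproduced with a toy queueing model. The sketch below simplifies the thread pool and BlockingQueue to a single worker that drains a FIFO queue and ignores network latency. The `simulate` function and its parameters are assumptions made for illustration.

```python
from typing import List, Tuple


def simulate(arrivals: List[float], handle_time: float) -> List[Tuple[float, float]]:
    """Return (time1, time2) per request for one worker draining a FIFO queue.

    time1: duration measured inside the handler (server-side instrumentation).
    time2: duration from arrival to completion (what the client experiences).
    """
    results = []
    worker_free_at = 0.0
    for arrived in sorted(arrivals):
        started = max(arrived, worker_free_at)  # wait in the queue if the worker is busy
        finished = started + handle_time
        worker_free_at = finished
        results.append((handle_time, finished - arrived))
    return results
```

When four requests arrive at the same moment and each takes 0.5 seconds to handle, Time1 stays at 0.5 seconds for every request, while Time2 grows with each queued request. Server-side instrumentation alone reports a healthy handler even though clients wait several times longer.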

You can use the non-intrusive monitoring feature to resolve the issue. Data from the server and the client is visualized based on the time when kernel functions such as Recv, Write, and Sendmsg process requests. This helps you monitor the status of services.

Figure: Non-intrusive monitoring

The Full-stack Monitoring application analyzes the non-intrusive monitoring data that is collected and visualizes the data in charts. For example, if you enable HTTP analysis in the Simple Log Service console, you can view the status of each client and server in charts without instrumentation.

The non-intrusive monitoring feature also allows you to monitor various middleware components, such as MySQL, Redis, and PostgreSQL. For example, you can analyze the performance of MySQL calls based on the calls made by MySQL clients.

Data import

You can enable the non-intrusive monitoring feature when you import data to the Full-stack Monitoring application. The non-intrusive monitoring feature supports the collection of monitoring data of Kubernetes data planes. For more information, see Collect data by using the non-intrusive monitoring feature.