Explore application architecture with Kubernetes monitoring

Why is Kubernetes monitoring required?

Many readers will be familiar with application performance monitoring. This type of monitoring focuses mainly on business application logic, the application framework, and the language runtime. The problems it typically surfaces include full thread pools, failures to obtain database connections, MySQL problems, memory overflows, and exception stacks across call chains. As Kubernetes containerization has driven the evolution of cloud native technology, developing and operating upper-layer applications has become simpler. Overall complexity, however, stays constant: a decrease in upper-layer complexity is inevitably accompanied by an increase in lower-layer complexity. As shown in the figure below, complexity has gradually shifted down to the container virtualization layer, and to the system call and kernel layers that support the various virtualization technologies. Problems can occur at every layer, and they can all affect upper-layer applications. For example, if a Kubernetes component in the container virtualization layer fails, such as the scheduler, Pods cannot be scheduled and the application is affected. If system calls related to the file system misbehave, upper-layer applications cannot read files and start to fail. If the kernel misbehaves, application processes cannot be scheduled and cannot complete their work.

For applications to run healthily and stably, the whole software stack must be healthy and stable end to end. Although many operations teams have built application monitoring and system monitoring, no monitoring system connects the behavior of every software layer from top to bottom, so some difficult problems leave teams with nowhere to start. At the application layer, for example, a network request may time out even though both the client and the server look fine, when the real cause is a high RTT or retransmission rate for network-layer packets, slow DNS resolution, or a slow CNI plugin. Achieving end-to-end observability in the Kubernetes container environment is the purpose of Kubernetes monitoring.

Kubernetes monitoring covers the layers beneath application monitoring: the Kubernetes container interface and the underlying operating system. It obtains observation data from five data sources. In the container virtualization layer, it uses the Kubernetes control component exporter to obtain observation data from the Kubernetes control components, cAdvisor to obtain resource observation data for containers, and kube-state-metrics to obtain the status and condition data of Kubernetes resources as well as events. At the system call layer, it obtains observation data through Linux tracing technologies such as kprobes and tracepoints. At the kernel layer, it obtains observation data through a kernel observability module. Kubernetes monitoring then links upward to application performance monitoring by associating processes, containers, Kubernetes resources, and business applications, which creates end-to-end observability. Kubernetes monitoring is therefore an integrated solution for end-to-end observability of the Kubernetes cluster software stack, in which the observation data of all associated layers can be viewed at the same time. We hope that, through this series of Kubernetes monitoring best practices, everyone can use Kubernetes monitoring to solve thorny observability problems in Kubernetes environments.

The series covers two kinds of topics. The first is problem discovery, which spans five categories: application architecture problems, performance problems, resource problems, scheduling problems, and network problems. The second is problem localization, which covers root cause localization for the problems found in those five categories, together with repair suggestions.

Explore application architecture and discover unexpected traffic

The theme of the first session in the Kubernetes monitoring series is "How to use Kubernetes monitoring to explore application architecture and discover unexpected traffic". It covers the following three points:

Background introduction: Challenges in exploring application architecture;

Typical scenarios: In which scenarios do we need to explore application architecture;

Best practices: A pattern for exploring application architecture that helps identify and locate problems efficiently.

1. Challenges of Exploring Application Architecture

(1) Chaotic Microservice Architecture

In a containerized Kubernetes environment, microservices are the most common architectural pattern. Under this architecture, as the business grows, the number of microservices inevitably increases and the relationships among them become more complex. As complexity grows, some common architecture questions become hard to answer: What is the architecture of the application as it currently runs? Are the downstream services that the application depends on healthy? Is the upstream client traffic to the application normal? Is DNS resolution for the application working? Are there connectivity problems between two applications? As a result, exploring the application architecture often becomes very difficult.

(2) Multiple Programming Languages

In a microservice architecture, each microservice can usually be written in a different programming language, as long as it exposes standard services. So how is each language monitored? Do all languages share the same instrumentation approach, and does each have efficient, easy-to-use instrumentation tooling? How much does intrusive instrumentation cost in performance, and will the instrumentation code affect business operations? These are the observability challenges of multi-language scenarios.

(3) Multiple communication protocols

In a microservice architecture, different microservices can communicate over different protocols, such as HTTP, gRPC, Kafka, and Dubbo. We often need to recognize these protocols in order to quickly spot problems in the corresponding dependent services. Recognizing a protocol, however, means understanding it and instrumenting the code in the right places. How can different protocols be instrumented in a uniform way, and will the intrusive code affect business performance? These are the observability challenges of multi-protocol scenarios.

2. Typical Scenarios

(1) Architecture Awareness

Architecture awareness means drawing a topology map from real network calls, with microservices as nodes and the calls between them as edges. Comparing this map with the expected, statically designed architecture reveals whether there are extra or missing microservices and whether the relationships between microservices are correct. An overall architecture map of this kind is typically needed when launching a new application, opening a new region, or reviewing the entire call chain.
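The comparison between the observed topology and the expected design can be sketched in a few lines of Python. This is only an illustration, not how the product implements it: the service names, the `build_topology` and `diff_architecture` helpers, and the (caller, callee) record format are all assumptions made for the example.

```python
from collections import Counter


def build_topology(calls):
    """Build a topology from observed calls given as (caller, callee) pairs."""
    edges = Counter(calls)  # edge -> number of observed calls
    nodes = {service for edge in edges for service in edge}
    return nodes, edges


def diff_architecture(expected_edges, observed_edges):
    """Compare the designed edges with the edges seen in real traffic."""
    expected, observed = set(expected_edges), set(observed_edges)
    return {
        "unexpected": sorted(observed - expected),  # traffic nobody designed
        "missing": sorted(expected - observed),     # designed but never seen
    }


# Hypothetical call records, as a traffic collector might report them.
observed_calls = [
    ("frontend", "cart"),
    ("frontend", "cart"),
    ("cart", "redis"),
    ("frontend", "legacy-billing"),
]
# Hypothetical static design.
expected_edges = [("frontend", "cart"), ("cart", "redis"), ("cart", "inventory")]

nodes, edges = build_topology(observed_calls)
report = diff_architecture(expected_edges, edges)
print(report)
```

In this made-up data set, the call from `frontend` to `legacy-billing` is reported as unexpected traffic, and the designed call from `cart` to `inventory` is reported as missing.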

(2) Architecture anomaly discovery

Architecture anomaly discovery means defining anomaly rules for the nodes and edges of the architecture topology map and rendering the nodes and edges that match them in the corresponding anomaly colors, so that abnormal nodes and edges can be found quickly. It is typically used in scenarios that focus on node and edge status, such as reviewing the entire call chain and health inspections.

(3) Correlation analysis

After anomaly discovery has pinpointed an abnormal node or edge, we usually need to pivot along its associations: quickly view the upstream and downstream of the node or edge, as well as the corresponding service instances, and gradually narrow the scope of the problem.
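As a rough sketch of this pivot, the upstream and downstream of a node can be read directly from the edge list of the topology. The `neighbors` helper, the edge format, and the service names below are assumptions for illustration only.

```python
def neighbors(edges, node):
    """Return (upstream, downstream) services of a node.

    edges is an iterable of (caller, callee) pairs.
    """
    upstream = sorted({src for src, dst in edges if dst == node})
    downstream = sorted({dst for src, dst in edges if src == node})
    return upstream, downstream


# Hypothetical topology edges.
edges = [
    ("gateway", "cart"),
    ("frontend", "cart"),
    ("cart", "redis"),
    ("cart", "inventory"),
    ("inventory", "mysql"),
]

upstream, downstream = neighbors(edges, "cart")
print("upstream:", upstream)
print("downstream:", downstream)
```

Starting from the abnormal node, the same lookup can be repeated on each neighbor to walk further up or down the call chain.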

3. Best Practices

The three typical scenarios above form a complete workflow. First, use architecture awareness to check whether the architecture that is actually running matches expectations. If there are structural problems, investigate the services whose structure is abnormal; if not, move on to the next step. Next, use anomaly discovery to check whether any nodes or edges are shown in an anomaly color. If none are, all is well. Otherwise, pinpoint the specific nodes and edges and start correlation analysis: first check whether the node's own instances have problems, and then check upstream and downstream.

How does Kubernetes monitoring support these best practices? First, the cluster topology in Kubernetes monitoring provides architecture awareness. Kubernetes monitoring draws the application architecture topology by correlating real network requests. Two views are currently available: Service and Workload. The former shows calls between Services, while the latter shows calls between Deployments, DaemonSets, and StatefulSets.

When you open the topology map, nodes are grouped and collapsed by default: inside the cluster they are grouped by namespace, and outside the cluster they are grouped by service type. Expanding a group shows its nodes and the relationships between them. Clicking a node displays the aggregated and time-series values of its performance metrics within the selected time range, broken down by network protocol. Clicking an edge displays the same kind of values for that edge, also broken down by network protocol. The map also supports node filtering, for example to view the architectural relationship between two specific namespaces, and node queries to quickly locate a particular node. Both are effective ways to explore the architecture.
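The grouping and filtering behavior can be approximated with a small Python sketch. It only covers the in-cluster case (grouping by namespace); the `(namespace, name)` node format and both helper functions are hypothetical and are not taken from the product.

```python
from collections import defaultdict


def group_by_namespace(nodes):
    """Group (namespace, name) nodes the way a collapsed topology would."""
    groups = defaultdict(list)
    for namespace, name in nodes:
        groups[namespace].append(name)
    return {namespace: sorted(names) for namespace, names in groups.items()}


def edges_between(edges, ns_a, ns_b):
    """Keep only the edges that connect the two given namespaces."""
    wanted = {ns_a, ns_b}
    return [(src, dst) for src, dst in edges if {src[0], dst[0]} == wanted]


# Hypothetical nodes and edges; each node is a (namespace, name) pair.
nodes = [("shop", "frontend"), ("shop", "cart"), ("pay", "billing")]
edges = [
    (("shop", "frontend"), ("shop", "cart")),
    (("shop", "cart"), ("pay", "billing")),
]

groups = group_by_namespace(nodes)
cross = edges_between(edges, "shop", "pay")
print(groups)
print(cross)
```

Filtering to the edges between two namespaces leaves only the `cart` to `billing` call here, which is the kind of narrowed view the text describes.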

Next, consider the anomaly discovery capability. Kubernetes monitoring colors nodes and edges yellow or red according to anomaly conditions in three dimensions. The first is abnormal performance metrics, such as an error rate above 10% and an average response time above 500 milliseconds. The second is abnormal resource metrics, such as CPU usage above 70% and memory usage above 70%. The third is abnormal Kubernetes control state, such as a Pod that cannot reach the Ready state. When a group is collapsed, it displays the percentage of abnormal nodes it contains; expanding the group shows which specific nodes are abnormal. With this capability, we can quickly discover anomalies in specific microservices or in the relationships between them.
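The example thresholds above can be expressed as simple rules. The sketch below is a hedged illustration: the `NodeStats` fields, the helper names, and the sample data are assumptions, and because the text does not say how rules map to yellow versus red, the code only reports which rules a node violates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStats:
    """Hypothetical per-node metrics; the field names are illustrative only."""
    name: str
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    avg_response_ms: float  # average response time in milliseconds
    cpu_usage: float        # fraction of CPU in use, 0.0 to 1.0
    memory_usage: float     # fraction of memory in use, 0.0 to 1.0
    pods_ready: bool        # True when every Pod of the node is Ready


def violations(node: NodeStats) -> List[str]:
    """Return the example rules from the text that this node violates."""
    checks = [
        ("error rate > 10%", node.error_rate > 0.10),
        ("avg response time > 500 ms", node.avg_response_ms > 500),
        ("CPU usage > 70%", node.cpu_usage > 0.70),
        ("memory usage > 70%", node.memory_usage > 0.70),
        ("Pod not Ready", not node.pods_ready),
    ]
    return [label for label, violated in checks if violated]


def abnormal_percentage(group: List[NodeStats]) -> float:
    """Percentage of nodes in a collapsed group with at least one violation."""
    if not group:
        return 0.0
    abnormal = sum(1 for node in group if violations(node))
    return 100.0 * abnormal / len(group)


# Made-up sample data for one namespace group.
group = [
    NodeStats("frontend", 0.02, 120, 0.35, 0.40, True),
    NodeStats("cart", 0.15, 650, 0.50, 0.55, True),
    NodeStats("billing", 0.01, 90, 0.30, 0.45, True),
    NodeStats("inventory", 0.00, 80, 0.20, 0.30, True),
]
for node in group:
    print(node.name, violations(node))
print(abnormal_percentage(group))
```

With this sample data only `cart` violates any rule, so the collapsed group would report one abnormal node out of four.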

Kubernetes monitoring also provides correlation analysis. It supports viewing the upstream and downstream of a specific node and offers a 3D view that shows a node's upstream and downstream relationships together with the status of its own instances. All of the associated data can be explored in a single graph, which greatly improves the efficiency of problem localization.

4. Product Value of Kubernetes Monitoring

Alibaba Cloud Kubernetes Monitoring is a one-stop observability product developed for Kubernetes clusters. It associates all of the metrics, traces, logs, and events under Kubernetes, and has six main characteristics:

• Non-intrusive: Alibaba Cloud Kubernetes monitoring uses bypass technology to obtain rich network performance data without code instrumentation.

• Language independent: Alibaba Cloud Kubernetes monitoring parses network protocols at the kernel layer and supports any language and any framework.

• High performance: Alibaba Cloud Kubernetes monitoring is based on eBPF technology and obtains rich network performance data with extremely low overhead.

• Resource correlation: Alibaba Cloud Kubernetes monitoring shows the associations among related resources through the network topology and the resource topology.

• Data diversity: Alibaba Cloud Kubernetes monitoring supports multiple types of observability data (metrics, traces, logs, and events), covering the end-to-end software stack.

• Integration: Through scenario-based console design, Alibaba Cloud Kubernetes monitoring ties together architecture-aware topology, application monitoring, Prometheus monitoring, cloud dial testing, health inspections, the event center, log services, and cloud services.

So what are the similarities and differences among Kubernetes monitoring, application performance monitoring, and Prometheus monitoring? The following figure shows how the three relate and differ. Application performance monitoring focuses mainly on application logic, the framework, and the programming language. Kubernetes monitoring focuses on the system and network layers and the container interface, and also links upward to application performance monitoring. Prometheus monitoring is infrastructure: the metric data of both Kubernetes monitoring and application performance monitoring is stored in Prometheus monitoring.
