How Can We Monitor Containers as They Become More Widely Used?

By Baiyu

The rapid development and adoption of container technology has led more enterprises to run their businesses in containers. As one of the mainstream deployment methods, containers separate the concerns of different teams. The Development Team only needs to focus on application logic and dependencies, and the O&M Team only needs to focus on deployment and management, without worrying about application details such as specific software versions and application-specific configurations. As a result, both teams spend less time debugging and releasing and more time delivering new features to end users. Containers also make it easier for enterprises to improve application portability and operational flexibility. According to a CNCF research report, 73% of respondents are using containers to improve production agility and speed up innovation.

Why Do We Need Container Monitoring?

When containers are used at large scale, teams face a highly dynamic containerized environment that requires continuous monitoring. A monitoring system is therefore essential for keeping the operating environment stable and optimizing resource costs. Each container image may have a large number of running instances, and because new images and versions are introduced rapidly, failures can easily spread across containers, applications, and architectures. This makes it crucial to locate the root cause of a problem as soon as it occurs, before the exception spreads. Based on extensive practice, we believe the following components are critical to monitor when using containers:

Server
Container Runtime
Orchestrator Control Plane
Middleware Dependencies
Applications that run in a container

With a complete monitoring system, teams can understand what is happening in clusters, container runtimes, and applications through metrics, logs, and traces. This also supports business decisions, such as when to scale instances, tasks, and Pods out or in and when to change instance types. DevOps engineers can further improve troubleshooting and resource management efficiency by adding automated alerts and related configurations. For example, they can proactively monitor CPU and memory utilization so that, when consumption approaches a threshold, the O&M Team is notified to add nodes before the available resources are exhausted. The benefits include:

Detect problems as early as possible to avoid system disruption
Analyze container health across cloud environments
Identify clusters with over- or under-allocated resources and tune applications for better performance
Create intelligent alerts that improve accuracy and avoid false alarms
Optimize the system using monitoring data for optimal performance and lower operating costs
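
The memory-threshold alert described above can be sketched as a rule that fires only when utilization stays above the limit for several consecutive samples, which is one common way to avoid false alarms (similar in spirit to the `for` duration in Prometheus alerting rules). The function name, the 85% threshold, and the three-sample window are hypothetical choices for illustration, not settings of any particular product.

```python
from collections import deque
from typing import Iterable, List


def sustained_breaches(samples: Iterable[float], threshold: float = 0.85,
                       window: int = 3) -> List[int]:
    """Return the sample indices at which a (hypothetical) alert starts firing.

    The alert fires only after utilization has stayed at or above `threshold`
    for `window` consecutive samples, so a single short spike is ignored.
    """
    recent = deque(maxlen=window)  # breach flags for the last `window` samples
    fired: List[int] = []
    firing = False
    for i, value in enumerate(samples):
        recent.append(value >= threshold)
        breached = len(recent) == window and all(recent)
        if breached and not firing:
            fired.append(i)  # transition from "ok" to "firing"
        firing = breached
    return fired


# Node memory utilization sampled once per minute (made-up values).
memory = [0.60, 0.90, 0.70, 0.86, 0.88, 0.91, 0.95, 0.50]
print(sustained_breaches(memory))  # [5]: the lone spike at index 1 is ignored
```

Raising `window` trades detection speed for fewer false alarms; the right value depends on how quickly resources can actually be exhausted.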

However, during implementation, O&M teams may feel these benefits are marginal, since existing O&M tools seem to achieve the same goals. Yet if you do not build a monitoring system designed for container scenarios, you will face the following two problems as your business continues to expand:

1. Troubleshooting Takes Longer, and SLAs Cannot Be Met

It is difficult for the Development Team and the O&M Team to understand what is running and how it behaves. As a result, maintaining applications, meeting SLA requirements, and troubleshooting become extremely difficult.

2. Scalability Suffers, and Elasticity Cannot Be Achieved

The ability to quickly scale applications or microservice instances on demand is an important requirement in containerized environments, and the monitoring system provides the visibility needed to measure demand and user experience. Delayed scale-out degrades performance and user experience, while delayed scale-in wastes resources and money.
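
As a concrete illustration of how monitoring data drives elasticity, the Kubernetes Horizontal Pod Autoscaler (HPA) computes the desired replica count as `ceil(currentReplicas * currentMetricValue / desiredMetricValue)` and skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces only this core formula; it deliberately ignores stabilization windows, min/max replica bounds, and missing-metric handling, and the sample numbers are made up.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Simplified HPA formula: ceil(current * current_metric / target_metric).

    Scaling is skipped when the ratio is within `tolerance` of 1.0.
    Stabilization windows and min/max replica bounds are not modeled.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Average CPU utilization (%) across 4 Pods versus a 60% target.
print(desired_replicas(4, 90.0, 60.0))  # 6: scale out
print(desired_replicas(4, 20.0, 60.0))  # 2: scale in
print(desired_replicas(4, 63.0, 60.0))  # 4: within tolerance, no change
```

The formula is only as good as its input: if metrics arrive late or are missing, the computed replica count lags real demand, which is exactly the delayed scale-out or scale-in described above.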

As these problems accumulate and the value of container monitoring becomes clear, more O&M teams are paying attention to building container monitoring systems. However, various unexpected problems arise when container monitoring is implemented in the real world.

One such problem is the difficulty of tracking short-lived containers. A container packages the application code together with all the underlying services the application needs to run. As new applications are deployed to production and code and underlying services change, containerized applications are updated frequently, which increases the likelihood of errors. Because containers are created and destroyed so quickly, tracking changes in large, complex systems is extremely difficult.

Another problem is caused by shared resources. Because the memory and CPU used by containers are shared across one or more hosts, it is difficult to monitor resource consumption on the physical host, which makes it hard to obtain reliable indicators of container performance or application health.

Finally, traditional tools struggle to meet container monitoring requirements. Traditional monitoring solutions often lack the metrics needed for virtualized environments as well as the tools required for traces and logs, especially tools for container health and performance.

Considering the benefits, problems, and difficulties above, a container monitoring system should be designed along the following dimensions:

Non-Intrusiveness: Whether the monitoring SDK or the probe integrated into the business code is intrusive and affects the stability of the business
Integrity: Whether you can observe the performance of the entire application in terms of business and technology platforms
Multi-Source: Whether metrics and log sets can be obtained from different data sources for aggregate display, analysis, and alerts
Convenience: Whether it is possible to correlate events and logs, find exceptions, and troubleshoot them actively and passively to reduce losses, and whether the related alert policy configuration is convenient

O&M teams can choose from many open-source tools when defining business requirements and designing a monitoring system. However, they also need to evaluate possible business and project risks, listed below:

Whether unknown risks that affect business stability can be detected, whether monitoring can run without leaving traces in the system, and whether the monitoring process itself affects normal system operation
The workforce and time required for open-source or in-house development are difficult to predict. Associated components and resources must be configured or built independently, and no dedicated support or services are available as the business changes. It is unclear whether workforce and time costs will grow, and whether an open-source or in-house team can quickly resolve performance problems in large-scale scenarios.

Alibaba Cloud Kubernetes Monitoring Makes Container Cluster Monitoring More Perceivable and Simple

Based on the preceding insights and extensive practical experience, Alibaba Cloud has launched the Kubernetes monitoring service, an all-in-one observability product developed for Kubernetes clusters. It provides IT developers and O&M personnel with a comprehensive observability solution covering multiple aspects of Kubernetes clusters, including metrics, traces, logs, and events. Alibaba Cloud Kubernetes Monitoring has the following six features:

No Intrusion into Code: Network performance data is obtained through bypass technology, without adding tracking points to the code.
Multi-Language Support: Network protocols are parsed at the kernel layer, so all languages and frameworks are supported.
Low Consumption and High Performance: Based on eBPF technology, network performance data is obtained with extremely low overhead.
Automatic Resource Topology: Relationships between resources are displayed through the network topology.
Multi-Dimensional Data Display: Various types of observable data are supported, such as metrics, traces, logs, and events.
Closed-Loop Correlation: Observable data at the architecture layer, application layer, container runtime layer, container control layer, and basic resource layer can be correlated.

Compared with open-source container monitoring, Alibaba Cloud Kubernetes Monitoring is also better suited to business scenarios:

No Upper Limit of Data Volume: Metrics, traces, logs, and other data are stored independently, and cloud storage provides low-cost, large-capacity storage.
Efficient Resource Correlation and Interaction: By monitoring network requests, Kubernetes monitoring builds a complete network topology that shows service dependencies and improves O&M efficiency. In addition to the network topology, a 3D topology lets you view the network topology and the resource topology at the same time to locate problems faster.
Diversified Data Combinations: Data visualization and the free combination of metrics, traces, and logs enable in-depth O&M optimization.
Complete Monitoring System: Kubernetes monitoring works with other sub-products of the real-time monitoring service to build a complete monitoring system. Application monitoring focuses on the application language runtime, application framework, and business code, while Kubernetes monitoring focuses on the container runtime, container control layer, and system calls of containerized applications. Both serve applications but focus on different layers, so the two products complement each other. Prometheus is the infrastructure for collecting, storing, and querying metrics, and the metric data of both application monitoring and Kubernetes monitoring depends on it.

Given these features and their value, Kubernetes monitoring applies to the following scenarios.

Kubernetes monitoring uses default system rules or custom inspection rules to find exceptions in nodes, services, and workloads. It inspects nodes, services, and workloads in terms of performance, resources, and control, and displays the results as normal, warning, or critical states in different colors, helping O&M personnel see the running state of nodes, services, and workloads at a glance.
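
The idea of an inspection rule that maps a measurement to a normal, warning, or critical state can be sketched as follows. The rule structure, the metric names `cpu_utilization` and `error_rate`, and all threshold values are hypothetical examples; the product's built-in rules are not described in this article.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class State(IntEnum):
    """Inspection result; a higher value means a worse state."""
    NORMAL = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class InspectionRule:
    """Hypothetical rule for a metric where higher values are worse."""
    metric: str
    warning: float
    critical: float

    def evaluate(self, value: float) -> State:
        if value >= self.critical:
            return State.CRITICAL
        if value >= self.warning:
            return State.WARNING
        return State.NORMAL


def inspect_target(metrics: Dict[str, float], rules: List[InspectionRule]) -> State:
    """Return the worst state over all rules whose metric is present."""
    states = [r.evaluate(metrics[r.metric]) for r in rules if r.metric in metrics]
    return max(states, default=State.NORMAL)


rules = [
    InspectionRule("cpu_utilization", warning=0.70, critical=0.90),
    InspectionRule("error_rate", warning=0.01, critical=0.05),
]
print(inspect_target({"cpu_utilization": 0.75, "error_rate": 0.002}, rules).name)  # WARNING
print(inspect_target({"cpu_utilization": 0.40, "error_rate": 0.08}, rules).name)   # CRITICAL
print(inspect_target({"cpu_utilization": 0.30, "error_rate": 0.0}, rules).name)    # NORMAL
```

Taking the worst state across rules mirrors how a single color per node or workload can summarize several checks.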

Kubernetes monitoring is used to locate the root cause of failed responses from services and workloads. By parsing network protocols, it stores the details of failed requests and associates them with failed-request metrics to locate the cause of failure.

Kubernetes monitoring is used to locate the root cause of slow responses from services and workloads. It captures metrics along the critical paths of network links, such as DNS resolution performance, TCP retransmission rate, and packet round-trip time (RTT). These metrics help identify the cause of slow responses so that the related services can be optimized.
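
Once the raw counters are collected, the retransmission rate and RTT mentioned above reduce to simple ratios and averages. The sketch below assumes hypothetical per-link counters (segments sent, segments retransmitted, RTT samples in milliseconds), a hypothetical `LinkStats` type, and a hypothetical 1% threshold; it shows the arithmetic only, not how eBPF probes gather the data.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Iterator, List, Tuple


@dataclass
class LinkStats:
    """Hypothetical counters collected for one service-to-service link."""
    src: str
    dst: str
    segments_sent: int
    segments_retransmitted: int
    rtt_ms: List[float] = field(default_factory=list)

    def retransmission_rate(self) -> float:
        # Share of TCP segments that had to be sent again.
        if self.segments_sent == 0:
            return 0.0
        return self.segments_retransmitted / self.segments_sent


def slow_link_suspects(links: List[LinkStats],
                       max_rate: float = 0.01) -> Iterator[Tuple[str, str, float, float]]:
    """Yield (src, dst, retransmission rate, mean RTT in ms) for links over max_rate."""
    for link in links:
        rate = link.retransmission_rate()
        if rate > max_rate:
            avg_rtt = mean(link.rtt_ms) if link.rtt_ms else float("nan")
            yield link.src, link.dst, rate, avg_rtt


links = [
    LinkStats("frontend", "cart", 10_000, 20, [1.2, 1.4, 1.0]),
    LinkStats("cart", "redis", 5_000, 150, [8.0, 12.0, 10.0]),
]
for src, dst, rate, rtt in slow_link_suspects(links):
    print(f"{src} -> {dst}: retransmission {rate:.1%}, mean RTT {rtt:.1f} ms")
```

With these made-up numbers only the `cart -> redis` link is reported (3.0% retransmission, 10.0 ms mean RTT), pointing the investigation at that hop rather than at the application code.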

Kubernetes monitoring is used to explore application architectures and discover unexpected network traffic. It shows the topology of global traffic and lets you configure static ports to identify specific services. With the interactive topology, you can explore the application architecture and verify whether traffic meets expectations and whether the architecture is reasonable.
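
Checking whether traffic meets expectations amounts to comparing the observed service-to-service edges with an expected dependency map. The sketch below assumes observed flows are available as hypothetical `(source, destination, port)` tuples and that the expected architecture is written down by hand; it illustrates the comparison, not how the product builds its topology.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Flow = Tuple[str, str, int]  # (source service, destination service, port)


def build_topology(flows: Iterable[Flow]) -> Dict[str, Set[Tuple[str, int]]]:
    """Aggregate raw flows into an adjacency map: source -> {(destination, port)}."""
    graph: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for src, dst, port in flows:
        graph[src].add((dst, port))
    return dict(graph)


def unexpected_edges(topology: Dict[str, Set[Tuple[str, int]]],
                     expected: Set[Flow]) -> Set[Flow]:
    """Return observed edges that are not part of the expected architecture."""
    observed = {(src, dst, port)
                for src, edges in topology.items()
                for dst, port in edges}
    return observed - expected


flows = [
    ("frontend", "cart", 8080),
    ("frontend", "cart", 8080),   # duplicate flows collapse into one edge
    ("cart", "redis", 6379),
    ("frontend", "redis", 6379),  # frontend bypasses the cart service
]
expected = {("frontend", "cart", 8080), ("cart", "redis", 6379)}
topology = build_topology(flows)
print(unexpected_edges(topology, expected))  # {('frontend', 'redis', 6379)}
```

Keying edges by port as well as service matches the idea of using static ports to identify specific services.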

Kubernetes monitoring can be used to detect uneven use of node resources, so that node resources can be scheduled in advance to reduce business risks.
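
One simple way to spot uneven resource use is to compare each node's utilization with the cluster average. The sketch below flags nodes whose CPU utilization differs from the mean by more than a hypothetical 20 percentage points; the function name, node names, values, and threshold are all illustrative, and a real scheduling decision would also weigh memory, affinity rules, and taints.

```python
from statistics import mean
from typing import Dict, List


def imbalanced_nodes(util: Dict[str, float],
                     max_gap: float = 0.20) -> Dict[str, List[str]]:
    """Split nodes into 'hot' and 'cold' relative to the cluster mean.

    A node is flagged when its utilization differs from the mean by more
    than `max_gap` (a hypothetical threshold, in fractional units).
    """
    if not util:
        return {"hot": [], "cold": []}
    avg = mean(util.values())
    hot = sorted(n for n, u in util.items() if u - avg > max_gap)
    cold = sorted(n for n, u in util.items() if avg - u > max_gap)
    return {"hot": hot, "cold": cold}


# Made-up CPU utilization per node.
cpu = {"node-a": 0.92, "node-b": 0.55, "node-c": 0.30, "node-d": 0.51}
print(imbalanced_nodes(cpu))  # {'hot': ['node-a'], 'cold': ['node-c']}
```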

Kubernetes monitoring is currently in public beta and free to use. Let Kubernetes monitoring free you from repetitive, tedious O&M work!

Alibaba Cloud Native Community
