Container Service for Kubernetes (ACK) integrates with the observability services of Alibaba Cloud, including CloudMonitor and Managed Service for Prometheus. ACK also provides a variety of monitoring components that help you track the health of your clusters, detect potential issues, and respond to them. This topic describes the end-to-end monitoring solution provided by ACK, which covers basic resources, applications, clusters, events, control plane components, networks, and containers at the OS kernel level.
Cluster observability
The following table describes the monitoring modules provided by the cluster observability feature of ACK.
Module | Description | References | Component |
Basic resource monitoring | This module allows you to enable the Kubernetes monitoring feature of CloudMonitor or Prometheus monitoring to monitor the usage and health status of basic resources in your cluster, including CPU, memory, and network resources, and enable alert notifications based on key metrics. This improves cluster stability. | ||
ack-prometheus-operator | |||
Application monitoring | This module uses Application Real-Time Monitoring Service (ARMS) and the ack-onepilot component to enable topology analysis, API and event monitoring, tracing, and performance bottleneck checks for containerized applications. | ||
Cluster monitoring | This module uses Application Monitoring eBPF Edition to obtain container performance data without code intrusion, identify pod issues, and automatically identify the Services and controller workloads that are related to the issues. This improves troubleshooting efficiency. | ||
Event monitoring | This module uses Node Problem Detector (NPD) and the Kubernetes event center feature to enable real-time monitoring and alert notification. This module diagnoses nodes and generates events based on node exceptions, and supports closed-loop management of alerts and offline alert notifications. | ||
Control plane component monitoring | This module uses Prometheus and Grafana to monitor control plane components in real time, including the API server, etcd, kube-scheduler, and kube-controller-manager. You can use this module to optimize access to control plane components and configure self-managed Prometheus systems. | ||
etcd | |||
Network monitoring | This module integrates Simple Log Service for Ingress monitoring based on Ingress Dashboard and ARMS. This module provides CoreDNS monitoring and troubleshooting. In addition, this module visualizes network traffic and business topology in clusters that use Terway, implementing observability for container networks and containerized applications. | ||
Implement network observability by using ACK Terway and Cilium Hubble | |||
Kernel-level container monitoring | This module uses System Observer Monitoring (SysOM) to monitor containers at the OS kernel level. This facilitates the deployment and migration of containerized applications. |
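
The basic resource monitoring module mentions alerting on key metrics such as CPU usage. The following Python sketch illustrates one way that idea could look against a Prometheus-compatible HTTP API. It is not the ACK implementation: the endpoint URL, the 0.8-core threshold, the pod names, and the sample response are hypothetical placeholders. Only the `/api/v1/query` path and the instant-query response shape come from the standard Prometheus HTTP API, and `container_cpu_usage_seconds_total` is assumed to be collected (it is a common cAdvisor metric).

```python
import json
from urllib.parse import urlencode

# Assumption: placeholder endpoint; replace with the URL of your own
# Prometheus-compatible instance.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"

# PromQL: per-pod CPU usage in cores, averaged over the last 5 minutes.
CPU_QUERY = "sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def pods_over_threshold(response_body: str, threshold: float) -> dict:
    """Return {pod: value} for every sample whose value exceeds the threshold."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    over = {}
    for sample in payload["data"]["result"]:
        pod = sample["metric"].get("pod", "<unknown>")
        # Instant-vector samples are [timestamp, "value-as-string"] pairs.
        value = float(sample["value"][1])
        if value > threshold:
            over[pod] = value
    return over


# Hypothetical response body, shaped like a Prometheus instant-vector result.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"pod": "web-1"}, "value": [1700000000, "0.93"]},
            {"metric": {"pod": "web-2"}, "value": [1700000000, "0.12"]},
        ],
    },
})

if __name__ == "__main__":
    print(build_query_url(PROMETHEUS_URL, CPU_QUERY))
    print(pods_over_threshold(SAMPLE_RESPONSE, threshold=0.8))
```

In practice, the response body would come from an HTTP GET against the generated URL, and thresholds of this kind are usually expressed as alert rules in the monitoring service rather than in client code.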