Container Service for Kubernetes: Work with cluster diagnostics

Last Updated: Feb 23, 2024

Container Intelligence Service (CIS) allows you to diagnose nodes, pods, Services, Ingresses, memory, and networks with a few clicks to locate issues in your Container Service for Kubernetes (ACK) cluster. This topic describes how to use the cluster diagnostics feature to diagnose an ACK cluster.

Prerequisites

  • An ACK managed cluster is created. For more information, see Create an ACK managed cluster.

  • The status of the ACK cluster is Running.

    Note

    You can log on to the ACK console, go to the Clusters page, and then check whether the Cluster Status column of your cluster displays Running.

Introduction to cluster diagnostics

CIS provides the following diagnostic items.

Diagnostic item

Description

Node diagnostics

Diagnose node issues, such as Kubernetes nodes in the NotReady state.

Pod diagnosis

Diagnose pod status issues, such as pod startup failures or frequent pod restarts.

Service diagnostics

Diagnose Service issues, such as Service configurations, resource quotas, and abnormal events.

Ingress diagnosis

Diagnose Ingress-related issues in traffic routing configurations.

Memory diagnostics

Diagnose node memory issues, such as memory leaks, cgroup leaks, and out of memory (OOM) errors. Diagnostic results can be visualized to display the overall memory usage.

Network diagnosis

Diagnose common network issues, such as connectivity issues between pods, between a cluster and the Internet, and between a LoadBalancer Service and the Internet.
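
Before you choose a diagnostic item, it can help to confirm a symptom from the table, such as frequent pod restarts. The following is a minimal sketch, not part of CIS, that scans a Kubernetes PodList (the JSON shape returned by kubectl get pods --all-namespaces -o json) for containers with a high restart count. The function name, the sample data, and the threshold of 5 are assumptions made for illustration.

```python
def pods_with_frequent_restarts(pod_list, threshold=5):
    """Return (namespace, pod, container, restarts) tuples for containers
    whose restartCount is at or above the threshold, highest first.

    The threshold is an arbitrary illustrative value, not a CIS setting.
    """
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        # Pods that are not yet scheduled may have no containerStatuses.
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for cs in statuses:
            restarts = cs.get("restartCount", 0)
            if restarts >= threshold:
                hits.append(
                    (meta.get("namespace", ""), meta.get("name", ""),
                     cs.get("name", ""), restarts)
                )
    return sorted(hits, key=lambda h: h[3], reverse=True)


# Hypothetical sample in the shape of a Kubernetes PodList.
SAMPLE_POD_LIST = {
    "items": [
        {"metadata": {"namespace": "default", "name": "web-0"},
         "status": {"containerStatuses": [{"name": "app", "restartCount": 12}]}},
        {"metadata": {"namespace": "default", "name": "web-1"},
         "status": {"containerStatuses": [{"name": "app", "restartCount": 0}]}},
        {"metadata": {"namespace": "kube-system", "name": "pending-pod"},
         "status": {}},
    ]
}

if __name__ == "__main__":
    for ns, pod, container, restarts in pods_with_frequent_restarts(SAMPLE_POD_LIST):
        print(f"{ns}/{pod} container={container} restarts={restarts}")
```

To use the sketch against a real cluster, you could load the kubectl JSON output with json.load and pass the resulting dictionary to the function. Pods that it reports are candidates for pod diagnosis.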

Configure diagnostics

Important

When you use the cluster diagnostics feature, ACK runs a data collection program on each node in the cluster and collects diagnostic results. ACK collects key error messages from the system log and operational information, such as the system version, load, Docker, and kubelet. ACK does not collect business information or sensitive data.

The procedures for configuring node, pod, Service, Ingress, memory, and network diagnostics are similar. The following section uses node diagnostics as an example to demonstrate how to configure the diagnostics feature.

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the name of the cluster that you want to manage. In the left-side navigation pane, choose Inspections and Diagnostics > Diagnostics and follow the on-screen instructions to complete authorization.

  3. On the Diagnosis page, click Node diagnosis. Then, click Diagnosis in the top left corner.

  4. In the Select node panel, specify Node name, read and select I know and agree, and then click Create diagnosis.

    You can view the diagnostic progress on the page. After the diagnostic is complete, the page displays the diagnostic results and diagnostic items. You can check the cause and fix the issues.
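
The procedure above asks for a node name. As a rough aid for picking one, the following sketch lists nodes that are not in the Ready state from a Kubernetes NodeList (the JSON shape returned by kubectl get nodes -o json). It relies on the standard Ready node condition, whose status is "True", "False", or "Unknown". The function name and sample data are assumptions for illustration and are not part of the ACK console.

```python
def not_ready_nodes(node_list):
    """Return the names of nodes whose Ready condition is not "True".

    A node with Ready set to "False" or "Unknown", or with no Ready
    condition reported at all, is treated as not ready.
    """
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            names.append(node.get("metadata", {}).get("name", ""))
    return names


# Hypothetical sample in the shape of a Kubernetes NodeList.
SAMPLE_NODE_LIST = {
    "items": [
        {"metadata": {"name": "node-a"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "node-b"},
         "status": {"conditions": [{"type": "MemoryPressure", "status": "False"},
                                   {"type": "Ready", "status": "False"}]}},
        {"metadata": {"name": "node-c"},
         "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
        {"metadata": {"name": "node-d"}, "status": {}},
    ]
}

if __name__ == "__main__":
    for name in not_ready_nodes(SAMPLE_NODE_LIST):
        print(name)
```

Any name that the sketch returns can be entered as Node name when you create a node diagnosis.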

View diagnostic results

On the Node Diagnosis page, click Diagnosis details in the Operation column of the diagnostic report in the list to view the diagnostic results on the details page.

Note

The diagnostic items may vary based on the cluster configuration. Refer to the diagnostic page for the items that apply to your cluster.

Diagnostic item

Flag

Description

Node diagnostics

  • Normal: No operations are required.

  • Warning: Confirm the severity of the issue. Troubleshoot the issues that may cause cluster anomalies.

  • Abnormal: Troubleshoot the issues at the earliest opportunity to avoid cluster errors.

  • Unknown: The diagnostic is not completed or the diagnostic result is unknown.

Node diagnostics consist of the Node, NodeComponent, ClusterComponent, ECSControllerManager, and GPUNode diagnostic items. These diagnostic items help you identify node anomalies based on the status of nodes, node components, cluster components, and Elastic Compute Service (ECS) instances. On the diagnostic details page, you can view the node diagnostic results, repair suggestions, and diagnostic items.

Move the pointer over the details icon to the right of a diagnostic item to view information about the diagnostic item.

Diagnostic items with the Abnormal or Warning flag are displayed on the Troubleshoot tab.

When a diagnostic item displays the Abnormal flag, you can move the pointer over Details in the Status column to view details about the issue.

Pod diagnosis

Pod diagnostics consist of the Pod, ClusterComponent, Node, NodeComponent, and ECSControllerManager diagnostic items. These diagnostic items help you identify pod anomalies based on the status of pods, cluster components, nodes, and ECS instances. On the diagnostic details page, you can view the pod diagnostic results, repair suggestions, and diagnostic items.

Move the pointer over the details icon to the right of a diagnostic item to view information about the diagnostic item.

Diagnostic items with the Abnormal or Warning flag are displayed on the Troubleshoot tab.

When a diagnostic item displays the Abnormal flag, you can move the pointer over Details in the Status column to view details about the issue.

Service diagnostics

Service diagnostics consist of the Service and ResourceQuotas diagnostic items. These diagnostic items help you identify Service anomalies based on the billing method of Classic Load Balancer (CLB) instances, certificates, quotas, and abnormal events.

Move the pointer over the details icon to the right of a diagnostic item to view information about the diagnostic item.

Diagnostic items with the Abnormal or Warning flag are displayed on the Troubleshoot tab.

When a diagnostic item displays the Abnormal flag, you can move the pointer over Details in the Status column to view details about the issue.

Ingress diagnosis

Ingress diagnostics consist of the Ingress, Addon, and SLB diagnostic items. These diagnostic items help you identify Ingress anomalies based on the status of Ingresses, Ingress plug-ins, and Server Load Balancer (SLB) instances.

Move the pointer over the details icon to the right of a diagnostic item to view information about the diagnostic item.

Diagnostic items with the Abnormal or Warning flag are displayed on the Troubleshoot tab.

When a diagnostic item displays the Abnormal flag, you can move the pointer over Details in the Status column to view details about the issue.

Memory diagnostics

None.

On the diagnostic details page, you can view diagnostic results in the Memory Overview, Memory Analysis, and OOM Analysis sections, including memory leaks, memory utilization, and memory occupied by each process.

Network diagnosis

  • Normal: No operations are required.

  • Abnormal: Troubleshoot the issues at the earliest opportunity.

On the Diagnosis result page, you can view the diagnostic results. The Packet paths section displays all nodes that are diagnosed. Abnormal nodes are highlighted.
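
The flag rules described in this section can be summarized in a short sketch: items flagged Abnormal or Warning are the ones that need attention, with Abnormal handled first. The ItemResult type, the severity ordering, and the sample data below are hypothetical and only model the rules stated above. They do not reflect a CIS API or export format.

```python
from dataclasses import dataclass

# Lower number means higher priority. The ordering is an assumption
# based on the flag descriptions in this topic.
SEVERITY = {"Abnormal": 0, "Warning": 1, "Unknown": 2, "Normal": 3}


@dataclass
class ItemResult:
    item: str  # diagnostic item name, for example "NodeComponent"
    flag: str  # one of the keys in SEVERITY


def troubleshoot_items(results):
    """Return the results flagged Abnormal or Warning, Abnormal first."""
    flagged = [r for r in results if r.flag in ("Abnormal", "Warning")]
    return sorted(flagged, key=lambda r: SEVERITY[r.flag])


# Hypothetical results for a node diagnosis.
SAMPLE_RESULTS = [
    ItemResult("Node", "Normal"),
    ItemResult("NodeComponent", "Warning"),
    ItemResult("ClusterComponent", "Abnormal"),
    ItemResult("GPUNode", "Unknown"),
]

if __name__ == "__main__":
    for result in troubleshoot_items(SAMPLE_RESULTS):
        print(f"{result.flag}: {result.item}")
```

Items flagged Unknown are left out of the list because the diagnostic has not produced a result for them. Rerunning the diagnostic is usually the next step in that case.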