Container Service for Kubernetes (ACK) allows you to diagnose nodes, pods, Services, Ingresses, and memory with a few clicks to identify issues in your ACK cluster. This topic describes how to use the cluster diagnostics feature to diagnose an ACK cluster.

Prerequisites

Introduction to cluster diagnostics

The following table describes the diagnostics features provided by ACK.

Category | Description
Node diagnostics | Diagnose node issues, such as Kubernetes nodes in the NotReady state.
Pod diagnostics | Diagnose pod status issues, such as pod startup failures or frequent pod restarts.
Service diagnostics | Diagnose Service issues, such as Service configurations, resource quotas, and abnormal events.
Ingress diagnostics | Diagnose Ingress-related issues in traffic routing configurations.
Memory diagnostics | Diagnose node memory issues, such as memory leaks, cgroup leaks, and out of memory (OOM) errors. Diagnostic results can be visualized to display the overall memory usage.
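
To make the node and pod symptoms above concrete, the following minimal sketch checks for them by hand. It is not how ACK implements diagnostics. It assumes input shaped like the output of kubectl get nodes -o json and kubectl get pods -o json. The function names, the sample data, and the restart threshold of 5 are illustrative assumptions.

```python
def not_ready_nodes(node_list):
    """Return the names of nodes whose Ready condition is not "True"."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A missing Ready condition, "False", and "Unknown" all count as NotReady here.
        if ready is None or ready.get("status") != "True":
            names.append(node["metadata"]["name"])
    return names


def restarting_pods(pod_list, threshold=5):
    """Return (namespace, name, restarts) for pods at or above the threshold.

    The threshold is an arbitrary value chosen for illustration.
    """
    result = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        restarts = sum(s.get("restartCount", 0) for s in statuses)
        if restarts >= threshold:
            meta = pod["metadata"]
            result.append((meta.get("namespace", "default"), meta["name"], restarts))
    return result


# Made-up sample data in the shape of the kubectl JSON output.
nodes = {"items": [
    {"metadata": {"name": "node-a"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-b"},
     "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
]}
pods = {"items": [
    {"metadata": {"namespace": "default", "name": "web-1"},
     "status": {"containerStatuses": [{"restartCount": 7}]}},
    {"metadata": {"namespace": "default", "name": "web-2"},
     "status": {"containerStatuses": [{"restartCount": 0}]}},
]}

print(not_ready_nodes(nodes))   # ['node-b']
print(restarting_pods(pods))    # [('default', 'web-1', 7)]
```

A check like this only narrows down which nodes or pods to select when you create a diagnosis in the console.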

Configure diagnostics

Important: When you use the cluster diagnostics feature, ACK runs a data collection program on each node in the cluster and collects the diagnostic results. ACK collects key error messages from the system log and operational information about the system version, load, Docker, and kubelet. ACK does not collect business information or sensitive data.

The procedures for configuring node, pod, Service, Ingress, and memory diagnostics are similar. The following section uses node diagnostics as an example to demonstrate how to configure the diagnostics features.

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.
  2. On the Clusters page, click the name of the cluster that you want to diagnose. In the left-side navigation pane, choose Inspections and Diagnostics > Diagnostics.
  3. On the Diagnosis page, click Node diagnosis.
  4. In the Select node panel, specify Node name, read the warning and select I know and agree, and then click Create diagnosis.
    Wait until the Status column of the diagnostic report on the Diagnosis page displays Success.
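
If you automate the wait in the last step, the logic amounts to polling until the report status is Success. The following is a minimal sketch of that loop, not an ACK API client. fetch_status is a hypothetical callable that you would supply to return the current status string. Only the Success value comes from this topic. The timeout, the polling interval, and the "Running" placeholder status are assumptions.

```python
import time


def wait_for_success(fetch_status, timeout_s=600.0, interval_s=5.0, sleep=time.sleep):
    """Poll fetch_status() until it returns "Success" or the timeout expires.

    fetch_status is a hypothetical callable supplied by the caller.
    Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_status() == "Success":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Usage with a fake status source. "Running" is a made-up placeholder value.
fake_statuses = iter(["Running", "Running", "Success"])
done = wait_for_success(lambda: next(fake_statuses), sleep=lambda _seconds: None)
print(done)  # True
```

The sleep function is injected so that the loop can be exercised without real delays.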

View diagnostic results

On the Diagnosis page, click Diagnosis details in the operation column of the diagnostic report to view the detailed diagnostic result.

Each diagnostic item in the report carries one of the following flags:
  • Normal: No action is required.
  • Warning: Confirm the severity of the issue and troubleshoot issues that may cause cluster anomalies.
  • Abnormal: Troubleshoot the issue as soon as possible to avoid cluster errors.
  • Unknown: The diagnostic may not have completed, or the diagnostic result is unknown.

Node diagnostics

Node diagnostics consist of the Node, NodeComponent, ClusterComponent, ECSControllerManager, and GPUNode diagnostic items. These diagnostic items help you identify node anomalies based on the status of nodes, node components, cluster components, and Elastic Compute Service (ECS) instances. On the diagnostic details page, you can view the node diagnostic results, repair suggestions, and diagnostic items.

Move the pointer over the details icon to the right of a diagnostic item to view information about that item.

Diagnostic items with the Abnormal or Warning flag are displayed on the Troubleshoot tab.

When a diagnostic item displays the Abnormal flag, you can move the pointer over Details in the Status column to view details about the issue.
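
The flag handling described above can be sketched as a small filter. This is an illustration only. The item names come from the node diagnostic items listed in this topic, but the flags assigned to them are made-up sample data, and listing Abnormal items before Warning items is an assumption, not documented console behavior.

```python
from enum import Enum


class Flag(Enum):
    NORMAL = "Normal"
    WARNING = "Warning"
    ABNORMAL = "Abnormal"
    UNKNOWN = "Unknown"


def troubleshoot_items(items):
    """Keep only Abnormal and Warning items, with Abnormal items first.

    items is a list of (diagnostic_item_name, Flag) pairs.
    The ordering is an assumption made for this sketch.
    """
    priority = {Flag.ABNORMAL: 0, Flag.WARNING: 1}
    flagged = [(name, flag) for name, flag in items if flag in priority]
    # sorted() is stable, so items with the same flag keep their input order.
    return sorted(flagged, key=lambda item: priority[item[1]])


# Made-up sample results for the node diagnostic items.
results = [
    ("Node", Flag.WARNING),
    ("NodeComponent", Flag.ABNORMAL),
    ("ClusterComponent", Flag.NORMAL),
    ("ECSControllerManager", Flag.UNKNOWN),
    ("GPUNode", Flag.NORMAL),
]

for name, flag in troubleshoot_items(results):
    print(f"{flag.value}: {name}")
# Abnormal: NodeComponent
# Warning: Node
```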

Pod diagnostics

Pod diagnostics consist of the Pod, ClusterComponent, Node, NodeComponent, and ECSControllerManager diagnostic items. These diagnostic items help you identify pod anomalies based on the status of pods, cluster components, nodes, and ECS instances. On the diagnostic details page, you can view the pod diagnostic results, repair suggestions, and diagnostic items.

Move the pointer over the details icon to the right of a diagnostic item to view information about that item.

Diagnostic items with the Abnormal or Warning flag are displayed on the Troubleshoot tab.

When a diagnostic item displays the Abnormal flag, you can move the pointer over Details in the Status column to view details about the issue.

Service diagnostics

Service diagnostics consist of the Service and ResourceQuotas diagnostic items. These diagnostic items help you identify Service anomalies based on the billing method of Classic Load Balancer (CLB) instances, certificates, quotas, and abnormal events.

Move the pointer over the details icon to the right of a diagnostic item to view information about that item.

Diagnostic items with the Abnormal or Warning flag are displayed on the Troubleshoot tab.

When a diagnostic item displays the Abnormal flag, you can move the pointer over Details in the Status column to view details about the issue.

Ingress diagnostics

Ingress diagnostics consist of the Ingress, Addon, and SLB diagnostic items. These diagnostic items help you identify Ingress anomalies based on the status of Ingresses, Ingress plug-ins, and Server Load Balancer (SLB) instances.

Move the pointer over the details icon to the right of a diagnostic item to view information about that item.

Diagnostic items with the Abnormal or Warning flag are displayed on the Troubleshoot tab.

When a diagnostic item displays the Abnormal flag, you can move the pointer over Details in the Status column to view details about the issue.

Memory diagnostics

No flags are reported for memory diagnostics. On the diagnostic details page, you can view diagnostic results in the Memory Overview, Memory Analysis, and OOM Analysis sections, including memory leaks, memory utilization, and the memory occupied by each process.