The AIOps suite helps you keep ACK clusters healthy at every stage: before O&M operations run, while risks build up unnoticed, and when something breaks and you need to find the cause fast. It includes three tools—cluster check, cluster inspection, and cluster diagnostics—that together cover the full cluster health lifecycle.
Tools at a glance
| Tool | When it runs | What it does |
|---|---|---|
| Cluster check | Before O&M operations | Automatically evaluates whether the cluster is ready for the operation and blocks it if not, so issues are caught before they cause failures. |
| Cluster inspection | On a schedule | Scans for potential risks—resource usage, resource quotas, cluster certificates, component versions—before they become incidents. |
| Cluster diagnostics | On demand | Diagnoses pods, Services, and Ingresses in a few clicks and returns the root cause with a recommended fix. |
Cluster check
Cluster check runs automatically before key O&M operations: cluster upgrades, cluster migration, component installation, and component upgrades. The operation proceeds only after the cluster passes the check. If the check fails, the operation is blocked and the console lists each failed item together with the cause and a suggested fix.
For more information, see Cluster check.
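The gating behavior described above (run every check item, block the operation if any item fails, and report each failure with its cause and fix) can be sketched as follows. This is a minimal illustration of the pattern, not how ACK implements cluster check; `CheckResult`, `run_precheck`, `run_operation`, and any check item names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    item: str             # name of the check item (hypothetical)
    passed: bool
    reason: str = ""      # why the item failed; empty when it passed
    suggestion: str = ""  # suggested fix; empty when it passed


def run_precheck(checks: List[Callable[[], CheckResult]]) -> List[CheckResult]:
    """Run every check item, even after a failure, and return all failed items."""
    results = [check() for check in checks]
    return [r for r in results if not r.passed]


def run_operation(name: str,
                  checks: List[Callable[[], CheckResult]],
                  operation: Callable[[], None]) -> bool:
    """Start `operation` only if every check item passes; otherwise block it."""
    failures = run_precheck(checks)
    if failures:
        # Report every failed item so all issues can be fixed in one pass.
        for f in failures:
            print(f"[{name}] blocked by {f.item}: {f.reason}. Suggested fix: {f.suggestion}")
        return False
    operation()
    return True
```

For example, if one of the check functions passed to `run_operation` returns a failed `CheckResult`, the function prints that item and returns `False` without calling the operation. Running all items before deciding mirrors the console behavior of showing every failed item at once rather than stopping at the first one.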
Cluster inspection
Cluster inspection scans your cluster on a schedule and surfaces issues before they become critical. Each inspection:
- Scans cluster status to identify potential risks
- Checks resource usage, resource quotas, cluster certificates, and component versions, with results displayed visually
- Categorizes anomalies by severity level and provides recommended solutions
For more information, see Cluster inspection.
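To make the severity categorization concrete, the sketch below evaluates two kinds of inspection items, certificate expiry and resource usage, and groups the resulting anomalies by severity. It is an illustration under stated assumptions only: the thresholds, the `critical`/`warning` levels, and the function names are hypothetical and do not reflect the actual inspection items or levels that ACK uses.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, NamedTuple, Optional

# Hypothetical thresholds for illustration only.
CERT_WARNING_DAYS = 30
CERT_CRITICAL_DAYS = 7
USAGE_WARNING = 0.80
USAGE_CRITICAL = 0.95


class Finding(NamedTuple):
    severity: str    # "critical" or "warning" (hypothetical levels)
    message: str
    suggestion: str


def check_certificate(name: str, not_after: datetime, now: datetime) -> Optional[Finding]:
    """Flag a certificate that expires soon; return None if it is healthy."""
    days_left = (not_after - now).days
    if days_left <= CERT_CRITICAL_DAYS:
        return Finding("critical", f"{name} certificate expires in {days_left} days",
                       "Rotate the certificate now")
    if days_left <= CERT_WARNING_DAYS:
        return Finding("warning", f"{name} certificate expires in {days_left} days",
                       "Schedule a certificate rotation")
    return None


def check_usage(resource: str, used: float, capacity: float) -> Optional[Finding]:
    """Flag a resource whose usage ratio crosses a threshold."""
    ratio = used / capacity
    if ratio >= USAGE_CRITICAL:
        return Finding("critical", f"{resource} usage at {ratio:.0%}",
                       "Scale out or free up capacity")
    if ratio >= USAGE_WARNING:
        return Finding("warning", f"{resource} usage at {ratio:.0%}",
                       "Plan for additional capacity")
    return None


def group_by_severity(findings: List[Optional[Finding]]) -> Dict[str, List[Finding]]:
    """Drop healthy results (None) and bucket the rest by severity level."""
    grouped: Dict[str, List[Finding]] = defaultdict(list)
    for finding in findings:
        if finding is not None:
            grouped[finding.severity].append(finding)
    return dict(grouped)
```

Each check returns `None` for a healthy item, so only anomalies reach the report, and each anomaly carries its own recommended solution alongside its severity.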
Cluster diagnostics
Cluster diagnostics covers three areas: pods, Services, and Ingresses. Select the object to diagnose, and the tool reports the root cause along with suggestions for fixing it.
| Diagnostic | Issues covered |
|---|---|
| Pod diagnostics | Pod startup failures, container image pull failures, and pod exceptions |
| Service diagnostics | Service exception events, Server Load Balancer (SLB) backend server quota issues, and SLB instance quota issues |
| Ingress diagnostics | Ingress component health, startup parameters, Ingress pod error logs, and the SLB instances used by the Ingress controller, to help you troubleshoot application access issues |
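For a rough idea of the signals behind the pod diagnostics row, the sketch below inspects a pod object in the JSON shape returned by the Kubernetes API and maps standard status reasons (such as `ImagePullBackOff`, `CrashLoopBackOff`, and an `Unschedulable` `PodScheduled` condition) to the three issue types in the table. This is a simplified manual approximation, not the ACK diagnostics implementation; `diagnose_pod` and the reason groupings are assumptions, and init containers are ignored for brevity.

```python
from typing import List

# Groupings of standard Kubernetes container "waiting" reasons (assumed mapping).
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff", "InvalidImageName"}
STARTUP_REASONS = {"CreateContainerConfigError", "CreateContainerError", "RunContainerError"}


def diagnose_pod(pod: dict) -> List[str]:
    """Return human-readable findings for a pod parsed from Kubernetes API JSON."""
    findings = []
    status = pod.get("status", {})

    # A pod that cannot be scheduled stays Pending with PodScheduled=False.
    for cond in status.get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            findings.append(
                f"startup failure: pod not scheduled ({cond.get('reason')}): "
                f"{cond.get('message', '')}"
            )

    for cs in status.get("containerStatuses", []):
        name = cs.get("name")
        waiting = cs.get("state", {}).get("waiting")
        if waiting:
            reason = waiting.get("reason", "")
            if reason in IMAGE_PULL_REASONS:
                findings.append(f"image pull failure in {name}: {reason}")
            elif reason in STARTUP_REASONS:
                findings.append(f"startup failure in {name}: {reason}")
            elif reason == "CrashLoopBackOff":
                findings.append(f"pod exception in {name}: container keeps crashing")
        # The previous run's termination reason often explains a crash loop.
        last = cs.get("lastState", {}).get("terminated")
        if last and last.get("reason") == "OOMKilled":
            findings.append(f"pod exception in {name}: previous run was OOMKilled")

    return findings
```

To try it against a real pod, parse the output of `kubectl get pod <pod-name> -o json` with `json.loads` and pass the resulting dictionary to `diagnose_pod`.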