Container Service for Kubernetes: Overview of the AIOps suite

Last Updated: Jun 19, 2023

Container Service for Kubernetes (ACK) is a large-scale distributed container orchestration engine. Because of this complexity, managing and maintaining clusters requires technical expertise. To make cluster management and maintenance easier, ACK provides the AIOps suite. The suite consists of three features: cluster check, cluster inspection, and cluster diagnostics. Together they help you troubleshoot issues and improve O&M efficiency. This topic describes the benefits of the AIOps suite and each of its features.

Benefits

The AIOps suite provides a variety of features, including cluster check, cluster inspection, and cluster diagnostics. The following table describes the benefits of the AIOps suite.

| Feature | Benefit |
| --- | --- |
| Cluster check | A cluster check is triggered before the system performs an O&M operation on a cluster to evaluate whether the cluster meets the requirements of that operation. This increases the success rate of O&M operations. |
| Cluster inspection | Cluster inspections run on a schedule to identify potential risks in clusters. |
| Cluster diagnostics | A collection of diagnostic tools is provided to diagnose pods, nodes, Services, Ingresses, and memory, which makes troubleshooting more efficient. |

Cluster check

The cluster check feature covers key O&M operations, such as cluster upgrades, cluster migration, component installation, component upgrades, and node pool upgrades. Before you perform one of these operations, a cluster check is automatically triggered. You can perform the operation only after the cluster passes the check. If a check item fails, the system displays the reason for the failure and provides suggestions on how to fix it. For more information, see Cluster Check.
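
The gating behavior described above can be pictured with a minimal Python sketch. Everything in it is a hypothetical illustration: `CheckItem`, `run_precheck`, `perform_operation`, and the sample check items are assumptions made for this example and are not part of the ACK API. The sketch only shows the general idea that an operation proceeds when every check item passes, and that each failed item is reported with a reason and a suggestion.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CheckItem:
    """One hypothetical precheck item; not an ACK data structure."""
    name: str
    passed: bool
    reason: str = ""       # why the item failed (empty if it passed)
    suggestion: str = ""   # how to fix the failure (empty if it passed)


def run_precheck(items: List[CheckItem]) -> Tuple[bool, List[CheckItem]]:
    """Return (ok, failed_items); ok is True only if every item passed."""
    failed = [item for item in items if not item.passed]
    return (not failed, failed)


def perform_operation(operation: str, items: List[CheckItem]) -> str:
    """Start the operation only after the precheck passes."""
    ok, failed = run_precheck(items)
    if not ok:
        lines = [f"{operation} blocked:"]
        for item in failed:
            lines.append(f"- {item.name}: {item.reason} (fix: {item.suggestion})")
        return "\n".join(lines)
    return f"{operation} started"


if __name__ == "__main__":
    # Sample data invented for illustration only.
    sample = [
        CheckItem("node status", True),
        CheckItem("component version", False,
                  "version too old", "upgrade the component first"),
    ]
    print(perform_operation("cluster upgrade", sample))
```

In this sketch a single failed item blocks the whole operation, which mirrors the rule that the cluster must pass the check before the operation can run.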

Cluster inspection

The cluster inspection feature draws on the experience that ACK has gained from managing a large number of clusters across a variety of use cases. You can use the feature to complete the following tasks:

  • Scan the status of a cluster to identify potential risks.

  • Periodically check the resource usage, resource quotas, cluster certificates, and component versions of a cluster and view the results in a visualized manner.

  • View the severity levels of anomalies and the suggested solutions to help you maintain your clusters more efficiently.

For more information, see Cluster inspection.
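
As a rough illustration of how inspection results might be graded by severity, the following Python sketch classifies two of the inspected items mentioned above: certificate expiry and resource usage against a quota. The names (`Finding`, `check_certificate`, `check_usage`, `sort_findings`) and all thresholds are assumptions chosen for this example. They do not reflect the rules or thresholds that ACK actually applies.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Lower number means more severe; used only for sorting in this sketch.
SEVERITY_ORDER = {"critical": 0, "warning": 1, "normal": 2}


@dataclass
class Finding:
    """One hypothetical inspection result."""
    item: str
    severity: str  # "critical", "warning", or "normal"
    detail: str


def check_certificate(expires_on: date, today: date,
                      warn_days: int = 60, critical_days: int = 7) -> Finding:
    """Grade a certificate by days left; thresholds are assumed values."""
    days_left = (expires_on - today).days
    if days_left <= critical_days:
        severity = "critical"
    elif days_left <= warn_days:
        severity = "warning"
    else:
        severity = "normal"
    return Finding("cluster certificate", severity,
                   f"{days_left} days until expiry")


def check_usage(resource: str, used: float, quota: float,
                warn_ratio: float = 0.8, critical_ratio: float = 0.95) -> Finding:
    """Grade resource usage against its quota; ratios are assumed values."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    ratio = used / quota
    if ratio >= critical_ratio:
        severity = "critical"
    elif ratio >= warn_ratio:
        severity = "warning"
    else:
        severity = "normal"
    return Finding(f"{resource} usage", severity, f"{ratio:.0%} of quota used")


def sort_findings(findings: List[Finding]) -> List[Finding]:
    """Order findings so the most severe ones come first."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
```

Sorting by severity puts the most urgent findings first, which is one simple way to present graded results.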

Cluster diagnostics

The cluster diagnostics feature allows you to diagnose clusters with a few clicks. This feature can help troubleshoot pods, nodes, Services, Ingresses, and memory in your cluster. For more information, see Work with cluster diagnostics.

| Category | Description |
| --- | --- |
| Pod diagnostics | Diagnoses common pod issues, such as pod startup failures, container image pulling failures, and pod exceptions, displays the root causes of these issues, and provides suggestions on how to fix them. |
| Node diagnostics | Diagnoses common node issues, such as the NotReady issue, node network issues, and runtime issues, displays the root causes of these issues, and provides suggestions on how to fix them. |
| Service diagnostics | Diagnoses common Service issues, such as Service exception events, Server Load Balancer (SLB) backend server quota issues, and SLB instance quota issues, displays the root causes of these issues, and provides suggestions on how to fix them. |
| Ingress diagnostics | Collects Ingress component check results, startup parameters, Ingress pod error logs, and information about the SLB instances used by the Ingress controller to help troubleshoot application access issues. |
| Memory diagnostics | Diagnoses common memory issues in ACK clusters, such as memory leaks, memory fragmentation, and cgroup leaks, displays the root causes of these issues, and provides suggestions on how to fix them. |