Alibaba Cloud Container Compute Service (ACS) integrates with Container Intelligence Service (CIS) to run periodic cluster inspections. When an inspection detects a potential risk, it generates an alert. This topic lists each alert, explains what triggers it and what breaks, and provides steps to resolve it.
Note: Check items may vary based on your cluster configuration. The items in your inspection report take precedence over this topic. For instructions on running cluster inspections, see Work with the cluster inspection feature.
Check items and alerts
| Category | Check item | Alert |
|---|---|---|
| Resource quotas | Quota on SLB instances | Insufficient quota on SLB instances in a VPC |
| Resource quotas | Quota on SLB backend servers | Insufficient quota on SLB backend servers |
| Resource quotas | Quota on SLB listeners | Insufficient quota on SLB listeners |
| Resource watermarks | SLB bandwidth usage | Excessive SLB bandwidth usage |
| Resource watermarks | Number of SLB connections | Excessive number of SLB connections |
| Resource watermarks | Rate of new SLB connections | Excessively high rate of new SLB connections |
| Resource watermarks | SLB QPS | Excessively high SLB QPS |
| Versions and certificates | Kubernetes version of a cluster | Outdated Kubernetes version of a cluster |
| Cluster risks | Whether an SLB instance is associated with the API server | No SLB instance associated with the API server |
| Cluster risks | Status of the SLB instance associated with the API server | Abnormal status of the SLB instance associated with the API server |
| Cluster risks | Configuration of the listener on port 6443 for the SLB instance associated with the API server | Errors in the listener configuration on port 6443 for the SLB instance associated with the API server |
| Cluster risks | Access control configuration of the SLB instance associated with the API server | Errors in the access control configuration of the SLB instance associated with the API server |
| Cluster risks | Cluster IP address of the DNS service | Abnormal cluster IP address of the DNS service |
| Cluster risks | Endpoints of the DNS service | No endpoints available for the DNS service |
| Cluster risks | Whether one SLB port is shared by multiple Services | One SLB port shared by multiple Services |
Resource quotas
Insufficient quota on SLB instances in a VPC
Condition: Fewer than five Server Load Balancer (SLB) instances can still be created in the cluster VPC.
Impact: Each LoadBalancer Service consumes one SLB instance. When the quota is exhausted, new LoadBalancer Services fail to work.
Solution: Request a quota increase in the Quota Center console. The default limit is 60 SLB instances per Alibaba Cloud account. For quota details, see Quotas.
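The condition above amounts to a headroom check on the remaining quota. The following is a minimal sketch, assuming you already know the quota and the number of SLB instances in use. The function and parameter names are illustrative and are not part of any Alibaba Cloud SDK. The threshold of five and the limit of 60 come from the text above.

```python
def slb_quota_alert(quota: int, in_use: int, min_headroom: int = 5) -> bool:
    """Return True when fewer than `min_headroom` SLB instances can still be created."""
    if quota < 0 or in_use < 0:
        raise ValueError("quota and in_use must be non-negative")
    # Usage above the quota is treated as zero remaining headroom.
    remaining = max(quota - in_use, 0)
    return remaining < min_headroom


# Example with the default limit of 60 mentioned above.
print(slb_quota_alert(quota=60, in_use=56))  # 4 remaining -> True
print(slb_quota_alert(quota=60, in_use=55))  # 5 remaining -> False
```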
Insufficient quota on SLB backend servers
Condition: The remaining quota on the number of ECS instances that can be associated with an SLB instance is running low.
Impact: Backend pods are spread across multiple ECS instances. When the backend server quota is exhausted, no additional ECS instances can be associated with the SLB instance, causing traffic routing to fail.
Solution: Request a quota increase in the Quota Center console. The default limit is 200 backend servers per SLB instance. For quota details, see Quotas.
Insufficient quota on SLB listeners
Condition: The quota on the number of listeners per SLB instance is running low.
Impact: Each port on a LoadBalancer Service maps to one SLB listener. When the listener quota is exhausted, ports without a listener stop receiving traffic.
Solution: Request a quota increase in the Quota Center console. The default limit is 50 listeners per SLB instance. For quota details, see Quotas.
Insufficient quota on SLB instances
Condition: Fewer than five SLB instances can still be created in your account.
Impact: An SLB instance is created for each LoadBalancer Service. When the SLB instance quota is exhausted, newly created LoadBalancer Services cannot work as expected.
Solution: Request a quota increase in the Quota Center console. The default limit is 60 SLB instances per Alibaba Cloud account.
Resource watermarks
Excessive SLB bandwidth usage
Condition: Peak outbound bandwidth over the previous three days exceeded 80% of the bandwidth limit.
Impact: When the bandwidth limit is reached, the SLB instance drops packets, causing network jitter or increased response latency.
Solution: Upgrade the SLB instance to a higher bandwidth tier. For instructions, see Use an existing SLB instance.
Excessive number of SLB connections
Condition: Peak concurrent connections over the previous three days exceeded 80% of the connection limit.
Impact: When the connection limit is reached, clients cannot establish new connections to the SLB instance.
Solution: Upgrade the SLB instance before connections reach the limit to avoid service interruptions. For instructions, see Use an existing SLB instance.
Excessively high rate of new SLB connections
Condition: The peak rate of new connections over the previous three days exceeded 80% of the upper limit.
Impact: When the rate limit is reached, clients cannot establish new connections within a short period.
Solution: Upgrade the SLB instance before the rate reaches the limit to avoid service interruptions. For instructions, see Use an existing SLB instance.
Excessively high SLB QPS
Condition: Peak queries per second (QPS) over the previous three days exceeded 80% of the upper limit.
Impact: When the QPS limit is reached, clients cannot connect to the SLB instance.
Solution: Upgrade the SLB instance before QPS reaches the limit to avoid service interruptions. For instructions, see Use an existing SLB instance.
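The four watermark checks in this section follow the same pattern: take the peak of a metric over the previous three days and compare it with 80% of the limit. The following is a minimal sketch of that comparison, assuming the samples have already been collected from your monitoring system. The function name and the sample values are hypothetical.

```python
from typing import Iterable


def exceeds_watermark(samples: Iterable[float], limit: float, threshold: float = 0.8) -> bool:
    """Return True when the peak sample is above `threshold` times `limit`."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # An empty window has no peak, so it never triggers the alert.
    peak = max(samples, default=0.0)
    return peak > threshold * limit


# Hypothetical outbound bandwidth samples (Mbit/s) against a 500 Mbit/s limit.
print(exceeds_watermark([120.0, 410.0, 300.0], limit=500.0))  # peak 410 > 400 -> True
print(exceeds_watermark([120.0, 399.0, 300.0], limit=500.0))  # peak 399 < 400 -> False
```

The same function applies to concurrent connections, new connection rate, and QPS by passing the corresponding samples and limit.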
Versions and certificates
Outdated Kubernetes version of a cluster
Condition: The cluster runs a Kubernetes version that is outdated or nearing end of support.
Impact: Outdated versions no longer receive security patches or feature updates and may lose compatibility with newer workloads and tooling.
Solution: Update the cluster to a supported Kubernetes version as soon as possible.
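To compare a cluster version against the oldest version you still consider supported, you can compare the major and minor numbers. The following is a minimal sketch. The version strings and the `oldest_supported` value are placeholders, so check the release notes for the versions that are actually supported.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.26.3-aliyun.1' or '1.28'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_outdated(current: str, oldest_supported: str) -> bool:
    """Return True when `current` is older than `oldest_supported` (patch level ignored)."""
    return parse_major_minor(current) < parse_major_minor(oldest_supported)


# Placeholder values for illustration only.
print(is_outdated("v1.26.3-aliyun.1", oldest_supported="1.28"))  # True
print(is_outdated("1.30.1", oldest_supported="1.28"))            # False
```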
Cluster risks
No SLB instance associated with the API server
Condition: No SLB instance is fronting the API server.
Impact: With only a single API server and no load balancer, the API server becomes a single point of failure (SPOF). If that API server fails, the cluster becomes unresponsive.
Solution: Associate an SLB instance with the API server.
Abnormal status of the SLB instance associated with the API server
Condition: The SLB instance in front of the API server is in an abnormal state.
Impact: All cluster operations — pod scheduling, service deployment, scale-out — are interrupted or delayed. Service discovery also fails because it depends on the API server.
Solution: Check the SLB instance configurations, including backend servers, listening ports, and health checks.
Errors in the listener configuration on port 6443 for the SLB instance associated with the API server
Condition: The HTTPS listener on port 6443 of the API server's SLB instance is misconfigured or missing.
Impact: All requests routed through the SLB instance to the API server fail. This includes kubectl operations, dashboard access, and API calls from other services. Service name resolution can also fail because the cluster DNS service relies on the API server for Service and endpoint data.
Solution: Check the SLB instance configurations, including backend servers, listening ports, and health checks. Verify that an HTTPS listener on port 6443 is configured.
Errors in the access control configuration of the SLB instance associated with the API server
Condition: The access control configuration on the API server's SLB instance contains errors.
Impact: Cluster management operations — node management, pod scheduling, service deployment — are blocked or limited. Any workload that depends on the API server for communication or service discovery is also affected.
Solution:
- Review security groups and access control lists (ACLs) on the SLB instance. Verify that the required IP addresses and port 6443 are allowed.
- Check the TLS/SSL configuration on both the SLB instance and the API server. Verify that certificates are valid.
Abnormal cluster IP address of the DNS service
Condition: The cluster IP address of the DNS service has not been assigned or is invalid.
Impact: DNS resolution fails across the cluster, causing cascading failures in workloads that rely on service name resolution.
Solution:
- Check network plugin configurations for conflicts or errors.
- Redeploy CoreDNS to restore a valid cluster IP address assignment.
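To confirm whether the DNS Service has a usable cluster IP address, you can inspect the `spec.clusterIP` field of the Service object. The following is a minimal sketch that validates a Service that has already been loaded as a dictionary, for example from the JSON output of `kubectl get service <name> -n <namespace> -o json`. The Service name and namespace depend on your cluster and are not assumed here, and the sample objects are made up for illustration.

```python
import ipaddress


def has_valid_cluster_ip(service: dict) -> bool:
    """Return True when spec.clusterIP is present and parses as an IP address."""
    cluster_ip = (service.get("spec") or {}).get("clusterIP")
    # Headless Services report the literal string "None" instead of an address.
    if not cluster_ip or cluster_ip == "None":
        return False
    try:
        ipaddress.ip_address(cluster_ip)
    except ValueError:
        return False
    return True


# Made-up Service objects for illustration.
print(has_valid_cluster_ip({"spec": {"clusterIP": "192.168.0.10"}}))  # True
print(has_valid_cluster_ip({"spec": {"clusterIP": "None"}}))          # False
```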
No endpoints available for the DNS service
Condition: The DNS service has zero backend endpoints.
Impact: The DNS service is completely unavailable. All workloads that depend on service name resolution fail.
Solution: Check the Corefile configuration. Verify that the forward or proxy directive points to a valid set of backend DNS servers.
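The condition above can be confirmed by counting the ready addresses in the Endpoints object of the DNS Service. The following is a minimal sketch that works on an Endpoints object already loaded as a dictionary, for example from `kubectl get endpoints <name> -n <namespace> -o json`. It relies on the standard `subsets[].addresses` layout of the Kubernetes Endpoints resource, and the sample objects are made up for illustration.

```python
def ready_endpoint_count(endpoints: dict) -> int:
    """Count ready addresses across all subsets of a Kubernetes Endpoints object."""
    subsets = endpoints.get("subsets") or []
    # Only `addresses` holds ready backends; `notReadyAddresses` is ignored on purpose.
    return sum(len(subset.get("addresses") or []) for subset in subsets)


# Made-up Endpoints objects for illustration.
healthy = {"subsets": [{"addresses": [{"ip": "10.0.0.5"}, {"ip": "10.0.0.6"}]}]}
broken = {"subsets": [{"notReadyAddresses": [{"ip": "10.0.0.5"}]}]}

print(ready_endpoint_count(healthy))  # 2
print(ready_endpoint_count(broken))   # 0, which matches the alert condition
```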
One SLB port shared by multiple Services
Condition: Multiple Kubernetes Services are sharing the same port on a single SLB instance.
Impact: Port conflicts cause one or more of those Services to become unavailable.
Solution: Delete or update the conflicting Services so that each Service uses a distinct port on the shared SLB instance.
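To find the conflicting Services, group them by SLB instance and port and look for groups with more than one member. The following is a minimal sketch, assuming you have already collected a list of `(service_name, slb_id, port)` tuples from your Service manifests. The tuple layout, Service names, and SLB IDs are hypothetical, and the sketch ignores the listener protocol for simplicity.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_shared_ports(
    services: Iterable[Tuple[str, str, int]]
) -> Dict[Tuple[str, int], List[str]]:
    """Map each (slb_id, port) used by more than one Service to those Service names."""
    users: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for name, slb_id, port in services:
        users[(slb_id, port)].append(name)
    return {key: names for key, names in users.items() if len(names) > 1}


# Hypothetical inventory: "web" and "api" both claim port 80 on lb-a.
inventory = [
    ("web", "lb-a", 80),
    ("api", "lb-a", 80),
    ("admin", "lb-a", 8080),
    ("web2", "lb-b", 80),
]
print(find_shared_ports(inventory))  # {('lb-a', 80): ['web', 'api']}
```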