Container Service for Kubernetes (ACK) - Node Auto-recovery in ACK Managed Pro Clusters Now Covers GPU-accelerated Nodes
Jun 30 2025
Intended customers: All users.

Details: The node auto-recovery feature for managed node pools now provides a fully automated, end-to-end workflow for handling GPU software and hardware anomalies. The workflow covers automatic fault detection, notifications and alerts, automatic isolation of faulty GPU cards or entire nodes, and automatic remediation. To keep you in control, remediation actions can be configured to require user authorization before they run, and this authorization step can be integrated with your own business platforms. The feature is intended to reduce the business impact of GPU failures and lower operations and maintenance (O&M) costs.
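The workflow described above can be pictured as a short sequence of stages with an optional authorization gate in front of remediation. The following is a minimal conceptual sketch only, not the ACK API: every name in it (`Stage`, `GpuFault`, `run_recovery`, the `authorize` callback) is hypothetical, and the exact ordering of notification and isolation is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Stage(Enum):
    """Hypothetical stage names mirroring the steps listed in the announcement."""
    DETECTED = "detected"
    NOTIFIED = "notified"
    ISOLATED = "isolated"
    AWAITING_AUTHORIZATION = "awaiting_authorization"
    REMEDIATED = "remediated"


@dataclass
class GpuFault:
    """Illustrative record of one GPU anomaly (not an ACK data structure)."""
    node: str
    scope: str  # "gpu_card" or "node": what gets isolated
    history: List[Stage] = field(default_factory=list)


def run_recovery(
    fault: GpuFault,
    require_authorization: bool,
    authorize: Callable[[GpuFault], bool],
) -> Stage:
    """Walk a fault through the stages and return the stage it ends in.

    `authorize` stands in for an approval hook on your own business
    platform; it is only consulted when `require_authorization` is True.
    """
    # Assumed order: detect, then notify, then isolate.
    fault.history.append(Stage.DETECTED)
    fault.history.append(Stage.NOTIFIED)
    fault.history.append(Stage.ISOLATED)

    # Remediation is gated on approval when authorization is required.
    if require_authorization and not authorize(fault):
        fault.history.append(Stage.AWAITING_AUTHORIZATION)
        return fault.history[-1]

    fault.history.append(Stage.REMEDIATED)
    return fault.history[-1]


if __name__ == "__main__":
    pending = GpuFault(node="node-a", scope="gpu_card")
    # The approval hook declines, so the fault stays isolated and unremediated.
    print(run_recovery(pending, require_authorization=True, authorize=lambda f: False))
```

In this sketch a declined or pending approval leaves the faulty card or node isolated while remediation waits, which matches the stated goal of giving users control over remediation actions.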















