Intelligent managed node pools offer a fully managed, maintenance-free node management mode for ACK Managed Cluster Pro Edition. When enabled, this feature automatically handles the dynamic scaling of nodes based on workload demand and provides O&M capabilities such as operating system upgrades, security patching, and self-healing, eliminating the need to manually create and maintain nodes.
Features
-
Full-lifecycle automated O&M: Automatically manages the entire lifecycle of nodes from creation to recycling. This includes features such as node image rotation, automatic replacement of abnormal nodes, and automatic patching of OS CVE vulnerabilities.
-
Instant node elasticity: Built-in instant node elasticity triggers node scale-out or scale-in within seconds based on the resource requests of Kubernetes workloads.
-
ContainerOS: Nodes use ContainerOS, which features an optimized underlying software stack and an immutable root file system for enhanced security.
Automated node O&M
ACK automatically manages the entire lifecycle of nodes, from creation and operation to termination. It continuously handles O&M responsibilities such as operating system upgrades, component maintenance, and security vulnerability patching, so you do not need to plan node configurations manually.
-
Fault recovery: Automatically detects node anomalies and initiates recovery. You can configure whether to allow node reboots for fault recovery.
-
OS CVE patching: Automatically patches operating system vulnerabilities of high, medium, and low severity.
-
Operating system version upgrades: Automatically updates the node pool's operating system image and completes the upgrade through node rotation.
-
Automatic response to ECS system events: Automatically identifies and responds to ECS system events to improve node stability and availability.
Instant node elasticity
Intelligent managed node pools include built-in instant node elasticity, which automatically scales nodes as workloads change and removes the need for advance capacity planning. Billing is based on actual resource usage, which helps reduce waste from idle resources and optimize costs.
-
Faster elastic response: An event-driven model lets the system respond to scaling needs in 1 to 3 seconds. The scaling actions themselves, accelerated by ContainerOS capabilities, complete in approximately 45±10 seconds.
-
More reliable resource delivery: The service automatically selects suitable instance types for scaling. If the target inventory is insufficient, it automatically falls back to other eligible instance types, achieving a resource delivery success rate of up to 99%. It also provides inventory warnings to identify potential risks with instance type combinations in advance.
-
Improved scheduling efficiency: Uses optimal bin packing and PreBind (a custom feature) strategies based on pod requirements, reducing scheduling fragmentation by 30%.
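The bin-packing idea behind the scheduling improvement can be illustrated with a short sketch. The Python below uses first-fit decreasing, a common textbook heuristic, and should not be read as the algorithm ACK actually runs. The `Node` class, the `pack_pods` function, the pod names, and the 4,000-millicore node size are all assumptions made for illustration, and PreBind is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical node with a fixed amount of allocatable CPU in millicores."""
    capacity_m: int
    pods: list = field(default_factory=list)  # list of (pod_name, request_m)

    @property
    def used_m(self) -> int:
        return sum(request_m for _, request_m in self.pods)

    def fits(self, request_m: int) -> bool:
        return self.used_m + request_m <= self.capacity_m


def pack_pods(pod_requests: dict, node_capacity_m: int) -> list:
    """Assign pods to nodes with the first-fit-decreasing heuristic."""
    nodes = []
    # Placing the largest requests first tends to leave fewer unusable gaps.
    ordered = sorted(pod_requests.items(), key=lambda kv: kv[1], reverse=True)
    for name, request_m in ordered:
        if request_m > node_capacity_m:
            raise ValueError(f"pod {name} does not fit on any node of this size")
        # Reuse the first existing node with enough free CPU.
        target = next((n for n in nodes if n.fits(request_m)), None)
        if target is None:
            # No existing node fits, so a new node has to be added.
            target = Node(capacity_m=node_capacity_m)
            nodes.append(target)
        target.pods.append((name, request_m))
    return nodes


# Hypothetical pending pods and their CPU requests in millicores.
pending = {"api": 3000, "worker": 2500, "cache": 1500, "cron": 1000}
for index, node in enumerate(pack_pods(pending, node_capacity_m=4000)):
    print(index, node.pods, f"{node.used_m}/{node.capacity_m}m")
```

Sorting by request size before placement is the design choice that matters here: small pods fill the gaps left by large ones instead of forcing extra nodes.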
ContainerOS
Nodes use ContainerOS as their operating system. ContainerOS is designed specifically for containerized environments. It is fully compatible with the Kubernetes ecosystem and offers benefits such as fast startup, security hardening, and consistent upgrades.
-
Rapid node scale-out
-
Streamlined image: The image includes only the necessary software packages and system services for running Kubernetes pods. System-level optimizations significantly reduce node startup time.
-
Optimized for GPU scenarios: When using GPU instances, the system uses a GPU-optimized version of ContainerOS with pre-installed NVIDIA drivers and necessary runtime environments. This reduces post-startup installation and configuration steps. For example, when you deploy an inference service for the Qwen large model, the node Ready time is reduced from 200 seconds to 94 seconds compared to Alibaba Cloud Linux 3, which uses an OS image without built-in GPU drivers.
-
Security hardening
-
Read-only root file system: The root file system is read-only by default. Only the /etc and /var directories are writable. This meets basic system configuration needs while adhering to the immutable infrastructure principle in cloud-native environments, which prevents container escapes from modifying the host file system.
-
Minimal system exposure: By default, a Python runtime environment is not provided, and direct SSH login is disabled to prevent untraceable operations on the system. For special O&M scenarios, a dedicated O&M container is provided as a supplement.
-
Atomic upgrades
-
Image-level updates and rollbacks: Following the immutable infrastructure concept, ContainerOS does not provide traditional package managers like yum. It supports granular updates and rollbacks at the operating system image level (disk replacement upgrades) and limited layered hot-fixes. This ensures that the software versions and system configurations of all nodes in the cluster remain consistent.
Comparison with other node pool modes
The following table compares the configuration capabilities of intelligent managed node pools with standard managed node pools and node pools where management is disabled.
| | Managed configuration | Disabled | Managed node pool | Intelligent management |
| Node pool configuration | Instance type | Manual configuration | Manual configuration | Configurable, with intelligent instance type recommendations. |
| | Billing method | Manual configuration | Manual configuration | Pay-as-you-go only. |
| | Operating system | Manual configuration | Manual configuration | Supports only the container-optimized operating system ContainerOS. |
| | System disk | Manual configuration | Manual configuration | Recommended default: 20 GiB. |
| | Data disk | Manual configuration | Manual configuration | ContainerOS uses one data disk for temporary storage. The size is configurable. |
| | Auto scaling | Can be enabled and configured manually. | Can be enabled and configured manually. | Instant node elasticity is enabled by default. Manual configuration is supported. |
| | Automatic response to ECS system events | Not supported | Enabled by default | Enabled by default |
| | Node self-healing | Not supported | Can be enabled and configured manually. | Enabled by default |
| | Automatic kubelet and containerd upgrades | Manually configured using the automatic cluster upgrade feature. | Manually configured using the automatic cluster upgrade feature. | Enabled by default |
| | Automatic OS CVE vulnerability fixes | Not supported | Can be enabled and configured manually. | Enabled by default |
Usage notes
-
Capacity boundaries
-
When you use an intelligent managed node pool, ACK dynamically scales nodes based on workload demand. By default, it supports scaling out to a maximum of 50 nodes. You can change the maximum number of instances by using the node pool's scaling settings.
-
Intelligent managed node pools do not support certain instance types, such as Arm-based instances or instances with local disks, and they support only ContainerOS 3.6 and later. ACK recommends default instance type families that meet the needs of most use cases. You can also adjust the settings in the console based on your specific business requirements. We recommend configuring a sufficient number of instance types to enhance the node pool's elasticity and prevent scaling failures.
-
Operational boundaries
-
With an intelligent managed node pool, ACK is responsible for O&M tasks such as operating system version upgrades, software version upgrades, and security patching. These tasks may involve software version upgrades, configuration changes, reboots, and node draining and eviction. To avoid policy conflicts, do not perform manual O&M on the ECS nodes in the node pool, such as rebooting, mounting data disks, or logging in to nodes to modify configurations.
-
Set appropriate replica counts for your workloads, as well as PreStop hooks for graceful shutdowns and PodDisruptionBudget (PDB) policies. This ensures that nodes can be safely drained without service interruptions.
-
Although intelligent managed node pools automate Kubernetes node O&M, you are still responsible for certain obligations under the shared responsibility model.
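The drain-safety advice above (multiple replicas, a PreStop hook, and a PDB) could be expressed as a sketch like the following, which builds the manifests as Python dictionaries and serializes them to JSON. The function names, the `web` workload, the `example.com/web:1.0` image, and every numeric value are hypothetical, and the hook assumes the image ships `sh` and `sleep`. Only standard Kubernetes fields are used (`apps/v1` Deployment, `policy/v1` PodDisruptionBudget).

```python
import json


def build_deployment(name: str, image: str, replicas: int = 3,
                     prestop_sleep_s: int = 10, grace_period_s: int = 30) -> dict:
    """Deployment with several replicas and a preStop hook (example values only)."""
    # The preStop hook runs inside the termination grace period, so it must be shorter.
    if prestop_sleep_s >= grace_period_s:
        raise ValueError("preStop sleep must be shorter than the grace period")
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "terminationGracePeriodSeconds": grace_period_s,
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Pause before shutdown so in-flight requests can finish.
                        "lifecycle": {"preStop": {"exec": {
                            "command": ["sh", "-c", f"sleep {prestop_sleep_s}"]}}},
                    }],
                },
            },
        },
    }


def build_pdb(name: str, replicas: int, min_available: int) -> dict:
    """PodDisruptionBudget that keeps min_available pods running during a drain."""
    # If minAvailable equals the replica count, no pod may be evicted voluntarily
    # and a node drain would stall, so this sketch rejects that configuration.
    if not 0 < min_available < replicas:
        raise ValueError("min_available must be at least 1 and below replicas")
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{name}-pdb"},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": name}},
        },
    }


def as_kubectl_input(*manifests: dict) -> str:
    """Wrap manifests in a v1 List, which kubectl accepts as JSON input."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": list(manifests)}, indent=2)


print(as_kubectl_input(
    build_deployment("web", "example.com/web:1.0", replicas=3),
    build_pdb("web", replicas=3, min_available=2),
))
```

The printed JSON can be piped to `kubectl apply -f -`. Keeping `minAvailable` below the replica count is what leaves room for a node to be drained one pod at a time.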
-
Storage guidelines
-
Intelligent managed node pools use ContainerOS, which enhances node security with an immutable root file system. Avoid using node system paths for storage, such as HostPath. We recommend using PersistentVolumeClaims (PVCs) for persistent storage.
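One way to act on the storage guideline is to check pod specs for HostPath volumes before deployment and to replace them with PVC-backed volumes. The sketch below is illustrative only: the helper names, the `data-pvc` claim, and the `20Gi` size are hypothetical, and the claim omits `storageClassName`, which assumes the cluster has a default StorageClass.

```python
def find_hostpath_volumes(pod_spec: dict) -> list:
    """Return the names of volumes in a pod spec that are backed by hostPath."""
    return [
        volume.get("name", "<unnamed>")
        for volume in pod_spec.get("volumes", [])
        if "hostPath" in volume
    ]


def pvc_manifest(name: str, size: str = "20Gi") -> dict:
    """PersistentVolumeClaim manifest (relies on the cluster's default StorageClass)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


def pvc_volume(volume_name: str, claim_name: str) -> dict:
    """Pod volume entry that references an existing PersistentVolumeClaim."""
    return {"name": volume_name, "persistentVolumeClaim": {"claimName": claim_name}}


# Hypothetical pod spec that mixes a hostPath volume with a PVC-backed one.
pod_spec = {
    "containers": [{"name": "app", "image": "example.com/app:1.0"}],
    "volumes": [
        {"name": "logs", "hostPath": {"path": "/data/logs"}},
        pvc_volume("data", "data-pvc"),
    ],
}
print("hostPath volumes to migrate:", find_hostpath_volumes(pod_spec))
```

A check like this could run in a CI step so that HostPath usage is caught before a workload lands on a node with a read-only root file system.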
Quick start
You can create an intelligent managed node pool in an ACK Managed Cluster Pro Edition.
On the ACK Clusters page, click the name of your cluster. In the left navigation pane, click .
-
On the Node Pools page, click Create Node Pool. Set Configure Managed Node Pool to Auto Mode, and configure settings like the number of instances and network configuration. For detailed descriptions of the configuration items, see Create a node pool.
Click Confirm to create the intelligent managed node pool.
Related documents
-
We recommend using intelligent managed node pools in an Auto Mode cluster.
-
You can use GPU compute resources to quickly deploy large model inference services. For more information, see Deploy an inference service for the Qwen large model.