Create intelligent managed node pools - Container Service for Kubernetes

Features

Full-lifecycle automated O&M: Automatically manages the entire lifecycle of nodes from creation to recycling. This includes features such as node image rotation, automatic replacement of abnormal nodes, and automatic patching of OS CVE vulnerabilities.
Instant node elasticity: Built-in instant node elasticity triggers node scale-out or scale-in within milliseconds based on the resource requests of Kubernetes workloads.
ContainerOS: Nodes use ContainerOS, which features an optimized underlying software stack and an immutable root file system for enhanced security.

Automated node O&M

ACK automatically manages the entire lifecycle of nodes, from creation and operation to termination. It continuously handles O&M tasks such as operating system upgrades, component maintenance, and security vulnerability patching, so you do not need to plan node configurations manually.

Fault recovery: Automatically detects node anomalies and initiates recovery. You can configure whether to allow node reboots for fault recovery.
OS CVE patching: Automatically patches operating system vulnerabilities of high, medium, and low severity.
Operating system version upgrades: Automatically updates the node pool's operating system image and completes the upgrade through node rotation.
Automatic response to ECS system events: Automatically identifies and responds to ECS system events to improve node stability and availability.
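Node rotation, which is used above for operating system upgrades, generally means replacing nodes with new ones built from the updated image instead of patching them in place. The following Python sketch only illustrates that general idea and is not ACK's rotation logic. The batch size, the surge-first ordering (create replacements before removing old nodes), and names such as `rotation_plan` and `new-<i>` are assumptions for illustration.

```python
from typing import Iterator, List, Tuple

# (action, target) pairs; this sketch only plans the steps, it does not call any API.
Step = Tuple[str, str]


def rotation_plan(old_nodes: List[str], batch_size: int, new_image: str) -> Iterator[Step]:
    """Yield steps that replace old_nodes in batches of batch_size (hypothetical model)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(old_nodes), batch_size):
        batch = old_nodes[start:start + batch_size]
        # Assumed surge-first ordering: bring up replacements so capacity does not drop.
        for i, _ in enumerate(batch, start=start):
            yield ("create", f"new-{i}:{new_image}")
        for node in batch:
            yield ("cordon", node)   # stop new pods from being scheduled on the old node
            yield ("drain", node)    # evict its pods; a real drain honors PodDisruptionBudgets
            yield ("delete", node)   # release the old instance


for action, target in rotation_plan(["node-1", "node-2", "node-3"], batch_size=2, new_image="containeros-new"):
    print(action, target)
```

Creating replacements first trades a short period of extra capacity for not reducing the pool's capacity while pods are evicted.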

Instant node elasticity

Includes built-in instant node elasticity to automatically scale nodes based on workload changes, removing the need for advance capacity planning. Billing is based on actual resource usage, which helps reduce waste from idle resources and optimize costs.

Faster elastic response: With an event-driven model, the system responds to scaling needs within 1 to 3 seconds. The resulting scaling actions, accelerated by ContainerOS, complete in approximately 45±10 seconds.
More reliable resource delivery: The service automatically selects suitable instance types for scaling. If the target inventory is insufficient, it automatically falls back to other eligible instance types, achieving a resource delivery success rate of up to 99%. It also provides inventory warnings to identify potential risks with instance type combinations in advance.
Improved scheduling efficiency: Uses optimized bin packing and a custom PreBind strategy based on pod resource requirements, reducing scheduling fragmentation by 30%.
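Bin packing places pods onto as few nodes as possible so that leftover capacity is not scattered across many partly used nodes. The Python sketch below uses the classic first-fit-decreasing heuristic on CPU and memory requests to show the idea, and to show how the number of nodes a scale-out needs follows from pod resource requests. It is not ACK's scheduler or its PreBind strategy; the single node size, the class names, and the sort order are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PodRequest:
    name: str
    cpu_millicores: int
    memory_mib: int


@dataclass
class Node:
    cpu_free: int
    mem_free: int
    pods: list = field(default_factory=list)

    def fits(self, pod: PodRequest) -> bool:
        return pod.cpu_millicores <= self.cpu_free and pod.memory_mib <= self.mem_free

    def place(self, pod: PodRequest) -> None:
        self.cpu_free -= pod.cpu_millicores
        self.mem_free -= pod.memory_mib
        self.pods.append(pod.name)


def first_fit_decreasing(pods, node_cpu: int, node_mem: int) -> list:
    """Pack pods onto identical nodes (assumed single node size); return the nodes used."""
    nodes = []
    # Largest requests first (by CPU, then memory) to reduce fragmentation.
    for pod in sorted(pods, key=lambda p: (p.cpu_millicores, p.memory_mib), reverse=True):
        if pod.cpu_millicores > node_cpu or pod.memory_mib > node_mem:
            raise ValueError(f"{pod.name} does not fit on an empty node")
        target = next((n for n in nodes if n.fits(pod)), None)
        if target is None:
            # No existing node has room: in an autoscaler, this is where a new node is needed.
            target = Node(cpu_free=node_cpu, mem_free=node_mem)
            nodes.append(target)
        target.place(pod)
    return nodes


pods = [PodRequest("a", 3000, 4096), PodRequest("b", 2000, 2048),
        PodRequest("c", 1000, 2048), PodRequest("d", 2000, 4096)]
nodes = first_fit_decreasing(pods, node_cpu=4000, node_mem=8192)
print([n.pods for n in nodes])  # [['a', 'c'], ['d', 'b']]
```

Here four pods requesting 8 vCPUs in total fit exactly on two 4-vCPU nodes with no CPU left idle.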

ContainerOS

Nodes use ContainerOS as their operating system. ContainerOS is designed specifically for containerized environments. It is fully compatible with the Kubernetes ecosystem and offers benefits such as fast startup, security hardening, and consistent upgrades.

Rapid node scale-out
- Streamlined image: The image includes only the necessary software packages and system services for running Kubernetes pods. System-level optimizations significantly reduce node startup time.
- Optimized for GPU scenarios: On GPU instances, nodes use a GPU-optimized version of ContainerOS with NVIDIA drivers and the required runtime environments pre-installed, which removes installation and configuration steps after startup. For example, when you deploy an inference service for the Qwen large model, the time for a node to become Ready drops from 200 seconds to 94 seconds (about 53% shorter) compared with Alibaba Cloud Linux 3, whose OS image does not include GPU drivers.
Security hardening
- Read-only root file system: The root file system is read-only by default; only the /etc and /var directories are writable. This covers basic system configuration needs while following the immutable infrastructure principle of cloud-native environments, and it helps prevent a process that escapes from a container from modifying the host file system.
- Minimal system exposure: By default, a Python runtime environment is not provided, and direct SSH login is disabled to prevent untraceable operations on the system. For special O&M scenarios, a dedicated O&M container is provided as a supplement.
Atomic upgrades
- Image-level updates and rollbacks: Following the immutable infrastructure concept, ContainerOS does not provide traditional package managers such as yum. Instead, it supports atomic updates and rollbacks of the entire operating system image (upgrades by disk replacement), plus a limited set of layered hotfixes. This keeps software versions and system configurations consistent across all nodes in the cluster.

Comparison with other node pool modes

The following table compares the configuration capabilities of intelligent managed node pools with standard managed node pools and node pools where management is disabled.

| Configuration category | Item | Management disabled | Managed node pool | Intelligent managed node pool |
| --- | --- | --- | --- | --- |
| Node pool configuration | Instance type | Manual configuration | Manual configuration | Configurable, with intelligent instance type recommendations |
| | Billing method | Manual configuration | Manual configuration | Pay-as-you-go only |
| | Operating system | Manual configuration | Manual configuration | Only the container-optimized operating system ContainerOS |
| | System disk | Manual configuration | Manual configuration | Recommended default: 20 GiB |
| | Data disk | Manual configuration | Manual configuration | ContainerOS uses one data disk for temporary storage; the size is configurable |
| Auto scaling | | Can be enabled and configured manually | Can be enabled and configured manually | Instant node elasticity is enabled by default; manual configuration is supported |
| Automated O&M capabilities | Automatic response to ECS system events | Not supported | Enabled by default | Enabled by default |
| | Node self-healing | Not supported | Can be enabled and configured manually | Enabled by default |
| | Automatic kubelet and containerd upgrades | Configured manually through the automatic cluster upgrade feature | Configured manually through the automatic cluster upgrade feature | Enabled by default |
| | Automatic OS CVE vulnerability fixes | Not supported | Can be enabled and configured manually | Enabled by default |

Usage notes

Capacity boundaries
- When you use an intelligent managed node pool, ACK dynamically scales nodes based on workload demand. By default, it supports scaling out to a maximum of 50 nodes. You can change the maximum number of instances by using the node pool's scaling settings.
- Intelligent managed node pools do not support certain instance types, such as Arm-based instances or instances with local disks, and they support only ContainerOS 3.6 and later. ACK recommends default instance type families that meet the needs of most use cases. You can also adjust the settings in the console based on your specific business requirements. We recommend configuring a sufficient number of instance types to enhance the node pool's elasticity and prevent scaling failures.
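The instance type fallback and the capacity limits above can be modeled together in a short Python sketch: candidate instance types are tried in order, unsupported types (Arm-based or with local disks) are skipped, and the scale-out is capped at the pool's maximum node count (50 by default). The inventory mapping, the instance type names, and `plan_scale_out` are hypothetical; this is a simplified model under those assumptions, not the ACK implementation.

```python
from dataclasses import dataclass

DEFAULT_MAX_NODES = 50  # default upper bound described for intelligent managed node pools


@dataclass(frozen=True)
class InstanceType:
    name: str         # hypothetical instance type identifier
    arch: str         # "x86_64" or "arm64"
    local_disk: bool  # True for instance families with local disks


def eligible(t: InstanceType) -> bool:
    """Arm-based and local-disk instance types are not supported in this pool mode."""
    return t.arch != "arm64" and not t.local_disk


def plan_scale_out(candidates, inventory, current_nodes, requested, max_nodes=DEFAULT_MAX_NODES):
    """Return (plan, shortfall).

    plan: list of (instance_type_name, count), filled from candidates in order.
    inventory: hypothetical mapping of instance type name -> available count.
    shortfall: nodes still missing after the request is capped at max_nodes.
    """
    remaining = min(requested, max(0, max_nodes - current_nodes))
    plan = []
    for t in candidates:
        if remaining == 0:
            break
        if not eligible(t):
            continue  # skip unsupported types entirely
        take = min(remaining, inventory.get(t.name, 0))
        if take > 0:
            plan.append((t.name, take))
            remaining -= take  # fall back to the next candidate for the rest
    return plan, remaining


candidates = [
    InstanceType("type-a", "x86_64", local_disk=False),
    InstanceType("type-arm", "arm64", local_disk=False),
    InstanceType("type-b", "x86_64", local_disk=False),
]
inventory = {"type-a": 3, "type-arm": 100, "type-b": 10}
plan, shortfall = plan_scale_out(candidates, inventory, current_nodes=45, requested=8)
print(plan, shortfall)  # [('type-a', 3), ('type-b', 2)] 0
```

With only `type-a` configured, the same request would leave a shortfall, which is why configuring more eligible instance types makes scaling more reliable.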
Operational boundaries
- With an intelligent managed node pool, ACK is responsible for O&M tasks such as operating system version upgrades, software version upgrades, and security patching. These tasks may involve configuration changes, node reboots, and node draining with pod eviction. To avoid conflicts with these automated policies, do not perform manual O&M on the ECS nodes in the node pool, such as rebooting nodes, mounting data disks, or logging in to nodes to modify configurations.
- Set appropriate replica counts for your workloads, and configure PreStop hooks for graceful shutdown and PodDisruptionBudget (PDB) policies. This helps ensure that nodes can be drained safely without service interruptions.
- Although intelligent managed node pools automate Kubernetes node O&M, you are still responsible for certain obligations under the shared responsibility model.
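One way to check whether a workload can tolerate a node drain is to compute how many disruptions its PDB allows when all replicas are healthy. The Python sketch below follows the Kubernetes rule disruptionsAllowed = max(0, currentHealthy - desiredHealthy), where desiredHealthy is minAvailable, or the expected pod count minus maxUnavailable. It handles integer values only (percentages are not modeled), does not cover PreStop hooks, and the helper names are hypothetical.

```python
from typing import Optional


def disruptions_allowed(healthy: int, expected: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """Integer-only approximation of a PDB's status.disruptionsAllowed."""
    # A PDB sets exactly one of minAvailable or maxUnavailable.
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected - max_unavailable
    return max(0, healthy - desired_healthy)


def drain_friendly(replicas: int, **pdb) -> bool:
    """True if there are at least two replicas and, with all healthy, one pod can be evicted."""
    return replicas >= 2 and disruptions_allowed(replicas, replicas, **pdb) >= 1


print(drain_friendly(3, min_available=2))  # True
print(drain_friendly(2, min_available=2))  # False: the PDB blocks every eviction
```

Requiring at least two replicas means another replica keeps serving while one is evicted; a PDB whose minAvailable equals the replica count blocks draining entirely.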
Storage guidelines
- Intelligent managed node pools use ContainerOS, whose immutable root file system enhances node security. Avoid storing data in node system paths, for example through HostPath volumes. We recommend using PersistentVolumeClaims (PVCs) for persistent storage.
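Before moving workloads onto this node pool, it can help to scan pod specs for hostPath volumes. The Python sketch below takes a pod spec already parsed into a dict (for example from YAML) and flags each hostPath volume, noting whether its path falls under /etc or /var, the only writable locations described above. Even writable paths remain node-local, so a PVC is still the better choice. The function names are hypothetical, and the check does not resolve symlinks or `..` segments.

```python
from pathlib import PurePosixPath

# Writable locations on ContainerOS as described above; the rest of / is read-only.
WRITABLE_ROOTS = (PurePosixPath("/etc"), PurePosixPath("/var"))


def is_writable_on_containeros(path: str) -> bool:
    """Compare whole path components, so /variable is not treated as /var."""
    p = PurePosixPath(path)
    return any(p == root or root in p.parents for root in WRITABLE_ROOTS)


def hostpath_findings(pod_spec: dict) -> list:
    """Return one warning per hostPath volume in a pod spec dict."""
    findings = []
    for vol in pod_spec.get("volumes", []):
        host_path = vol.get("hostPath")
        if host_path is None:
            continue  # emptyDir, PVC, configMap, etc. are not flagged
        path = host_path.get("path", "")
        status = ("writable, but still node-local" if is_writable_on_containeros(path)
                  else "on the read-only root file system")
        findings.append(f"volume {vol.get('name')!r}: hostPath {path} is {status}; "
                        "consider a PersistentVolumeClaim instead")
    return findings


spec = {"volumes": [
    {"name": "data", "hostPath": {"path": "/data"}},
    {"name": "logs", "hostPath": {"path": "/var/log/app"}},
    {"name": "cache", "emptyDir": {}},
]}
for finding in hostpath_findings(spec):
    print(finding)
```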

Quick start

You can create an intelligent managed node pool in an ACK Managed Cluster Pro Edition.

On the ACK Clusters page, click the name of your cluster. In the left navigation pane, click Nodes > Node Pools.
On the Node Pools page, click Create Node Pool. Set Configure Managed Node Pool to Auto Mode, and configure settings like the number of instances and network configuration. For detailed descriptions of the configuration items, see Create a node pool.

Click Confirm to create the intelligent managed node pool.
