ACK lets you manage nodes in groups using node pools. A node pool is a logical collection of nodes that share the same properties, such as instance types, operating systems, labels, and taints. You can create multiple node pools with different configurations and types within a cluster for unified node management.
Before you create a node pool, see Node pools to learn about the basic concepts, scenarios, features, and billing of node pools.
Entry points
On the Node Pools page, you can create, edit, delete, and view the node pools in your cluster.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, find the cluster to manage and click its name. In the left navigation pane, choose Nodes > Node Pools.
Create a node pool
You can configure a node pool in the console, including its basic, network, and storage configurations. Some parameters, especially those related to node pool availability and networking, cannot be changed after the node pool is created. Creating a node pool does not affect existing nodes or services in other node pools.
In addition to the console, you can also create node pools using the API and Terraform. For more information, see CreateClusterNodePool and Use Terraform to create a node pool that has auto scaling enabled.
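For reference, the following is a minimal sketch of creating a node pool by calling CreateClusterNodePool through the Alibaba Cloud CLI in raw API mode. The cluster ID, vSwitch ID, key pair name, and the body fields shown are placeholders and assumptions; verify the request schema against the CreateClusterNodePool API reference before use.

```bash
# Hedged sketch: create a node pool by calling CreateClusterNodePool with the aliyun CLI.
# All IDs and body fields below are placeholders; additional fields such as
# runtime_version may be required depending on your cluster.
CLUSTER_ID=c1234567890abcdef   # placeholder cluster ID

cat > nodepool.json <<'EOF'
{
  "nodepool_info": { "name": "example-pool" },
  "scaling_group": {
    "instance_types": ["ecs.g7.xlarge"],
    "vswitch_ids": ["vsw-example"],
    "system_disk_category": "cloud_essd",
    "system_disk_size": 120,
    "desired_size": 2,
    "key_pair": "my-key-pair"
  },
  "kubernetes_config": { "runtime": "containerd" }
}
EOF

aliyun cs POST /clusters/$CLUSTER_ID/nodepools \
  --header "Content-Type=application/json" \
  --body "$(cat nodepool.json)"
```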
On the Node Pools page, click Create Node Pool. In the Create Node Pool dialog box, configure the parameters.
After you create a node pool, you can modify some of its configuration items on the Edit Node Pool page. The Modifiable column in the following tables indicates whether a configuration item can be modified after the node pool is created: ✓ indicates that the item can be modified, and ✗ indicates that it cannot.
Basic configuration
Parameter
Description
Modifiable
Node Pool Name
Specify a node pool name.
✓
Confidential Computing
Note: Currently, only users in the whitelist can configure confidential computing. To apply to be added to the whitelist, submit a ticket.
This parameter is required only when Container Runtime is set to containerd.
Specify whether to enable confidential computing. ACK TEE-based confidential computing is a cloud-native confidential container platform built on hardware encryption technologies for users with high security requirements. It protects the security, integrity, and confidentiality of data in use (during computation), and simplifies the development, delivery, and management of trusted or confidential applications to reduce costs. For more information, see ACK TEE-based confidential computing.
✗
Container Runtime
Specify the Container Runtime based on the Kubernetes version. For more information about how to select a container runtime, see Comparison among containerd, Sandboxed-Container, and Docker.
containerd (recommended): supports all Kubernetes versions.
Sandboxed-Container: supports Kubernetes 1.31 and earlier.
Docker (deprecated): supports Kubernetes 1.22 and earlier.
✗
Scaling Mode
Manual and auto scaling modes are supported. In auto mode, computing resources are automatically adjusted based on demand and scaling policies to reduce cluster costs.
Manual: ACK adjusts the number of nodes in the node pool based on the value of the Expected Nodes parameter. The number of nodes is always the same as the value of the Expected Nodes parameter. For more information, see Manually scale a node pool.
Auto: When the capacity planning of the cluster cannot meet the requirements of pod scheduling, ACK automatically scales out nodes based on the configured minimum and maximum number of instances. By default, node instant scaling is enabled for clusters running Kubernetes 1.24 and later, and node autoscaling is enabled for clusters running Kubernetes versions earlier than 1.24. For more information, see Node scaling.
✓
Managed configuration
ACK provides three types of managed configurations that enable different levels of automated operations and maintenance (O&M) for node pools.
Intelligent Hosting: After you enable intelligent hosting, ACK automatically and dynamically scales the node pool based on workload requirements. ACK is also responsible for O&M tasks such as operating system upgrades, software upgrades, and security vulnerability fixes.
This feature is available only in clusters with intelligent hosting enabled.
Managed Node Pool: Select the required automated O&M capabilities and the cluster maintenance window for performing automated O&M tasks.
Disabled: Do not use automated O&M. Manually perform O&M on nodes and node pools.
For more information about the differences among the managed configurations and the related usage notes, see Comparison of managed configuration capabilities and notes.
Network configuration
Parameter
Description
Modifiable
Network Configuration
VPC
The VPC of the cluster is selected by default and cannot be changed after the node pool is created.
Cloud resources and billing: VPC
✗
VSwitch
When the node pool scales, nodes are scaled in or out in the zones of the selected vSwitches based on the Scaling Policy. For high availability, select two or more vSwitches in different zones.
To create a vSwitch, see Create and manage vSwitches.
✓
Instance and image configuration
Parameter
Description
Modifiable
Billing Method
The default billing method used when ECS instances are scaled in a node pool. You can select Pay-As-You-Go, Subscription, or Spot Instance.
If you select the Subscription billing method, you must configure the Duration parameter and choose whether to enable Auto Renewal.
Spot Instance: Currently, only Spot Instances with a protection period are supported. You must also configure the Maximum Price Per Instance.
If the real-time market price of an instance type that you select is lower than the value of this parameter, a spot instance of this instance type is created. After the protection period (1 hour) ends, the system checks the spot price and resource availability of the instance type every 5 minutes. If the real-time market price exceeds your bid price or if the resource inventory is insufficient, the spot instance is released. For more information, see Best practices for preemptible instance-based node pools.
To ensure that all nodes in a node pool use the same billing method, ACK does not allow you to change the billing method of a node pool from Pay-as-you-go or Subscription to Spot Instances or from Spot Instances to Pay-as-you-go or Subscription.
Important: Changing the billing method of a node pool affects only new nodes. The billing method of existing nodes in the node pool remains unchanged. If you want to change the billing method of existing nodes in the node pool, see Change the billing method of an instance from pay-as-you-go to subscription.
✓
Instance-related configurations
Select the ECS instances used by the worker node pool based on instance types or attributes. You can filter instance families by attributes such as vCPU, memory, instance family, and architecture. For more information about the instance specifications not supported by ACK and how to configure nodes, see ECS instance type recommendations.
When the node pool is scaled out, ECS instances of the selected instance types are created. The scaling policy of the node pool determines which instance types are used to create new nodes during scale-out activities. Select multiple instance types to improve the success rate of node pool scale-out operations.
If the node pool fails to be scaled out because the instance types are unavailable or the instances are out of stock, you can specify more instance types for the node pool. The ACK console automatically evaluates the scalability of the node pool. You can check the scalability of the node pool when you create the node pool or after you create the node pool.
✓
Operating System
Public Image: Container Service for Kubernetes provides public images of operating systems, such as Alibaba Cloud Linux 3 Container-optimized, ContainerOS, Alibaba Cloud Linux 3, Ubuntu, and Windows. For more information, see Operating system.
Custom Image: Use a custom operating system image. For more information, see How do I create a custom image based on an existing ECS instance and use the image to create a node?.
Note: For more information about how to upgrade or change the operating system, see Change the OS.
✓
Security Hardening
Specify whether to enable security hardening for nodes. This parameter cannot be modified after creation.
Disable: disables security hardening for ECS instances.
MLPS Security Hardening: Alibaba Cloud provides baselines and the baseline check feature to help you check the compliance of Alibaba Cloud Linux 2 images and Alibaba Cloud Linux 3 images with the level 3 standards of MLPS 2.0. MLPS Security Hardening enhances the security of OS images to meet the requirements of GB/T 22239-2019 Information Security Technology - Baseline for Classified Protection of Cybersecurity without compromising the compatibility and performance of the OS images. For more information, see ACK security hardening based on MLPS.
Important: After you enable MLPS Security Hardening, remote logons through SSH are prohibited for root users. You can use Virtual Network Computing (VNC) to log on to the OS from the ECS console and create regular users that are allowed to log on through SSH. For more information, see Connect to an instance using VNC.
OS Security Hardening: You can enable Alibaba Cloud OS Security Hardening only when the system image is an Alibaba Cloud Linux 2 or Alibaba Cloud Linux 3 image.
✗
Logon Method
If you select MLPS Security Hardening, only the Password option is supported.
If you set Operating System to ContainerOS, only Key Pair and Configure After Creation are supported. To use key pairs, restart the administrative container to apply the configuration. For more information, see Maintain ContainerOS nodes.
The available options are Key Pair, Password, and Configure After Creation.
Set during creation:
Key Pair: Alibaba Cloud SSH key pairs provide a secure and convenient method to log on to ECS instances. An SSH key pair consists of a public key and a private key. SSH key pairs support only Linux instances.
Configure the Username (select root or ecs-user as the username) and the Key Pair parameters.
Password: The password must be 8 to 30 characters in length, and can contain letters, digits, and special characters.
Configure the Username (select root or ecs-user as the username) and the Password parameters.
Configure After Creation: After you create the instance, attach a key pair or reset its logon password. For more information, see Attach an SSH key pair to an instance and Reset the logon password of an instance.
✓
Storage configuration
Parameter
Description
Modifiable
System Disk
ESSD AutoPL, ESSD, ESSD Entry, Standard SSD, and Ultra Disk are supported. The types of system disks that you can select vary based on the instance family that you select. Disk types that are not displayed in the drop-down list are not supported by the instance types that you select.
You can select Configure More System Disk Types to configure a disk type different from the System Disk to improve the scale-out success rate. When an instance is created, the system selects the first matching disk type based on the specified order of disk types to create the instance.
✓
Data Disk
ESSD AutoPL, ESSD, ESSD Entry, and previous-generation disks (Standard SSD and Ultra Disk) are supported. The data disk types that you can select vary based on the instance family that you select. Disk types that are not displayed in the drop-down list are not supported by the instance types that you select.
You can select Encryption for all disk types when you specify the type of data disk. By default, the Default Service CMK is used to encrypt the data disk. You can also use an existing CMK generated using BYOK in KMS.
You can also use snapshots to create data disks in scenarios where container image acceleration and fast loading of large language models (LLMs) are required. This improves the system response speed and enhances the processing capability.
During node creation, the last data disk is automatically formatted, and the system mounts /var/lib/container to this disk. /var/lib/kubelet and /var/lib/containerd are then mounted to /var/lib/container. To customize the mount points, modify the initialization configuration of the data disk. Only one data disk can be used as the container runtime directory. For usage instructions, see Can data disks in an ACK node pool be mounted to custom directories? (A verification sketch follows this table.)
Note: Up to 64 data disks can be attached to an ECS instance. The number of disks that can be attached varies based on the instance type. To query the maximum number of data disks supported by each instance type, call the DescribeInstanceTypes operation and check the DiskQuantity parameter in the response.
You can select Configure More Data Disk Types to configure a disk type different from the Data Disk to improve the scale-out success rate. When an instance is created, the system selects the first matching disk type based on the specified order of disk types to create the instance.
✓
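The following sketch shows how to verify the resulting disk layout on a node, assuming the node pool was created with a data disk that serves as the container runtime directory. It uses only standard Linux commands and can be run from an SSH or VNC session on the node.

```bash
# Verify the data disk and container runtime mount points on a node.
lsblk                                         # list block devices and their mount points
df -h /var/lib/container                      # the formatted data disk, if one was configured
ls -ld /var/lib/containerd /var/lib/kubelet   # per the description above, both reside under /var/lib/container
```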
Instance quantity
Parameter
Description
Modifiable
Expected Number Of Nodes
The expected number of nodes in the node pool. We recommend that you configure at least 2 nodes to ensure that cluster components run as expected. You can configure the Expected Nodes parameter to adjust the number of nodes in the node pool. For more information, see Scale a node pool.
If you do not want to create nodes in the node pool, set this parameter to 0. You can manually modify this parameter to add nodes later.
✓
Advanced configuration
Expand Advanced Options (Optional) to configure the node scaling policy, resource group, ECS tags, taints, and other settings.
Click Confirm Configuration.
On the Confirm Configuration page, you can click Equivalent Code in the lower-left corner to generate the equivalent Terraform or SDK code for the current node pool configuration.
In the node pool list, a Status of Initializing indicates that the node pool is being created. A status of Active indicates that the node pool is successfully created.
Edit a node pool
After you create a node pool, you can adjust some of its settings in the ACK console, such as the vSwitches, billing method, instance types, and system disk. You can also enable or disable elastic scaling. For more information about which settings can be updated, see the parameter descriptions in Create a node pool.
Editing a node pool does not affect existing nodes or services in the node pool. After the node pool configuration is updated, all new nodes added to the node pool use the new configuration by default.
Updates to a node pool configuration apply only to new nodes. Existing nodes remain unaffected, except in scenarios such as Sync ECS Tags To Existing Nodes and Sync Labels And Taints To Existing Nodes.
When you change the Scaling Mode:
From Manual to Auto: This enables auto scaling. You must also set the minimum and maximum number of instances.
From Auto to Manual: This disables auto scaling. The minimum and maximum number of instances are set to 0 and 2000, respectively, and Expected Number Of Nodes is automatically set to the current number of nodes in the node pool.
Update the node pool configuration by following the steps in this section. If you modify nodes using other methods, the changes are overwritten when the node pool is upgraded.
Find the target node pool in the list of node pools and click Edit in the Actions column.
On the edit node pool page, modify the parameters of the node pool as prompted.
On the node pool page, the Status of a node pool is Updating while it is being modified. After the modification is complete, the Status changes to Active.
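As an alternative to the console, node pool settings can also be updated through the API. The following is a minimal sketch that enables auto scaling with illustrative bounds by calling ModifyClusterNodePool; the IDs are placeholders, and the body field names are assumptions to verify against the API reference.

```bash
# Hedged sketch: enable auto scaling on an existing node pool via the API.
CLUSTER_ID=c1234567890abcdef     # placeholder cluster ID
NODEPOOL_ID=np1234567890abcdef   # placeholder node pool ID

aliyun cs PUT /clusters/$CLUSTER_ID/nodepools/$NODEPOOL_ID \
  --header "Content-Type=application/json" \
  --body '{"auto_scaling": {"enable": true, "min_instances": 1, "max_instances": 10}}'
```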
View a node pool
You can view the basic information, monitoring data, node information, and scaling activity records of a node pool.
Click the name of the target node pool to view the following information.
Basic Information tab: displays cluster information, node pool information, and node configuration information. If auto scaling is enabled for the cluster, the auto scaling configuration is also displayed.
Monitoring tab: integrates with Alibaba Cloud Prometheus Service to display the resource usage of the node pool, such as CPU usage, memory usage, disk usage, and the average CPU and memory usage of nodes.
Node Management tab: lists all nodes in the current node pool. On this tab, you can remove nodes, drain nodes, set nodes to schedulable or unschedulable, and perform O&M on nodes. Click Export to export the node information to a CSV file. (A kubectl sketch for listing a node pool's nodes follows this list.)
Scaling Activities tab: displays recent scaling activity records for node instances, including the number of instances after scaling and a description of the scaling activity. If a scaling activity fails, you can view the reason for the failure. For more information about common error codes for scaling failures, see Manually scale a node pool.
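For a command-line view of a node pool's nodes, kubectl can filter by the node pool ID label that ACK adds to each node. The label key below reflects current ACK behavior but should be verified on your own nodes, and the node pool ID is a placeholder.

```bash
# Inspect the labels on a node to confirm the node pool label key used in your cluster.
kubectl get nodes --show-labels | head -n 3

# List all nodes that belong to a specific node pool (replace the ID with your own).
kubectl get nodes -l alibabacloud.com/nodepool-id=np1234567890abcdef -o wide
```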
Delete a node pool
The rules for releasing instances vary depending on their billing method. Follow the standard procedure below to delete nodes from a node pool. Before you delete a node pool, check whether Expected Number Of Nodes is set for the node pool, because this directly affects the node release behavior.
Node pool | Release rule |
Node pool with Expected Number of Nodes enabled | |
Node pool with Expected Number of Nodes disabled | |
(Optional) Click the name of the target node pool and on the Basic Information tab, check whether Expected Number Of Nodes is configured. If this feature is not enabled, the Expected Number Of Nodes field displays a hyphen (-).
In the Actions column of the target node pool, click ⋯ > Delete. In the dialog box that appears, read the confirmation message and click OK.
Important: The lifecycles of the system disk and data disks are bound to the node. When a node is released, its disks are also released, and all data on the disks is permanently lost and cannot be recovered. To ensure data persistence, use PersistentVolumes (PVs) to manage storage. This practice decouples storage data from the node lifecycle and ensures data security.
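Before you delete a node pool, it can help to confirm that no workload keeps state on the nodes themselves. The following sketch uses kubectl and jq (which must be installed locally) to list pods that mount hostPath volumes and to review existing PersistentVolumeClaims.

```bash
# List pods that mount hostPath volumes; data on such paths is lost when nodes are released.
kubectl get pods -A -o json \
  | jq -r '.items[] | select(.spec.volumes[]?.hostPath) | "\(.metadata.namespace)/\(.metadata.name)"' \
  | sort -u

# Confirm that stateful workloads use PersistentVolumeClaims backed by PVs instead.
kubectl get pvc -A
```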
Related operations
After the node pool is active, you can perform the following operations on the node pool list page as needed.
UI Element | Description | References |
Sync Node Pool | Synchronize the data of the node pool when node information is abnormal. | None |
Details | View the configuration details of the node pool. | None |
Scale | Manually or automatically adjust the number of nodes in the node pool. In auto mode, computing resources are adjusted based on demand and scaling policies to reduce cluster costs. | See Manually scale a node pool and Node scaling. |
Edit | Adjust the configuration of the node pool, such as the vSwitches, managed node pool configuration, billing method, instance types, and enabling or disabling elastic scaling. | See Edit a node pool in this topic. |
Monitoring | Integrates with Alibaba Cloud Prometheus Service to display the resource usage of the node pool, including CPU or memory usage, disk usage, and average CPU or memory usage of nodes. | See View a node pool in this topic. |
Add Existing Nodes | To add an ECS instance to an ACK cluster as a worker node after purchase, or rejoin a worker node to a node pool after removal, use the add existing nodes feature. This feature has some limits and notes. For more information, see the document. | |
Configure Logon Method | Set the logon method for nodes. Both key pair and password methods are supported. | See Instance and image configuration in this topic. |
Managed Configuration | Enable automated O&M for the node pool, including automatic node failure recovery, automatic upgrades for kubelet and runtime, and automatic OS CVE fixes. | See Basic configuration in this topic. |
Clone | Clone a node pool with the same configuration based on an existing node pool. | None |
Delete | Delete a node pool that is no longer in use to reduce unnecessary resource waste. Whether the node pool has the expected number of nodes enabled and the billing mode of the nodes affect the node release behavior. | See Delete a node pool in this topic. |
Kubelet Configuration | Customize the kubelet parameter settings for nodes at the node pool level to adjust node behavior, such as reserving resources for the entire cluster to allocate resource usage. | |
Containerd Configuration | Customize the containerd parameter settings for nodes at the node pool level. For example, you can configure multiple mirror repositories for a specified image repository at the same time, or specify to skip the security certificate verification for a certain image repository. | |
OS Configuration | Customize the OS parameter settings for nodes at the node pool level to tune system performance. | |
Kubelet Upgrade | Upgrade the kubelet and containerd versions of the nodes in the node pool. | |
Change Operating System | Change the node operating system type or upgrade the operating system version. | |
Fix CVE (OS) | Perform batch fixes for CVE vulnerabilities to improve cluster stability, security, and compliance. Some CVE vulnerability fixes require restarting nodes. For more information about the feature and its notes, see the document. | |
Node Recovery | When a node in a managed node pool becomes abnormal, ACK automatically initiates a recovery operation for the faulty node to keep it running normally. Some complex node failures may still require manual repair. For more information about the checks provided by ACK and the specific recovery behaviors, see the document. |
Comparison of managed configuration capabilities
Category | Item | Disabled | Managed Node Pool | Intelligent Hosting |
Node pool configuration | Instance type | Manual configuration | Manual configuration | Configurable. Supports intelligent recommendations based on instance type. |
Node pool configuration | Billing method | Manual configuration | Manual configuration | Only pay-as-you-go is supported. |
Node pool configuration | Operating system | Manual configuration | Manual configuration | Only the container-optimized OS ContainerOS is supported. |
Node pool configuration | System disk | Manual configuration | Manual configuration | Default recommended configuration, 20 GiB. |
Node pool configuration | Data disk | Manual configuration | Manual configuration | One data disk is used for temporary storage of the ContainerOS operating system. The size is configurable. |
Node pool configuration | Auto scaling | Optional to enable, manual configuration | Optional to enable, manual configuration | Built-in node instant scaling is enabled and can be manually configured. |
 | Automatic response to ECS system events | Not supported | Enabled by default | Enabled by default |
 | Node auto-repair | Not supported | Optional to enable, manual configuration | Enabled by default |
 | Automatic upgrade of kubelet and runtime versions | Not supported | Optional to enable, manual configuration | Enabled by default |
 | Automatic OS CVE vulnerability fixes | Not supported | Optional to enable, manual configuration | Enabled by default |
After you enable intelligent hosting for a node pool, the node pool dynamically scales nodes based on your workload requirements. By default, a node pool can be scaled out to a maximum of 50 nodes. You can modify the maximum number of instances using the scaling feature of the node pool.
After you enable intelligent hosting for a node pool, ACK takes over O&M responsibilities, such as operating system (OS) upgrades, software upgrades, and vulnerability patching.
These responsibilities include tasks such as software version upgrades, software configuration modifications, restarts, and drain evictions.
To prevent conflicts with automation policies, avoid performing manual operations on the ECS nodes within the node pool, such as restarting nodes, attaching data disks, or modifying configurations by logging on to the nodes.
Set reasonable replica counts for your workloads, implement PreStop hooks for graceful shutdown, and establish PodDisruptionBudget policies so that nodes can be drained for maintenance without interrupting your business (a PodDisruptionBudget sketch follows these notes).
After you enable intelligent hosting for a node pool, ACK enhances node security based on the ContainerOS operating system, which uses an immutable root file system. Avoid using storage on the node's system path, such as HostPath. We recommend that you use PVCs for persistent storage.
After you enable intelligent hosting for a node pool, instance types that use the Arm architecture, GPUs, or local disks are not supported. ACK recommends default instance types that meet application needs in most scenarios. You can also adjust the instance types in the console based on your business scenarios. We recommend that you specify a sufficient number of instance types to improve the scaling flexibility of the node pool and prevent scaling failures.
Intelligent hosting aims to provide automated and intelligent Kubernetes cluster O&M features. However, in certain scenarios, you still need to fulfill some obligations. For more information, see Shared responsibility model.
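As a minimal sketch of the PodDisruptionBudget mentioned in the notes above, the following creates a PDB with kubectl. The application label and replica threshold are placeholders to adapt to your workload.

```bash
# Create a PodDisruptionBudget so that node drains always keep at least 2 replicas running.
# "app=my-app" is a placeholder selector; replace it with your workload's labels.
kubectl create poddisruptionbudget my-app-pdb \
  --selector=app=my-app \
  --min-available=2

# Check how many voluntary disruptions are currently allowed.
kubectl get pdb my-app-pdb
```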
FAQ
How do I create a custom image from an ECS instance and use the image to create a node?
After you create an ECS instance, you can customize it by installing software and deploying application environments. Then, you can create a custom image from the instance. Instances created from this custom image contain all your customizations, which eliminates the need for repeated configuration.
Log on to the ECS instance and run the following command to delete the specified files. For more information about how to log on to an instance, see Log on to a Linux instance using Workbench.
# Remove ACK node initialization configuration and cloud-init state.
chattr -i /etc/acknode/nodeconfig-*
rm -rf /etc/acknode
systemctl disable ack-reconfig
rm -rf /etc/systemd/system/ack-reconfig.service
rm -rf /usr/local/bin/reconfig.sh
rm -rf /var/lib/cloud
# Stop and unregister kubelet so that nodes created from the image join the cluster cleanly.
systemctl stop kubelet
systemctl disable kubelet
rm -rf /etc/systemd/system/kubelet.service
rm -rf /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Create a custom image from the ECS instance. For more information about the notes and procedure, see Create a custom image from an instance. For a hedged CLI sketch of this step, see the example after this procedure.
Configure the node pool by setting the operating system type to Custom Image and completing the creation process as described earlier in this topic.
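If you prefer the command line for creating the image, the ECS CreateImage operation can be called through the aliyun CLI. The region, instance ID, and image name below are placeholders.

```bash
# Hedged sketch: create a custom image from the prepared ECS instance.
aliyun ecs CreateImage \
  --RegionId cn-hangzhou \
  --InstanceId i-bp1example \
  --ImageName ack-worker-custom-v1
```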
Create a custom image based on the operating system supported by the ACK cluster. For more information, see Operating system.
Do not build custom images from ECS instances that are running in an ACK cluster. To build an image from such an instance, first remove the instance from the cluster. For more information, see Remove a node.
The predefined behavior logic in a custom image may affect operations such as cluster node initialization, container launching, node OS upgrades, and automatic recovery of nodes in a managed node pool. Before you use it in a production environment, ensure that the custom image has been tested and validated.
References
If a node is no longer needed, remove it by following the standard procedure. For more information, see Remove a node.
ACK reserves a certain amount of node resources for kube components and system processes. For more information, see Node resource reservation policy.
If the planned capacity of the cluster cannot meet the scheduling requirements of application pods, enable node scaling to automatically scale node resources. For more information, see Node scaling.
The maximum number of pods that a single worker node can support is affected by the network plug-in type and cannot be changed in most scenarios. To increase the number of available pods, you can scale out the node pool, upgrade the instance specifications, or recreate the cluster and re-plan the pod CIDR block. For more information, see Adjust the number of available pods on a node.