Container Service for Kubernetes (ACK) provides node pools for you to manage nodes in groups. A node pool is a group of nodes that have the same configurations, such as instance specifications, operating system, labels, and taints. You can create one or more node pools of different types and configurations in an ACK cluster. After you create a node pool, you can manage nodes in the node pool in a centralized manner.
Before you create a node pool, we recommend that you read the Node pools topic to familiarize yourself with the basic information, use scenarios, related features, and billing rules of node pools.
Console operations
On the Node Pools page of the cluster that you want to manage in the ACK console, you can create, edit, or delete a node pool. You can also view the details of a node pool.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Nodes > Node Pools.
Create a node pool
When you create a node pool in the ACK console, you can complete the basic, network, and storage configurations. Some node pool parameters, especially those related to node pool availability and networking, cannot be modified after the node pool is created. The following tables describe these parameters. Creating a node pool in a cluster does not affect the nodes and applications deployed in other node pools of the cluster.
You can also create a node pool by calling the ACK API or by using Terraform. For more information, see CreateClusterNodePool or Use Terraform to create a node pool that has auto scaling enabled.
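For reference, the following is a minimal sketch of a CreateClusterNodePool call made with the Alibaba Cloud CLI (aliyun). It assumes that the CLI is installed and configured and that your CLI version supports ROA-style calls for the cs product. The cluster ID, vSwitch ID, and instance type are placeholders, and the exact field names can vary by API version, so check the CreateClusterNodePool reference before use.
# Create a pay-as-you-go node pool with two expected nodes.
# <CLUSTER_ID>, vsw-xxxxxxxx, and ecs.g7.xlarge are placeholders.
aliyun cs POST /clusters/<CLUSTER_ID>/nodepools \
  --header "Content-Type=application/json" \
  --body '{
    "nodepool_info": { "name": "example-pool" },
    "scaling_group": {
      "instance_types": ["ecs.g7.xlarge"],
      "vswitch_ids": ["vsw-xxxxxxxx"],
      "system_disk_category": "cloud_essd",
      "system_disk_size": 120,
      "instance_charge_type": "PostPaid",
      "desired_size": 2
    }
  }'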
On the Node Pools page, click Create Node Pool. In the Create Node Pool dialog box, configure the node pool parameters.
After you create the node pool, you can modify the node pool parameters on the Edit Node Pool page. The Modifiable column in the following tables indicates whether the corresponding parameter can be modified after the node pool is created.
No indicates that the parameter cannot be modified.
Yes indicates that the parameter can be modified.
Basic configurations
Parameter
Description
Modifiable
Node Pool Name
Specify a node pool name.
Region
By default, the region in which the cluster resides is selected. You cannot change the region.
Confidential Computing
Note: To use confidential computing, submit a ticket to apply to be added to the whitelist.
This parameter is available only when you select containerd for the Container Runtime parameter.
Specify whether to enable confidential computing. ACK provides an all-in-one cloud-native confidential computing solution based on hardware encryption technologies. Confidential computing ensures data security, integrity, and confidentiality. It simplifies the development and delivery of trusted or confidential applications to reduce costs. For more information, see TEE-based confidential computing.
Container Runtime
Specify the container runtime based on the Kubernetes version. For more information about how to select a container runtime, see Comparison among Docker, containerd, and Sandboxed-Container.
containerd: containerd is recommended for all Kubernetes versions.
Sandboxed-Container: supports Kubernetes 1.31 and earlier.
Docker (deprecated): supports Kubernetes 1.22 and earlier.
Scaling Mode
Manual and auto scaling are supported. With auto scaling, computing resources are automatically adjusted based on your business requirements and policies to reduce cluster costs.
Manual: ACK adjusts the number of nodes in the node pool based on the value of the Expected Nodes parameter. The number of nodes is always the same as the value of the Expected Nodes parameter. For more information, see Manually scale a node pool.
Auto: When the capacity planning of the cluster cannot meet the requirements of pod scheduling, ACK automatically scales out nodes based on the configured minimum and maximum number of instances. By default, node instant scaling is enabled for clusters running Kubernetes 1.24 and later, and node auto scaling is enabled for clusters running Kubernetes versions earlier than 1.24. For more information, see Node scaling.
Automated O&M configurations
ACK provides the following options for managed node pool configurations. The options provide different levels of Automated O&M capabilities.
Auto Mode: After you enable this option, ACK dynamically scales based on the resource requirements of the workloads in the node pool. In this case, ACK takes over O&M responsibilities, such as OS upgrades, software upgrades, and vulnerability patching.
This option is available only for clusters with auto mode enabled. This feature is in canary release. To use it, submit a ticket.
Managed Node Pool: This option allows you to configure the following parameters for automated O&M capabilities. You can also configure a maintenance window for running automated O&M tasks.
Disable: disables automated O&M capabilities. If you select this option, you must manually maintain nodes and node pools.
For more information about the differences and usage notes of different managed node pool configurations, see Comparison of managed node pool configurations.
Network configurations
Parameter
Description
Modifiable
Network Settings
VPC
By default, the virtual private cloud (VPC) in which the cluster resides is selected. You cannot change the VPC.
vSwitch
When the node pool is being scaled out, new nodes are created in the zones of the selected vSwitches based on the policy that you select for the Scaling Policy parameter. You can select vSwitches in the zones that you want to use.
If no vSwitch is available, click Create vSwitch to create one. For more information, see Create and manage a vSwitch.
Instance and Image
Parameter
Description
Modifiable
Billing Method
The default billing method of ECS instances that are added to the node pool during scale-out activities. You can select Pay-As-You-Go, Subscription, or Preemptible Instance.
If you select the Subscription billing method, you must configure the Duration parameter and choose whether to enable Auto Renewal.
Preemptible Instance: ACK supports only Preemptible Instance with a protection period. You must also configure the Upper Price Limit of Current Instance Spec parameter.
If the real-time market price of an instance type that you select is lower than the value of this parameter, a preemptible instance of this instance type is created. After the protection period (1 hour) ends, the system checks the spot price and resource availability of the instance type every 5 minutes. If the real-time market price exceeds your bid price or if the resource inventory is insufficient, the preemptible instance is released. For more information, see Best practices for preemptible instance-based node pools.
To ensure that all nodes in a node pool use the same billing method, ACK does not allow you to switch the billing method of a node pool between pay-as-you-go or subscription and preemptible instances.
Important: If you change the billing method of a node pool, the change takes effect only on newly added nodes. The existing nodes in the node pool still use the original billing method. For more information about how to change the billing method of existing nodes in a node pool, see Change the billing method of an instance from pay-as-you-go to subscription.
Instance-related parameters
Select the ECS instances used by the worker node pool based on instance types or attributes. You can filter instance families by attributes such as vCPU, memory, instance family, and architecture. For more information about how to configure nodes, see ECS specification recommendations for ACK clusters.
When the node pool is scaled out, ECS instances of the selected instance types are created. The scaling policy of the node pool determines which instance types are used to create new nodes during scale-out activities. Select multiple instance types to improve the success rate of node pool scale-out operations.
If the node pool fails to be scaled out because the instance types are unavailable or the instances are out of stock, you can specify more instance types for the node pool. The ACK console automatically evaluates the scalability of the node pool. You can check the scalability of the node pool when you create the node pool or after you create the node pool.
If you select only GPU-accelerated instances, you can select Enable GPU Sharing on demand. For more information, see cGPU overview.
Operating System
Alibaba Cloud Marketplace images are in canary release.
Public Image: Public images of operating systems provided by Container Service for Kubernetes, such as Alibaba Cloud Linux 3 ACK-optimized, ContainerOS, Alibaba Cloud Linux 3, Ubuntu, and Windows. For more information, see OS images.
Custom Image: Use a custom operating system image. For more information, see How do I create a custom image based on an existing ECS instance and use it to create nodes?
Note: After you change the OS image of the node pool, the change takes effect only on newly added nodes. The existing nodes in the node pool still use the original OS image. For more information about how to upgrade or change the operating system, see Change the operating system.
To ensure that all nodes in the node pool use the same OS image, ACK allows you to only update the node OS image to the latest version. ACK does not allow you to change the type of OS image.
Security Hardening
Specify whether to enable security hardening for the nodes. You cannot modify this parameter after the node pool is created.
Disable: disables security hardening for ECS instances.
MLPS Security Hardening: Alibaba Cloud provides baselines and the baseline check feature to help you check the compliance of Alibaba Cloud Linux 2 images and Alibaba Cloud Linux 3 images with the level 3 standards of Multi-Level Protection Scheme (MLPS) 2.0. MLPS Security Hardening enhances the security of OS images to meet the requirements of GB/T 22239-2019 Information Security Technology - Baseline for Classified Protection of Cybersecurity without compromising the compatibility and performance of the OS images. For more information, see ACK security hardening based on MLPS.
Important: After you enable MLPS Security Hardening, remote logons through SSH are prohibited for root users. You can use Virtual Network Computing (VNC) to log on to the OS from the ECS console and create regular users that are allowed to log on through SSH. For more information, see Connect to an instance by using VNC.
OS Security Hardening: You can enable Alibaba Cloud Linux Security Hardening only when the system image is an Alibaba Cloud Linux 2 or Alibaba Cloud Linux 3 image.
Logon Type
Valid values: Key Pair, Password, and Later.
If you select MLPS Security Hardening, only the Password option is supported. If Operating System is set to ContainerOS, only Key Pair and Later are supported.
Configure the logon type when you create the node pool:
Key Pair: Alibaba Cloud SSH key pairs provide a secure and convenient method to log on to ECS instances. An SSH key pair consists of a public key and a private key. SSH key pairs support only Linux instances.
Configure the Username (select root or ecs-user as the username) and the Key Pair parameters.
Password: The password must be 8 to 30 characters in length, and can contain letters, digits, and special characters.
Configure the Username (select root or ecs-user as the username) and the Password parameters.
Later: Bind a key pair or reset the password after the instance is created. For more information, see Bind an SSH key pair to an instance and Reset the logon password of an instance.
Username
If you select Key Pair or Password for Logon Type, you must select root or ecs-user as the username.
Storage configurations
Parameter
Description
Modifiable
System Disk
ESSD AutoPL, Enterprise SSD (ESSD), ESSD Entry, Standard SSD, and Ultra Disk are supported. The types of system disks that you can select vary based on the instance families that you select. Disk types that are not displayed in the drop-down list are not supported by the instance types that you select.
You can select More System Disk Types and specify additional disk types in the System Disk section to improve the success rate of system disk creation. The system attempts to create the system disk from the specified disk types in the order in which they are listed.
Data Disk
ESSD AutoPL, Enterprise SSD (ESSD), ESSD Entry, SSD, and Ultra Disk are supported. The data disk types that you can select vary based on the instance families that you select. Disk types that are not displayed in the drop-down list are not supported by the instance types that you select.
You can select Encryption for all disk types when you specify the type of data disk. By default, the default service customer master key (CMK) is used to encrypt the data disk. You can also use an existing CMK that is generated by using the bring-your-own-key (BYOK) feature in Key Management Service (KMS).
You can also use snapshots to create data disks in scenarios where container image acceleration and fast loading of large language models (LLMs) are required. This improves the system response speed and enhances the processing capability.
Make sure that a data disk is mounted to /var/lib/container on each node, and that /var/lib/kubelet and /var/lib/containerd are mounted to the /var/lib/container directory. For other data disks on the node, you can perform the initialization operation and customize their mount directories. For more information, see Can I mount a data disk to a custom directory in an ACK node pool?
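To verify this layout, you can log on to a node and check the mount points; a minimal sketch using standard Linux tools:
# Run these commands on the node (for example, over SSH or VNC).
# Show the file systems that back /var/lib/container and the paths that should reside on it.
df -h /var/lib/container /var/lib/kubelet /var/lib/containerd

# List all block devices and their mount points to locate additional data disks.
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT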
Note: Up to 64 data disks can be attached to an ECS instance. The number of disks that can be attached to an ECS instance varies based on the instance type. To query the maximum number of data disks supported by each instance type, call the DescribeInstanceTypes operation and query the DiskQuantity parameter in the response.
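For example, assuming that the Alibaba Cloud CLI (aliyun) is installed and configured, the following sketch queries a single instance type; the instance type and region are placeholders, and the flag names follow the DescribeInstanceTypes API parameters.
# Query one instance type and read DiskQuantity (the maximum number of data disks) from the JSON response.
aliyun ecs DescribeInstanceTypes --RegionId cn-hangzhou --InstanceTypes.1 ecs.g7.xlarge | grep -i DiskQuantity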
Instance quantity
Parameter
Description
Modifiable
Expected Number of Nodes
The expected number of nodes in the node pool. We recommend that you configure at least two nodes to ensure that cluster components run as expected. You can configure the Expected Nodes parameter to adjust the number of nodes in the node pool. For more information, see Scale a node pool.
If you do not want to create nodes in the node pool, set this parameter to 0. You can manually modify this parameter to add nodes later.
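To check how many nodes a node pool currently contains from the command line, you can filter nodes by the node pool ID label. The following sketch assumes that your nodes carry the alibabacloud.com/nodepool-id label, which ACK typically applies; verify the label keys on your nodes with kubectl get nodes --show-labels, and replace the placeholder ID with the value shown on the node pool details page.
# List the nodes that belong to a specific node pool.
kubectl get nodes -l alibabacloud.com/nodepool-id=<NODE_POOL_ID>

# Count them (the --no-headers flag omits the header row).
kubectl get nodes -l alibabacloud.com/nodepool-id=<NODE_POOL_ID> --no-headers | wc -l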
Advanced configurations
Click Advanced Options (Optional) and configure the node scaling policy, resource group, ECS tags, and taints.
(Optional) On the Create Node Pool page, click Generate API Request Parameters in the top-left corner to generate Terraform or SDK sample parameters that match your node pool configuration.
Click Confirm. Then, check the status of the node pool in the node pool list:
The Initializing status indicates that the node pool is being created.
The Active status indicates that the node pool is created.
Modify a node pool
After you create a node pool, you can modify the configurations of the node pool in the ACK console. For example, you can change the billing method, vSwitches, instance specifications, and system disks that are used by the node pool. You can also enable or disable auto scaling for the node pool. For more information about the modifiable parameters, see Create a node pool.
Modifying a node pool does not affect the nodes and applications deployed in other node pools of the cluster.
In most scenarios, after you modify a node pool, the modified configurations apply only to newly added nodes. In specific scenarios, such as when you update the ECS tags or labels and taints of existing nodes, the modified configurations also apply to existing nodes in the node pool.
After you update the configurations of a node pool, nodes that are subsequently added to the node pool use the modified configurations.
To modify the node pool configuration, perform the following steps. If you have changed the nodes through other methods, those changes will be overwritten when the node pool is updated.
On the Node Pools page, find the node pool that you want to modify and click Edit in the Actions column.
In the dialog box that appears, modify the parameters of the node pool based on the on-screen instructions.
On the Node Pools page, if the Status column of the node pool displays Updating, the node pool is being modified. After the node pool is updated, the Status column displays Active.
View a node pool
You can view the basic information, monitoring data, node information, and scaling events of a node pool in the ACK console.
Click the name of the node pool that you want to manage to view the following information on the details page of the node pool:
Click the Overview tab to view the cluster information, node pool information, and node configurations. If the cluster has auto scaling enabled, you can also view the auto scaling configurations.
Click the Monitor tab to view the node monitoring information provided by Managed Service for Prometheus. The monitoring information includes the resource usage of the node pool, such as CPU usage, memory usage, disk usage, and average CPU or memory utilization per node.
Click the Nodes tab to view the list of nodes in the node pool. You can drain a node (a kubectl sketch is shown after this list), configure the scheduling settings of a node, or perform O&M operations on a node. You can also remove a node from the node pool. You can click Export to export the details of the nodes to a comma-separated values (CSV) file.
Click the Scaling Activities tab to view the latest scaling events of the node pool. Each event record provides a description of the scaling activity and the number of ECS instances after the scaling activity is performed. You can also view the reasons for scaling failures. For more information about the common error codes for scaling failures, see Manually scale a node pool.
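If you prefer the command line over the console, a node can also be drained with kubectl; a minimal sketch, with the node name as a placeholder:
# Cordon the node so that no new pods are scheduled onto it.
kubectl cordon <NODE_NAME>

# Evict the pods on the node. DaemonSet pods are skipped, and data in emptyDir volumes is deleted,
# which is why the two flags below are required.
kubectl drain <NODE_NAME> --ignore-daemonsets --delete-emptydir-data

# After maintenance, allow scheduling on the node again.
kubectl uncordon <NODE_NAME>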
Delete a node pool
The release rules of an ECS instance vary based on the billing method of the instance. When you remove a node from a node pool, we recommend that you perform the operations described in the following table. Before you delete a node pool, check whether the Expected Nodes parameter is configured for the node pool. This parameter may affect the node release process.
Node pool | Release rule |
Node pool that is configured with the Expected Nodes parameter | |
Node pool that is not configured with the Expected Nodes parameter | |
Optional: Click the name of the node pool that you want to manage. On the Overview tab, you can check whether the Expected Nodes parameter is configured. If a hyphen (-) is displayed, the Expected Nodes parameter is not configured.
Find the node pool that you want to delete and choose > Delete in the Actions column. Read and confirm the information in the dialog box and click OK.
What to do next
After the node pool is created, you can perform the following operations.
Action | Description | References |
Sync Node Pool | If the node information is abnormal, you can synchronize the node pool. | None |
Details | View the details of the node pool. | None |
Scale | Manual and auto scaling are supported. Computing resources are automatically adjusted based on your business requirements and policies to reduce cluster costs. | |
Edit | Modify the configurations of the node pool. For example, you can modify the vSwitches, managed node pool settings, billing method, and instance specifications. You can also enable or disable auto scaling for the node pool. | |
Monitor | View the node monitoring information provided by Managed Service for Prometheus. The monitoring information includes the resource usage of the node pool, such as CPU usage, memory usage, disk usage, and average CPU or memory utilization per node. | |
Logon Mode | Configure the logon type of nodes. You can specify a key pair or password. | |
Configure Managed Node Pool | Automate node O&M for the node pool. After you enable the managed node pool feature, O&M tasks such as node repair, kubelet updates, runtime updates, and OS Common Vulnerabilities and Exposures (CVE) patching are automatically performed. | |
Add Existing Node | Add an existing ECS instance to the cluster as a worker node. You can perform this operation to add a worker node that you have previously removed from the cluster. Specific limits and usage notes apply to this operation. Refer to the ACK documentation for details. | |
Clone | Clone a node pool that contains the expected number of nodes based on the current node pool configurations. | None |
Node Repair | When exceptions occur on a node in a managed node pool, ACK automatically repairs the node. However, you may still need to manually fix some complex node exceptions. For more information about the check items of node status and repair solutions for node exceptions, refer to the relevant topics in the ACK documentation. | |
CVE Patching (OS) | Patch CVE vulnerabilities in the node pool in batches to improve the stability, security, and compliance of the cluster. ACK may need to restart nodes to patch specific vulnerabilities. For more information about CVE patching and the usage notes for CVE patching, refer to the relevant topics in the ACK documentation. | |
Kubelet Configuration | Customize the kubelet parameters for all nodes in the node pool to manage the behavior of the nodes. For example, you can customize the kubelet parameters if you need to modify resource reservations to adjust the resource usage. We recommend that you do not use the CLI to customize kubelet parameters that are unavailable in the ACK console. | |
Containerd Configuration | Customize the containerd parameters for all nodes in the node pool. | |
OS Configuration | Customize the OS parameters of all nodes in the node pool to improve OS performance. We recommend that you do not use the CLI to customize OS parameters that are unavailable in the ACK console. | |
Change Operating System | Change or update the operating system of the node pool. | None |
Kubelet Update | Update the kubelet and containerd for all nodes in the node pool. | |
Delete | Delete the node pool if the node pool is no longer in use to save costs. The release rules of nodes in a node pool vary based on the billing method of the nodes and whether the Expected Nodes parameter is configured for the node pool. |
Comparison of managed node pool configurations
Configuration | Disabled | Managed node pool | Auto mode | |
Node pool | Instance type | Manual configuration | Manual configuration | Configurable. ACK provides intelligent recommendations for instance types. |
Billing method | Manual configuration | Manual configuration | Only pay-as-you-go is supported. | |
OS | Manual configuration | Manual configuration | Only ContainerOS is supported. | |
System disk | Manual configuration | Manual configuration | The recommended configuration (a 20 GiB system disk) is applied automatically. | |
Data disk | Manual configuration | Manual configuration | Configurable. A data disk is used for temporary storage of the ContainerOS operating system. | |
Auto scaling | Optional for manual configuration | Optional for manual configuration | The node instant scaling feature is enabled by default. Manual configuration is supported. | |
Automatic responses to ECS system events | Not supported | Enabled by default | Enabled by default | |
Node auto repair | Not supported | Optional for manual configuration | Enabled by default | |
Automatic upgrade of the kubelet and runtime versions | Not supported | Optional for manual configuration | Enabled by default | |
OS CVE vulnerability auto repair | Not supported | Optional for manual configuration | Enabled by default |
After you enable auto mode for the node pool, it dynamically scales nodes based on your workload requirements, with a default maximum capacity of 50 nodes. You can modify the maximum number of instances by using the scaling feature of the node pool.
After you enable auto mode for the node pool, ACK takes over O&M responsibilities, such as OS upgrades, software upgrades, and vulnerability patching. These responsibilities include tasks like software version upgrades, software configuration modifications, restarts, and drain evictions. Avoid performing manual operations on the ECS nodes within the node pool, such as restarting, mounting data disks, or modifying configurations by logging into the nodes, to prevent conflicts with auto mode policies. We recommend that you set reasonable replica counts for your workloads, implement PreStop graceful shutdown strategies, and establish PodDisruptionBudget policies to ensure that nodes can be drained for maintenance without interrupting your business.
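As an illustration of the last recommendation, the following sketch creates a PodDisruptionBudget and adds a preStop hook for a hypothetical Deployment named web; the names, label selector, and sleep duration are placeholders and not part of any ACK configuration.
# Keep at least two web pods available while nodes are drained for maintenance.
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
EOF

# Add a preStop hook (strategic merge patch) so that the container has a short grace period
# to finish in-flight requests before it is terminated during a drain.
kubectl patch deployment web -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "web",
            "lifecycle": { "preStop": { "exec": { "command": ["sh", "-c", "sleep 15"] } } }
          }
        ]
      }
    }
  }
}'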
After you enable auto mode for the node pool, ACK enhances node security based on the ContainerOS operating system, which uses an immutable root file system. We recommend that you use persistent volume claims (PVCs) for persistent storage instead of node system paths (such as hostPath volumes).
After you enable auto mode for the node pool, ARM, GPU, on-premises disk, and other instance types are not supported. ACK has recommended default instance types that can meet application needs in most scenarios, and you can adjust them in the console based on your actual business requirements. We recommend that you set a sufficient number of instance types to enhance the resilience of the node pool and avoid scaling failures.
The auto mode aims to provide automated and intelligent operation and maintenance functions for Kubernetes clusters. However, in certain scenarios, you still need to fulfill some obligations. For more information, see Shared responsibility model.
FAQ
How do I create a custom image from an ECS instance and use the image to create a node?
After you create an ECS instance, you can customize the instance by performing operations such as installing software and deploying application environments. Then, you can create a custom image from the instance. Instances created from the custom image contain all of the customized items, which eliminates the need to configure these items for each new instance.
Log on to the ECS instance and run the following command to delete the specified files. For more information about how to log on to an ECS instance, see Use Workbench to connect to a Linux instance over SSH.
# Remove the ACK node configuration files and the reconfiguration service.
chattr -i /etc/acknode/nodeconfig-*
rm -rf /etc/acknode
systemctl disable ack-reconfig
rm -rf /etc/systemd/system/ack-reconfig.service
rm -rf /usr/local/bin/reconfig.sh
# Clear the cloud-init state so that new instances are initialized from scratch.
rm -rf /var/lib/cloud
# Stop and remove the kubelet service files left over from the cluster.
systemctl stop kubelet
systemctl disable kubelet
rm -rf /etc/systemd/system/kubelet.service
rm -rf /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Create a custom image from the ECS instance. For more information about the procedure and usage notes, see Create a custom image from an instance.
Create a node pool. When you create the node pool, select Custom Image for the Operating System parameter. Configure other parameters based on the description previously mentioned in this topic.
Create a custom image based on the operating system supported by the ACK cluster. For more information, see OS images.
Do not build custom images from ECS instances that are running in an ACK cluster. To create a custom image from such an instance, you must first remove it from the cluster. For more information, see Remove a node.
The predefined behavior logic in a custom image may affect operations such as cluster node initialization, container launching, node updates, and automatic recovery of nodes in a managed node pool. Before you use it in a production environment, ensure that the custom image has been tested and validated.
References
If a node is no longer in use, you can remove the node. For more information, see Remove a node.
ACK reserves a certain amount of node resources to run Kubernetes components and system processes. For more information, see Resource reservation policy.
When the resource capacity of the cluster cannot meet the requirements for pod scheduling, you can enable node scaling. For more information, see Node scaling.
The maximum number of pods on a worker node varies based on the network plug-in and cannot be adjusted in most cases. To increase the maximum number of pods in a cluster, you can scale out the node pools in the cluster, upgrade the instance specifications used by the cluster, and reset the pod CIDR block. For more information, see Increase the maximum number of pods in a cluster.