A node pool is a group of nodes that share the same configurations: instance type, operating system, labels, and taints. A cluster can have multiple node pools with different configurations. Creating or modifying one node pool does not affect nodes or workloads in other node pools.
Before creating a node pool, read Node pools to understand the available types, features, and billing rules.
Prerequisites
Before you begin, make sure you have:
An ACK cluster
Access to the ACK console
Create a node pool
You can create a node pool from the ACK console, via the API, or with Terraform. The console is the most common starting point; for API and Terraform, see CreateClusterNodePool and Use Terraform to create a node pool that has auto scaling enabled.
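If you prefer to start from the command line instead of the console, the CreateClusterNodePool operation can also be called through the Alibaba Cloud CLI. The sketch below is a minimal example under stated assumptions: the request-body field names are assumed to match the CreateClusterNodePool reference, and the node pool name, vSwitch ID, instance type, and cluster ID are placeholders. Verify the body against the API documentation before use.

```bash
# Minimal request body (placeholders; field names assumed to match CreateClusterNodePool).
cat > nodepool.json <<'EOF'
{
  "nodepool_info": { "name": "example-pool" },
  "scaling_group": {
    "vswitch_ids": ["vsw-REPLACE_ME"],
    "instance_types": ["ecs.g7.xlarge"],
    "system_disk_category": "cloud_essd",
    "system_disk_size": 120,
    "desired_size": 2
  }
}
EOF

# ROA-style call through the Alibaba Cloud CLI; <cluster_id> is a placeholder.
aliyun cs POST /clusters/<cluster_id>/nodepools \
  --header "Content-Type=application/json" \
  --body "$(cat nodepool.json)"
```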
Some parameters — particularly those related to network and security — cannot be changed after creation. Review the Modifiable column in each parameter table before proceeding.
In the parameter tables below, the Modifiable column indicates whether the parameter can be changed after the node pool is created.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. In the left-side navigation pane, choose Nodes > Node Pools.
On the Node Pools page, click Create Node Pool. In the dialog box, configure the parameters described in the sections below.
To generate Terraform or SDK sample code that matches your configuration, click Generate API Request Parameters in the top-left corner. Then click Confirm. After confirmation, the node pool list shows:
Initializing — creation in progress
Active — creation successful
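Once the node pool is Active, you can confirm from the command line that its nodes joined the cluster. This sketch assumes kubectl is configured for the cluster and that ACK labels nodes with the node pool ID under the alibabacloud.com/nodepool-id key; check the labels on an actual node first if in doubt.

```bash
# List the nodes that belong to a specific node pool (label key assumed).
kubectl get nodes -l alibabacloud.com/nodepool-id=<node-pool-id>

# Inspect one node to confirm its labels and taints match the node pool configuration.
kubectl describe node <node-name> | grep -A 5 -E "Labels|Taints"
```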
Basic configurations
| Parameter | Description | Modifiable |
|---|---|---|
| Node Pool Name | A name for the node pool. | |
| Confidential computing | Encrypts data in use to protect confidentiality and integrity. Requires whitelist approval — submit a ticket to apply. Available only with the containerd runtime. For details, see TEE-based confidential computing. | |
| Container runtime | The runtime for containers in the node pool. containerd is recommended for all Kubernetes versions. Sandboxed-Container supports Kubernetes 1.31 and earlier. Docker is deprecated and supports Kubernetes 1.22 and earlier. For a comparison, see Comparison among Docker, containerd, and Sandboxed-Container. | |
| Scaling mode | Manual: ACK maintains the node count at the Expected Nodes value. Auto: ACK scales nodes automatically when pod scheduling capacity is insufficient, based on configured minimum and maximum instance counts. Clusters running Kubernetes 1.24 and later use node instant scaling by default; earlier versions use node auto scaling. | |
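After nodes are created, you can confirm which runtime the node pool applied. A minimal check, assuming kubectl is configured for the cluster:

```bash
# The CONTAINER-RUNTIME column shows the runtime and version on each node,
# for example containerd://1.6.x for containerd-based node pools.
kubectl get nodes -o wide
```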
Automated O&M configurations
Select one of the following options to set the level of automated O&M for the node pool. For a full comparison, see Comparison of managed node pool configurations.
Auto mode: ACK takes full O&M responsibility — OS upgrades, software upgrades, vulnerability patching, and dynamic scaling based on workload demand. Available only for clusters with auto mode enabled.
Managed node pool: Configure the following automated O&M parameters. You can also set a maintenance window for scheduled tasks.
Disable: No automated O&M. All node maintenance must be performed manually.
Network configurations
| Parameter | Description | Modifiable |
|---|---|---|
| VPC | The virtual private cloud (VPC) of the cluster. Cannot be changed. | |
| vSwitch | New nodes are created in the zones of the selected vSwitches during scale-out. Select vSwitches in your target zones. If none are available, click Create vSwitch. For details, see Create and manage vSwitches. | |
Instance and image
| Parameter | Description | Modifiable |
|---|---|---|
| Billing method | Pay-As-You-Go, Subscription, or Preemptible Instance. Subscription requires a Duration and optionally Auto Renewal. Preemptible instances have a 1-hour protection period; afterward, the system checks spot price and availability every 5 minutes and releases the instance if the market price exceeds your bid or inventory is insufficient. ACK does not allow switching between pay-as-you-go/subscription and preemptible instances. Billing method changes apply only to newly added nodes. For preemptible instance best practices, see Best practices for preemptible instance-based node pools. | |
| Instance type | Select ECS instances for the node pool, filtering by vCPU, memory, instance family, and architecture. Select multiple instance types to improve scale-out reliability. For GPU-accelerated instances, you can enable GPU sharing. For unsupported specifications, see ECS specification recommendations for ACK clusters. | |
| Operating system | Public image: ACK-provided images including Alibaba Cloud Linux 3 ACK-optimized, ContainerOS, Alibaba Cloud Linux 3, Ubuntu, and Windows. Custom image: An image you create. For details, see OS images. OS changes apply only to newly added nodes. | |
| Security hardening | Disable: no hardening. MLPS Security Hardening: aligns with Multi-Level Protection Scheme (MLPS) 2.0 level-3 standards for Alibaba Cloud Linux 2/3 images; SSH root login is blocked — use Virtual Network Computing (VNC) to log in. OS Security Hardening: available for Alibaba Cloud Linux 2/3 images. Cannot be changed after creation. | |
| Logon type | Key Pair, Password, or Later. MLPS Security Hardening supports Password only. ContainerOS supports Key Pair and Later only. Password must be 8–30 characters, containing letters, digits, and special characters. | |
| Username | Select root or ecs-user when using Key Pair or Password logon (see the connection sketch after this table). | |
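If you selected Key Pair logon, you can connect to a node over SSH with the private key and the username chosen above. The key file name and node IP below are placeholders; for MLPS-hardened nodes, use VNC instead because SSH root logon is blocked.

```bash
# Restrict the key file permissions, then connect as the configured user (placeholders).
chmod 400 ~/.ssh/example-key.pem
ssh -i ~/.ssh/example-key.pem ecs-user@<node-ip>
```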
Storage configurations
| Parameter | Description | Modifiable |
|---|---|---|
| System disk | Supported types: ESSD AutoPL, Enterprise SSD (ESSD), ESSD Entry, Standard SSD, Ultra Disk. Available types depend on the instance family. For ESSD, you can set a custom performance level (PL): PL 2 requires more than 460 GiB; PL 3 requires more than 1,260 GiB. Encryption is available for Enterprise SSD (ESSD) using the default service CMK or a BYOK key from KMS. To improve creation success rate, add fallback disk types under More System Disk Types. | |
| Data disk | Supported types: ESSD AutoPL, Enterprise SSD (ESSD), ESSD Entry, SSD, Ultra Disk. Up to 64 data disks per ECS instance (varies by instance type; query via DescribeInstanceTypes → DiskQuantity). Mount the data disk to /var/lib/container; ACK mounts /var/lib/kubelet and /var/lib/containerd under /var/lib/container (see the verification sketch after this table). You can also use snapshots to create data disks for container image acceleration or fast LLM loading. | |
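If you mount a data disk to /var/lib/container, you can verify the resulting layout on a node. A minimal check, assuming you can log on to the node:

```bash
# Confirm that the data disk is mounted at /var/lib/container.
findmnt /var/lib/container

# kubelet and containerd data should live under the same disk.
findmnt /var/lib/kubelet
findmnt /var/lib/containerd

# Check remaining capacity on the container data disk.
df -h /var/lib/container
```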
Instance quantity
| Parameter | Description | Modifiable |
|---|---|---|
| Expected number of nodes | The target node count. Set to at least 2 for cluster components to run as expected. Set to 0 to create an empty node pool and add nodes later. | |
Advanced configurations
Click Advanced Options (Optional) to configure the following parameters.
| Parameter | Description | Modifiable |
|---|---|---|
| Resource group | The resource group to which the node pool belongs. Each resource can belong to only one resource group. | |
| Scaling mode (advanced) | Requires Auto scaling mode. Standard mode: scales by creating and releasing ECS instances. Swift mode: scales by creating, stopping, and starting ECS instances — stopped nodes incur disk fees only (no compute fees). Swift mode does not apply to local disk instance families such as big data and local SSD. | |
| Scaling policy | Priority: scales using vSwitches in the order listed (highest priority first). Cost Optimization: creates instances in ascending vCPU unit price order; preemptible instances are preferred. Distribution Balancing: distributes instances evenly across zones (requires multiple vSwitches). | |
| Use pay-as-you-go instances when preemptible instances are insufficient | Requires Preemptible Instance billing. When preemptible instances are reclaimed, the node pool creates pay-as-you-go instances as replacements. | |
| Enable supplemental preemptible instances | Requires Preemptible Instance billing. When preemptible instances are reclaimed, the node pool attempts to create replacement preemptible instances. | |
| ECS tags | Tags added to ECS instances during auto scaling. ACK and Auto Scaling automatically add 3 tags (ack.aliyun.com:<Cluster ID>, ack.alibabacloud.com/nodepool-id:<Node pool ID>, acs:autoscaling:scalingGroupId:<Scaling group ID>), leaving room for at most 17 custom tags per instance. | |
| Taints | Taints control pod scheduling. Set taints at the node pool level rather than on individual nodes — this way, you manage all nodes by updating the node pool once instead of updating each node individually. A taint has a key, value, and effect. Key: 1–63 characters (letters, digits, -, _, .); must start and end with a letter or digit. Value: up to 63 characters; same character set; can be left blank. Effect: NoSchedule (prevents scheduling), NoExecute (evicts existing pods without toleration), or PreferNoSchedule (prefers to avoid scheduling). | |
| Node labels | Labels are key-value pairs. Set labels at the node pool level rather than on individual nodes — this simplifies management by letting you update all nodes through a single node pool change. Key: 1–63 characters; same rules as taint keys. The following prefixes are reserved and cannot be used: kubernetes.io/, k8s.io/, and any prefix ending in these. Usable exceptions: kubelet.kubernetes.io/ and node.kubernetes.io/. For a verification example, see the sketch after this table. | |
| Container image acceleration | Nodes automatically detect whether images support on-demand loading and accelerate container startup accordingly. Requires containerd version 1.6.34 or later. | |
| (Deprecated) CPU policy | The CPU management policy for kubelet. None (default) or Static (enhanced CPU affinity for specific pods). Instead of using this field, customize kubelet parameters directly — see Customize the kubelet parameters of a node pool. | |
| Custom node name | Changes the node name, ECS instance name, and ECS instance hostname. A custom node name consists of a prefix (required), IP substring, and suffix (optional). Length: 2–64 characters; must start and end with a lowercase letter or digit. | |
| Worker RAM role | Assigns a Resource Access Management (RAM) role to the node pool. Use Custom to assign a dedicated role and reduce the risk of sharing one RAM role across all cluster nodes. Requires ACK managed clusters running Kubernetes 1.22 or later. For details, see Use custom worker RAM roles. | |
| Pre-defined custom data | Scripts that run before nodes join the cluster. Requires whitelist approval in the Quota Center console. For details, see User-data scripts. | |
| User data | Scripts that run after nodes join the cluster. To check execution status, log on to a node and run grep cloud-init /var/log/messages. For details, see User-data scripts. | |
| CloudMonitor agent | Installs the CloudMonitor agent on new nodes for monitoring in the CloudMonitor console. Applies to newly added nodes only. | |
| Public IP | Assigns a public IPv4 address to each new node. If enabled, configure Bandwidth Billing Method and Peak Bandwidth. Applies to newly added nodes only. To enable internet access for existing nodes, associate an EIP — see Associate an EIP with an ECS instance. | |
| Custom security group | Select Basic Security Group or Advanced Security Group. The type cannot be changed after creation. Each ECS instance supports up to 5 security groups. If you select an existing security group, configure security group rules manually — see Configure security group rules to enforce access control on ACK clusters. | |
| RDS whitelist | Adds node IP addresses to the whitelist of an ApsaraDB RDS instance. | |
| Deployment set | Distributes ECS instances across different physical servers for high availability. Create the deployment set in the ECS console first, then select it here. The maximum node count per pool is 20 × number of zones (zones = number of vSwitches). Cannot be changed after creation. For details, see Best practices for associating deployment sets with node pools. | |
| Private pool type | Controls whether to use an ECS capacity reservation. Open: automatically matches an open private pool; falls back to the public pool if no match. Do Not Use: uses only the public pool. Specified: uses the specified private pool; fails if unavailable. For details, see Private pools. | |
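The Taints and Node labels parameters above apply to every node in the pool. The sketch below shows how to confirm them on a node and how a pod can target such a pool; the taint (dedicated=example:NoSchedule) and label (pool=example) are hypothetical values, not defaults, and kubectl is assumed to be configured for the cluster.

```bash
# Show the taints and labels that the node pool applied to a node.
kubectl get node <node-name> -o jsonpath='{.spec.taints}{"\n"}'
kubectl get node <node-name> --show-labels

# Example pod that tolerates the hypothetical taint and selects the hypothetical label.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nodepool-demo
spec:
  nodeSelector:
    pool: example
  tolerations:
  - key: dedicated
    operator: Equal
    value: example
    effect: NoSchedule
  containers:
  - name: app
    image: nginx
EOF
```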
Modify a node pool
After a node pool is created, you can edit it on the Node Pools page by clicking Edit in the Actions column.
Key behaviors to know before making changes:
Most configuration changes apply only to newly added nodes. Exceptions: ECS tags, labels, and taints also propagate to existing nodes.
Modifying a node pool does not affect other node pools or their workloads.
When you modify node pool configurations, any changes that you made directly to individual nodes may be overwritten.
Changing Scaling mode:
Manual → Auto: enables auto scaling; configure the minimum and maximum instance counts.
Auto → Manual: disables auto scaling; minimum is set to 0, maximum to 2,000; Expected Nodes is set to the current node count.
During modification, the Status column shows Updating. After completion, it shows Active.
View a node pool
Click the name of a node pool to open its details page, with the following tabs:
Overview: cluster info, node pool configuration, and node settings. If auto scaling is enabled, auto scaling configurations are also shown.
Monitor: node resource metrics from Managed Service for Prometheus, including CPU usage, memory usage, disk usage, and average per-node utilization. For a command-line alternative, see the sketch after this list.
Nodes: the node list. Drain nodes, configure scheduling, perform O&M, or remove nodes. Click Export to download node details as a CSV file.
Scaling activities: scaling event history, instance counts after each event, and failure reasons. For common error codes, see Manually scale a node pool.
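Besides the Monitor tab, you can spot-check node resource usage from the command line. This assumes metrics are available in the cluster (metrics-server) and, for the filtered form, the node pool label key mentioned earlier.

```bash
# Per-node CPU and memory usage.
kubectl top nodes

# Usage for the nodes of one node pool only (label key assumed).
kubectl top nodes -l alibabacloud.com/nodepool-id=<node-pool-id>
```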
Delete a node pool
Node release behavior depends on the billing method and whether Expected Nodes is configured.
| Expected Nodes setting | Pay-as-you-go nodes | Subscription nodes |
|---|---|---|
| Expected Nodes configured | Released when the pool is deleted; all nodes removed from the API server | Retained after deletion; removed from the API server |
| Expected Nodes not configured | Nodes that were not manually added are released; released nodes are removed from the API server | Not released; not removed from the API server |
To release a subscription node: change its billing method to pay-as-you-go first (see Change the billing method from subscription to pay-as-you-go), then release it from the ECS console.
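Before you delete a node pool, you can optionally confirm from the command line that its nodes no longer run important workloads, and drain them if needed. A minimal sketch, assuming kubectl access and the node pool label key mentioned earlier:

```bash
# List the nodes in the pool, then check which pods still run on one of them.
kubectl get nodes -l alibabacloud.com/nodepool-id=<node-pool-id>
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>

# Optionally drain a node so pods are rescheduled gracefully before deletion.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```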
(Optional) Click the node pool name. On the Overview tab, check whether Expected Nodes is configured. A hyphen (–) indicates it is not.
Find the node pool and choose Delete in the Actions column. Confirm the information and click OK.
What's next
After creating a node pool, the most common follow-up tasks are:
Scale the node pool: adjust capacity manually or configure auto scaling. See Manually scale a node pool and Node scaling.
Monitor node health: view CPU, memory, and disk metrics under the Monitor tab.
Enable managed O&M: configure auto repair, auto update, and CVE patching. See Enable auto repair for nodes.
Remove a node: when a node is no longer needed, see Remove a node.
For additional operations — including cloning a node pool, customizing kubelet or containerd configurations, and changing the OS — refer to the ACK documentation.
Comparison of managed node pool configurations
| Feature | Disabled | Managed node pool | Auto mode |
|---|---|---|---|
| Instance type | Manual | Manual | Manual; ACK provides intelligent recommendations |
| Billing method | Manual | Manual | Pay-as-you-go only |
| OS | Manual | Manual | ContainerOS only |
| System disk | Manual | Manual | 20 GiB (auto-applied) |
| Data disk | Manual | Manual | Configurable (temporary ContainerOS storage) |
| Auto scaling | Optional | Optional | Node instant scaling enabled by default |
| Automated O&M | Responds to ECS system events | Not supported | Enabled by default |
| Node auto repair | Not supported | Optional | Enabled by default |
| Automatic kubelet and runtime upgrade | Not supported | Optional | Enabled by default |
| OS CVE auto repair | Not supported | Optional | Enabled by default |
Auto mode has specific operational constraints:
Default maximum capacity is 50 nodes. Increase the limit using the node pool scaling feature.
ACK manages OS upgrades, software upgrades, vulnerability patching, restarts, and drain evictions. Avoid manual operations on ECS nodes in the pool, such as restarting them, mounting data disks, or modifying configurations by logging in, to prevent conflicts. Set appropriate workload replica counts, PreStop graceful shutdown strategies, and PodDisruptionBudget policies to protect your workloads during node maintenance (a minimal PodDisruptionBudget sketch follows this list).
ContainerOS uses an immutable root file system. Use PVC for persistent storage instead of HostPath.
ARM, GPU, and local disk instance types are not supported. Configure enough instance types to improve scaling resilience.
For shared responsibilities in auto mode, see Shared responsibility model.
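The workload-protection guidance above (replica counts, PreStop hooks, PodDisruptionBudget policies) can be expressed with a minimal PodDisruptionBudget. The application name and minAvailable value below are illustrative:

```bash
# Keep at least 2 replicas of the example app available while nodes are drained
# during automated maintenance.
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: example-app
EOF
```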
FAQ
How do I create a custom image and use it for a node pool?
Create a custom image from an ECS instance after installing the software and dependencies you need. Instances created from the image inherit all customizations.
Before creating the image:
Base the image on an ACK-supported OS. See OS images.
Do not use a running ECS instance in an ACK cluster as the source. Remove it from the cluster first — see Remove a node.
Custom image behavior may interfere with node initialization, container startup, node updates, and auto repair. Test the image in a non-production environment before deploying it.
Log on to the ECS instance and run the following commands to clear ACK configuration files. For how to connect, see Use Workbench to connect to a Linux instance over SSH.
```bash
chattr -i /etc/acknode/nodeconfig-*
rm -rf /etc/acknode
systemctl disable ack-reconfig
rm -rf /etc/systemd/system/ack-reconfig.service
rm -rf /usr/local/bin/reconfig.sh
rm -rf /var/lib/cloud
systemctl stop kubelet
systemctl disable kubelet
rm -rf /etc/systemd/system/kubelet.service
rm -rf /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
```
Create a custom image from the instance. For the procedure and usage notes, see Create a custom image from an instance.
Create a node pool. For Operating System, select Custom Image and choose the image you created. Configure all other parameters as described in this topic.
References
Node resources are reserved for Kubernetes components and system processes. See Resource reservation policy.
When cluster capacity cannot meet pod scheduling requirements, enable node scaling. See Node scaling.
To increase the maximum number of pods, scale out node pools, upgrade instance specifications, or reset the pod CIDR block. See Increase the maximum number of pods in a cluster.
If a node is no longer needed, see Remove a node.