
Container Service for Kubernetes: ACK node initialization process

Last Updated: Dec 09, 2025

Container Service for Kubernetes (ACK) provides stable, efficient, and predictable node management. When you create a new node or scale out an existing node pool, ACK follows a standard initialization process. This process installs and configures software based on the node pool's configuration and adds the node to a Kubernetes cluster.

Usage notes

This process applies to Elastic Compute Service (ECS) and Elastic GPU Service (EGS) node pools in ACK managed clusters and ACK dedicated clusters running Kubernetes version 1.20 or later.

Process overview

(Figure: overview of the ACK node initialization process)

Step 1: Create a node

After receiving a request to scale out or automatically add a node, ACK creates an ECS or EGS instance based on the node pool configuration, such as instance type, image, and disks. The OS then performs basic initialization, such as configuring the network and mounting the system disk.

Next, the Cloud-init tool runs the instance's User Data. This combined script consists of three parts, executed in order: the Pre-defined Custom Data, the ACK initialization script, and the custom User Data script.
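The three-part layout can be pictured as a single script assembled in order. The following is a hedged sketch only; the bodies are placeholders, not the actual ACK commands:

```shell
#!/bin/bash
# Hedged sketch of how cloud-init sees the combined User Data.
# The assignments below are placeholders that record the order of execution.

run_order=""

# 1) Pre-defined Custom Data (Step 2): dependencies, agents, skip files
run_order="$run_order pre-defined"

# 2) ACK initialization script (Step 3): environment, add-ons, registration
run_order="$run_order ack-init"

# 3) User Data (Step 4): application-level initialization
run_order="$run_order user-data"

echo "execution order:$run_order"
```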

Step 2: Run the pre-customization script

Cloud-init first runs the Pre-defined Custom Data. This script typically installs specific system dependency packages or monitoring agents, and performs preliminary environment checks or configurations.

Step 3: Run the ACK initialization script

This stage includes multiple steps and provides a flexible skip mechanism for different use cases.

3.1 Prepare the basic environment

This step configures the basic environment required for Kubernetes on the node.

Execution flow

  • Start the chronyd service: This ensures that the node's time is synchronized with a Network Time Protocol (NTP) server.

  • Set the Node ID and hostname.

    • Node ID: The system calculates and sets a node ID that identifies the node in the Kubernetes cluster.

    • Hostname: If a Custom Node Name is configured for the node pool, this name is also set as the node's hostname.

  • Data disk initialization: If a data disk is configured for the node pool, by default, the system finds the last data disk in lexicographic order (NVMe disks are prioritized), formats it, and mounts it. This disk is used to store the container runtime environment, such as the containerd working directory at /var/lib/containerd.
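
The documented selection rule can be sketched as follows. This is an illustrative sketch, not ACK's actual script; the disk inventory is a made-up example, and the destructive formatting and mounting steps are left as comments:

```shell
# Illustrative sketch of the documented rule: prefer NVMe disks, then take
# the last candidate in lexicographic order. The inventory below is a
# made-up example; a real script would query lsblk or /sys/block.
disks="vdb vdc nvme1n1"

nvme=$(printf '%s\n' $disks | grep '^nvme' | sort | tail -n 1)
other=$(printf '%s\n' $disks | grep -v '^nvme' | sort | tail -n 1)
target="${nvme:-$other}"

echo "would format /dev/$target and mount it for /var/lib/containerd"
# Destructive steps (root required), shown only as comments:
#   mkfs.ext4 /dev/$target
#   mount /dev/$target /var/lib/containerd
```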

Skip mechanism

  • To manually manage data disk partitions and file systems, or to use an existing data disk that already contains historical data, create the file /var/.skip-auto-fdisk in the pre-defined custom data to skip automatic formatting.

  • If you do not skip automatic formatting but want to keep and migrate data from the original runtime directory (such as the image cache), create the file /var/.keep-container-data. After mounting the data disk, the system copies the existing data to the new directory. This prevents the system from pulling the images again.
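
Both markers are plain files. They must exist before the ACK initialization script runs, so they are typically created in the Pre-defined Custom Data, for example:

```shell
# Create the marker files described above from the Pre-defined Custom Data.
# /var/.skip-auto-fdisk     -> skip automatic data disk formatting
# /var/.keep-container-data -> migrate existing runtime data after mounting
touch /var/.skip-auto-fdisk
# touch /var/.keep-container-data   # use this instead to keep and migrate data
```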

3.2 Install Kubernetes add-ons

This step installs the core add-ons required to run Kubernetes.

3.3 Install add-ons as needed

ACK installs optional add-ons based on the node pool configuration.

  • Install heterogeneous hardware drivers and device plugins: If the node is a heterogeneous computing instance, such as a GPU or NPU instance, the system automatically installs the corresponding drivers and Kubernetes device plugins, such as the NVIDIA Device Plugin. This allows the cluster to detect and schedule these resources.

  • Install image acceleration software: If features such as Use on-demand loading of container images to accelerate container startup are enabled, the system installs the relevant acceleration software and modifies the container runtime configuration. This improves container startup speed.

  • Install SGX-related software: If TEE-based confidential computing is enabled, the system installs the Software Guard Extensions (SGX) driver and related dependencies. This provides a trusted execution environment for running confidential containers.

3.4 Register the cluster

After the core add-ons are installed, kubelet starts and registers with the cluster's API server. Then, the node is officially added to the cluster. Its initial status is marked as NotReady.

3.5 Install additional basic software

To ensure node stability and security, ACK installs additional basic software packages and performs security updates by default.

Execution flow

  • Install basic toolkits: Installs common tools such as pigz, container-selinux, and zlib that support advanced features such as storage and networking.

  • Security updates (Alibaba Cloud Linux only):

    • Upgrade systemd as needed: The systemd version may be upgraded to ensure stability and functionality.

    • Perform minimal security vulnerability fixes: Automatically fixes Common Vulnerabilities and Exposures (CVEs) by running yum update-minimal --exclude kernel* --security -y.

Skip mechanism

  • Create the /var/.skip-yum file to completely skip this step, including CVE fixes. This is useful for offline environments or use cases that require strict versioning.

  • Create the /var/.skip-security-fix file to skip only the CVE vulnerability fixes. The basic toolkits are still installed.
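
As with the data disk markers, these files are typically created in the Pre-defined Custom Data so that they exist before this step runs:

```shell
# Marker files described above, created from the Pre-defined Custom Data:
touch /var/.skip-yum             # skip this whole step, including CVE fixes
# touch /var/.skip-security-fix  # or: skip only CVE fixes, keep the toolkits
```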

3.6 Apply node pool OS configurations

After all software is installed and updated, the system re-applies the node pool's OS configurations. This ensures that all settings are effective in the final environment. For more information, see Manage OS parameters for a node pool.

3.7 Enable security hardening as needed

Based on the Security Hardening feature configured for the node pool (such as OS Security Hardening), the system runs the corresponding scripts. These scripts harden kernel parameters and restrict system permissions to meet enterprise security standards.

Step 4: Run the custom script

At this stage, the custom script (provided as User Data) is executed. This script is commonly used to perform application-level initialization tasks, such as creating application data directories or starting non-containerized helper services.

Parallel execution flow

Status change (from NotReady to Ready)

After a node is added to a cluster and is in the NotReady state, its kubelet continuously communicates with the control plane. When the key add-ons on the node are ready and the node passes all health checks, its status automatically changes to Ready. The Kubernetes scheduler then considers the node an available resource and begins to schedule pods on it.

Parallel mechanism explained

The node status change and the execution of the User Data are parallel tasks.

  • Non-blocking: The execution of the User Data is independent of the node's transition to the Ready state. This means the script's execution result (success, failure, or duration) does not prevent the node from becoming Ready.

  • Execution timing: The two processes run in parallel. You cannot assume that application pods are running on the node when the User Data script finishes. Conversely, you cannot assume that the script has finished when pods start running on the node.

  • Log tracking: To check the execution status of the custom script, log on to the node and run the cat /var/log/cloud-init-output.log command to view the execution log and final status.
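
A quick check from the node might look like the following. The log path is the one named above; the existence check is added because the file is only present on instances provisioned through cloud-init:

```shell
# View the tail of the cloud-init output log on the node. The log may not
# exist on machines that were not provisioned by cloud-init.
log=/var/log/cloud-init-output.log
if [ -f "$log" ]; then
  tail -n 50 "$log"
else
  echo "no cloud-init log at $log"
fi
```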