
Container Service for Kubernetes: ACK node initialization process

Last Updated: Dec 09, 2025

Container Service for Kubernetes (ACK) provides stable, efficient, and predictable node management. When you create a new node or scale out an existing node pool, ACK follows a standard initialization process. This process installs and configures software based on the node pool's configuration and adds the node to a Kubernetes cluster.

Usage notes

This process applies to Elastic Compute Service (ECS) and Elastic GPU Service (EGS) node pools in ACK managed clusters and ACK dedicated clusters running Kubernetes version 1.20 or later.

Process overview

(Figure: overview of the ACK node initialization process)

Step 1: Create a node

After receiving a request to scale out or automatically add a node, ACK creates an ECS or EGS instance based on the node pool configuration, such as instance type, image, and disks. The OS then performs basic initialization, such as configuring the network and mounting the system disk.

Next, the Cloud-init tool runs the instance's User Data. This combined script consists of three parts, executed in order: the Pre-defined Custom Data, the ACK initialization script, and the custom User Data script.
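The three-part layout can be pictured as a single script assembled in order. The following is a hedged sketch only; the bodies are placeholders, not the actual ACK commands:

```shell
#!/bin/bash
# Hedged sketch of how cloud-init sees the combined User Data.
# The assignments below are placeholders that record the order of execution.

run_order=""

# 1) Pre-defined Custom Data (Step 2): dependencies, agents, skip files
run_order="$run_order pre-defined"

# 2) ACK initialization script (Step 3): environment, add-ons, registration
run_order="$run_order ack-init"

# 3) User Data (Step 4): application-level initialization
run_order="$run_order user-data"

echo "execution order:$run_order"
```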

Step 2: Run the pre-customization script

Cloud-init first runs the Pre-defined Custom Data. This script typically installs specific system dependency packages or monitoring agents, and performs preliminary environment checks or configurations.

Step 3: Run the ACK initialization script

This stage includes multiple steps and provides a flexible skip mechanism for different use cases.

3.1 Prepare the basic environment

This step configures the basic environment required for Kubernetes on the node.

Execution flow

  • Start the chronyd service: This ensures that the node's time is synchronized with a Network Time Protocol (NTP) server.

  • Set the Node ID and hostname.

    • Node ID: The system calculates and sets a node ID that identifies the node in the Kubernetes cluster.

    • Hostname: If a Custom Node Name is configured for the node pool, this name is also set as the node's hostname.

  • Data disk initialization: If a data disk is configured for the node pool, by default, the system finds the last data disk in lexicographic order (NVMe disks are prioritized), formats it, and mounts it. This disk is used to store the container runtime environment, such as the containerd working directory at /var/lib/containerd.
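
The documented selection rule can be sketched as follows. This is an illustrative sketch, not ACK's actual script; the disk inventory is a made-up example, and the destructive formatting and mounting steps are left as comments:

```shell
# Illustrative sketch of the documented rule: prefer NVMe disks, then take
# the last candidate in lexicographic order. The inventory below is a
# made-up example; a real script would query lsblk or /sys/block.
disks="vdb vdc nvme1n1"

nvme=$(printf '%s\n' $disks | grep '^nvme' | sort | tail -n 1)
other=$(printf '%s\n' $disks | grep -v '^nvme' | sort | tail -n 1)
target="${nvme:-$other}"

echo "would format /dev/$target and mount it for /var/lib/containerd"
# Destructive steps (root required), shown only as comments:
#   mkfs.ext4 /dev/$target
#   mount /dev/$target /var/lib/containerd
```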

Skip mechanism

  • To manually manage data disk partitions and file systems, or to use an existing data disk that already contains historical data, create the file /var/.skip-auto-fdisk in the pre-defined custom data to skip automatic formatting.

  • If you do not skip automatic formatting but want to keep and migrate data from the original runtime directory (such as the image cache), create the file /var/.keep-container-data. After mounting the data disk, the system copies the existing data to the new directory. This prevents the system from pulling the images again.
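
Both markers are plain files. They must exist before the ACK initialization script runs, so they are typically created in the Pre-defined Custom Data, for example:

```shell
# Create the marker files described above from the Pre-defined Custom Data.
# /var/.skip-auto-fdisk     -> skip automatic data disk formatting
# /var/.keep-container-data -> migrate existing runtime data after mounting
touch /var/.skip-auto-fdisk
# touch /var/.keep-container-data   # use this instead to keep and migrate data
```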

3.2 Install Kubernetes add-ons

This step installs the core add-ons required to run Kubernetes.

3.3 Install add-ons as needed

ACK installs optional add-ons based on the node pool configuration.

  • Install heterogeneous hardware drivers and device plugins: If the node is a heterogeneous computing instance, such as a GPU or NPU instance, the system automatically installs the corresponding drivers and Kubernetes device plugins, such as the NVIDIA Device Plugin. This allows the cluster to detect and schedule these resources.

  • Install image acceleration software: If features such as Use on-demand loading of container images to accelerate container startup are enabled, the system installs the relevant acceleration software and modifies the container runtime configuration. This improves container startup speed.

  • Install SGX-related software: If TEE-based confidential computing is enabled, the system installs the Software Guard Extensions (SGX) driver and related dependencies. This provides a trusted execution environment for running confidential containers.

3.4 Register the cluster

After the core add-ons are installed, kubelet starts and registers with the cluster's API server. Then, the node is officially added to the cluster. Its initial status is marked as NotReady.

3.5 Install additional basic software

To ensure node stability and security, ACK installs additional basic software packages and performs security updates by default.

Execution flow

  • Install basic toolkits: Installs common tools such as pigz, container-selinux, and zlib that support advanced features such as storage and networking.

  • Security updates (Alibaba Cloud Linux only):

    • Upgrade systemd as needed: The systemd version may be upgraded to ensure stability and functionality.

    • Perform minimal security vulnerability fixes: Automatically fixes Common Vulnerabilities and Exposures (CVEs) by running yum update-minimal --exclude kernel* --security -y.

Skip mechanism

  • Create the /var/.skip-yum file to completely skip this step, including CVE fixes. This is useful for offline environments or use cases that require strict versioning.

  • Create the /var/.skip-security-fix file to skip only the CVE vulnerability fixes. The basic toolkits are still installed.
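
As with the data disk markers, these files are typically created in the Pre-defined Custom Data so that they exist before this step runs:

```shell
# Marker files described above, created from the Pre-defined Custom Data:
touch /var/.skip-yum             # skip this whole step, including CVE fixes
# touch /var/.skip-security-fix  # or: skip only CVE fixes, keep the toolkits
```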

3.6 Apply node pool OS configurations

After all software is installed and updated, the system re-applies the node pool's OS configurations. This ensures that all settings are effective in the final environment. For more information, see Manage OS parameters for a node pool.

3.7 Enable security hardening as needed

Based on the Security Hardening feature configured for the node pool (such as OS Security Hardening), the system runs the corresponding scripts. These scripts harden kernel parameters and restrict system permissions to meet enterprise security standards.

Step 4: Run the custom script

At this stage, the custom script (provided as User Data) is executed. This script is commonly used to perform application-level initialization tasks, such as creating application data directories or starting non-containerized helper services.

Parallel execution flow

Status change (from NotReady to Ready)

After a node is added to a cluster and is in the NotReady state, its kubelet continuously communicates with the control plane. When the key add-ons on the node are ready and the node passes all health checks, its status automatically changes to Ready. The Kubernetes scheduler then considers the node an available resource and begins to schedule pods on it.

Parallel mechanism explained

The node status change and the execution of the User Data are parallel tasks.

  • Non-blocking: The execution of the User Data is independent of the node's transition to the Ready state. This means the script's execution result (success, failure, or duration) does not prevent the node from becoming Ready.

  • Execution timing: The two processes run in parallel. You cannot assume that application pods are running on the node when the User Data script finishes. Conversely, you cannot assume that the script has finished when pods start running on the node.

  • Log tracking: To check the execution status of the custom script, log on to the node and run the cat /var/log/cloud-init-output.log command to view the execution log and final status.
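
A quick check from the node might look like the following. The log path is the one named above; the existence check is added because the file is only present on instances provisioned through cloud-init:

```shell
# View the tail of the cloud-init output log on the node. The log may not
# exist on machines that were not provisioned by cloud-init.
log=/var/log/cloud-init-output.log
if [ -f "$log" ]; then
  tail -n 50 "$log"
else
  echo "no cloud-init log at $log"
fi
```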