Container Service for Kubernetes: ack-ai-installer: Introduction and change log

Last Updated: Mar 26, 2026

ack-ai-installer is a collection of device plugins that enhances GPU scheduling capabilities in ACK Managed Cluster Pro Edition and ACK Edge Cluster Pro Edition. It works with ACK Scheduler, a unified scheduling system built on the Kubernetes scheduling framework, to support shared GPU scheduling and GPU topology-aware scheduling.

Component overview

ack-ai-installer includes the following sub-components. Each works with ACK Scheduler to extend GPU scheduling beyond the default exclusive GPU scheduling available in ACK Managed Cluster Pro Edition and ACK Edge Cluster Pro Edition.

gpushare-device-plugin

gpushare-device-plugin works with ACK Scheduler to enable shared GPU scheduling with sharing isolation. Multiple applications or processes share a single GPU card, improving resource utilization across the cluster.
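
As a rough illustration of what a shared GPU request can look like, the following Python sketch builds a Pod manifest that asks for a slice of GPU memory rather than a whole card. The extended resource name `aliyun.com/gpu-mem` (in GiB) is an assumption taken from ACK shared GPU scheduling conventions and is not stated on this page. The pod name and image are hypothetical placeholders.

```python
import json


def shared_gpu_pod(name: str, image: str, gpu_mem_gib: int) -> dict:
    """Build a Pod manifest that requests GPU memory instead of a whole GPU."""
    if gpu_mem_gib <= 0:
        raise ValueError("gpu_mem_gib must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        # Assumed extended resource name for shared GPU
                        # memory, expressed as a whole number of GiB.
                        "limits": {"aliyun.com/gpu-mem": str(gpu_mem_gib)},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image, used only for illustration.
    manifest = shared_gpu_pod(
        "gpu-share-demo", "registry.example.com/demo/cuda-app:latest", 4
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`. Check the resource name against the shared GPU scheduling documentation before relying on it.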

cgpu-installer

cgpu-installer builds on shared GPU scheduling by integrating with cGPU, Alibaba Cloud's GPU container sharing technology. This adds:

  • GPU memory isolation: Different applications or processes are isolated from each other in GPU memory, preventing task interference.

  • GPU computing power isolation: Fine-grained allocation policies — average, preemption, and weight — control how computing power is distributed across containers.

For installation methods and scenarios, see Manage the shared GPU scheduling component and Allocate computing power using shared GPU scheduling.

gputopo-device-plugin

gputopo-device-plugin enables GPU topology-aware scheduling. It selects the GPU combination on a node that provides the optimal training speed.

For installation steps and scenarios, see GPU topology-aware scheduling.
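
To make the idea of choosing a GPU combination concrete, here is a toy Python sketch. It is not the algorithm used by gputopo-device-plugin. It assumes a hypothetical pairwise bandwidth matrix and does a brute-force search for the set of free GPUs whose slowest pairwise link is fastest.

```python
from itertools import combinations


def best_gpu_combination(bandwidth, free, count):
    """Return `count` GPUs from `free` whose slowest pairwise link is fastest.

    `bandwidth[a][b]` is a hypothetical link speed between GPU a and GPU b.
    """
    if not 1 <= count <= len(free):
        raise ValueError("count must be between 1 and the number of free GPUs")

    def slowest_link(gpus):
        # A single GPU has no links, so treat it as unconstrained.
        return min(
            (bandwidth[a][b] for a, b in combinations(gpus, 2)),
            default=float("inf"),
        )

    # Brute force over all candidate sets; fine for the handful of GPUs on a node.
    return max(combinations(sorted(free), count), key=slowest_link)


# Hypothetical 4-GPU node: pairs (0, 1) and (2, 3) have fast links,
# every other pair goes over a slower path.
BANDWIDTH = [
    [0, 50, 10, 10],
    [50, 0, 10, 10],
    [10, 10, 0, 50],
    [10, 10, 50, 0],
]

if __name__ == "__main__":
    print(best_gpu_combination(BANDWIDTH, free=[1, 2, 3], count=2))
```

With GPU 0 already in use, the sketch picks GPUs 2 and 3 because they are the only free pair joined by a fast link. A real scheduler would also weigh factors that this toy ignores.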

Usage notes

  • You can install ack-ai-installer only in ACK Managed Cluster Pro Edition and ACK Edge Cluster Pro Edition. Install it from the Cloud-native AI Suite page in the console.

  • ack-ai-installer is pre-installed in ACK Lingjun managed clusters.

  • For ack-ai-installer versions earlier than 1.12.0, cluster versions 1.18.8 and later are supported.

  • For ack-ai-installer versions 1.12.0 and later, only cluster versions 1.20 and later are supported.
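
The version rules above can be expressed as a small check. This is a minimal sketch: the function names are hypothetical, and the handling of a leading `v` or a vendor suffix after `-` in the cluster version is an assumption about how version strings are written.

```python
def parse_version(text: str) -> tuple:
    """Turn a string such as 'v1.20.11-aliyun.1' into (1, 20, 11)."""
    core = text.lstrip("v").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def min_cluster_version(installer_version: str) -> tuple:
    """Minimum cluster version for a given ack-ai-installer version."""
    if parse_version(installer_version) >= (1, 12, 0):
        return (1, 20)
    return (1, 18, 8)


def is_supported(installer_version: str, cluster_version: str) -> bool:
    """Check a component version against a cluster version."""
    required = min_cluster_version(installer_version)
    return parse_version(cluster_version) >= required


if __name__ == "__main__":
    for installer, cluster in [("1.13.1", "1.20.0"), ("1.12.0", "1.19.16")]:
        print(installer, cluster, is_supported(installer, cluster))
```

Tuple comparison keeps the logic short: `(1, 20, 0) >= (1, 20)` is true, while `(1, 19, 16) >= (1, 20)` is false.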

Change log

March 2026

Version

Changes

Change Time

Impact

1.13.1

  • cGPU:

    • Supports Ubuntu 24.04 with 6.x kernels.

    • Supports multi-card NVSwitch scenarios.

    • Supports scheduling cGPU containers using MPS Daemon.

    • Fixes an issue where GPU memory usage is displayed incorrectly for multi-card containers.

    • Supports the ecs.gn8ga and ecs.ebmgn8ga instance types.

  • gpushare-device-plugin:

    • Supports reporting of node NUMA topology.

    • Fixes an issue where the specified GPU memory value is inaccurate in MPS scenarios.

    • Adapts the working directory to /var/run/nvidia-gpu/nvidia-mps for MPS scenarios.

    • Fixes an issue where the device plugin restarts due to a liveness probe timeout in MPS scenarios.

March 16, 2026

This upgrade does not affect existing services.

October 2025

Version

Changes

Change Time

Impact

1.13.0

  • gpushare-device-plugin:

    • Supports querying pending pods from the kubelet to reduce load on the API server.

October 29, 2025

This upgrade does not affect existing services.

August 2025

Version

Changes

Change Time

Impact

1.12.8

cGPU 1.5.20 update:

  • Fixes a rare cGPU instance ID conflict issue that occurs during concurrent pod startups.

August 04, 2025

This upgrade does not affect existing services.

July 2025

Version

Changes

Change Time

Impact

1.12.7

  • cGPU is updated to version 1.5.19.

  • gpushare-device-plugin: Fixes an issue where retries fail if an NVML call fails at startup.

July 17, 2025

This upgrade does not affect existing services.

1.12.6

cGPU 1.5.19 update:

  • Supports Alibaba Cloud Linux 3 container-optimized OS images.

  • Supports modifying computing power allocation using time slices (policy 5).

  • Fixes an issue where multi-card pods fail to be created in cgroup v2.

  • ebmgn9t instances support computing power allocation (policies 0 to 4).

July 16, 2025

This upgrade does not affect existing services.

June 2025

Version

Changes

Change Time

Impact

1.12.5

  • cGPU is updated to version 1.5.18.

  • Fixes an issue where the first GPU pod fails to start on a cGPU node in some scenarios.

June 23, 2025

This upgrade does not affect existing services.

1.12.4

  • cGPU is updated to version 1.5.17 and supports vLLM 0.6.6 and earlier.

  • cgpu-installer can be installed on CentOS 7 and Alibaba Cloud Linux 2.

June 19, 2025

This upgrade does not affect existing services.

May 2025

Version

Changes

Change Time

Impact

1.12.3

  • cGPU is updated to version 1.5.16.

  • Adds a retry feature to cgpu-installer.

May 14, 2025

This upgrade does not affect existing services.

March 2025

Version

Changes

Change Time

Impact

1.12.2

  • cGPU is updated to version 1.5.15.

  • Adds node affinity to cgpu-installer to prevent it from being scheduled to Lingjun nodes.

March 17, 2025

This upgrade does not affect existing services.

February 2025

Version

Changes

Change Time

Impact

1.12.1

  • cGPU is updated to version 1.5.15.

  • Adds a health check feature for node resources to gpushare-device-plugin.

February 18, 2025

This upgrade does not affect existing services.

January 2025

Version

Changes

Change Time

Impact

1.12.0

  • Releases cGPU 1.5.15, which supports containerized installation of cGPU.

  • Reduces the privileged permissions of the cgpu-installer container.

  • Adds a precheck before cGPU installation. If the precheck fails, a `CGPUInstallFailed` Kubernetes event is reported.

  • Starting from this version, the ack-ai-installer component supports only cluster versions 1.20 and later.

January 03, 2025

This upgrade does not affect existing services.

November 2024

Version

Changes

Change Time

Impact

1.11.1

Releases cGPU 1.5.13. Fixes a rare kernel crash issue that may be caused by residual container processes.

November 19, 2024

This upgrade does not affect existing services.

1.10.1

Releases cGPU 1.5.12. Fixes an issue where GPU memory isolation fails for some CUDA APIs on new driver versions such as 535.

November 07, 2024

This upgrade does not affect existing services.

September 2024

Version

Changes

Change Time

Impact

1.9.16

  • cGPU is updated to version 1.5.11.

  • Moves the cGPU installation process to an init container.

September 26, 2024

This upgrade does not affect existing services.

1.9.15

Releases cGPU 1.5.11. Fixes decoding-related issues.

September 19, 2024

This upgrade does not affect existing services.

August 2024

Version

Changes

Change Time

Impact

1.9.14

  • Fixes some issues related to the use of MPS Daemon.

  • Releases cGPU 1.5.10. Adds policy 6 to proportionally divide computing power and GPU memory.

August 21, 2024

This upgrade does not affect existing services.

1.9.14

Releases cGPU 1.5.9. Adds policy 6 to proportionally divide computing power and GPU memory.

August 13, 2024

This upgrade does not affect existing services.

May 2024

Version

Changes

Change Time

Impact

1.9.11

Releases cGPU 1.5.7. Supports L-series GPUs and GPU drivers of version 550 and later.

May 14, 2024

This upgrade does not affect existing services.

1.9.10

Releases cGPU 1.5.7. Fixes an issue where the cgpu policy set command is invalid.

May 09, 2024

This upgrade does not affect existing services.

January 2024

Version

Changes

Change Time

Impact

1.8.8

Releases cGPU 1.5.6. Introduces a new cGPU License Server policy.

January 04, 2024

This upgrade does not affect existing services.

December 2023

Version

Changes

Change Time

Impact

1.8.7

  • cGPU is updated to version 1.5.5.

  • Supports shared GPU scheduling for MPS.

December 20, 2023

This upgrade does not affect existing services.

November 2023

Version

Changes

Change Time

Impact

1.8.5

Releases cGPU 1.5.5. Fixes a kernel panic triggered by cgpu-procfs.

November 23, 2023

This upgrade does not affect existing services.

August 2023

Version

Changes

Change Time

Impact

1.8.2

  • cGPU is updated to version 1.5.3.

  • Supports dynamic multi-instance GPU (MIG) partitioning.

  • Fixes an issue where device-plugin-recover repeatedly restarts.

August 29, 2023

This upgrade does not affect existing services.

July 2023

Version

Changes

Change Time

Impact

1.7.7

  • Releases cGPU 1.5.3.

  • Fixes an issue with incorrect symbolic links for nvidia-container-toolkit and nvidia-container-runtime-hook.

  • Fixes an incompatibility issue with driver versions 470.182.03, 515.105.01, 525.105.17, and later.

July 04, 2023

This upgrade does not affect existing services.

April 2023

Version

Changes

Change Time

Impact

1.7.6

  • Releases cGPU 1.5.2. Fixes an issue with incorrect systemd cgroup permissions.

  • Adds support for driver versions later than 5XX in cGPU.

  • Adds support for nvidia-container-runtime 1.10 and later in cGPU.

  • Fixes an issue with cGPU 1.5.1 support on containerd.

April 26, 2023

This upgrade does not affect existing services.

1.7.5

Releases cGPU 1.5.2.

April 18, 2023

This upgrade does not affect existing services.