Container Service for Kubernetes: ACK release notes 2025

Last Updated: May 20, 2025

This topic describes the release notes for Container Service for Kubernetes (ACK) and provides links to the relevant references.

Background information

  • For more information about the Kubernetes versions supported by ACK, see Supported Kubernetes versions.

  • The following operating systems are supported by ACK: Container OS, Alibaba Cloud Linux 3, Alibaba Cloud Linux 3 Container-optimized, Alibaba Cloud Linux 3 for ARM, Alibaba Cloud Linux (UEFI 3), Red Hat, Ubuntu, and Windows. For more information, see OS images.

March 2025

Product

Feature

Description

Region

References

ACK

Auto mode supported in ACK managed Pro clusters

When creating an ACK managed cluster, you can enable auto mode to rapidly create a Kubernetes cluster that adheres to Alibaba Cloud's best practices.

After the cluster is created, a node pool with auto mode enabled will be automatically created. This node pool provides:

  • Dynamic auto scaling based on workload demands

  • Fully managed operations by ACK, including:

    • Operating system version upgrades

    • Software updates

    • Security vulnerability patches

All regions

Tracing Analysis of cluster control plane and data plane components supported

After Tracing Analysis is enabled for the API server or kubelet of a cluster, trace data is automatically reported to Managed Service for OpenTelemetry, which provides monitoring data such as visualized trace details and real-time topology.

All regions

SMS and email notifications for high-risk kubeconfig files supported

You can send SMS and email notifications to users when kubeconfig files that are associated with the current account have been deleted but still pose risks.

All regions

None

Intelligent routing and traffic management supported based on ACK Gateway with Inference Extension

You can use the ACK Gateway with Inference Extension component to configure inference service extensions and implement intelligent routing and efficient traffic management.

All regions

Implement intelligent routing and traffic management by using ACK Gateway with Inference Extension

Distributed Cloud Container Platform for Kubernetes (ACK One)

Unified management of multi-cluster fleet components supported

ACK One Fleet provides unified and automated component management capabilities for cluster O&M engineers. It allows for the definition of baselines that include multiple components and their versions, which can be deployed across multiple clusters. Additionally, ACK One Fleet supports features such as component configuration, deployment batches, and rollback, thereby enhancing the stability of the system.

All regions

Multi-cluster component management

Dynamic distribution and descheduling supported

ACK One Fleet can use a PropagationPolicy to split the replicas of a workload across subclusters based on the resources available in each subcluster. The descheduling feature of ACK One Fleet is enabled by default and runs an automatic check every two minutes. If a pod remains unschedulable for more than 30 seconds, descheduling is triggered for that replica.

All regions

Dynamic distribution and descheduling
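
The descheduling timing described above can be illustrated with a minimal sketch. It only models the two numbers stated in the release note (a check every two minutes and a 30-second unschedulable threshold). The `PodStatus` type and the `pods_to_deschedule` function are hypothetical names made up for this illustration and are not part of ACK One Fleet.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values taken from the release note.
CHECK_INTERVAL_SECONDS = 120           # automatic check every two minutes
UNSCHEDULABLE_THRESHOLD_SECONDS = 30   # unschedulable for more than 30 seconds


@dataclass
class PodStatus:
    """Hypothetical, simplified view of a pod replica for this sketch."""
    name: str
    # Time (in seconds) at which the pod became unschedulable; None if scheduled.
    unschedulable_since: Optional[float]


def pods_to_deschedule(pods: List[PodStatus], now: float) -> List[str]:
    """Return the names of pods that one periodic check would select."""
    return [
        pod.name
        for pod in pods
        if pod.unschedulable_since is not None
        and now - pod.unschedulable_since > UNSCHEDULABLE_THRESHOLD_SECONDS
    ]


if __name__ == "__main__":
    pods = [
        PodStatus("web-0", None),    # scheduled normally
        PodStatus("web-1", 100.0),   # unschedulable for 20 seconds at t=120
        PodStatus("web-2", 65.0),    # unschedulable for 55 seconds at t=120
    ]
    # Simulate the check that runs at the end of the first interval.
    print(pods_to_deschedule(pods, now=float(CHECK_INTERVAL_SECONDS)))
```

Under these assumptions, a replica can wait up to roughly one check interval plus the threshold before descheduling starts, because the condition is only evaluated when a periodic check runs.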

Cloud-native AI Suite

Slurm queue priority configuration supported

A new topic is released to provide best practices for configuring Slurm queues. The topic describes how to choose queue configuration strategies that keep a Slurm system performing well when jobs are submitted or change state, so that as many tasks as possible are scheduled and processed.

All regions

Configure Slurm queue priority based on ACK clusters

February 2025

Product

Feature

Description

Region

References

ACK

Security group and time zone changeable for the control plane

The security group and time zone of the control plane of an ACK cluster can be changed. If the current settings do not meet your business requirements, you can change them on the Basic Information tab of the cluster details page in the ACK console.

All regions

View cluster information

containerd parameters customizable for node pools

The containerd parameters of a node pool can be customized. For example, you can configure multiple registry mirrors for an image registry or configure the container runtime to skip certificate authentication when pulling container images from an image registry.

All regions

Customize the containerd parameters of a node pool
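
To make the two examples above concrete (multiple registry mirrors and skipping certificate verification), the following sketch renders a registry host file in the `hosts.toml` format that upstream containerd reads from its `certs.d` directory. The helper `render_hosts_toml` and the example URLs are hypothetical. Whether ACK node pool customization produces exactly this file is an assumption; the linked topic describes the parameters that ACK actually exposes.

```python
from typing import List


def render_hosts_toml(server: str, mirrors: List[str], skip_verify: bool = False) -> str:
    """Render a containerd hosts.toml body for one image registry.

    server:      upstream registry that is used if no mirror can serve the image
    mirrors:     mirror endpoints, tried in the listed order
    skip_verify: if True, TLS certificate verification is skipped for each mirror
    """
    lines = ['server = "{}"'.format(server), ""]
    for mirror in mirrors:
        lines.append('[host."{}"]'.format(mirror))
        lines.append('  capabilities = ["pull", "resolve"]')
        if skip_verify:
            lines.append("  skip_verify = true")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder URLs; replace them with your own registry and mirrors.
    print(render_hosts_toml(
        server="https://registry.example.com",
        mirrors=["https://mirror-a.example.com", "https://mirror-b.example.com"],
        skip_verify=True,
    ))
```

Skipping certificate verification weakens transport security, so it is usually reserved for test registries with self-signed certificates.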

Node pool scalability level displayed in the ACK console

The scalability levels of node pools are displayed in the ACK console. When instances are out of stock or specific instance types are not supported in the zones where a node pool is deployed, the node pool may fail to be scaled out. You can evaluate the configuration availability and instance inventory sufficiency of a node pool based on its scalability level. The ACK console provides suggestions for node pools based on their scalability levels.

All regions

Check the scalability of a node pool

Batch job orchestration supported

Batch job orchestration is supported. Argo Workflows is a Kubernetes-native workflow engine that allows you to use YAML or Python to orchestrate concurrent jobs, which simplifies the automation and management of containerized applications. It is suitable for CI/CD pipelines, data processing, and machine learning. To enable batch job orchestration, install the Argo Workflows component. You can then use the Argo CLI or the console to create and manage workflows.

All regions

Enable batch task orchestration
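
As a rough illustration of orchestrating concurrent jobs, the sketch below builds an Argo Workflows manifest in plain Python and prints it as JSON. The function name, job names, and container image are made up for the example. The manifest uses the upstream `argoproj.io/v1alpha1` `Workflow` resource, in which all entries of one inner `steps` list run in parallel.

```python
import json
from typing import Dict, List


def build_parallel_workflow(job_names: List[str], image: str = "alpine:3.19") -> Dict:
    """Build a Workflow manifest that runs one echo container per job, in parallel."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "batch-demo-"},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {
                    "name": "main",
                    # One inner list is one step group; its entries run concurrently.
                    "steps": [[
                        {
                            "name": job,
                            "template": "echo",
                            "arguments": {"parameters": [{"name": "msg", "value": job}]},
                        }
                        for job in job_names
                    ]],
                },
                {
                    "name": "echo",
                    "inputs": {"parameters": [{"name": "msg"}]},
                    "container": {
                        "image": image,
                        "command": ["echo"],
                        "args": ["{{inputs.parameters.msg}}"],
                    },
                },
            ],
        },
    }


if __name__ == "__main__":
    # JSON is also valid YAML, so the output can be saved to a manifest file.
    print(json.dumps(build_parallel_workflow(["job-a", "job-b", "job-c"]), indent=2))
```

Building the manifest as a dictionary keeps the example free of third-party dependencies; dedicated Python SDKs for Argo Workflows exist but are not assumed here.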

GPU diagnostics supported

GPU diagnostics is supported. ack-node-problem-detector is a monitoring component that ACK develops based on the open source node-problem-detector project for node exception detection. This component provides a rich variety of check items and enhanced GPU fault detection. When a GPU fault is detected, a Kubernetes event or Kubernetes node condition is generated to record the fault type and relevant information.

All regions

GPU diagnostics

Distributed Cloud Container Platform for Kubernetes (ACK One)

Spark jobs schedulable and distributable based on idle resources in multiple clusters

Idle resources can be used to schedule and distribute Spark jobs across multiple clusters. You can use an ACK One Fleet instance and the ACK Koordinator component to run a Spark job on the idle resources of the clusters that are associated with the Fleet instance, which improves resource utilization across the clusters. You can configure job priority and the colocation feature to prevent the Spark job from affecting online services.

All regions

Use idle resources to schedule and distribute Spark jobs in multiple clusters

ACK Edge

New pod vSwitches addable

New pod vSwitches can be added for ACK Edge clusters. In ACK Edge cluster scenarios with the Terway Edge plug-in deployed, if you run out of vSwitch IP addresses or need to expand your pod CIDR block, you can add new pod vSwitches to provide additional IP addresses for the cluster.

All regions

Add pod vSwitches for IP capacity expansion

GPU monitoring supported

GPU monitoring is supported. ACK Edge clusters allow you to manage GPU-accelerated nodes in the data center and at the edge. You can manage heterogeneous computing power across multiple regions and environments. You can connect an ACK Edge cluster to Managed Service for Prometheus. This way, GPU-accelerated nodes in the data center and at the edge can be monitored in the same way as nodes in the cloud.

All regions

Best practices for monitoring GPU resources in ACK Edge clusters

Cloud-native AI Suite

Inference services deployable from DeepSeek distilled models in ACK

Inference services can be deployed from DeepSeek distilled models in ACK. You can use KServe to deploy a production-ready inference service from the DeepSeek-R1-Distill-Qwen-7B model in an ACK cluster.

All regions

Deploy an inference service from a DeepSeek distilled model in ACK

Best practices for deploying the DeepSeek full version across multiple nodes in ACK released

A new topic is released to provide best practices for deploying a distributed inference service from the DeepSeek-R1-671B model across multiple nodes in ACK. The topic describes how to use Arena to deploy a distributed inference service on two nodes with a hybrid parallelism strategy. It also describes how to integrate a DeepSeek-R1 service deployed in ACK into the Dify platform to build an enterprise-level intelligent Q&A system that supports long text comprehension.

All regions

Practice for deploying the DeepSeek full version across multiple nodes in ACK

January 2025

Product

Feature

Description

Region

References

ACK

On-demand image loading supported to accelerate container startup in node pools

On-demand image loading is supported to accelerate container startup in node pools. ACK supports on-demand image loading based on the Data Accelerator for Disaggregated Infrastructure (DADI) feature. This allows you to download images on demand and decompress the image data online, which greatly reduces the container startup time.

All regions

Use on-demand image loading to accelerate container startup

Alibaba Cloud Linux 3 Container-optimized supported

Alibaba Cloud Linux 3 Container-optimized is supported. Alibaba Cloud Linux 3.2104 LTS 64-bit Container-optimized images are optimized for container scenarios based on the default standard images for Alibaba Cloud Linux, which is a cloud-native operating system. Alibaba Cloud Linux 3 Container-optimized images are developed by Alibaba Cloud in-house based on the extensive practical experience of a large number of customers on ACK. Alibaba Cloud Linux 3 Container-optimized images are suitable for container scenarios that require higher business deployment density, faster startup speeds, and a higher level of security isolation.

All regions

Kubernetes 1.32 supported

Kubernetes 1.32 is supported. You can create ACK clusters that run Kubernetes 1.32 or upgrade ACK clusters from earlier Kubernetes versions to Kubernetes 1.32.

All regions

Kubernetes 1.32

Resource utilization improvable by using ElasticQuotaTree and ack-kube-queue

Resource utilization can be improved by using ElasticQuotaTree and ack-kube-queue. To allow different teams and jobs to share computing resources in a cluster and ensure effective resource allocation and isolation, you can use ack-kube-queue, ElasticQuotaTree, and ack-scheduler.

All regions

None

Best practices for fine-grained resource management of ACK clusters by using resource groups released

A new topic is released to provide best practices for using resource groups to manage resources in ACK clusters in a fine-grained manner. Resource groups allow you to sort resources into groups by department, project, and environment. You can then use Resource Access Management (RAM) to isolate resources and manage resource permissions in a fine-grained manner within a single Alibaba Cloud account.

All regions

Fine-grained resource management by using resource groups

Distributed Cloud Container Platform for Kubernetes (ACK One)

Computing power of ACS available in ACK One registered clusters

Computing power of Alibaba Cloud Container Compute Service (ACS) is available in ACK One registered clusters.

All regions

Schedule pods to ACS using virtual nodes

Cross-cluster Service access supported by using domain names

Services can be accessed across clusters by using domain names. ACK One provides the multi-cluster Services (MCS) feature to allow you to access Services across Kubernetes clusters by using domain names. This achieves cross-cluster Service traffic routing without the need to modify your business code or modify the dnsConfig field or CoreDNS configurations for your business pods.

All regions

Access Services across clusters by using domain names

Multi-cluster resources accessible by using the SDK for Go

Multi-cluster resources can be accessed by using the SDK for Go. If you want to integrate ACK One Fleet instances into your own platform to access the resources of each cluster, you can use the SDK for Go.

All regions

Access multi-cluster resources by using the SDK for Go

ACK Edge

ECS nodes supported for scaling activities in ACK Edge clusters

ECS nodes can be used for scaling activities in ACK Edge clusters. When the resources provided by on-premises machines in an ACK Edge cluster become insufficient, you can use the node auto scaling feature to automatically add ECS nodes to the cluster to supplement resource capacity for scheduling.

All regions

ECS node elasticity

Elastic inference services deployable from LLMs in hybrid cloud environments

Large language models (LLMs) can be used to deploy elastic inference services in hybrid cloud environments. You can use the ack-kserve component and the auto scaling feature of ACK Edge clusters to deploy LLMs as elastic inference services in hybrid cloud environments. This helps you flexibly schedule resources in the cloud and resources in data centers and reduce the operational costs of inference services deployed by using LLMs.

All regions

GPU sharing supported

GPU sharing is supported. GPU sharing allows you to schedule multiple pods to the same GPU to share the computing resources of the GPU. This improves GPU utilization and reduces costs.

  • The cloud nodes of an ACK Edge cluster support GPU sharing, GPU memory isolation, and computing power isolation.

  • The edge node pools of the ACK Edge cluster support only GPU sharing. The GPU memory isolation and computing power isolation features are not supported.

All regions

Use GPU sharing
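
The following sketch shows what a pod that requests only part of a GPU might look like. It assumes that GPU memory is requested through an extended resource named `aliyun.com/gpu-mem`, counted in GiB. That resource name and unit, the helper function, and the image name are assumptions for illustration; check the linked topic for the exact resource names that your cluster supports.

```python
import json
from typing import Dict

# Assumed extended resource name for shared GPU memory (unit assumed to be GiB).
GPU_MEM_RESOURCE = "aliyun.com/gpu-mem"


def build_shared_gpu_pod(name: str, image: str, gpu_mem_gib: int) -> Dict:
    """Build a pod manifest that requests a slice of GPU memory, not a whole GPU."""
    if gpu_mem_gib <= 0:
        raise ValueError("gpu_mem_gib must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    # Extended resources are declared under limits.
                    "resources": {"limits": {GPU_MEM_RESOURCE: gpu_mem_gib}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Two such pods could be scheduled to the same GPU if its memory allows it.
    pod = build_shared_gpu_pod("infer-a", "registry.example.com/infer:latest", 4)
    print(json.dumps(pod, indent=2))
```

On edge node pools, where the release note states that memory and computing power isolation are not supported, a request like this would affect scheduling only and would not enforce the limit at run time.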

Centralized management of ECS resources in multiple regions supported

A new topic is released to provide best practices for using an ACK Edge cluster to centrally manage compute resources that reside in multiple regions. This topic helps you implement full lifecycle management and efficient resource scheduling for cloud-native applications.

All regions

Centrally manage ECS resources in multiple regions

References

To view the historical release notes for ACK, see Historical release notes (before 2025).