
Container Service for Kubernetes: ACK release notes for 2025

Last Updated: Nov 08, 2025

This topic describes the latest release notes for Container Service for Kubernetes (ACK).

Background information

  • For information about the Kubernetes versions that Container Service for Kubernetes (ACK) supports, see Version guide.

  • Container Service for Kubernetes (ACK) supports operating systems such as ContainerOS, Alibaba Cloud Linux 3 Container Optimized Edition, Alibaba Cloud Linux 3, Alibaba Cloud Linux 3 for Arm, Alibaba Cloud Linux UEFI 3, Red Hat, Ubuntu, and Windows. For more information, see Operating systems.

October 2025

Product | Feature | Description | Region | References

Support for scheduling GPUs using DRA

In AI training and inference scenarios where multiple applications need to share GPU resources, you can deploy the NVIDIA Dynamic Resource Allocation (DRA) driver in your ACK cluster to overcome the scheduling limitations of traditional device plug-ins. The Kubernetes DRA API allows for dynamic GPU allocation and fine-grained resource control among pods, which improves GPU utilization and reduces costs.

All

Schedule GPUs using DRA
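As a rough illustration of the DRA workflow, the sketch below shows a pod that requests a GPU through a ResourceClaimTemplate instead of the traditional `nvidia.com/gpu` extended resource. The `resource.k8s.io` API version varies by Kubernetes release, and the `gpu.nvidia.com` DeviceClass name assumes the NVIDIA DRA driver is installed; verify both against your cluster and the linked topic.

```yaml
# Hedged sketch: requesting one GPU per pod through DRA.
apiVersion: resource.k8s.io/v1beta1   # use the version served by your cluster
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com   # DeviceClass created by the NVIDIA DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-gpu-demo
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      claims:
      - name: gpu            # references the resourceClaims entry below
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
```

Because the claim is created from a template, each pod replica gets its own GPU allocation, which is what enables the fine-grained sharing described above.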

Distributed Cloud Container Platform for Kubernetes (ACK One)

Registered clusters support ACS GPU-HPN capacity reservation

By registering an on-premises Kubernetes cluster with the cloud and using the GPU High-Performance Network (GPU-HPN) capacity reservation mechanism, you can uniformly manage and intelligently schedule on-premises and cloud GPU resources. This provides stable, high-performance computing for key workloads such as AI training and inference.

All

Example of using ACS GPU HPN computing power in an ACK One registered cluster

Support for collecting metrics of control plane components using a self-managed Prometheus

For hybrid cloud environments that use a self-managed Prometheus monitoring system, you can install the Metrics Aggregator component and configure a ServiceMonitor to centrally manage the health status of the control plane of an ACK One registered cluster. This integrates core component metrics into your existing monitoring system for unified alerting and observability.

All

Collect metrics of control plane components using a self-managed Prometheus

Cloud Native AI Suite

Support for submitting PyTorch distributed training jobs accelerated by eRDMA using Arena

In multi-node GPU training, if network communication latency degrades overall performance, you can use Arena to submit PyTorch distributed jobs and configure elastic Remote Direct Memory Access (eRDMA) network acceleration. This shortens the model training cycle by enabling low-latency, high-throughput communication between nodes, which improves training efficiency and cluster utilization.

All

Submit PyTorch distributed training jobs accelerated by eRDMA using Arena
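A minimal Arena submission for a multi-node PyTorch job might look like the following. The `arena submit pytorchjob` subcommand and the `--name`, `--workers`, `--gpus`, and `--image` flags are standard Arena options; the image and training command are hypothetical, and the exact mechanism for enabling eRDMA depends on your Arena version, so check the linked topic.

```shell
# Hedged sketch: a 2-node, 8-GPU-per-node PyTorch job submitted with Arena.
# How eRDMA acceleration is enabled (flag or annotation) is version-dependent;
# see the linked topic for the exact option.
arena submit pytorchjob \
  --name=bert-erdma \
  --workers=2 \
  --gpus=8 \
  --image=registry.example.com/ai/pytorch-train:v1 \
  "torchrun --nnodes=2 --nproc-per-node=8 train.py"
```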

September 2025

Product | Feature | Description | Region | References

Container Service for Kubernetes

Support for Kubernetes 1.34

ACK now supports Kubernetes 1.34. You can create clusters that run Kubernetes 1.34 or upgrade existing clusters to Kubernetes 1.34.

All

Kubernetes 1.34

Support for hybrid cloud node pools

To manage on-premises server resources in an ACK cluster, you can create a hybrid cloud node pool in an ACK Pro cluster to achieve elastic scheduling and cost optimization for both cloud and on-premises resources. Add your existing hybrid cloud nodes to the cluster to leverage your current IT assets while maintaining unified orchestration.

All

Create and manage hybrid cloud node pools

Support for configuring DNS resolution for hybrid cloud node pools

If a hybrid cloud node pool uses CoreDNS on the cloud for domain name resolution, frequent access can increase the load on the leased line and may cause resolution failures due to an unstable connection. You can configure NodeLocal DNSCache to mitigate these issues.

All

Configure NodeLocal DNSCache for a hybrid cloud node pool
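The usual opt-in pattern for NodeLocal DNSCache in ACK is label-based injection at the namespace level; the label name below follows the ACK node-local-dns component convention and should be verified against the linked topic.

```yaml
# Hedged sketch: opting a namespace in to NodeLocal DNSCache DNSConfig
# injection so pods resolve names through the local cache first.
apiVersion: v1
kind: Namespace
metadata:
  name: hybrid-apps
  labels:
    node-local-dns-injection: "enabled"   # assumed injection label; confirm in the linked topic
```

New pods in the namespace then query the node-local cache, which reduces cross-leased-line traffic to CoreDNS on the cloud.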

Support for the Terway Hybrid network plug-in

When a hybrid cloud node pool is connected to an on-premises data center, its complex network topology and cross-domain routing requirements exceed the capabilities of regular container network plug-ins. The Terway Hybrid network plug-in is designed for hybrid cloud node pools and ensures network connectivity between pods in the cluster, whether they are in the data center or on the cloud.

All

Use the Terway Hybrid network plug-in

ossfs 2.0 supports RRSA authentication

For applications that require persistent storage or data sharing among multiple pods, you can mount an OSS bucket as an ossfs 2.0 volume using a dynamically provisioned PV. We recommend using RAM Roles for Service Accounts (RRSA) for authentication. RRSA provides a higher level of security with auto-rotated temporary credentials and supports pod-level permission isolation, making it suitable for production, multitenancy, and other high-security environments.

All

Use dynamically provisioned ossfs 2.0 volumes
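A StorageClass for dynamically provisioned ossfs 2.0 volumes with RRSA might be sketched as follows. The provisioner name is the OSS CSI plugin's, but the parameter keys (`fuseType`, `authType`, `roleName`) and their values are assumptions modeled on the OSS CSI conventions, and the bucket, endpoint, and role names are hypothetical; confirm the exact parameters in the linked topic.

```yaml
# Hedged sketch: StorageClass for ossfs 2.0 volumes authenticated via RRSA.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: oss-ossfs2-rrsa
provisioner: ossplugin.csi.alibabacloud.com
parameters:
  bucket: my-bucket                                   # hypothetical bucket
  url: oss-cn-hangzhou-internal.aliyuncs.com          # hypothetical internal endpoint
  fuseType: ossfs2                                    # assumed key selecting the ossfs 2.0 client
  authType: rrsa                                      # assumed key selecting RRSA authentication
  roleName: my-rrsa-role                              # hypothetical RAM role bound to the service account
reclaimPolicy: Retain
volumeBindingMode: Immediate
```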

Distributed Cloud Container Platform for Kubernetes (ACK One)

Support for accessing cloud GPU computing power

ACK One registered clusters support unified scheduling and O&M for various heterogeneous computing resources. This significantly improves the resource utilization of Kubernetes clusters that use heterogeneous computing.

All

Access cloud GPU computing power

Support for migrating single-cluster applications to a fleet and distributing them to multiple clusters

To resolve issues such as repetitive operations, errors, and synchronization difficulties in multi-cluster application deployments, you can use the AMC command-line interface (CLI) to quickly deploy applications to multiple clusters. This also enables unified management and automatic synchronization of subsequent updates.

All

Migrate a single-cluster application to a fleet and distribute it to multiple clusters

August 2025

Product | Feature | Description | Region | References

Container Service for Kubernetes

Support for KV Cache-aware load balancing using smart inference routing

KV Cache-aware load balancing is designed for generative AI inference scenarios. It dynamically allocates requests to the optimal compute nodes to significantly improve the efficiency of large language model (LLM) services.

All

Use prefix cache-aware routing in precision mode

Support for custom CNI plug-ins

The default Terway and Flannel Container Network Interface (CNI) plug-ins provided by ACK meet most container network requirements. However, in some scenarios, to use specific features of other CNI plug-ins, ACK lets you install a custom CNI plug-in in your cluster using the Bring Your Own CNI (BYOCNI) mode.

All

Use a custom CNI plug-in in an ACK cluster

Intelligent hosting mode clusters support the managed policy governance component

To meet cluster compliance requirements and enhance cluster security, enable the security policy management feature. Security policy rules include Infra, Compliance, Pod Security Policy (PSP), and K8s-general.

All

Enable security policy management

Knative supports ACS computing power

Knative services can be configured to use Container Compute Service (ACS) computing power. The diverse computing types and quality of ACS help meet the workload demands of different business scenarios and optimize costs.

All

Use ACS resources

Gateway with Inference Extension supports more flexible configurations

  • Support for custom inference extension configurations: You can adjust routing policies by configuring annotations or modify and overwrite the extension's deployment configuration by creating a ConfigMap.

  • Support for custom Gateway configurations: You can adjust the actual Gateway parameters, such as the service type, number of deployment replicas, and resources, by modifying the EnvoyProxy resource configuration.

All
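For the custom Gateway configuration described above, the Envoy Gateway `EnvoyProxy` resource is the standard extension point. The sketch below adjusts the replica count and service type; attach it to your GatewayClass or Gateway through a `parametersRef` as described in the Envoy Gateway documentation.

```yaml
# Hedged sketch: tuning the data-plane deployment via an EnvoyProxy resource.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 3              # number of data-plane replicas
      envoyService:
        type: LoadBalancer       # service type exposing the gateway
```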

Support for securely deploying vLLM inference services in ACK confidential computing clusters for heterogeneous computing

Large language model (LLM) inference involves sensitive data and core model assets, which are at risk of leakage when run in untrusted environments. The ACK Confidential AI (ACK-CAI) solution integrates hardware-based confidential computing technologies such as Intel Trust Domain Extensions (TDX) and GPU Trusted Execution Environments (TEE) to provide end-to-end security for model inference.

All

Securely deploy vLLM inference services in an ACK confidential computing cluster for heterogeneous computing

Cloud Native AI Suite

AI Inference Suite is launched

With the widespread use of large language models (LLMs), deploying and managing them efficiently, stably, and at scale in production environments has become a core challenge for enterprises. The Cloud Native AI Inference Suite (AI Serving Stack), built on Alibaba Cloud Container Service for Kubernetes, is an end-to-end solution designed for cloud-native AI inference. The suite covers the full lifecycle of LLM inference, providing integrated capabilities for deployment management, smart routing, elastic scaling, and deep observability. Whether you are just starting out or already run large-scale AI operations, the suite can handle complex cloud-native AI inference scenarios.

All

AI Inference Suite

July 2025

Product | Feature | Description | Region | References

Container Service for Kubernetes

Support for accessing ECS instance metadata only in enforced mode

You can retrieve ECS metadata, such as instance IDs, VPC information, and network interface card information, from within an ECS instance using the Instance Metadata Service (IMDS). In an ACK cluster, the default access mode for node instance metadata is compatible with both normal mode and enforced mode. You can switch to the enforced mode only (IMDSv2) to further enhance the security of the IMDS.

All

Access ECS instance metadata only in enforced mode
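In enforced mode, every metadata request must carry a session token. On Alibaba Cloud ECS, the metadata endpoint is `100.100.100.200` and the token headers are the `X-aliyun-ecs-metadata-token` family, as sketched below; run this on a node, not from your workstation.

```shell
# Token-based (enforced mode) metadata access on an ECS node.
# Step 1: obtain a session token valid for up to 21600 seconds.
TOKEN=$(curl -s -X PUT "http://100.100.100.200/latest/api/token" \
  -H "X-aliyun-ecs-metadata-token-ttl-seconds: 21600")

# Step 2: pass the token with every metadata request.
curl -s -H "X-aliyun-ecs-metadata-token: $TOKEN" \
  http://100.100.100.200/latest/meta-data/instance-id
```

Unauthenticated (normal mode) requests without the token header are rejected once enforced mode is the only allowed mode.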

Support for subscribing to images from overseas sources

To periodically synchronize images from overseas image repositories such as Docker Hub, Google Container Registry (GCR), and Quay to an Enterprise Edition instance, you can use the artifact subscription feature of the Enterprise Edition instance.

All

Obtain images from overseas sources through artifact subscription

Support for mounting NAS file systems using the EFC client through CNFS

Extreme File Client (EFC) provides capabilities such as distributed caching to improve the access performance of File Storage NAS. It also supports high-concurrency and parallel access to large datasets, making it suitable for data-intensive and high-concurrency containerized application scenarios, such as big data analytics, AI training, and AI inference. Compared with mounting NAS using the default NFS protocol, mounting NAS using EFC can accelerate file access and improve read and write performance.

All

Mount a NAS file system using the EFC client through CNFS

Distributed Cloud Container Platform for Kubernetes (ACK One)

Support for a console-based GitOps experience

You can use the console to manage the full range of GitOps capabilities. This includes enabling or disabling features, enabling public network access and configuring access control lists (ACLs), using the ApplicationSet UI, configuring Argo CD ConfigMaps and restarting components, and using monitoring and logging observability features.

All

Quick Start for GitOps

Multi-cluster GitOps supports Argo CD ConfigMap configuration

ACK One lets you manage GitOps-related features and permissions by configuring the Argo CD ConfigMap.

All

Configure an Argo CD ConfigMap

Support for enabling inventory-aware elastic scheduling for multi-cluster fleets

In multi-region application deployments, ACK One multi-cluster fleets use an inventory-aware smart scheduler to manage resource allocation. This scheduler works with instant elasticity. If a fleet's clusters have insufficient resources, application services are scheduled to clusters that have available inventory. The instant elasticity feature then scales out the required nodes in those clusters to accommodate the services. This approach improves scheduling success rates and reduces resource costs.

All

Enable inventory-aware elastic scheduling for a multi-cluster fleet

Container Service for Edge (ACK Edge)

Support for configuring PrivateLink for leased line access

ACK Edge clusters support network access through leased lines. This allows edge nodes in an ACK Edge cluster to securely and efficiently access Alibaba Cloud services such as ACK and Container Registry (ACR), resolving issues such as network conflicts and the lack of fixed IP addresses.

All

Configure PrivateLink for leased line access

June 2025

Product | Feature | Description | Region | References

Container Service for Kubernetes

Use AI Profiling in the console

AI Profiling is a non-intrusive performance analysis tool based on extended Berkeley Packet Filter (eBPF) and dynamic process injection. It is natively designed for Kubernetes container scenarios and supports online detection of container processes that run GPU jobs. It provides comprehensive data collection capabilities and lets you dynamically start and stop performance data collection on running GPU jobs. For online services, this dynamically attachable and detachable profiling tool enables real-time, detailed analysis without modifying the service code.

All

AI Profiling

GPU node auto-healing

The node auto-healing feature now supports auto-healing for instance failures caused by GPU software and hardware anomalies.

ACK provides Kubernetes-side auto-healing for instance failures on underlying Elastic GPU Service (EGS) nodes and Lingjun nodes that are caused by GPU software and hardware anomalies. It offers automated O&M capabilities for the entire process, from fault detection, alerting, and automatic isolation to draining a node and automatic repair. It also supports executing repairs only after user authorization, which further enhances automated fault O&M capabilities and reduces cluster O&M costs.

All

Enable node auto-healing

Statically provisioned volumes for CPFS for Lingjun

CPFS for Lingjun delivers ultra-high throughput and input/output operations per second (IOPS) and supports end-to-end RDMA network acceleration. It is suitable for intelligent computing scenarios such as AIGC and autonomous driving. You can create statically provisioned volumes for CPFS for Lingjun in your cluster and use them in workloads.

All

Use CPFS for Lingjun with statically provisioned volumes

ACK VPD CNI component

The ACK VPD CNI component provides container network management for Lingjun nodes in ACK Pro clusters. As a CNI plug-in for Lingjun nodes, ACK VPD CNI allocates and manages container network resources for Lingjun nodes that use Lingjun Connect.

All

ACK VPD CNI

ack-kms-agent-webhook-injector component

The ack-kms-agent-webhook-injector injects the Key Management Service (KMS) Agent as a sidecar container into pods. This allows application pods to use a local HTTP interface to retrieve credentials from a KMS instance through the KMS Agent and cache them in memory. This avoids hard coding sensitive information and enhances data security.

All

Import Alibaba Cloud KMS credentials for an application

Expanded capabilities for the Gateway with Inference Extension component

Gateway with Inference Extension supports multiple generative AI inference service frameworks, such as vLLM and SGLang. It provides enhanced capabilities for generative AI inference services that are deployed based on different frameworks. These capabilities include support for creating phased release policies, inference load balancing, and model name-based routing. You can also configure rate limiting and circuit breaking policies for inference services.

All

Overview of Gateway with Inference Extension

Implement the CAA confidential container solution based on confidential VMs

In scenarios that require confidential computing, such as finance risk control and healthcare, you can deploy confidential computing workloads in an ACK cluster using the Cloud API Adaptor (CAA) solution. This solution uses Intel® TDX technology to protect sensitive data from external attacks or potential threats from cloud providers, helping you meet industry compliance requirements.

All

Implement the CAA confidential container solution based on confidential VMs

Cloud Native AI Suite

Schedule Dify workflows using XXL-JOB

In many scenarios, such as risk monitoring, data analytics, content generation, and data synchronization, Dify workflows rely on scheduled execution to automate jobs. However, Dify does not natively support job scheduling. This best practice describes how to integrate XXL-JOB, a distributed job scheduler, to schedule workflow applications, monitor their status, and ensure their stable operation.

All

Schedule Dify workflow applications using XXL-JOB

May 2025

Product | Feature | Description | Region | References

Container Service for Kubernetes

Support for Kubernetes 1.33

Support for Kubernetes 1.33 is available. You can create a cluster that runs Kubernetes 1.33 or upgrade an existing cluster to Kubernetes 1.33.

All

Kubernetes 1.33

The ack-ram-authenticator component is installed by default

Starting from Kubernetes 1.33, the latest version of the ack-ram-authenticator managed component is installed by default on newly created ACK managed clusters. This does not consume your cluster node resources.

All

[Service notice] The ack-ram-authenticator component is installed by default on ACK managed clusters that run Kubernetes 1.33 and later

containerd 2.1.1 is released

containerd 2.1.1 supports features such as Node Resource Interface (NRI), Container Device Interface (CDI), and Sandbox API.

All

containerd runtime release notes

Support for ossfs 2.0

ossfs 2.0 is a client based on Filesystem in Userspace (FUSE) that can mount Alibaba Cloud OSS as a local file system. This allows application containers to access OSS data through POSIX operations as if they were accessing local files. Compared with ossfs 1.0, ossfs 2.0 provides improved performance in sequential read and write operations and high-concurrency small file reads. It is suitable for scenarios with high storage access performance requirements, such as AI training, inference, big data processing, and autonomous driving.

All

ossfs 2.0

Distributed Cloud Container Platform for Kubernetes (ACK One)

Use ApplicationSet to coordinate multi-environment deployments and application dependencies

A new best practice is available. It describes how to build an automated deployment system that supports dependency management for multiple applications between development and pre-production environments. This is based on the Progressive Syncs feature of Argo CD and the multi-environment resource orchestration capabilities of ApplicationSet.

All

Use ApplicationSet to coordinate multi-environment deployments and application dependencies
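The Progressive Syncs pattern described above can be sketched with an ApplicationSet that uses the `RollingSync` strategy to promote the dev environment before staging. Progressive Syncs is an alpha Argo CD feature that must be explicitly enabled, and the repository URL and paths below are hypothetical.

```yaml
# Hedged sketch: dev syncs first, then staging (RollingSync steps match
# on the env label stamped onto each generated Application).
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: demo-envs
spec:
  goTemplate: true
  generators:
  - list:
      elements:
      - env: dev
      - env: staging
  strategy:
    type: RollingSync
    rollingSync:
      steps:
      - matchExpressions:
        - {key: env, operator: In, values: [dev]}
      - matchExpressions:
        - {key: env, operator: In, values: [staging]}
  template:
    metadata:
      name: 'demo-{{.env}}'
      labels:
        env: '{{.env}}'
    spec:
      project: default
      source:
        repoURL: https://example.com/demo.git   # hypothetical repository
        targetRevision: main
        path: 'overlays/{{.env}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: 'demo-{{.env}}'
```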

April 2025

Product | Feature | Description | Region | References

Container Service for Kubernetes

Create and manage Lingjun node pools

You can create and manage Lingjun node pools in ACK Pro clusters.

All

Lingjun node pools

Configure node pools by specifying instance properties

You can configure the instance types of a node pool by specifying instance properties, such as vCPU and memory. The node pool automatically selects instance types that meet the requirements for scale-out, which improves the success rate of scale-out operations.

All

Configure a node pool by specifying instance properties

Real-time AI Profiling

In Kubernetes container scenarios, AI Profiling is a non-intrusive performance analysis tool based on eBPF and dynamic process injection. It supports online detection of container processes that run GPU jobs. For online services, this dynamically attachable and detachable profiling tool enables real-time, detailed analysis without modifying the service code.

All

Use AI Profiling from the command line

Enable preemption

When cluster resources are scarce, high-priority jobs may fail to run due to insufficient resources. After you enable preemption, the ACK scheduler can identify and evict low-priority pods to release computing resources, ensuring that high-priority jobs start quickly.

All

Enable preemption
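Priority is expressed through the standard Kubernetes PriorityClass API: pods referencing a higher-valued class can preempt pods of lower-valued classes once preemption is enabled. The image name below is hypothetical.

```yaml
# A PriorityClass and a pod that uses it; lower-priority pods may be
# evicted to make room for this pod when resources are scarce.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-job
value: 1000000
globalDefault: false
description: "High-priority jobs that may preempt lower-priority pods."
---
apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  priorityClassName: high-priority-job
  containers:
  - name: train
    image: registry.example.com/ai/train:v1   # hypothetical image
```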

Access services through Gateway with Inference Extension

The Gateway with Inference Extension component is built on the Envoy Gateway project. It supports the full range of basic Gateway API capabilities and open source Envoy Gateway extension resources.

All

Access services through Gateway with Inference Extension

Enhancements for generative AI services

You can use the Gateway with Inference Extension component to implement features such as smart routing, efficient traffic management, phased releases for generative AI inference services, circuit breaking for inference services, and traffic mirroring for inference services.

All

Enhancements for generative AI services

PVC-to-PVC persistent volume backup and recovery

You can back up and recover disk data within an ACK cluster on the cloud, or between ACK clusters in the same or different regions. After a backup is completed in the source cluster, you can use the backup center to recover a new set of persistent volume claims (PVCs) and their corresponding PVs in the current cluster or another cluster. You can then directly mount them without adjusting any workload YAML configurations.

All

Backup center

alibabacloud-privateca-issuer is released

Alibaba Cloud Private CA Issuer is released. It lets you use cert-manager to create and manage Alibaba Cloud Private CA certificates in your cluster. The issuer is now available in the ACK App Marketplace.

All

None

Deploy a workload and implement load balancing in an ACK managed cluster (intelligent hosting mode)

This topic describes how to deploy a workload in an ACK managed cluster (intelligent hosting mode) and use an ALB Ingress for public network access. After you complete the steps, you can access the application through the configured domain name to achieve efficient external traffic management and load balancing.

All

Deploy a workload and implement load balancing

Best practices for Datapath V2

This topic describes how to optimize the network configuration of a cluster that uses the Terway network plug-in after Datapath V2 is enabled. This includes configuring Conntrack parameters and managing Identity resources to improve cluster performance and stability.

All

Best practices for Datapath V2

Dify component upgrade guide

A new best practice is available. It describes how to upgrade ack-dify from an earlier version to v1.0.0 or later. The steps include backing up data, installing the plug-in migration tool into the plug-in system, and enabling the new plug-in ecosystem.

All

Upgrade the Dify component in an ACK cluster

Distributed Cloud Container Platform for Kubernetes (ACK One)

Use PrivateLink to resolve IP address conflicts in data center network segments

After a Kubernetes cluster in a data center is connected to an ACK One registered cluster through a leased line, conflicts may occur when you use Serverless computing resources because other services in the internal network use the same network segment. Use PrivateLink to resolve IP address conflicts in data center network segments.

All

Use PrivateLink to resolve IP address conflicts in data center network segments

Cross-region scheduling of ACS pods

ACK One registered clusters support seamless integration of Serverless computing resources from multiple regions into a Kubernetes cluster. This enables dynamic scheduling and unified management of cross-region GPU resources.

All

Cross-region scheduling of ACS pods

Log collection

You can configure log collection using SLS CRDs or environment variables to automatically collect container logs based on Alibaba Cloud Simple Log Service (SLS).

All
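The environment-variable method mentioned above follows the `aliyun_logs_*` convention used by ACK's log collection component: the variable name determines the Logstore and the value selects stdout or a file path. The Logstore names below are hypothetical.

```yaml
# Hedged sketch: per-container log collection declared via environment
# variables (the aliyun_logs_<logstore> convention).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1
  selector:
    matchLabels: {app: app}
  template:
    metadata:
      labels: {app: app}
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1      # hypothetical image
        env:
        - name: aliyun_logs_app-stdout          # collects stdout into Logstore "app-stdout"
          value: stdout
        - name: aliyun_logs_app-file            # collects file logs into Logstore "app-file"
          value: /var/log/app/*.log
```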

Container Service for Edge (ACK Edge)

Version 1.32 is released

Version 1.32 is supported. This version optimizes requests from CoreDNS, kube-proxy, and kubelet to kube-apiserver and reduces cloud-to-edge communication traffic.

All

Release notes for ACK Edge with Kubernetes 1.32

Network element configuration in a leased line environment

You can connect on-premises data center (IDC) servers to a cluster through the Internet or a leased line for containerized management. When you connect through a leased line, you must configure the network elements of the infrastructure before access.

All

Network element configuration in a leased line environment

Cloud Native AI Suite

Support for the HistoryServer component

The native Ray Dashboard is available only when the cluster is running. After the cluster is stopped, you cannot obtain historical logs and monitoring data. You can use the RayCluster HistoryServer to collect node logs in real time during cluster runtime and persist them to OSS.

All

Install the HistoryServer component in ACK

Support for the KubeRay component

You can deploy the KubeRay Operator component and integrate it with Alibaba Cloud SLS and Prometheus monitoring to enhance log management, system observability, and high availability.

All

Install the KubeRay component in ACK

March 2025

Product | Feature | Description | Region | References

Container Service for Kubernetes

ACK Pro clusters support intelligent hosting mode

When you create an ACK managed cluster, you can enable intelligent hosting mode to quickly create a Kubernetes cluster that follows best practices.

After the cluster is created, an intelligent managed node pool is created by default. This node pool dynamically scales in or out based on workload requirements. ACK is responsible for O&M tasks such as operating system version upgrades, software version upgrades, and security vulnerability fixes.

All

Support for enabling tracing analysis for control plane and data plane components

After you enable tracing analysis for the cluster API server or kubelet, trace information is automatically reported to Managed Service for OpenTelemetry. This provides monitoring data such as visualized trace details and real-time topology.

All

Text message and email notifications for high-risk KubeConfigs are released

You can receive text message and email notifications about KubeConfigs that have been deleted but still pose a risk to your account.

All

None

Support for implementing smart routing and traffic management using ACK Gateway with Inference Extension

You can use the ACK Gateway with Inference Extension component to configure inference service extensions to implement smart routing and efficient traffic management.

All

Use Gateway with Inference Extension to implement smart routing and traffic management

Distributed Cloud Container Platform for Kubernetes (ACK One)

Support for unified management of multi-cluster fleet components

ACK One fleets provide unified and automated component management for cluster O&M engineers. You can define baselines that include multiple components and their versions and deploy them to multiple clusters. It also supports component configuration, deployment batches, and rollbacks to improve system stability.

All

Multi-cluster component management

Support for dynamic distribution and descheduling

An ACK One fleet can use a PropagationPolicy to distribute workload replicas across sub-clusters based on their available resources. By default, ACK One fleets have descheduling enabled. An automatic check is performed every two minutes. If a pod remains in an unschedulable state for more than 30 seconds, the replica is descheduled.

All

Dynamic distribution and descheduling
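A resource-aware distribution policy might be sketched as follows. The field names follow the Karmada-style multi-cluster API that ACK One fleets expose, but the API group shown is an assumption; verify both against the linked topic.

```yaml
# Hedged sketch: divide Deployment replicas across sub-clusters weighted
# by each cluster's available capacity.
apiVersion: policy.one.alibabacloud.com/v1alpha1   # assumed API group; confirm in the linked topic
kind: PropagationPolicy
metadata:
  name: web-propagation
spec:
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: web
  placement:
    replicaScheduling:
      replicaSchedulingType: Divided
      replicaDivisionPreference: Weighted
      weightPreference:
        dynamicWeight: AvailableReplicas   # weight clusters by available resources
```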

Cloud Native AI Suite

Support for setting Slurm queue priorities

A new best practice is available that describes how to use appropriate queue configuration policies in a Slurm system environment. These policies help schedule and process the maximum number of jobs when a job is submitted or its status changes, which optimizes performance.

All

Set Slurm queue priorities in an ACK cluster

February 2025

Product | Feature | Description | Region | References

Container Service for Kubernetes

Support for modifying control plane security groups and time zones

If the security group and time zone selected when you created a cluster no longer meet your requirements, you can modify the control plane security group and cluster time zone on the cluster's basic information page.

All

View cluster information

Node pools support custom containerd configurations

You can customize the containerd parameter settings for nodes in a node pool. For example, you can configure multiple mirror repositories for a specified image repository or skip the security certificate verification for a specific image repository.

All

Customize containerd parameters for a node pool

Elasticity strength tips are added for node pools

When a node pool is scaled out, the operation may fail due to insufficient instance inventory or because the ECS instance type is not supported in the specified zone. You can use the elasticity strength to evaluate the availability of the node pool configuration and the health of instance provisioning, and receive corresponding configuration suggestions.

All

View the elasticity strength of a node pool

Support for enabling batch job orchestration

Argo Workflows is a Kubernetes-native workflow engine that supports orchestrating parallel jobs through YAML or Python. It simplifies the automation and management of containerized applications and is suitable for scenarios such as CI/CD pipelines, data processing, and machine learning. You can enable batch job orchestration by installing the Argo Workflows component and use the Alibaba Cloud Argo CLI or the console to create and manage flow tasks.

All

Enable batch job orchestration
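A minimal parallel workflow in Argo Workflows fans one template out over a list of items, as sketched below; each item becomes a parallel pod.

```yaml
# Three parallel steps generated from withItems.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parallel-demo-
spec:
  entrypoint: fan-out
  templates:
  - name: fan-out
    steps:
    - - name: process
        template: echo
        arguments:
          parameters:
          - name: item
            value: "{{item}}"
        withItems: [a, b, c]
  - name: echo
    inputs:
      parameters:
      - name: item
    container:
      image: alpine:3.20
      command: [echo, "{{inputs.parameters.item}}"]
```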

GPU fault detection

The ack-node-problem-detector component provided by ACK further enhances the monitoring capability for anomalous activities on cluster nodes based on the open source project node-problem-detector. The component provides a rich set of GPU-related fault detection items to enhance fault detection in GPU scenarios. When a fault is detected, a corresponding Kubernetes Event or Kubernetes Node Condition is generated based on the fault type.

All

GPU fault detection and automatic isolation

Distributed Cloud Container Platform for Kubernetes (ACK One)

Schedule and distribute multi-cluster Spark jobs based on actual remaining resources

This best practice describes how to use ACK One fleets and the ACK Koordinator component to schedule and distribute multi-cluster Spark jobs based on the actual remaining resources of each cluster (rather than requested resources). This maximizes the utilization of idle resources in multiple clusters and ensures the normal operation of online services through priority control and offline hybrid deployment.

All

Schedule and distribute multi-cluster Spark jobs based on actual remaining resources

Container Service for Edge (ACK Edge)

Support for adding pod vSwitches

In ENS edge scenarios, if an ACK Edge cluster uses the Terway Edge plug-in, you can add pod vSwitches to increase the IP address resources available to the cluster when the vSwitch has insufficient IP addresses or the pod CIDR block needs to be expanded.

All

Add a pod vSwitch

GPU resource monitoring

ACK Edge clusters can manage GPU nodes in data centers and at the edge, unifying the management of heterogeneous computing power across multiple regions and environments. You can connect an ACK Edge cluster to Alibaba Cloud Prometheus Monitoring to provide GPU nodes in data centers and at the edge with the same observability capabilities as those on the cloud.

All

Best practices for monitoring GPU resources in ACK Edge clusters

Cloud Native AI Suite

Deploy a DeepSeek distilled model inference service based on ACK

This topic uses the DeepSeek-R1-Distill-Qwen-7B model as an example to describe how to use KServe in Alibaba Cloud Container Service for Kubernetes (ACK) to deploy a production-ready DeepSeek distilled model inference service.

All

Deploy a DeepSeek distilled model inference service based on ACK
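To make the shape of such a deployment concrete, here is a minimal, hedged sketch of a KServe InferenceService that serves a distilled model from a custom predictor container. The image, model path, and resource values are placeholders, not the values used in the referenced topic.

```yaml
# Sketch of a KServe InferenceService with a custom vLLM predictor.
# Image, model path, and GPU count are illustrative placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: deepseek-r1-distill-qwen-7b
spec:
  predictor:
    containers:
      - name: kserve-container
        image: vllm/vllm-openai:latest      # placeholder image
        command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
        args:
          - --model=/models/DeepSeek-R1-Distill-Qwen-7B   # placeholder path
          - --port=8080
        resources:
          limits:
            nvidia.com/gpu: "1"
```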

Best practice for deploying the full DeepSeek model for inference in a distributed multi-node deployment on ACK

This best practice describes a solution for distributed inference of the DeepSeek-R1-671B large model based on ACK. The solution uses a hybrid parallelism policy and the Alibaba Cloud Arena tool to achieve efficient distributed deployment on two nodes. It also describes how to seamlessly integrate the deployed DeepSeek-R1 into the Dify platform to quickly build an enterprise-level AI chat system that supports long-text understanding.

All

Best practice for deploying the full DeepSeek model for inference in a distributed multi-node deployment on ACK

January 2025

Product

Feature

Description

Region

References

Container Service for Kubernetes

Node pools support on-demand image acceleration

ACK supports on-demand loading of container images based on the Data Accelerator for Disaggregated Infrastructure (DADI) image acceleration technology. This eliminates the need for full image downloads and enables online decompression to significantly reduce application startup time.

All

Accelerate container startup using on-demand container image loading

Support for the Alibaba Cloud Linux 3 Container Optimized Edition operating system is added

Alibaba Cloud Linux 3 Container Optimized Edition (Alibaba Cloud Linux 3.2104 LTS 64-bit Container Optimized Edition) is an image based on the default standard image of Alibaba Cloud Linux and optimized for container scenarios. Drawing on extensive practical experience from Container Service for Kubernetes customers, Alibaba Cloud developed this self-managed, cloud-native operating system to meet the demands of container workloads for higher deployment density, faster startup, and stronger security isolation.

All

Support for Kubernetes 1.32

ACK now supports Kubernetes 1.32. You can create clusters that run Kubernetes 1.32 or upgrade existing clusters to Kubernetes 1.32.

All

Kubernetes 1.32

Support for improving resource utilization using ElasticQuotaTree and job queues

To allow different teams and jobs to share computing resources in a cluster while ensuring proper resource allocation and isolation, you can use ack-kube-queue, ElasticQuotaTree, and ack-scheduler to achieve flexible resource management.

All

None
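As a hedged sketch of the quota side of this setup, the manifest below defines an ElasticQuotaTree that splits cluster CPU and memory between two teams. The field names follow the ACK ElasticQuotaTree CRD as commonly documented, but the namespaces and all quota values are placeholder assumptions.

```yaml
# Sketch: an ElasticQuotaTree dividing resources between two teams.
# Team names, namespaces, and quota values are placeholders.
apiVersion: scheduling.sigs.k8s.io/v1beta1
kind: ElasticQuotaTree
metadata:
  name: elasticquotatree
  namespace: kube-system
spec:
  root:
    name: root
    max:
      cpu: "40"
      memory: 160Gi
    min:
      cpu: "40"
      memory: 160Gi
    children:
      - name: team-a
        namespaces: ["team-a"]
        max: { cpu: "30", memory: 120Gi }   # can borrow idle capacity
        min: { cpu: "10", memory: 40Gi }    # guaranteed share
      - name: team-b
        namespaces: ["team-b"]
        max: { cpu: "30", memory: 120Gi }
        min: { cpu: "10", memory: 40Gi }
```

With ack-kube-queue, jobs submitted beyond a team's quota wait in a queue instead of failing, and ack-scheduler enforces the min/max bounds at scheduling time.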

New best practice for fine-grained control over cluster resources using resource groups

To more efficiently manage resources in Container Service for Kubernetes, you can use resource groups. Resource groups allow you to organize resources by dimensions such as department, project, or environment. When combined with Resource Access Management (RAM), this enables resource isolation and fine-grained permission management within a single Alibaba Cloud account.

All

Use resource groups for fine-grained resource control

Distributed Cloud Container Platform for Kubernetes (ACK One)

ACK One registered clusters can access ACS computing power

You can use the container computing power provided by ACS in an ACK One registered cluster.

All

Schedule pods to ACS using virtual nodes

Support for cross-cluster service access using native service domain names

ACK One multi-cluster services support cross-cluster service access using native service domain names through MultiClusterService. You can route cross-cluster traffic using native services directly, without modifying your service code, the DNSConfig configuration of your application pods, or the CoreDNS configuration.

All

Use native Service domain names for cross-cluster service access

Support for accessing multi-cluster resources using the Go SDK

If you want to integrate an ACK One fleet into your platform to access resources in sub-clusters, you can use the Go SDK.

All

Access multi-cluster resources using the Go SDK

Container Service for Edge (ACK Edge)

Support for scaling cloud nodes

When on-premises node resources are insufficient, the node autoscaling feature can automatically scale out cloud nodes for an ACK Edge cluster to supplement scheduling capacity.

All

Elasticity of cloud ECS nodes

Support for deploying hybrid cloud LLM elastic inference services

By installing the ack-kserve component and using the cloud elasticity feature of ACK Edge clusters, you can deploy hybrid cloud LLM elastic inference services. This lets you flexibly schedule cloud and on-premises resources and reduce the operating costs of LLM inference services.

All

Support for shared GPU scheduling

With shared GPU scheduling, you can schedule multiple pods to the same GPU to share its computing resources. This improves GPU utilization and reduces costs.

  • The cloud nodes of the ACK Edge cluster support the GPU sharing, GPU memory isolation, and computing power isolation features.

  • The edge node pools of the ACK Edge cluster support only GPU sharing. The GPU memory isolation and computing power isolation features are not supported.

All

Use shared GPU scheduling
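The following is a hedged sketch of a pod that requests a slice of GPU memory through shared GPU scheduling. The resource name `aliyun.com/gpu-mem` follows the ACK cGPU convention; the image and the requested amount are placeholders.

```yaml
# Sketch: a pod requesting 4 GiB of GPU memory on a shared GPU.
# Image and the gpu-mem value are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-demo
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.0-base-ubuntu22.04
      command: ["sleep", "infinity"]
      resources:
        limits:
          aliyun.com/gpu-mem: 4   # GiB of GPU memory, not whole GPUs
```

Note that per the description above, memory and computing power isolation for such pods apply only on cloud nodes; edge node pools provide sharing without isolation.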

Support for unified management of ECS resources across multiple regions

A new best practice is available. It describes how to use an ACK Edge cluster to centrally manage computing resources distributed across different regions. This enables full lifecycle management and efficient resource scheduling for cloud-native applications.

All

Centrally manage ECS resources across multiple regions

More information

For release notes for ACK before 2025, see Release notes (before 2025).