Container Service for Kubernetes: ack-arena

Last Updated: Mar 26, 2026

ack-arena is the ACK distribution of Arena, a command-line tool for managing the lifecycle of AI jobs on Kubernetes. It supports TensorFlow, PyTorch, Horovod, Spark, JupyterLab, TF-Serving, and Triton Inference Server, and provides SDKs for Golang, Java, and Python.

As part of the cloud-native AI suite, ack-arena abstracts the full AI production pipeline—from data preparation and model development through model training, evaluation, inference, and online operations—so you can submit and run AI jobs without managing the underlying infrastructure.

Usage notes

ack-arena can be installed only on ACK Pro, ACK Serverless Pro, and ACK Edge Pro clusters running Kubernetes 1.18 or later. Install it directly from the Container Service for Kubernetes (ACK) console. For setup instructions, see Configure the Arena client.
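
The version requirement above can be checked before installation. The following is a minimal sketch, not part of ack-arena: the function names are hypothetical, and it assumes the cluster reports its version as a string such as `v1.18.8-aliyun.1` or `1.30.1`. It compares the major and minor numbers numerically, because a plain string comparison would wrongly rank 1.9 above 1.18.

```python
import re
from typing import Tuple

# Minimum Kubernetes version stated in the usage notes.
MIN_VERSION = (1, 18)


def parse_k8s_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string (hypothetical helper).

    Assumes the string starts with an optional 'v' followed by
    'major.minor'; any suffix such as '.8-aliyun.1' is ignored.
    """
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: Tuple[int, int] = MIN_VERSION) -> bool:
    """Return True if the version is at least the required major.minor."""
    return parse_k8s_version(version) >= minimum
```

For example, `meets_minimum("v1.18.8-aliyun.1")` passes the check, while `meets_minimum("1.9.0")` does not. The cluster type (ACK Pro, ACK Serverless Pro, or ACK Edge Pro) still has to be confirmed separately.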

Release notes

March 2025

Version | Image | Release date | Impact
0.14.2 | registry-cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.14.2-aliyun-d497232 | 2025-03-10 | No impact on workloads

Bug fixes

  • Resource requests and limits for the init container of PyTorchJob worker pods are now set to the same value.

February 2025

Version | Image | Release date | Impact
0.14.1 | registry-cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.14.1-aliyun-19abf19 | 2025-02-24 | No impact on workloads

Bug fixes

  • Fixed an issue where devices did not support Kubernetes resource quantities.

  • Fixed an issue where PyTorchJob did not support backoff limit.

  • Fixed an issue where NVIDIA_VISIBLE_DEVICES was not set when GPU sharing was enabled.

January 2025

Version | Image | Release date | Impact
0.13.1 | registry-cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.13.1-aliyun-ce9c5f3 | 2025-01-13 | No impact on workloads

New features

  • Added Linux/arm64 support to tf-operator, pytorch-operator, cron-operator, and et-operator.

December 2024

Version | Image | Release date | Impact
0.13.0 | registry-cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.13.0-aliyun-f098f1a | 2024-12-23 | No impact on workloads

New features

  • PyTorchJob now supports torchrun.

Improvements

  • Querying PyTorchJob information no longer triggers list job and StatefulSet operations, reducing API overhead.

November 2024

Version | Image | Release date | Impact
0.12.1 | registry-cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.12.1-aliyun.0 | 2024-11-25 | No impact on workloads
0.12.0 | registry-cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.12.0-aliyun.0 | 2024-11-11 | No impact on workloads

New features (0.12.0)

  • RayJob submission is now supported.

  • Distributed inference job submission is now supported.

New features (0.12.1)

  • MPIJob training jobs now support common-type devices.

Bug fixes (0.12.1)

  • Fixed clean pod policy issues in tf-operator.

  • Fixed a rendering issue when elastic training jobs used an on-premises logging directory.

  • Fixed an issue where cron-operator failed to clean up completed jobs.

October 2024

Version | Image | Release date | Impact
0.10.1 | registry-cn-hangzhou.ack.aliyuncs.com/acs/arena-deploy-manager:0.10.1-aliyun.0 | 2024-10-14 | No impact on workloads

New features

  • Multiple device types are now supported.

  • TFJob now supports successPolicy.

Bug fixes

  • Fixed an issue where SparkApplication submission failed.

April 2024

Version | Image | Release date | Impact
0.9.14 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.14-adb43b8 | 2024-04-11 | No impact on workloads

New features

  • Model management is now supported.

March 2024

Version | Image | Release date | Impact
0.9.13 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.13-5ac396c | 2024-03-18 | No impact on workloads

New features

  • Added the backend parameter to Triton inference services.

  • The mounted directory of a KServe inference service can now be updated.

February 2024

Version | Image | Release date | Impact
0.9.12 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.12-a707f81 | 2024-02-04 | No impact on workloads

Improvements

  • Updated the base image of Triton Inference Server.

  • Improved training-operator custom resource definition (CRD) compatibility.

November 2023

Version | Image | Release date | Impact
0.9.11 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.11-ce87d10 | 2023-11-17 | No impact on workloads

New features

  • KServe inference service deployment is now supported.

  • livenessProbe and readinessProbe can now be configured for inference services.

August 2023

Version | Image | Release date | Impact
0.9.10 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.10-4b5c18c | 2023-08-02 | No impact on workloads

New features

  • An SSH secret can be created when submitting an elastic or DeepSpeed training job.

Improvements

  • Permissions to the et-operator Secret are removed by default and can be granted manually.

June 2023

Version | Image | Release date | Impact
0.9.9 | registry.cn-beijing.aliyuncs.com/acs/arena-deploy-manager:0.9.9-ce4a78d | 2023-06-29 | No impact on workloads

New features

  • DeepSpeed distributed training job submission is now supported.

  • imagePullPolicy is now configurable.

May 2023

Version | Image | Release date | Impact
0.9.8 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.7-d51fe2e | 2023-05-23 | No impact on workloads

New features

  • SDKs now support specifying a cleanup time for completed jobs.

Improvements

  • Role-Based Access Control (RBAC) permissions are restricted.

April 2023

Version | Image | Release date | Impact
0.9.7 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.7-d51fe2e | 2023-04-11 | No impact on workloads
0.9.6 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.6-b3c2c7f | 2023-04-04 | No impact on workloads

New features (0.9.7)

  • The completion time of scheduled jobs can now be specified.

New features (0.9.6)

  • The ownerreference parameter can now be set when submitting TensorFlow or PyTorch training jobs.

Improvements (0.9.6)

  • Updated the et-operator image.

March 2023

Version | Image | Release date | Impact
0.9.5 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.5-c3948e2 | 2023-03-16 | No impact on workloads

New features

  • running-timeout, starting-timeout, and ttl-after-finished are now configurable for TensorFlow training jobs.

  • running-timeout and ttl-after-finished are now configurable for PyTorch training jobs.

  • jobsupervisor charts are now supported.

Improvements

  • Updated SDK for Java to 1.0.4.

  • Updated images for tf-operator, pytorch-operator, and et-operator.

Bug fixes

  • Fixed inconsistent gang pod label formatting.