ack-arena is the ACK distribution of Arena, a command-line tool for managing AI job lifecycle on Kubernetes. It supports TensorFlow, PyTorch, Horovod, Spark, JupyterLab, TF-Serving, and Triton Inference Server, and provides SDKs for Golang, Java, and Python.
As part of the cloud-native AI suite, ack-arena abstracts the full AI production pipeline—from data preparation and model development through model training, evaluation, inference, and online operations—so you can submit and run AI jobs without managing the underlying infrastructure.
Usage notes
ack-arena can be installed only on ACK Pro, ACK Serverless Pro, and ACK Edge Pro clusters running Kubernetes 1.18 or later. Install it directly from the Container Service for Kubernetes (ACK) console. For setup instructions, see Configure the Arena client.
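The compatibility rule above can be expressed as a small pre-check. This is a minimal sketch, not part of ack-arena: the function names and the cluster-type strings are hypothetical, and only the rule itself (Pro cluster types, Kubernetes 1.18 or later) comes from this page.

```python
import re
from typing import Tuple

# Cluster types this page lists as supporting ack-arena.
# The exact strings are illustrative, not values from an ACK API.
SUPPORTED_CLUSTER_TYPES = {"ACK Pro", "ACK Serverless Pro", "ACK Edge Pro"}
MIN_VERSION = (1, 18)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.18' or 'v1.28.3-aliyun.1'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def can_install_ack_arena(cluster_type: str, k8s_version: str) -> bool:
    """Return True if the cluster meets both requirements stated above."""
    # Compare versions as integer tuples so that 1.9 sorts before 1.18.
    return (
        cluster_type in SUPPORTED_CLUSTER_TYPES
        and parse_major_minor(k8s_version) >= MIN_VERSION
    )
```

Comparing versions as integer tuples rather than strings avoids treating 1.9 as newer than 1.18.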
Release notes
March 2025
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.14.2 | registry-cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.14.2-aliyun-d497232 | 2025-03-10 | No impact on workloads |
Bug fixes
The resource requests and limits of the init container in PyTorchJob worker pods are now set to the same value.
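To illustrate what "the same value" means for a container, the following sketch checks whether a resource block has identical requests and limits. It is an illustration only: the helper `requests_equal_limits` and the sample values are hypothetical, and only the `requests`/`limits` field layout is standard Kubernetes.

```python
def requests_equal_limits(resources: dict) -> bool:
    """Return True if a container's resources set requests and limits identically.

    Quantities are compared as literal strings, so '1' and '1000m' are
    treated as different even though Kubernetes considers them equal.
    """
    requests = resources.get("requests") or {}
    limits = resources.get("limits") or {}
    # An empty block has nothing to compare, so it does not count as a match.
    return bool(requests) and requests == limits


# Hypothetical init container resources, not taken from ack-arena.
init_container_resources = {
    "requests": {"cpu": "1", "memory": "1Gi"},
    "limits": {"cpu": "1", "memory": "1Gi"},
}
matches = requests_equal_limits(init_container_resources)
```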
February 2025
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.14.1 | registry-cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.14.1-aliyun-19abf19 | 2025-02-24 | No impact on workloads |
Bug fixes
Fixed an issue where devices did not support Kubernetes resource quantities.
Fixed an issue where PyTorchJob did not support backoff limit.
Fixed an issue where `NVIDIA_VISIBLE_DEVICES` was not set when GPU sharing was enabled.
January 2025
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.13.1 | registry-cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.13.1-aliyun-ce9c5f3 | 2025-01-13 | No impact on workloads |
New features
Added Linux/arm64 support to tf-operator, pytorch-operator, cron-operator, and et-operator.
December 2024
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.13.0 | registry-cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.13.0-aliyun-f098f1a | 2024-12-23 | No impact on workloads |
New features
PyTorchJob now supports torchrun.
Improvements
Querying PyTorchJob information no longer triggers list job and StatefulSet operations, reducing API overhead.
November 2024
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.12.1 | registry-cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.12.1-aliyun.0 | 2024-11-25 | No impact on workloads |
| 0.12.0 | registry-cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.12.0-aliyun.0 | 2024-11-11 | No impact on workloads |
New features (0.12.0)
RayJob submission is now supported.
Distributed inference job submission is now supported.
New features (0.12.1)
MPIJob training jobs now support common-type devices.
Bug fixes (0.12.1)
Fixed clean pod policy issues in tf-operator.
Fixed a rendering issue when elastic training jobs used an on-premises logging directory.
Fixed an issue where cron-operator failed to clean up completed jobs.
October 2024
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.10.1 | registry-cn-hangzhou.ack.aliyuncs.com/acs/arena-deploy-manager:0.10.1-aliyun.0 | 2024-10-14 | No impact on workloads |
New features
Multiple device types are now supported.
TFJob now supports `successPolicy`.
Bug fixes
Fixed an issue where SparkApplication submission failed.
April 2024
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.9.14 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.14-adb43b8 | 2024-04-11 | No impact on workloads |
New features
Model management is now supported.
March 2024
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.9.13 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.13-5ac396c | 2024-03-18 | No impact on workloads |
New features
Added the `backend` parameter to Triton inference services.
The mounted directory of a KServe inference service can now be updated.
February 2024
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.9.12 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.12-a707f81 | 2024-02-04 | No impact on workloads |
Improvements
Updated the base image of Triton Inference Server.
Improved training-operator custom resource definition (CRD) compatibility.
November 2023
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.9.11 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.11-ce87d10 | 2023-11-17 | No impact on workloads |
New features
KServe inference service deployment is now supported.
`livenessProbe` and `readinessProbe` can now be configured for inference services.
August 2023
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.9.10 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.10-4b5c18c | 2023-08-02 | No impact on workloads |
New features
An SSH secret can now be created when submitting an elastic or DeepSpeed training job.
Improvements
Permissions to the et-operator Secret are removed by default and can be granted manually.
June 2023
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.9.9 | registry.cn-beijing.aliyuncs.com/acs/arena-deploy-manager:0.9.9-ce4a78d | 2023-06-29 | No impact on workloads |
New features
DeepSpeed distributed training job submission is now supported.
`imagePullPolicy` is now configurable.
May 2023
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.9.8 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.7-d51fe2e | 2023-05-23 | No impact on workloads |
New features
SDKs now support specifying a cleanup time for completed jobs.
Improvements
Role-Based Access Control (RBAC) permissions are restricted.
April 2023
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.9.7 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.7-d51fe2e | 2023-04-11 | No impact on workloads |
| 0.9.6 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.6-b3c2c7f | 2023-04-04 | No impact on workloads |
New features (0.9.7)
The completion time of scheduled jobs can now be specified.
New features (0.9.6)
The `ownerreference` parameter can now be set when submitting TensorFlow or PyTorch training jobs.
Improvements (0.9.6)
Updated the et-operator image.
March 2023
| Version | Image | Release date | Impact |
|---|---|---|---|
| 0.9.5 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.5-c3948e2 | 2023-03-16 | No impact on workloads |
New features
`running-timeout`, `starting-timeout`, and `ttl-after-finished` are now configurable for TensorFlow training jobs.
`running-timeout` and `ttl-after-finished` are now configurable for PyTorch training jobs.
jobsupervisor charts are now supported.
Improvements
Updated SDK for Java to 1.0.4.
Updated images for tf-operator, pytorch-operator, and et-operator.
Bug fixes
Fixed inconsistent gang pod label formatting.