Container Service for Kubernetes: What is ACK Lingjun?

Last Updated: Oct 20, 2023

Container Service for Kubernetes (ACK) Lingjun managed clusters are developed based on Intelligent Computing Lingjun. ACK creates and maintains the control planes of ACK Lingjun managed clusters for you, so the applications in your cluster can use the high-performance computing capabilities that Intelligent Computing Lingjun provides. This topic introduces ACK Lingjun managed clusters and describes their features and advantages.

Usage notes

To use ACK Lingjun managed clusters, you must first create an ACK Lingjun cluster service in the Intelligent Computing Lingjun console.

Product introduction

ACK Lingjun managed clusters provide fully managed, highly available control planes and support efficient management of heterogeneous resources and scheduling of heterogeneous tasks. This type of cluster can serve as the cloud-native base of Machine Learning Platform for AI, and provides enhanced cloud-native capabilities that are suited to AI and high-performance computing (HPC) scenarios. The following figure shows the architecture of an ACK Lingjun managed cluster. In the architecture, software and hardware are decoupled. The architecture integrates with various Alibaba Cloud services and allows ACK Lingjun managed clusters to provide stable, reliable, efficient, and secure infrastructure services for cloud-native AI workloads.

[Figure: architecture of an ACK Lingjun managed cluster]

Overview

  • Cluster management

    ACK Lingjun managed clusters and ACK Pro clusters provide the same cluster management capabilities. ACK creates and manages the control planes of ACK Lingjun managed clusters. By default, the control planes of an ACK Lingjun managed cluster are deployed across three zones to ensure high availability. You can manage the lifecycle of an ACK Lingjun managed cluster. For example, you can grant permissions on the cluster, monitor the cluster, update the cluster, and manage the components in the cluster.

  • Node management

    ACK Lingjun managed clusters provide Lingjun node pools where you can deploy Lingjun computing nodes. Lingjun node pools support lifecycle management and provide the same management and O&M features as Elastic Compute Service (ECS) node pools. For example, you can add or remove nodes in batches, configure nodes, maintain nodes, use fully-managed nodes, schedule applications to specified nodes, monitor nodes, diagnose nodes, and run automatic node O&M tasks.

  • Cloud-native AI

    By default, ACK Lingjun managed clusters provide components to enhance cloud-native capabilities. For example, ACK Lingjun managed clusters support topology-aware multi-GPU scheduling, and enable GPU scheduling and isolation based on eGPU, which is a GPU virtualization component for GPU-accelerated containers. ACK Lingjun managed clusters provide gang scheduling and capacity scheduling, and support the binpack scheduling policy. In addition, ACK Lingjun managed clusters support dataset orchestration and access acceleration.
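The binpack and gang scheduling policies named above can be illustrated with a small, self-contained sketch. This is not the ACK scheduler or its API; the `Node` class and the `binpack_place` and `gang_schedule` functions are hypothetical names used only to show the general idea. Binpack prefers the node that a request fills most tightly, and gang scheduling places all pods of a job or none of them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical node model: a name and a count of unallocated GPUs."""
    name: str
    free_gpus: int


def binpack_place(nodes: List[Node], gpus_needed: int) -> Optional[Node]:
    """Return the node that fits the request and is left with the fewest free GPUs."""
    candidates = [n for n in nodes if n.free_gpus >= gpus_needed]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.free_gpus - gpus_needed)


def gang_schedule(nodes: List[Node], pod_requests: List[int]) -> Optional[Dict[int, str]]:
    """Place every pod of a job, or place none of them (all-or-nothing)."""
    # Plan against a copy so a failed attempt leaves the real node state untouched.
    trial = [Node(n.name, n.free_gpus) for n in nodes]
    placement: Dict[int, str] = {}
    for index, gpus in enumerate(pod_requests):
        node = binpack_place(trial, gpus)
        if node is None:
            return None  # One pod does not fit, so the whole job stays pending.
        node.free_gpus -= gpus
        placement[index] = node.name
    # Every pod fits: commit the planned allocation to the real nodes.
    by_name = {n.name: n for n in nodes}
    for planned in trial:
        by_name[planned.name].free_gpus = planned.free_gpus
    return placement
```

In this sketch, a job whose pods cannot all be placed reserves nothing, which avoids the situation where a partially started distributed training job holds GPUs while waiting for the rest of its workers.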

Competitive advantages

  • Security and stability

    ACK Lingjun managed clusters provide the same enterprise-grade features as ACK Pro clusters, together with highly available, managed control planes. This eliminates the need to manually build and configure clusters. ACK Lingjun managed clusters ensure the stability, reliability, and security of clusters and support service level agreements (SLAs) that contain compensation clauses. ACK Lingjun managed clusters can meet the requirements of enterprises in large-scale production environments.

  • Simplified O&M

    ACK Lingjun managed clusters provide Kubernetes-native services and are deeply integrated with Intelligent Computing Lingjun and relevant Alibaba Cloud services. ACK Lingjun managed clusters simplify operations and automate O&M for clusters and Lingjun computing nodes, provide the same management experience as ECS nodes, and significantly reduce adaptation and O&M costs.

  • Improved efficiency and acceleration

    ACK Lingjun managed clusters provide GPU sharing, GPU scheduling, and topology-aware GPU scheduling to improve the efficiency and performance of heterogeneous resources. ACK Lingjun managed clusters provide rich scheduling policies and priority-based job queue management for AI and HPC tasks. These features can improve the execution efficiency of AI training jobs and inference tasks, and provide a unified and standard method to manage and deliver AI resources and workloads.
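As a rough illustration of priority-based job queue management, the following sketch orders submitted jobs by priority and, within the same priority, by submission order. It is a conceptual example only and does not reflect how ACK implements its queues; the `PriorityJobQueue` class, its methods, and the job names are hypothetical.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class PriorityJobQueue:
    """Hypothetical queue: higher priority first, then first come, first served."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # Tie-breaker that preserves submission order.

    def submit(self, job_name: str, priority: int) -> None:
        # heapq is a min-heap, so the priority is negated to pop the highest value first.
        heapq.heappush(self._heap, (-priority, next(self._counter), job_name))

    def next_job(self) -> Optional[str]:
        """Return the next job to run, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = PriorityJobQueue()
queue.submit("batch-eval", priority=1)
queue.submit("train-model", priority=10)
queue.submit("inference", priority=10)
first = queue.next_job()  # "train-model": highest priority, submitted before "inference"
```

The submission counter keeps ordering stable for jobs with equal priority, so a long-waiting job is not overtaken by a later job at the same level.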