Platform For AI: Terms

Last Updated: Mar 14, 2024

This topic describes the basic terms used in PAI-Lingjun AI Computing Service (Lingjun) to help you understand and use the service. Lingjun allows you to plan multiple clusters based on your business requirements. Within a cluster, you can divide the nodes into node groups so that node resources are used efficiently.

| Term | Description |
| --- | --- |
| cluster | A cluster is a collection of high-performance heterogeneous accelerated compute nodes equipped with the Lingjun optimization suite. The nodes in a cluster communicate with each other over high-speed, low-latency remote direct memory access (RDMA) networks with a bandwidth of 800 Gbit/s. You can use the native physical cluster services of Lingjun on their own or together with other Alibaba Cloud services. |
| node group | A node group is a collection of nodes that constitute a subset of a cluster. In most cases, a node group consists of one or more nodes with the same specifications or features. For example, you can add all GU100 nodes in a cluster to a node group. |
| node | A node, also called a compute node, is a high-performance GPU server accelerated by the Lingjun optimization suite. You can select the operating system of the compute nodes in a cluster. The CentOS 7.9 operating system is supported. |
| optimization suite | Lingjun provides an optimization suite for clusters that run large-scale parallel computing workloads. The suite consists of optimization components such as data loading optimization, collective communication optimization, computing resource optimization, and network optimization. |
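
The relationship between clusters, node groups, and nodes can be illustrated with a small data model. The following Python sketch is only an illustration of the terms above and is not a Lingjun API: the class names, field names, the `group_by_spec` helper, and the `OTHER-SPEC` specification are all hypothetical. It assumes the common case described above, in which each node group contains nodes that share one specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A compute node (a GPU server). All field names are hypothetical."""
    name: str
    spec: str                 # for example "GU100", the specification named above
    os: str = "CentOS 7.9"    # the operating system listed as supported


@dataclass
class NodeGroup:
    """A subset of a cluster's nodes, usually with the same specification."""
    name: str
    nodes: list[Node] = field(default_factory=list)


@dataclass
class Cluster:
    """A collection of compute nodes that can be divided into node groups."""
    name: str
    nodes: list[Node] = field(default_factory=list)

    def group_by_spec(self) -> dict[str, NodeGroup]:
        """Divide the cluster's nodes into one node group per specification."""
        groups: dict[str, NodeGroup] = {}
        for node in self.nodes:
            group = groups.setdefault(node.spec, NodeGroup(name=f"{node.spec}-group"))
            group.nodes.append(node)
        return groups


# Hypothetical cluster with two GU100 nodes and one node of another specification.
cluster = Cluster(
    name="demo-cluster",
    nodes=[
        Node("node-1", "GU100"),
        Node("node-2", "GU100"),
        Node("node-3", "OTHER-SPEC"),
    ],
)
groups = cluster.group_by_spec()
print({spec: [n.name for n in g.nodes] for spec, g in groups.items()})
```

In this sketch, every node belongs to exactly one node group and all GU100 nodes end up in the same group, which mirrors the example in the node group description. Grouping by specification is one possible policy; a node group can also be defined by other shared features.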