ACK Lingjun managed clusters provide a fully managed, high-availability Kubernetes control plane and serve as the cloud-native foundation for Platform for AI (PAI). They support heterogeneous compute resources and are optimized for high-performance computing (HPC) workloads such as AI training. For more information about ACK Lingjun managed clusters, see What is an ACK Lingjun managed cluster.
ACK Lingjun managed clusters are billed separately from other Intelligent Computing LINGJUN resources. For details, see Billing rules.
Prerequisites
Before you begin, ensure that you have:
Purchased the required Intelligent Computing LINGJUN resources, including compute nodes and a Lingjun connection instance. For details, see Purchase products.
Purchased and configured the required cloud products: Cloud Enterprise Network (CEN), Application Real-Time Monitoring Service (ARMS), and Virtual Private Cloud (VPC). For details, see Purchase and configure other cloud products.
Completed account identity verification and maintained a cash or credit balance of at least CNY 100.
Step 1: Configure cluster and node groups
Plan node groups to organize compute nodes and improve resource utilization.
Log on to the Intelligent Computing LINGJUN console.
In the left navigation pane, choose Resources & Nodes > Cluster Management.
Click Create Cluster.
Click the PAI LINGJUN Cluster (including PAI, ACK, and CPFS) card.
Configure cluster information: enter the cluster name and the root password for cluster nodes, and specify the resource group. For details on resource groups, see Create a resource group.
Click Create Group to add a node group, then configure the following:
Group name and node information: enter the group name, then specify the node model and runtime image.
Node instances: click Select Node Instances to choose the nodes to add to this group.
Click Save and Next: Network Configuration.
Step 2: Configure network
By default, clusters run in isolated network environments. To connect a cluster to the public cloud, link it to a Lingjun connection instance and a CEN instance, and specify a VPC for monitoring.
Configure the cluster CIDR block
Enter a valid private CIDR block to assign IP addresses to compute nodes.
The cluster CIDR block must meet all of the following requirements:
Must not overlap with any connected VPC CIDR blocks or on-premises data center networks.
Must not overlap with the monitoring VPC CIDR block.
Use a mask length shorter than /22 (for example, /21 or /20) to reserve enough IP addresses for future scaling. The number of available IP addresses in the CIDR block determines the maximum number of nodes in the cluster.
The cluster subnet is a subnet of the cluster CIDR block. For details, see Manage Lingjun CIDR blocks.
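The CIDR requirements above can be checked before you fill in the console form. The following Python sketch uses only the standard `ipaddress` module. The function name `validate_cluster_cidr` and the sample CIDR blocks are hypothetical, and the sketch covers only the rules listed in this section, not every validation the console performs. Because the console may reserve some addresses, treat `num_addresses` as an upper bound on the number of nodes rather than an exact capacity.

```python
"""Hypothetical pre-flight check for a planned Lingjun cluster CIDR block."""
import ipaddress


def validate_cluster_cidr(cluster_cidr, connected_cidrs, monitoring_vpc_cidr):
    """Return a list of problems found; an empty list means all checks passed.

    cluster_cidr:        planned cluster CIDR block, for example "10.0.0.0/20"
    connected_cidrs:     CIDR blocks of connected VPCs and on-premises networks
    monitoring_vpc_cidr: CIDR block of the monitoring VPC
    """
    problems = []
    # strict=True raises ValueError if host bits are set (e.g. "10.0.0.1/20").
    cluster = ipaddress.ip_network(cluster_cidr, strict=True)

    # The block must be a private address range.
    if not cluster.is_private:
        problems.append(f"{cluster} is not a private address range")

    # Must not overlap with connected VPCs or on-premises networks.
    for cidr in connected_cidrs:
        if cluster.overlaps(ipaddress.ip_network(cidr)):
            problems.append(f"{cluster} overlaps connected network {cidr}")

    # Must not overlap with the monitoring VPC.
    if cluster.overlaps(ipaddress.ip_network(monitoring_vpc_cidr)):
        problems.append(f"{cluster} overlaps monitoring VPC {monitoring_vpc_cidr}")

    # A mask shorter than /22 (prefix length 21 or less) leaves room to scale.
    if cluster.prefixlen >= 22:
        problems.append(f"/{cluster.prefixlen} is not shorter than /22")

    return problems


if __name__ == "__main__":
    cidr = "10.0.0.0/20"  # example value only
    issues = validate_cluster_cidr(cidr, ["172.16.0.0/12"], "192.168.0.0/24")
    if issues:
        print("\n".join(issues))
    else:
        # Upper bound on addresses in the block, not guaranteed node capacity.
        print(f"OK: {ipaddress.ip_network(cidr).num_addresses} addresses")
```

A /20 block contains 2^12 = 4,096 addresses, while a /22 block contains only 1,024, which is why a shorter mask leaves more headroom for scaling.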
(Optional) Configure subnet bond allocation policy
This policy controls how bond interfaces on physical network cards are allocated. Each cluster allows only one bond policy, so configure one of the following policy types:
Bond policy: applies a single policy at the cluster level. Optionally configure a default bond that applies to interfaces without a specific policy, or select Apply to all to apply it to all interfaces.
Model policy: applies a bond policy to all node instances of a specific node model. Select the node model, then configure the bond policy. The number of model policies cannot exceed the number of groups in the cluster.
Node policy: applies a bond policy to individual node instances, allowing different bond ports on the same node to connect to different Lingjun CIDR blocks or subnets. Select the node instance, then configure the bond policy.
Node models have different numbers of bond interfaces. The cluster bond interface count is determined by the model with the most interfaces. Bond interfaces are named bond0, bond1, and so on (zero-based). For example, if a cluster has two models with 3 and 4 bond interfaces respectively, the cluster has 4 interfaces (bond0–bond3). The 3-interface model uses policies for bond0–bond2.
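The bond interface arithmetic in the example above can be expressed in a few lines. This Python sketch is illustrative only: the helper names and the model names `model-a` and `model-b` are hypothetical, and it models just the rule stated here (the model with the most bond interfaces sets the cluster count, and names are zero-based).

```python
"""Hypothetical illustration of how the cluster bond interface count is derived."""


def cluster_bond_interfaces(bond_counts_by_model):
    """Return the cluster's bond interface names.

    The model with the most bond interfaces determines the count,
    and interfaces are named bond0, bond1, ... (zero-based).
    """
    if not bond_counts_by_model:
        return []
    count = max(bond_counts_by_model.values())
    return [f"bond{i}" for i in range(count)]


def model_bond_interfaces(bond_counts_by_model, model):
    """Return the bond interfaces whose policies apply to one node model."""
    return [f"bond{i}" for i in range(bond_counts_by_model[model])]


if __name__ == "__main__":
    # Two hypothetical node models with 3 and 4 bond interfaces.
    models = {"model-a": 3, "model-b": 4}
    print(cluster_bond_interfaces(models))             # bond0 through bond3
    print(model_bond_interfaces(models, "model-a"))    # bond0 through bond2
```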
Configure a Lingjun connection instance
Click Authorize to grant the Lingjun connection instance the permissions it needs to access CEN and other cloud products. For details on the required permissions, see Appendix: Service-linked role for Lingjun connection instances.
Select the Lingjun connection instance ID to use for connecting the cluster to the public cloud.
Select the CEN instance to connect through the Lingjun connection instance.
The transit router in your CEN instance must be in the same region as the Lingjun nodes. For details, see Transit router instances.
Configure monitoring network
The monitoring network is a VPC that Lingjun uses to monitor the connectivity of the Lingjun connection instance.
Create a new VPC or connect an existing VPC to the transit router in the selected CEN instance. Make sure the vSwitch in the VPC has at least one available IP address. For details on enabling CEN, see Enable and configure CEN.
Important:
Select a VPC from the drop-down list only after you connect it to the selected transit router.
The monitoring VPC CIDR block must not overlap with the cluster CIDR block or with any other connected network (VPCs or on-premises data centers).
Click the icon next to the VPC and vSwitch drop-down lists, then select the VPC and vSwitch.
After you configure the cluster network, verify your CEN network configuration. For details, see Purchase and configure CEN.
Click Save and Next: Basic Software Instance Parameters.
Step 3: Configure basic parameters for software instances
Configure parameters for each software instance on the corresponding tab.
ACK tab: configure parameters for the ACK Lingjun managed cluster. For details on available parameters, see Create an ACK managed cluster.
Important: The Service CIDR block, Lingjun cluster CIDR block, internet CIDR block, and VPC CIDR block must not overlap.
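Because all four CIDR blocks must be mutually non-overlapping, every pair has to be compared. The following Python sketch does this with the standard `ipaddress` and `itertools` modules. The function name `find_overlaps` and all CIDR values in the example plan are hypothetical placeholders, not recommended or default values.

```python
"""Hypothetical pairwise overlap check for CIDR blocks that must not overlap."""
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Map of label -> CIDR string in; list of overlapping (label, label) pairs out."""
    networks = {label: ipaddress.ip_network(cidr) for label, cidr in cidrs.items()}
    # Compare every unordered pair exactly once, preserving input order.
    return [
        (label_a, label_b)
        for (label_a, net_a), (label_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    # Example values only; substitute the blocks from your own network plan.
    plan = {
        "Service CIDR": "172.21.0.0/20",
        "Lingjun cluster CIDR": "10.0.0.0/20",
        "internet CIDR": "192.168.0.0/16",
        "VPC CIDR": "172.16.0.0/16",
    }
    conflicts = find_overlaps(plan)
    print(conflicts if conflicts else "No overlapping CIDR blocks")
```

Checking all pairs, rather than checking only the cluster CIDR block against the others, also catches conflicts such as a Service CIDR block that falls inside the VPC CIDR block.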
CPFS tab: configure parameters for Cloud Parallel File Storage (CPFS). After the CPFS file system is created, you can view its instance information in the CPFS console.
PAI tab: configure parameters for PAI. For details on configuring ApsaraDB RDS, cloud storage, Container Registry (ACR) image repositories, and OAuth authentication, see Activate and configure other cloud products.
Click Save and Go to Next Step: Mappings Between Software Instances and Groups.
Step 4: Configure mappings between software instances and groups
ACK Lingjun managed clusters use Lingjun node pools to manage Lingjun compute nodes in groups. Node pools let you configure nodes, apply batch operations, control scheduling, and configure GPUs. For an overview of Lingjun node pools, see Overview of Lingjun node pools.
Click Create Node Pool to create an ACK node pool.
Set Node Pool Name, Maximum Number of Nodes, and other parameters.
Click Select Associated Group. In the dialog box, select the cluster groups to associate, then click OK.
Click Save and Go to Next Step: Confirm Configuration.
Step 5: Confirm and submit
On the Confirm Configuration page, review the basic cluster information, network configuration, mappings between software instances and groups, and software instance parameters. Click Submit Configuration to start creating the cluster.
In the Dependency Check section, click Complete Authorization to grant the required permissions to Container Service.