Platform For AI: Create a resource group and purchase Lingjun resources

Last Updated: Apr 22, 2024

As a core component of the AI computing engine of Alibaba Cloud Machine Learning Platform for AI (PAI), Lingjun resources are designed for large-scale and high-density computing. Lingjun resources provide heterogeneous computing power tailor-made for high-performance AI training and computing. You can use Lingjun resources in Data Science Workshop (DSW), Deep Learning Containers (DLC), and Elastic Algorithm Service (EAS) to facilitate AI development, training, and service deployment. This topic describes how to create a resource group and purchase Lingjun resources.

Overview

Lingjun resource

Lingjun resources are the new-generation intelligent computing resources developed by Alibaba Cloud with the following highlights:

  • High-speed Remote Direct Memory Access (RDMA) network architecture

  • High-performance communication library

  • High-performance acceleration software

  • GPU virtualization technology

Together, these features make Lingjun resources well suited to high-performance computing workloads.

Lingjun resource group

PAI provides fully managed Lingjun resources that you can purchase and use directly in the PAI console. If you have already purchased Lingjun hardware resources, you can add them to PAI as semi-managed resources in the PAI console and use them to run training jobs.

Limits

  • Supported regions

    Lingjun resources are available only in the China (Ulanqab) and Singapore regions.

  • Supported users

    Only users on the whitelist can use Lingjun resources. If you want to use Lingjun resources to run training jobs, submit a ticket to request that your account be added to the whitelist.

  • Supported job types

    Lingjun resources support only TensorFlow, PyTorch, ElasticBatch, and MPIJob training jobs.
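
The limits above can be encoded as a small pre-flight check that runs before a job is submitted. The following Python sketch is illustrative only. The function and constant names are hypothetical and are not part of any PAI SDK, the region strings are the display names used in this topic rather than API region IDs, and whitelist status is passed in as a flag because this topic does not describe a way to query it.

```python
from typing import List

# Values copied from the limits listed in this topic; the names are hypothetical.
SUPPORTED_REGIONS = {"China (Ulanqab)", "Singapore"}
SUPPORTED_JOB_TYPES = {"TensorFlow", "PyTorch", "ElasticBatch", "MPIJob"}


def check_lingjun_limits(region: str, job_type: str, whitelisted: bool) -> List[str]:
    """Return the violated limits as messages; an empty list means none were found."""
    problems = []
    if region not in SUPPORTED_REGIONS:
        problems.append(f"Region {region!r} does not offer Lingjun resources.")
    if not whitelisted:
        problems.append("Account is not on the whitelist; submit a ticket first.")
    if job_type not in SUPPORTED_JOB_TYPES:
        problems.append(f"Job type {job_type!r} is not supported on Lingjun resources.")
    return problems


if __name__ == "__main__":
    # Example: a supported region and a whitelisted account, but an unsupported job type.
    for message in check_lingjun_limits("Singapore", "XGBoost", whitelisted=True):
        print(message)
```

Returning a list of messages instead of raising on the first failure lets a caller report every violated limit at once.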

Account and permission requirements

  • Alibaba Cloud account (recommended): You can use an Alibaba Cloud account to complete all operations without additional authorization.

  • RAM user: You must grant the AliyunPAIFullAccess permission to the RAM user. This gives the RAM user all permissions on PAI, so proceed with caution. For more information, see Appendix: AliyunPAIFullAccess.

Dependencies

Lingjun resources depend on the following Alibaba Cloud services. To create, purchase, and use Lingjun resources, you need to be familiar with and activate these Alibaba Cloud services and prepare resources based on your business requirements.

VPC (required)

When you allocate Lingjun resources, you must associate the resources with a virtual private cloud (VPC) in the same region and configure a vSwitch and a security group. This ensures the network connectivity between the Lingjun resources and other Alibaba Cloud services.

Internet NAT gateway and EIP (optional)

Your Lingjun resources may need to access the Internet. For example, they may need to pull custom images from the Internet. In this case, you must configure an Internet NAT gateway with SNAT enabled and associate an elastic IP address (EIP) with the Internet NAT gateway.

For more information, see Use the SNAT feature of an Internet NAT gateway to access the Internet.

OSS, NAS, and CPFS (optional)

To submit DLC training jobs to Lingjun resources, you must create datasets first. Lingjun resources support only Object Storage Service (OSS), Apsara File Storage NAS (NAS), and Cloud Parallel File Storage (CPFS) datasets. For more information, see the Prepare a dataset section of the "General process" topic.
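
The dependencies described above can be tracked as a checklist before you purchase resources. The following Python sketch is one possible way to do that under stated assumptions: the LingjunDependencyPlan class, its field names, and the missing_dependencies function are hypothetical and do not correspond to any Alibaba Cloud API, and the checks only restate the rules in this section (VPC, vSwitch, and security group are required; the Internet NAT gateway and EIP matter only if Internet access is needed; datasets must be on OSS, NAS, or CPFS).

```python
from dataclasses import dataclass
from typing import List, Optional

# Storage types named in this topic as valid dataset sources.
SUPPORTED_DATASET_STORAGE = {"OSS", "NAS", "CPFS"}


@dataclass
class LingjunDependencyPlan:
    """Hypothetical record of what has been prepared; not a PAI SDK type."""
    resource_region: str
    vpc_region: Optional[str] = None       # None means no VPC has been created yet
    has_vswitch: bool = False
    has_security_group: bool = False
    needs_internet: bool = False           # e.g. pulling custom images from the Internet
    nat_gateway_with_snat: bool = False
    eip_on_nat_gateway: bool = False
    dataset_storage: Optional[str] = None  # set only if DLC jobs will be submitted


def missing_dependencies(plan: LingjunDependencyPlan) -> List[str]:
    """List the dependencies that still need to be prepared for this plan."""
    gaps = []
    # VPC, vSwitch, and security group are always required.
    if plan.vpc_region is None:
        gaps.append("VPC")
    elif plan.vpc_region != plan.resource_region:
        gaps.append("VPC in the same region as the Lingjun resources")
    if not plan.has_vswitch:
        gaps.append("vSwitch")
    if not plan.has_security_group:
        gaps.append("security group")
    # The NAT gateway and EIP are needed only when Internet access is required.
    if plan.needs_internet:
        if not plan.nat_gateway_with_snat:
            gaps.append("Internet NAT gateway with SNAT enabled")
        if not plan.eip_on_nat_gateway:
            gaps.append("EIP associated with the Internet NAT gateway")
    # Datasets are checked only when a storage type has been chosen.
    if plan.dataset_storage is not None and plan.dataset_storage not in SUPPORTED_DATASET_STORAGE:
        gaps.append("dataset on OSS, NAS, or CPFS")
    return gaps
```

For example, a plan that needs Internet access but has only a VPC, vSwitch, and security group would report the NAT gateway and EIP as still missing.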

Procedure

Create a Lingjun resource group

  1. Go to the Resource Pool page in the PAI console.

  2. On the Intelligent Computing Lingjun resources tab, click Create Resource Group.

  3. In the Create Resource Group dialog box, configure the parameters described in the following table and click OK.

    Parameter              Description

    Type                   Select Dedicated Resource Group.

    Resource Group Name    Enter a resource group name based on the naming rule.

Purchase Lingjun resources

To purchase Lingjun resources for a dedicated resource group, perform the following steps. For more information about the billing of Lingjun resources, see Billing of Lingjun resources (Serverless Edition).

  1. On the Intelligent Computing Lingjun resources tab, click the name of the resource group that you want to manage.

  2. In the upper-right corner of the resource group details page, click Create Order.

  3. On the buy page, configure parameters such as Node Specification, Nodes, and Duration. Then, click Buy Now.

  4. After the payment is complete, the purchased Lingjun resources are displayed on the Orders tab of the resource group details page.

References

After you create a resource group and purchase computing resources, you can perform the following operations:

  • On the resource group details page, view the basic information about the resource group and manage the purchased resources. For more information, see the Manage resources section of the "Overview" topic.

  • Allocate the purchased resources to specific training jobs by configuring resource quotas. For more information, see Lingjun resource quotas.