Why PAI-Lingjun Intelligent Computing Service
PAI-Lingjun Intelligent Computing Service is a platform-as-a-service (PaaS) offering for large-scale deep learning and integrated intelligent computing. It is available in two editions: the Serverless Edition on the Alibaba Cloud public cloud, and the Exclusive Edition. Built on integrated software and hardware optimization, PAI-Lingjun Intelligent Computing Service provides a high-performance heterogeneous computing foundation and full-process AI engineering capabilities. Its core benefits are high performance, high efficiency, and high utilization, which meet the high-performance computing requirements of large model training, autonomous driving, scientific research, finance, and other fields.
Serverless
Lingjun Serverless Edition helps you quickly set up and run AI computing tasks. It manages complex heterogeneous systems through automated operations and maintenance (O&M) and integrates seamlessly with Alibaba Cloud computing, storage, and network services.
High-Performance RDMA Network
Alibaba Cloud's high-performance Remote Direct Memory Access (RDMA) network greatly accelerates AI training. It delivers high-speed, low-latency transmission at 800 Gbit/s, and its GPU direct connection technologies improve transmission stability and security.
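To put the 800 Gbit/s figure in perspective, the following back-of-the-envelope sketch estimates the ideal time to move a payload at that line rate. It assumes the full 800 Gbit/s is available to a single transfer and ignores protocol overhead, congestion, and topology. The 140 GB payload (roughly the FP16 weights of a 70-billion-parameter model) is a hypothetical example, and `ideal_transfer_seconds` is an illustrative helper, not a Lingjun API.

```python
# Back-of-the-envelope estimate of ideal RDMA transfer time.
# Assumptions (not from Lingjun documentation): the full 800 Gbit/s line rate is
# available to a single transfer, and protocol overhead and congestion are ignored.

LINK_GBIT_PER_S = 800.0  # bandwidth figure quoted for the RDMA network


def ideal_transfer_seconds(num_bytes: float, link_gbit_per_s: float = LINK_GBIT_PER_S) -> float:
    """Return the ideal time in seconds to send num_bytes at the given line rate."""
    bits = num_bytes * 8                      # bytes -> bits
    return bits / (link_gbit_per_s * 1e9)     # Gbit/s -> bit/s (decimal units)


if __name__ == "__main__":
    # Hypothetical payload: FP16 weights of a 70-billion-parameter model
    # (70e9 parameters x 2 bytes = 140 GB, decimal units).
    payload_bytes = 70e9 * 2
    print(f"{ideal_transfer_seconds(payload_bytes):.2f} s")  # prints "1.40 s"
```

Real transfers will be slower than this ideal figure; the sketch mainly shows the unit conversion, where 800 Gbit/s corresponds to 100 GB/s.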
Efficient CPFS Storage System
Cloud Parallel File Storage (CPFS) uses a fully parallel storage architecture and supports POSIX, MPI-IO, and Network File System (NFS) access. A single cluster delivers up to 2 TB/s of data throughput and 30 million IOPS, providing efficient and reliable storage for AI training.
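One way to reason about the two CPFS figures together is to treat throughput and IOPS as two independent ceilings on how fast a training job can read its data: whichever ceiling is hit first sets the pace. The sketch below assumes decimal units (1 TB/s = 10^12 bytes/s), exactly one I/O operation per file, and that a single job can use the full cluster-wide figures. Both datasets and the helper `min_read_seconds` are hypothetical illustrations.

```python
# Rough lower bound on the time to read a dataset once from a single CPFS cluster.
# Assumptions (illustrative only): the quoted peak figures (2 TB/s, 30 million IOPS)
# are fully available to one job, and each file costs exactly one I/O operation.

PEAK_THROUGHPUT_BYTES_PER_S = 2e12  # 2 TB/s, decimal units
PEAK_IOPS = 30e6                    # 30 million I/O operations per second


def min_read_seconds(total_bytes: float, num_files: int) -> float:
    """Return the larger of the bandwidth-bound and IOPS-bound read times."""
    bandwidth_bound = total_bytes / PEAK_THROUGHPUT_BYTES_PER_S
    iops_bound = num_files / PEAK_IOPS
    return max(bandwidth_bound, iops_bound)


if __name__ == "__main__":
    # Hypothetical dataset A: 100 TB in 1 billion files (~100 KB each).
    print(min_read_seconds(100e12, 1_000_000_000))  # 50.0 (bandwidth-bound)
    # Hypothetical dataset B: 10 TB in 3 billion small files (~3.3 KB each).
    print(min_read_seconds(10e12, 3_000_000_000))   # 100.0 (IOPS-bound)
```

Under these assumptions, datasets made of many small files are limited by IOPS rather than raw throughput, which is why both figures are quoted.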
Comprehensive AI Acceleration
Our distributed training acceleration engine provides dataset acceleration, computing acceleration, algorithm optimization, scheduling algorithms, and resource optimization. This ensures that computing power is fully utilized and improves the speed and efficiency of AI training and inference.
Editions
PAI-Lingjun Intelligent Computing Service Serverless Edition
The Serverless Edition is a flexible and cost-effective option. You can purchase network and storage services based on your business needs and scale them with a few clicks. After you purchase compute nodes, you get zero-cost O&M and do not need to plan CPU management nodes.
Core Components:
- GPUs for PAI-Lingjun Intelligent Computing Service
- CPFS storage system
PAI-Lingjun Intelligent Computing Service Exclusive Edition
The Exclusive Edition lets you create exclusive clusters on Alibaba Cloud. It provides an AI platform and O&M services dedicated to your business, along with convenient operations management built on standard, interconnected Alibaba Cloud computing, storage, and network services.
Core Components:
- GPUs for PAI-Lingjun Intelligent Computing Service
- Lingjun Cloud Connection
- CPFS storage system
- Container Service for Kubernetes (ACK) for Lingjun
- Elastic Compute Service (ECS) instances
- ApsaraDB RDS
Features
Next-Generation AI Computing Platform with Large-Scale AI Computing Power
Enterprise-Class AI Development Platform
Full-process AI engineering capabilities, including AI development and AI training, with support for AI role management and computing resource management.
One-Stop AI Computing Services
You can activate and manage compute clusters, high-performance storage systems, container services, and AI development platforms with a few clicks, manage their lifecycles, and quickly run AI computing tasks with fully automated O&M.
Easy-to-Use Distributed Computing
With simple configuration, foundation model training tasks are automatically distributed and run concurrently. The optimized computing, network, communication, and storage architectures improve resource utilization and accelerate model training, significantly reducing both cost and training time.
Cluster Management
You can quickly create clusters in the console or by calling API operations, monitor them, and visually troubleshoot host and service errors using a wide range of monitoring metrics, events, and statistics. You can also perform root cause analysis and performance tuning with the associated diagnostic and analysis tools for hosts, networks, and tasks.
RDMA Network
High-performance RDMA networks for computing, storage, and control provide high-performance, highly available access to Alibaba Cloud services, with strong security isolation, minute-level deployment, continuous acceleration, and high reliability.
High-Performance Storage
The parallel I/O architecture improves storage performance. A single cluster supports data throughput of up to 2 TB/s and 30 million IOPS, and can communicate with cloud and on-premises storage systems.
How It Works
Overview
PAI-Lingjun Intelligent Computing Service supports serverless AI development and the training of foundation models such as Stable Diffusion, Llama 2, and Open Pre-trained Transformer (OPT). It provides highly optimized intelligent computing services for image processing (such as generative AI image generation), natural language processing (such as generative AI text generation), audio processing, and video processing, improving AI training performance and efficiency.
Comprehensive Optimization for Higher Efficiency
Ultra-High Throughput and IOPS
For AI training tasks, data is preloaded into persistent storage to ensure high bandwidth for data reads and writes, which improves training efficiency.
High Resource Utilization
Fine-grained partitioning and highly efficient scheduling of GPU resources facilitate collaborative development. The technology has been proven in large-scale use during the Double 11 Global Shopping Festival, where it delivered a threefold increase in resource utilization.
Overview
Lingjun's ultra-large-scale integrated computing power supports the unified deployment and scheduling of deep learning and high-performance computing (HPC) tasks. It also provides standardized, unified computing services for scientific research, medical R&D, engineering simulation, and other fields, promoting innovation, improving efficiency, and facilitating the integration of the AI and HPC ecosystems.
Integrated Development for Innovation
Support for New Scientific Research
Lingjun supports cloud-native AI and HPC application development, and provides unified computing services for scientific research, medical R&D, and engineering simulation. This improves cross-region collaboration, resource utilization, and the integration of technical ecosystems.
Comprehensive Platform for Scientific Research
Based on the RDMA network and Alibaba Cloud's high-performance communication library, Lingjun reduces the point-to-point latency of AI and HPC applications to 2 microseconds and enables tens of thousands of compute nodes to run in parallel, significantly improving the efficiency of large-scale scientific computing.
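The 2-microsecond figure matters most for small messages. A common way to see this is the latency-bandwidth (alpha-beta, or Hockney) model, in which sending n bytes takes roughly T(n) = α + n/β. The sketch below takes α = 2 µs from the text and, as an assumption, β = 800 Gbit/s (100 GB/s) from the RDMA network description; actual communication performance also depends on topology, message pattern, and communication library behavior. The helper names are hypothetical.

```python
# Latency-bandwidth (alpha-beta / Hockney) model for point-to-point messages.
# alpha: 2 microseconds, the point-to-point latency quoted for Lingjun.
# beta:  800 Gbit/s = 100 GB/s, ASSUMED here from the RDMA bandwidth figure.

ALPHA_S = 2e-6                  # fixed per-message latency, seconds
BETA_BYTES_PER_S = 800e9 / 8    # bandwidth, bytes per second


def message_seconds(num_bytes: float) -> float:
    """Estimated time to send one message of num_bytes: alpha + n / beta."""
    return ALPHA_S + num_bytes / BETA_BYTES_PER_S


def crossover_bytes() -> float:
    """Message size at which the latency and bandwidth terms are equal."""
    return ALPHA_S * BETA_BYTES_PER_S


if __name__ == "__main__":
    for size in (1_000, 200_000, 100_000_000):  # 1 KB, 200 KB, 100 MB
        t = message_seconds(size)
        share = ALPHA_S / t  # fraction of the total time spent on fixed latency
        print(f"{size:>11,} bytes: {t * 1e6:10.2f} us, latency share {share:.1%}")
    print(f"crossover ~ {crossover_bytes():,.0f} bytes")
```

Under these assumptions, messages smaller than about 200 KB are dominated by latency, so lowering point-to-point latency mainly helps workloads that exchange many small messages, as tightly coupled HPC and distributed training jobs often do.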
Lingjun accelerates Alibaba Cloud Generative AI
User Guide
- You can buy compute nodes of the required type based on your business requirements.
  1. Log on to the Lingjun Intelligent Computing Service console.
  2. In the left-side navigation pane, choose Cluster and Node > Node Management.
  3. Click Buy a new node to go to the purchase page, and then choose the product specifications.
- A Lingjun node cluster can connect to the public cloud through only one Lingjun Cloud Connection instance.
  1. Log on to the Lingjun Intelligent Computing Service console and go to the purchase page of Lingjun Cloud Connection.
  2. Choose the specifications, click Buy Now, and complete the payment as prompted.
  After purchase, our technical experts will help you complete the network configuration.
- To buy a CPFS storage system:
  1. Go to the purchase page of the CPFS storage system.
  2. Choose the specifications, click Buy Now, and complete the payment as prompted.
  Note: The CPFS storage system must be in the same region as Lingjun Intelligent Computing Service. CPFS storage is billed separately.