PAI-Lingjun Intelligent Computing Service

A comprehensive AI computing platform for high-performance computing tasks, such as foundation model training

Contact Sales

Please contact sales for purchasing and consultation.

Overview
Editions
Features
Scenarios
User Guide

Overview

Why PAI-Lingjun Intelligent Computing Service

PAI-Lingjun Intelligent Computing Service is a PaaS offering for large-scale deep learning and integrated intelligent computing. It is available as a Serverless Edition on the Alibaba Cloud public cloud and as an Exclusive Edition. Built on integrated software and hardware optimization, it provides a high-performance heterogeneous computing base and AI process engineering capabilities. Its core benefits are high performance, high efficiency, and high resource utilization, which meet the high-performance computing requirements of workloads such as large model training, autonomous driving, scientific research, and finance.

Serverless

Lingjun Serverless Edition can help you quickly set up and run AI computing tasks. It manages complex heterogeneous systems based on automatic operations and maintenance (O&M), and seamlessly integrates with Alibaba Cloud computing, storage, and network services.

High-Performance RDMA Network

Alibaba Cloud’s high-performance Remote Direct Memory Access (RDMA) networks greatly accelerate AI training, with high-speed and low-latency transmission at 800 Gbit/s and GPU direct connection technologies that improve transmission stability and security.

Efficient CPFS Storage System

Cloud Parallel File Storage (CPFS) uses a fully parallel storage architecture and supports the POSIX, MPI-IO, and Network File System (NFS) protocols. A single cluster supports data throughput of up to 2 TB/s and 30 million IOPS, providing efficient and reliable storage services for AI training.
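
To put the quoted peak figures in perspective, the following minimal sketch converts the 800 Gbit/s RDMA rate and the 2 TB/s CPFS throughput into transfer times. The 10 TB dataset size is a hypothetical value chosen for illustration. The sketch also assumes decimal units and that the peak rates are sustained for the whole transfer, which real workloads may not achieve.

```python
# Back-of-the-envelope arithmetic for the peak figures quoted above.
# Assumptions (not from the product page): decimal units (1 TB = 1000 GB),
# peak rates sustained end to end, and a hypothetical 10 TB dataset.

RDMA_LINK_GBIT_PER_S = 800      # quoted RDMA transmission rate
CPFS_THROUGHPUT_TB_PER_S = 2    # quoted per-cluster CPFS throughput
DATASET_TB = 10                 # hypothetical dataset size


def gbit_to_gbyte_per_s(gbit_per_s: float) -> float:
    """Convert a rate in gigabits per second to gigabytes per second."""
    return gbit_per_s / 8


def transfer_seconds(size_tb: float, rate_gbyte_per_s: float) -> float:
    """Ideal time in seconds to move size_tb terabytes at the given rate."""
    return size_tb * 1000 / rate_gbyte_per_s


rdma_gbyte_per_s = gbit_to_gbyte_per_s(RDMA_LINK_GBIT_PER_S)
cpfs_gbyte_per_s = CPFS_THROUGHPUT_TB_PER_S * 1000

print(f"RDMA rate: {rdma_gbyte_per_s} GB/s")
print(f"{DATASET_TB} TB over RDMA: {transfer_seconds(DATASET_TB, rdma_gbyte_per_s)} s")
print(f"{DATASET_TB} TB from CPFS: {transfer_seconds(DATASET_TB, cpfs_gbyte_per_s)} s")
```

Under these idealized assumptions, 800 Gbit/s corresponds to 100 GB/s, so the hypothetical dataset would take about 100 seconds over one such link and about 5 seconds at the aggregate CPFS peak.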

Comprehensive AI Acceleration

Our distributed training acceleration engine provides dataset acceleration, computing acceleration, algorithm optimization, scheduling algorithms, and resource optimization. This ensures that computing power is fully utilized, improving the speed and efficiency of AI training and inference.

Editions

PAI-Lingjun Intelligent Computing Service Serverless Edition

The Serverless Edition provides a flexible and cost-effective option. You can purchase network and storage services based on your business needs and scale the services with a few clicks. After purchasing the compute nodes, you can achieve zero-cost O&M without the need to plan CPU management nodes.

Core Components:
GPUs for PAI-Lingjun Intelligent Computing Service
CPFS storage system

PAI-Lingjun Intelligent Computing Service Exclusive Edition

The Exclusive Edition allows you to create exclusive clusters on Alibaba Cloud. It provides an AI platform and O&M services dedicated to your business, along with convenient operations management based on the standard, interconnected computing, storage, and network services on Alibaba Cloud.

Core Components:
GPUs for PAI-Lingjun Intelligent Computing Service
Lingjun Cloud Connection
CPFS Storage System
Container Service for Kubernetes (ACK) for Lingjun
Elastic Compute Service (ECS) Instances
ApsaraDB RDS

Features

Next-generation AI Computing Platform That Provides Large-scale AI Computing Power

Enterprise-Class AI Development Platform

Full-process AI engineering capabilities, such as AI development and AI training, with support for AI role management and computing resource management.

One-Stop AI Computing Services

You can activate and manage compute clusters, high-performance storage systems, container services, and AI development platforms with a few clicks, perform lifecycle management, and quickly run AI computing tasks with fully automated O&M.

Easy-to-Use Distributed Computing

Foundation model training tasks can be distributed to run automatically and concurrently after simple configuration. The optimized computing, network, communication, and storage architectures improve resource utilization and accelerate model training, significantly reducing cost and time.

Cluster Management

You can quickly create clusters in the console or by calling API operations, monitor clusters, and troubleshoot host and service errors in a visualized manner with a wide range of monitoring metrics, events, and statistics. You can also perform root cause analysis and performance tuning with associated diagnostics and analysis tools for hosts, networks, and tasks.

RDMA Network

High-performance RDMA computing, storage, and control networks enable high-performance and high-availability access to Alibaba Cloud services, with features including strong security isolation, minute-level deployment, continuous acceleration, and high reliability.

High-Performance Storage

The parallel I/O architecture improves storage performance. A single cluster supports data throughput of up to 2 TB/s and 30 million IOPS, and can communicate with cloud and on-premises storage systems.

How It Works

Overview

PAI-Lingjun Intelligent Computing Service supports AI development with serverless computing, and training tasks of foundation models such as Stable Diffusion, Llama 2, and Open Pre-trained Transformer (OPT). It provides highly optimized intelligent computing services for image processing (such as image generation based on generative AI), natural language processing (such as text generation based on generative AI), audio processing, and video processing, to improve AI training performance and efficiency.

Comprehensive Optimization for Higher Efficiency

Ultra-High Throughput and IOPS

For AI training tasks, data is pre-loaded to persistent storage to ensure high bandwidth for data loading and writing, improving training efficiency.

High Resource Utilization

Fine-grained segmentation and highly efficient scheduling of GPU resources facilitate collaborative development. The technology has been verified in large-scale applications during the Double 11 Global Shopping Festival, with a threefold increase in resource utilization.

Overview

The ultra-large-scale integrated computing power supports the unified deployment and scheduling of deep learning and high-performance computing tasks. It also provides unified, standard computing services for fields such as scientific research, medical R&D, and engineering simulation, promoting innovation, improving efficiency, and facilitating the integration of the AI and HPC ecosystems.

Integrated Development for Innovation

Support for New Scientific Research

Lingjun supports cloud-native AI and HPC application development, and provides unified computing services for scientific research, medical R&D, and engineering simulation. This improves cross-region collaboration, resource utilization, and the integration of technical ecosystems.

Comprehensive Platform for Scientific Research

Based on the RDMA network and Alibaba Cloud's high-performance communication library, Lingjun reduces the point-to-point latency of AI and HPC applications to 2 microseconds and enables up to tens of thousands of compute nodes to run in parallel, significantly improving the efficiency of large-scale scientific computing.

Lingjun accelerates Alibaba Cloud Generative AI

User Guide

Buy Compute Nodes

You can buy compute nodes of the required type based on your business requirements.
1. Log on to the Lingjun Intelligent Computing Service console.
2. In the left-side navigation pane, choose Cluster and Node > Node Management.
3. Click Buy a new node to go to the purchase page, then choose product specifications.

Buy Lingjun Cloud Connection

A Lingjun node cluster can connect to the public cloud through only one Lingjun Cloud Connection instance.
1. Log on to the Lingjun Intelligent Computing Service console and go to the purchase page of Lingjun Cloud Connection.
2. Choose specifications, then click Buy Now and complete the payment as prompted.

Buy CPFS Storage

Our technical experts will help you complete network configuration after purchase.
1. Go to the purchase page of the CPFS storage system.
2. Choose specifications, then click Buy Now and complete the payment as prompted.
Note: The CPFS storage system must be in the same region as Lingjun Intelligent Computing Service. CPFS storage is billed separately.
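
The two constraints stated in this guide (a node cluster connects through only one Lingjun Cloud Connection instance, and CPFS must be in the same region as the Lingjun service) can be checked before you place an order. The following is a minimal sketch only: the `PurchasePlan` class, the `validate_plan` function, and the instance IDs are hypothetical names invented for illustration and are not part of any Alibaba Cloud SDK or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PurchasePlan:
    """Hypothetical description of a planned purchase (illustration only)."""
    cluster_region: str                      # region of the Lingjun node cluster
    cpfs_region: str                         # region of the CPFS storage system
    cloud_connection_ids: List[str] = field(default_factory=list)


def validate_plan(plan: PurchasePlan) -> List[str]:
    """Return a list of constraint violations; an empty list means none found."""
    problems = []
    # Constraint from the note: CPFS must be in the same region as Lingjun.
    if plan.cpfs_region != plan.cluster_region:
        problems.append(
            f"CPFS region {plan.cpfs_region!r} differs from "
            f"cluster region {plan.cluster_region!r}"
        )
    # Constraint from the guide: only one Lingjun Cloud Connection per cluster.
    if len(plan.cloud_connection_ids) > 1:
        problems.append("more than one Lingjun Cloud Connection instance planned")
    return problems


# Example with placeholder values: both constraints are violated here.
plan = PurchasePlan("cn-hangzhou", "cn-shanghai", ["lcc-a", "lcc-b"])
for problem in validate_plan(plan):
    print("Problem:", problem)
```

A check like this only mirrors the rules written on this page. The purchase pages and the sales team remain the authority on what a given account can actually buy.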

Upgraded Support For You

1 on 1 Presale Consultation, 24/7 Technical Support, Faster Response, and More Free Tickets.

1 on 1 Presale Consultation

Consulting by experienced cloud experts.

24/7 Technical Support

Service time extended from 10 hours a day, 5 days a week to 24/7.

6 Free Tickets per Quarter

The number of free tickets doubled from 3 to 6 per quarter.

Faster Response

After-sales response time shortened from 36 hours to 18 hours.
