New Features

Platform for AI (PAI) - Announcing EAS Computing Power Detection and Fault Tolerance

EAS's computing power detection and fault tolerance feature comprehensively inspects all inference resources, automatically isolates faulty nodes, and triggers automated backend O&M workflows. This reduces the likelihood of issues during initial deployment and improves the success rate of inference services.
Content

Target customers: Customers using AI inference, large language model (LLM) serving, or AIGC applications.

New feature: As the parameter count of Mixture-of-Experts (MoE) models grows from hundreds of billions to trillions, distributed deployment for large model inference is becoming increasingly common. You may encounter the following issues when you deploy distributed services at scale:

  • Resource failures waste time and GPUs: A service can fail to start inference because of faulty underlying resources, even after a time-consuming initialization process such as model loading. This requires manual investigation and redeployment, resulting in significant GPU resource waste.

  • Performance bottlenecks are hard to locate: Performance degradation during inference is often caused by "slow nodes" in the cluster, but there is no efficient way to quickly identify these problematic nodes.

  • Convenient benchmarking tools are lacking: There are no convenient and reliable benchmark tools to test the GPU computing power and network communication performance of instances within a resource group.

To address these issues, EAS and AIMaster provide a SanityCheck feature that inspects the health and performance of computing resources before a distributed inference service is actually deployed. You can enable this feature when you create an EAS distributed inference service. The health check inspects all participating resources, automatically isolates faulty nodes, and triggers automated backend O&M workflows, effectively reducing initial deployment failures and improving the overall success rate. After the check, a report on GPU computing power and communication performance is generated. This report helps you identify issues that could degrade inference performance, improving overall diagnostic efficiency.
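As a rough illustration, enabling a pre-deployment health check could take the form of an extra block in the EAS service configuration JSON. The field names below (`sanity_check`, `enabled`, `isolate_faulty_nodes`, `report`) are illustrative assumptions, not the documented EAS schema; consult the official EAS service configuration reference for the actual parameters.

```json
{
  "metadata": {
    "name": "llm_distributed_inference",
    "instance": 4
  },
  "sanity_check": {
    "enabled": true,
    "isolate_faulty_nodes": true,
    "report": true
  }
}
```

In a sketch like this, the check would run against all four instances before the service starts serving traffic, and the resulting report would cover GPU computing power and inter-node communication performance.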

7th Gen ECS Is Now Available

Increases instance computing power by up to 40% and is fully equipped with TPM chips.
Powered by third-generation Intel® Xeon® Scalable processors (Ice Lake).

  • Sales Support

    1-on-1 presale consultation

  • After-Sales Support

    24/7 Technical Support
    6 Free Tickets per Quarter
    Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.