New Features

Platform for AI (PAI) - The Model Weight Service feature is released.

The Model Weight Service dramatically reduces cold-start and scale-out latency, addressing the long-standing industry challenge of slow model loading and overcoming the efficiency bottleneck in ultra-large-scale LLM deployments.
Content

Target customers: Organizations leveraging AI inference, LLM serving, or AIGC applications.

New Feature / Specification: As the parameter counts of large language models (LLMs) grow exponentially, so do their weight files. For example, the open source model DeepSeekV3-671B exceeds 700 GB in size. Model loading time has therefore become a key bottleneck for the efficiency of inference services. In scenarios such as elastic scaling and multi-instance deployment, long loading times reduce both the agility of service scale-out and the efficiency of model deployment.

To address these challenges, Alibaba Cloud’s PAI platform has launched the Model Weight Service, which significantly reduces cold-start and scale-out latency, resolves the industry-wide issue of slow model loading, and overcomes the efficiency bottleneck in ultra-large-scale LLM deployments.

The core technical features of PAI’s Model Weight Service include:

  • Distributed caching architecture: Leverages node memory to build a shared model weight cache pool.
  • High-speed transfer mechanism: RDMA-based inter-node communication enables low-latency data transmission.
  • Intelligent sharding strategy: An algorithm that dynamically adapts to network topology for optimal weight sharding.
  • Memory-sharing optimization: Enables zero-copy weight sharing across multiple processes on a single node.
  • Intelligent weight prefetching: Proactively loads model weights during idle periods.
  • Efficient caching policy: Ensures balanced distribution of model shards across instances.

Real-world deployments demonstrate significant efficiency gains in ultra-large-scale instance clusters. Compared with the traditional pull mode, scale-out speed increases by 10 times, bandwidth utilization increases by more than 60%, and service cold-start time is shortened to seconds.
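
To make the idea of a balanced distribution of model shards across instances more concrete, the following is a minimal Python sketch of one possible placement policy: a greedy, largest-first assignment that always gives the next shard to the least-loaded instance. This is not PAI's actual algorithm, which this note does not describe. The function names (assign_shards, instance_loads), the 50 GB shard size, and the instance count are all hypothetical and chosen only for illustration.

```python
import heapq


def assign_shards(shard_sizes_gb, num_instances):
    """Hypothetical greedy placement: each shard goes to the least-loaded instance.

    Returns a dict mapping instance index -> list of shard indices.
    This is an illustrative sketch, not the policy used by PAI.
    """
    if num_instances <= 0:
        raise ValueError("num_instances must be positive")

    # Min-heap of (current_load_gb, instance_index); the lightest instance pops first.
    heap = [(0.0, i) for i in range(num_instances)]
    heapq.heapify(heap)
    placement = {i: [] for i in range(num_instances)}

    # Place the largest shards first so smaller ones can even out the remainder.
    ordered = sorted(enumerate(shard_sizes_gb), key=lambda pair: pair[1], reverse=True)
    for shard_id, size_gb in ordered:
        load_gb, instance = heapq.heappop(heap)
        placement[instance].append(shard_id)
        heapq.heappush(heap, (load_gb + size_gb, instance))
    return placement


def instance_loads(shard_sizes_gb, placement):
    """Total cached weight size (GB) held by each instance."""
    return {
        instance: sum(shard_sizes_gb[s] for s in shards)
        for instance, shards in placement.items()
    }


if __name__ == "__main__":
    # Assumed example: a 700 GB model split into 14 equal shards over 4 instances.
    sizes = [50.0] * 14
    plan = assign_shards(sizes, 4)
    for instance, load in sorted(instance_loads(sizes, plan).items()):
        print(f"instance {instance}: {load:.0f} GB in {len(plan[instance])} shards")
```

With this greedy rule, the gap between the most-loaded and least-loaded instance never exceeds the size of the largest shard. A production policy would likely also weigh network topology and available memory, which this sketch ignores.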

Help Document

https://www.alibabacloud.com/help/en/pai/user-guide/model-weight-service?spm=a3c0i.23458820.2359477120.1.6c136e9b9IA5u2
