New Features

Platform for AI (PAI) - EAS launches Prefill-Decode (PD) separation feature

EAS now supports Prefill-Decode (PD) separation in both static and dynamic deployment modes. It works with multiple inference engines, including vLLM, SGLang, and BladeLLM, and can significantly reduce inference latency, especially for large-scale LLM workloads.
Content

Target customers: Designed for customers building LLM-driven applications and services on the EAS platform.

1. Enterprises running high-traffic consumer-facing applications: Improved user experience with reduced Time to First Token (TTFT) and Time Per Output Token (TPOT).
2. Organizations processing long-context workloads: Reduced end-to-end latency for long input sequences.

New Feature/Specification: EAS supports enabling Prefill-Decode (PD) separation during LLM service deployment. This feature divides the inference task into two independent phases, Prefill and Decode, and allocates them to their own computing resources for execution. This significantly improves system throughput while meeting strict latency requirements.
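
The split between the two phases can be illustrated with a small, purely conceptual sketch. It is not the EAS API or any engine's implementation: the names PrefillWorker, DecodeWorker, KVCache, and serve are hypothetical, and the "model" just emits placeholder tokens. The point is only to show where the handoff sits: prefill processes the whole prompt once and produces the first token (which drives TTFT), then decode generates the remaining tokens one at a time from the transferred state (which drives TPOT).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class KVCache:
    """Hypothetical stand-in for the key/value state built during prefill."""
    tokens: List[str] = field(default_factory=list)


class PrefillWorker:
    """Processes the full prompt in one pass and emits the first output token."""

    def run(self, prompt: str) -> Tuple[KVCache, str]:
        cache = KVCache(tokens=prompt.split())
        # Placeholder "model": the token name is derived from the cache length.
        first_token = f"tok{len(cache.tokens)}"
        return cache, first_token


class DecodeWorker:
    """Generates the remaining tokens one step at a time from the cache."""

    def run(self, cache: KVCache, first_token: str, max_new_tokens: int) -> List[str]:
        output = [first_token]
        cache.tokens.append(first_token)
        while len(output) < max_new_tokens:
            next_token = f"tok{len(cache.tokens)}"
            cache.tokens.append(next_token)
            output.append(next_token)
        return output


def serve(prompt: str, max_new_tokens: int = 4) -> List[str]:
    # In a PD-separated deployment the two workers would run on separate
    # computing resources; here they are plain objects in one process.
    prefill, decode = PrefillWorker(), DecodeWorker()
    cache, first_token = prefill.run(prompt)  # handoff point between phases
    return decode.run(cache, first_token, max_new_tokens)


if __name__ == "__main__":
    print(serve("explain prefill decode separation", max_new_tokens=4))
```

Because the two workers share nothing except the cache passed between them, each side could in principle be sized independently, which is the property the feature description relies on.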
