Elastic Algorithm Service (EAS) deploys trained models as online inference services or AI web applications. It combines heterogeneous resource support, automatic scaling, one-click stress testing, canary releases, and real-time monitoring to maintain service stability under high concurrency at lower cost.
Service architecture
[Figure: EAS service architecture diagram]
Core capabilities
EAS covers resource management, model deployment, and service O&M.
Flexible resource and cost management
- Heterogeneous hardware support: Supports CPUs, GPUs, and specialized AI accelerator instances for different model workloads.
- Cost optimization: Use preemptible instances to reduce computing costs. Scheduled scaling lets you set policies based on business cycles for precise resource control.
- Elastic resource pools: When a dedicated resource group reaches capacity, new instances automatically overflow to a public resource group, balancing stability and cost.
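The overflow behavior of an elastic resource pool can be illustrated with a small placement function. This is a hypothetical sketch, not the EAS scheduler: `dedicated_capacity` (how many more instances the dedicated group can hold) and the `"dedicated"`/`"public"` labels are assumptions made for illustration.

```python
def place_instances(requested: int, dedicated_capacity: int) -> list[str]:
    """Assign each new instance to a pool, filling the dedicated group first.

    Hypothetical model of elastic-pool overflow: once the dedicated group is
    full, the remaining instances are placed in the public resource group.
    """
    if requested < 0 or dedicated_capacity < 0:
        raise ValueError("counts must be non-negative")
    on_dedicated = min(requested, dedicated_capacity)
    overflow = requested - on_dedicated
    return ["dedicated"] * on_dedicated + ["public"] * overflow


# Scaling out to 5 instances when the dedicated group has room for only 3.
print(place_instances(5, dedicated_capacity=3))
# ['dedicated', 'dedicated', 'dedicated', 'public', 'public']
```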
Comprehensive stability and high availability
- Elastic scaling: Adjusts service replicas based on real-time load to handle traffic spikes and prevent resource idling or overload.
- High-availability mechanism: Automatic fault recovery ensures service continuity. Dedicated resources provide physical isolation with no resource contention.
- Safe releases: Canary releases route a percentage of traffic to a new version for validation. Traffic mirroring copies live traffic to a test service without affecting real requests.
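Canary routing and traffic mirroring can both be viewed as a per-request decision. The sketch below is illustrative only: the version names, the `route` function, and the percentage-based split are hypothetical and do not describe how EAS implements either feature. A canary weight of 10 sends roughly 10% of requests to the new version, and mirroring adds a shadow target whose response would be discarded, so callers only ever see the primary response.

```python
import random
from typing import Optional, Tuple


def route(canary_percent: int, mirror: bool,
          rng: random.Random) -> Tuple[str, Optional[str]]:
    """Pick the version that serves a request and an optional mirror target.

    Hypothetical names: "stable" is the current version, "canary" the new one,
    and "shadow" a test service that only receives copies of live traffic.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be in [0, 100]")
    # Weighted split: about canary_percent% of requests go to the new version.
    primary = "canary" if rng.randrange(100) < canary_percent else "stable"
    # The mirrored copy's response would be discarded; callers only see `primary`.
    shadow = "shadow" if mirror else None
    return primary, shadow


rng = random.Random(42)  # fixed seed for a reproducible demo
counts = {"stable": 0, "canary": 0}
for _ in range(10_000):
    primary, _shadow = route(10, mirror=True, rng=rng)
    counts[primary] += 1
print(counts)  # roughly 9,000 stable and 1,000 canary
```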
Efficient deployment and O&M
- One-click stress testing: Gradually increases load to find the service's performance limits. Provides real-time, per-second monitoring and stress test reports.
- Real-time monitoring: Tracks key metrics such as QPS, response time, and CPU utilization. Supports monitoring alerts for service health.
- Multiple deployment methods: Deploy services using a runtime image (recommended) or a processor to suit different technology stacks.
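The idea behind a load-ramping stress test can be sketched as a loop that raises concurrency step by step and stops at the first level that breaks a latency or error-rate target. This is a hypothetical illustration, not the EAS stress-testing tool: `measure` stands in for whatever actually drives load and reports p99 latency (ms) and error rate at a given concurrency, and the thresholds are example values.

```python
from typing import Callable, Tuple

Measurement = Tuple[float, float]  # (p99 latency in ms, error rate in [0, 1])


def find_capacity(
    measure: Callable[[int], Measurement],
    start: int = 1,
    step: int = 1,
    max_concurrency: int = 1024,
    p99_limit_ms: float = 200.0,
    max_error_rate: float = 0.01,
) -> int:
    """Return the highest tested concurrency that met both targets (0 if none)."""
    if step <= 0:
        raise ValueError("step must be positive")
    best = 0
    concurrency = start
    while concurrency <= max_concurrency:
        p99_ms, error_rate = measure(concurrency)
        if p99_ms > p99_limit_ms or error_rate > max_error_rate:
            break  # first level that violates a target: stop ramping
        best = concurrency
        concurrency += step
    return best


# Toy model of a service whose latency grows linearly with concurrency.
def fake_measure(concurrency: int) -> Measurement:
    return 20.0 + 5.0 * concurrency, 0.0


print(find_capacity(fake_measure, start=4, step=4))  # 36
```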
Diverse inference modes
- Real-time synchronous inference: High throughput and low latency. Suitable for latency-sensitive scenarios such as search, recommendation, and chatbots.
- Near real-time asynchronous inference: Built-in message queue for time-consuming tasks such as text-to-image generation and video processing. Supports automatic scaling based on queue backlog.
- Offline batch inference: For latency-tolerant batch tasks such as voice data conversion. Supports preemptible instances to reduce costs.
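Near real-time asynchronous inference decouples submitting a request from processing it. The sketch below mimics that pattern in a single process with Python's standard `queue` and `threading` modules, and adds a backlog-based replica estimate. It is not the EAS queue service API: `submit`, `worker`, `tasks_per_replica`, and the replica bounds are hypothetical, and upper-casing a string stands in for a slow model call.

```python
import math
import queue
import threading
import uuid

tasks: queue.Queue = queue.Queue()  # pending (request_id, prompt) pairs
results: dict[str, str] = {}
results_lock = threading.Lock()


def submit(prompt: str) -> str:
    """Enqueue a request and return immediately with a request ID."""
    request_id = str(uuid.uuid4())
    tasks.put((request_id, prompt))
    return request_id


def worker() -> None:
    """Consume tasks forever; upper-casing stands in for slow inference."""
    while True:
        request_id, prompt = tasks.get()
        output = prompt.upper()  # placeholder for, e.g., image generation
        with results_lock:
            results[request_id] = output
        tasks.task_done()


def desired_replicas(backlog: int, tasks_per_replica: int = 10,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Scale on queue backlog: one replica per `tasks_per_replica` queued tasks."""
    wanted = math.ceil(backlog / tasks_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


ids = [submit(f"job {i}") for i in range(25)]
print(desired_replicas(tasks.qsize()))  # 3 replicas for a backlog of 25
threading.Thread(target=worker, daemon=True).start()
tasks.join()  # block until every queued task has been processed
print(results[ids[0]])  # JOB 0
```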
How it works (runtime image deployment)
An EAS service runs in one or more isolated container instances. The startup process involves these core elements:
- Runtime image: A read-only template containing an OS, base libraries (such as CUDA), a language environment (such as Python), and dependencies. Use an official PAI image or create a custom image.
- Code and model: Your business logic and model files. Store them in Object Storage Service (OSS) or File Storage NAS to decouple code from the environment and update without rebuilding the image.
- Storage mounting: At startup, EAS mounts the specified external storage path to a local directory in the container, making remote files accessible as local files.
- Run command: The command executed after container startup, typically to start an HTTP service that receives inference requests.
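As an illustration of what a run command might start, here is a minimal HTTP inference server that uses only Python's standard library. It is a sketch under assumptions: the `PORT` environment variable, the 8000 default, the `{"features": [...]}` request shape, the `app.py` file name, and the toy `predict` function are hypothetical placeholders for your own code and the model files you mount into the container.

```python
import json
import os
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import Request, urlopen


def predict(features: list) -> float:
    """Toy stand-in for inference; real code would load model files from the
    mounted directory at startup and run them here."""
    return float(sum(features))


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            body, status = {"prediction": predict(payload["features"])}, 200
        except (ValueError, KeyError, TypeError) as exc:
            body, status = {"error": str(exc)}, 400  # malformed request
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(port: int) -> ThreadingHTTPServer:
    # Listen on all interfaces so requests from outside the container arrive.
    return ThreadingHTTPServer(("0.0.0.0", port), InferenceHandler)


def main() -> None:
    """What a run command such as `python app.py` would invoke in the container."""
    make_server(int(os.environ.get("PORT", "8000"))).serve_forever()


if __name__ == "__main__":
    # Local self-test instead of main(): serve on a free port, send one request.
    server = make_server(0)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    request = Request(url, data=json.dumps({"features": [1, 2, 3]}).encode("utf-8"),
                      headers={"Content-Type": "application/json"})
    with urlopen(request) as resp:
        print(json.loads(resp.read()))  # {'prediction': 6.0}
    server.shutdown()
    server.server_close()
```

Binding to `0.0.0.0` rather than `127.0.0.1` matters in a container, because requests arrive from outside the container's loopback interface.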
Startup process:
- The service pulls the specified runtime image to create a container.
- It then mounts the external storage to the specified path in the container.
- It then executes the run command inside the container.
- After the command runs successfully, the service listens on the specified port and processes inference requests.
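Because a service can only process requests once its process is listening on the port, polling the port is a common way for your own tooling to tell when startup has finished. The helper below is a generic sketch built on Python's `socket` module, not an EAS interface; the host, port, timeout, and polling interval are assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0,
                  interval_s: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True  # something is listening: startup has finished
        except OSError:
            time.sleep(interval_s)  # not listening yet; retry shortly
    return False


# Demo: a listening socket counts as ready; a closed port does not.
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
print(wait_for_port("127.0.0.1", port, timeout_s=2))  # True
listener.close()
print(wait_for_port("127.0.0.1", port, timeout_s=1))  # False
```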
EAS supports two deployment methods: runtime image deployment and processor deployment. Runtime image deployment is recommended for its flexibility and maintainability. Processor deployment has known limitations regarding environments and frameworks.
Usage flow
Step 1: Preparations
- Prepare inference resources: Select an EAS resource type based on your model size, concurrency requirements, and budget. For resource selection guidance, see Overview of EAS deployment resources.
  Note: You must purchase dedicated EAS resources or Lingjun resources before use.
- Prepare files: Upload your trained model, code, and dependencies to a cloud storage service such as OSS. Access these files in your service through storage mounting.
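When choosing a resource type by model size, a quick first check is whether the model weights alone fit in accelerator memory: parameter count times bytes per parameter. The functions below are a rough, hypothetical rule of thumb rather than an EAS sizing tool. They ignore activations, KV cache, and framework overhead, which is why `fits` multiplies by an assumed 20% headroom factor; real requirements can be considerably higher.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits(num_params: float, gpu_memory_gb: float, dtype: str = "fp16",
         headroom: float = 1.2) -> bool:
    """Check weights plus an assumed 20% headroom against device memory."""
    return weight_memory_gb(num_params, dtype) * headroom <= gpu_memory_gb


print(weight_memory_gb(7e9))        # 14.0 GB for a 7B-parameter model in fp16
print(fits(7e9, gpu_memory_gb=16))  # False: about 16.8 GB needed with headroom
print(fits(7e9, gpu_memory_gb=24))  # True
```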
Step 2: Deploy the service
Deploy and manage services using the console, the EASCMD command line, or an SDK.
- Console: Provides custom deployment and scenario-based deployment. Suitable for beginners.
- EASCMD command line: Create, update, and manage services. Suitable for algorithm engineers familiar with EAS.
- SDK: Suitable for large-scale, unified scheduling and O&M.
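Command-line and SDK deployments describe a service in JSON. The sketch below assembles such a description in Python, tying together the runtime-image elements described earlier (image, mounted storage, run command, port). The field names (`metadata`, `containers`, `script`, `port`, `storage`, `mount_path`) follow the shape of commonly published EAS configuration examples, but treat them, and every value here (service name, image URL, OSS path, mount path, command), as assumptions to verify against the official service configuration reference. Resource settings such as instance type are deliberately omitted.

```python
import json


def build_service_config(name: str, image: str, oss_path: str, mount_path: str,
                         command: str, port: int, instances: int = 1) -> dict:
    """Assemble a runtime-image service description (field names assumed)."""
    return {
        "metadata": {"name": name, "instance": instances},
        # Run command and listening port of the container.
        "containers": [{"image": image, "script": command, "port": port}],
        # Mount an OSS path into the container as a local directory.
        "storage": [{"oss": {"path": oss_path}, "mount_path": mount_path}],
    }


config = build_service_config(
    name="demo_service",                     # hypothetical
    image="<your-runtime-image-url>",        # placeholder
    oss_path="oss://<your-bucket>/models/",  # placeholder
    mount_path="/mnt/data/",                 # hypothetical
    command="python /mnt/data/app.py",       # hypothetical
    port=8000,
)
with open("service.json", "w") as f:
    json.dump(config, f, indent=2)
print(json.dumps(config, indent=2))
```

A file like this could then be passed to EASCMD's create command; check the EASCMD reference for the exact syntax and for any required fields this sketch leaves out.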
Step 3: Invoke and stress test the service
- Web application: If deployed as an AI-Web application, open the interactive page in a browser to test.
- API service: Use online debugging to verify functionality, or make synchronous/asynchronous calls through the API. For more information, see Service invocation.
- Service stress testing: Use the built-in one-click stress testing tool to evaluate performance under load. For more information, see Service stress testing.
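A synchronous API call is an HTTP request to the service endpoint that carries the service token. The sketch below builds such a request with Python's standard `urllib.request`. The endpoint URL and token are placeholders to copy from the service's invocation details in the console, the JSON body matches the toy server shape assumed elsewhere, and passing the token in the `Authorization` header reflects a common EAS calling pattern that you should confirm in the Service invocation documentation. Nothing is sent unless you call `call()` yourself.

```python
import json
from urllib.request import Request, urlopen

ENDPOINT = "http://<your-endpoint>/api/predict/<service_name>"  # placeholder
TOKEN = "<your-service-token>"                                  # placeholder


def build_request(payload: dict, endpoint: str = ENDPOINT,
                  token: str = TOKEN) -> Request:
    """Create a synchronous POST request carrying the token and a JSON body."""
    return Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


def call(payload: dict, timeout_s: float = 10.0) -> bytes:
    """Send the request and return the raw response body."""
    with urlopen(build_request(payload), timeout=timeout_s) as resp:
        return resp.read()


req = build_request({"features": [1, 2, 3]})
print(req.get_method(), req.full_url)
```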
Step 4: Monitor and manage the service
- Monitoring and alerts: View service status in the Inference Services list. Enable service monitoring alerts to track health in real time.
- Elastic scaling: Configure automatic scaling or scheduled scaling policies to dynamically manage compute resources.
- Service updates: In the Actions column, click Update to deploy a new version. After the update, view version information or switch between versions.
  Warning: Services are temporarily interrupted during updates, which may cause dependent requests to fail. Proceed with caution.
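Automatic and scheduled scaling can be thought of as one replica calculation: a target-tracking rule sizes the service to current load, and a time-based floor reserves capacity for known busy hours. This is a hypothetical illustration of the two policy types, not the EAS scaling algorithm; the per-replica QPS target, the 09:00 to 21:59 schedule, and the replica bounds are example assumptions.

```python
import math
from datetime import datetime


def target_replicas(current_qps: float, qps_per_replica: float,
                    min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Target tracking: enough replicas that each handles about qps_per_replica."""
    wanted = math.ceil(current_qps / qps_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical schedule: keep at least 4 replicas from 09:00 to 21:59.
SCHEDULED_FLOOR = {hour: 4 for hour in range(9, 22)}


def replicas_at(now: datetime, current_qps: float, qps_per_replica: float) -> int:
    """Combine the scheduled floor with the load-based target."""
    floor = SCHEDULED_FLOOR.get(now.hour, 1)
    return max(floor, target_replicas(current_qps, qps_per_replica))


print(replicas_at(datetime(2024, 1, 1, 3), current_qps=50, qps_per_replica=100))    # 1
print(replicas_at(datetime(2024, 1, 1, 10), current_qps=50, qps_per_replica=100))   # 4
print(replicas_at(datetime(2024, 1, 1, 10), current_qps=650, qps_per_replica=100))  # 7
```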
Important notes
- If an EAS service remains in a non-Running state for 180 consecutive days, the system automatically deletes it.
- For information about the regions where EAS is available, see Regions and zones.
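The 180-day rule is simple date arithmetic that you can track in your own O&M tooling. The helpers below are a hypothetical sketch: they assume you record when a service left the Running state yourself, and that the count restarts whenever the service returns to Running, since the rule applies to consecutive days.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=180)  # the 180-consecutive-day rule


def deletion_due(non_running_since: datetime) -> datetime:
    """When a service that has been non-Running since this time hits 180 days."""
    return non_running_since + RETENTION


def days_left(non_running_since: datetime, now: datetime) -> int:
    """Whole days remaining before that point (0 once it has passed)."""
    return max(0, (deletion_due(non_running_since) - now).days)


stopped = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(deletion_due(stopped).date())  # 2024-06-29
print(days_left(stopped, datetime(2024, 6, 1, tzinfo=timezone.utc)))  # 28
```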
Billing
For billing details, see Billing of Elastic Algorithm Service (EAS).
Quick Start
For more information, see Deploy and call a model service on EAS.
Scenarios
- LLM: Deploy a large language model (LLM) | Deploy an MoE model based on expert parallelism and PD separation
- AIGC: AI video generation - Deploy ComfyUI | AI art - Deploy SD-WebUI
- Others: Best practices for accessing a dedicated gateway across VPCs using CEN | Best practices for PAI-EAS Spot Instances
FAQ
Q: What is the difference between dedicated resources and public resources?
- Public resources: Suitable for development, testing, or small-scale applications that are cost-sensitive and can tolerate performance fluctuations. Low-cost but may experience resource contention during peak hours.
- Dedicated resources: Suitable for production services requiring high stability and performance. Physically isolated, which eliminates the risk of preemption. The elastic resource pool allows traffic to overflow to public resources when dedicated capacity is full, balancing cost and stability during peak hours. To reserve instance types with limited inventory, purchase them as dedicated resources.
Q: What are the advantages of EAS compared to self-managed services?
EAS provides managed O&M with automatic resource scheduling, fault recovery, monitoring, elastic scaling, and canary releases. This lets developers focus on model development, reducing O&M costs and accelerating time to market.