Once a model is trained, you can use Elastic Algorithm Service (EAS) to quickly deploy it as an online inference service or an AI web application. EAS supports heterogeneous resources and combines features such as automatic scaling, one-click stress testing, canary releases, and real-time monitoring to keep services stable and continuously available in high-concurrency scenarios at a lower cost.
Billing
Billing overview
When you use EAS to deploy services, you may be charged for computing resources, system disks, and dedicated gateways.
Computing resources: public resources, dedicated resources, and Lingjun resources.
(Optional) System disks: Free quotas are included: 30 GB for each instance that uses public resources and 200 GB for each instance that uses dedicated resources. You are charged for any additional system disk capacity.
(Optional) Dedicated gateways: By default, a free shared gateway is used. If you require security isolation, access control, or custom domain names, you can purchase a dedicated gateway, which you must configure manually.
EAS provides the following billing methods:
Pay-as-you-go: You are charged based on service run time (not the number of requests). This billing method is suitable for scenarios with uncertain or fluctuating demand.
Subscription: This billing method is more cost-effective and is suitable for long-term, stable workloads.
EAS also provides Stable Diffusion web UI Serverless Edition and ComfyUI Serverless Edition, which are free to deploy: you are charged only for the actual inference duration when the service is called.
If you use other Alibaba Cloud services, such as Elastic IP Address (EIP), Object Storage Service (OSS), and File Storage NAS, those services incur separate fees.
For more information, see Billing of Elastic Algorithm Service (EAS).
Usage workflow
Step 1: Prepare
Prepare inference resources
Choose the appropriate EAS resource type based on your model size, concurrency requirements, and budget. Dedicated EAS resources or Lingjun intelligent computing resources must be purchased before use. For more information about resource selection and purchase, see Overview of EAS deployment resources.
Prepare model and code files
Prepare your trained model, code files, and other dependencies. Upload these files to a designated cloud storage service, such as Object Storage Service (OSS). You can then access the data required for service deployment by using storage mounting.
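For example, you can upload the files with the oss2 Python SDK. The following is a minimal sketch; the bucket name, endpoint, object paths, and environment variable names are placeholders, not values from this document.

```python
import os

import oss2

# Credentials are read from environment variables in this sketch;
# in production, consider RAM roles or STS tokens instead.
auth = oss2.Auth(os.environ["OSS_ACCESS_KEY_ID"], os.environ["OSS_ACCESS_KEY_SECRET"])

# Endpoint and bucket name are placeholders for your own region and bucket.
bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "my-model-bucket")

# Upload the trained model and its inference code so that EAS can later
# access them through storage mounting.
bucket.put_object_from_file("models/my_model/model.pt", "local/model.pt")
bucket.put_object_from_file("models/my_model/inference.py", "local/inference.py")
```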
Step 2: Deploy the service
Deployment tools: You can deploy and manage services by using the PAI-EAS console, the EASCMD command line, or an SDK.
Console: Provides custom deployment and scenario-based deployment methods. The console is user-friendly and ideal for beginners.
EASCMD command-line tool: Supports service creation, updates, viewing, and more. It is suitable for algorithm engineers familiar with EAS deployment.
SDK: Suitable for large-scale, unified scheduling and O&M.
Deployment methods: Supports image-based deployment (recommended) and processor-based deployment. For the differences, see Deployment principles. A minimal configuration sketch follows this list.
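As a rough illustration of image-based deployment, the following Python sketch writes out a service.json file that could then be passed to EASCMD. The field names and values here are assumptions for illustration only; verify the exact schema against the EAS documentation before use.

```python
# Hypothetical minimal service description for image-based deployment.
# All names, images, and resource figures below are placeholders.
import json

service = {
    "name": "my_llm_service",  # service name (placeholder)
    "containers": [{
        "image": "registry.example.com/my/inference:latest",  # your image
        "port": 8000,                  # port your inference server listens on
        "script": "python app.py",     # container start command
    }],
    "metadata": {
        "instance": 1,                 # number of service instances
        "cpu": 4,                      # vCPUs per instance
        "memory": 16000,               # memory per instance, in MB
    },
}

with open("service.json", "w") as f:
    json.dump(service, f, indent=2)

# The file could then be deployed with, for example:  eascmd create service.json
```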
Step 3: Invoke and test the service
Deploy the model as a WebUI application: Open an interactive page in your browser from the console to directly experience the model's capabilities.
Deploy the model as an API service:
You can use online service debugging to send HTTP requests and verify that inference works as expected.
Make synchronous or asynchronous calls via an API. EAS supports multiple service invocation methods, including through a shared gateway, a dedicated gateway, and high-speed direct connections (see the invocation sketch after this list).
Use the built-in universal stress testing tool in EAS to perform one-click stress testing on the deployed service. This helps you evaluate the service's performance under pressure and understand its inference capacity. For more information, see Automatic stress testing.
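The following sketch shows a plain HTTP call to a deployed service using the Python requests library. It assumes an endpoint and token copied from the service's invocation information in the console; both values below are placeholders, and the request body format depends on your model server.

```python
# Minimal sketch of invoking a deployed EAS service over HTTP.
import requests

# Placeholders: copy the real values from the service's invocation information.
ENDPOINT = "http://<your-service>.<region>.pai-eas.aliyuncs.com/api/predict/<service_name>"
TOKEN = "<your-service-token>"

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": TOKEN},   # the service token authenticates the call
    data=b'{"prompt": "hello"}',        # body format depends on your model server
    timeout=30,
)
resp.raise_for_status()
print(resp.content)
```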
Step 4: Monitor and scale the service
After the service is running, enable service monitoring and alerting to stay informed about resource usage, performance metrics, and potential anomalies, ensuring the service runs smoothly.
Enable horizontal or scheduled auto-scaling to achieve real-time, dynamic management of online service compute resources. For more information, see Auto Scaling.
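As an illustration only, a horizontal auto-scaling policy might look like the sketch below. The field names (min, max, strategies.qps) are assumptions used to convey the idea, not the confirmed schema; see Auto Scaling for the exact format and how to apply it in the console or with EASCMD.

```python
# Hypothetical auto-scaling policy sketch; field names are assumptions.
import json

autoscale_policy = {
    "min": 1,            # lower bound on instance count during quiet periods
    "max": 5,            # upper bound reached under peak load
    "strategies": {
        "qps": 10,       # scale out when per-instance QPS exceeds this target
    },
}

print(json.dumps(autoscale_policy, indent=2))
```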
Step 5: Use asynchronous inference services
For time-consuming requests, such as text-to-image generation or video processing, enable Asynchronous inference services. A queue service receives requests, and after processing, the results are written to an output queue. The client then asynchronously queries the results. This prevents request backlogs and data loss, improving system throughput. EAS supports automatic scaling based on queue backlog to intelligently adjust the number of instances. For more information, see Asynchronous inference services.
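A sketch of this pattern, assuming the QueueClient interface from the eas-prediction Python SDK: the client submits a request to the input queue, then watches the output ("sink") queue for the result. The endpoint, service name, and token below are placeholders.

```python
# Sketch of asynchronous invocation, assuming the eas-prediction SDK's
# QueueClient interface; verify against the Asynchronous inference docs.
from eas_prediction import QueueClient

ENDPOINT = "http://<your-endpoint>.pai-eas.aliyuncs.com"  # placeholder
TOKEN = "<your-service-token>"                            # placeholder

# Input queue: where inference requests are submitted.
input_queue = QueueClient(ENDPOINT, "<service_name>")
input_queue.set_token(TOKEN)
input_queue.init()

# Output (sink) queue: where the service writes finished results.
sink_queue = QueueClient(ENDPOINT, "<service_name>/sink")
sink_queue.set_token(TOKEN)
sink_queue.init()

# Submit a request; the call returns immediately with an identifier
# instead of blocking until inference finishes.
index, request_id = input_queue.put(b'{"prompt": "a cat in space"}')

# Watch the sink queue and read back the first result frame.
watcher = sink_queue.watch(0, 5, auto_commit=True)
for frame in watcher.run():
    print(frame.data)
    break
```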
Step 6: Update the service
In the inference service list, click Update in the Actions column of the target service to update the service version.
The service is temporarily interrupted during an update, which can cause requests that depend on it to fail. Proceed with caution.
After the service update is complete, click the current version to view Version Information or switch the service version.
Quick Start
Scenarios and examples
LLM: Deploy large language models (LLMs) | Deploy MoE models using expert parallelism and PD separation
AIGC (AI-Generated Content): AI video generation - ComfyUI deployment | AI art - SDWebUI deployment
Others: Best practices for accessing a dedicated gateway across VPCs using CEN | Best practices for PAI-EAS Spot instances
FAQ
Q: What's the difference between dedicated and public resources in EAS?
The primary difference lies in performance isolation, cost, and availability guarantees.
Public resources: Use these for development, testing, or small-scale applications where cost is a primary concern and some performance fluctuation is acceptable. These are shared resources, so you might experience resource contention during peak hours.
Dedicated resources: Use these for production-level, core business applications that demand high stability and performance. These resources are physically isolated, eliminating the risk of preemption. Purchasing dedicated resources also lets you lock in specific instance types that have limited inventory.
The Elastic Resource Pool feature provides a hybrid approach: if your dedicated resources are fully utilized, EAS can automatically scale out to public resources to handle traffic spikes, balancing cost with service stability.
Q: Why should I use EAS instead of self-managing my model inference services?
EAS is a fully managed service that handles the operational overhead of deploying and maintaining model inference infrastructure.
By using EAS, you offload the following tasks:
Resource scheduling, fault recovery, and real-time monitoring.
Implementing complex features like auto-scaling and canary releases from scratch.
This allows your team to focus on model development rather than infrastructure management, which reduces O&M costs and accelerates time-to-market.
Q: How can I troubleshoot common errors when my EAS service fails?
For a comprehensive guide to diagnosing and resolving common deployment and runtime issues, see EAS FAQ.