Service Deployment

Deploy your models or algorithms as online inference services on Elastic Algorithm Service (EAS) of Platform for AI (PAI). EAS offers three deployment methods, each suited to different use cases.

How it works (image-based deployment)

Each EAS service runs in one or more isolated container instances. A deployment involves the following core components:

Image: A read-only template that contains the operating system, base libraries (such as CUDA), language runtime (such as Python), and required dependencies. Use an official PAI image, or build a custom image for specific requirements.
Code and model: Your business logic and model files. Store them in Object Storage Service (OSS) or Apsara File Storage NAS. This decouples your code and models from the runtime environment, so you can update them without rebuilding the image.
Storage mounting: At startup, EAS mounts your specified external storage paths to local directories in the container. Code inside the container can then access external files as if they were local.
Startup command: The first command that runs after the container starts. It typically starts an HTTP server that receives inference requests.
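To show how these components come together in a single service definition, the following Python sketch builds a configuration dictionary and serializes it to JSON. The field names (`metadata`, `containers`, `image`, `command`, `port`, `storage`, `mount_path`, `oss.path`) reflect our understanding of the EAS JSON format and are assumptions to verify against Deploy services with JSON. The service name, image address, OSS path, mount path, and `app.py` script are hypothetical placeholders.

```python
import json

# Hypothetical placeholders: replace with your own values.
SERVICE_NAME = "demo_inference"
IMAGE = "registry.example.com/my-namespace/my-inference-image:1.0"
MODEL_OSS_PATH = "oss://examplebucket/models/demo/"
MODEL_MOUNT_PATH = "/mnt/models"


def build_service_config(port: int = 8000) -> dict:
    """Map image, code/model storage, mounting, and startup command onto one config.

    Field names are assumptions based on the EAS JSON format; verify them
    against the official JSON reference before deploying.
    """
    return {
        "metadata": {"name": SERVICE_NAME, "instance": 1},
        # Image + startup command: the container runs this command at startup
        # and is expected to serve HTTP on `port`. app.py is hypothetical.
        "containers": [
            {
                "image": IMAGE,
                "command": f"python {MODEL_MOUNT_PATH}/app.py --port {port}",
                "port": port,
            }
        ],
        # Storage mounting: the OSS path appears inside the container at mount_path,
        # so code and models can be updated without rebuilding the image.
        "storage": [
            {"mount_path": MODEL_MOUNT_PATH, "oss": {"path": MODEL_OSS_PATH}}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_service_config(), indent=2))
```

Keeping the code and model in OSS and referencing them through `mount_path` is what lets the same image serve new model versions: only the OSS contents change.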

The deployment workflow is as follows:

Pull the specified image and create a container.
Mount external storage to the specified paths in the container.
Run the startup command inside the container.
Once the startup command is running, the service listens on the specified port and processes inference requests.
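The last step assumes that the startup command launches a process serving HTTP on the configured port. The following is a minimal sketch using only the Python standard library; the `predict` logic (returning the length of a `text` field) is a hypothetical stand-in for real model inference, which would typically load model files from the mounted directory.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(payload: dict) -> dict:
    """Placeholder inference logic; a real service would call a loaded model here."""
    text = str(payload.get("text", ""))
    return {"length": len(text)}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body and parse it as JSON.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and invalid UTF-8
            payload = None
        if isinstance(payload, dict):
            status, body = 200, predict(payload)
        else:
            status, body = 400, {"error": "request body must be a JSON object"}
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int) -> None:
    # Bind to all interfaces so requests routed into the container reach the server.
    # A hypothetical app.py used as the startup command would end by calling this.
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```

The port passed to `serve` must match the port declared in the service configuration; otherwise requests routed to the container will not reach the server.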

Note

EAS supports both image-based and processor-based deployment. Image-based deployment is recommended because it offers greater flexibility and maintainability, whereas processor-based deployment (see Deploy services using processors) imposes more restrictions on runtime environments and frameworks.

Deployment methods

The EAS console provides three deployment methods. All three submit the same JSON service configuration and differ only in how the configuration is created.

Deployment method	Description	Use case	Documentation
Scenario-based model deployment	Uses prefilled templates. Only a few parameters need to be modified.	Quick deployment of popular models or tools such as OpenClaw, LLMs, and RAG.	Deploy pre-built AI services
Custom model deployment (console)	Fill in configuration forms; the console generates the JSON from your input.	Interactive deployment of individual services. Covers most common configurations.	Deploy a custom inference service
Custom model deployment (JSON)	Edit the complete JSON configuration directly.	Batch or versioned deployments, CI/CD integration, or configurations not available in the console.	Deploy services with JSON

JSON configurations can also be deployed through the eascmd client or the PAI Python SDK.
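For CI/CD or versioned deployments, a script can write the configuration to a file and hand it to the eascmd client. The sketch below assumes that eascmd accepts a `create` subcommand followed by the path to a JSON file, and that the client is already installed and configured with credentials and a region; check the eascmd reference, because the binary name can differ by platform (hence the `binary` parameter). With `dry_run=True` (the default), the function only writes the file and returns the command without running it.

```python
from __future__ import annotations

import json
import shutil
import subprocess
import tempfile
from pathlib import Path


def deploy_with_eascmd(config: dict, binary: str = "eascmd", dry_run: bool = True) -> list[str]:
    """Write a service config to JSON and (optionally) submit it with eascmd.

    Assumes `<binary> create <file>` creates a service from a JSON file.
    """
    # Persist the configuration; in CI/CD this file would usually live in version control.
    path = Path(tempfile.mkdtemp()) / "service.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    cmd = [binary, "create", str(path)]
    if not dry_run:
        if shutil.which(binary) is None:
            raise FileNotFoundError(f"{binary} not found on PATH")
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

Because the console methods produce the same JSON, a configuration exported from the console can be committed and redeployed this way.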

Key configurations

In addition to choosing a deployment method, you typically configure the following when deploying a service:

Compute resources: Select a resource type, configure the system disk, and more. For details, see Resource configuration.
Storage mounting: Mount model files and data from external storage such as OSS and NAS to the service container. For details, see Storage mounts.
Network access: Configure the network environment for the service, such as VPC and public network access. For details, see EAS access to public and private resources.
Dynamic parameters: Configure key-value pairs that can be updated at runtime to adjust inference behavior without restarting the service. For details, see Configure dynamic parameters.
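Dynamic parameters only change inference behavior if the service reads them at request time rather than caching them at startup. The following sketch illustrates that pattern with a thread-safe key-value store. How EAS delivers updated values to the container is not shown; `DynamicParams.update` is a hypothetical hook standing in for whatever mechanism your service uses, and the parameter names are illustrative.

```python
import threading
from typing import Any, Dict


class DynamicParams:
    """Hypothetical thread-safe key-value store consulted on every request."""

    def __init__(self, defaults: Dict[str, Any]):
        self._lock = threading.Lock()
        self._values = dict(defaults)

    def update(self, changes: Dict[str, Any]) -> None:
        # Apply new values in place; no process restart is needed.
        with self._lock:
            self._values.update(changes)

    def get(self, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._values.get(key, default)


params = DynamicParams({"temperature": 0.7, "max_tokens": 256})


def handle_request(prompt: str) -> dict:
    # Read parameters per request so that updates take effect on the next call.
    return {
        "prompt": prompt,
        "temperature": params.get("temperature"),
        "max_tokens": params.get("max_tokens"),
    }
```

After `params.update({"temperature": 0.2})`, the next call to `handle_request` uses the new temperature while `max_tokens` keeps its previous value.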