Deploying online prediction services is a key step in applying algorithm models to business scenarios. To help users apply algorithms from end to end, Elastic Algorithm Service (EAS) supports online prediction service deployment: you can load models onto CPU or GPU resources and handle service requests in real time.
PAI EAS allows you to deploy a model online as a RESTful API. You can then send HTTP requests to call the service. PAI EAS supports elastic scaling and blue-green deployment, so you can run stable, high-concurrency online model services at a lower resource cost. PAI EAS also supports resource group management, version management, and resource monitoring, which help you apply models to your businesses. PAI EAS is available in both the public offering of Alibaba Cloud and Apsara Stack.
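As a rough illustration of calling a deployed service over HTTP, the sketch below builds a POST request with Python's standard library. The endpoint URL, the token value, the use of an `Authorization` header, the JSON payload, and the helper names `build_predict_request` and `predict` are all assumptions made for illustration. The real endpoint, credentials, and request format depend on your service and its processor.

```python
import json
import urllib.request

# Hypothetical values: replace with the endpoint and token shown for your service.
ENDPOINT = "http://example.invalid/api/predict/demo_service"
TOKEN = "your-service-token"


def build_predict_request(endpoint, token, payload):
    """Build (but do not send) an HTTP POST request for a prediction call."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            # Assumption: the service authenticates callers with a token header.
            "Authorization": token,
            "Content-Type": "application/json",
        },
    )


def predict(endpoint, token, payload, timeout=5.0):
    """Send the request and return the raw response body as bytes."""
    request = build_predict_request(endpoint, token, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `predict(ENDPOINT, TOKEN, {"x": 1})` would send the request. Building the request separately from sending it makes the headers and body easy to inspect before any network traffic occurs.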
In the public offering of Alibaba Cloud, PAI EAS is billed based on the amount of resources that you use. Model services can be deployed in a public resource group or a dedicated resource group. In a public resource group, fees are charged based on the amount of resources used by each model service. In a dedicated resource group, fees are charged based on the billing method of the node resources in the group: pay-as-you-go or subscription.
Currently, PAI EAS in the public offering of Alibaba Cloud supports the following regions:
China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), Singapore, Germany (Frankfurt), India (Mumbai), and Indonesia (Jakarta).
- Resource group: in PAI EAS, cluster resources in different resource groups are isolated from each other. When you create an online model service, you can deploy the service in the default public resource group or in a dedicated resource group.
- Model service: a service made up of model files and online prediction logic. You can create, update, stop, start, scale in, and scale out model services.
- Model file: a model generated through offline model training. The model file format varies with the framework. In most cases, model and processor files are deployed together.
- Processor: a program package that contains online prediction logic. PAI EAS provides built-in official processors for PMML, TensorFlow (SavedModel), and Caffe models.
- Custom processor: if the built-in processors cannot meet your service deployment requirements, you can write a custom processor that implements your own prediction logic. Custom processors can be developed in C++, Java, or Python.
- Service instance: each model service can have more than one instance deployed to handle more concurrent requests. If a resource group contains multiple nodes, PAI EAS can automatically distribute the instances across different nodes to ensure high service availability.
- High-speed direct connection: after your dedicated resource group is connected to your VPC, clients in your VPC can access each service instance directly.
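To make the idea of distributing service instances across nodes concrete, here is a minimal sketch that assigns instances to nodes in round-robin order. This is not the scheduling algorithm that PAI EAS uses, which the text does not describe. The function `spread_instances` and the node names are hypothetical.

```python
from collections import defaultdict


def spread_instances(instance_count, nodes):
    """Assign instance indices to nodes in round-robin order.

    Illustrative only: a real scheduler would also consider free CPU,
    memory, and GPU capacity on each node.
    """
    if instance_count < 0:
        raise ValueError("instance_count must not be negative")
    if not nodes:
        raise ValueError("at least one node is required")
    placement = defaultdict(list)
    for index in range(instance_count):
        # Cycle through the nodes so consecutive instances land on different nodes.
        placement[nodes[index % len(nodes)]].append(index)
    return dict(placement)


# Four instances over three hypothetical nodes.
print(spread_instances(4, ["node-a", "node-b", "node-c"]))
# prints {'node-a': [0, 3], 'node-b': [1], 'node-c': [2]}
```

With the instances spread this way, losing a single node removes only some of the instances, so the service can keep answering requests.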
In addition to deploying models in the public resource pool provided by EAS, you can also create and manage resource groups. For more information about resource groups, see Use resource groups.
In the PAI EAS console, you can manage model services:
- View model call information.
- Debug services online.
- View logs, monitoring data, and service deployment information.
- Scale a service in or out, start or stop a service, and delete a service.
To make model service deployment accessible to more developers, all operations related to service deployment can also be performed with the EASCMD command-line tool.
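Command-line deployment is typically driven by a service description file. The sketch below assembles such a description in Python and prints it as JSON. The field names (`name`, `model_path`, `processor`, `metadata`, `instance`, `cpu`), the example values, and the helper `make_service_description` are assumptions for illustration only. Check the EASCMD documentation for the fields it actually accepts.

```python
import json


def make_service_description(name, model_path, processor, instances=1, cpu=1):
    """Return a dict describing a model service (field names are assumed)."""
    if instances < 1:
        raise ValueError("a service needs at least one instance")
    return {
        "name": name,
        "model_path": model_path,   # where the trained model file is stored
        "processor": processor,     # which prediction logic loads the model
        "metadata": {"instance": instances, "cpu": cpu},
    }


# Hypothetical service: a PMML model served by two instances.
description = make_service_description(
    name="demo_service",
    model_path="http://example.invalid/models/demo.pmml",
    processor="pmml",
    instances=2,
)
print(json.dumps(description, indent=2))
```

The description pairs a model file with a processor and sets the number of instances, mirroring the concepts of model service, model file, processor, and service instance.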
- To call a deployed service over the public network, you must activate Alibaba Cloud API Gateway. For pricing details, see API gateway pricing.
- For services that are called over the public network, you must bind your own domain name in API Gateway. Otherwise, you can make at most 1,000 calls per day over the public network.