Define and deploy an online inference service in EAS by using a JSON configuration file.
Quick start
1. Prepare a JSON file
A JSON file with all configurations is required to deploy a service. First-time users can auto-generate this file by configuring parameters under Custom Model Deployment > Custom Deployment, and then modify the generated JSON as needed.
Example service.json file. For all parameters, see Appendix: JSON parameters.
{
  "metadata": {
    "name": "demo",
    "instance": 1,
    "workspace_id": "your-workspace-id"
  },
  "cloud": {
    "computing": {
      "instances": [
        {
          "type": "ecs.c7a.large"
        }
      ]
    }
  },
  "containers": [
    {
      "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/python-inference:py39-ubuntu2004",
      "script": "python app.py",
      "port": 8000
    }
  ]
}
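Before pasting the file into the console, it can help to confirm that it parses and carries the required metadata block. The following is a minimal sketch, not part of EAS: the function name check_service_config is hypothetical, and treating metadata.name as mandatory is an assumption, since only the top-level metadata parameter is marked as required in the appendix.

```python
import json


def check_service_config(text):
    """Return a list of problems found in a service.json document (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return ["invalid JSON: {}".format(exc)]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    metadata = config.get("metadata")
    if not isinstance(metadata, dict):
        # The appendix marks metadata as a required top-level parameter.
        problems.append("missing 'metadata' object")
    elif not metadata.get("name"):
        # Assumption: a service needs a non-empty name.
        problems.append("metadata.name is missing or empty")
    return problems
```

An empty list means the basic structure looks plausible; it does not guarantee that EAS accepts the configuration.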
2. Deploy the service
- Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
- On the Inference Service tab, click Deploy Service. In the Custom Model Deployment section, select JSON Deployment.
- Paste the JSON content and click Deploy. Deployment succeeds when the service status changes to Running.
Appendix: JSON parameters
| Parameter | Required | Description |
| --- | --- | --- |
| metadata | Yes | Service metadata. See metadata parameter description. |
| cloud | No | Compute and VPC resource configuration. See cloud parameter description. |
| containers | No | Image configuration. See containers parameter description. |
| dockerAuth | No | Required to access a private repository that requires authentication. The value is a Base64-encoded string of |
| networking | No | Service invocation configuration. See networking parameter description. |
| storage | No | Mounts data from sources such as OSS or NAS into the container. See storage mount. |
| token | No | The access token for service authentication. If omitted, the system generates one automatically. |
| aimaster | No | Enables computing power check and fault tolerance for multi-node distributed inference. |
| model_path | Yes | Required when deploying a service with a processor. The model_path and processor_path parameters specify the data source paths for the model and processor. Both parameters support the following path formats: |
| oss_endpoint | No | The OSS endpoint. Example: oss-cn-beijing.aliyuncs.com. For other valid values, see Regions and endpoints. Note: By default, this parameter is not required. The system uses the internal OSS endpoint in the current region to download the model or processor files. Specify this parameter when accessing OSS across regions. For example, if the service is deployed in the China (Hangzhou) region and model_path specifies an OSS path in the China (Beijing) region, set this parameter to the public OSS endpoint of the China (Beijing) region. |
| model_entry | No | The entry file for the model. Can be any file. If not specified, the filename in model_path is used. The entry file path is passed to the initialize() function in the processor. |
| model_config | No | Model configuration. Supports any text. The value is passed as the second parameter of the initialize() function in the processor. |
| processor | No | |
| processor_path | No | The path to the processor package. See the model_path parameter description. |
| processor_entry | No | The main file of the processor, such as libprocessor.so or app.py, which contains the processor implementation. This parameter is required if processor_type is set to cpp or python. |
| processor_mainclass | No | The processor's main class in the JAR package, for example, com.aliyun.TestProcessor. This parameter is required if processor_type is set to java. |
| processor_type | No | The language in which the processor is implemented. Valid values: |
| warm_up_data_path | No | The path to the request file for model prefetch. See model prefetch. |
| runtime.enable_crash_block | No | Whether a service instance automatically restarts after it crashes due to an exception in the processor code. Valid values: |
| autoscaler | No | Horizontal auto scaling configuration. See horizontal auto scaling. |
| labels | No | Configure labels for EAS. The format is |
| unit.size | No | The number of machines per service instance in a distributed inference deployment. The default value is 2. |
| sinker | No | Persists all service requests and responses to MaxCompute or Simple Log Service (SLS). See sinker parameter description. |
| confidential | No | Enables secure, encrypted inference through a trust management service. Data, models, and code remain encrypted during service deployment and invocation. The format is as follows: Note: This secure encryption feature applies to files on your mounted storage. Ensure you mount the required storage files before enabling this feature. The parameters are as follows. |
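The conditional requirements on processor_entry and processor_mainclass are easy to miss because both are listed as optional. A minimal sketch of that rule follows; the helper name missing_processor_fields is hypothetical, and only the three processor_type values named in the table (cpp, python, java) are handled.

```python
def missing_processor_fields(config):
    """Return the processor fields that the table requires but the config lacks."""
    processor_type = config.get("processor_type", "")
    missing = []
    # processor_entry is required when processor_type is cpp or python.
    if processor_type in ("cpp", "python") and not config.get("processor_entry"):
        missing.append("processor_entry")
    # processor_mainclass is required when processor_type is java.
    if processor_type == "java" and not config.get("processor_mainclass"):
        missing.append("processor_mainclass")
    return missing
```

For example, a configuration with "processor_type": "java" and an empty processor_mainclass would be reported as missing processor_mainclass.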
Metadata parameters
Advanced parameters
Cloud parameters
| Parameter | Required | Description |
| --- | --- | --- |
| computing.instances | No | Instance types for deploying a service in a public resource group. If a bid for a spot instance fails or an instance type has insufficient inventory, the system tries the next instance type in order. |
| computing.disable_spot_protection_period | No | This parameter applies to spot instances. Valid values: |
| networking.vpc_id | No | The ID of the VPC. |
| networking.vswitch_id | No | The ID of the vSwitch. |
| networking.security_group_id | No | The ID of the security group. |
| networking.destination_cidrs | No | If the CIDR block of the configured vSwitch conflicts with the EAS management CIDR blocks (10.224.0.0/16 or 10.240.0.0/12), set this field to the CIDR block of the vSwitch. |
Example:
{
  "cloud": {
    "computing": {
      "instances": [
        {
          "type": "ecs.c8i.2xlarge",
          "spot_price_limit": 1
        },
        {
          "type": "ecs.c8i.xlarge",
          "capacity": "20%"
        }
      ],
      "disable_spot_protection_period": false
    },
    "networking": {
      "vpc_id": "vpc-bp1oll7xawovg9*****",
      "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
      "security_group_id": "sg-bp1ej061cnyfn0b*****"
    }
  }
}
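Whether destination_cidrs is needed depends on whether the vSwitch CIDR block overlaps one of the two EAS management blocks listed in the table. The sketch below checks this with Python's standard ipaddress module. The function names are hypothetical, and writing destination_cidrs as a plain string is an assumption about the value format.

```python
import ipaddress

# EAS management CIDR blocks, as listed in the cloud parameters table.
EAS_MANAGEMENT_CIDRS = [
    ipaddress.ip_network("10.224.0.0/16"),
    ipaddress.ip_network("10.240.0.0/12"),
]


def conflicts_with_eas(vswitch_cidr):
    """Return True if the vSwitch CIDR block overlaps an EAS management block."""
    network = ipaddress.ip_network(vswitch_cidr)
    return any(network.overlaps(block) for block in EAS_MANAGEMENT_CIDRS)


def networking_section(vpc_id, vswitch_id, security_group_id, vswitch_cidr):
    """Build the cloud.networking object, adding destination_cidrs only on conflict."""
    section = {
        "vpc_id": vpc_id,
        "vswitch_id": vswitch_id,
        "security_group_id": security_group_id,
    }
    if conflicts_with_eas(vswitch_cidr):
        # Assumption: the field takes the vSwitch CIDR block as a string.
        section["destination_cidrs"] = vswitch_cidr
    return section
```

A vSwitch on 192.168.0.0/24 needs no extra field, while one on 10.224.8.0/24 falls inside 10.224.0.0/16 and would get destination_cidrs set.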
Containers parameters
To deploy a service with a custom image, see Custom Images.
| Parameter | Required | Description |
| --- | --- | --- |
| image | Yes | The address of the image for the model service. |
| env.name | No | The name of the environment variable. |
| env.value | No | The value of the environment variable. |
| command | Either command or script | The entry point command for the container. Only single commands are supported. For complex scripts, such as |
| script | Either command or script | The entry point script that runs in the container. This parameter supports complex scripts. |
| port | No | The container port. |
| prepare.pythonRequirements | No | A list of Python packages to install before the service instance starts. |
| prepare.pythonRequirementsPath | No | The path to a |
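The image requirement and the command-or-script rule from the table can be checked mechanically. The following is a minimal sketch under the assumption that at least one of command or script must be present; the table only says "Either", so whether setting both is allowed is not confirmed here. The function name container_problems is hypothetical.

```python
def container_problems(containers):
    """Return a list of problems for the entries of a containers array."""
    problems = []
    for index, container in enumerate(containers):
        # image is the only field marked as required in the table.
        if not container.get("image"):
            problems.append("containers[{}]: image is required".format(index))
        # Assumption: at least one entry point (command or script) must be set.
        if not (container.get("command") or container.get("script")):
            problems.append("containers[{}]: set command or script".format(index))
    return problems
```

Applied to the quick start example, which sets image and script, the function returns an empty list.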
Networking parameters
| Parameter | Required | Description |
| --- | --- | --- |
| gateway | No | The dedicated gateway configured for the EAS service. |
| gateway_policy | No | Example configuration: |
Sinker parameters
| Parameter | Required | Description |
| --- | --- | --- |
| type | No | The storage type. Supported types are: |
| config.maxcompute.project | No | The MaxCompute project name. |
| config.maxcompute.table | No | The MaxCompute table name. |
| config.sls.project | No | The Simple Log Service (SLS) project name. |
| config.sls.logstore | No | The Logstore name. |
Configuration examples:
MaxCompute
"sinker": {
  "type": "maxcompute",
  "config": {
    "maxcompute": {
      "project": "cl****",
      "table": "te****"
    }
  }
}
Simple Log Service (SLS)
"sinker": {
  "type": "sls",
  "config": {
    "sls": {
      "project": "k8s-log-****",
      "logstore": "d****"
    }
  }
}
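Both examples follow the same shape: the type value is repeated as the key under config. A small helper can generate either variant, as sketched below. The function name build_sinker is hypothetical, and treating both names as mandatory for each sink is an assumption, since the table marks them as optional.

```python
# Field names per sink type, taken from the two configuration examples above.
SINK_FIELDS = {
    "maxcompute": ("project", "table"),
    "sls": ("project", "logstore"),
}


def build_sinker(sink_type, **names):
    """Build the value of the top-level sinker parameter for the given sink type."""
    if sink_type not in SINK_FIELDS:
        raise ValueError("unsupported sinker type: {}".format(sink_type))
    fields = SINK_FIELDS[sink_type]
    missing = [field for field in fields if not names.get(field)]
    if missing:
        raise ValueError("missing fields: {}".format(", ".join(missing)))
    # The type value doubles as the key that holds the sink-specific settings.
    return {
        "type": sink_type,
        "config": {sink_type: {field: names[field] for field in fields}},
    }
```

For instance, build_sinker("sls", project="my-project", logstore="my-logstore") yields an object with the same structure as the Simple Log Service example.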
JSON configuration example
Sample JSON configuration:
{
  "token": "****M5Mjk0NDZhM2EwYzUzOGE0OGMx****",
  "processor": "tensorflow_cpu_1.12",
  "model_path": "oss://examplebucket/exampledir/",
  "oss_endpoint": "oss-cn-beijing.aliyuncs.com",
  "model_entry": "",
  "model_config": "",
  "processor_path": "",
  "processor_entry": "",
  "processor_mainclass": "",
  "processor_type": "",
  "warm_up_data_path": "",
  "runtime": {
    "enable_crash_block": false
  },
  "unit": {
    "size": 2
  },
  "sinker": {
    "type": "maxcompute",
    "config": {
      "maxcompute": {
        "project": "cl****",
        "table": "te****"
      }
    }
  },
  "cloud": {
    "computing": {
      "instances": [
        {
          "capacity": 800,
          "type": "dedicated_resource"
        },
        {
          "capacity": 200,
          "type": "ecs.c7.4xlarge",
          "spot_price_limit": 3.6
        }
      ],
      "disable_spot_protection_period": true
    },
    "networking": {
      "vpc_id": "vpc-bp1oll7xawovg9t8****",
      "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
      "security_group_id": "sg-bp1ej061cnyfn0b****"
    }
  },
  "autoscaler": {
    "min": 2,
    "max": 5,
    "strategies": {
      "qps": 10
    }
  },
  "storage": [
    {
      "mount_path": "/data_oss",
      "oss": {
        "endpoint": "oss-cn-shanghai-internal.aliyuncs.com",
        "path": "oss://bucket/path/"
      }
    }
  ],
  "confidential": {
    "trustee_endpoint": "xx",
    "decryption_key": "xx"
  },
  "metadata": {
    "name": "test_eascmd",
    "resource": "eas-r-9lkbl2jvdm0puv****",
    "instance": 1,
    "workspace_id": "1405**",
    "gpu": 0,
    "cpu": 1,
    "memory": 2000,
    "gpu_memory": 10,
    "gpu_core_percentage": 10,
    "qos": "",
    "cuda": "11.2",
    "enable_grpc": false,
    "enable_webservice": false,
    "rdma": 1,
    "rpc": {
      "batching": false,
      "keepalive": 5000,
      "io_threads": 4,
      "max_batch_size": 16,
      "max_batch_timeout": 50,
      "max_queue_size": 64,
      "worker_threads": 5,
      "rate_limit": 0,
      "enable_sigterm": false
    },
    "rolling_strategy": {
      "max_surge": 1,
      "max_unavailable": 1
    },
    "eas.termination_grace_period": 30,
    "scheduling": {
      "spread": {
        "policy": "host"
      }
    },
    "resource_rebalancing": false,
    "workload_type": "elasticjob",
    "shm_size": 100
  },
  "features": {
    "eas.aliyun.com/extra-ephemeral-storage": "100Gi",
    "eas.aliyun.com/gpu-driver-version": "tesla=550.127.08"
  },
  "networking": {
    "gateway": "gw-m2vkzbpixm7mo****"
  },
  "containers": [
    {
      "image": "registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
      "prepare": {
        "pythonRequirements": [
          "numpy==1.16.4",
          "absl-py==0.11.0"
        ]
      },
      "command": "python app.py",
      "port": 8000
    }
  ],
  "dockerAuth": "dGVzdGNhbzoxM*******"
}