Define and deploy Elastic Algorithm Service (EAS) online services using JSON configuration files with customizable resource, networking, and container parameters.
Quick start
Prepare a JSON configuration file
Create a JSON configuration file with the required settings. If you are deploying for the first time, you can instead configure parameters on the Custom Model Deployment > Custom Deployment page; the system automatically generates the corresponding JSON, which you can then modify and extend.
The following example shows a service.json file. For parameter details, see JSON parameters.
{
  "metadata": {
    "name": "demo",
    "instance": 1,
    "workspace_id": "your-workspace-id"
  },
  "cloud": {
    "computing": {
      "instances": [
        {
          "type": "ecs.c7a.large"
        }
      ]
    }
  },
  "containers": [
    {
      "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/python-inference:py39-ubuntu2004",
      "script": "python app.py",
      "port": 8000
    }
  ]
}
Deploy the service
1. Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
2. On the Inference Service tab, click Deploy Service. In the Custom Model Deployment section, select JSON Deployment.
3. Paste the content of your JSON configuration file and click Deploy. Deployment is complete when the service status changes to Running.
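Before pasting the configuration, you can sanity-check it locally. The following is a minimal sketch in Python; the required keys reflect the tables in this guide, and you may want to extend the checks for your own deployment:

```python
import json

REQUIRED_TOP_LEVEL = ["metadata"]          # per the JSON parameters table
REQUIRED_METADATA = ["name", "instance"]   # minimal fields used in the quick start

def validate_service_config(text: str) -> list:
    """Return a list of problems found in a service.json payload."""
    problems = []
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as e:
        return ["invalid JSON: {}".format(e)]
    for key in REQUIRED_TOP_LEVEL:
        if key not in cfg:
            problems.append("missing top-level key: " + key)
    for key in REQUIRED_METADATA:
        if key not in cfg.get("metadata", {}):
            problems.append("missing metadata key: " + key)
    return problems

config = '{"metadata": {"name": "demo", "instance": 1}}'
print(validate_service_config(config))  # → []
```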
JSON parameters
| Parameter | Required | Description |
| --- | --- | --- |
| metadata | Yes | Service metadata. For parameter details, see metadata parameter descriptions. |
| cloud | No | Compute resources and VPC configuration. For details, see cloud parameter descriptions. |
| containers | No | Container image configuration. For details, see containers parameter descriptions. |
| dockerAuth | No | Authentication credentials for a private image repository. The value is a Base64-encoded string of the repository's username and password in the username:password format. |
| networking | No | Network and invocation configuration. For parameter details, see networking parameter descriptions. |
| storage | No | Storage mount configuration that mounts data from OSS or NAS into the container. For configuration details, see Mount storage. |
| token | No | Access token for service authentication. If omitted, the system automatically generates a token. |
| aimaster | No | Enables computing power checks and fault tolerance for multi-node distributed inference services. |
| model_path | Yes, when deploying a service with a processor | The model_path and processor_path parameters specify the paths of the input data sources for the model and the processor. Supported address formats include OSS paths, such as oss://examplebucket/exampledir/, and HTTP URLs. |
| oss_endpoint | No | OSS endpoint. Example: oss-cn-beijing.aliyuncs.com. For other values, see Regions and endpoints. Note: By default, this parameter can be omitted, and the system uses the internal OSS endpoint of the current region to download model or processor files. This parameter is required for cross-region OSS access. For example, if you deploy a service in the China (Hangzhou) region and specify an OSS address in the China (Beijing) region for model_path, specify the public OSS endpoint of the China (Beijing) region. |
| model_entry | No | Entry file of the model. Can be any file. If unspecified, the system uses the file name from model_path. The system passes the main file path to the initialize() function in the processor. |
| model_config | No | Model configuration. Any text is supported. The system passes this value as the second argument to the initialize() function in the processor. |
| processor | No | The processor that runs the model service, such as the name of a built-in processor. |
| processor_path | No | Path of the processor package. For more information, see the model_path parameter description. |
| processor_entry | No | Main file of the processor. Examples: libprocessor.so and app.py. The file must contain implementations of the initialize() and process() functions. This parameter is required when processor_type is set to cpp or python. |
| processor_mainclass | No | Main class of the processor in the JAR package. Example: com.aliyun.TestProcessor. This parameter is required when processor_type is set to java. |
| processor_type | No | Implementation language of the processor. Valid values: cpp, java, and python. |
| warm_up_data_path | No | Path to the request file used for model prefetch. For more information about model prefetch, see Prefetch a model service. |
| runtime.enable_crash_block | No | Specifies whether a service instance automatically restarts after it crashes due to an exception in processor code. Valid values: true (the instance does not restart automatically) and false (default; the instance restarts automatically). |
| autoscaler | No | Configuration for horizontal auto scaling. For parameter details, see Horizontal auto scaling. |
| labels | No | Labels for the EAS service, specified as key-value pairs. Example: "labels": {"key": "value"}. |
| unit.size | No | Number of machines deployed for a single instance in a distributed inference configuration. Default value: 2. |
| sinker | No | Persists all service requests and responses to MaxCompute or Simple Log Service (SLS). For parameter details, see sinker parameter descriptions. |
| confidential | No | Enables a secure and verifiable inference service by configuring a system trust management service, which ensures that data, models, and code are securely encrypted during service deployment and invocation. For the format, see the appendix example. Note: The secure encryption environment primarily protects files on your mounted storage. Mount storage before enabling this feature. |
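Putting the processor-related parameters together, a minimal processor-based deployment could look like the following sketch. The service name, token, and OSS path are placeholders; the built-in processor name is taken from the appendix example:

```json
{
  "metadata": {
    "name": "processor_demo",
    "instance": 1
  },
  "processor": "tensorflow_cpu_1.12",
  "model_path": "oss://examplebucket/exampledir/",
  "token": "your-token"
}
```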
Metadata parameters
Advanced parameters
Cloud parameters
| Parameter | Sub-parameter | Required | Description |
| --- | --- | --- | --- |
| computing | instances | No | The list of instance specifications for deploying a service in a public resource group. If a bid for an instance specification fails or its inventory is insufficient, the system attempts to create the service by using the next instance specification in the configured order. |
| computing | disable_spot_protection_period | No | Applies only to spot instances. Valid values: true (the spot instance protection period is disabled) and false (default; the protection period is retained). |
| networking | vpc_id | No | The ID of the VPC for the EAS service. |
| networking | vswitch_id | No | The ID of the vSwitch for the EAS service. |
| networking | security_group_id | No | The ID of the security group for the EAS service. |
Example:
{
  "cloud": {
    "computing": {
      "instances": [
        {
          "type": "ecs.c8i.2xlarge",
          "spot_price_limit": 1
        },
        {
          "type": "ecs.c8i.xlarge",
          "capacity": "20%"
        }
      ],
      "disable_spot_protection_period": false
    },
    "networking": {
      "vpc_id": "vpc-bp1oll7xawovg9*****",
      "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
      "security_group_id": "sg-bp1ej061cnyfn0b*****"
    }
  }
}
Container parameters
To deploy a service using a custom image, see Deploy services with custom images.
| Parameter | Sub-parameter | Required | Description |
| --- | --- | --- | --- |
| image | | Yes | The URI of the container image for the model service. |
| env | name | No | The name of the environment variable. |
| env | value | No | The value of the environment variable. |
| command | | Either command or script is required. | The entry point command for the container. Only single commands are supported. For complex entry points that consist of multiple commands, use the script parameter instead. |
| script | | Either command or script is required. | The entry point script for the container. Use script instead of command when the entry point consists of multiple or complex commands. |
| port | | No | The container port. Important: The port must not conflict with the ports that the EAS engine listens on, such as 8080 and 9090. |
| prepare | pythonRequirements | No | The list of Python packages to install before the service instance starts. The image must have pip installed. |
| prepare | pythonRequirementsPath | No | The path to a requirements.txt file in the container. The listed packages are installed before the service instance starts. |
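As an illustration, the following containers sketch installs a dependency before startup and launches a multi-command entry point through script. The image URI, environment variable, and working directory are placeholders:

```json
"containers": [
  {
    "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/python-inference:py39-ubuntu2004",
    "env": [
      {
        "name": "LOG_LEVEL",
        "value": "info"
      }
    ],
    "prepare": {
      "pythonRequirements": [
        "numpy==1.16.4"
      ]
    },
    "script": "cd /workspace && python app.py",
    "port": 8000
  }
]
```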
Networking parameters
| Parameter | Required | Description |
| --- | --- | --- |
| gateway | No | The dedicated gateway for the EAS service. |
| gateway_policy | No | The policy configuration of the dedicated gateway. |
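For example, binding a service to a dedicated gateway takes a networking section like the following sketch; the gateway ID is a placeholder in the same format as the appendix example:

```json
"networking": {
  "gateway": "gw-m2vkzbpixm7mo****"
}
```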
Sinker parameters
| Parameter | Sub-parameter | Required | Description |
| --- | --- | --- | --- |
| type | | No | The storage type used to persist records. Valid values: maxcompute and sls. |
| config | maxcompute.project | No | The MaxCompute project name. |
| config | maxcompute.table | No | The name of the MaxCompute table. |
| config | sls.project | No | The SLS project name. |
| config | sls.logstore | No | The name of the SLS Logstore. |
The following sections provide configuration examples.
MaxCompute
"sinker": {
"type": "maxcompute",
"config": {
"maxcompute": {
"project": "cl****",
"table": "te****"
}
}
}
Simple Log Service
"sinker": {
"type": "sls",
"config": {
"sls": {
"project": "k8s-log-****",
"logstore": "d****"
}
}
}
Appendix: JSON configuration example
The following JSON example uses the parameters described above:
{
  "token": "****M5Mjk0NDZhM2EwYzUzOGE0OGMx****",
  "processor": "tensorflow_cpu_1.12",
  "model_path": "oss://examplebucket/exampledir/",
  "oss_endpoint": "oss-cn-beijing.aliyuncs.com",
  "model_entry": "",
  "model_config": "",
  "processor_path": "",
  "processor_entry": "",
  "processor_mainclass": "",
  "processor_type": "",
  "warm_up_data_path": "",
  "runtime": {
    "enable_crash_block": false
  },
  "unit": {
    "size": 2
  },
  "sinker": {
    "type": "maxcompute",
    "config": {
      "maxcompute": {
        "project": "cl****",
        "table": "te****"
      }
    }
  },
  "cloud": {
    "computing": {
      "instances": [
        {
          "capacity": 800,
          "type": "dedicated_resource"
        },
        {
          "capacity": 200,
          "type": "ecs.c7.4xlarge",
          "spot_price_limit": 3.6
        }
      ],
      "disable_spot_protection_period": true
    },
    "networking": {
      "vpc_id": "vpc-bp1oll7xawovg9t8****",
      "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
      "security_group_id": "sg-bp1ej061cnyfn0b****"
    }
  },
  "autoscaler": {
    "min": 2,
    "max": 5,
    "strategies": {
      "qps": 10
    }
  },
  "storage": [
    {
      "mount_path": "/data_oss",
      "oss": {
        "endpoint": "oss-cn-shanghai-internal.aliyuncs.com",
        "path": "oss://bucket/path/"
      }
    }
  ],
  "confidential": {
    "trustee_endpoint": "xx",
    "decryption_key": "xx"
  },
  "metadata": {
    "name": "test_eascmd",
    "resource": "eas-r-9lkbl2jvdm0puv****",
    "instance": 1,
    "workspace_id": "1405**",
    "gpu": 0,
    "cpu": 1,
    "memory": 2000,
    "gpu_memory": 10,
    "gpu_core_percentage": 10,
    "qos": "",
    "cuda": "11.2",
    "enable_grpc": false,
    "enable_webservice": false,
    "rdma": 1,
    "rpc": {
      "batching": false,
      "keepalive": 5000,
      "io_threads": 4,
      "max_batch_size": 16,
      "max_batch_timeout": 50,
      "max_queue_size": 64,
      "worker_threads": 5,
      "rate_limit": 0,
      "enable_sigterm": false
    },
    "rolling_strategy": {
      "max_surge": 1,
      "max_unavailable": 1
    },
    "eas.termination_grace_period": 30,
    "scheduling": {
      "spread": {
        "policy": "host"
      }
    },
    "resource_rebalancing": false,
    "workload_type": "elasticjob",
    "shm_size": 100
  },
  "features": {
    "eas.aliyun.com/extra-ephemeral-storage": "100Gi",
    "eas.aliyun.com/gpu-driver-version": "tesla=550.127.08"
  },
  "networking": {
    "gateway": "gw-m2vkzbpixm7mo****"
  },
  "containers": [
    {
      "image": "registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
      "prepare": {
        "pythonRequirements": [
          "numpy==1.16.4",
          "absl-py==0.11.0"
        ]
      },
      "command": "python app.py",
      "port": 8000
    }
  ],
  "dockerAuth": "dGVzdGNhbzoxM*******"
}
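The dockerAuth value in the example above is a Base64-encoded credential string. Assuming the username:password format described in the JSON parameters table, it can be generated with a short sketch like this (the credentials shown are placeholders):

```python
import base64

def docker_auth(username: str, password: str) -> str:
    """Base64-encode registry credentials in the username:password format."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

print(docker_auth("testuser", "secret"))  # → dGVzdHVzZXI6c2VjcmV0
```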