You can use the EASCMD client to create services for deployment in Elastic Algorithm Service (EAS) of Machine Learning Platform for AI (PAI). Before you create a service, you must specify its related parameters in a JSON object. This topic describes the parameters in the JSON object.
For more information about how to use the EASCMD client, see Download the EASCMD client and complete user authentication.
The following table describes the parameters.
Parameter | Required | Description |
name | Yes | The name of the service. The name must be unique in a region. |
token | No | The authentication token. If this parameter is not specified, the system automatically generates a token. |
model_path | Yes | The path of the input model package. You can specify the path in one of several supported formats, such as a public HTTP URL or an Object Storage Service (OSS) path. The processor_path parameter supports the same formats. |
oss_endpoint | No | The public endpoint of OSS in a region. Example: oss-cn-beijing.aliyuncs.com. For more information, see Regions and endpoints. Note: By default, the system uses the endpoint of OSS in the current region to download the model package or processor package. Therefore, you do not need to specify this parameter in most cases. If you want to access OSS in a different region, you must specify this parameter. For example, if you want to deploy a service in the China (Hangzhou) region but the model_path parameter points to OSS in the China (Beijing) region, you must specify this parameter. |
model_entry | No | The entry file of the model package. The file can be an arbitrary file. If this parameter is not specified, the value of the model_path parameter is used. The path of the main file in the model package is passed to the initialize() function of the processor. |
model_config | No | The model configuration. The value is of the TEXT type. The value of this parameter is passed to the second parameter of the initialize() function in the processor. |
processor | No | The processor that is used to deploy the service. If you use a built-in processor, set this parameter to the code of the processor. If you use a custom processor, leave this parameter empty and set the processor_path, processor_type, and processor_entry (or processor_mainclass) parameters instead. |
processor_path | No | The path of the processor package. For more information, see the description of the model_path parameter. |
processor_entry | No | The main file of the processor package, such as libprocessor.so or app.py. The main file contains the implementation of the processor. If the processor_type parameter is set to cpp or python, you must set this parameter. |
processor_mainclass | No | The main class in the JAR package of the processor, such as com.aliyun.TestProcessor. If the processor_type parameter is set to java, you must set this parameter. |
processor_type | No | The language that is used to implement the processor. Valid values: cpp, java, and python. |
warm_up_data_path | No | The path of the request file that is used for model warm-up. For more information, see Warm up model services (advanced). |
runtime.enable_crash_block | No | Specifies whether to block the service instance from restarting if the instance crashes due to a processor code exception. Valid values: true (the instance is not automatically restarted after a crash, which facilitates troubleshooting) and false (the instance is automatically restarted; this is the default value). |
cloud | No | If you use the public resource group to deploy a service, you must use the cloud.computing.instance_type parameter to specify the instance type of the service. For more information, see Usage notes for the shared resource group. |
autoscaler | No | The horizontal auto-scaling configuration of the model service. For more information, see Enable or disable the horizontal auto-scaling feature. |
containers | No | The container information of the custom image that you want to use to deploy the service. For more information, see Deploy a model service by using a custom image. |
storage | No | The storage information of the service. |
metadata | Yes | The metadata of the service. For more information, see the metadata parameters section of this topic. |
features | No | The specific features of the service. For more information, see the features parameters section of this topic. |
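For reference, the parameters in the preceding table combine into a single JSON service description. The following sketch is hypothetical: the service name, OSS path, processor code, and metadata values are placeholders, not defaults.

```json
{
  "name": "demo_service",
  "model_path": "oss://examplebucket/models/model.tar.gz",
  "processor": "pmml",
  "metadata": {
    "instance": 2,
    "cpu": 4,
    "memory": 8192
  }
}
```

A description like this is typically saved to a file and passed to the create command of the EASCMD client.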
Parameter | Required | Description | |
Basic parameters | instance | Yes | The number of service instances. |
cpu | No | The number of CPUs that each instance requires. | |
memory | No | The amount of memory that each instance requires. The value must be an integer. Unit: MB. For example, set the value to 4096 to request 4 GB of memory for each instance. | |
gpu | No | The number of GPUs that each instance requires. | |
gpu_memory | No | The amount of GPU memory that each instance requires. The value must be an integer. Unit: GB. PAI allows the memory resources of a GPU to be allocated to multiple instances. If you want multiple instances to share the memory resources of a GPU, set the gpu parameter to 0. If you set the gpu parameter to 1, each instance occupies an entire GPU and the gpu_memory parameter does not take effect. Important: PAI does not strictly isolate GPU memory. To prevent out-of-memory (OOM) errors, make sure that the GPU memory used by each instance does not exceed the requested amount. | |
qos | No | The quality of service (QoS) level of each instance. You can leave this parameter empty or set it to BestEffort. If you set this parameter to BestEffort, all instances on a node share the CPU cores of the node, and the system schedules instances based on GPU and memory resources instead of being limited by the number of CPU cores on the node. In this case, the cpu parameter specifies the maximum number of CPU cores that each instance can use, while memory and GPU resources are still allocated based on the values of the memory and gpu parameters. | |
resource | No | The ID of the resource group. Set this parameter based on the following rules: If you deploy the service in the public resource group, you can leave this parameter empty. If you deploy the service in a dedicated resource group, set this parameter to the ID of that resource group. | |
cuda | No | The Compute Unified Device Architecture (CUDA) version that is required by the service. When the service starts, CUDA libraries of the specified version are automatically mounted to the service instance. Supported CUDA versions: 8.0, 9.0, 10.0, 10.1, 10.2, 11.0, 11.1, and 11.2. Example: 11.2. | |
enable_grpc | No | Specifies whether to enable the gRPC connection for the service gateway. Valid values: true (gRPC is enabled) and false (gRPC is disabled and HTTP requests are supported; this is the default value). Note: If you use a custom image to deploy the service and the image uses a gRPC server, you must set this parameter to true. | |
enable_webservice | No | Specifies whether to enable the webserver feature. If you set this parameter to true, the system deploys the service as an AI-powered web application. Valid values: true and false. Default value: false. | |
Advanced parameters. Important: We recommend that you set these parameters with caution. | rpc.batching | No | Specifies whether batch processing is enabled on the server to accelerate GPU-based model inference. Valid values: true and false. |
rpc.keepalive | No | The maximum processing time of a single request. If the processing time of a request exceeds this value, the server returns the 408 timeout error code and closes the connection. Default value: 5000. Unit: milliseconds. | |
rpc.io_threads | No | The number of threads that are used by each instance to process network I/O. Default value: 4. | |
rpc.max_batch_size | No | The maximum size of each batch. Default value: 16. This parameter takes effect only if the rpc.batching parameter is set to true. | |
rpc.max_batch_timeout | No | The timeout period of each batch. Default value: 50. Unit: milliseconds. This parameter takes effect only if the rpc.batching parameter is set to true. | |
rpc.max_queue_size | No | The size of the request queue. Default value: 64. When the queue is full, the server returns the error code 450 and closes the connection, which instructs the client to send subsequent requests to other instances and prevents the server from being overloaded. If the response time is too long, set this parameter to a smaller value to prevent queued requests from timing out. | |
rpc.worker_threads | No | The number of threads that are used by each instance to process concurrent requests. Default value: 5. | |
rpc.rate_limit | No | The maximum number of queries per second (QPS) that a single instance can process. Default value: 0, which indicates that QPS-based throttling is disabled. For example, if you set this parameter to 2000 and the QPS of an instance exceeds 2,000, new requests are denied and the status code 429 is returned. | |
rolling_strategy.max_surge | No | The maximum number of additional instances that can be created for the service during a rolling update. You can set the value to a positive integer which indicates the number of additional instances. You can also set the value to a percentage, such as 2%, that indicates the ratio of the number of the additional instances to the original number of the service instances. The default value is 2%. The higher the value, the faster the service is updated. For example, if the number of service instances is set to 100 and this parameter is set to 20, when you update the service, 20 additional instances are immediately created. | |
rolling_strategy.max_unavailable | No | The maximum number of service instances that become unavailable during a rolling update. During a rolling update, the system can release existing instances to free up resources for newly created instances. This prevents update failures caused by insufficient resources. If a dedicated resource group is used, this parameter is set to 1 by default. If the public resource group is used, this parameter is set to 0 by default. For example, if this parameter is set to N, N instances are immediately stopped when a service update starts. Note If idle resources are sufficient, you can set this parameter to 0. If this parameter is set to a large value, service stability may be affected. This is because a larger value results in a reduced number of available instances during a service update and heavier workloads for each instance. You must consider service stability and the resources you need before you set this parameter. | |
eas.termination_grace_period | No | The maximum amount of time allowed for a graceful shutdown. Unit: seconds. Default value: 30. EAS services use the rolling update policy. When an instance is to be released, it enters the Terminating state but continues to process the requests that it has received, while the system switches its traffic to other instances. After the instance finishes processing the in-progress requests, it is released. The amount of time taken by this process must be within the value of this parameter. If requests take a long time to process, you can increase the value of this parameter to ensure that all requests can be processed when the service is updated. Important If you set this parameter to a small value, service stability may be affected. If you set this parameter to a large value, service update may be prolonged. We recommend that you use the default value unless you have special requirements. | |
scheduling.spread.policy | No | The policy that is used to spread instances during scheduling. Valid values: host (spread instances across as many nodes as possible), zone (spread instances across as many zones as possible), and default (use the default spread policy). | |
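As an illustration of how the basic and advanced parameters above nest inside the service description, the following metadata object is a hypothetical sketch; every value is a placeholder. Note that the advanced parameters keep their dotted names as literal JSON keys inside metadata.

```json
{
  "metadata": {
    "instance": 4,
    "cpu": 8,
    "memory": 16384,
    "rpc.batching": true,
    "rpc.max_batch_size": 32,
    "rpc.keepalive": 10000,
    "rolling_strategy.max_surge": "10%"
  }
}
```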
Parameter | Required | Description |
eas.aliyun.com/extra-ephemeral-storage | No | The additional amount of system disk storage that you want to request. If the free quota does not meet your business requirements, set this parameter. The value must be an integer. Valid values: 0 to 2000. Unit: GB. |
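Based on the table above, a features section might be written as follows. This is a sketch under assumptions: the value 100 is only an example within the 0 to 2000 GB range, and whether the service expects a bare integer or a string with a unit suffix should be confirmed against the EAS documentation.

```json
{
  "features": {
    "eas.aliyun.com/extra-ephemeral-storage": 100
  }
}
```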