Platform for AI: Deploy services using JSON configuration files

Last Updated: Nov 14, 2025

In Elastic Algorithm Service (EAS), you can define and deploy online services using a JSON configuration file. Once you prepare the file, you can deploy the service using the PAI console, the EASCMD client, or SDKs.

1. Prepare a JSON configuration file

Service deployment centers on a JSON file that defines the service configuration. For first-time users, we recommend using the console to configure basic settings on the service deployment page. The system automatically generates the corresponding JSON content, which you can then modify and extend.

The following code provides an example of a service.json file. For a complete list of parameters and their descriptions, see Appendix: JSON parameter descriptions.

{
    "cloud": {
        "computing": {
            "instances": [
                {
                    "type": "ecs.c7a.large"
                }
            ]
        }
    },
    "containers": [
        {
            "image": "****-registry.cn-beijing.cr.aliyuncs.com/***/***:latest",
            "port": 8000,
            "script": "python app.py"
        }
    ],
    "metadata": {
        "cpu": 2,
        "instance": 1,
        "memory": 4000,
        "name": "demo"
    }
}

2. Deploy a service using a JSON file

Console

  1. Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).

  2. On the Inference Service tab, click Deploy Service. On the Deploy Service page, select Custom Model Deployment > JSON Deployment.

  3. Paste the content of your prepared JSON file and click Deploy. The service is successfully deployed when its status changes to Running.

EASCMD

You can use the EASCMD client tool to manage model services on your own server, including creating, viewing, deleting, and updating services. Follow these steps:

  1. Download and authenticate the client

    If you use a Data Science Workshop (DSW) development environment with an official image, the EASCMD client is pre-installed at /etc/dsw/eascmd64. Otherwise, you must download and authenticate the client.

  2. Run the deployment command

    In the directory where your JSON file is located, run the following command to deploy the service. This example uses the 64-bit version for Windows. For more information about other operations, see Command reference.

    eascmdwin64 create <service.json>

    Replace <service.json> with the actual name of your JSON file.

    Note

    If you use a DSW development environment, you must first upload the JSON configuration file to the environment. For more information, see Upload and download files.

    The system returns a result similar to the following.

    [RequestId]: 1651567F-8F8D-4A2B-933D-F8D3E2DD****
    +-------------------+----------------------------------------------------------------------------------+
    | Intranet Endpoint | http://166233998075****.cn-shanghai.pai-eas.aliyuncs.com/api/predict/test_eascmd |
    |             Token | YjhjOWQ2ZjNkYzdiYjEzMDZjOGEyNGY5MDIxMzczZWUzNGEyMzhi****                         |
    +-------------------+----------------------------------------------------------------------------------+
    [OK] Creating api gateway
    [OK] Building image [registry-vpc.cn-shanghai.aliyuncs.com/eas/test_eascmd_cn-shanghai:v0.0.1-20221122114614]
    [OK] Pushing image [registry-vpc.cn-shanghai.aliyuncs.com/eas/test_eascmd_cn-shanghai:v0.0.1-20221122114614]
    [OK] Waiting [Total: 1, Pending: 1, Running: 0]
    [OK] Waiting [Total: 1, Pending: 1, Running: 0]
    [OK] Service is running

Appendix: JSON parameter descriptions

Parameter

Required

Description

metadata

Yes

The metadata of the service. For more information about the parameters, see metadata parameter descriptions.

cloud

No

The configurations of computing resources and VPCs. For more information, see cloud parameter descriptions.

containers

No

The image configurations. For more information, see containers parameter descriptions.

dockerAuth

No

When the image comes from a private repository, set dockerAuth to a Base64-encoded string of the image repository's username:password.
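For example, assuming the placeholder credentials username:password, their Base64 encoding is dXNlcm5hbWU6cGFzc3dvcmQ=, and dockerAuth is set at the top level of the JSON file, as in this sketch:

"dockerAuth": "dXNlcm5hbWU6cGFzc3dvcmQ="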

networking

No

The call configurations of the service. For more information about the parameters, see networking parameter descriptions.

storage

No

The information about service storage mounting. For more information about the configurations, see Mount storage.
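For reference, the sample at the end of this topic mounts an OSS path as follows (the bucket path and endpoint are placeholders):

"storage": [
    {
        "mount_path": "/data_oss",
        "oss": {
            "endpoint": "oss-cn-shanghai-internal.aliyuncs.com",
            "path": "oss://bucket/path/"
        }
    }
]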

token

No

The token string for access authentication. If you do not specify this parameter, the system automatically generates a token.

aimaster

No

Enables computing power checks and fault tolerance for multi-machine distributed inference services.

model_path

Yes

This parameter is required when you deploy a service using a processor. The model_path and processor_path parameters specify the paths of the input data sources for the model and processor. The following address formats are supported:

  • OSS address: The address can be a path of a specific file or a directory.

  • HTTP address: The required file must be a compressed package, such as a TAR.GZ, TAR, BZ2, or ZIP package.

  • Local path: If you use the test command for local testing, you can use a local path.
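For example, a processor-based deployment that loads a model from OSS combines model_path with the processor parameter, as in this fragment adapted from the sample at the end of this topic (the bucket path is a placeholder):

"processor": "tensorflow_cpu_1.12",
"model_path": "oss://examplebucket/exampledir/"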

oss_endpoint

No

The endpoint of OSS. Example: oss-cn-beijing.aliyuncs.com. For other values, see OSS regions and endpoints.

Note

By default, you do not need to specify this parameter. The system uses the internal endpoint of OSS in the current region to download the model file or processor file. If you want to access OSS across regions, you must specify this parameter. For example, if you deploy a service in the China (Hangzhou) region and specify an OSS address in the China (Beijing) region for the model_path parameter, you must use this parameter to specify the public endpoint of OSS in the China (Beijing) region.
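For example, a service deployed in the China (Hangzhou) region that loads a model from a bucket in the China (Beijing) region might use a fragment like this (the bucket path is a placeholder):

"model_path": "oss://examplebucket/exampledir/",
"oss_endpoint": "oss-cn-beijing.aliyuncs.com"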

model_entry

No

The entry file of the model. It can be any file. If you do not specify this parameter, the file name in model_path is used. The path of the main file is passed to the initialize() function in the processor.

model_config

No

The configuration of the model. Any text is supported. The value of this parameter is passed to the second parameter of the initialize() function in the processor.

processor

No

  • If you use an official pre-built processor, specify the processor code. For information about the processor codes used in eascmd, see Pre-built processors.

  • If you use a custom processor, you do not need to configure this parameter. You only need to configure the processor_path, processor_entry, processor_mainclass, and processor_type parameters.

processor_path

No

The path of the processor file package. For more information, see the description of the model_path parameter.

processor_entry

No

The main file of the processor. Examples: libprocessor.so and app.py. The main file contains the implementations of the initialize() and process() functions that are required for prediction.

If you set processor_type to cpp or python, you must specify this parameter.

processor_mainclass

No

The main class of the processor in the JAR package. Example: com.aliyun.TestProcessor.

If you set processor_type to java, you must specify this parameter.

processor_type

No

The language in which the processor is implemented. Valid values:

  • cpp

  • java

  • python

warm_up_data_path

No

The path of the request file used for model prefetching. For more information about the model prefetch feature, see Prefetch a model service.

runtime.enable_crash_block

No

Specifies whether to prevent a service instance from automatically restarting after it crashes due to an exception in the processor code. Valid values:

  • true: The service instance does not automatically restart. This preserves the crash scene for troubleshooting.

  • false (default): The service instance automatically restarts.

autoscaler

No

The configuration information for automatic horizontal scaling of the model service. For more information about the parameters, see Horizontal auto scaling.
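For reference, the sample at the end of this topic configures horizontal scaling as follows; see Horizontal auto scaling for the full list of strategy fields:

"autoscaler": {
    "min": 2,
    "max": 5,
    "strategies": {
        "qps": 10
    }
}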

labels

No

The labels to configure for the EAS service. The format is key:value.
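For example, assuming a hypothetical label key team with the value nlp, the key:value pairs are written as a JSON object:

"labels": {
    "team": "nlp"
}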

unit.size

No

The number of machines deployed for a single instance in a distributed inference configuration. Default value: 2.

sinker

No

Supports the persistence of all service requests and responses to MaxCompute or Simple Log Service (SLS). For more information about the parameters, see sinker parameter descriptions.

confidential

No

By configuring the system trust management service, you can ensure that information such as data, models, and code is securely encrypted during service deployment and invocation. This enables secure and verifiable inference services. The format is as follows:

Note

The secure encryption environment is mainly for your mounted storage files. Complete the mounting of storage files before you enable this feature.

"confidential": {
        "trustee_endpoint": "xxxx",
        "decryption_key": "xxxx"
    }

The parameters are described as follows:

  • trustee_endpoint: The URI of the system trust management service Trustee.

  • decryption_key: The KBS URI of the decryption key. Example: kbs:///default/key/test-key.

metadata parameter descriptions

General parameters

Parameter

Required

Description

name

Yes

The service name, which must be unique within the same region.

instance

Yes

The number of instances to start for the service.

workspace_id

No

After you set this parameter, the service can be used only within the specified PAI workspace. Example: 1405**.

cpu

No

The number of CPU cores required by each instance.

memory

No

The amount of memory required by each instance. The value must be an integer. Unit: MB. For example, "memory": 4096 indicates that each instance requires 4 GB of memory.

gpu

No

The number of GPUs required by each instance.

gpu_memory

No

The amount of GPU memory required by each instance. The value must be an integer. Unit: GiB.

The system supports scheduling instances based on GPU memory, which enables multiple instances to share a single GPU card. If you use GPU memory-based scheduling, you must set the gpu field to 0. If the gpu field is set to 1, the instance exclusively occupies the entire GPU card and the gpu_memory field is ignored.

Important

Strict isolation of GPU memory is not currently enabled. You are responsible for managing each instance's GPU memory usage to stay within the requested amount and to prevent out-of-memory (OOM) errors.

gpu_core_percentage

No

The percentage of computing power of a single GPU required by each instance. This must be an integer from 1 to 100. For example, a value of 10 represents 10% of the computing power of a single GPU.

The system supports instance scheduling based on computing power, enabling multiple instances to share a single GPU. When you specify this parameter, you must also specify the gpu_memory parameter; otherwise, this parameter does not take effect.
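The sample at the end of this topic combines these fields for GPU sharing: gpu is set to 0 so that instances are scheduled based on GPU memory, and gpu_core_percentage further limits the computing power share. A minimal fragment looks like this:

"metadata": {
    "gpu": 0,
    "gpu_memory": 10,
    "gpu_core_percentage": 10
}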

qos

No

The quality of service (QoS) of the instance. The value can be empty or BestEffort. Setting qos to BestEffort enables CPU sharing mode: instances are scheduled based only on GPU memory and system memory and are no longer limited by the number of CPUs on the node. All instances on the node share the node's CPUs, and the cpu field specifies the maximum CPU quota that a single instance can use in this mode.

resource

No

The ID of the resource group. The configuration policy is as follows:

  • If the service is deployed in a public resource group, you can ignore this parameter. In this case, the service is pay-as-you-go.

  • If the service is deployed in a dedicated resource group, set this parameter to the resource group ID. Example: eas-r-6dbzve8ip0xnzt****.

cuda

No

The CUDA version required by the service. When the service runs, the specified CUDA version is automatically mounted to the /usr/local/cuda directory of the instance.

Supported CUDA versions are: 8.0, 9.0, 10.0, 10.1, 10.2, 11.0, 11.1, and 11.2. Example: "cuda":"11.2".

rdma

No

In a distributed inference configuration, this specifies whether to enable Remote Direct Memory Access (RDMA) networking. Set to 1 to enable RDMA. If the rdma parameter is not configured, RDMA is disabled.

Note

Currently, only services deployed using Lingjun intelligent computing resources can use RDMA networking.

enable_grpc

No

Specifies whether to enable the gRPC connection for the service gateway. Valid values:

  • false (default): The gateway does not enable gRPC connections and accepts HTTP requests.

  • true: The gateway enables gRPC connections.

Note

If you deploy a service using a custom image and the server in the image is implemented with gRPC, you must use this parameter to switch the gateway protocol to gRPC.

enable_webservice

No

Specifies whether to enable the web server to deploy the service as an AI web application. Valid values:

  • false (default): Does not enable a web server.

  • true: Enables a web server.

type

No

Set this parameter to LLMGatewayService to deploy an LLM intelligent router. For information about how to configure the JSON file, see Deploy an LLM intelligent router.
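A minimal fragment (the service name is a placeholder) shows where type is set:

"metadata": {
    "name": "demo",
    "type": "LLMGatewayService"
}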

Advanced parameters

Important

Adjust the advanced parameters with caution.

Parameter

Required

Description

rpc

batching

No

Specifies whether to enable server-side batching to accelerate GPU models. This is only supported in pre-built processor mode. Valid values:

  • false (default): Disables server-side batching.

  • true: Enables server-side batching.

keepalive

No

The maximum processing time for a single request, in milliseconds. If the request processing time exceeds this value, the server returns a 408 timeout error and closes the connection. The default value is 600000 for a dedicated gateway. This configuration is not supported for Application Load Balancer (ALB) dedicated gateways.

io_threads

No

The number of threads used by each instance to process network I/O. The default value is 4.

max_batch_size

No

The maximum size of each batch. The default value is 16. This is only supported in pre-built processor mode. This parameter takes effect only when rpc.batching is set to true.

max_batch_timeout

No

The maximum timeout for each batch, in milliseconds. The default value is 50. This is only supported in pre-built processor mode. This parameter takes effect only when rpc.batching is set to true.

max_queue_size

No

When creating an asynchronous inference service, this is the maximum length of the queue. The default value is 64. If the queue is full, the server returns a 450 error and closes the connection. To prevent the server from being overloaded, the queue can notify the client in advance to retry with other instances. For services with high latency (long response times), consider reducing the queue length to prevent request backlogs and timeouts.

worker_threads

No

The number of threads in each instance used for concurrent request processing. The default value is 5. This is only supported in pre-built processor mode.

rate_limit

No

Enables QPS rate limiting and sets the maximum QPS an instance can handle. The default value is 0, which disables rate limiting.

For example, if this parameter is set to 2000, requests are rejected with a 429 (Too Many Requests) error when the QPS exceeds 2000.

enable_sigterm

No

Valid values:

  • false (default): A SIGTERM signal is not sent when an instance enters the terminating state.

  • true: When a service instance enters the terminating state, the system immediately sends a SIGTERM signal to the main process. Upon receiving this signal, the service process must perform a custom graceful shutdown operation in the signal handler. If this signal is not handled, the main process may exit directly after receiving the signal, causing the graceful shutdown to fail.
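For reference, the sample at the end of this topic groups these fields under metadata.rpc as follows:

"rpc": {
    "batching": false,
    "keepalive": 5000,
    "io_threads": 4,
    "max_batch_size": 16,
    "max_batch_timeout": 50,
    "max_queue_size": 64,
    "worker_threads": 5,
    "rate_limit": 0,
    "enable_sigterm": false
}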

rolling_strategy

max_surge

No

During a rolling update of the service, this is the maximum number of extra instances that can be created above the specified number of instances. This parameter can be a positive integer (number of instances) or a percentage, such as 2%. The default is 2%. Increasing this parameter can speed up service updates.

For example, if the number of service instances is 100 and this parameter is set to 20, 20 new instances are created immediately when the service update begins.

max_unavailable

No

During a rolling update, this is the maximum number of unavailable instances. This parameter allows resources to be freed up for new instances during the update process, preventing the update from getting stuck due to insufficient idle resources. Currently, the default value is 1 in dedicated resource groups and 0 in public resource groups.

For example, if this parameter is set to N, N instances are stopped immediately when the service update begins.

Note

If idle resources are sufficient, you can set this parameter to 0. Setting this parameter too high may affect service stability. This is because the number of available instances decreases momentarily during the update, increasing the traffic load on each instance. Balance service stability with resource availability when configuring this parameter.
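Like the other rolling update settings, max_surge and max_unavailable are nested under metadata.rolling_strategy, as in the sample at the end of this topic:

"rolling_strategy": {
    "max_surge": 1,
    "max_unavailable": 1
}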

eas.termination_grace_period

No

The graceful termination period for an instance, in seconds. The default value is 30.

EAS services use a rolling update strategy. An instance first enters a Terminating state. The service routes traffic away from the instance that is about to be terminated. The instance waits for 30 seconds to process any received requests before it shuts down. To ensure that all in-progress requests are handled during a service update, you can increase this value if request processing takes a long time.

Important

Decreasing this value may affect service stability. Setting it too high will slow down service updates. Do not configure this parameter unless you have a specific requirement.

scheduling

spread.policy

No

The distribution policy for scheduling service instances. The following policies are supported:

  • host: Instances are distributed by node. Instances are spread across as many different nodes as possible.

  • zone: Instances are distributed by the zone where the node is located. Instances are spread across as many different zones as possible.

  • default: Instances are scheduled based on the default policy. There is no active distribution logic.

Configuration example:

{
  "metadata": {
    "scheduling": {
      "spread": {
        "policy": "host"
      }
    }
  }
}

resource_rebalancing

No

Valid values:

  • false (default): This feature is disabled.

  • true: EAS periodically creates probe instances on high-priority resources. If a probe instance is successfully scheduled, more probe instances are created exponentially until scheduling fails. Meanwhile, after a successfully scheduled probe instance is initialized and enters the ready state, it replaces an instance on a lower-priority resource.

This feature can solve the following issues:

  • During a rolling update, terminating instances still occupy resources, causing newly created instances to start in a public resource group. Due to public resource limitations, these new instances are later rescheduled back to the dedicated resource group.

  • When using both spot and regular instances, the system periodically checks if spot instances are available. If they are, regular instances are migrated to spot instances.

workload_type

No

If you want to deploy an EAS service as a job, set this parameter to elasticjob. For more information, see Elastic Job service.

resource_burstable

No

Enables the elastic resource pool for an EAS service deployed by using a dedicated resource group:

  • true: Enables the feature.

  • false: Disables the feature.

shm_size

No

The size of the shared memory for each instance. Unit: GB. Shared memory allows data to be read and written directly, without data replication or transmission.

cloud parameter descriptions

Parameter

Required

Description

computing

instances

No

When deploying a service using a public resource group, you must set this parameter to specify a list of resource specifications. If a spot instance bid fails or inventory is insufficient, the system attempts to create the service using the next instance specification in the configured order.

  • type: The resource specification.

  • spot_price_limit: Optional.

    • When configured, the corresponding instance specification uses a spot instance, and the value specifies the bid price limit. The unit is USD. Spot instances are billed on a pay-as-you-go basis.

    • When not configured, the corresponding instance specification is a regular pay-as-you-go instance.

  • capacity: The upper limit on the number of instances of this type. This can be a number, like "500", or a string, like "20%". When configured, if the number of instances of this type reaches the limit, this type will not be used even if there is available inventory.

    For example, if the total number of service instances is 200 and the capacity for instance type A is 20%, the service will use at most 40 instances of type A. The remaining instances will be launched using other specifications.

disable_spot_protection_period

No

This parameter applies when you use spot instances. Valid values:

  • false (default): After a spot instance is successfully created, it has a protection period of 1 hour. During the protection period, the instance is not released even if the market price exceeds your bid.

  • true: Disables the protection period. Instances without a protection period are typically about 10% cheaper than instances with one.

networking

vpc_id

vswitch_id

security_group_id

No

The VPC, vSwitch, and security group to bind to the EAS service. Set each parameter to the corresponding ID.

Example:

{
    "cloud": {
        "computing": {
            "instances": [
                {
                    "type": "ecs.c8i.2xlarge",
                    "spot_price_limit": 1
                },
                {
                    "type": "ecs.c8i.xlarge",
                    "capacity": "20%"
                }
            ],
            "disable_spot_protection_period": false
        },
        "networking": {
            "vpc_id": "vpc-bp1oll7xawovg9*****",
            "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
            "security_group_id": "sg-bp1ej061cnyfn0b*****"
        }
    }
}

containers parameter descriptions

When deploying a service using a custom image, see Custom images.

Parameter

Required

Description

image

Yes

Required when deploying with an image. The address of the image used to deploy the model service.

env

name

No

The name of an environment variable for the container.

value

No

The value of the environment variable for the container.

command

One of command and script is required.

The entry point command for the image. Only single commands are supported. Complex scripts, such as cd xxx && python app.py, are not supported. Use the script parameter for such scripts. This field is suitable for images that lack the /bin/sh command.

script

One of command and script is required.

The entry point script for the image. More complex script formats can be specified. Use \n or semicolons to separate multiple lines.
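For example, a multi-line entry point can use semicolons or \n as separators; in the following sketch, the working directory /workspace is a placeholder:

"containers": [
    {
        "image": "****-registry.cn-beijing.cr.aliyuncs.com/***/***:latest",
        "script": "cd /workspace; python app.py",
        "port": 8000
    }
]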

port

No

The container port.

Important
  • Avoid ports 8080 and 9090, which are reserved for the EAS engine.

  • This port must match the port that the process in your command or script listens on.

prepare

pythonRequirements

No

A list of Python requirements to install before the instance starts. The image must have python and pip commands in the system path. The format is a list, for example:

"prepare": {
  "pythonRequirements": [
    "numpy==1.16.4",
    "absl-py==0.11.0"
  ]
}

pythonRequirementsPath

No

The path to a requirements.txt file to install before the instance starts. The image must have Python and pip commands in the system path. The requirements.txt file can either be built into the image or mounted into the service instance from external storage, for example:

"prepare": {
  "pythonRequirementsPath": "/data_oss/requirements.txt"
}

networking parameter descriptions

Parameter

Required

Description

gateway

No

The dedicated gateway used by the EAS service. Set this parameter to the gateway ID, for example, gw-m2vkzbpixm7mo****.

gateway_policy

No

  • rate_limit: The global rate limit for the service, which is the maximum number of requests the service can receive per second.

    • enable: Whether to enable rate limiting. true enables it, false disables it.

    • limit: The rate limit value.

      Note

      Services using the shared gateway have a default single-service limit of 1000 QPS and a server group limit of 10000 QPS. Dedicated gateways do not have a default value.

  • concurrency_limit: The global concurrency control for the service, which is the maximum number of simultaneous in-flight requests. Application Load Balancer (ALB) dedicated gateways do not currently support this setting.

    • enable: Whether to enable concurrency limiting. true enables it, false disables it.

    • limit: The concurrency limit value.

Rate limit configuration example:

{
    "networking": {
        "gateway_policy": {
            "rate_limit": {
                "enable": true,
                "limit": 100
            },
            "concurrency_limit": {
                "enable": true,
                "limit": 50
            }
        }
    }
}

sinker parameter descriptions

Parameter

Required

Description

type

No

The storage type for persisting records. The following types are supported:

  • maxcompute: MaxCompute.

  • sls: Simple Log Service (SLS).

config

maxcompute.project

No

The MaxCompute project name.

maxcompute.table

No

The MaxCompute data table.

sls.project

No

The SLS project name.

sls.logstore

No

The SLS Logstore.

The following are configuration examples:

Store in MaxCompute

"sinker": {
        "type": "maxcompute",
        "config": {
            "maxcompute": {
                "project": "cl****",
                "table": "te****"
            }
        }
    }

Store in Simple Log Service

"sinker": {
        "type": "sls",
        "config": {
            "sls": {
                "project": "k8s-log-****",
                "logstore": "d****"
            }
        }
    }

Appendix: JSON configuration example

The following is a sample JSON file that shows how the preceding parameters can be configured:

{
  "token": "****M5Mjk0NDZhM2EwYzUzOGE0OGMx****",
  "processor": "tensorflow_cpu_1.12",
  "model_path": "oss://examplebucket/exampledir/",
  "oss_endpoint": "oss-cn-beijing.aliyuncs.com",
  "model_entry": "",
  "model_config": "",
  "processor_path": "",
  "processor_entry": "",
  "processor_mainclass": "",
  "processor_type": "",
  "warm_up_data_path": "",
  "runtime": {
    "enable_crash_block": false
  },
  "unit": {
        "size": 2
    },
  "sinker": {
        "type": "maxcompute",
        "config": {
            "maxcompute": {
                "project": "cl****",
                "table": "te****"
            }
        }
    },
  "cloud": {
    "computing": {
      "instances": [
        {
          "capacity": 800,
          "type": "dedicated_resource"
        },
        {
          "capacity": 200,
          "type": "ecs.c7.4xlarge",
          "spot_price_limit": 3.6
        }
      ],
      "disable_spot_protection_period": true
    },
    "networking": {
            "vpc_id": "vpc-bp1oll7xawovg9t8****",
            "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
            "security_group_id": "sg-bp1ej061cnyfn0b****"
        }
  },
  "autoscaler": {
    "min": 2,
    "max": 5,
    "strategies": {
      "qps": 10
    }
  },
  "storage": [
    {
      "mount_path": "/data_oss",
      "oss": {
        "endpoint": "oss-cn-shanghai-internal.aliyuncs.com",
        "path": "oss://bucket/path/"
      }
    }
  ],
  "confidential": {
        "trustee_endpoint": "xx",
        "decryption_key": "xx"
    },
  "metadata": {
    "name": "test_eascmd",
    "resource": "eas-r-9lkbl2jvdm0puv****",
    "instance": 1,
    "workspace_id": "1405**",
    "gpu": 0,
    "cpu": 1,
    "memory": 2000,
    "gpu_memory": 10,
    "gpu_core_percentage": 10,
    "qos": "",
    "cuda": "11.2",
    "enable_grpc": false,
    "enable_webservice": false,
    "rdma": 1,
    "rpc": {
      "batching": false,
      "keepalive": 5000,
      "io_threads": 4,
      "max_batch_size": 16,
      "max_batch_timeout": 50,
      "max_queue_size": 64,
      "worker_threads": 5,
      "rate_limit": 0,
      "enable_sigterm": false
    },
    "rolling_strategy": {
      "max_surge": 1,
      "max_unavailable": 1
    },
    "eas.termination_grace_period": 30,
    "scheduling": {
      "spread": {
        "policy": "host"
      }
    },
    "resource_rebalancing": false,
    "workload_type": "elasticjob",
    "shm_size": 100
  },
  "features": {
    "eas.aliyun.com/extra-ephemeral-storage": "100Gi",
    "eas.aliyun.com/gpu-driver-version": "tesla=550.127.08"
  },
  "networking": {
    "gateway": "gw-m2vkzbpixm7mo****"
  },
  "containers": [
    {
      "image": "registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
      "prepare": {
        "pythonRequirements": [
          "numpy==1.16.4",
          "absl-py==0.11.0"
        ]
      },
      "command": "python app.py",
      "port": 8000
    }
  ],
  "dockerAuth": "dGVzdGNhbzoxM*******"
}