Platform for AI: Parameters of model services

Last Updated: Aug 17, 2023

You can use the EASCMD client to create services for deployment in Elastic Algorithm Service (EAS) of Machine Learning Platform for AI (PAI). Before you create a service, you must specify its related parameters in a JSON object. This topic describes the parameters in the JSON object.
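
For orientation, the following is a minimal sketch of such a JSON object. The service name, OSS path, built-in processor code, and resource values are illustrative assumptions, not recommendations:

    {
        "name":"demo_service",
        "model_path":"oss://examplebucket/model/",
        "processor":"tensorflow_cpu_1.15",
        "metadata":{
            "instance":1,
            "cpu":2,
            "memory":4096
        }
    }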

Note

For more information about how to use the EASCMD client, see Download the EASCMD client and complete user authentication.

The following list describes the parameters.

name (Required)

The name of the service. The name must be unique in a region.

token (Optional)

The authentication token. If this parameter is not specified, the system automatically generates a token.

model_path (Required)

The model_path and processor_path parameters specify the paths of the input model package and the input processor package, respectively. You can set both parameters in one of the following formats:

  • HTTP URL: If an HTTP URL is used, the input package must be in the TAR.GZ, TAR, BZ2, or ZIP format.

  • OSS path: You can specify the path of a specific object or directory in Object Storage Service (OSS). If the OSS path resides in a region different from that of the service, you must also specify the oss_endpoint parameter. Example:

    "model_path":"oss://wowei-beijing-tiyan/alink/",
    "oss_endpoint":"oss-cn-beijing.aliyuncs.com",
  • On-premises path: If you want to run the test command to perform debugging on your device, you can use an on-premises path. Example (the local directory below is an illustrative assumption):
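
    "model_path":"/home/model_dir/",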

oss_endpoint (Optional)

The public endpoint of OSS in a region. Example: oss-cn-beijing.aliyuncs.com. For more information, see Regions and endpoints.

Note

By default, the system uses the OSS endpoint of the current region to download the model package or processor package, so you do not need to specify this parameter. Specify this parameter only if you want to access OSS in a different region. For example, if you deploy a service in the China (Hangzhou) region but the model_path parameter points to OSS in the China (Beijing) region, you must specify this parameter.

model_entry (Optional)

The entry file of the model package. The entry file can be an arbitrary file. If this parameter is not specified, the value of the model_path parameter is used. The path of the entry file in the model package is passed to the initialize() function of the processor.

model_config (Optional)

The model configuration. The value is of the TEXT type. The value of this parameter is passed to the second parameter of the initialize() function of the processor.

processor (Optional)

  • If you use a built-in processor, you can directly specify the processor code here. For more information about the processor codes that are used by the EASCMD client, see Built-in processors.

  • If you use a custom processor, you do not need to set this parameter. Instead, set only the processor_path, processor_entry, processor_mainclass, and processor_type parameters.

processor_path (Optional)

The path of the processor package. For more information, see the description of the model_path parameter.

processor_entry (Optional)

The main file of the processor package, such as libprocessor.so or app.py. The main file contains the implementations of the initialize() and process() functions that are required for prediction.

If the processor_type parameter is set to cpp or python, you must set this parameter.

processor_mainclass (Optional)

The main class in the JAR package of the processor, such as com.aliyun.TestProcessor.

If the processor_type parameter is set to java, you must set this parameter.

processor_type (Optional)

The language that is used to implement the processor. Valid values:

  • cpp

  • java

  • python
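
For example, a custom Python processor could be declared with a fragment like the following. The package path and the entry file name are illustrative assumptions:

    "processor_path":"oss://examplebucket/processor.tar.gz",
    "processor_entry":"./app.py",
    "processor_type":"python",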

warm_up_data_path (Optional)

The path of the request file that is used for model warm-up. For more information, see Warm up model services (advanced).

runtime.enable_crash_block (Optional)

Specifies whether to block the automatic restart of a service instance that crashes due to a processor code exception. Valid values:

  • true: The service instance is not automatically restarted. This can help you with troubleshooting.

  • false: The service instance is automatically restarted.

cloud (Optional)

If you use the public resource group to deploy a service, you must use the cloud.computing.instance_type parameter to specify the instance type that is used to deploy the service.

"cloud":{
      "computing":{
          "instance_type":"ecs.gn6i-c24g1.6xlarge"
      }
  }

For more information, see Usage notes for the shared resource group.

autoscaler (Optional)

The horizontal auto-scaling configuration of the model service. For more information, see Enable or disable the horizontal auto-scaling feature.

containers (Optional)

The container information of the custom image that you want to use to deploy the service. For more information, see Deploy a model service by using a custom image.

storage (Optional)

The storage information of the service.

metadata (Required)

The metadata of the service. For more information, see the metadata parameters section of this topic.

features (Optional)

The specific features of the service. For more information, see the features parameters section of this topic.

Table 1. metadata parameters


Basic parameters

instance (Required)

The number of service instances.

cpu (Optional)

The number of CPUs that each instance requires.

memory (Optional)

The amount of memory that each instance requires. The value must be an integer. Unit: MB. For example, "memory": 4096 indicates that each instance requires 4 GB of memory.

gpu (Optional)

The number of GPUs that each instance requires.

gpu_memory (Optional)

The amount of GPU memory that each instance requires. The value must be an integer. Unit: GB.

PAI allows the memory of a single GPU to be shared by multiple instances. If you want multiple instances to share the memory of a GPU, set the gpu parameter to 0 and specify the amount of GPU memory in the gpu_memory parameter, as shown in the sketch after the following note. If you set the gpu parameter to 1, each instance occupies an entire GPU, and the gpu_memory parameter does not take effect.

Important

PAI does not enable the strict isolation of GPU memory. To prevent out-of-memory (OOM) errors, make sure that the GPU memory used by each instance does not exceed the requested amount.
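
A minimal metadata sketch for GPU memory sharing, assuming two instances that each request 4 GB of GPU memory:

    "metadata":{
        "instance":2,
        "gpu":0,
        "gpu_memory":4
    }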

qos (Optional)

The quality of service (QoS) level of each instance. You can leave this parameter empty or set it to BestEffort. If you set this parameter to BestEffort, all instances on a node share the CPU cores of the node, and the system schedules instances based on GPU and memory resources instead of being limited by the number of CPUs on the node. In this case, the cpu parameter specifies the maximum number of CPU cores that each instance can use, whereas memory and GPU resources are still allocated based on the values of the memory and gpu parameters.
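
For example, in the following hedged sketch, each instance can use at most 8 CPU cores, and scheduling is based on the memory value; all values are illustrative assumptions:

    "metadata":{
        "instance":4,
        "cpu":8,
        "memory":16384,
        "qos":"BestEffort"
    }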

resource (Optional)

The ID of the resource group. Set this parameter based on the following rules:

  • If the service is deployed in the public resource group, you can ignore this parameter. In this case, the service is billed on a pay-as-you-go basis.

  • If the service is deployed in a dedicated resource group, set this parameter to the ID of the resource group. Example: eas-r-6dbzve8ip0xnzte5rp.

cuda (Optional)

The Compute Unified Device Architecture (CUDA) version that is required by the service. When the service starts, the specified CUDA version is automatically mounted to the /usr/local/cuda directory of the instance.

Supported CUDA versions: 8.0, 9.0, 10.0, 10.1, 10.2, 11.0, 11.1, and 11.2. Example: "cuda":"11.2".

enable_grpc (Optional)

Specifies whether to enable the gRPC connection for the service gateway. Default value: false. Valid values:

  • false: disables the gRPC connection. In this case, HTTP requests are supported by default.

  • true: enables the gRPC connection.

Note

If you use a custom image to deploy the service and the image uses the gRPC server, you must set this parameter to true.

enable_webservice (Optional)

Specifies whether to enable the webserver feature. Default value: false. Valid values:

  • false: disables the webserver feature.

  • true: enables the webserver feature. In this case, the system deploys the service as an AI-powered web application.

Advanced parameters

Important

We recommend that you set these parameters with caution.

rpc.batching (Optional)

Specifies whether to enable batch processing on the server to accelerate GPU-based model inference. Valid values:

  • false: disables batch processing on the server.

  • true: enables batch processing on the server.

rpc.keepalive (Optional)

The maximum processing time for a single request. If the processing time of a request exceeds this value, the server returns the 408 timeout error code and closes the connection. Default value: 5000. Unit: milliseconds.

rpc.io_threads (Optional)

The number of threads that are used by each instance to process network I/O. Default value: 4.

rpc.max_batch_size (Optional)

The maximum size of each batch. Default value: 16. This parameter takes effect only if the rpc.batching parameter is set to true.

rpc.max_batch_timeout (Optional)

The timeout period of each batch. Default value: 50. Unit: milliseconds. This parameter takes effect only if the rpc.batching parameter is set to true.
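
As an illustration, server-side batching could be configured with a fragment like the following. The batch size and timeout values are assumptions, not tuned recommendations:

    "metadata":{
        "rpc.batching":true,
        "rpc.max_batch_size":32,
        "rpc.max_batch_timeout":50
    }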

rpc.max_queue_size (Optional)

The size of the request queue. Default value: 64. When the queue is full, the server returns the error code 450 and closes the connection, which instructs the client to send subsequent requests to other instances and prevents the server from being overloaded. If responses take a long time, set this parameter to a smaller value so that requests do not time out while they wait in the queue.

rpc.worker_threads (Optional)

The number of threads that are used by each instance to process concurrent requests. Default value: 5.

rpc.rate_limit (Optional)

The maximum number of queries per second (QPS) that a single instance can process. Default value: 0, which indicates that QPS-based throttling is disabled.

For example, if you set this parameter to 2000, new requests are denied and status code 429 is returned when the QPS exceeds 2,000.

rolling_strategy.max_surge (Optional)

The maximum number of additional instances that can be created for the service during a rolling update. You can set the value to a positive integer, which indicates the number of additional instances, or to a percentage, such as 2%, which indicates the ratio of additional instances to the original number of service instances. Default value: 2%. A larger value speeds up the service update.

For example, if the service has 100 instances and this parameter is set to 20, 20 additional instances are immediately created when you update the service.

rolling_strategy.max_unavailable (Optional)

The maximum number of service instances that become unavailable during a rolling update. During a rolling update, the system can release existing instances to free up resources for newly created instances. This prevents update failures caused by insufficient resources. If a dedicated resource group is used, this parameter is set to 1 by default. If the public resource group is used, this parameter is set to 0 by default.

For example, if this parameter is set to N, N instances are immediately stopped when a service update starts.

Note

If idle resources are sufficient, you can set this parameter to 0. If this parameter is set to a large value, service stability may be affected. This is because a larger value results in a reduced number of available instances during a service update and heavier workloads for each instance. You must consider service stability and the resources you need before you set this parameter.
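
For example, a faster rolling update could be sketched as follows. The values are illustrative assumptions, not recommendations:

    "metadata":{
        "rolling_strategy.max_surge":"10%",
        "rolling_strategy.max_unavailable":1
    }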

eas.termination_grace_period (Optional)

The maximum amount of time allowed for a graceful shutdown. Unit: seconds. Default value: 30.

EAS services use the rolling update policy. When an instance is to be released, it enters the Terminating state and continues to process the requests that it has already received while the system switches its traffic to other instances. After the instance finishes processing the in-progress requests, it is released. This process must finish within the time specified by this parameter. If requests take a long time to process, increase the value to ensure that all in-progress requests can be processed when the service is updated.

Important

If you set this parameter to a small value, service stability may be affected. If you set this parameter to a large value, service update may be prolonged. We recommend that you use the default value unless you have special requirements.
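
For example, to allow up to one minute for in-progress requests to finish (the value 60 is an illustrative assumption):

    "metadata":{
        "eas.termination_grace_period":60
    }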

scheduling.spread.policy (Optional)

The policy that is used to distribute instances during instance scheduling. Valid values:

  • host: The instances are distributed across as many nodes as possible.

  • zone: The instances are distributed across as many zones as possible.

  • default: The instances are scheduled by using the default policy and are not intentionally distributed.
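
For example, a hedged fragment that distributes instances across as many nodes as possible:

    "metadata":{
        "scheduling.spread.policy":"host"
    }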

Table 2. features parameters


eas.aliyun.com/extra-ephemeral-storage (Optional)

The additional amount of system disk space that you can request. Set this parameter if the free quota does not meet your business requirements. The value must be an integer from 0 to 2000. Unit: GB.
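
A hedged example that requests an additional 100 GB of system disk space. The value, and its serialization as a plain integer, are assumptions based on the description above:

    "features":{
        "eas.aliyun.com/extra-ephemeral-storage":100
    }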