Platform For AI: Parameters for custom deployment in the console

Last Updated: Apr 30, 2025

This topic describes parameter configurations for custom deployment of model services in the Platform for AI (PAI) console.

Basic Information

Service Name

Specify a service name as prompted.

Group

A service group has a unified ingress. You can use service groups to perform canary releases, blue-green deployments, heterogeneous resource inference, and asynchronous inference. For more information, see Manage service groups.

Environment Information

You can deploy a model by using an image or a processor.

  • Image-based Deployment: Select this deployment method if you want to quickly deploy AI inference services by mounting images, code, and models.

  • Processor-based Deployment: Select this deployment method if you want to deploy AI inference services by using models and processors, such as built-in processors or custom processors. For more information, see Deploy services by using built-in processors and Deploy services by using custom processors.

Note

In complex model inference scenarios, such as AI-generated content (AIGC) and video processing, inference takes a long time to complete. We recommend that you turn on Asynchronous Services to deploy an asynchronous inference service. For more information, see Deploy an asynchronous inference service.

Deploy a service or web application by using an image

Image-based deployment supports asynchronous services and allows you to enable web applications. If the image that you use is integrated with a web UI application, the system automatically starts the web server after you enable web applications, so that you can directly access the web UI.

Image Configuration

Valid values:

  • Alibaba Cloud Image: Select an Alibaba Cloud image.

  • Custom Image: Select a custom image. For more information about how to create a custom image, see Custom images.

  • Image Address: the URL of the image that is used to deploy the model service. Example: registry.cn-shanghai.aliyuncs.com/xxx/image:tag. You can specify the address of an image provided by PAI or a custom image. For more information about how to obtain the image address, see Custom images.

    Important

    The specified image must be in the same region as the service that you want to deploy.

If you want to use an image from a private repository, click Enter Username and Password, and then specify the username and password of the image repository.

Model Settings

You can use one of the following methods to configure model files (a JSON sketch of the OSS mount follows this list):

  • OSS

    • OSS: the path of the source Object Storage Service (OSS) bucket.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified OSS path.

  • General-purpose File Storage NAS (NAS) file system

    • Select a file system: Select the ID of the NAS file system from the drop-down list. You can also log on to the NAS console to view the IDs of the NAS file systems in the current region.

    • Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system. For more information about how to create a general-purpose NAS file system, see Create a file system.

    • File System Path: the NAS path where the files are stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read files from the NAS file system.

  • CPFS for Lingjun: If you use Lingjun resource quotas to deploy a service, you can mount CPFS for Lingjun storage resources.

    • Select a file system: Select a Cloud Parallel File Storage (CPFS) file system of your Alibaba Cloud account. For more information about how to create a CPFS file system, see Create a file system.

    • File System Path: the CPFS path where the files are stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read files from the CPFS file system.

  • PAI Model

    • PAI Model: Select a registered model based on the model name and version. For more information about how to view registered models, see Register and manage models.

    • Mount Path: the mount path of the service instance. The mount path is used to read the model file.
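
The mount methods above correspond to storage entries in the service's JSON configuration, which is shown in the Service Configuration section. The following is a minimal sketch of an OSS mount; the bucket path and mount path are placeholders, and NAS, CPFS, and PAI model mounts use analogous fields that are described in Parameters for JSON deployment:

    "storage": [
      {
        "oss": {
          "path": "oss://examplebucket/models/"
        },
        "mount_path": "/models"
      }
    ]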

Command

The command to run the image. Example: python /run.py.

You also need to enter the port number, which is the local HTTP port on which the model service listens after the image is deployed.

Important

You cannot specify port 8080 or 9090 because the EAS engine listens on these ports.
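
Taken together, the image address, command, and port map to a containers entry in the JSON configuration. A minimal sketch with placeholder values:

    "containers": [
      {
        "image": "registry.cn-shanghai.aliyuncs.com/xxx/image:tag",
        "script": "python /run.py",
        "port": 8000
      }
    ]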

Show More (Code Build | Third-party Library Settings | Environment Variables | Health Check | Enable gRPC)

Code Build

You can use one of the following methods to configure the code:

  • OSS

    • OSS: the path of the source OSS bucket.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified OSS path.

  • General-purpose NAS file system

    • Select a file system: Select the ID of the NAS file system from the drop-down list. You can also log on to the NAS console to view the IDs of the NAS file systems in the current region.

    • Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system. For more information about how to create a general-purpose NAS file system, see Create a file system.

    • File System Path: the NAS path where the files are stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read files from the NAS file system.

  • CPFS for Lingjun: If you use Lingjun resource quotas to deploy a service, you can mount CPFS for Lingjun storage resources.

    • Select a file system: Select a CPFS file system of your Alibaba Cloud account. For more information about how to create a CPFS file system, see Create a file system.

    • File System Path: the CPFS path where the files are stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read files from the CPFS file system.

  • Git

    • Git Repository Address: the address of the Git repository.

    • Mount Path: the mount path of the service instance. The path is used to read the code file from the Git directory.

  • Code Build

    • Code Build: Select an existing code build. If no code build is available, you can click Create Code Build to create a code build.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified PAI code build.

  • Custom Dataset

    • Custom Dataset: Select an existing dataset. If no dataset is available, you can click Create Dataset to create a dataset.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified PAI dataset.

Third-party Library Settings

Valid values:

  • Third-party Libraries: Specify third-party libraries in the field.

  • Path of requirements.txt: Specify the path of the requirements.txt file in the field. You must include the addresses of the third-party libraries in the requirements.txt file.

Environment Variables

Specify a Key and a Value for each environment variable (see the sketch after this list).

  • Key: the name of the environment variable.

  • Value: the value of the environment variable.
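
As a sketch, environment variables can also be declared on the containers entry in the JSON configuration. The env field layout below follows Parameters for JSON deployment; the name and value are placeholders:

    "containers": [
      {
        "env": [
          {
            "name": "MODEL_NAME",
            "value": "demo_model"
          }
        ]
      }
    ]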

Health Check

Turn on Health Check. In the Health Check panel, configure the health check feature for the model service. For more information about the introduction and configuration methods of the feature, see Configure the health check feature.

Enable gRPC

Specifies whether to enable gRPC connections for the service gateway (see the sketch after this list). Valid values:

  • false: disables gRPC connections. The service accepts HTTP requests. This is the default value.

  • true: enables gRPC connections.
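
In the JSON configuration, this switch is assumed to correspond to the enable_grpc field in the metadata section, as in the following sketch:

    "metadata": {
      "enable_grpc": true
    }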

Deploy a service by using processors

The following table describes the parameters if you set the Deployment Method parameter to Processor-based Deployment.

Model Settings

Valid values:

  • OSS: Select the OSS path in which the model file is stored.

  • Download URL: Enter a public URL.

  • PAI Model: Select a registered model by specifying the model name and the model version. For more information about how to view registered models, see Register and manage models.

Processor Type

The type of processor. You can select a built-in official processor or a custom processor based on your business requirements. For more information about built-in official processors, see Built-in processors.

Model Type

This parameter is required only if you set the Processor Type parameter to EasyVision(CPU), EasyVision(GPU), EasyTransfer(CPU), EasyTransfer(GPU), EasyNLP, or EasyCV. The available model types vary based on the processor type. You can configure the Processor Type and Model Type parameters based on your business requirements.

Processor Language

This parameter is available only if you set the Processor Type parameter to Custom Processor.

Valid values: cpp, java, and python.

Processor Package

This parameter is available only if you set the Processor Type parameter to Custom Processor. Valid values:

  • OSS: Select the OSS path in which the model file is stored.

  • Download URL: Enter a public URL.

Processor Main File

This parameter is available only if you set the Processor Type parameter to Custom Processor. This parameter specifies the main file of the processor package.
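
Putting the processor-related parameters together, a minimal JSON configuration for a processor-based service might look like the following sketch. The service name, model path, and resource values are placeholders; pmml is one of the built-in processors:

    {
      "name": "demo_pmml_service",
      "model_path": "oss://examplebucket/models/model.pmml",
      "processor": "pmml",
      "metadata": {
        "instance": 2,
        "cpu": 1,
        "memory": 2000
      }
    }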

Show More (Mount Configurations | Environment Variables | Health Check | Enable gRPC)

Mount Configurations

The following mount modes are supported:

  • OSS

    • OSS: the path of the source OSS bucket.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified OSS path.

  • General-purpose NAS file system

    • Select a file system: Select the ID of the NAS file system from the drop-down list. You can also log on to the NAS console to view the IDs of the NAS file systems in the current region.

    • Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system. For more information about how to create a general-purpose NAS file system, see Create a file system.

    • File System Path: the NAS path where the files are stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read files from the NAS file system.

  • CPFS for Lingjun: If you use Lingjun resource quotas to deploy a service, you can mount CPFS for Lingjun storage resources.

    • Select a file system: Select a CPFS file system of your Alibaba Cloud account. For more information about how to create a CPFS file system, see Create a file system.

    • File System Path: the CPFS path where the files are stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read files from the CPFS file system.

  • Git

    • Git Repository Address: the address of the Git repository.

    • Mount Path: the mount path of the service instance. The path is used to read the code file from the Git directory.

  • Code Build

    • Code Build: Select an existing code build. If no code build is available, you can click Create Code Build to create a code build.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified PAI code build.

  • Custom Dataset

    • Custom Dataset: Select an existing dataset. If no dataset is available, you can click Create Dataset to create a dataset.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified PAI dataset.

Environment Variables

Specify a Key and a Value for each environment variable.

  • Key: the name of the environment variable.

  • Value: the value of the environment variable.

Health Check

Turn on Health Check. In the Health Check panel, configure the health check feature for the model service. For more information about the introduction and configuration methods of the feature, see Configure the health check feature.

Enable gRPC

Specifies whether to enable gRPC connections for the service gateway. Valid values:

  • false: disables gRPC connections. The service accepts HTTP requests. This is the default value.

  • true: enables gRPC connections.

Resource Information

In the Resource Information section, configure the parameters described in the following table.

Resource Type

Select Public Resources, EAS Resource Group, or Resource Quota.

Note

We recommend that you use public resources in test scenarios.

GPU Sharing

This parameter is available only if you set the Resource Type parameter to EAS Resource Group. For more information, see GPU sharing.

Note

  • The GPU sharing feature is available only to users in the whitelist. If you want to use the GPU sharing feature, submit a ticket.

  • The GPU sharing feature does not support instances of the GU series. Make sure the EAS dedicated resources you purchase are not of the GU series.

Instances

We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

If you set the Resource Type parameter to EAS Resource Group, you must configure the GPUs, vCPUs, and Memory (MB) parameters for each service instance. A JSON sketch of these settings follows.
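
As a sketch, the instance count and per-instance resources for a dedicated resource group correspond to fields in the metadata section of the JSON configuration. The resource group ID and values are placeholders; memory is specified in MB:

    "metadata": {
      "resource": "eas-r-xxxxxxxx",
      "instance": 2,
      "cpu": 4,
      "gpu": 1,
      "memory": 8000
    }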

Deployment Resources

This parameter is available only if you set the Resource Type parameter to Public Resources (a JSON sketch follows this list).

  • You can select a single CPU or GPU instance type.

  • If Bidding can be turned on, the instance type supports the bidding feature. You can turn on Bidding and configure a protection period for the preemptible instances that you want to create.

    • No Fixed Protection Period: Continuous usage is not guaranteed. The instance may be automatically released at any time due to inventory changes or market price fluctuations.

    • 1-Hour Protection Period: The instance is not released during the one-hour protection period. After the protection period ends, the instance may be automatically released.

  • You can configure Common instances and Preemptible instances at the same time. Resources are started in the sequence in which the instance types are configured, and you can add up to five instance types. If you use Preemptible instances, you must set a bid price to bid for them.
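
For public resources, the selected instance type is assumed to map to the cloud.computing section of the JSON configuration, as in the following sketch with a placeholder instance type. The fields for multiple instance types and bid prices are described in Parameters for JSON deployment:

    "cloud": {
      "computing": {
        "instance_type": "ecs.gn6i-c4g1.xlarge"
      }
    }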

Show More (Elastic Resource Pool | Additional System Disk | Distributed Inference | Rolling Update | Shared Memory | High-priority Resource Rescheduling | GPU Driver)

Elastic Resource Pool

This parameter is available only if you set the Resource Type parameter to EAS Resource Group.

You can turn on Elastic Resource Pool and configure your resources based on the instructions in the Deployment Resources section.

If you enable Elastic Resource Pool and the dedicated resource group that you use to deploy services is fully occupied, the system automatically adds pay-as-you-go instances to the public resource group during scale-outs. The added instances are billed as public resources. The instances in the public resource group are released first during scale-ins. For more information, see Elastic resource pool.

Additional System Disk

This parameter is available if you set the Resource Type parameter to Public Resources or EAS Resource Group and configure an elastic resource pool.

Configure additional system disks for the EAS service. Unit: GB. Valid values: 0 to 2000. You have a free quota of 30 GB on the system disk. For example, if you specify 20 in the field, the available storage space is 50 GB (the 30 GB free quota plus the additional 20 GB).

Additional system disks are billed based on their capacity and usage duration. For more information, see Billing of EAS.

Distributed Inference

The instances of a service are deployed on multiple machines. This can resolve the issue that a model with ultra-large-scale parameters cannot be deployed on a single machine. For more information, see Multi-machine distributed inference.

Rolling Update

  • Number of Instances Exceeding Expectation: the maximum number of additional instances that can be created for the service during a rolling update. You can set this parameter to a positive integer, which specifies the number of additional instances. You can also set this parameter to a percentage, such as 2%, which specifies the ratio of the number of additional instances to the original number of service instances. The default value is 2%. The higher the value, the faster the service is updated. For example, if you set the number of service instances to 100 and set this parameter to 20, 20 additional instances are immediately created when you update the service.

  • Maximum Number of Unavailable Instances: the maximum number of service instances that become unavailable during a rolling update. During a rolling update, the system can release existing instances to free up resources for new instances. This prevents update failures caused by insufficient resources. If a dedicated resource group is used, this parameter is set to 1 by default. If the public resource group is used, this parameter is set to 0 by default. For example, if you set this parameter to N, N instances are immediately stopped when a service update starts.

    Note

    If idle resources are sufficient, you can set this parameter to 0. If you set this parameter to a large value, service stability may be affected. This is because a larger value results in a reduced number of available instances during a service update and heavier workloads for each instance. When specifying this parameter, you must consider service stability and the resources you require.

Shared Memory

Configure shared memory so that the instance can perform read and write operations on the memory without data copies or transfers. Unit: GB.

High-priority Resource Rescheduling

After you enable high-priority resource rescheduling, EAS periodically creates probe instances on high-priority resources while the service runs. If a probe instance is scheduled, the number of probe instances grows exponentially until scheduling fails. After a scheduled probe instance completes initialization and enters the ready state, it replaces an instance on low-priority resources. This feature can resolve the following issues:

  • During a rolling update, instances that are being terminated still occupy resources in the dedicated resource group. As a result, newly created instances are started in the public resource group. With this feature, subsequent new instances are rescheduled back to the dedicated resource group when its resources become available.

  • When preemptible instances and common instances are used at the same time, the system periodically checks whether a preemptible instance is available. If one is available, the workload of the common instance is migrated to the preemptible instance.

GPU Driver

Specify the GPU driver version. Example: 550.127.08.

VPC

Optional. In the VPC section, configure the VPC, vSwitch, and Security Group Name parameters to enable VPC direct connection for the EAS service deployed in the public resource group. For more information, see Configure network connectivity.

After the network connection is established, the ECS instances that reside in the VPC can access EAS services deployed in the public resource group by using the created elastic network interface (ENI). The EAS services can also access other cloud services that reside in the VPC.

Features

Optional. In the Features section, configure the parameters described in the following table.

Memory Caching

If you enable this feature, the model files of an EAS service are cached in a local directory to accelerate data reading and reduce latency. For more information, see Enable memory caching for a local directory.

Dedicated Gateway

You can configure a dedicated gateway to enhance access control and improve the security and efficiency of service access. For more information, see Use a dedicated gateway.

Show More (LLM Intelligent Router | Service Response Timeout Period | Graceful Shutdown | Save Call Records | Task Mode | Configure Secure Encryption Environment | Tracing Analysis)

LLM Intelligent Router

Turn on LLM Intelligent Router and select an LLM Intelligent Router service that you deployed. If no LLM intelligent router is available, you can click Create LLM Intelligent Router to create an intelligent router. For more information, see Use LLM Intelligent Router to improve inference efficiency.

LLM Intelligent Router is a special EAS service that can be bound to an LLM inference service. When the LLM inference service has multiple instances, LLM Intelligent Router dynamically distributes requests based on the backend load. This ensures that the computing power and memory resources of the inference instances are evenly utilized, which improves the resource efficiency of the cluster.

Service Response Timeout Period

The timeout period of the server for each request. Unit: seconds. Default value: 5.
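
In the JSON configuration, this timeout is assumed to correspond to the rpc.keepalive field in the metadata section, specified in milliseconds. A sketch that raises the timeout to 10 seconds:

    "metadata": {
      "rpc.keepalive": 10000
    }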

Graceful Shutdown

  • Graceful Shutdown Time: the maximum amount of time allowed for a graceful shutdown. Unit: seconds. Default value: 30. EAS services use the rolling update policy. Before an instance is released, it enters the Terminating state and continues to process the requests that it has already received, while the system switches traffic to other instances. The instance is released only after it finishes processing these requests, so the graceful shutdown must complete within the value of this parameter. If requests take a long time to process, you can increase the value to ensure that all in-progress requests are processed when the system updates the service.

    Important

    If you set this parameter to a small value, service stability may be affected. If you set this parameter to a large value, the service update may be prolonged. We recommend that you use the default value unless you have special requirements.

  • Send SIGTERM: Valid values:

    • false: When a service instance enters the EXIT state, the system does not send the SIGTERM signal. This is the default value.

    • true: When a service instance enters the EXIT state, the system immediately sends the SIGTERM signal to the main process. The main process can then perform a custom graceful shutdown in its signal handler. If the signal is not handled, the main process may exit immediately after receiving it, which causes the graceful shutdown to fail. See the Python sketch after this list.
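
The following is a minimal Python sketch of a main process that handles SIGTERM to perform a custom graceful shutdown. The cleanup logic is a placeholder; a real service would finish in-flight requests and release resources before exiting:

    import signal
    import sys
    import time

    def handle_sigterm(signum, frame):
        # Placeholder cleanup: finish in-flight requests, close
        # connections, and flush logs before the instance is released.
        print("SIGTERM received, shutting down gracefully...")
        sys.exit(0)

    # Register the handler so that SIGTERM triggers a graceful shutdown
    # instead of an immediate exit.
    signal.signal(signal.SIGTERM, handle_sigterm)

    # Placeholder main serving loop.
    while True:
        time.sleep(1)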

Save Call Records

You can enable this feature to persistently save all service requests and responses to MaxCompute tables or Simple Log Service. Turn on Save Call Records and select a save method:

  • MaxCompute

    • MaxCompute Project: Select an existing project from the drop-down list. If no project is available, you can click Create MaxCompute Project to create a project. For more information, see Use the MaxCompute console to create a project.

    • MaxCompute Table: Specify a name for the table. When you deploy the service, the system automatically creates a table in the MaxCompute project.

  • Simple Log Service

    • Simple Log Service Project: a project that is used to isolate and manage resources in Simple Log Service. Select an existing project. If no project is available, click Create Simple Log Service Project to create a project. For more information, see Manage a project.

    • Logstore: used to collect, store, and query log data in Simple Log Service. When you configure the Logstore and deploy the service, the system automatically creates a Logstore in the specified Simple Log Service project.

Task Mode

You can enable this feature to deploy an inference service as an elastic job service. For more information, see Overview.

Configure Secure Encryption Environment

You can configure the system trust management service to ensure that data, models, and code are securely encrypted during service deployment and invocation. This implements a secure and verifiable inference service.

The secure encryption environment is designed for mounted storage files. You need to mount storage files before you enable this feature.

Tracing Analysis

Alibaba Cloud images with the built-in tracing analysis component support the tracing analysis feature. For images without the built-in component, you must add the following commands to enable the tracing analysis feature.

  • Add aliyun-bootstrap -a install && aliyun-instrument python app.py to the Command parameter. This command is used to install the ARMS agent for Python and start the application with the agent. app.py is the main file that provides prediction services for the image.

  • Add aliyun-bootstrap to the Third-party Library Settings parameter. This command is used to download the agent installer from the PyPI repository.

Service Configuration

In the Service Configuration section, the JSON configuration of the model service is displayed in the code editor.

You can add configuration items that are not included in the PAI console. For more information, see Parameters for JSON deployment.

You can use the EASCMD client to deploy the model service based on the JSON configurations. For more information, see Create a service.
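
For example, the following minimal JSON configuration combines the image, command, mount, and resource settings described in this topic; all names, paths, and values are placeholders. Assuming that the configuration is saved as service.json, you can create the service with a command such as eascmd create service.json.

    {
      "name": "image_demo_service",
      "containers": [
        {
          "image": "registry.cn-shanghai.aliyuncs.com/xxx/image:tag",
          "script": "python /run.py",
          "port": 8000
        }
      ],
      "storage": [
        {
          "oss": {
            "path": "oss://examplebucket/models/"
          },
          "mount_path": "/models"
        }
      ],
      "metadata": {
        "instance": 2,
        "cpu": 4,
        "memory": 8000
      }
    }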