Platform for AI: Deploy a model service in the PAI console

Last Updated: Jun 06, 2024

Elastic Algorithm Service (EAS) of Platform for AI (PAI) allows you to deploy trained models as inference services or AI-powered web applications. You can use models that you trained or trained models obtained from open source communities. EAS provides multiple methods for deploying models that are trained in different ways, and also provides scenario-based deployment methods that allow you to quickly deploy a model as an online service in the console. This topic describes how to deploy models and manage EAS online services by using the PAI console.

Prerequisites

A trained model is obtained.

Background information

You can deploy models and manage EAS online services in the console.

  • Upload and deploy models in the console

    You can deploy the model by using one of the following methods:

    • Custom deployment: Custom deployment allows you to deploy models in a more flexible manner. You can deploy a model as an AI-powered web application or an inference service by using images, models, or processors.

    • Scenario-based model deployment: EAS provides simplified, scenario-specific deployment solutions for common model types, such as ModelScope, Hugging Face, Triton, TFServing, Stable Diffusion (for AI painting), and pre-trained large language models (LLMs).

  • Manage online model services

    You can manage deployed model services in the PAI console. For example, you can view service details, update service resource configurations, add a version for a deployed model service, or scale resources.

Upload and deploy models in the console

On the Elastic Algorithm Service (EAS) page, you can upload a model that you trained or a public model that you obtained from an open source community and then deploy the model as an online model service.

Step 1: Go to the Elastic Algorithm Service (EAS) page

  1. Log on to the PAI console.

  2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

  3. In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS). The Elastic Algorithm Service (EAS) page appears.

Step 2: Select a deployment method

  1. On the Inference Service tab, click Deploy Service.

  2. On the page that appears, select a deployment method.

    Deployment method

    Description

    Custom Model Deployment

    Custom Deployment

    A more flexible deployment method. You can quickly deploy a model as an online inference service by using a processor, or by configuring a preset image and third-party code library, mounting models and code, and running commands. For more information, see the Configure parameters for custom deployment section of this topic.

    JSON Deployment

    The model is deployed based on the content of a JSON file. For more information, see the Configure parameters for JSON deployment section of this topic.

    Scenario-based Model Deployment

    Note

    For information about the parameters of each scenario, see the Configure parameters for scenario-based model deployment section of this topic.

    AI Painting - SD Web UI Deployment

    This method allows you to quickly deploy an AI painting service based on the open source Stable Diffusion (SD) WebUI and call the deployed service by using the web application or API operations. EAS isolates users and computing resources to support enterprise-level applications.

    Large Language Model (LLM)

    This method allows you to quickly deploy an LLM as a web application that you can call by using the web page or API operations. You can use LangChain to integrate the application with your business data and build an enterprise knowledge base for intelligent dialogue and other automated services. You can also use the built-in inference acceleration provided by PAI-Blade to deploy the model in a simplified and cost-effective manner.

    RAG-based LLM Chatbot Deployment

    This method allows you to deploy an intelligent dialogue system based on an LLM and the Retrieval-Augmented Generation (RAG) technique. The system is suitable for Q&A, summarization, and other natural language processing tasks that rely on custom knowledge bases.

    AI Video Generation: ComfyUI-based Deployment

    This method allows you to deploy web applications for AI video generation based on the ComfyUI and Stable Video Diffusion models. EAS can help you quickly implement AI-powered text-to-video generation for industries such as short video platforms and animation production.

    ModelScope Model Deployment

    This method allows you to quickly deploy an open source ModelScope model and start model services.

    Hugging Face Model Deployment

    This method allows you to quickly deploy an open source Hugging Face model and start model services.

    Triton Deployment

    This method allows you to quickly deploy a model that uses an AI framework, such as TensorRT, TensorFlow, PyTorch, or ONNX, as an online inference service by using Triton Inference Server.

    TensorFlow Serving Deployment

    This method allows you to quickly deploy a model in the standard SavedModel format as an online service by using the TensorFlow Serving engine.

Step 3: Deploy the service

Configure the parameters based on the deployment method. After you configure the parameters, click Deploy. When the service status changes to Running, the service is deployed.
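
After the service is in the Running state, you can find the endpoint and token of the service on the service details page and send a test request. The following Python sketch shows a minimal call; the endpoint, service name, token, and request body are placeholders and depend on how your service parses requests.

    # Minimal test call for a deployed EAS service (sketch).
    # Replace the endpoint, service name, and token with the values shown on the
    # service details page. The request body format depends on your service.
    import requests

    url = "http://<endpoint>.<region>.pai-eas.aliyuncs.com/api/predict/<service_name>"
    headers = {"Authorization": "<service_token>"}
    payload = '{"data": ["test input"]}'

    response = requests.post(url, headers=headers, data=payload)
    print(response.status_code, response.text)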

Configure parameters for custom deployment

  1. On the Create Service page, configure the parameters in the Model Service Information section.

    • Service Name: Specify a service name as prompted.

    • Deployment Method: The following deployment methods are supported: Deploy Service by Using Image, Deploy Web App by Using Image, and Deploy Service by Using Model and Processor.

      Note

      In complex model inference scenarios, such as AI content generation and video processing, inference takes a long time. We recommend that you turn on Asynchronous Service to implement the asynchronous inference service. For more information, see Asynchronous inference services. The asynchronous inference service is available only when the Deployment Method parameter is set to Deploy Service by Using Image or Deploy Service by Using Model and Processor.

      • Deploy Service by Using Image: Select this deployment method if you want to quickly deploy AI inference services by mounting images, code, and models.

      • Deploy Web App by Using Image: Select this deployment method if you want to quickly deploy a web application by mounting images, code, and models.

      • Deploy Service by Using Model and Processor: Select this deployment method if you want to deploy AI inference services by using models and processors, such as built-in processors or custom processors. For more information, see Deploy model services by using built-in processors and Deploy services by using custom processors.

      Deploy a service or web application by using an image

      The following table describes the parameters if you set the Deployment Method parameter to Deploy Service by Using Image or Deploy Web App by Using Image.

      Parameter

      Description

      Select Image

      Valid values:

      • PAI Image: Select an Alibaba Cloud image.

      • Custom Image: Select a custom image. For more information about how to create a custom image, see View and add images.

      • Image Address: The URL of the image that is used to deploy the model service. Example: registry.cn-shanghai.aliyuncs.com/xxx/image:tag. You can specify the address of an image provided by PAI or a custom image. For more information about how to obtain the image address, see View and add images.

        Important

        The specified image must be in the same region as the service that you want to deploy.

        If you want to use an image from a private repository, click enter and specify the username and password of the image repository.

      Specify Model Settings

      Click Specify Model Settings to configure the model. You can use one of the following methods to configure model files:

      • Mount OSS Path

        • The path of the source Object Storage Service (OSS) bucket.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified OSS path.

      • Mount NAS File System

        • NAS Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system. For more information about how to create a general-purpose NAS file system, see Create a file system.

        • NAS Source Path: the NAS path where the files are stored.

        • Mount Path: the mount path of the service instance. The mount path is used to read files from the NAS file system.

      • Mount PAI Model

        • Set Model Name and Model Version for an existing model that you want to use. For more information about how to view registered models, see Register and manage models.

        • Mount Path: the mount path of the service instance. The mount path is used to read the model file.

      Code Settings

      Click Specify Code Settings to configure the code. You can use one of the following mounting methods to provide access to the code that is required in the service deployment process.

      • Mount OSS Path

        • The path of the source OSS bucket.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified OSS path.

      • Mount NAS File System

        • NAS Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system.

        • NAS Source Path: the NAS path where the files are stored.

        • Mount Path: the mount path of the service instance. The mount path is used to read files from the specified NAS path.

      • Mount Git Path

        • Git Repository Address: the address of the Git repository.

        • Mount Path: the mount path of the service instance. The path is used to read the code file from the Git directory.

      • Mount PAI Dataset

        • Select an existing dataset. If no dataset is available, you can click Create Dataset to create a dataset.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified PAI dataset.

      • Mount PAI Code

        • Select an existing code build. If no code build is available, you can click Create Code Build to create a code build.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified PAI code build.

      Third-party Libraries

      Click Specify Third-party Libraries to configure third-party libraries. You can use one of the following methods:

      • Third-party Libraries: Enter the third-party libraries in the field.

      • Path of requirements.txt: Enter the path of the requirements.txt file in the field. The required third-party libraries must be listed in the requirements.txt file.

      Environment Variables

      Click Specify Environment Variables to configure environment variables.

      Specify Name and Value for the environment variable.

      • Variable Name: the name of the environment variable.

      • Variable Value: the value of the environment variable.

      Command to Run

      Specify the command to run the image. Example: python /run.py.

      You also need to enter the port number, which is the local HTTP port on which the model service listens after the image is deployed.

      Important

      You cannot specify port 8080 or 9090 because the EAS engine listens on these ports.
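
      For reference, the command and the port must be consistent with the code in your image: the command starts the HTTP server, and the server must listen on the port that you enter in the console. The following is a minimal sketch of such a server; the file name run.py, the Flask dependency, and port 8000 are assumptions for illustration only.

        # run.py - minimal HTTP server sketch for a service deployed by using an image.
        # Start the server with a command such as: python /run.py
        # The port (8000 here) must match the port entered in the console and must not
        # be 8080 or 9090, which are used by the EAS engine.
        from flask import Flask, request

        app = Flask(__name__)

        @app.route("/", methods=["POST"])
        def predict():
            data = request.get_data()  # raw request body sent to the service
            # Run model inference here and return the result.
            return data

        if __name__ == "__main__":
            app.run(host="0.0.0.0", port=8000)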

      Deploy service by using model and processor

      The following table describes the parameters if you set the Deployment Method parameter to Deploy Service by Using Model and Processor.

      Parameter

      Description

      Model File

      Valid values:

      • Mount OSS Path

        Select the OSS path that stores the model file.

      • Upload Data

        1. Select an OSS path in the current region.

        2. Click Browse Local Files and select the local model file that you want to upload. You can also directly drag the model file to the blank area.

      • Publicly Accessible Download URL

        Select Publicly Accessible Download URL. Then, enter a publicly accessible URL in the field below the parameter.

      • Select Model

        Set Model Name and Model Version for an existing model that you want to use. For more information about how to view registered models, see Register and manage models.

      Processor Type

      Specify the type of processor. You can select a built-in official processor or a custom processor based on your business requirements. For more information about built-in official processors, see Built-in processors.

      Model Type

      This parameter is required only if you set the Processor Type parameter to EasyVision(CPU), EasyVision(GPU), EasyTransfer(CPU), EasyTransfer(GPU), EasyNLP, or EasyCV. The available model types vary based on the processor type. You can set the Processor Type and Model Type parameters based on your business requirements.

      Processor Language

      This parameter is available if you set the Processor Type parameter to Custom Processor.

      Valid values: Cpp, Java, and Python.

      Processor Package

      This parameter is available if you set the Processor Type parameter to Custom Processor. Valid values:

      • Import OSS File

        Select Import OSS File. Then, select the OSS path in which the processor package is stored.

      • Upload Local File

        1. Select Upload Local File.

        2. Select an OSS path in the current region.

        3. Click the folder icon and select the on-premises processor package that you want to upload. You can also directly drag the processor package to the blank area.

          The package is uploaded to the OSS path in the current region, and the Processor Package parameter is automatically set.

          Note

          You can accelerate the loading speed of a processor during model deployment by uploading an on-premises processor package.

      • Download from Internet

        Select Download from Internet. Then, enter a public URL.

      Processor Main File

      This parameter is available if you set the Processor Type parameter to Custom Processor. It specifies the main file of the processor package. A minimal example of a Python processor main file is provided after this table.

      Mount Settings

      Click Specify Mount Settings to configure the mounting method. You can use one of the following mounting methods.

      • Mount OSS Path

        • The path of the source OSS bucket.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified OSS path.

      • Mount NAS File System

        • NAS Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system.

        • NAS Source Path: the NAS path where the files are stored.

        • Mount Path: the mount path of the service instance. The mount path is used to read files from the specified NAS path.

      • Mount PAI Dataset

        • Select an existing dataset. If no dataset is available, you can click Create Dataset to create a dataset.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified PAI dataset.

      • Mount PAI Code

        • Select an existing code build. If no code build is available, you can click Create Code Build to create a code build.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified PAI code build.

      Environment Variables

      Click Specify Environment Variables to configure environment variables.

      Specify Name and Value for the environment variable.

      • Variable Name: the name of the environment variable.

      • Variable Value: the value of the environment variable.
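
      If you set the Processor Type parameter to Custom Processor and the Processor Language parameter to Python, the processor package contains a main file that implements the processor logic. The following is a minimal sketch based on the allspark SDK that EAS provides for Python processors; the class name, model loading, and inference logic are placeholders, and you should check the custom processor documentation for the exact interface.

        # app.py - minimal Python processor sketch (assumes the EAS allspark SDK).
        import allspark

        class MyProcessor(allspark.BaseProcessor):
            """A sketch of a custom processor."""

            def initialize(self):
                # Runs once when the service starts; load your model here.
                self.model = None

            def process(self, data):
                # data is the raw request body. Return the response body and an HTTP status code.
                result = data  # replace with real inference logic
                return result, 200

        if __name__ == "__main__":
            # worker_threads controls request concurrency within one service instance.
            runner = MyProcessor(worker_threads=8)
            runner.run()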

  2. In the Resource Deployment Information section of the Create Service page, configure the parameters.

    Parameter

    Description

    Resource Group Type

    The type of resource group in which you want to deploy the model. You can deploy the model by using the public resource group or a dedicated resource group. For more information, see Work with dedicated resource groups.

    Note

    If you run a small number of tasks and do not have high requirements on latency, we recommend that you use the public resource group.

    Instance Count

    To prevent risks caused by single-instance deployment, we recommend that you specify multiple service instances.

    If you set Resource Group Type to a dedicated resource group, you must set the CPU, Memory (MB), and GPU parameters for each service instance.

    Resource Configuration Mode

    This parameter is available only if you set the Resource Group Type parameter to Public Resource Group. Valid values:

    • General

      You can select a single CPU or GPU instance type.

    • Cost-effective Resource Configuration

      You can configure multiple instance types or use preemptible instances. For more information, see Specify multiple instance types and Create and use preemptible instances.

      • Preemptible Instance Protection Period: You can set a 1-hour protection period for a preemptible instance. The system ensures that you can use the instance during the protection period.

      • Deployment: You can configure Common instances and Preemptible instances at the same time. Resources are started based on the sequence in which the instance types are configured. You can add up to five resource types. If you use Preemptible instances, you must set a bid price to bid for the instances.

    Elastic Resource Pool

    This parameter is available only if you set the Resource Group Type parameter to a dedicated resource group.

    You can turn on Elastic Resource Pool and configure your resources based on the instructions in the Resource Configuration Mode section.

    If you enable Elastic Resource Pool and the dedicated resource group that you use to deploy services is fully occupied, the system automatically adds pay-as-you-go instances to the public resource group during scale-outs. The added instances are billed as public resources. The instances in the public resource group are released first during scale-ins. For more information, see Elastic resource pool.

    Extra System Storage

    This parameter is available only if you set the Resource Group Type parameter to Public Resource Group.

    Click Extra System Storage to configure additional system disks for the EAS service. Unit: GB. Valid values: 0 to 2000. You have a free quota of 30 GB on the system disk. If you specify 20 in the field, the available storage space is 50 GB.

    Additional system disks are billed based on their capacity and usage duration. For more information, see Billing of EAS.

  3. Optional. In the VPC Settings section, set the VPC, vSwitch, and Security Group Name parameters to enable VPC direct connection for the EAS service deployed in the public resource group.

    After you enable the feature, the ECS instances that reside in the VPC can access EAS services deployed in the public resource group by using the created elastic network interface (ENI). In addition, the EAS services can access other cloud services that reside in the VPC.

  4. In the Configuration Editor section, the configurations of the service are displayed in the code editor. You can add configuration items that are not included in previous steps. For more information, see the "Create a service" section in the Run commands to use the EASCMD client topic.


Configure parameters for JSON deployment

Prepare a JSON file that is used to deploy the service. For more information, see Parameters of model services. On the JSON Deployment page, enter the content of the JSON file in the JSON editor and click Deploy.
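
For reference, the following is a minimal sketch of a JSON configuration for a processor-based service. The service name, model path, processor code, and resource values are placeholders; see Parameters of model services for the full list of supported fields.

    {
      "name": "demo_service",
      "model_path": "oss://examplebucket/models/model.tar.gz",
      "processor": "tensorflow_cpu_1.15",
      "metadata": {
        "instance": 2,
        "cpu": 2,
        "memory": 4000
      }
    }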

Configure parameters for scenario-based model deployment

The following section describes the parameters for different scenarios.

AI Painting - SD Web UI Deployment

Parameter

Description

Basic information

Service Name

The name of the service.

Edition

Valid values:

  • Standard Edition

    The standard edition is suitable for individual users who deploy common tests and applications. This edition supports both web application and API calls.

  • API Edition

    The API edition is suitable for scenarios in which you need to integrate your business by calling API operations. The system automatically converts the service into an asynchronous inference service. For more information, see Asynchronous inference services.

  • Cluster Edition WebUI

    The cluster edition is suitable for teamwork scenarios in which multiple members use the web application to generate images. The cluster edition ensures that each user has an independent model and output path. The backend computing resources are shared and scheduled in a centralized manner, which improves cost-effectiveness.

Model Settings

You can specify model settings in the following scenarios: (1) you want to use an open source model that you downloaded from a community or a model that you fine-tuned; (2) you want to save the output data to your data source; (3) you need to install third-party plug-ins or configurations. Click Add to configure model settings. Valid values:

  • Mount OSS: an empty file directory in the OSS bucket. For more information about how to create a bucket, see Create a bucket. For more information about how to create an empty directory, see Manage directories.

  • Mount NAS

    • NAS Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system.

    • NAS Source Path: the NAS path where the files are stored.

Resource Configuration

Instance Count

The number of instances on which the service runs. Default value: 1. We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

Resource Configuration

Select the instance type for model deployment based on your business requirements. Only the public resource group is supported. To achieve cost-effectiveness, we recommend that you use the instance type ml.gu7i.c16m60.1-gu30.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

Configure these parameters to enable VPC direct connection for the EAS service deployed in the public resource group. After you enable the feature, the ECS instances that reside in the VPC can access the EAS service by using the created elastic network interface (ENI), and the EAS service can access other cloud services that reside in the VPC.

Large Language Model (LLM)

Parameter

Description

Basic information

Service Name

The name of the service.

Model Source

Valid values:

  • Open Source Model: You can select a model from the Model Type drop-down list to quickly load and deploy a built-in LLM without the need to upload your model.

  • Custom Fine-tuned Model: You need to configure model settings to mount the fine-tuned model and set the parameters to deploy the model.

Model Type

Select a model category.

Model Settings

This parameter is required if you set the Model Source parameter to Custom Fine-tuned Model.

Valid values:

  • Mount OSS: the OSS bucket directory in which the fine-tuned model is stored.

  • Mount NAS

    • NAS Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system.

    • NAS Source Path: the source path of the NAS file system in which the fine-tuned model is stored.

  • Mount PAI Model: Select a registered model by specifying the model name and model version. For more information about how to register models, see Register and manage models.

Resource Configuration

Instance Count

The number of instances on which the service runs. Default value: 1. We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

Resource Configuration

Select the instance type for model deployment based on your business requirements. Only the public resource group is supported. To achieve cost-effectiveness, we recommend that you use the instance type ml.gu7i.c16m60.1-gu30.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

Configure these parameters to enable VPC direct connection for the EAS service deployed in the public resource group. After you enable the feature, the ECS instances that reside in the VPC can access the EAS service by using the created elastic network interface (ENI), and the EAS service can access other cloud services that reside in the VPC.

RAG-based LLM Chatbot Deployment

Parameter

Description

Basic information

Service Name

The name of the service.

Model Source

Valid values:

  • Open Source Model: You can select a model from the Model Type drop-down list to quickly load and deploy a built-in LLM without the need to upload your model.

  • Custom Fine-tuned Model: You need to configure model settings to mount the fine-tuned model and set the parameters to deploy the model.

Model Type

Select a model category.

Resource Configuration

Instance Count

The number of instances on which the service runs. Default value: 1. We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

Resource Configuration

  • If you set Model Source to Open Source Model, the system automatically selects a default instance type based on the selected model type.

  • If you set Model Source to Custom Fine-tuned Model, you need to select an instance type that matches the model. For more information, see Deploy LLM applications in EAS.

Inference Acceleration

Inference acceleration can be enabled for the Qwen, Llama2, ChatGLM, or Baichuan2 model that is deployed on A10 or GU30 instances. Valid values:

  • BladeLLM Inference Acceleration: The BladeLLM inference acceleration engine ensures high concurrency and low latency. You can use BladeLLM to accelerate LLM inference in a cost-effective manner.

  • Open-source vLLM Inference Acceleration

Vector Database Settings

Select a database as your vector database. Valid values: FAISS, Elasticsearch, Milvus, Hologres, and AnalyticDB.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

  • If you use Hologres, AnalyticDB for PostgreSQL, Elasticsearch, or Milvus to build the vector database, select the VPC in which the vector database is deployed.

  • If you use FAISS to build the vector database, you do not need to configure these parameters.

AI Video Generation: ComfyUI-based Deployment

Parameter

Description

Basic information

Service Name

The name of the model service.

Edition

The edition of the service. Valid values:

  • Standard Edition: suitable for a single user. This edition supports service calls by using the web UI or API operations.

  • API Edition: suitable for high-concurrency scenarios. The system automatically deploys the service as an asynchronous service. This edition supports service calls only by using API operations.

  • Cluster Edition WebUI: suitable for multiple users to call the service by using the web UI at the same time. This edition supports service calls only by using the web UI. For information about how a Cluster Edition service works, see the Principles of the Cluster Edition service section of this topic.

For information about the scenarios of each edition, see the Background information section of this topic.

Model Settings

If you select API Edition or Standard Edition and call the service by using API operations, click Add to configure the model to obtain the inference result. Valid values:

  • Mount OSS: Click the image icon to select an existing OSS directory.

  • Mount NAS: Configure a NAS mount target and NAS source path.

Resource Configuration

Resource Configuration

We recommend that you use the GU30, A10, or T4 GPU types. By default, the system uses the GPU-accelerated ml.gu7i.c16m60.1-gu30 instance type to ensure cost-effectiveness.

ModelScope Model Deployment

Parameter

Description

Basic information

Service Name

The name of the service.

Select Model

Select a ModelScope model from the drop-down list.

Model Version

Select a model version from the drop-down list. By default, the latest version is used.

Model Type

After you select a model, the system automatically specifies the Model Type parameter.

Resource Configuration

Instance Count

The number of instances on which the service runs. Default value: 1. We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

Resource Configuration

Select the instance type for model deployment based on your business requirements. Only the public resource group is supported.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

Configure these parameters to enable VPC direct connection for the EAS service deployed in the public resource group. After you enable the feature, the ECS instances that reside in the VPC can access the EAS service by using the created elastic network interface (ENI), and the EAS service can access other cloud services that reside in the VPC.

Hugging Face Model Deployment

Parameter

Description

Basic information

Service Name

The name of the service.

Model ID

The ID of the Hugging Face model. Example: distilbert-base-uncased-finetuned-sst-2-english.

Model Type

The type of the Hugging Face model. Example: text-classification.

Model Version

The version of the Hugging Face model. Example: main.

Resource Configuration

Instance Count

The number of instances on which the service runs. Default value: 1. We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

Resource Configuration

Select the instance type for model deployment based on your business requirements. Only the public resource group is supported.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

Configure these parameters to enable VPC direct connection for the EAS service deployed in the public resource group. After you enable the feature, the ECS instances that reside in the VPC can access the EAS service by using the created elastic network interface (ENI), and the EAS service can access other cloud services that reside in the VPC.

Triton Deployment

Parameter

Description

Basic information

Service Name

The name of the service.

Model Settings

Make sure that the model you deploy meets the structure requirements of Triton Inference Server. For more information, see Model deployment by using Triton Server. After you prepare the model, select one of the following methods to deploy the model (an example model repository layout is provided after this table):

  • Mount OSS: Select the OSS bucket directory in which the model is stored.

  • Mount NAS

    • NAS Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system. For more information about how to create a general-purpose NAS file system, see Create a file system.

    • NAS Source Path: the source path of the model in NAS.

  • Mount PAI Model: Select a registered model by specifying the model name and model version. For more information about how to register models, see Register and manage models.

Resource Configuration

Instance Count

The number of instances on which the service runs. Default value: 1. We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

Resource Configuration

Select the instance type for model deployment based on your business requirements. Only the public resource group is supported.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

Configure these parameters to enable VPC direct connection for the EAS service deployed in the public resource group. After you enable the feature, the ECS instances that reside in the VPC can access the EAS service by using the created elastic network interface (ENI), and the EAS service can access other cloud services that reside in the VPC.
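
The Model Settings description above requires the mounted directory to follow the Triton model repository structure. A minimal layout might look like the following; the model name, version directory, and config.pbtxt content are illustrative only.

    model_repository/
    └── my_model/
        ├── config.pbtxt        # model configuration (name, platform, batching, and so on)
        └── 1/                  # version directory
            └── model.onnx      # model file for the selected framework

    # config.pbtxt (illustrative)
    name: "my_model"
    platform: "onnxruntime_onnx"
    max_batch_size: 8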

TensorFlow Serving Deployment

Parameter

Description

Basic information

Service Name

The name of the service.

Deployment Mode

The following deployment methods are supported:

  • Standard Model Deployment: used to deploy a single-model service.

  • Configuration File Deployment: used to deploy a multi-model service.

Model Settings

TensorFlow Serving has specific structure requirements for deployed models. For more information, see Model deployment by using TensorFlow Serving.

  • If you set the Deployment Mode parameter to Standard Model Deployment, you must configure the OSS bucket directory in which the model file is stored.

  • If you set the Deployment Mode parameter to Configuration File Deployment, you must configure the following parameters:

    • OSS: the OSS bucket directory in which the model is stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read the model file.

    • Configuration File: the OSS path in which the configuration file is stored. An example configuration file is provided after this table.

Resource Configuration

Instance Count

The number of instances on which the service runs. Default value: 1. We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

Resource Configuration

Select the instance type for model deployment based on your business requirements. Only the public resource group is supported.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

Configure these parameters to enable VPC direct connection for the EAS service deployed in the public resource group. After you enable the feature, the ECS instances that reside in the VPC can access the EAS service by using the created elastic network interface (ENI), and the EAS service can access other cloud services that reside in the VPC.
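
For Configuration File Deployment, TensorFlow Serving reads a model server configuration file that lists the models to serve. The following is a minimal example of such a file; the model names and base paths are placeholders.

    model_config_list {
      config {
        name: "model_a"
        base_path: "/models/model_a"
        model_platform: "tensorflow"
      }
      config {
        name: "model_b"
        base_path: "/models/model_b"
        model_platform: "tensorflow"
      }
    }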

Manage online model services in EAS

On the Inference Service tab of the Elastic Algorithm Service (EAS) page, you can view deployed services, and stop, start, or delete services.

Warning

If you stop or delete a model service, requests that rely on the model service fail. Proceed with caution.

  • View service details

    • Click the name of the service that you want to manage to go to the Service Details page. On the Service Details page, you can view the basic information, instances, and configurations of the service.

    • On the Service Details page, you can click different tabs to view information about service monitoring, logs, and deployment events.

  • Update service resource configurations

    On the Service Details tab, click Resource Configuration in the Resource Information section. In the Resource Configuration dialog box, update the resources that are used to run the service. For more information, see Upload and deploy models in the console.

  • Add a version for a deployed model service

    On the Elastic Algorithm Service (EAS) page, find the service that you want to update and click Update Service in the Actions column. For more information, see Upload and deploy models in the console.

    Warning

    When you add a version for a model service, the service is temporarily interrupted. Consequently, the requests that rely on the service fail until the service recovers. Proceed with caution.

    After you update the service, click the version number in the Current Version column to view the Version Information or change the service version.

  • Scale resources

    On the Elastic Algorithm Service (EAS) page, find the service that you want to manage and click Scale in the Actions column. In the Scale dialog box, specify the number of instances that are used to run the model service.

  • Enable auto scaling

    You can configure auto scaling so that the service automatically adjusts the resources that are used to run the online model service based on your business requirements. For more information, see the "Method 1: Manage the horizontal auto scaling feature in the console" section in the Enable or disable the horizontal auto-scaling feature topic.

References