
Platform For AI: Deploy an online portrait service as a scalable job

Last Updated: Nov 28, 2025

This topic describes how to deploy an online portrait service as a scalable job to perform inference, addressing issues such as underutilization of resources and request interruptions during scale-in.

Prerequisites

  • A virtual private cloud (VPC) is created and Internet access is enabled for the VPC.

    1. A VPC, vSwitch, and security group are created. For more information, see Create a VPC with an IPv4 CIDR block and Create a security group.

    2. An Internet NAT gateway is created in the VPC. An elastic IP address (EIP) is associated with the gateway and SNAT entries are configured on the gateway. For more information, see Use the SNAT feature of an Internet NAT gateway to access the Internet.

  • For model training and portrait creation, 5 to 20 training images and 1 template image are prepared. The following image formats are supported: .jpg, .jpeg, and .png. Make sure that each image is larger than 512 x 512 pixels. A local validation sketch is provided after this list.

    • Single-person portrait: The template image must contain the face of a person. The faces in multiple training images belong to the same person.

    • Multi-person portrait: The template image must contain multiple faces, and the number of faces must be the same as the value of the model_id parameter specified for model training.

  • An Object Storage Service (OSS) bucket is created. For more information, see Create a bucket.
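
Before you upload the images, you can check their formats and dimensions locally. The following is a minimal sketch that assumes the Pillow library (pip install pillow) and an example directory name, neither of which is part of this topic:

from pathlib import Path

from PIL import Image

ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}
MIN_SIZE = 512  # Each side must be greater than 512 pixels.

def validate_images(directory):
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() not in ALLOWED_SUFFIXES:
            print(f"{path.name}: unsupported format")
            continue
        with Image.open(path) as image:
            width, height = image.size
        if width <= MIN_SIZE or height <= MIN_SIZE:
            print(f"{path.name}: {width} x {height} is too small")
        else:
            print(f"{path.name}: OK ({width} x {height})")

# Example: validate the local directory that holds the training images.
validate_images("training_images")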

Limits

The AI portrait solution is supported only in the China (Beijing) and Singapore regions.

Deploy a scalable job for model inference

Deploy a verification service

  1. Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).

  2. Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

  3. On the Custom Deployment page, configure the key parameters that are described below. Use the default settings for other parameters. For more information, see Custom deployment.

    • In the Basic Information section, enter the service name, such as photog_check.

    • In the Environment Information section, configure the following parameters:

      • Deployment Method: Select Image-based Deployment and enable Asynchronous Queue.

      • Image Configuration: Select Image Address and specify the image address. Valid values:

        • registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub: the image address in the China (Beijing) region.

        • registry.ap-southeast-1.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub: the image address in the Singapore region.

      • Code Build: Click OSS and configure the following parameters:

        • Uri: Select an OSS bucket path. Example: oss://examplebucket/.

        • Mount Path: Enter /photog_oss.

      • Command: Enter python app.py.

      • Port Number: Enter 7860.

    • In the Resource Information section, configure the following parameters:

      • Resource Type: Select Public Resources.

      • Deployment Resources: Select a -gu30 instance type on the GPU tab. We recommend that you use the ml.gu7i.c32m188.1-gu30 instance type.

      • Configure a system disk: Set the value to 120. Unit: GiB.

    • In the Asynchronous Queue section, configure the following parameters:

      • Resource Type: Select Public Resources.

      • Deployment:

        • Minimum Instances: 1

        • vCPUs: 8 Cores

        • Memory (GB): 64 GB

      • Maximum Data for A Single Input Request: Set the value to 20480 KB to ensure that storage space is sufficient for each request in the queue.

      • Maximum Data for A Single Output: Set the value to 20480 KB, which matches the max_payload_size_kb value in the configuration example.

    • In the Network Information section, select the VPC, vSwitch, and security group that you created.

    • In the Service Configuration section, add the following options. For more information, see the complete configuration example.

      • metadata: Add the following options:

        "rpc": {
            "keepalive": 3600000,
            "worker_threads": 1
        }

        • keepalive: the maximum processing time of a single request. Unit: milliseconds. Set the value to 3600000.

        • worker_threads: the number of threads that each Elastic Algorithm Service (EAS) instance uses to concurrently process requests. The default value is 5, which specifies that the first five jobs in the queue are assigned to the same instance. To ensure that requests are processed in order, we recommend that you set this option to 1.

      • queue: Add the "max_delivery": 1 option to prevent a request from being delivered again after a failure.

      Example of the complete configuration:

      {
          "metadata": {
              "name": "photog_check",
              "instance": 1,
              "rpc": {
                  "keepalive": 3600000,
                  "worker_threads": 1
              },
              "type": "Async"
          },
          "cloud": {
              "computing": {
                  "instance_type": "ml.gu7i.c32m188.1-gu30",
                  "instances": null
              },
              "networking": {
                  "vswitch_id": "vsw-2ze4o9kww55051tf2****",
                  "security_group_id": "sg-2ze0kgiee55d0fn4****",
                  "vpc_id": "vpc-2ze5hl4ozjl4fo7q3****"
              }
          },
          "features": {
              "eas.aliyun.com/extra-ephemeral-storage": "100Gi"
          },
          "queue": {
              "cpu": 8,
              "max_delivery": 1,
              "min_replica": 1,
              "memory": 64000,
              "resource": "",
              "source": {
                  "max_payload_size_kb": 20480
              },
              "sink": {
                  "max_payload_size_kb": 20480
              }
          },
          "storage": [
              {
                  "oss": {
                      "path": "oss://examplebucket/",
                      "readOnly": false
                  },
                  "properties": {
                      "resource_type": "code"
                  },
                  "mount_path": "/photog_oss"
              }
          ],
          "containers": [
              {
                  "image": "registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub",
                  "script": "python app.py",
                  "port": 7860
              }
          ]
      }
  4. Click Deploy.

Deploy a training service

  1. Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).

  2. Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

  3. On the Custom Deployment page, configure the key parameters that are described below. Use the default settings for other parameters. For more information, see Custom deployment.

    • In the Basic Information section, enter the service name, such as photog_train_pmml.

    • In the Environment Information section, configure the following parameters:

      • Deployment Method: Select Image-based Deployment and enable Asynchronous Queue.

      • Image Configuration: Select Image Address and specify the image address. Valid values:

        • Image address in the China (Beijing) region: registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:train.1.0.0.pub.

        • Image address in the Singapore region: registry.ap-southeast-1.aliyuncs.com/mybigpai/photog_pub:train.1.0.0.pub.

      • Code Build: Click OSS and configure the following parameters:

        • Uri: Select the OSS bucket path that you specified for the verification service. Example: oss://examplebucket/.

        • Mount Path: In this example, /photog_oss is used.

      • Command: Enter python app.py.

      • Port Number: Enter 7860.

    • In the Resource Information section, configure the following parameters:

      • Resource Type: Select Public Resources.

      • Deployment Resources: Select a -gu30 instance type on the GPU tab. We recommend that you use the ml.gu7i.c32m188.1-gu30 instance type.

      • Configure a system disk: Set the value to 120. Unit: GiB.

    • In the Asynchronous Queue section, configure the following parameters:

      • Resource Type: Select Public Resources.

      • Deployment:

        • Minimum Instances: 1

        • vCPUs: 8 Cores

        • Memory (GB): 64 GB

      • Maximum Data for A Single Input Request: Set the value to 20480 KB to ensure that storage space is sufficient for each request in the queue.

      • Maximum Data for A Single Output: Set the value to 20480 KB, which matches the max_payload_size_kb value in the configuration example.

    • In the Network Information section, select the VPC, vSwitch, and security group that you created.

    • In the Service Configuration section, add the following options. For more information, see the complete configuration example.

      • autoscaler: Optional. The automatic scaling configuration of the service. The "queue[backlog]": 1 strategy scales the service based on the number of requests that are pending in the queue. For more information, see Horizontal auto scaling.

        "behavior": {
            "scaleDown": {
                "stabilizationWindowSeconds": 60
            }
        },
        "max": 5,
        "min": 1,
        "strategies": {
            "queue[backlog]": 1
        }

      • metadata: Add the following options:

        "rpc": {
            "keepalive": 3600000,
            "worker_threads": 1
        }

        • keepalive: the maximum processing time of a single request. Unit: milliseconds. Set the value to 3600000.

        • worker_threads: the number of threads that each EAS instance uses to concurrently process requests. The default value is 5, which specifies that the first five tasks in the queue are assigned to the same instance. To ensure that requests are processed in order, we recommend that you set this option to 1.

      • queue: Add the "max_delivery": 1 option to prevent a request from being delivered again after a failure.

      Example of the complete configuration:

      {
          "autoscaler": {
              "behavior": {
                  "scaleDown": {
                      "stabilizationWindowSeconds": 60
                  }
              },
              "max": 5,
              "min": 1,
              "strategies": {
                  "queue[backlog]": 1
              }
          },
          "metadata": {
              "name": "photog_train_pmml",
              "instance": 1,
              "rpc": {
                  "keepalive": 3600000,
                  "worker_threads": 1
              },
              "type": "Async"
          },
          "cloud": {
              "computing": {
                  "instance_type": "ml.gu7i.c32m188.1-gu30",
                  "instances": null
              },
              "networking": {
                  "vswitch_id": "vsw-2ze4o9kww55051tf2****",
                  "security_group_id": "sg-2ze0kgiee55d0fn4****",
                  "vpc_id": "vpc-2ze5hl4ozjl4fo7q3****"
              }
          },
          "features": {
              "eas.aliyun.com/extra-ephemeral-storage": "120Gi"
          },
          "queue": {
              "cpu": 8,
              "max_delivery": 1,
              "min_replica": 1,
              "memory": 64000,
              "resource": "",
              "source": {
                  "max_payload_size_kb": 20480
              },
              "sink": {
                  "max_payload_size_kb": 20480
              }
          },
          "storage": [
              {
                  "oss": {
                      "path": "oss://examplebucket/",
                      "readOnly": false
                  },
                  "properties": {
                      "resource_type": "code"
                  },
                  "mount_path": "/photog_oss"
              }
          ],
          "containers": [
              {
                  "image": "registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:train.1.0.0.pub",
                  "script": "python app.py",
                  "port": 7860
              }
          ]
      }
  4. Click Deploy.

Deploy a prediction service

In this example, a prediction service is deployed as a scalable job. Perform the following steps:

  1. Click Deploy Service. In the Custom Model Deployment section, click JSON Deployment.

  2. Enter the following configuration information in the JSON editor.

    {
        "metadata": {
            "name": "photog_pre_pmml",
            "instance": 1,
            "rpc": {
                "keepalive": 3600000,
                "worker_threads": 1
            },
            "type": "ScalableJob"
        },
        "cloud": {
            "computing": {
                "instance_type": "ecs.gn6v-c8g1.2xlarge",
                "instances": null
            },
            "networking": {
                "vswitch_id": "vsw-2ze4o9kww55051tf2****",
                "security_group_id": "sg-2ze0kgiee55d0fn4****",
                "vpc_id": "vpc-2ze5hl4ozjl4fo7q3****"
            }
        },
        "features": {
            "eas.aliyun.com/extra-ephemeral-storage": "120Gi"
        },
        "queue": {
            "cpu": 8,
            "max_delivery": 1,
            "min_replica": 1,
            "memory": 64000,
            "resource": "",
            "source": {
                "max_payload_size_kb": 20480
            },
            "sink": {
                "max_payload_size_kb": 20480
            }
        },
        "storage": [
            {
                "oss": {
                    "path": "oss://examplebucket/",
                    "readOnly": false
                },
                "properties": {
                    "resource_type": "code"
                },
                "mount_path": "/photog_oss"
            }
        ],
        "containers": [
            {
                "image": "registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub",
                "env": [
                    {
                        "name": "URL",
                        "value": "http://127.0.0.1:8000"
                    },
                    {
                        "name": "AUTHORIZATION",
                        "value": "="
                    }
                ],
                "script": "python app.py",
                "port": 7861
            },
            {
                "image": "eas-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2",
                "port": 8000,
                "script": "./webui.sh --listen --port 8000 --skip-version-check --no-hashing --no-download-sd-model --skip-install --api --filebrowser --sd-dynamic-cache --data-dir /photog_oss/photog/webui/"
            }
        ]
    }

    The following list describes the key parameters. For more information about how to configure other parameters, see Parameters related to the service model.

    • metadata

      • name: The name of the service, which must be unique within the region.

      • type: Set the value to ScalableJob to deploy the asynchronous inference service as a scalable job.

    • containers

      • image: Specify the image addresses of the AI portrait prediction service and the web UI prediction service. In this example, the image addresses in the China (Beijing) region are used. Valid values:

        • Image addresses in the China (Beijing) region:

          • AI portrait prediction service: registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub.

          • Web UI prediction service: eas-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2.

        • Image addresses in the Singapore region:

          • AI portrait prediction service: registry.ap-southeast-1.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub.

          • Web UI prediction service: eas-registry-vpc.ap-southeast-1.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2.

      • env: The URL environment variable points the AI portrait container to the web UI container, which listens on port 8000 in the same instance.

    • storage

      • path: In this example, OSS mounting is used. Set the value to the OSS bucket path that you specified for the verification service. Example: oss://examplebucket/. Download and decompress the WebUI model file, and save the files to the OSS bucket. In this example, the oss://examplebucket/photog/webui path is used, which corresponds to the --data-dir path /photog_oss/photog/webui/ in the container. For information about how to upload objects to an OSS bucket, see ossutil overview, or see the upload sketch after this list. For information about how to upload files to a File Storage NAS file system, see Mount a file system on a Linux ECS instance and Manage files.

      • mount_path: Set the value to /photog_oss.
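
    If you want to script the upload instead of using ossutil, the following is a minimal sketch that uses the OSS SDK for Python (pip install oss2). The AccessKey pair, endpoint, and local directory name are placeholders, not values from this topic:

    import os

    import oss2

    # Placeholder credentials and endpoint: replace them with your own values.
    auth = oss2.Auth('<your-access-key-id>', '<your-access-key-secret>')
    bucket = oss2.Bucket(auth, 'https://oss-cn-beijing.aliyuncs.com', 'examplebucket')

    local_dir = 'webui'  # The decompressed WebUI model directory.
    for root, _, files in os.walk(local_dir):
        for name in files:
            local_path = os.path.join(root, name)
            # Upload under photog/webui/ so that the files appear at
            # /photog_oss/photog/webui/ inside the container.
            relative = os.path.relpath(local_path, local_dir).replace(os.sep, '/')
            bucket.put_object_from_file('photog/webui/' + relative, local_path)
            print('Uploaded', relative)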

  3. Click Deploy.

    After you deploy a scalable job, the system automatically creates a queue service and enables the auto scaling feature for the service.

Call a service

After you deploy a service, you can call it to implement AI portraits.

When you call a service, you must set the taskType parameter to query to specify that the request is an inference request. For more information, see Call the service. Sample code:

import json
from eas_prediction import QueueClient

# Create an input queue object to send requests to the service.
input_queue = QueueClient('182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'photog_check')
input_queue.set_token('<token>')
input_queue.init()

datas = json.dumps(
    {
        'request_id': 12345,
        'images': ["xx.jpg", "xx.jpg"],  # A list of image URLs.
        'configure': {
            'face_reconize': True,  # Check whether all images contain the same person.
        }
    }
)
# Set the taskType parameter to query.
tags = {"taskType": "query"}
index, request_id = input_queue.put(datas, tags)
print(index, request_id)

# View details about the input queue.
attrs = input_queue.attributes()
print(attrs)
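
Results are delivered to the output queue (sink) of the service. The following is a minimal subscription sketch that uses the same QueueClient class; the endpoint, the output queue name photog_check/sink, and the token are placeholders that you must replace with the values of your own service:

from eas_prediction import QueueClient

# Create an output queue object to receive results.
sink_queue = QueueClient('182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'photog_check/sink')
sink_queue.set_token('<token>')
sink_queue.init()

# Watch the queue from index 0 with a window of 5 requests, and commit
# each result manually after it is processed.
watcher = sink_queue.watch(0, 5, auto_commit=False)
for result in watcher.run():
    print(result.data.decode('utf-8'))
    # Commit the result to remove it from the output queue.
    sink_queue.commit(result.index)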
