Platform For AI: Deploy an online inference service for AI portraits

Last Updated: Apr 01, 2026

Use the Scalable Job service for AI Portrait inference to prevent resource underutilization and request interruptions during scale-in.

Prerequisites

  • A virtual private cloud (VPC) is created and Internet access is enabled for the VPC.

    1. A VPC, vSwitch, and security group are created. For more information, see VPCs and vSwitches and Use security groups.

    2. An Internet NAT gateway is created in the VPC. An elastic IP address (EIP) is associated with the gateway and SNAT entries are configured on the gateway. For more information, see Internet NAT gateway.

  • For model training and portrait creation, 5 to 20 training images and 1 template image are prepared. The following image formats are supported: .jpg, .jpeg, and .png. Make sure that the size of each image is greater than 512 x 512 pixels.

    • Single-person portrait: The template image must contain the face of a person. The faces in multiple training images belong to the same person.

    • Multi-person portrait: The template image must contain multiple faces, and the number of faces must be the same as the value of the model_id parameter specified for model training.

  • An OSS bucket is created. For more information, see Create a bucket.
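
The image requirements above can be checked before anything is uploaded. The following is a minimal sketch, assuming the image dimensions are already known (for example, read with an imaging library). The function name check_training_images and the (filename, width, height) tuple layout are hypothetical and not part of PAI. "Greater than 512 x 512" is interpreted here as both sides exceeding 512 pixels.

```python
import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
MIN_SIDE = 512  # interpretation: each side must exceed this value


def check_training_images(images):
    """Return a list of problems; an empty list means the set looks usable.

    `images` is a list of (filename, width, height) tuples (hypothetical layout).
    """
    problems = []
    if not 5 <= len(images) <= 20:
        problems.append(f"expected 5 to 20 training images, got {len(images)}")
    for name, width, height in images:
        ext = os.path.splitext(name)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext!r}")
        if width <= MIN_SIDE or height <= MIN_SIDE:
            problems.append(f"{name}: {width}x{height} is not larger than 512x512")
    return problems


# Example: five valid images produce no problems.
sample = [(f"img{i}.jpg", 1024, 768) for i in range(5)]
print(check_training_images(sample))
```

The check covers only the count, format, and size rules; whether all faces belong to the same person is what the verification service deployed later is for.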

Limitations

The AI portrait solution is available only in the China (Beijing) and Singapore regions.

Deploy a scalable job service for inference

Deploy a verification service

  1. Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).

  2. Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

  3. On the Custom Deployment page, configure the following parameters. Use default values for the rest. For more information, see Custom Deployment.

    • In the Basic Information section, set the service name. Example: photog_check.

    • In the Environment Information section, configure the following parameters:

      • Deployment Method: Select Image-based Deployment and then select Asynchronous Queue.

      • Image Configuration: Select Image Address and enter the image address:

        1. China (Beijing): registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub.

        2. Singapore: registry.ap-southeast-1.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub.

      • Code Build: Select OSS as the mount type and set the following parameters:

        1. Uri: Your OSS bucket path. Example: oss://examplebucket/.

        2. Mount Path: Set to /photog_oss.

      • Command to Run: Set to python app.py.

      • Port Number: Set to 7860.

    • In the Resource Information section, set the following parameters:

      • Resource Type: Select Public Resources.

      • Deployment: Select a GU30-series instance type on the GPU tab. Recommended: ml.gu7i.c32m188.1-gu30.

      • Configure a system disk: Set to 120 GiB.

    • In the Asynchronous Queue section, set the following parameters:

      • Resource Type: Select Public Resources.

      • Deployment:

        • Number of replicas: 1

        • CPU (cores): 8

        • Memory (GB): 64

      • Maximum Data for A Single Input Request and Maximum Data for A Single Output: Set both to 20480 KB to ensure sufficient storage for each request in the queue.

    • In the Service Access section, select the VPC, vSwitch, and security group you created.

    • In the Service Configurations section, add the following configurations. Refer to the complete configuration example below for the new parameters.

      • metadata: Add the following parameters:

        {
            "metadata": {
                "rpc": {
                    "keepalive": 3600000,
                    "worker_threads": 1
                }
            }
        }

        1. keepalive: Maximum processing time for a single request, in milliseconds. Set to 3600000.

        2. worker_threads: Number of concurrent processing threads per Elastic Algorithm Service (EAS) instance. The default value is 5, which means the first five queued tasks are assigned to the same instance. Set to 1 to process requests sequentially.

      • queue: Add "max_delivery": 1 to prevent repeated deliveries after a failure.

      Complete configuration example

      {
          "metadata": {
              "name": "photog_check",
              "instance": 1,
              "rpc": {
                  "keepalive": 3600000,
                  "worker_threads": 1
              },
              "type": "Async"
          },
          "cloud": {
              "computing": {
                  "instance_type": "ml.gu7i.c32m188.1-gu30",
                  "instances": null
              },
              "networking": {
                  "vswitch_id": "vsw-2ze4o9kww55051tf2****",
                  "security_group_id": "sg-2ze0kgiee55d0fn4****",
                  "vpc_id": "vpc-2ze5hl4ozjl4fo7q3****"
              }
          },
          "features": {
              "eas.aliyun.com/extra-ephemeral-storage": "100Gi"
          },
          "queue": {
              "cpu": 8,
              "max_delivery": 1,
              "min_replica": 1,
              "memory": 64000,
              "resource": "",
              "source": {
                  "max_payload_size_kb": 20480
              },
              "sink": {
                  "max_payload_size_kb": 20480
              }
          },
          "storage": [
              {
                  "oss": {
                      "path": "oss://examplebucket/",
                      "readOnly": false
                  },
                  "properties": {
                      "resource_type": "code"
                  },
                  "mount_path": "/photog_oss"
              }
          ],
          "containers": [
              {
                  "image": "registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub",
                  "script": "python app.py",
                  "port": 7860
              }
          ]
      }
  4. Click Deploy.
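
The same configuration can also be assembled programmatically, which may help keep the verification and training services consistent. The following is a minimal sketch, assuming a hypothetical helper named build_async_config. The field names and values are copied from the complete configuration example above, and the network IDs are the masked placeholders from that example.

```python
import json


def build_async_config(name, image, oss_path, vswitch_id, security_group_id,
                       vpc_id, ephemeral_storage="100Gi"):
    """Assemble an asynchronous-queue service config shaped like the example above."""
    return {
        "metadata": {
            "name": name,
            "instance": 1,
            # keepalive is in milliseconds; worker_threads=1 processes requests sequentially.
            "rpc": {"keepalive": 3600000, "worker_threads": 1},
            "type": "Async",
        },
        "cloud": {
            "computing": {"instance_type": "ml.gu7i.c32m188.1-gu30", "instances": None},
            "networking": {
                "vswitch_id": vswitch_id,
                "security_group_id": security_group_id,
                "vpc_id": vpc_id,
            },
        },
        "features": {"eas.aliyun.com/extra-ephemeral-storage": ephemeral_storage},
        "queue": {
            "cpu": 8,
            "max_delivery": 1,  # do not redeliver a request after a failure
            "min_replica": 1,
            "memory": 64000,
            "resource": "",
            "source": {"max_payload_size_kb": 20480},
            "sink": {"max_payload_size_kb": 20480},
        },
        "storage": [
            {
                "oss": {"path": oss_path, "readOnly": False},
                "properties": {"resource_type": "code"},
                "mount_path": "/photog_oss",
            }
        ],
        "containers": [{"image": image, "script": "python app.py", "port": 7860}],
    }


config = build_async_config(
    name="photog_check",
    image="registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub",
    oss_path="oss://examplebucket/",
    vswitch_id="vsw-2ze4o9kww55051tf2****",
    security_group_id="sg-2ze0kgiee55d0fn4****",
    vpc_id="vpc-2ze5hl4ozjl4fo7q3****",
)
print(json.dumps(config, indent=4))
```

The printed JSON can be compared against the complete configuration example before it is pasted into the console.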

Deploy a training service

  1. Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).

  2. Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

  3. On the Custom Deployment page, configure the following parameters. Use default values for the rest. For more information, see Custom Deployment.

    • In the Basic Information section, set the service name. Example: photog_train_pmml.

    • In the Environment Information section, set the following parameters:

      • Deployment Method: Select Image-based Deployment and then select Asynchronous Queue.

      • Image Configuration: Select Image Address and enter the image address:

        1. China (Beijing): registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:train.1.0.0.pub.

        2. Singapore: registry.ap-southeast-1.aliyuncs.com/mybigpai/photog_pub:train.1.0.0.pub.

      • Code configuration: Select OSS as the mount type and set the following parameters:

        1. Uri: Path to your OSS bucket. Must match the path specified for the verification service. Example: oss://examplebucket/.

        2. Mount Path: Set to /photog_oss.

      • Command to Run: Set to python app.py.

      • Port Number: Set to 7860.

    • In the Resource Information section, set the following parameters:

      • Resource Type: Select Public Resources.

      • Deployment: Select a GU30-series instance type on the GPU tab. Recommended: ml.gu7i.c32m188.1-gu30.

      • Configure a system disk: Set to 120 GiB.

    • In the Asynchronous Queue section, set the following parameters:

      • Resource Type: Select Public Resources.

      • Deployment:

        • Number of replicas: 1

        • CPU (cores): 8

        • Memory (GB): 64

      • Maximum Data for A Single Input Request and Maximum Data for A Single Output: Set both to 20480 KB to ensure sufficient storage for each request in the queue.

    • In the Service Access section, select the VPC, vSwitch, and security group you created.

    • In the Service Configurations section, add the following configurations. Refer to the complete configuration example below for the new parameters.

      • autoscaler: (Optional) Horizontal auto scaling configuration. For more information, see Horizontal auto scaling.

        {
            "autoscaler": {
                "behavior": {
                    "scaleDown": {
                        "stabilizationWindowSeconds": 60
                    }
                },
                "max": 5,
                "min": 1,
                "strategies": {
                    "queue[backlog]": 1
                }
            }
        }

      • metadata: Add the following parameters:

        {
            "metadata": {
                "rpc": {
                    "keepalive": 3600000,
                    "worker_threads": 1
                }
            }
        }

        1. keepalive: Maximum processing time for a single request, in milliseconds. Set to 3600000.

        2. worker_threads: Number of concurrent processing threads per EAS instance. The default value is 5, which means the first five queued tasks are assigned to the same instance. Set to 1 to process requests sequentially.

      • queue: Add "max_delivery": 1 to prevent repeated deliveries after a failure.

      Complete configuration example

      {
          "autoscaler": {
              "behavior": {
                  "scaleDown": {
                      "stabilizationWindowSeconds": 60
                  }
              },
              "max": 5,
              "min": 1,
              "strategies": {
                  "queue[backlog]": 1
              }
          },
          "metadata": {
              "name": "photog_train_pmml",
              "instance": 1,
              "rpc": {
                  "keepalive": 3600000,
                  "worker_threads": 1
              },
              "type": "Async"
          },
          "cloud": {
              "computing": {
                  "instance_type": "ml.gu7i.c32m188.1-gu30",
                  "instances": null
              },
              "networking": {
                  "vswitch_id": "vsw-2ze4o9kww55051tf2****",
                  "security_group_id": "sg-2ze0kgiee55d0fn4****",
                  "vpc_id": "vpc-2ze5hl4ozjl4fo7q3****"
              }
          },
          "features": {
              "eas.aliyun.com/extra-ephemeral-storage": "120Gi"
          },
          "queue": {
              "cpu": 8,
              "max_delivery": 1,
              "min_replica": 1,
              "memory": 64000,
              "resource": "",
              "source": {
                  "max_payload_size_kb": 20480
              },
              "sink": {
                  "max_payload_size_kb": 20480
              }
          },
          "storage": [
              {
                  "oss": {
                      "path": "oss://examplebucket/",
                      "readOnly": false
                  },
                  "properties": {
                      "resource_type": "code"
                  },
                  "mount_path": "/photog_oss"
              }
          ],
          "containers": [
              {
                  "image": "registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:train.1.0.0.pub",
                  "script": "python app.py",
                  "port": 7860
              }
          ]
      }
  4. Click Deploy.
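
To get a feel for the autoscaler values in the configuration above, the following sketch models how a backlog-based strategy could map queue length to a replica count. This is a simplified illustration, not the EAS scaling algorithm: it assumes one replica is wanted per "queue[backlog]" threshold of queued requests, clamped between "min" and "max". The function name desired_replicas is hypothetical.

```python
import math


def desired_replicas(backlog, threshold=1, min_replicas=1, max_replicas=5):
    """Simplified model: one replica per `threshold` queued requests, clamped to [min, max]."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    wanted = math.ceil(backlog / threshold)
    return max(min_replicas, min(max_replicas, wanted))


# Defaults mirror the example: "queue[backlog]": 1, "min": 1, "max": 5.
for backlog in (0, 1, 3, 12):
    print(backlog, desired_replicas(backlog))
```

Under this model, an empty queue keeps one replica and a backlog of 12 is capped at five replicas. The stabilizationWindowSeconds setting, which delays scale-in, is not modeled here.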

Deploy a prediction service

The prediction service is deployed as a Scalable Job service.

  1. Click Deploy Service. In the Custom Model Deployment section, click JSON Deployment.

  2. Enter the configuration in the JSON editor.

    {
        "metadata": {
            "name": "photog_pre_pmml",
            "instance": 1,
            "rpc": {
                "keepalive": 3600000,
                "worker_threads": 1
            },
            "type": "ScalableJob"
        },
        "cloud": {
            "computing": {
                "instance_type": "ecs.gn6v-c8g1.2xlarge",
                "instances": null
            },
            "networking": {
                "vswitch_id": "vsw-2ze4o9kww55051tf2****",
                "security_group_id": "sg-2ze0kgiee55d0fn4****",
                "vpc_id": "vpc-2ze5hl4ozjl4fo7q3****"
            }
        },
        "features": {
            "eas.aliyun.com/extra-ephemeral-storage": "120Gi"
        },
        "queue": {
            "cpu": 8,
            "max_delivery": 1,
            "min_replica": 1,
            "memory": 64000,
            "resource": "",
            "source": {
                "max_payload_size_kb": 20480
            },
            "sink": {
                "max_payload_size_kb": 20480
            }
        },
        "storage": [
            {
                "oss": {
                    "path": "oss://examplebucket/",
                    "readOnly": false
                },
                "properties": {
                    "resource_type": "code"
                },
                "mount_path": "/photog_oss"
            }
        ],
        "containers": [
            {
                "image": "registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub",
                "env": [
                    {
                        "name": "URL",
                        "value": "http://127.0.0.1:8000"
                    },
                    {
                        "name": "AUTHORIZATION",
                        "value": "="
                    }
                ],
                "script": "python app.py",
                "port": 7861
            },
            {
                "image": "eas-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2",
                "port": 8000,
                "script": "./webui.sh --listen --port 8000 --skip-version-check --no-hashing --no-download-sd-model --skip-install --api --filebrowser --sd-dynamic-cache --data-dir /photog_oss/webui/"
            }
        ]
    }

    The following table describes key parameters. For details about other parameters, see JSON Deployment.

    • metadata

      • Name: Service name. Must be unique within the region.

      • Type: Set to ScalableJob to deploy the asynchronous inference service as a Scalable Job service.

    • containers

      • Image: Image addresses for the AI portrait prediction service and WebUI prediction service. The following list provides supported images. This solution uses images for the China (Beijing) region.

        1. Image addresses for China (Beijing):

          1. AI portrait prediction service: registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub.

          2. WebUI prediction service: eas-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2.

        2. Image addresses for Singapore:

          1. AI portrait prediction service: registry.ap-southeast-1.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub.

          2. WebUI prediction service: eas-registry-vpc.ap-southeast-1.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2.

    • storage

      • Path: OSS mount path. Use the same OSS bucket path as the verification service. Example: oss://examplebucket/.

        Download and extract the model files required by WebUI, and store them in your OSS bucket at oss://examplebucket/photog_oss/webui with the directory structure shown in the following figure. For more information about uploading files to OSS, see Command-line tool ossutil 1.0. For more information about uploading files to NAS, see Quick start (Linux) and Use Workbench to manage files on an ECS instance.

        [Figure: directory structure of the WebUI model files]

      • Mount path: Set to /photog_oss.

  3. Click Deploy.

    When you deploy a Scalable Job service, a queue service is automatically created with horizontal auto scaling enabled by default.
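
The JSON above uses the China (Beijing) images. The following sketch shows one way to switch both containers to the Singapore images listed in the parameter descriptions. The helper name set_region_images and the IMAGES lookup table are hypothetical; the image addresses are copied from this page, and the region keys are taken from the registry host names. It assumes the first container is the AI portrait prediction service and the second is WebUI, as in the JSON above.

```python
import copy

# Image addresses as listed for each region on this page.
IMAGES = {
    "cn-beijing": {
        "portrait": "registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub",
        "webui": "eas-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2",
    },
    "ap-southeast-1": {
        "portrait": "registry.ap-southeast-1.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub",
        "webui": "eas-registry-vpc.ap-southeast-1.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2",
    },
}


def set_region_images(config, region):
    """Return a copy of `config` whose two containers use the images for `region`."""
    images = IMAGES[region]
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    updated["containers"][0]["image"] = images["portrait"]
    updated["containers"][1]["image"] = images["webui"]
    return updated


# Trimmed-down stand-in for the full prediction service JSON.
beijing = {
    "metadata": {"name": "photog_pre_pmml", "type": "ScalableJob"},
    "containers": [
        {"image": IMAGES["cn-beijing"]["portrait"], "port": 7861},
        {"image": IMAGES["cn-beijing"]["webui"], "port": 8000},
    ],
}
singapore = set_region_images(beijing, "ap-southeast-1")
print(singapore["containers"][0]["image"])
print(singapore["containers"][1]["image"])
```

The network IDs and the OSS bucket would also need to belong to the target region; the sketch changes only the image addresses.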

Call the service

After the service is deployed, call it to generate AI portraits.

When calling the service, set taskType to query for an inference request, as described in Call services. Use the following sample code:

import json
from eas_prediction import QueueClient

# Create an input queue client to write input data.
input_queue = QueueClient('182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'photog_check')
input_queue.set_token('<token>')
input_queue.init()

datas = json.dumps(
    {
        'request_id': 12345,
        'images': ["xx.jpg", "xx.jpg"],  # List of image URLs.
        'configure': {
            'face_reconize': True,  # Check whether all images show the same person.
        }
    }
)
# Specify taskType as query.
tags = {"taskType": "query"}
# put() returns the position in the queue and the ID assigned to the request.
index, request_id = input_queue.put(datas, tags)
print(index, request_id)

# View the details of the input queue.
attrs = input_queue.attributes()
print(attrs)
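
When several requests are sent, it can be convenient to build the payload and tags in one place. The following is a minimal sketch that uses only the standard library. The helper name build_check_request is hypothetical, and the field names, including the face_reconize spelling, are copied from the sample above.

```python
import json


def build_check_request(request_id, image_urls, same_person=True):
    """Return (payload, tags) in the shape used by the sample above."""
    if not image_urls:
        raise ValueError("image_urls must not be empty")
    payload = json.dumps({
        "request_id": request_id,
        "images": list(image_urls),
        # Key spelling follows the sample request on this page.
        "configure": {"face_reconize": same_person},
    })
    tags = {"taskType": "query"}  # marks the request as an inference request
    return payload, tags


payload, tags = build_check_request(12345, ["xx.jpg", "xx.jpg"])
print(payload)
print(tags)
# With an initialized QueueClient: index, request_id = input_queue.put(payload, tags)
```

Keeping the payload construction separate from the queue client makes it possible to inspect or test the request body without a deployed service.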

Related documentation