Deploy and call a scalable job service for inference by using EAS - Platform For AI

Deploy the AI portrait inference service as a Scalable Job service to prevent resource underutilization and request interruptions during scale-in.

Prerequisites

A virtual private cloud (VPC) is created and Internet access is enabled for the VPC.
1. A VPC, vSwitch, and security group are created. For more information, see VPCs and vSwitches and Use security groups.
2. An Internet NAT gateway is created in the VPC. An elastic IP address (EIP) is associated with the gateway and SNAT entries are configured on the gateway. For more information, see Internet NAT gateway.
For model training and portrait creation, 5 to 20 training images and 1 template image are prepared. The following image formats are supported: .jpg, .jpeg, and .png. Make sure that the size of each image is greater than 512 x 512 pixels.
- Single-person portrait: The template image must contain the face of a person, and all training images must show the face of the same person.
- Multi-person portrait: The template image must contain multiple faces, and the number of faces must be the same as the value of the model_id parameter specified for model training.
An OSS bucket is created. For more information, see Create a bucket.
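The image requirements above can be checked locally before you upload anything. The following is a minimal sketch and is not part of the AI portrait service: `check_training_images` is a hypothetical helper that takes `(filename, width, height)` tuples, and it assumes that "greater than 512 x 512 pixels" means that both the width and the height exceed 512 pixels.

```python
from pathlib import Path

# Requirements taken from the prerequisites above.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}
MIN_SIDE = 512  # assumption: both width and height must exceed 512 pixels
MIN_COUNT, MAX_COUNT = 5, 20


def check_training_images(images):
    """Return a list of problems found in (filename, width, height) tuples.

    An empty list means the set meets the stated requirements.
    """
    problems = []
    if not MIN_COUNT <= len(images) <= MAX_COUNT:
        problems.append(f"expected {MIN_COUNT}-{MAX_COUNT} images, got {len(images)}")
    for name, width, height in images:
        if Path(name).suffix.lower() not in ALLOWED_SUFFIXES:
            problems.append(f"{name}: unsupported format")
        if width <= MIN_SIDE or height <= MIN_SIDE:
            problems.append(f"{name}: {width}x{height} is not larger than 512x512")
    return problems


sample = [("face1.jpg", 1024, 1024), ("face2.png", 800, 600), ("face3.webp", 1024, 768)]
for problem in check_training_images(sample):
    print(problem)
```

To collect the tuples from real files, you could read each image size with Pillow, whose `Image.open(path).size` returns `(width, height)`. The sketch does not count faces, so it does not cover the multi-person requirement.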

Limitations

The AI portrait solution is available only in the China (Beijing) and Singapore regions.

Deploy a scalable job service for inference

Deploy a verification service

Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

On the Custom Deployment page, configure the following parameters. Use default values for the rest. For more information, see Custom Deployment.

In the Basic Information section, set the service name. Example: photog_check.

In the Environment Information section, configure the following parameters:

Parameter	Description
Deployment Method	Select Image-based Deployment and then select Asynchronous Queue.
Image Configuration	Select Image Address and enter the image address: China (Beijing): `registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub`. Singapore: `registry.ap-southeast-1.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub`.
Code Configuration	Select OSS as the mount type and set the following parameters: Uri: The path of your OSS bucket. Example: `oss://examplebucket/`. Mount Path: Set to `/photog_oss`.
Command to Run	Set to `python app.py`.
Port Number	Set to 7860.
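The image addresses differ only in the registry host of each region and in the tag prefix (`check`, `train`, or `infer`). The following minimal sketch assembles them, assuming the region IDs `cn-beijing` for China (Beijing) and `ap-southeast-1` for Singapore. `photog_image` is a hypothetical helper, and it rejects other regions because of the limitation stated earlier.

```python
# Registry hosts for the two regions where the AI portrait solution is available.
REGISTRIES = {
    "cn-beijing": "registry.cn-beijing.aliyuncs.com",          # China (Beijing)
    "ap-southeast-1": "registry.ap-southeast-1.aliyuncs.com",  # Singapore
}
# Tag prefixes used by the verification, training, and prediction images.
TAGS = {"check", "train", "infer"}


def photog_image(region, tag):
    """Return the photog_pub image address for a region and tag prefix."""
    if region not in REGISTRIES:
        raise ValueError(f"AI portrait is not available in region {region!r}")
    if tag not in TAGS:
        raise ValueError(f"unknown image tag {tag!r}")
    return f"{REGISTRIES[region]}/mybigpai/photog_pub:{tag}.1.0.0.pub"


print(photog_image("ap-southeast-1", "check"))
```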

In the Resource Information section, set the following parameters:

Parameter	Description
Resource Type	Select Public Resources.
Deployment	Select a GU30-series instance type on the GPU tab. Recommended: ml.gu7i.c32m188.1-gu30.
Configure a system disk	Set to 120 GiB.

In the Asynchronous Queue section, set the following parameters:

Parameter	Description
Resource Type	Select Public Resources.
Deployment	Number of replicas: 1. CPU (cores): 8. Memory (GB): 64.
Maximum Data for A Single Input Request	Set to 20480 KB to ensure sufficient storage for each request in the queue.
Maximum Data for A Single Output	Set to 20480 KB to ensure sufficient storage for each result in the queue.
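Because each queued request must stay under 20480 KB, passing image URLs (as the calling sample at the end of this topic does) rather than embedding image bytes keeps requests small. The following minimal sketch checks a request before you send it. It assumes that the limit applies to the UTF-8 encoded JSON body and that 1 KB is 1024 bytes; `payload_size_kb` and `fits_queue` are hypothetical helpers.

```python
import json

# Matches "Maximum Data for A Single Input Request" (source.max_payload_size_kb).
MAX_PAYLOAD_KB = 20480


def payload_size_kb(request):
    """Size of the JSON-encoded request body in KB (1 KB = 1024 bytes assumed)."""
    return len(json.dumps(request).encode("utf-8")) / 1024


def fits_queue(request, limit_kb=MAX_PAYLOAD_KB):
    """True if the encoded request does not exceed the per-request queue limit."""
    return payload_size_kb(request) <= limit_kb


request = {"request_id": 12345, "images": ["https://example.com/a.jpg"]}
print(f"{payload_size_kb(request):.3f} KB, fits: {fits_queue(request)}")
```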

In the Service Access section, select the VPC, vSwitch, and security group you created.

In the Service Configurations section, add the following configurations. Refer to the complete configuration example below for the new parameters.

Field	New parameters
metadata	Add `"rpc": {"keepalive": 3600000, "worker_threads": 1}` under `metadata`. keepalive: Maximum processing time for a single request, in milliseconds. Set to 3600000 (1 hour). worker_threads: Number of concurrent processing threads per EAS instance. Default value: 5, which means that the first five queued tasks are assigned to the same instance. Set to 1 so that each instance processes requests one at a time.
queue	Add `"max_delivery": 1` to prevent a request from being redelivered after a failure.
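The effect of `worker_threads` described above can be illustrated with a toy model. This is not the actual EAS scheduler: `assign_tasks` is a hypothetical function that assumes each instance keeps taking tasks in queue order until all of its worker threads are busy, which is how the table describes the default value of 5.

```python
# keepalive is expressed in milliseconds: 3600000 ms is one hour.
KEEPALIVE_MS = 60 * 60 * 1000


def assign_tasks(num_tasks, num_instances, worker_threads):
    """Simplified dispatch model: tasks leave the queue in order, and each
    task goes to the first instance that still has a free worker thread.

    Returns (per_instance, still_queued): per_instance[i] lists the task
    indexes held by instance i; still_queued lists tasks waiting for a slot.
    """
    per_instance = [[] for _ in range(num_instances)]
    still_queued = []
    for task in range(num_tasks):
        target = next((s for s in per_instance if len(s) < worker_threads), None)
        if target is None:
            still_queued.append(task)
        else:
            target.append(task)
    return per_instance, still_queued


for threads in (5, 1):
    print(threads, assign_tasks(num_tasks=5, num_instances=2, worker_threads=threads))
```

In this model, `worker_threads=5` puts all five tasks on the first instance while the second instance stays idle, whereas `worker_threads=1` gives each instance one task and leaves the rest in the queue for whichever instance frees up first.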

Complete configuration example

{
    "metadata": {
        "name": "photog_check",
        "instance": 1,
        "rpc": {
            "keepalive": 3600000,
            "worker_threads": 1
        },
        "type": "Async"
    },
    "cloud": {
        "computing": {
            "instance_type": "ml.gu7i.c32m188.1-gu30",
            "instances": null
        },
        "networking": {
            "vswitch_id": "vsw-2ze4o9kww55051tf2****",
            "security_group_id": "sg-2ze0kgiee55d0fn4****",
            "vpc_id": "vpc-2ze5hl4ozjl4fo7q3****"
        }
    },
    "features": {
        "eas.aliyun.com/extra-ephemeral-storage": "100Gi"
    },
    "queue": {
        "cpu": 8,
        "max_delivery": 1,
        "min_replica": 1,
        "memory": 64000,
        "resource": "",
        "source": {
            "max_payload_size_kb": 20480
        },
        "sink": {
            "max_payload_size_kb": 20480
        }
    },
    "storage": [
        {
            "oss": {
                "path": "oss://examplebucket/",
                "readOnly": false
            },
            "properties": {
                "resource_type": "code"
            },
            "mount_path": "/photog_oss"
        }
    ],
    "containers": [
        {
            "image": "registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub",
            "script": "python app.py",
            "port": 7860
        }
    ]
}
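If you deploy the same services in several environments, generating the JSON from code can reduce copy-paste errors. The following minimal sketch reproduces the structure of the example above. `build_async_service_config` is a hypothetical helper, and its fixed values (8 CPU cores, 64000 MB of memory, 20480 KB payload limits, the `/photog_oss` mount path, and port 7860) are copied from the example rather than required by EAS.

```python
import json


def build_async_service_config(name, image, oss_path, vpc_id, vswitch_id,
                               security_group_id,
                               instance_type="ml.gu7i.c32m188.1-gu30",
                               extra_storage="100Gi"):
    """Return a service configuration dict shaped like the example above."""
    return {
        "metadata": {
            "name": name,
            "instance": 1,
            "rpc": {"keepalive": 3600000, "worker_threads": 1},
            "type": "Async",
        },
        "cloud": {
            "computing": {"instance_type": instance_type, "instances": None},
            "networking": {
                "vswitch_id": vswitch_id,
                "security_group_id": security_group_id,
                "vpc_id": vpc_id,
            },
        },
        "features": {"eas.aliyun.com/extra-ephemeral-storage": extra_storage},
        "queue": {
            "cpu": 8,
            "max_delivery": 1,
            "min_replica": 1,
            "memory": 64000,
            "resource": "",
            "source": {"max_payload_size_kb": 20480},
            "sink": {"max_payload_size_kb": 20480},
        },
        "storage": [{
            "oss": {"path": oss_path, "readOnly": False},
            "properties": {"resource_type": "code"},
            "mount_path": "/photog_oss",
        }],
        "containers": [{"image": image, "script": "python app.py", "port": 7860}],
    }


config = build_async_service_config(
    name="photog_check",
    image="registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub",
    oss_path="oss://examplebucket/",
    vpc_id="vpc-2ze5hl4ozjl4fo7q3****",
    vswitch_id="vsw-2ze4o9kww55051tf2****",
    security_group_id="sg-2ze0kgiee55d0fn4****",
)
print(json.dumps(config, indent=4))
```

For the training service, you would pass the `train` image, the name `photog_train_pmml`, and `"120Gi"` for `extra_storage`, and then add the `autoscaler` section.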

Click Deploy.

Deploy a training service

Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

On the Custom Deployment page, configure the following parameters. Use default values for the rest. For more information, see Custom Deployment.

In the Basic Information section, set the service name. Example: photog_train_pmml.

In the Environment Information section, set the following parameters:

Parameter	Description
Deployment Method	Select Image-based Deployment and then select Asynchronous Queue.
Image Configuration	Select Image Address and enter the image address: China (Beijing): `registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:train.1.0.0.pub`. Singapore: `registry.ap-southeast-1.aliyuncs.com/mybigpai/photog_pub:train.1.0.0.pub`.
Code Configuration	Select OSS as the mount type and set the following parameters: Uri: The path of your OSS bucket. It must be the same as the path specified for the verification service. Example: `oss://examplebucket/`. Mount Path: Set to `/photog_oss`.
Command to Run	Set to `python app.py`.
Port Number	Set to 7860.

In the Resource Information section, set the following parameters:

Parameter	Description
Resource Type	Select Public Resources.
Deployment	Select a GU30-series instance type on the GPU tab. Recommended: ml.gu7i.c32m188.1-gu30.
Configure a system disk	Set to 120 GiB.

In the Asynchronous Queue section, set the following parameters:

Parameter	Description
Resource Type	Select Public Resources.
Deployment	Number of replicas: 1. CPU (cores): 8. Memory (GB): 64.
Maximum Data for A Single Input Request	Set to 20480 KB to ensure sufficient storage for each request in the queue.
Maximum Data for A Single Output	Set to 20480 KB to ensure sufficient storage for each result in the queue.

In the Service Access section, select the VPC, vSwitch, and security group you created.

In the Service Configurations section, add the following configurations. Refer to the complete configuration example below for the new parameters.

Field	New parameters
autoscaler	(Optional) Horizontal auto scaling configuration. For more information, see Horizontal auto scaling. Example: `"autoscaler": {"behavior": {"scaleDown": {"stabilizationWindowSeconds": 60}}, "max": 5, "min": 1, "strategies": {"queue[backlog]": 1}}`. With this configuration, the service runs 1 to 5 instances, scales based on the queue backlog, and uses a 60-second stabilization window before it scales in.
metadata	Add `"rpc": {"keepalive": 3600000, "worker_threads": 1}` under `metadata`. keepalive: Maximum processing time for a single request, in milliseconds. Set to 3600000. worker_threads: Number of concurrent processing threads per EAS instance. Default value: 5, which means that the first five queued tasks are assigned to the same instance. Set to 1 to process requests sequentially.
queue	Add `"max_delivery": 1` to prevent a request from being redelivered after a failure.
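The following is a rough model of how the `autoscaler` example above behaves. It assumes that `"queue[backlog]": 1` is a target of one queued request per instance and that the scale-down stabilization window follows Kubernetes HorizontalPodAutoscaler semantics, where scale-in uses the highest recommendation from the last 60 seconds. EAS may compute replicas differently; `desired_replicas` and `next_replicas` are hypothetical helpers for illustration only.

```python
import math


def desired_replicas(backlog, target_per_instance=1, min_replicas=1, max_replicas=5):
    """Replicas needed to keep about target_per_instance queued requests per
    instance, clamped to the autoscaler's min and max."""
    wanted = math.ceil(backlog / target_per_instance)
    return max(min_replicas, min(max_replicas, wanted))


def next_replicas(current, history, now, window_seconds=60):
    """Apply a scale-down stabilization window.

    history holds (timestamp_seconds, recommended_replicas) pairs, oldest first,
    and its last entry is the recommendation made at time `now`.
    """
    latest = history[-1][1]
    if latest >= current:
        return latest  # scale out (or hold) without waiting
    recent = [rec for ts, rec in history if now - ts <= window_seconds]
    return min(current, max(recent))  # scale in only as far as the window allows


# Backlog of 4 at t=0, then an empty queue at t=30 and t=50.
history = [(0, desired_replicas(4)), (30, desired_replicas(0)), (50, desired_replicas(0))]
print(next_replicas(current=4, history=history, now=50))
```

At `now=50`, the peak recommendation from t=0 is still inside the 60-second window, so the model keeps 4 instances. If another recommendation of 1 arrives at t=70, the peak falls out of the window and the model scales in to 1 instance.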

Complete configuration example

{
    "autoscaler": {
        "behavior": {
            "scaleDown": {
                "stabilizationWindowSeconds": 60
            }
        },
        "max": 5,
        "min": 1,
        "strategies": {
            "queue[backlog]": 1
        }
    },
    "metadata": {
        "name": "photog_train_pmml",
        "instance": 1,
        "rpc": {
            "keepalive": 3600000,
            "worker_threads": 1
        },
        "type": "Async"
    },
    "cloud": {
        "computing": {
            "instance_type": "ml.gu7i.c32m188.1-gu30",
            "instances": null
        },
        "networking": {
            "vswitch_id": "vsw-2ze4o9kww55051tf2****",
            "security_group_id": "sg-2ze0kgiee55d0fn4****",
            "vpc_id": "vpc-2ze5hl4ozjl4fo7q3****"
        }
    },
    "features": {
        "eas.aliyun.com/extra-ephemeral-storage": "120Gi"
    },
    "queue": {
        "cpu": 8,
        "max_delivery": 1,
        "min_replica": 1,
        "memory": 64000,
        "resource": "",
        "source": {
            "max_payload_size_kb": 20480
        },
        "sink": {
            "max_payload_size_kb": 20480
        }
    },
    "storage": [
        {
            "oss": {
                "path": "oss://examplebucket/",
                "readOnly": false
            },
            "properties": {
                "resource_type": "code"
            },
            "mount_path": "/photog_oss"
        }
    ],
    "containers": [
        {
            "image": "registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:train.1.0.0.pub",
            "script": "python app.py",
            "port": 7860
        }
    ]
}

Click Deploy.

Deploy a prediction service

The prediction service is deployed as a Scalable Job service.

On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Custom Model Deployment section, click JSON Deployment.

Enter the following configuration in the JSON editor. Replace the VPC, vSwitch, security group, and OSS path with your own values.

{
    "metadata": {
        "name": "photog_pre_pmml",
        "instance": 1,
        "rpc": {
            "keepalive": 3600000,
            "worker_threads": 1
        },
        "type": "ScalableJob"
    },
    "cloud": {
        "computing": {
            "instance_type": "ecs.gn6v-c8g1.2xlarge",
            "instances": null
        },
        "networking": {
            "vswitch_id": "vsw-2ze4o9kww55051tf2****",
            "security_group_id": "sg-2ze0kgiee55d0fn4****",
            "vpc_id": "vpc-2ze5hl4ozjl4fo7q3****"
        }
    },
    "features": {
        "eas.aliyun.com/extra-ephemeral-storage": "120Gi"
    },
    "queue": {
        "cpu": 8,
        "max_delivery": 1,
        "min_replica": 1,
        "memory": 64000,
        "resource": "",
        "source": {
            "max_payload_size_kb": 20480
        },
        "sink": {
            "max_payload_size_kb": 20480
        }
    },
    "storage": [
        {
            "oss": {
                "path": "oss://examplebucket/",
                "readOnly": false
            },
            "properties": {
                "resource_type": "code"
            },
            "mount_path": "/photog_oss"
        }
    ],
    "containers": [
        {
            "image": "registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub",
            "env": [
                {
                    "name": "URL",
                    "value": "http://127.0.0.1:8000"
                },
                {
                    "name": "AUTHORIZATION",
                    "value": "="
                }
            ],
            "script": "python app.py",
            "port": 7861
        },
        {
            "image": "eas-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2",
            "port": 8000,
            "script": "./webui.sh --listen --port 8000 --skip-version-check --no-hashing --no-download-sd-model --skip-install --api --filebrowser --sd-dynamic-cache --data-dir /photog_oss/webui/"
        }
    ]
}
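In this configuration, the AI portrait container reaches the WebUI container through the `URL` environment variable (`http://127.0.0.1:8000`), and WebUI reads its data from `--data-dir /photog_oss/webui/`, which lies inside the OSS mount. The following minimal sketch checks these relationships before you deploy. `check_prediction_config` is a hypothetical helper, and its rules are inferred from this example, not taken from an EAS validation API.

```python
from urllib.parse import urlparse


def check_prediction_config(cfg):
    """Return a list of problems found in a Scalable Job configuration dict."""
    problems = []
    if cfg.get("metadata", {}).get("type") != "ScalableJob":
        problems.append("metadata.type should be ScalableJob")

    containers = cfg.get("containers", [])
    ports = {c.get("port") for c in containers}
    for c in containers:
        for env in c.get("env", []):
            # The URL variable should point at a port served by a sibling container.
            if env.get("name") == "URL" and urlparse(env["value"]).port not in ports:
                problems.append(f"URL {env['value']} matches no container port")

    mounts = [s["mount_path"].rstrip("/") for s in cfg.get("storage", [])]
    for c in containers:
        args = c.get("script", "").split()
        if "--data-dir" in args:
            data_dir = args[args.index("--data-dir") + 1]
            if not any(data_dir == m or data_dir.startswith(m + "/") for m in mounts):
                problems.append(f"--data-dir {data_dir} is outside every mount path")
    return problems


# A trimmed version of the configuration above.
sample = {
    "metadata": {"name": "photog_pre_pmml", "type": "ScalableJob"},
    "storage": [{"mount_path": "/photog_oss"}],
    "containers": [
        {"port": 7861, "script": "python app.py",
         "env": [{"name": "URL", "value": "http://127.0.0.1:8000"}]},
        {"port": 8000,
         "script": "./webui.sh --listen --port 8000 --api --data-dir /photog_oss/webui/"},
    ],
}
print(check_prediction_config(sample))  # []
```

To check the full configuration, load it with `json.load` and pass the resulting dict.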

The following table describes key parameters. For details about other parameters, see JSON Deployment.

Field	Parameter	Description
metadata	name	Service name. Must be unique within the region.
metadata	type	Set to ScalableJob to deploy the asynchronous inference service as a Scalable Job service.
containers	image	Image addresses of the AI portrait prediction service and the WebUI prediction service. This example uses the images for the China (Beijing) region. China (Beijing): AI portrait prediction service: `registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub`; WebUI prediction service: `eas-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2`. Singapore: AI portrait prediction service: `registry.ap-southeast-1.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub`; WebUI prediction service: `eas-registry-vpc.ap-southeast-1.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2`.
storage	path	OSS path to mount. Use the same OSS bucket path as the verification service. Example: `oss://examplebucket/`. Download and extract the model files required by WebUI, and store them in your OSS bucket at `oss://examplebucket/webui/` with the directory structure shown below. Because `oss://examplebucket/` is mounted at `/photog_oss`, this directory appears in the container as `/photog_oss/webui/`, which is the `--data-dir` passed to WebUI. For more information about uploading files to OSS, see Command-line tool ossutil 1.0. For more information about uploading files to NAS, see Quick start (Linux) and Use Workbench to manage files on an ECS instance.
storage	mount_path	Set to `/photog_oss`.
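The mount determines where files must live in OSS. Assuming that the mount maps the OSS path one-to-one onto the mount path, the following sketch converts a container path to the OSS URI that backs it. `container_path_to_oss` is a hypothetical helper, and its defaults are the example values used in this topic.

```python
from pathlib import PurePosixPath


def container_path_to_oss(container_path, mount_path="/photog_oss",
                          oss_root="oss://examplebucket/"):
    """Map a path inside the container to the OSS URI that backs it.

    Raises ValueError if container_path is not under mount_path.
    """
    relative = PurePosixPath(container_path).relative_to(mount_path)
    base = oss_root.rstrip("/")
    if str(relative) == ".":
        return base + "/"
    return f"{base}/{relative}/"


# The WebUI container is started with --data-dir /photog_oss/webui/.
print(container_path_to_oss("/photog_oss/webui/"))
```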

Click Deploy.
When you deploy a Scalable Job service, a queue service is automatically created with horizontal auto scaling enabled by default.

Call the service

After the service is deployed, call it to generate AI portraits.

To send an inference request, set the `taskType` tag to `query`, as described in Call services. Use the following sample code:

import json
from eas_prediction import QueueClient

# Create an input queue client to write requests to the service queue.
# Replace the endpoint, service name, and token with your own values.
input_queue = QueueClient('182848887922****.cn-beijing.pai-eas.aliyuncs.com', 'photog_check')
input_queue.set_token('<token>')
input_queue.init()

datas = json.dumps(
    {
        'request_id': 12345,
        'images': ["xx.jpg", "xx.jpg"],  # List of image URLs.
        'configure': {
            'face_reconize': True,  # Check whether all images show the same person.
        },
    }
)
# Tag the request as an inference request.
tags = {"taskType": "query"}
index, request_id = input_queue.put(datas, tags)
print(index, request_id)

# View the attributes of the input queue.
attrs = input_queue.attributes()
print(attrs)
