LVM-Image-Face-Ratio Filter (DLC)

The LVM-Image-Face-Ratio Filter (DLC) component filters out images whose face-to-image area ratio falls outside a specified range. Use it to remove images that are dominated by faces, or images that contain no meaningful face content, before you train image generation models.

Important

This component requires a GPU instance type. Select a GPU instance when configuring the resource group.

Supported computing resources

Deep Learning Containers (DLC)

How it works

For each image, the component calculates the face ratio, which is the face area relative to the total image area. Images whose face ratio is below the configured minimum or above the configured maximum are filtered out. The remaining images are written to the output path you specify.

Adjust the minimum and maximum face ratio based on your dataset and training objective. The defaults are 0.0 and 0.4.
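The filtering rule described above can be sketched in a few lines of Python. This is an illustration only, not the component's implementation: the `Box` type and the `face_ratio` and `keep_image` functions are hypothetical names, and the sketch assumes the ratio is taken from the largest detected face, which the documentation does not confirm. Boundary values are kept, because only ratios below the minimum or above the maximum are filtered out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical face bounding box, measured in pixels."""
    width: int
    height: int


def face_ratio(boxes, image_width, image_height):
    """Return the largest face area divided by the image area.

    Assumption: the largest face is used. The real component may
    aggregate multiple faces differently.
    """
    image_area = image_width * image_height
    if image_area <= 0:
        raise ValueError("image dimensions must be positive")
    if not boxes:
        return 0.0  # no detected face
    return max(b.width * b.height for b in boxes) / image_area


def keep_image(ratio, min_ratio=0.0, max_ratio=0.4):
    """Keep the image when min_ratio <= ratio <= max_ratio (inclusive)."""
    return min_ratio <= ratio <= max_ratio


# Usage: one 100x100 face in a 200x200 image.
ratio = face_ratio([Box(100, 100)], 200, 200)
print(ratio, keep_image(ratio))
```

With the default range, an image without any detected face has a ratio of 0.0 and is kept. Raise the minimum above 0.0 if such images should be removed.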

Inputs and outputs

Input ports

The component accepts the following inputs:

Read File Data component: reads the Object Storage Service (OSS) path where the training data is stored.
Image Data OSS Path parameter: select either an OSS directory that contains image files or an existing meta.jsonl metadata file. See the parameter description below.
Any image data preprocessing component: connect it as an upstream input.

Output port

Filtering results are written to the OSS directory specified by Output File OSS Path. See the parameter description below for output file details.
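After a run, it can be useful to check how many samples survived the filter. The following is a minimal sketch that assumes both the input metadata file and the filtered output hold one JSON record per line, and that you have already downloaded them from OSS. The function names and the `image` field in the sample records are hypothetical, and the actual record schema is not described here.

```python
import json


def count_records(lines):
    """Count non-empty JSONL lines, checking that each one parses as JSON."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        json.loads(line)  # raises ValueError on a malformed record
        count += 1
    return count


def retention_rate(input_lines, output_lines):
    """Return the fraction of input records that remain in the output."""
    total = count_records(input_lines)
    kept = count_records(output_lines)
    return kept / total if total else 0.0


# Usage with in-memory sample data (hypothetical records).
before = ['{"image": "a.jpg"}', '{"image": "b.jpg"}',
          '{"image": "c.jpg"}', '{"image": "d.jpg"}']
after = ['{"image": "a.jpg"}', '{"image": "c.jpg"}', '{"image": "d.jpg"}']
print(retention_rate(before, after))
```

A very low retention rate suggests the configured range is too narrow for the dataset.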

Configure the component

Configure the LVM-Image-Face-Ratio Filter (DLC) component in Machine Learning Designer. The following table describes all parameters.

Tab	Parameter	Type	Required	Default	Description
Field Settings	Image Data OSS Path	String	No	—	OSS directory containing image data, or an existing `meta.jsonl` file. If no upstream component is connected on the first run, select the OSS directory manually. The component generates `meta.jsonl` in the parent directory of the specified path. On subsequent runs, select `meta.jsonl` directly.
Field Settings	Output File OSS Path	String	Yes	—	OSS directory where filtering results are stored. The output includes: `{name}.jsonl` (filtered output, named by Output Filename), `{name}_stats.jsonl` (statistics), and `dj_run_yaml.yaml` (algorithm run configuration).
Field Settings	Output Filename	String	Yes	`result.jsonl`	File name for the filtering output.
Parameter Settings	Minimum face Ratio	Float	Yes	`0.0`	Minimum face ratio. Images with a face ratio below this value are filtered out.
Parameter Settings	Maximum face Ratio	Float	Yes	`0.4`	Maximum face ratio. Images with a face ratio above this value are filtered out.
Execution Tuning	Number of Processes	Integer	Yes	`4`	Number of parallel processes.
Select Resource Group	Public Resource Group	—	No	—	GPU instance type and virtual private cloud (VPC) to use. Select a GPU instance type.
Select Resource Group	Dedicated resource group	—	No	—	Number of vCPUs, memory, shared memory, and GPUs to allocate.
Select Resource Group	Maximum Running Duration (seconds)	Integer	No	—	Maximum run time in seconds. The job is terminated if this limit is exceeded.
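To make the output naming concrete, the sketch below derives the three expected object paths from Output File OSS Path and Output Filename. It assumes that `{name}` is the output file name without its `.jsonl` extension, which matches the default `result.jsonl` but is not stated explicitly. The function name and the bucket name are hypothetical.

```python
import posixpath


def expected_outputs(output_dir, output_filename="result.jsonl"):
    """Return the OSS paths the component is expected to write.

    Assumption: {name} is the output file name with ".jsonl" removed.
    """
    name, ext = posixpath.splitext(output_filename)
    if ext != ".jsonl":
        name = output_filename  # no .jsonl suffix: use the name as given
    base = output_dir.rstrip("/")
    return {
        "filtered": f"{base}/{name}.jsonl",
        "stats": f"{base}/{name}_stats.jsonl",
        "run_config": f"{base}/dj_run_yaml.yaml",
    }


# Usage with a hypothetical bucket and the default file name.
for kind, path in expected_outputs("oss://example-bucket/output/").items():
    print(kind, path)
```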

Usage notes

First run without upstream component: If no upstream component provides the OSS path, set Image Data OSS Path to the OSS directory containing your images. The component creates meta.jsonl in the parent directory on the first run. Use this file as the input for subsequent runs instead of rescanning the directory.
Face ratio range: Set the minimum and maximum face ratio to control which images are retained. Images with a face ratio below the minimum or above the maximum are filtered out. Adjust the range based on your dataset and training objective.
GPU requirement: This component uses GPU-accelerated face detection. Always select a GPU instance type in the resource group configuration.
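The first note above says that meta.jsonl is created in the parent directory of the image directory. As a rough illustration, the helper below computes where to look for that file so it can be selected on later runs. The function name and the bucket path are hypothetical, and the sketch assumes ordinary slash-separated OSS paths.

```python
import posixpath


def meta_jsonl_path(image_data_path):
    """Return the meta.jsonl path to select for subsequent runs.

    If the path already points at meta.jsonl, it is returned unchanged.
    Otherwise meta.jsonl is assumed to sit in the parent directory of
    the given image directory.
    """
    path = image_data_path.rstrip("/")
    if posixpath.basename(path) == "meta.jsonl":
        return path
    parent = posixpath.dirname(path)
    return posixpath.join(parent, "meta.jsonl")


# Usage with a hypothetical image directory.
print(meta_jsonl_path("oss://example-bucket/data/images/"))
```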