Build an Image-Text Pair Filter Pipeline with LVM-Image-Text-Matching - Platform for AI

The LVM-Image-Text-Matching Filter (DLC) component filters out image-text pairs whose text-image matching score is too low.

Important

This component requires a GPU instance type. When you configure the resource group, make sure you select a GPU instance.

Supported computing resources

Deep Learning Containers (DLC)

How the algorithm works

The LVM-Image-Text-Matching Filter (DLC) component compares each image with its description text in the training data and calculates a text-image matching score by using the blip-itm-base-coco model. The component then filters out the samples whose matching score is too low, so that only well-aligned image-text pairs remain. The description text is the content that follows the <__dj__image> token in the text field of the training data file. In most cases, the component is used to prepare data for the subsequent training of image generation models.
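The scoring step can be sketched as follows. This is a minimal illustration, not the component's actual implementation: it assumes the model is available from Hugging Face as `Salesforce/blip-itm-base-coco`, that the `transformers`, `torch`, and `Pillow` packages are installed, and that the matching score is the softmax probability of the "match" class from the image-text matching head. The helper names `match_probability` and `score_pair` are hypothetical.

```python
import math

# Assumed Hugging Face checkpoint name for blip-itm-base-coco.
MODEL_ID = "Salesforce/blip-itm-base-coco"


def match_probability(itm_logits):
    """Convert the two ITM logits [no_match, match] into a match probability."""
    no_match, match = itm_logits
    peak = max(no_match, match)  # subtract the max for numerical stability
    e_no = math.exp(no_match - peak)
    e_yes = math.exp(match - peak)
    return e_yes / (e_no + e_yes)


def score_pair(image_path, caption):
    """Score one local image against its caption (downloads weights on first use)."""
    # Heavy dependencies are imported lazily so the pure helper above
    # can be used without them.
    import torch
    from PIL import Image
    from transformers import BlipForImageTextRetrieval, BlipProcessor

    processor = BlipProcessor.from_pretrained(MODEL_ID)
    model = BlipForImageTextRetrieval.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, text=caption, return_tensors="pt")
    with torch.no_grad():
        # itm_score has shape (batch, 2); index 1 is the "match" logit.
        logits = model(**inputs).itm_score[0].tolist()
    return match_probability(logits)
```

A score close to 1 suggests that the caption describes the image well, and a score close to 0 suggests a mismatch. In practice the model and processor would be loaded once and reused across samples rather than reloaded for every pair.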

Input data format

The input is a JSONL file. Each line contains a JSON object with the following fields:

images: The OSS path of the image.
text: The description text. The <__dj__image> token marks the start of the description text, and the <|__dj__eoc|> token marks the end.
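The following sketch shows how one line of such a file might be written and read back. It is an illustration under assumptions: the OSS path `oss://example-bucket/images/0001.jpg` is a placeholder, the `images` field is stored as a single string as described above (the component may expect a list of paths instead), and `make_record` and `extract_caption` are hypothetical helper names.

```python
import json

IMAGE_TOKEN = "<__dj__image>"
EOC_TOKEN = "<|__dj__eoc|>"


def make_record(image_path, caption):
    """Build one JSON object in the layout described above."""
    return {
        "images": image_path,
        "text": f"{IMAGE_TOKEN} {caption} {EOC_TOKEN}",
    }


def extract_caption(text):
    """Return the text between the start and end tokens, or None if either is missing."""
    start = text.find(IMAGE_TOKEN)
    if start == -1:
        return None
    start += len(IMAGE_TOKEN)
    end = text.find(EOC_TOKEN, start)
    if end == -1:
        return None
    return text[start:end].strip()


# Serialize one record as a single JSONL line, then parse it back.
line = json.dumps(make_record("oss://example-bucket/images/0001.jpg",
                              "a dog running on a beach"))
record = json.loads(line)
print(extract_caption(record["text"]))
```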

Inputs and outputs

Input ports

You can connect the Read File Data component, which reads the OSS path where the training data is stored.
Alternatively, you can configure the Image Data OSS Path parameter to select the training data file.

For more information about the training data file, see How the algorithm works and Input data format.

Output port

The component produces the filtering results. For details on the output files, see the Output File OSS Path parameter in Field Settings.

Configure the component

You can configure the parameters of the LVM-Image-Text-Matching Filter (DLC) component in Machine Learning Designer. The parameters are grouped into the following tabs.

Field Settings

Parameter	Required	Description	Default value
Image Data OSS Path	No	The training data file. For more information, see Input data format.	No default value
Output File OSS Path	Yes	The OSS directory where the filtering results are stored. The directory contains the following output files: `{name}.jsonl`: The filtered output file. You can set the file name with the Output Filename parameter. `{name}_stats.jsonl`: The statistics file. `dj_run_yaml.yaml`: The parameter configuration file used when the algorithm runs.	No default value
Output Filename	Yes	The file name for the filtering results.	result.jsonl

Parameter Settings

Parameter	Required	Description	Default value
Minimum Text-Frame Matching Score	Yes	The minimum text-image matching score. Samples with scores below this threshold are filtered out.	0.1
Maximum Text-Frame Matching Score	Yes	The maximum text-image matching score. Samples with scores above this threshold are filtered out. In most cases, set this parameter to 1.	1
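
The effect of the two thresholds can be sketched as a simple range check. This is an assumption about the behavior rather than the component's documented logic: it treats both bounds as inclusive, and the `score` key and the helper names `keep_pair` and `filter_scored` are hypothetical.

```python
def keep_pair(score, min_score=0.1, max_score=1.0):
    """Return True when the score lies inside the configured range (bounds assumed inclusive)."""
    return min_score <= score <= max_score


def filter_scored(records, min_score=0.1, max_score=1.0):
    """Keep the records whose hypothetical 'score' key is within the range."""
    return [r for r in records if keep_pair(r["score"], min_score, max_score)]


# Placeholder samples with made-up scores, for illustration only.
samples = [
    {"images": "oss://example-bucket/images/a.jpg", "score": 0.92},
    {"images": "oss://example-bucket/images/b.jpg", "score": 0.04},
]
kept = filter_scored(samples)
print(len(kept))  # prints 1: only the first sample passes the default thresholds
```

Raising the minimum score makes the filter stricter and usually reduces the size of the output file, so it can help to inspect the score distribution before choosing a value.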

Execution Tuning

Parameter	Required	Description	Default value
Select Resource Group - Public Resource Group	No	The instance type (CPU or GPU) and virtual private cloud (VPC) to use. You must select a GPU instance type for this algorithm.	No default value
Select Resource Group - Dedicated resource group	No	The number of CPU cores, memory, shared memory, and GPUs to use.	No default value
Maximum Running Duration (seconds)	No	The maximum time the component can run. If the component exceeds this duration, the job is terminated.	No default value