Use the image metric learning algorithm to train models - Platform For AI

If your business involves metric learning, you can use the image metric learning (raw) component of Platform for AI (PAI) to build metric learning models for inference. This topic describes how to configure the image metric learning (raw) component and provides an example of how to use the component.

Prerequisites

Object Storage Service (OSS) is activated, and Machine Learning Designer is authorized to access OSS. For more information, see Activate OSS and Grant the permissions that are required to use Machine Learning Designer.

Limits

You can use the image metric learning (raw) component only with the computing resources of Deep Learning Containers (DLC).

Overview

The image metric learning (raw) component provides mainstream models such as ResNet50, ResNet18, ResNet34, ResNet101, swint_tiny, swint_small, swint_base, vit_tiny, vit_small, vit_base, xcit_tiny, xcit_small, and xcit_base.

Configure the component in the PAI console

Input ports
Input port (from left to right)	Data type	Recommended upstream component	Required
data annotation path for training	OSS	Read File Data	No
data annotation path for evaluation	OSS	Read File Data	No

Component parameters

Tab	Parameter	Required	Description	Default value
Fields Setting	model type	Yes	The model type used for training. Valid values: DataParallelMetricLearning and ModelParallelMetricLearning.	DataParallelMetricLearning
	oss dir to save model	Yes	The Object Storage Service (OSS) directory in which the trained model is saved. Example: `oss://examplebucket/yun****/designer_test`.	None
	oss annotation path for training data	No	The OSS path in which the labeled training data is stored. Example: `oss://examplebucket/yun**/data/imagenet/meta/train_labeled.txt`. Each data record in the train_labeled.txt file is stored in the `absolute path/image name.jpg label_id` format. Important: The image storage path and label_id are separated by a space. You must configure this parameter if you do not use the data annotation path for training input port to specify the labeled training data. Note: If you use both an input port and this parameter to specify the labeled training data, the value specified by the input port takes precedence.	None
	oss annotation path for evaluation data	No	The OSS path in which the labeled evaluation data is stored. Example: `oss://examplebucket/yun**/data/imagenet/meta/val_labeled.txt`. Each data record in the val_labeled.txt file is stored in the `absolute path/image name.jpg label_id` format. Important: The image storage path and label_id are separated by a space. You must configure this parameter if you do not use the data annotation path for evaluation input port to specify the labeled evaluation data. Note: If you use both an input port and this parameter to specify the labeled evaluation data, the value specified by the input port takes precedence.	None
	class list file	No	The class names. You can enter the class names directly or specify the OSS path of the TXT file that contains the class names.	None
	Data Source Type	Yes	The type of input data. Valid values: ClsSourceImageList and ClsSourceItag.	ClsSourceImageList
	oss path for pretrained model	No	The OSS path of your pre-trained model. If you have a pre-trained model, set this parameter to the OSS path of your pre-trained model. If you do not configure this parameter, the default pre-trained model provided by PAI is used.	None
Parameters Setting	backbone	Yes	The backbone model that you want to use. Valid values: resnet_50, resnet_18, resnet_34, resnet_101, swin_transformer_tiny, swin_transformer_small, and swin_transformer_base.	resnet50
	image size after resizing	Yes	The size of the resized image. Unit: pixels.	224
	backbone output channels	Yes	The feature dimensions output by the backbone. The value must be an integer.	2048
	neck output channels	Yes	The feature dimensions output by the neck. The value must be an integer.	1536
	training data classification label range	Yes	The number of dimensions in the output data.	None
	metric loss	Yes	The loss function, which measures the degree of inconsistency between the values predicted by the model and the actual values. Valid values: AMSoftmax recommend margin 0.4 scale 30; ArcFaceLoss recommend margin 28.6 scale 64; CosFaceLoss recommend margin 0.35 scale 64; LargeMarginSoftmaxLoss recommend margin 4 scale 1; SphereFaceLoss recommend margin 4 scale 1; ModelParallel AMSoftmax; ModelParallel Softmax.	AMSoftmax recommend margin 0.4 scale 30
	metric learning loss scale parameter	Yes	The scale that you want to use for the loss function. Configure this parameter based on the loss function that you select.	30
	metric learning loss margin parameter	Yes	The margin that you want to use for the loss function. Configure this parameter based on the loss function that you select.	0.4
	metric learning loss weight in all losses	No	The weight that you want to use for the loss function, which indicates the optimization ratio of metric learning and the classification model.	1.0
	optimizer	Yes	The optimization method for model training. Valid values: SGD and AdamW.	SGD
	initial learning rate	Yes	The initial learning rate. The value is a floating-point number.	0.03
	batch size	Yes	The size of a training batch, which indicates the number of data samples used for model training in each iteration.	None
	total train epochs	Yes	The total number of epochs. An epoch ends when a round of training is complete on all data samples. The total number of epochs indicates the total number of training rounds conducted on data samples.	200
	save checkpoint epoch	No	The frequency at which a checkpoint is saved. The value of 1 indicates that a checkpoint is saved each time an epoch ends.	10
Execution Tuning	io thread num for training.	No	The number of threads used to read the training data.	4
	use fp 16	No	Specifies whether to enable FP16 to reduce memory usage during model training.	None
	single worker or distributed on MaxCompute or DLC	Yes	The compute engine that is used to run the component. You can select a compute engine based on your business requirements. Valid values: single_on_dlc and distribute_on_dlc.	single_on_dlc
	number of worker	No	The number of concurrent workers used in computing. You must configure this parameter if you set the single worker or distributed on MaxCompute or DLC parameter to distribute_on_dlc.	1
	gpu machine type	Yes	The GPU specifications that you want to use.	8vCPU+60GB Mem+1xp100-ecs.gn5-c8g1.2xlarge
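
To make the metric learning loss scale parameter and metric learning loss margin parameter more concrete, the following minimal sketch computes the standard additive-margin softmax (AM-Softmax) loss for a single sample in plain Python. It follows the textbook formulation, not necessarily the implementation that PAI uses internally. The function name `am_softmax_loss` and the sample cosine values are hypothetical and chosen only for illustration.

```python
import math


def am_softmax_loss(cosines, target, scale=30.0, margin=0.4):
    """Textbook AM-Softmax loss for one sample (illustrative only).

    cosines: cosine similarity between the embedding and each class center.
    target:  index of the ground-truth class.
    scale, margin: defaults mirror the default values in the table above.
    """
    # Subtract the margin from the target-class cosine only, then scale all logits.
    logits = [
        scale * (c - margin) if j == target else scale * c
        for j, c in enumerate(cosines)
    ]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    # Cross-entropy on the margin-adjusted logits.
    return log_sum - logits[target]


# Hypothetical cosine similarities for a three-class problem.
cosines = [0.8, 0.3, 0.1]
loss_with_margin = am_softmax_loss(cosines, target=0)
loss_without_margin = am_softmax_loss(cosines, target=0, margin=0.0)
```

In this sketch, the margin lowers the target-class logit, so the same embedding yields a larger loss than it would without the margin. This pushes training toward embeddings that separate classes by at least that margin. The scale controls how sharply the softmax reacts to differences in cosine similarity.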

Examples

The following figure shows a sample pipeline in which the image metric learning (raw) component is used.

[Figure: Workflow]

In this example, configure the components in the preceding figure by performing the following steps:

Prepare data. Label images by using iTAG provided by PAI. For more information, see iTAG.
Use the Read File Data-4 and Read File Data-5 components to read the labeled training data and labeled evaluation data. To read the data, set the OSS Data Path parameter of each component to the OSS path in which the data that you want to retrieve is stored.
Draw lines from the preceding two components to the image metric learning (raw) component and configure the parameters for the image metric learning (raw) component. For more information, see the "Configure the component in the PAI console" section of this topic.
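
Before you upload an annotation file for the steps above, it can help to check that every record follows the `image path label_id` format. The following is a minimal sketch, assuming that label_id is an integer and that the last space on each line separates the path from the label. The function name `parse_annotation_line` and the sample paths are hypothetical and are not part of PAI.

```python
def parse_annotation_line(line):
    """Split one annotation record into (image_path, label_id).

    Splits on the last space so that image paths containing spaces still parse.
    Assumes label_id is an integer (an assumption, not stated by the component).
    """
    path, sep, label = line.strip().rpartition(" ")
    if not sep or not path:
        raise ValueError(f"expected '<image path> <label_id>': {line!r}")
    return path, int(label)  # int() raises ValueError for a non-numeric label


# Hypothetical records in the format of train_labeled.txt.
sample = """\
oss://examplebucket/data/train/cat_001.jpg 0
oss://examplebucket/data/train/dog 002.jpg 1
"""

records = [
    parse_annotation_line(line)
    for line in sample.splitlines()
    if line.strip()
]
```

Splitting on the last space rather than the first is a deliberate choice here, because the documented format example (`absolute path/image name.jpg label_id`) itself contains spaces in the path.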
