EasyVision of Machine Learning Platform for AI provides enhanced image feature extraction capabilities and supports distributed training across multiple servers. You can use EasyVision to read images from Object Storage Service (OSS) and write the image feature extraction results back to OSS, or read images from a table and write the results back to that table. This topic describes how to use EasyVision to read images from OSS and write image feature extraction results to OSS.

Data format

For more information, see Input data formats.

Image feature extraction

Run the following Machine Learning Platform for AI command to extract image features based on existing files:
pai -name ev_predict_ext
    -Dmodel_path='oss://pai-vision-data-sh/pretrained_models/saved_models/resnet_v1_50/'
    -Dmodel_type='feature_extractor'
    -Dinput_oss_file='oss://path/to/your/filelist.txt'
    -Doutput_oss_file='oss://path/to/your/result.txt'
    -Dimage_type='url'
    -Dfeature_name='resnet_v1_50/block4'
    -Dnum_worker=2
    -DcpuRequired=800
    -DgpuRequired=100
    -Dbuckets='oss://pai-vision-data-sh/'
    -Darn='your_role_arn'
    -DossHost='oss-cn-shanghai-internal.aliyuncs.com'
For more information, see Parameters.
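The command above reads the images to process from the file specified by the input_oss_file parameter. The following sketch shows one way to prepare such a file locally before uploading it to OSS; it assumes the file list contains one image URL per line, which may differ from your actual input format (see Input data formats), and the image URLs shown are placeholders.

```python
# Hypothetical helper: write a file list of OSS image URLs, one per line.
# Upload the resulting file to OSS (for example, with ossutil) and pass its
# OSS path as the value of -Dinput_oss_file.
image_urls = [
    "oss://path/to/your/image1.jpg",
    "oss://path/to/your/image2.jpg",
]

with open("filelist.txt", "w") as f:
    f.write("\n".join(image_urls) + "\n")
```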

Output

Extraction results are written to an output file. Each entry in the file represents the extraction result of one image and consists of the image file path and a JSON string, separated by a comma. For example, the output file contains the following information:
oss://path/to/your/image1.jpg,  {"feature": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4583122730255127, 0.0]}
oss://path/to/your/image1.jpg,  {"feature": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4583122730255127, 0.0]}
oss://path/to/your/image1.jpg,  {"feature": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4583122730255127, 0.0]}
The JSON string contains a single key-value pair: the value of feature is a one-dimensional vector that represents the extracted image features.
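Because each entry is a file path followed by a JSON string, the result file is straightforward to parse. The following sketch splits one entry into the image path and a NumPy feature vector; the line format is taken from the example output above.

```python
import json

import numpy as np


def parse_result_line(line):
    """Split one output entry into (oss_path, feature_vector).

    Each entry is '<image_path>,  <json>', where the JSON holds a single
    'feature' key whose value is a one-dimensional list of floats.
    """
    path, payload = line.split(",", 1)
    feature = json.loads(payload)["feature"]
    return path.strip(), np.asarray(feature, dtype=np.float32)


line = 'oss://path/to/your/image1.jpg,  {"feature": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4583122730255127, 0.0]}'
path, vec = parse_result_line(line)
```

Parsing every line of the result file this way yields one feature vector per image, ready for downstream tasks such as similarity search or clustering.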

Model address and output

You can use the pretrained model resnet_v1_50 that is stored in the OSS path oss://pai-vision-data-sh/pretrained_models/saved_models/resnet_v1_50 to extract image features. The model exposes the following output tensors, any of which can be specified as the value of the feature_name parameter:
resnet_v1_50/block1 shape: [None, 56, 56, 256] type: <dtype: 'float32'>
resnet_v1_50/block2 shape: [None, 28, 28, 512] type: <dtype: 'float32'>
resnet_v1_50/block3 shape: [None, 14, 14, 1024] type: <dtype: 'float32'>
resnet_v1_50/block4 shape: [None, 7, 7, 2048] type: <dtype: 'float32'>
AvgPool_1a shape: [None, 1, 1, 2048] type: <dtype: 'float32'>
resnet_v1_50/logits shape: [None, 1, 1, 1000] type: <dtype: 'float32'>
predictions shape: [None] type: <dtype: 'int32'>
class shape: [None] type: <dtype: 'int32'>
preprocessed_images shape: [None, 224, 224, 3] type: <dtype: 'float32'>
resnet_v1_50/conv1 shape: [None, 112, 112, 64] type: <dtype: 'float32'>
logits shape: [None, 1000] type: <dtype: 'float32'>
probs shape: [None, 1001] type: <dtype: 'float32'>
resnet_v1_50/spatial_squeeze shape: [None, 1000] type: <dtype: 'float32'>
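Note that convolutional endpoints such as resnet_v1_50/block4 are spatial feature maps ([7, 7, 2048] per image), not flat vectors. A common way to turn such a map into a fixed-length embedding is global average pooling over the spatial dimensions, as sketched below with random data standing in for a real block4 output.

```python
import numpy as np

# Stand-in for one image's resnet_v1_50/block4 output: [7, 7, 2048].
rng = np.random.default_rng(0)
block4 = rng.random((7, 7, 2048), dtype=np.float32)

# Global average pooling over the two spatial axes -> (2048,) embedding.
embedding = block4.mean(axis=(0, 1))
```

This mirrors what the AvgPool_1a endpoint computes ([None, 1, 1, 2048]), so you can also request AvgPool_1a directly if you want the pooled representation.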
You can use the pretrained model resnet_v1_101 that is stored in the OSS path oss://pai-vision-data-sh/pretrained_models/saved_models/resnet_v1_101 to extract image features. The model exposes the following output tensors, any of which can be specified as the value of the feature_name parameter:
resnet_v1_101/block4 shape: [None, 7, 7, 2048] type: <dtype: 'float32'>
resnet_v1_101/logits shape: [None, 1, 1, 1000] type: <dtype: 'float32'>
resnet_v1_101/block2 shape: [None, 28, 28, 512] type: <dtype: 'float32'>
resnet_v1_101/conv1 shape: [None, 112, 112, 64] type: <dtype: 'float32'>
resnet_v1_101/block1 shape: [None, 56, 56, 256] type: <dtype: 'float32'>
class shape: [None] type: <dtype: 'int32'>
resnet_v1_101/spatial_squeeze shape: [None, 1000] type: <dtype: 'float32'>
predictions shape: [None] type: <dtype: 'int32'>
preprocessed_images shape: [None, 224, 224, 3] type: <dtype: 'float32'>
logits shape: [None, 1000] type: <dtype: 'float32'>
resnet_v1_101/block3 shape: [None, 14, 14, 1024] type: <dtype: 'float32'>
probs shape: [None, 1001] type: <dtype: 'float32'>
AvgPool_1a shape: [None, 1, 1, 2048] type: <dtype: 'float32'>
You can use the pretrained model inception_v3 that is stored in the OSS path oss://pai-vision-data-sh/pretrained_models/saved_models/inception_v3 to extract image features. The model exposes the following output tensors, any of which can be specified as the value of the feature_name parameter:
preprocessed_images shape: [None, 299, 299, 3] type: <dtype: 'float32'>
Conv2d_1a_3x3 shape: [None, 149, 149, 32] type: <dtype: 'float32'>
Conv2d_2a_3x3 shape: [None, 147, 147, 32] type: <dtype: 'float32'>
Conv2d_2b_3x3 shape: [None, 147, 147, 64] type: <dtype: 'float32'>
MaxPool_3a_3x3 shape: [None, 73, 73, 64] type: <dtype: 'float32'>
Conv2d_3b_1x1 shape: [None, 73, 73, 80] type: <dtype: 'float32'>
Conv2d_4a_3x3 shape: [None, 71, 71, 192] type: <dtype: 'float32'>
MaxPool_5a_3x3 shape: [None, 35, 35, 192] type: <dtype: 'float32'>
Mixed_5b shape: [None, 35, 35, 256] type: <dtype: 'float32'>
Mixed_5c shape: [None, 35, 35, 288] type: <dtype: 'float32'>
Mixed_5d shape: [None, 35, 35, 288] type: <dtype: 'float32'>
Mixed_6a shape: [None, 17, 17, 768] type: <dtype: 'float32'>
Mixed_6b shape: [None, 17, 17, 768] type: <dtype: 'float32'>
Mixed_6c shape: [None, 17, 17, 768] type: <dtype: 'float32'>
Mixed_6d shape: [None, 17, 17, 768] type: <dtype: 'float32'>
Mixed_6e shape: [None, 17, 17, 768] type: <dtype: 'float32'>
Mixed_7a shape: [None, 8, 8, 1280] type: <dtype: 'float32'>
Mixed_7b shape: [None, 8, 8, 2048] type: <dtype: 'float32'>
Mixed_7c shape: [None, 8, 8, 2048] type: <dtype: 'float32'>
AvgPool_1a shape: [None, 1, 1, 2048] type: <dtype: 'float32'>
PreLogits shape: [None, 1, 1, 2048] type: <dtype: 'float32'>
Logits shape: [None, 1001] type: <dtype: 'float32'>
Predictions shape: [None, 1001] type: <dtype: 'float32'>
logits shape: [None, 1001] type: <dtype: 'float32'>
probs shape: [None, 1001] type: <dtype: 'float32'>
class shape: [None] type: <dtype: 'int32'>
predictions shape: [None] type: <dtype: 'int32'>
original_image shape: [None, None, None, 3] type: <dtype: 'float32'>
original_image_shape shape: [None, 3] type: <dtype: 'int32'>
You can use the pretrained model inception_v4 that is stored in the OSS path oss://pai-vision-data-sh/pretrained_models/saved_models/inception_v4 to extract image features. The model exposes the following output tensors, any of which can be specified as the value of the feature_name parameter:
preprocessed_images shape: [None, 299, 299, 3] type: <dtype: 'float32'>
Conv2d_1a_3x3 shape: [None, 149, 149, 32] type: <dtype: 'float32'>
Conv2d_2a_3x3 shape: [None, 147, 147, 32] type: <dtype: 'float32'>
Conv2d_2b_3x3 shape: [None, 147, 147, 64] type: <dtype: 'float32'>
Mixed_3a shape: [None, 73, 73, 160] type: <dtype: 'float32'>
Mixed_4a shape: [None, 71, 71, 192] type: <dtype: 'float32'>
Mixed_5a shape: [None, 35, 35, 384] type: <dtype: 'float32'>
Mixed_5b shape: [None, 35, 35, 384] type: <dtype: 'float32'>
Mixed_5c shape: [None, 35, 35, 384] type: <dtype: 'float32'>
Mixed_5d shape: [None, 35, 35, 384] type: <dtype: 'float32'>
Mixed_5e shape: [None, 35, 35, 384] type: <dtype: 'float32'>
Mixed_6a shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6b shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6c shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6d shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6e shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6f shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6g shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6h shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_7a shape: [None, 8, 8, 1536] type: <dtype: 'float32'>
Mixed_7b shape: [None, 8, 8, 1536] type: <dtype: 'float32'>
Mixed_7c shape: [None, 8, 8, 1536] type: <dtype: 'float32'>
Mixed_7d shape: [None, 8, 8, 1536] type: <dtype: 'float32'>
AvgPool_1a shape: [None, 1, 1, 1536] type: <dtype: 'float32'>
PreLogitsFlatten shape: [None, 1536] type: <dtype: 'float32'>
Logits shape: [None, 1001] type: <dtype: 'float32'>
Predictions shape: [None, 1001] type: <dtype: 'float32'>
logits shape: [None, 1001] type: <dtype: 'float32'>
probs shape: [None, 1001] type: <dtype: 'float32'>
class shape: [None] type: <dtype: 'int32'>
predictions shape: [None] type: <dtype: 'int32'>
original_image shape: [None, None, None, 3] type: <dtype: 'float32'>
original_image_shape shape: [None, 3] type: <dtype: 'int32'>
You can use the pretrained model mobilenet_v2 that is stored in the OSS path oss://pai-vision-data-sh/pretrained_models/saved_models/mobilenet_v2_1.0_224 to extract image features. The model exposes the following output tensors, any of which can be specified as the value of the feature_name parameter:
preprocessed_images shape: [None, 224, 224, 3] type: <dtype: 'float32'>
layer_1 shape: [None, 112, 112, 32] type: <dtype: 'float32'>
layer_2 shape: [None, 112, 112, 16] type: <dtype: 'float32'>
layer_3 shape: [None, 56, 56, 24] type: <dtype: 'float32'>
layer_4 shape: [None, 56, 56, 24] type: <dtype: 'float32'>
layer_5 shape: [None, 28, 28, 32] type: <dtype: 'float32'>
layer_6 shape: [None, 28, 28, 32] type: <dtype: 'float32'>
layer_7 shape: [None, 28, 28, 32] type: <dtype: 'float32'>
layer_8 shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_9 shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_10 shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_11 shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_12 shape: [None, 14, 14, 96] type: <dtype: 'float32'>
layer_13 shape: [None, 14, 14, 96] type: <dtype: 'float32'>
layer_14 shape: [None, 14, 14, 96] type: <dtype: 'float32'>
layer_15 shape: [None, 7, 7, 160] type: <dtype: 'float32'>
layer_16 shape: [None, 7, 7, 160] type: <dtype: 'float32'>
layer_17 shape: [None, 7, 7, 160] type: <dtype: 'float32'>
layer_18 shape: [None, 7, 7, 320] type: <dtype: 'float32'>
layer_19 shape: [None, 7, 7, 1280] type: <dtype: 'float32'>
layer_2/depthwise_output shape: [None, 112, 112, 32] type: <dtype: 'float32'>
layer_2/output shape: [None, 112, 112, 16] type: <dtype: 'float32'>
layer_3/expansion_output shape: [None, 112, 112, 96] type: <dtype: 'float32'>
layer_3/depthwise_output shape: [None, 56, 56, 96] type: <dtype: 'float32'>
layer_3/output shape: [None, 56, 56, 24] type: <dtype: 'float32'>
layer_4/expansion_output shape: [None, 56, 56, 144] type: <dtype: 'float32'>
layer_4/depthwise_output shape: [None, 56, 56, 144] type: <dtype: 'float32'>
layer_4/output shape: [None, 56, 56, 24] type: <dtype: 'float32'>
layer_5/expansion_output shape: [None, 56, 56, 144] type: <dtype: 'float32'>
layer_5/depthwise_output shape: [None, 28, 28, 144] type: <dtype: 'float32'>
layer_5/output shape: [None, 28, 28, 32] type: <dtype: 'float32'>
layer_6/expansion_output shape: [None, 28, 28, 192] type: <dtype: 'float32'>
layer_6/depthwise_output shape: [None, 28, 28, 192] type: <dtype: 'float32'>
layer_6/output shape: [None, 28, 28, 32] type: <dtype: 'float32'>
layer_7/expansion_output shape: [None, 28, 28, 192] type: <dtype: 'float32'>
layer_7/depthwise_output shape: [None, 28, 28, 192] type: <dtype: 'float32'>
layer_7/output shape: [None, 28, 28, 32] type: <dtype: 'float32'>
layer_8/expansion_output shape: [None, 28, 28, 192] type: <dtype: 'float32'>
layer_8/depthwise_output shape: [None, 14, 14, 192] type: <dtype: 'float32'>
layer_8/output shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_9/expansion_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_9/depthwise_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_9/output shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_10/expansion_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_10/depthwise_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_10/output shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_11/expansion_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_11/depthwise_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_11/output shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_12/expansion_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_12/depthwise_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_12/output shape: [None, 14, 14, 96] type: <dtype: 'float32'>
layer_13/expansion_output shape: [None, 14, 14, 576] type: <dtype: 'float32'>
layer_13/depthwise_output shape: [None, 14, 14, 576] type: <dtype: 'float32'>
layer_13/output shape: [None, 14, 14, 96] type: <dtype: 'float32'>
layer_14/expansion_output shape: [None, 14, 14, 576] type: <dtype: 'float32'>
layer_14/depthwise_output shape: [None, 14, 14, 576] type: <dtype: 'float32'>
layer_14/output shape: [None, 14, 14, 96] type: <dtype: 'float32'>
layer_15/expansion_output shape: [None, 14, 14, 576] type: <dtype: 'float32'>
layer_15/depthwise_output shape: [None, 7, 7, 576] type: <dtype: 'float32'>
layer_15/output shape: [None, 7, 7, 160] type: <dtype: 'float32'>
layer_16/expansion_output shape: [None, 7, 7, 960] type: <dtype: 'float32'>
layer_16/depthwise_output shape: [None, 7, 7, 960] type: <dtype: 'float32'>
layer_16/output shape: [None, 7, 7, 160] type: <dtype: 'float32'>
layer_17/expansion_output shape: [None, 7, 7, 960] type: <dtype: 'float32'>
layer_17/depthwise_output shape: [None, 7, 7, 960] type: <dtype: 'float32'>
layer_17/output shape: [None, 7, 7, 160] type: <dtype: 'float32'>
layer_18/expansion_output shape: [None, 7, 7, 960] type: <dtype: 'float32'>
layer_18/depthwise_output shape: [None, 7, 7, 960] type: <dtype: 'float32'>
layer_18/output shape: [None, 7, 7, 320] type: <dtype: 'float32'>
AvgPool_1a shape: [None, 1, 1, 1280] type: <dtype: 'float32'>
Logits shape: [None, 1001] type: <dtype: 'float32'>
Predictions shape: [None, 1001] type: <dtype: 'float32'>
logits shape: [None, 1001] type: <dtype: 'float32'>
probs shape: [None, 1001] type: <dtype: 'float32'>
class shape: [None] type: <dtype: 'int32'>
predictions shape: [None] type: <dtype: 'int32'>
original_image shape: [None, None, None, 3] type: <dtype: 'float32'>
original_image_shape shape: [None, 3] type: <dtype: 'int32'>
You can use the pretrained model efficientnet_b0 that is stored in the OSS path oss://pai-vision-data-sh/pretrained_models/saved_models/efficientnet-b0 to extract image features. The model exposes the following output tensors, any of which can be specified as the value of the feature_name parameter:
stem shape: [None, 112, 112, 32] type: <dtype: 'float32'>
block_0/expansion_output shape: [None, 112, 112, 32] type: <dtype: 'float32'>
block_0 shape: [None, 112, 112, 16] type: <dtype: 'float32'>
reduction_1/expansion_output shape: [None, 112, 112, 32] type: <dtype: 'float32'>
reduction_1 shape: [None, 112, 112, 16] type: <dtype: 'float32'>
block_1/expansion_output shape: [None, 56, 56, 96] type: <dtype: 'float32'>
block_1 shape: [None, 56, 56, 24] type: <dtype: 'float32'>
block_2/expansion_output shape: [None, 56, 56, 144] type: <dtype: 'float32'>
block_2 shape: [None, 56, 56, 24] type: <dtype: 'float32'>
reduction_2/expansion_output shape: [None, 56, 56, 144] type: <dtype: 'float32'>
reduction_2 shape: [None, 56, 56, 24] type: <dtype: 'float32'>
block_3/expansion_output shape: [None, 28, 28, 144] type: <dtype: 'float32'>
block_3 shape: [None, 28, 28, 40] type: <dtype: 'float32'>
block_4/expansion_output shape: [None, 28, 28, 240] type: <dtype: 'float32'>
block_4 shape: [None, 28, 28, 40] type: <dtype: 'float32'>
reduction_3/expansion_output shape: [None, 28, 28, 240] type: <dtype: 'float32'>
reduction_3 shape: [None, 28, 28, 40] type: <dtype: 'float32'>
block_5/expansion_output shape: [None, 14, 14, 240] type: <dtype: 'float32'>
block_5 shape: [None, 14, 14, 80] type: <dtype: 'float32'>
block_6/expansion_output shape: [None, 14, 14, 480] type: <dtype: 'float32'>
block_6 shape: [None, 14, 14, 80] type: <dtype: 'float32'>
block_7/expansion_output shape: [None, 14, 14, 480] type: <dtype: 'float32'>
block_7 shape: [None, 14, 14, 80] type: <dtype: 'float32'>
block_8/expansion_output shape: [None, 14, 14, 480] type: <dtype: 'float32'>
block_8 shape: [None, 14, 14, 112] type: <dtype: 'float32'>
block_9/expansion_output shape: [None, 14, 14, 672] type: <dtype: 'float32'>
block_9 shape: [None, 14, 14, 112] type: <dtype: 'float32'>
block_10/expansion_output shape: [None, 14, 14, 672] type: <dtype: 'float32'>
block_10 shape: [None, 14, 14, 112] type: <dtype: 'float32'>
reduction_4/expansion_output shape: [None, 14, 14, 672] type: <dtype: 'float32'>
reduction_4 shape: [None, 14, 14, 112] type: <dtype: 'float32'>
block_11/expansion_output shape: [None, 7, 7, 672] type: <dtype: 'float32'>
block_11 shape: [None, 7, 7, 192] type: <dtype: 'float32'>
block_12/expansion_output shape: [None, 7, 7, 1152] type: <dtype: 'float32'>
block_12 shape: [None, 7, 7, 192] type: <dtype: 'float32'>
block_13/expansion_output shape: [None, 7, 7, 1152] type: <dtype: 'float32'>
block_13 shape: [None, 7, 7, 192] type: <dtype: 'float32'>
block_14/expansion_output shape: [None, 7, 7, 1152] type: <dtype: 'float32'>
block_14 shape: [None, 7, 7, 192] type: <dtype: 'float32'>
block_15/expansion_output shape: [None, 7, 7, 1152] type: <dtype: 'float32'>
block_15 shape: [None, 7, 7, 320] type: <dtype: 'float32'>
reduction_5/expansion_output shape: [None, 7, 7, 1152] type: <dtype: 'float32'>
reduction_5 shape: [None, 7, 7, 320] type: <dtype: 'float32'>
features shape: [None, 7, 7, 320] type: <dtype: 'float32'>
head_1x1 shape: [None, 7, 7, 1280] type: <dtype: 'float32'>
pooled_features shape: [None, 1280] type: <dtype: 'float32'>
global_pool shape: [None, 1280] type: <dtype: 'float32'>
class shape: [None] type: <dtype: 'int32'>
head shape: [None, 1000] type: <dtype: 'float32'>
logits shape: [None, 1000] type: <dtype: 'float32'>
probs shape: [None, 1001] type: <dtype: 'float32'>
predictions shape: [None] type: <dtype: 'int32'>
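Whichever model and endpoint you choose, the extracted feature vectors are typically compared with a distance or similarity measure. The following sketch shows cosine similarity between two feature vectors; it is a generic illustration, not part of the EasyVision command output.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two one-dimensional feature vectors."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, which makes cosine similarity a common choice for ranking images by visual similarity.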