Caffe is an open source deep learning framework. This topic describes how to train models by using Caffe in Machine Learning Platform for AI (PAI).

Format conversion

The Caffe component does not support training data in a customized format. You must convert the data to the required format by using the Format Conversion component.

The input port of the Format Conversion component is connected to the Read File Data component.
  • Read File Data
    To use this component, you must set the OSS Data Path parameter. In this example, the training file list is stored at bucket.hz.aliyun.com/train_img/train_file_list.txt. Each line of the file list contains the path of an image, followed by its label:
    bucket/ilsvrc12_val/ILSVRC2012_val_00029021.JPEG 817
    bucket/ilsvrc12_val/ILSVRC2012_val_00021046.JPEG 913
    bucket/ilsvrc12_val/ILSVRC2012_val_00041166.JPEG 486
    bucket/ilsvrc12_val/ILSVRC2012_val_00029527.JPEG 327
    bucket/ilsvrc12_val/ILSVRC2012_val_00042825.JPEG 138
  • Format Conversion
    Set Output OSS Directory and the other parameters. In this example, the output directory is bucket_name.oss-cn-hangzhou-zmf.aliyuncs.com/ilsvrc12_val_convert. The Format Conversion component generates the data_file_list.txt file and the related data files. The data_file_list.txt file uses the following format:
    bucket/ilsvrc12_val_convert/train_data_00_01
    bucket/ilsvrc12_val_convert/train_data_00_02
    • Machine Learning Platform for AI console
      • OSS Path for Storing Image Lists: The OSS path of the image list file.
      • File Prefix: The default value is data.
      • resize_height: The default value is 256.
      • resize_width: The default value is 256.
      • Encoding Type: You can select JPG, PNG, or Raw from the drop-down list.
      • Output OSS Directory: The directory in which the output data is stored.
      • Shuffle: This option is selected by default.
      • Gray: specifies whether the image is grayscale. This option is cleared by default.
      • Image Mean: specifies whether to generate the image mean file. This option is cleared by default.
    • PAI command
      PAI -name convert_image_oss2oss
          -project algo_public_dev
          -Darn=acs:ram::1607128916545079:role/test-1
          -DossImageList=bucket_name.oss-cn-hangzhou-zmf.aliyuncs.com/image_list.txt
          -DossOutputDir=bucket_name.oss-cn-hangzhou-zmf.aliyuncs.com/your/dir
          -DencodeType=jpg
          -Dshuffle=true
          -DdataFilePrefix=train
          -DresizeHeight=256
          -DresizeWidth=256
          -DisGray=false
          -DimageMeanFile=false
      Parameter | Description | Value | Required/Default value
      ossHost | The endpoint of the OSS bucket. | Example: oss-test.aliyun-inc.com. | No. The default value is oss-cn-hangzhou-zmf.aliyuncs.com, which is the internal endpoint of OSS.
      arn | The ARN of the OSS role. | Example: acs:ram::XXXXXXXXXXXXXXXX:role/ossaccessroleforodps. The X characters represent the 16 digits generated for the ARN of the role. | Yes.
      ossImageList | The list of image files. | Example: bucket_name/image_list.txt. | Yes.
      ossOutputDir | The directory in which the output data is stored. | Example: bucket_name/your/dir. | Yes.
      encodeType | The encoding type. | Valid values: JPG, PNG, and Raw. | No. The default value is JPG.
      shuffle | Specifies whether to shuffle the data. | A value of the BOOLEAN type. | No. The default value is true.
      dataFilePrefix | The prefix of the data file. | A value of the STRING type. Example: train or val. | Yes.
      resizeHeight | The height of the resized image. | A custom value of the INT type. | No. The default value is 256.
      resizeWidth | The width of the resized image. | A custom value of the INT type. | No. The default value is 256.
      isGray | Specifies whether the image is grayscale. | A value of the BOOLEAN type. | No. The default value is false.
      imageMeanFile | Specifies whether to generate the image mean file. | A value of the BOOLEAN type. | No. The default value is false.
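
The image list that the Read File Data component reads contains one image path and one label per line. The following Python sketch shows one way to generate such lines. The function name format_image_list is hypothetical, and the checks (no scheme prefix such as oss://, no spaces in the path, non-negative integer labels) are assumptions inferred from the sample lines in this topic, not documented rules.

```python
from typing import Iterable, List, Tuple


def format_image_list(entries: Iterable[Tuple[str, int]]) -> List[str]:
    """Return one "<bucket>/<object path> <label>" line per (path, label) pair."""
    lines = []
    for path, label in entries:
        # Assumption: paths start with the bucket name, as in the sample
        # file list, so a scheme prefix such as oss:// is rejected.
        if "://" in path:
            raise ValueError(f"path must start with the bucket name: {path!r}")
        # Assumption: a space separates the path from the label, so the
        # path itself must not contain spaces.
        if " " in path:
            raise ValueError(f"path must not contain spaces: {path!r}")
        # Assumption: labels are non-negative integers.
        if label < 0:
            raise ValueError(f"label must be non-negative: {label}")
        lines.append(f"{path} {label}")
    return lines


sample = [
    ("bucket/ilsvrc12_val/ILSVRC2012_val_00029021.JPEG", 817),
    ("bucket/ilsvrc12_val/ILSVRC2012_val_00021046.JPEG", 913),
]
image_list_lines = format_image_list(sample)
print("\n".join(image_list_lines))
```

The printed lines can be saved as a text file and uploaded to OSS so that ossImageList can point to it.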

Caffe

Caffe is a deep learning framework that features clarity, high readability, and agility. For more information, see the official Caffe website. You can configure the parameters of the Caffe component by using one of the following methods:
  • Machine Learning Platform for AI console
    Tab | Parameter | Description
    Parameters Setting | Solver Path | The OSS path of the solver file.
    Parameters Setting | Limit Job Runtime | If you select this option, you can set the maximum runtime of the job. Valid values: 1 to 168. Unit: hours.
    Tuning | GPUs | The number of GPUs. Default value: 1.
  • PAI command
    PAI -name pluto_train_oss
        -project algo_public_dev
        -DossHost=oss-cn-hangzhou-zmf.aliyuncs.com
        -Darn=acs:ram::1607128916545079:role/test-1
        -DsolverPrototxtFile=bucket_name.oss-cn-hangzhou-zmf.aliyuncs.com/solver.prototxt
        -DgpuRequired=1
    Parameter | Required | Description | Default value
    ossHost | No | The endpoint of the OSS bucket. | oss-cn-hangzhou-zmf.aliyuncs.com
    arn | Yes | The ARN of the OSS role. Example: acs:ram::XXXXXXXXXXXXXXXX:role/ossaccessroleforodps. The X characters represent the 16 digits generated for the ARN of the role. | No default value
    solverPrototxtFile | Yes | The OSS path of the solver file. The path must start with the bucket name. | No default value
    gpuRequired | No | The number of GPUs. | 1
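
When you run the training command from a script, it can help to render the command text from a dictionary and to check the required parameters first. The following Python sketch only builds the command string in the layout used in this topic and does not submit it. The helper build_pai_command is hypothetical and is not part of PAI. The parameter names and sample values are taken from the preceding example.

```python
from typing import Dict, Iterable


def build_pai_command(name: str, project: str, params: Dict[str, object],
                      required: Iterable[str] = ()) -> str:
    """Render a PAI command string; raise if a required parameter is missing."""
    missing = [key for key in required if key not in params]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    lines = [f"PAI -name {name}", f"    -project {project}"]
    for key, value in params.items():
        # Booleans are written in lowercase (true/false), as in the examples.
        if isinstance(value, bool):
            value = str(value).lower()
        lines.append(f"    -D{key}={value}")
    return "\n".join(lines)


train_command = build_pai_command(
    name="pluto_train_oss",
    project="algo_public_dev",
    params={
        "ossHost": "oss-cn-hangzhou-zmf.aliyuncs.com",
        "arn": "acs:ram::XXXXXXXXXXXXXXXX:role/ossaccessroleforodps",
        "solverPrototxtFile": "bucket_name.oss-cn-hangzhou-zmf.aliyuncs.com/solver.prototxt",
        "gpuRequired": 1,
    },
    # arn and solverPrototxtFile are the parameters marked as required.
    required=("arn", "solverPrototxtFile"),
)
print(train_command)
```

The same helper can render the convert_image_oss2oss command if you pass that component's parameters and its required parameter names.
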
Because PAI optimizes parallel computing, the solver file differs slightly from that of native Caffe. The differences are in the following parameters:
  • net: the OSS path of the net file.
  • type: Set this parameter to the string "ParallelSGD".
  • model_average_iter_interval: the interval, in iterations, at which the model is synchronized across multiple GPUs. A value of 1 indicates that the model is synchronized after every iteration.
  • snapshot_prefix: the OSS path prefix of the output model snapshots.
The following example shows a solver file:
net: "bucket/alexnet/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "bucket/snapshot/alexnet_train"
solver_mode: GPU
type: "ParallelSGD"
model_average_iter_interval: 1
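
In native Caffe, the "step" learning rate policy multiplies base_lr by gamma once every stepsize iterations, that is, the rate at a given iteration is base_lr * gamma ^ floor(iteration / stepsize). The following Python sketch applies that formula to the values in the preceding solver file. It assumes that ParallelSGD in PAI follows the same schedule as native Caffe, which this topic does not state. The function name step_learning_rate is hypothetical.

```python
def step_learning_rate(iteration: int, base_lr: float = 0.01,
                       gamma: float = 0.1, stepsize: int = 100000) -> float:
    """Learning rate under the "step" policy at the given iteration."""
    # Integer division counts how many full steps have been completed.
    return base_lr * gamma ** (iteration // stepsize)


# Print the rate at each boundary up to max_iter (450000 in the example).
for iteration in (0, 100000, 200000, 300000, 400000, 449999):
    print(iteration, step_learning_rate(iteration))
```

With these values, the rate drops by a factor of 10 at iterations 100000, 200000, 300000, and 400000, so it changes four times before max_iter is reached.
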
In the train_val file, use BinaryDataLayer as the data layer. Example:
layer {
  name: "data"
  type: "BinaryData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "bucket/imagenet_mean.binaryproto"
  }
  binary_data_param {
    source: "bucket/ilsvrc12_train_binary/data_file_list.txt"
    batch_size: 256
    num_threads: 10
  }
}
layer {
  name: "data"
  type: "BinaryData"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "bucket/imagenet_mean.binaryproto"
  }
  binary_data_param {
    source: "bucket/ilsvrc12_val_binary/data_file_list.txt"
    batch_size: 50
    num_threads: 10
  }
}

The type name of the new data layer is BinaryData. You can also use transform_param to transform the input image data. Its parameters are the same as those in native Caffe.

binary_data_param specifies the parameter settings for the data layer, including the following parameters:
  • source: the data source. The path uses the same format as the paths in the file list: it starts with the bucket name and does not contain the oss:// prefix.
  • num_threads: the number of concurrent threads for reading OSS data. The default value is 10. You can modify this parameter as needed.
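
Because source must start with the bucket name and must not contain oss://, a path copied as an OSS URI has to be adjusted before it is written to the net file. The following Python sketch shows one way to do that. The helper to_bucket_path is hypothetical, and rejecting a leading slash or any other scheme is an assumption about what counts as a valid path.

```python
def to_bucket_path(oss_uri: str) -> str:
    """Strip an optional oss:// prefix so the path starts with the bucket name."""
    prefix = "oss://"
    path = oss_uri[len(prefix):] if oss_uri.startswith(prefix) else oss_uri
    # Assumption: an empty path, a leading slash, or another scheme is invalid.
    if not path or path.startswith("/") or "://" in path:
        raise ValueError(f"expected '<bucket>/<object path>', got {oss_uri!r}")
    return path


source = to_bucket_path("oss://bucket/ilsvrc12_train_binary/data_file_list.txt")
print(source)
```

A path that already starts with the bucket name is returned unchanged.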

Example

The following example shows how to train a model with MNIST data by using Caffe.
  1. Prepare data sources
    Download and extract the Caffe data. For more information, see Download Caffe resources. Then, upload the data to OSS. The path settings shown in the following figure are for reference only.
  2. Run the experiment
    Drag the Caffe component to the canvas and connect it to the Read File Data component.

    Set Solver Path to mnist_solver_dnn_binary.prototxt and click Run in the upper-left corner of the canvas.

  3. View logs
    Right-click the Caffe component and select View Log.
    In the View Log message, click the logview link. On the page that appears, choose ODPS Tasks > VlinuxTask > StdErr to view the logs generated during training.