Fast Neural Networks (FastNN) in Machine Learning Platform for AI (PAI) is a distributed neural network repository based on the PAISoar framework. FastNN supports classic algorithms such as Inception, ResNet, and VGG, and will support more advanced algorithms in the future. FastNN is embedded in Machine Learning Studio, where you can use it directly.

Prepare data sources

To facilitate the use of FastNN in the PAI console, the CIFAR-10, MNIST, and Flowers datasets are downloaded, converted to the TFRecord format, and saved in a public Object Storage Service (OSS) bucket. You can use the Read MaxCompute Table or OSS Data Synchronization component provided by PAI to access the datasets. The following table lists the storage paths in OSS.
Dataset Number of classes Training samples Test samples Storage path
mnist 10 60000 10000
  • China (Beijing): oss://pai-online-beijing.oss-cn-beijing-internal.aliyuncs.com/fastnn-data/mnist/
  • China (Shanghai): oss://pai-online.oss-cn-shanghai-internal.aliyuncs.com/fastnn-data/mnist/
cifar10 10 50000 10000
  • China (Beijing): oss://pai-online-beijing.oss-cn-beijing-internal.aliyuncs.com/fastnn-data/cifar10/
  • China (Shanghai): oss://pai-online.oss-cn-shanghai-internal.aliyuncs.com/fastnn-data/cifar10/
flowers 5 3320 350
  • China (Beijing): oss://pai-online-beijing.oss-cn-beijing-internal.aliyuncs.com/fastnn-data/flowers/
  • China (Shanghai): oss://pai-online.oss-cn-shanghai-internal.aliyuncs.com/fastnn-data/flowers/
FastNN allows you to read data in the TFRecord format. It uses TFRecordDataset to build dataset pipelines for model training, which reduces the time spent on data preprocessing. FastNN does not provide fine-grained data partitioning. Therefore, we recommend that you evenly distribute data across workers, as described in the following list and illustrated in the sketch after it:
  • Make sure that the number of samples for each TFRecord file is the same.
  • Make sure that the number of TFRecord files that each worker processes is the same.
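
The following minimal sketch is not part of FastNN; it shows one way to meet these requirements by assigning each worker an equal, non-overlapping share of TFRecord files. The file names, worker_count, and worker_index values are placeholders that would come from your own job configuration.

    import tensorflow as tf

    # Placeholder values: in a real job, these come from the cluster configuration.
    train_files = ['0.tfrecord', '1.tfrecord', '2.tfrecord', '3.tfrecord']
    worker_count = 2   # total number of workers
    worker_index = 0   # index of the current worker

    # Give each worker an equal share of the TFRecord files.
    files = tf.data.Dataset.from_tensor_slices(train_files)
    files = files.shard(num_shards=worker_count, index=worker_index)
    dataset = tf.data.TFRecordDataset(files)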

If the data is in the TFRecord format, you can build dataset pipelines by referring to the parsing files such as cifar10, mnist, and flowers in the datasets directory. The cifar10 file is used in this example.

The following code shows the key_to_features format in the cifar10 file:
features={
        'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
        'image/format': tf.FixedLenFeature((), tf.string, default_value='png'),
        'image/class/label': tf.FixedLenFeature(
          [], tf.int64, default_value=tf.zeros([], dtype=tf.int64)),
}
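
For reference, the following hedged sketch shows how a record that matches this feature layout could be written when you prepare your own TFRecord files. The helper name make_example and the output file name are illustrative and are not part of FastNN.

    import tensorflow as tf

    # Illustrative helper: serialize one encoded image and its label into the
    # feature layout shown above.
    def make_example(encoded_image_bytes, label, image_format=b'png'):
      feature = {
        'image/encoded': tf.train.Feature(
          bytes_list=tf.train.BytesList(value=[encoded_image_bytes])),
        'image/format': tf.train.Feature(
          bytes_list=tf.train.BytesList(value=[image_format])),
        'image/class/label': tf.train.Feature(
          int64_list=tf.train.Int64List(value=[label])),
      }
      return tf.train.Example(features=tf.train.Features(feature=feature))

    # Example usage (the file name is illustrative):
    # with tf.python_io.TFRecordWriter('cifar10_train.tfrecord') as writer:
    #   writer.write(make_example(image_bytes, 3).SerializeToString())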
  1. In the datasets directory, create a file named cifar10.py for data parsing and edit the file.
    """Provides data for the Cifar10 dataset.
    The dataset scripts used to create the dataset can be found at:
    datasets/download_and_covert_data/download_and_convert_cifar10.py
    """
    from __future__ import division
    from __future__ import print_function
    import tensorflow as tf
    """Expect func_name is 'parse_fn'
    """
    def parse_fn(example):
      with tf.device("/cpu:0"):
        features = tf.parse_single_example(
          example,
          features={
            'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
            'image/format': tf.FixedLenFeature((), tf.string, default_value='png'),
            'image/class/label': tf.FixedLenFeature(
              [], tf.int64, default_value=tf.zeros([], dtype=tf.int64)),
          }
        )
        image = tf.image.decode_jpeg(features['image/encoded'], channels=3)
        label = features['image/class/label']
        return image, label
  2. Configure datasets_map in the datasets/dataset_factory.py file.
    from datasets import cifar10
    datasets_map = {
        'cifar10': cifar10,
    }
  3. When you run the task script, specify the dataset_name=cifar10 and train_files=cifar10_train.tfrecord parameters. After the configuration, you can use the cifar10 file for model training. An illustrative configuration is shown after the following note.
Note If you want to read data in other formats, you must build a dataset pipeline based on the required logic. For more information, see the utils/dataset_utils.py file.
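
For reference, a hypothetical hyperparameter configuration for this cifar10 example might look like the following. The hyperparameter file format is described in the next section. The dataset_dir path is an illustrative placeholder; replace it with the OSS path of your own TFRecord files.

dataset_name=cifar10
train_files=cifar10_train.tfrecord
dataset_dir=oss://your-bucket/path/to/cifar10/
num_classes=10
model_name=inception_resnet_v2
batch_size=128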

Hyperparameters

FastNN supports the following types of hyperparameters:
  • Dataset hyperparameters: the parameters used to determine the basic attributes of training datasets, such as dataset_dir in which a training dataset is stored
  • Data preprocessing hyperparameters: the data preprocessing functions and dataset pipeline parameters
  • Modeling hyperparameters: the basic parameters for model training, including model_name and batch_size
  • Learning rate hyperparameters: the learning rate parameters and tuning parameters
  • Optimizer hyperparameters: the optimizer parameters
  • Log hyperparameters: the output log parameters
  • Performance tuning hyperparameters: the tuning parameters, such as mixed precision
The following example shows the format of a hyperparameter file:
enable_paisoar=True
batch_size=128
use_fp16=True
dataset_name=flowers
dataset_dir=oss://pai-online-beijing.oss-cn-beijing-internal.aliyuncs.com/fastnn-data/flowers/
model_name=inception_resnet_v2
optimizer=sgd
num_classes=5
job_name=worker
  • Dataset hyperparameters
    Parameter Type Description
    dataset_name string The name of the input data parsing file. Valid values: mock, cifar10, mnist, and flowers. For more information, see all data parsing files in the images/datasets directory. Default value: mock.
    dataset_dir string The absolute path of the input dataset. Default value: None.
    num_sample_per_epoch integer The total number of dataset samples. This parameter is used with learning rate decay.
    num_classes integer The number of sample classes. Default value: 100.
    train_files string The file name of all training data. Separate multiple names with commas (,). Example: 0.tfrecord,1.tfrecord.
  • Data preprocessing hyperparameters (a dataset pipeline sketch that illustrates these parameters follows the hyperparameter lists)
    Parameter Type Description
    preprocessing_name string This parameter is used together with model_name to specify the name of the data preprocessing method. For more information about valid values, see the preprocessing_factory file in the images/preprocessing directory. Default value: None, which indicates that the data is not preprocessed.
    shuffle_buffer_size integer The size of the buffer pool for sample-based shuffles when a data pipeline is created. Default value: 1024.
    num_parallel_batches integer The number of batches that are parsed in parallel. The product of this value and batch_size determines the number of parallel threads used by map_and_batch, that is, the parallel granularity of sample parsing. Default value: 8.
    prefetch_buffer_size integer The number of batches of data prefetched by the data pipeline. Default value: 32.
    num_preprocessing_threads integer The number of threads used by the data pipeline to prefetch data in parallel. Default value: 16.
    datasets_use_caching bool Specifies whether to cache compressed input data in memory to speed up data input, at the cost of additional memory overhead. Default value: False, which indicates that caching is disabled.
  • Modeling hyperparameters
    Parameter Type Description
    task_type string The type of the task. Valid values:
    • pretrain: model pre-training. This is the default value.
    • finetune: model fine-tuning.
    model_name string The model to be trained. Valid values include all models in images/models. You can set this parameter based on all models defined in the images/models/model_factory file. Default value: inception_resnet_v2.
    num_epochs integer The number of training rounds for the training dataset. Default value: 100.
    weight_decay float The weight decay factor during model training. Default value: 0.00004.
    max_gradient_norm float The threshold for gradient clipping based on the global norm. Default value: None, which indicates that gradient clipping is not performed.
    batch_size integer The number of samples that one GPU processes in an iteration. Default value: 32.
    model_dir string The path from which a checkpoint is reloaded. Default value: None, which indicates that fine-tuning is not performed.
    ckpt_file_name string The name of the checkpoint file to be reloaded. Default value: None.
  • Learning rate hyperparameters
    Parameter Type Description
    warmup_steps integer The number of iterations for learning rate warmup (inverse decay). Default value: 0.
    warmup_scheme string The scheme for learning rate warmup (inverse decay). Set the value to t2t (Tensor2Tensor), which initializes the learning rate to 1/100 of the specified learning rate and then exponentially increases it until the specified learning rate is reached.
    decay_scheme string The scheme of learning rate decay. Valid values:
    • luong234: Start 4 rounds of decay with a factor of 1/2 after 2/3 of total iterations are completed.
    • luong5: Start 5 rounds of decay with a factor of 1/2 after 1/2 of total iterations are completed.
    • luong10: Start 10 rounds of decay with a factor of 1/2 after 1/2 of total iterations are completed.
    learning_rate_decay_factor float The factor of learning rate decay. Default value: 0.94.
    learning_rate_decay_type string The type of learning rate decay. Valid values: fixed, exponential, and polynomial. Default value: exponential.
    learning_rate float The initial learning rate. Default value: 0.01.
    end_learning_rate float The minimum learning rate during decay. Default value: 0.0001.
  • Optimizer hyperparameters
    Parameter Type Description
    optimizer string The name of the optimizer. Valid values: adadelta, adagrad, adam, ftrl, momentum, sgd, rmsprop, adamweightdecay. Default value: rmsprop.
    adadelta_rho float The decay factor of Adadelta. Default value: 0.95.
    adagrad_initial_accumulator_value float The initial value of the Adagrad accumulator. Default value: 0.1. This parameter is specific to the Adagrad optimizer.
    adam_beta1 float The exponential decay rate for the first-moment estimates. Default value: 0.9. This parameter is specific to the Adam optimizer.
    adam_beta2 float The exponential decay rate for the second-moment estimates. Default value: 0.999. This parameter is specific to the Adam optimizer.
    opt_epsilon float The epsilon offset of the optimizer, which is added for numerical stability. Default value: 1.0. This parameter is specific to the Adam optimizer.
    ftrl_learning_rate_power float The power to which the learning rate is raised. Default value: -0.5. This parameter is specific to the FTRL optimizer.
    ftrl_initial_accumulator_value float The initial value of the FTRL accumulator. Default value: 0.1. This parameter is specific to the FTRL optimizer.
    ftrl_l1 float The L1 regularization strength of FTRL. Default value: 0.0. This parameter is specific to the FTRL optimizer.
    ftrl_l2 float The L2 regularization strength of FTRL. Default value: 0.0. This parameter is specific to the FTRL optimizer.
    momentum float The momentum parameter of MomentumOptimizer. Default value: 0.9. This parameter is specific to the Momentum optimizer.
    rmsprop_momentum float The momentum parameter of RMSPropOptimizer. Default value: 0.9.
    rmsprop_decay float The decay factor of RMSProp. Default value: 0.9.
  • Log hyperparameters
    Parameter Type Description
    stop_at_step integer The total number of training iterations. Default value: 100.
    log_loss_every_n_iters integer The interval, in iterations, at which loss information is printed. Default value: 10.
    profile_every_n_iters integer The interval, in iterations, at which the timeline is generated. Default value: 0.
    profile_at_task integer The index of the machine that generates the timeline. Default value: 0, which corresponds to the index of the chief worker.
    log_device_placement bool Specifies whether to print the device placement information. Default value: False.
    print_model_statistics bool Specifies whether to print the trainable variable information. Default value: False.
    hooks string The training hooks. Default value: StopAtStepHook,ProfilerHook,LoggingTensorHook,CheckpointSaverHook.
  • Performance tuning hyperparameters
    Parameter Type Description
    use_fp16 bool Specifies whether to perform half-precision (FP16) training. Default value: True.
    loss_scale float The factor of the loss scale during training. Default value: 1.0.
    enable_paisoar bool Specifies whether to use the PAISoar framework. Default value: True.
    protocol string The communication protocol. Default value: grpc. Clusters with RDMA support can use grpc+verbs to improve data access efficiency.
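
The following minimal sketch is an assumption about typical usage, not FastNN's internal implementation. It illustrates how the data preprocessing hyperparameters above commonly map onto a TFRecord dataset pipeline; parse_fn and train_files are assumed to come from the dataset parsing file and the train_files parameter.

    import tensorflow as tf

    def build_pipeline(train_files, parse_fn, batch_size=32,
                       shuffle_buffer_size=1024, num_parallel_batches=8,
                       prefetch_buffer_size=32, datasets_use_caching=False):
      # parse_fn must return fixed-size tensors so that samples can be batched.
      dataset = tf.data.TFRecordDataset(train_files)
      if datasets_use_caching:
        dataset = dataset.cache()  # cache the compressed records in memory
      dataset = dataset.shuffle(shuffle_buffer_size)  # sample-level shuffle
      # Parse and batch samples in parallel; the parallelism is
      # batch_size multiplied by num_parallel_batches.
      dataset = dataset.apply(tf.data.experimental.map_and_batch(
          parse_fn, batch_size, num_parallel_batches=num_parallel_batches))
      return dataset.prefetch(prefetch_buffer_size)  # prefetch batches of data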

Develop a main file

If the existing models cannot meet your requirements, you can use the dataset, models, and preprocessing APIs for further development. Before you start, you must understand the basic process of FastNN. In this example, image classification is used and the code entry file is train_image_classifiers.py. The following code shows the overall code architecture:
# Initialize the required model based on model_name to obtain network_fn. The expected input image size train_image_size may also be returned.
    network_fn = nets_factory.get_network_fn(
            FLAGS.model_name,
            num_classes=FLAGS.num_classes,
            weight_decay=FLAGS.weight_decay,
            is_training=(FLAGS.task_type in ['pretrain', 'finetune']))
# Initialize the required data preprocessing function based on model_name or preprocessing_name to obtain preprocess_fn.
    preprocessing_fn = preprocessing_factory.get_preprocessing(
                FLAGS.model_name or FLAGS.preprocessing_name,
                is_training=(FLAGS.task_type in ['pretrain', 'finetune']))
# Select the valid TFRecord format based on dataset_name and synchronously call preprocess_fn to parse the dataset and obtain dataset_iterator.
    dataset_iterator = dataset_factory.get_dataset_iterator(FLAGS.dataset_name,
                                                            train_image_size,
                                                            preprocessing_fn,
                                                            data_sources)
# Call network_fn and dataset_iterator to define the function loss_fn that is used to calculate the loss.
    def loss_fn():
      with tf.device('/cpu:0'):
        # Fetch the next batch on the CPU.
        images, labels = dataset_iterator.get_next()
      logits, end_points = network_fn(images)
      loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=tf.cast(logits, tf.float32), weights=1.0)
      if 'AuxLogits' in end_points:
        loss += tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=tf.cast(end_points['AuxLogits'], tf.float32), weights=0.4)
      return loss
# Call the PAISoar API to encapsulate loss_fn and the native TensorFlow optimizer.
    opt = paisoar.ReplicatedVarsOptimizer(optimizer, clip_norm=FLAGS.max_gradient_norm)
    loss = optimizer.compute_loss(loss_fn, loss_scale=FLAGS.loss_scale)
# Define training tensors based on opt and loss.
    train_op = opt.minimize(loss, global_step=global_step)
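
Finally, as a hedged illustration rather than FastNN's actual training loop, train_op would typically be run in a monitored session together with hooks such as those listed under the log hyperparameters. The hook arguments below are assumptions based on the stop_at_step and log_loss_every_n_iters parameters.

    # Minimal sketch of a training loop; the hook arguments are illustrative.
    hooks = [tf.train.StopAtStepHook(last_step=FLAGS.stop_at_step),
             tf.train.LoggingTensorHook({'loss': loss}, every_n_iter=FLAGS.log_loss_every_n_iters)]
    with tf.train.MonitoredTrainingSession(hooks=hooks) as sess:
      while not sess.should_stop():
        sess.run(train_op)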