How to Evaluate the Effectiveness of Deep Learning Models

1. Background

Debugging, visualizing, and evaluating the machine learning training process have long been difficult problems in the industry. With small datasets and simpler models such as LR, GBDT, and SVM, which have few hyperparameters, the model remains tunable and interpretable to a reasonable degree, so it is usually enough to train the model and then inspect metrics such as recall, precision, and AUC.

In the era of deep learning, model complexity has grown enormously. Layer-by-layer nested network structures, the choice of optimizer, a large number of hyperparameters, and the conversion of features into continuous values all combine into a complex deep model. When the results are poor, there can be many possible causes. To locate and fix them, algorithm engineers must spend a great deal of effort on repeated experiments, and may still not arrive at a definite answer. Put simply, the network model is close to a black box.

2. DeepInsight

Through our research, we found that many intermediate indicators produced during training and evaluation are related to model quality. Systematically analyzing and modeling tensors, gradients, weights, and update magnitudes can support decisions about algorithm tuning and problem localization. In addition, by improving the AUC algorithm and analyzing further evaluation indicators such as ROC, PR, and the distribution of predicted scores, the model can be evaluated more comprehensively.

After more than two months of work, we launched the DeepInsight platform, which is dedicated to problems such as model debugging and problem localization. After submitting a model for training, users can view and analyze the whole training process in one place, from intermediate training indicators to prediction indicators to performance data. The platform also highlights obvious training problems and prompts the user. In the future, we hope the platform can better help users discover and locate training problems and offer appropriate suggestions (such as changing the optimization algorithm for certain sub-networks, or adjusting the learning rate or momentum), much as GDB does for C++.

2.1 Goals

Accumulate and persist training data. Deep learning training data is valuable. The network topology, parameters, intermediate training process, and model evaluation indicators of every training run are stored persistently, which makes later manual analysis and secondary modeling easier;

Accumulate an understanding of model training, provide analysis and optimization methods, assist decision-making, and help users avoid known problems;

Use big data analysis and modeling to find relationships among intermediate process indicators and so better assist decision-making. We call this goal Model on Model: using new models to analyze and evaluate deep models;

Building on big data analysis and modeling, try applying deep reinforcement learning (DRL) to existing models to make deep learning debugging more efficient.

2.2 Architecture

The system is divided into four layers: an input layer, an analysis layer, an evaluation layer, and an output layer.

It also includes five major components: TensorBoard+ for visual analysis; TensorViewer for log display and comparison; TensorDealer for integrated configuration; TensorTracer for data export; and TensorDissection for analysis and tuning.

2.3 Progress

2.3.1 TensorBoard+, a high-performance visualization component

Google's TensorBoard (TB) is the visualization component of TensorFlow (TF); it displays the network structure and intermediate indicators of a deep learning model. The original TB runs as a single-machine command-line tool and cannot serve multiple users. Usability is poor: the running process must be killed every time the log path is switched. Performance is also poor: loading the data of an industrial-scale model freezes the page immediately. Indicators are not organized hierarchically, so thousands of them are listed flat and become impossible to browse. Support for more advanced usage is weak: it cannot compare data across charts that are already displayed, and it cannot display floating-point values on the X-axis.

Therefore, we refactored the core code of TB to support GB-scale log loading and hierarchical data organization, turned the service into a multi-user version, and used Docker to manage resources flexibly and reclaim them automatically. The UI supports highlighting custom indicators, hierarchical display, data comparison, log upload, and more, including the following:

Support online change of TF log path

Support online aggregation and comparison of graph data

Support X-axis floating-point value type display

Support highlighting chart data by sub-dimension

Support manually adjusting the front-end refresh interval to display data in real time
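
The hierarchical display listed above can be pictured as grouping indicator names by prefix. The following is a minimal sketch, not the TensorBoard+ implementation. It assumes indicator names are slash-separated, as TensorFlow name scopes are; the function name `group_tags` and the sample tag names are hypothetical.

```python
from collections import defaultdict


def group_tags(tags, depth=1):
    """Group slash-separated indicator names by their first `depth` components.

    Names with no more than `depth` components go into a "(root)" group.
    """
    groups = defaultdict(list)
    for tag in tags:
        parts = tag.split("/")
        prefix = "/".join(parts[:depth]) if len(parts) > depth else "(root)"
        groups[prefix].append(tag)
    # Sort both the group keys and the names inside each group for stable display.
    return {prefix: sorted(names) for prefix, names in sorted(groups.items())}


# Hypothetical tag names, for illustration only.
tags = [
    "embedding/user_id/weights",
    "embedding/item_id/weights",
    "dnn/layer1/weights",
    "dnn/layer1/gradients",
    "loss",
]
grouped = group_tags(tags)
for prefix, names in grouped.items():
    print(prefix, names)
```

Raising `depth` gives a finer grouping, which is one way a UI could let users drill down from a sub-network to a single layer instead of scrolling through a flat list.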

2.3.2 TensorViewer, an integrated configuration and log management system

TF tasks lacked effective management: users could not view and analyze data on demand, let alone review historical data. We connected TF to DeepInsight so that information about every task is collected. Users can view the real-time and historical data of each training run and compare multiple tasks; a one-click jump to TensorBoard+ opens the current log data for visualization.

2.3.3 Improving TensorFlow's visualization data export

We defined a set of data export methods that write all internal data in a unified summary format that TensorBoard+ can process. Because the PS (parameter server) architecture has no master that processes intermediate data centrally, and because exporting indicators such as tensors and gradients is extremely resource-intensive, how to export the data deserves careful study. At present we export data on Worker0, which meets the needs of general model training. In the future we will study a Snapshot-based export scheme, which should also work well for large-scale networks.

We have completed an initial analysis of the process indicators exported from TensorFlow, and are exploring supervised and unsupervised modeling on this large volume of indicators.
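
As a rough illustration of exporting from a single worker, the sketch below reduces each tensor to a few cheap statistics and emits records only on worker 0. It is not the platform's actual export code: the record layout, the function names `summarize` and `export_step`, and the `every_n_steps` throttle are all assumptions made for this example.

```python
import math


def summarize(values):
    """Reduce a flat, non-empty list of floats to a few cheap statistics."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        "l2_norm": math.sqrt(sum(v * v for v in values)),
    }


def export_step(worker_index, step, tensors, every_n_steps=100):
    """Return summary records for this step, or [] if nothing is exported.

    Only worker 0 exports, and only every `every_n_steps` steps
    (a hypothetical throttle to limit the cost of exporting).
    """
    if worker_index != 0 or step % every_n_steps != 0:
        return []
    return [
        {"step": step, "tag": name, **summarize(values)}
        for name, values in sorted(tensors.items())
    ]


# Hypothetical tensors, flattened to plain lists for illustration.
tensors = {
    "dnn/layer1/weights": [3.0, -4.0],
    "dnn/layer1/gradients": [0.0, 0.5],
}
records = export_step(worker_index=0, step=200, tensors=tensors)
for record in records:
    print(record)
```

Exporting statistics rather than full tensors is a design choice of this sketch; it trades detail for lower overhead on the exporting worker.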

2.3.4 Improved model evaluation indicators

TensorFlow's built-in AUC calculation uses few buckets, so its accuracy suffers, and its performance is insufficient on large amounts of data. Moreover, it only computes AUC and cannot draw ROC, PR, or other curves.

We improved the calculation method by introducing more buckets and making the computation more efficient, and we now plot additional indicators: AUC, ROC, PR, volatility, and the bucket distribution of positive and negative samples. By observing the distribution of positive and negative samples, we found that a defect in TensorFlow's asynchronous computation produces wrong sample counts in some buckets, which causes very small fluctuations in AUC. This bug has not yet been resolved. All prediction indicators are seamlessly connected to the DeepInsight platform.
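
To make the bucket-based approach concrete, here is a minimal sketch of how AUC, ROC points, and PR points can all be derived from the positive and negative bucket distributions in a single pass. It is a generic histogram method, not the platform's actual code; the function names, the default of 1000 buckets, and the assumption that scores lie in [0, 1] are choices made for this example.

```python
def bucket_counts(scores, labels, num_buckets=1000):
    """Histogram predicted scores (assumed in [0, 1]) separately per class."""
    pos = [0] * num_buckets
    neg = [0] * num_buckets
    for score, label in zip(scores, labels):
        idx = min(int(score * num_buckets), num_buckets - 1)
        if label == 1:
            pos[idx] += 1
        else:
            neg[idx] += 1
    return pos, neg


def roc_pr_auc(pos, neg):
    """Sweep buckets from the highest score down, accumulating TP and FP.

    Returns (auc, roc_points, pr_points), where roc_points are (FPR, TPR)
    pairs and pr_points are (recall, precision) pairs. AUC uses the
    trapezoid rule, so samples in the same bucket count as ties.
    """
    total_pos, total_neg = sum(pos), sum(neg)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    tp = fp = 0
    auc = 0.0
    roc = [(0.0, 0.0)]
    pr = []
    for p, n in zip(reversed(pos), reversed(neg)):
        if p == 0 and n == 0:
            continue  # empty bucket, no new curve point
        prev_fpr, prev_tpr = roc[-1]
        tp += p
        fp += n
        tpr, fpr = tp / total_pos, fp / total_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        roc.append((fpr, tpr))
        pr.append((tpr, tp / (tp + fp)))
    return auc, roc, pr


# Made-up scores and labels, for illustration only.
scores = [0.95, 0.85, 0.75, 0.65, 0.45, 0.25]
labels = [1, 1, 0, 1, 0, 0]
pos, neg = bucket_counts(scores, labels, num_buckets=10)
auc, roc, pr = roc_pr_auc(pos, neg)
print(auc)
```

In this toy data, 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9. More buckets mean fewer artificial ties and therefore a closer approximation to the exact pairwise AUC, at the cost of larger histograms.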

2.3.5 Research on intermediate training indicators

Through in-depth observation and modeling of the training indicators of large-scale Embedding sub-networks, we found that changes in weight (bias) values reflect whether the corresponding part of the network is being trained effectively. Regions where the weight (bias) values change only slightly are training "dead spots", that is, parts of the network that have not been trained. Observing the gradients of the weights (biases) helps diagnose problems such as vanishing or exploding gradients, shows how difficult this part of the network is to train, and allows the optimizer and learning rate settings to be adjusted in a targeted way. Examining the activations and gradients of every part of the network together helps us understand how multiple channels of information couple and propagate through the network, so that the model structure can be designed and optimized more effectively.

The findings from this research on intermediate indicators will be accumulated and fed back into DeepInsight. Once the training indicators are produced, the platform will prompt users to assist their decision-making.
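
The checks described above can be sketched as simple rules over weight changes and gradient norms. This is only an illustration under stated assumptions: the function names and all three thresholds (`dead_tol`, `vanish_tol`, `explode_tol`) are hypothetical and would need tuning per model, and tensors are flattened to plain lists.

```python
import math


def l2_norm(values):
    """L2 norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def relative_change(before, after):
    """Relative L2 change of a weight vector between two checkpoints."""
    delta = l2_norm([a - b for a, b in zip(after, before)])
    return delta / (l2_norm(before) + 1e-12)


def diagnose(w_before, w_after, grad,
             dead_tol=1e-6, vanish_tol=1e-7, explode_tol=1e3):
    """Return a list of warning flags for one weight (or bias) tensor."""
    flags = []
    if relative_change(w_before, w_after) < dead_tol:
        flags.append("dead spot")
    grad_norm = l2_norm(grad)
    if not math.isfinite(grad_norm) or grad_norm > explode_tol:
        flags.append("exploding gradient")
    elif grad_norm < vanish_tol:
        flags.append("vanishing gradient")
    return flags


# Made-up values for three hypothetical tensors.
results = {
    "embedding/rare_id": diagnose([0.1, -0.2], [0.1, -0.2], [0.0, 0.0]),
    "dnn/layer3": diagnose([0.5, 0.5], [0.4, 0.7], [2000.0, -1500.0]),
    "dnn/layer1": diagnose([0.5, 0.5], [0.49, 0.52], [0.1, -0.2]),
}
for name, flags in results.items():
    print(name, flags)
```

A tensor can carry more than one flag: an embedding row that never receives a gradient is reported both as a dead spot and as having a vanishing gradient.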
