Digital Image Processing Course Design: Medical Image Recognition Based on a CNN (Convolutional Neural Network)

Introduction:

Recognition, segmentation, and parsing of medical images are the core tasks of medical image analysis. Medical image recognition means identifying the objects present in a medical image. In theory, recognizing a target does not require detecting or localizing it, but in practice detection and localization are usually combined with recognition to assist it. Once recognition or detection is complete and the minimum bounding box of the target has been obtained, the precise boundary of the object can be found through segmentation. When an image contains multiple target objects, segmenting them becomes a semantic parsing task: assigning a semantic label to every pixel of a 2D image or every voxel of a 3D image, so that all pixels or voxels belonging to the same object share the same label.

1. Experimental Background



The goal of medical image recognition is to parse this complex semantic content and match it against the Foundational Model of Anatomy (FMA), so that human anatomical structures are represented symbolically in a form that people can understand and that machine systems can navigate, parse, and interpret.

This report focuses on CNN-based medical image recognition. By comparing the data from the three networks I built myself against the classic network ResNet18, we draw conclusions and compare the two approaches' recognition results on four gastric diseases plus the normal condition.

2. Purpose and Significance of the Experiment


Given the research background and overview above, we need to decide how to explore and implement medical image recognition. In the process of implementing it, we gradually absorb the content of the relevant literature and build up practical skills in digital image processing and deep learning.

In this course design, by comparing the network I built myself against the classic network ResNet18, both horizontally and vertically, we can understand in depth the similarities and differences between a plain convolutional neural network and a residual network in terms of parameters, network structure, and prediction results. Studying these two networks also lays a solid foundation for future research in this area.

Through this study, I hope to master the commonly used neural-network and deep-learning functions in the TensorFlow and Keras libraries, to understand some of the rules for tuning parameters and constructing networks, and to build a working grasp of deep learning. This should give me an understanding of, and ideas about, the development of big data and artificial intelligence, and better drive my own progress.

3. Environment Construction and Dataset
3.1 Environment Construction
Python 3.8, TensorFlow + Keras, Spyder / VS Code
3.2 Dataset Preparation
stomach (cancer_0, gastric_ulcer_1, gastric_erosion_2, gastric_polyps_3, normal_4; JPG format)
100 / 500 / 1000 / 2000 images (train)
100 / 200 images (test / validation)
4. Experimental Steps

4.1 Initial Setup
(1) Some of the imported libraries are explained as follows:
PIL (Python Imaging Library): image processing
itertools: iterator utilities
(2) Commonly used keras callback functions:
EarlyStopping: stops training when the monitored quantity no longer improves (increases or decreases), to prevent overfitting.
ReduceLROnPlateau: reduces the learning rate when the monitored quantity stops improving; when learning stalls, the model almost always benefits from reducing the learning rate by a factor of 2-10.
ModelCheckpoint: saves the model whenever the monitored quantity reaches its best (highest or lowest) value during training.
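The three callbacks above can be wired together as in the sketch below; the monitor names, patience values, and checkpoint filename are illustrative, not the ones used in the original experiment:

```python
from tensorflow.keras.callbacks import (EarlyStopping, ReduceLROnPlateau,
                                        ModelCheckpoint)

callbacks = [
    # Stop when val_loss has not improved for 10 consecutive epochs.
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    # Halve the learning rate when val_loss plateaus for 5 epochs.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5, min_lr=1e-6),
    # Keep only the weights with the best validation accuracy so far.
    ModelCheckpoint("best_model.h5", monitor="val_accuracy",
                    save_best_only=True),
]
```

These would then be passed to `model.fit(..., callbacks=callbacks)`.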

4.2 Data Preprocessing
At first, we chose normalization as the only data preprocessing step, as shown below. Later, during image training, we found that the results were not very good, so we changed the pipeline.
The improved version is shown below. We added shear transformation, random zooming, and random horizontal flipping. The images we process carry some text on the left side that affects the final training and prediction; rather than removing the text, we chose shear transformation to minimize its influence.
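Such an augmentation pipeline can be sketched with Keras's ImageDataGenerator; the shear angle and zoom range below are illustrative values, not the ones from the original experiment:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training-time augmentation: rescale to [0, 1], shear, random zoom,
# and random horizontal flips (parameter values are illustrative).
train_gen = ImageDataGenerator(
    rescale=1.0 / 255,   # normalization
    shear_range=10,      # shear angle, in degrees
    zoom_range=0.2,      # random zoom in [0.8, 1.2]
    horizontal_flip=True,
)
# Validation/test data should only be rescaled, never augmented.
valid_gen = ImageDataGenerator(rescale=1.0 / 255)
```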

In the preprocessing stage, I learned the following knowledge and methods for processing data:
(1) Shear transformation
Shear transformation is very useful for geometric deformation of images. Common shears act along the X direction or the Y direction; the corresponding transformation matrices are as follows:
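The standard 2D shear matrices, and their effect on a point, can be sketched in NumPy as follows (the shear factor 0.5 and the sample point are made up for illustration):

```python
import numpy as np

def shear_x(s):
    """Shear along the X direction: x' = x + s*y, y' = y."""
    return np.array([[1.0, s],
                     [0.0, 1.0]])

def shear_y(s):
    """Shear along the Y direction: x' = x, y' = y + s*x."""
    return np.array([[1.0, 0.0],
                     [s, 1.0]])

# Shearing the point (2, 3) along X with factor 0.5 moves it to (3.5, 3):
# the x coordinate shifts in proportion to y, while y is unchanged.
p = np.array([2.0, 3.0])
p_sheared = shear_x(0.5) @ p
```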

(2) one-hot encoding
One-hot encoding, also known as one-bit-effective encoding, uses an N-bit state register to encode N states: each state has its own register bit, and only one bit is set at any time.
One-hot encoding represents categorical variables as binary vectors. Categorical values are first mapped to integers; each integer is then represented as a binary vector that is zero everywhere except at the integer's index, which is set to 1.
We one-hot encode the label sets because our training model uses categorical_crossentropy as the loss function, which requires one-hot-encoded labels. We therefore apply the encoding transformation to the training, validation, and test labels.
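A minimal NumPy equivalent of this encoding (mirroring what `keras.utils.to_categorical` does), using the five gastric classes as an example:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer labels as one-hot row vectors: each row is zero
    everywhere except at the label's index."""
    return np.eye(num_classes)[labels]

# Five gastric classes: cancer=0, ulcer=1, erosion=2, polyps=3, normal=4.
y = np.array([0, 3, 4])
y_onehot = one_hot(y, 5)
# y_onehot[1] is [0, 0, 0, 1, 0]: only the bit for class 3 is set.
```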

(3) Normalization
Normalization is a way of simplifying computation: a dimensional expression is transformed into a dimensionless one, becoming a pure number.
Normalizing the dataset speeds up network convergence, shortens training time, and suits activation functions whose effective range lies in (0, 1), increasing their discriminative power. A particularly important reason for normalization is to keep the weights of different features consistent. For example, with a mean-squared-error loss such as MSE, a large-scale feature error such as (5000 − 1000)² and a small-scale error such as (3 − 1)² are summed and then averaged to obtain the error value; the large value obviously dominates the total. In most cases the features should carry equal weight, and their values differ so much only because their units differ. Normalizing the feature data in advance solves this problem.
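A small numeric sketch of both points: the raw squared errors from the example above, and min-max normalization bringing a feature into [0, 1] (the feature values are made up for illustration):

```python
import numpy as np

# Two features on very different scales: the raw squared errors are
# dominated entirely by the large-valued feature.
err_large = (5000.0 - 1000.0) ** 2   # 16,000,000
err_small = (3.0 - 1.0) ** 2         # 4

def min_max(x):
    """Min-max normalization: map a feature linearly onto [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

feature = np.array([1000.0, 2000.0, 5000.0])
feature_norm = min_max(feature)   # [0.0, 0.25, 1.0]
```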

4.3 Building the Model
4.3.1 Convolutional layers
The convolutional layer performs feature extraction on the input data. It contains multiple convolution kernels, and each element of a kernel corresponds to a weight coefficient and a bias term.
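A minimal sketch of such a layer in Keras; the filter count (32), kernel size (3x3), and 224x224 RGB input are assumptions for illustration, since the report's original code is not reproduced here:

```python
from tensorflow.keras.layers import Conv2D, Input
from tensorflow.keras.models import Sequential

# A single convolutional layer: 32 filters of size 3x3 over an RGB input.
# Each filter has 3*3*3 weights plus 1 bias, so the layer holds
# 3*3*3*32 + 32 = 896 trainable parameters.
model = Sequential([
    Input(shape=(224, 224, 3)),
    Conv2D(32, (3, 3), padding="same"),
])
```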

For the convolutional layer, besides calling the Conv2D library function, we also need to consider two issues:
(1) How should the convolutional layer's kernels be chosen?
  
Small, deep convolution kernels should be preferred. A single small kernel is not enough on its own; it is by stacking many small kernels that the model's performance improves, since a stack of small kernels covers the same receptive field as one large kernel with fewer parameters and more nonlinearity.

Each convolution kernel in a CNN corresponds to a receptive field, so a neuron does not need to see the whole image: each neuron senses only a local region, and at higher layers the neurons that sense different parts are combined to obtain global information. One benefit of this is a reduction in the number of trainable parameters.
(2) The difference between 'same' and 'valid' in the convolutional layer's padding: 'valid' applies no padding, so the output shrinks by kernel_size − 1 in each dimension; 'same' zero-pads the input so that, with stride 1, the output keeps the same spatial size as the input.
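The difference can be checked with the standard output-size formulas; this is general convolution arithmetic, not code from the report:

```python
import math

def conv_output_size(n, k, s, padding):
    """Spatial output size of a convolution over an n x n input with a
    k x k kernel and stride s, for the two Keras padding modes."""
    if padding == "valid":   # no padding: kernel must fit entirely inside
        return (n - k) // s + 1
    if padding == "same":    # zero-pad so that output size = ceil(n / s)
        return math.ceil(n / s)
    raise ValueError(padding)

# A 3x3 kernel with stride 1 shrinks a 224x224 map to 222x222 with
# 'valid' padding, but keeps it at 224x224 with 'same' padding.
out_valid = conv_output_size(224, 3, 1, "valid")   # 222
out_same = conv_output_size(224, 3, 1, "same")     # 224
```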

4.3.2 Activation function layer


The activation function runs on the neurons of an artificial neural network and maps a neuron's input to its output. Activation functions are introduced to increase the nonlinearity of the neural network model.

We use the ReLU activation function. Before ReLU appeared, the sigmoid function and the hyperbolic tangent were the commonly used activation functions.
The ReLU function is f(x) = max(0, x):
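A one-line NumPy sketch of ReLU, applied element-wise to a made-up input:

```python
import numpy as np

def relu(x):
    """ReLU: f(x) = max(0, x), applied element-wise.
    Negative inputs are zeroed; positive inputs pass through unchanged."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
y = relu(x)   # [0.0, 0.0, 0.0, 1.5]
```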

4.3.3 Pooling layer
The pooling layer compresses the input feature map: on one hand it makes the feature map smaller and simplifies the network's computation; on the other it compresses the features to extract the main ones. In this course design we use max pooling.
The principle of max pooling is shown in the following figure:
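The principle can also be sketched in NumPy; the 4x4 feature map below is a made-up example:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a single-channel feature map:
    each non-overlapping 2x2 block is replaced by its maximum."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 0],
                 [7, 2, 9, 8],
                 [0, 1, 3, 4]])
pooled = max_pool_2x2(fmap)
# The 4x4 map shrinks to 2x2:
# [[6, 5],
#  [7, 9]]
```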

4.3.4 Dropout layer
In some machine learning models, when the model has too many parameters and too few training samples, the trained model is prone to overfitting. Overfitting means the model has a small loss and high accuracy on the training data, but a large loss and low accuracy on the test data.
Dropout is applied to the standard BP (backpropagation) network structure: the activation values of hidden-layer units are set to 0 with a certain probability v, i.e., a fraction v of the hidden units is randomly disabled. This also lets the algorithm use a larger learning rate, speeding up training.

4.3.5 Flatten layer
The Flatten layer is used to "flatten" the input, that is, to make the multi-dimensional input one-dimensional, which is often used in the transition from the convolutional layer to the fully connected layer.

4.3.6 Fully connected layer (Dense)
The fully connected layer acts as the "classifier" in the convolutional neural network. If the convolutional, pooling, and activation layers map the original data into a hidden feature space, the fully connected layer maps the learned distributed feature representation into the sample label space.

4.3.7 Softmax
We do the final classification via the softmax function.
Softmax can be understood as normalization. For example, with five image classes, the softmax layer outputs a five-dimensional vector: the first value is the probability that the image belongs to the first class, the second value the probability that it belongs to the second class, and so on; the five values sum to 1.
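A NumPy sketch of softmax over five made-up class scores:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift by the max, exponentiate,
    then normalize so the outputs sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Five raw scores -> five class probabilities (one per gastric category).
logits = np.array([2.0, 1.0, 0.5, 0.5, 0.0])
probs = softmax(logits)
```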

4.3.8 The three networks I built
The first network has 8 layers: 2 convolutional layers, 2 activation layers, 1 pooling layer, 1 fully connected layer, 1 Flatten layer, and 1 classification layer, as shown below:

The second network has 13 layers: 3 convolutional layers, 3 activation layers, 2 pooling layers, 1 fully connected layer, 2 Dropout layers, 1 Flatten layer, and 1 classification layer, as shown below:

The third network has 17 layers: 4 convolutional layers, 5 activation layers, 1 pooling layer, 2 fully connected layers, 3 Dropout layers, 1 Flatten layer, and 1 classification layer, as shown below:
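The 13-layer network can be sketched as below. The report gives only the layer-type counts, so the filter numbers, kernel sizes, dropout rates, and the 224x224 input here are all assumptions:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, Activation, MaxPooling2D,
                                     Dropout, Flatten, Dense)

# 13 layers: 3 conv + 3 activation + 2 pooling + 2 dropout
# + 1 flatten + 1 fully connected + 1 classification layer.
model = Sequential([
    Conv2D(32, (3, 3), padding="same"),
    Activation("relu"),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), padding="same"),
    Activation("relu"),
    Dropout(0.25),
    Conv2D(128, (3, 3), padding="same"),
    Activation("relu"),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation="relu"),        # fully connected layer
    Dense(5, activation="softmax"),       # 5 classes: 4 diseases + normal
])
model.build(input_shape=(None, 224, 224, 3))
```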

5. Experimental Conclusions
5.1 Comparison of training parameters

From the figure above we can see that the 13-layer network has the fewest trainable parameters: the parameter count drops sharply once a pooling layer is added, so the pooling layer is the main reason for the reduction in parameters.
5.2 Comparison of activation functions

After comparison we chose the ReLU activation function. ReLU is easier to learn and optimize because of its piecewise-linear nature: its forward pass, backward pass, and derivative are all piecewise linear. The traditional sigmoid function, being saturated at both ends, tends to lose information during propagation, leading to vanishing gradients; with gradient descent, the weights of the early layers then change so slowly that the network cannot learn effectively from the samples.
5.3 Comparison of Loss Functions
To quantify how well the neural network fits, we construct a loss function.

categorical_crossentropy is the cross-entropy loss function, used to evaluate the difference between the probability distribution produced by training and the true distribution. It measures the distance between the actual output probabilities and the expected ones: the smaller the cross entropy, the closer the two distributions.
Comparing our self-built loss function with categorical_crossentropy, we finally chose categorical_crossentropy: in the loss plots, our own loss curve did not fall as fast, and its range of decline was smaller.
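A NumPy sketch of categorical cross entropy on made-up five-class predictions, showing that the loss shrinks as the predicted distribution approaches the true one:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Cross entropy between a one-hot target and a predicted probability
    distribution: -sum(y_true * log(y_pred)); eps guards against log(0)."""
    return -np.sum(y_true * np.log(y_pred + eps))

y_true = np.array([0.0, 1.0, 0.0, 0.0, 0.0])      # true class: gastric ulcer
close = np.array([0.05, 0.80, 0.05, 0.05, 0.05])  # confident and correct
far = np.array([0.60, 0.10, 0.10, 0.10, 0.10])    # confident but wrong

# The loss is smaller for the prediction closer to the true distribution.
loss_close = categorical_crossentropy(y_true, close)
loss_far = categorical_crossentropy(y_true, far)
```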

5.4 Comparison of Gradient Descent Algorithms
Batch gradient descent uses the entire training set for every update. Its advantage is that every update moves in the right direction, so convergence to an extremum is guaranteed (to the global extremum for convex functions, possibly only a local one for non-convex functions). Its disadvantages are that each update takes a long time, a large training set consumes a lot of memory, and full-batch gradient descent cannot update model parameters online.
Stochastic gradient descent randomly selects a single sample for each update, so every step is fast and online updates are possible. Its biggest disadvantage is that individual updates may not move in the right direction, which introduces optimization fluctuations (noise).
Our model uses the Adam optimizer, an adaptive-learning-rate gradient descent method, as shown in the following figure:
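A single Adam update step can be sketched in NumPy as follows, using the standard default hyperparameters; this mirrors what Keras's Adam optimizer does internally for each parameter:

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: track biased first/second moment estimates,
    correct their bias, then take a per-parameter adaptive step."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of grads)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered var)
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = 1.0, 0.0, 0.0
theta, m, v = adam_step(theta, grad=2.0, m=m, v=v, t=1)
```

In Keras this corresponds simply to `model.compile(optimizer="adam", ...)`.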

5.5 Learning rate, number of iterations, batch size
These are the key arguments of the ReduceLROnPlateau callback introduced earlier:
monitor: the monitored quantity; can be accuracy, val_loss, or val_accuracy.
factor: the factor by which the learning rate is scaled; it is reduced as lr = lr * factor.
patience: when this many epochs pass without the model improving, the learning-rate reduction is triggered.

5.6 Display of experimental results
5.6.1 The first training (three networks)
The dataset selected for the first training run is train 500 + test 100 + val 100, with the results shown in the following figure:
From these results we can see that the 8-layer network, which has no dropout, behaves as shown in the figure: its validation accuracy is far below its training accuracy. The 17-layer network, despite having more dropout layers than the 13-layer network, also shows validation accuracy far below training accuracy. Compared with the 8-layer network, the 13-layer network adds a pooling layer and dropout; after training, its training accuracy reaches about 80% and its validation accuracy about 70%. After this comparison, we chose the 13-layer network for the subsequent training runs.
5.6.2 The second training (three iterations under one network)
The dataset selected for the second training run is train 1000 + test 200 + val 200. This time we let the 13-layer network chosen after the first run train for 30, 50, and 100 epochs respectively, and analyzed and compared the results under the different epoch counts, as shown below:
From these results we can see that as the number of epochs increases, training and validation accuracy saturate, and the saturation is already evident at 30 epochs. In the following training runs we can therefore set epochs to 30 when analyzing the results.
5.6.3 The third training (compared with the classic network resnet18)
The dataset selected for the third training run is train 2000 + test 200 + val 200. This time we trained our 13-layer network and the classic network ResNet18 for 30 epochs each, to compare our own network with the classic one on this medical image recognition task. The results are shown in the following figure:
From these results we can see that our network's final validation accuracy approaches 80%, while the classic ResNet reaches about 85% on the validation set. Consulting related material, we found that deep networks suffer from vanishing or exploding gradients, which makes deep models hard to train.
The core idea of ResNet is to change what the network learns: instead of learning the image features H(X) directly through convolution, it learns the residual H(X) − X between the features and the input, because learning the residual is easier than learning the raw features directly.
5.6.4 Evaluation Model
To evaluate the model, we chose the confusion matrix. Below is a brief description of the confusion matrix, based on the relevant material we consulted.
A confusion matrix, also known as an error matrix, is a standard format for accuracy evaluation, expressed as a matrix with n rows and n columns. Specific evaluation indicators include overall accuracy, producer's accuracy, and user's accuracy; these indicators reflect the classification accuracy from different perspectives.
In artificial intelligence, the confusion matrix is a visualization tool used especially in supervised learning (in unsupervised learning it is usually called a matching matrix). In image accuracy evaluation it is mainly used to compare classification results against ground-truth values; the accuracy of the classification can be displayed directly in the matrix, which is computed by comparing the location and class of each measured pixel with the corresponding location and class in the classified image.
The confusion matrix of the final result of this course design is shown in the following figure:
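A minimal sketch of how such a matrix is built, with made-up labels for the five gastric classes; row-normalizing the matrix yields per-class recall, the kind of percentages discussed below:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i, j] counts samples whose true class is i and whose predicted
    class is j (same convention as sklearn.metrics.confusion_matrix)."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy example with the five classes (0=cancer .. 4=normal).
y_true = [0, 0, 1, 2, 3, 4, 4]
y_pred = [0, 1, 1, 2, 4, 4, 4]
cm = confusion_matrix(y_true, y_pred, 5)
# Dividing each diagonal entry by its row sum gives per-class recall.
recall = cm.diagonal() / cm.sum(axis=1)
```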

From the confusion matrix above we can see that, of all cancer pictures in the test set, 43% were predicted as cancer, the highest of the five classes; of all gastric ulcer pictures, 57% were predicted as gastric polyps and only 13% as gastric ulcers; of all gastric polyp pictures, 40% were predicted as normal and only 13% as gastric polyps; of all gastric erosion pictures, 40% were predicted as gastric erosion, the highest of the five classes; and of all normal pictures, 48% were predicted as normal, the highest of the five classes.
A few prediction results are shown below; the text above each picture gives the predicted class and its probability.

6. Major Innovations

6.1 Neural Networks with Different Layers
The first network has 8 layers (2 convolutional, 2 activation, 1 pooling, 1 fully connected, 1 Flatten, 1 classification) and 10,495,909 trainable parameters.
The second network has 13 layers (3 convolutional, 3 activation, 2 pooling, 1 fully connected, 2 Dropout, 1 Flatten, 1 classification) and 5,271,525 trainable parameters. The extra pooling layer greatly reduces the parameter count, which shortens training time.
The third network has 17 layers (4 convolutional, 5 activation, 1 pooling, 2 fully connected, 3 Dropout, 1 Flatten, 1 classification) and 21,000,165 trainable parameters. With fewer pooling layers and more dropout, the parameter count rises sharply, which lengthens training time.
6.2 Self-built loss function
This loss function was inspired by Andrew Ng's lectures on neural networks and deep learning. Assume the true label y is 0 or 1 and the prediction is ŷ; the loss is L(ŷ, y) = −[y log ŷ + (1 − y) log(1 − ŷ)]. When y = 1 the loss simplifies to −log ŷ, which is 0 exactly when the prediction ŷ is 1; likewise, when y = 0 the loss simplifies to −log(1 − ŷ), which is 0 exactly when ŷ is 0.
From this reasoning we conclude that our self-built loss function can, in theory, fit the relationship between the true and predicted values well, keeping their changes consistent. In this course design, however, its range of variation was smaller than that of categorical cross entropy, so the loss function we actually used in training is categorical cross entropy.

7. System Analysis

7.1 Overall analysis of this project
Through this project, our group has basically implemented medical image recognition for predicting gastric diseases. We also contributed our own innovations, and deepened our knowledge of digital image processing, neural networks, deep learning, and related topics.
Our conclusions are as follows:
First, when building a neural network, having too many or too few layers is both undesirable; a suitable network must be built for one's own needs and dataset. Attention must also be paid to the relationships between layers such as convolution, activation, pooling, and dropout: the output dimensions of one layer must match the input dimensions of the next, otherwise the network will raise dimension-mismatch errors and cannot be trained or used for prediction.
Second, there is some text on the left side of the original dataset images. We did not initially think of cropping this text out; instead we used preprocessing such as shear transformation to minimize its impact on the final predictions. This text may be part of the reason our final accuracy is not very high.
Finally, our innovations are building neural networks with different numbers of layers, constructing our own loss function based on prior study and comparing it with categorical cross entropy, and using a confusion matrix to evaluate the final results. The expected results were basically achieved; gastric ulcers and gastric polyps differ relatively little from each other, which may be why the recognition accuracy for these two classes is not very high.
7.2 Problems encountered during the project and their solutions
(1) The problem of importing the package
The TensorFlow and Keras versions did not match, so later code kept reporting errors and some library functions could not be used.
(2) The way of data preprocessing
Starting from plain normalization, we found the processing results were not very good, so we later added preprocessing such as shear transformation and random horizontal flipping.
(3) Division of datasets
Most of the pictures carry some text description on the left side, which is a weakness of this course design. In the early stage we screened out the small number of pictures without text before training and predicting.
(4) The reason why the accuracy of the validation set (test set) is higher than that of the training set
① The dataset is too small, which can lead to an uneven split: the training and test sets have different distributions. If the model correctly captures the data's internal distribution, the training set's internal variance may exceed the validation set's, producing larger training error. In that case the dataset needs to be re-split so the distributions match.
② Too much regularization, e.g. too much dropout during training, makes the training-time model quite different from the validation-time model, since dropout is disabled during validation.
Dropout basically ensures that test accuracy is at its best, and better than training accuracy. Dropout forces the neural network to act as a very large ensemble of weak classifiers: no single classifier has high accuracy; they only become strong in combination.
During training, dropout removes a random subset of these classifiers, so training accuracy suffers; during testing, dropout is automatically turned off and all the weak classifiers in the network are used, so test accuracy improves.
③ Training accuracy is computed after each batch, while validation accuracy is generally computed after each epoch. The model used for validation has already been trained on the epoch's batches, so there is a lag: validation effectively uses a better-trained model, and its accuracy is naturally higher.
④ The training data undergoes a series of preprocessing steps such as rotation, affine transforms, blurring, and added noise. Excessive preprocessing changes the training set's distribution, making training accuracy lower than validation accuracy.
8. Summary and improvement
My only regret in this medical image recognition project is that the text on the left side of the pictures was not removed before training and prediction. I hope to apply the knowledge learned here more proficiently in future study and work. The innovations of this project lie in the self-built neural networks and the self-designed loss function. After listening to other students' defenses, I also learned about other classic networks, more effective data preprocessing methods, and how to find my own mistakes; I hope to do better in my next project.
Copyright statement: The content of this article is contributed by the real-name registered users of Alibaba Cloud
