How to handle multiple degradation types with a single convolutional super-resolution network
1. Introduction
The goal of single image super-resolution (SISR) is to recover a sharp high-resolution (HR) image from a single low-resolution (LR) input image. In general, the LR image y is assumed to be generated from a sharp HR image x by the following degradation process:
$$y = (x \otimes k)\downarrow_s + n$$
where $x \otimes k$ denotes the convolution of the sharp HR image x with the blur kernel k, $\downarrow_s$ denotes the downsampling operator with scale factor s, and n is additive white Gaussian noise (AWGN) with standard deviation (noise level) $\sigma$.
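The degradation model $y = (x \otimes k)\downarrow_s + n$ can be simulated in a few lines. The following is a minimal NumPy sketch, not the authors' code. The function names are hypothetical, the image is assumed to be a single-channel float array, reflect padding is assumed at the borders, and direct s-fold decimation stands in for the bicubic downsampler used in training.

```python
import numpy as np

def gaussian_kernel(size: int = 15, width: float = 1.6) -> np.ndarray:
    """Isotropic Gaussian blur kernel k, normalized to sum to 1 (illustrative parameters)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * width ** 2))
    return k / k.sum()

def convolve2d_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Same'-size 2D convolution (odd-sized kernel) with reflect padding."""
    p = k.shape[0] // 2
    xp = np.pad(x, p, mode="reflect")
    kf = k[::-1, ::-1]  # flip the kernel so this is a true convolution, not a correlation
    out = np.zeros(x.shape)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += kf[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def degrade(x: np.ndarray, k: np.ndarray, s: int, noise_level: float,
            rng: np.random.Generator) -> np.ndarray:
    """y = (x conv k) downsampled by s, plus AWGN with standard deviation noise_level."""
    blurred = convolve2d_same(x, k)
    low = blurred[::s, ::s]  # direct s-fold decimation (simplifying assumption)
    return low + rng.normal(0.0, noise_level, size=low.shape)
```

For example, `degrade(x, gaussian_kernel(), 2, 15.0, np.random.default_rng(0))` turns a 64×64 image into a 32×32 blurred, noisy LR image. With a delta kernel and σ = 0 the model reduces to plain decimation, which is a handy sanity check.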
SISR methods are mainly divided into three categories: interpolation-based methods, model-based methods, and discriminative learning-based methods.
Interpolation-based methods (e.g., nearest-neighbor and bicubic interpolation) are fast but produce limited quality. Model-based methods introduce image priors, such as non-local similarity priors and denoising priors, and solve an optimization problem to obtain HR images of better visual quality, but they are slow. Plugging in CNN-based denoising priors can speed them up to some extent, yet such methods still have drawbacks: they cannot be trained end to end, and they involve parameters that are difficult to tune.
Discriminative learning-based methods, especially CNN-based methods, have received widespread attention in recent years due to their fast speed and end-to-end learning, and have gradually become the mainstream method for solving SISR.
Since SRCNN, the first work to apply a CNN to SISR, was published at ECCV 2014, many improved methods have followed. For example, VDSR achieved a large improvement in PSNR; ESPCN and FSRCNN improved speed; and SRGAN proposed an effective way to improve perceptual quality at large magnification factors.
However, these methods share a common shortcoming: they only consider the bicubic downsampling degradation model and cannot be flexibly extended to handle other degradation types simultaneously in a non-blind setting. Because real images undergo a wide variety of degradation processes, the practical applicability of such methods is very limited.
Several SISR studies have pointed out that the accuracy of the blur kernel assumed in the degradation process plays a crucial role in SISR, yet no CNN-based work takes factors such as the blur kernel into account. This leads to the main question addressed in this paper: can a single non-blind SISR model be designed to handle different types of image degradation?
2. Method
This paper first analyzes SISR under the maximum a posteriori (MAP) framework, using the analysis to guide the design of the CNN architecture. Because SISR is ill-posed, a regularization term is usually introduced to constrain the solution space. Specifically, the HR image x corresponding to the LR image y can be estimated by solving:
$$\hat{x} = \arg\min_{x} \frac{1}{2\sigma^2}\left\|(x \otimes k)\downarrow_s - y\right\|^2 + \lambda \Phi(x)$$
where $\frac{1}{2\sigma^2}\left\|(x \otimes k)\downarrow_s - y\right\|^2$ is the likelihood (data fidelity) term, $\Phi(x)$ is the prior (regularization) term, and $\lambda$ is the trade-off parameter between the likelihood term and the prior term.
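To make the two terms concrete, the sketch below only evaluates the MAP energy for a candidate x. It does not solve the minimization. It assumes direct s-fold decimation for $\downarrow_s$ and, as a purely hypothetical choice of $\Phi(x)$, anisotropic total variation. The function names are illustrative.

```python
import numpy as np

def conv_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Same'-size 2D convolution with reflect padding (odd-sized kernel assumed)."""
    p = k.shape[0] // 2
    xp = np.pad(x, p, mode="reflect")
    kf = k[::-1, ::-1]  # flipped kernel -> convolution
    out = np.zeros(x.shape)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += kf[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def tv_prior(x: np.ndarray) -> float:
    """Hypothetical Phi(x): sum of absolute vertical and horizontal differences."""
    return float(np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum())

def map_objective(x, y, k, s, sigma, lam, prior=tv_prior) -> float:
    """1/(2 sigma^2) * ||(x conv k) downsampled by s - y||^2 + lam * Phi(x)."""
    residual = conv_same(x, k)[::s, ::s] - y   # data fidelity residual
    data_term = np.sum(residual ** 2) / (2.0 * sigma ** 2)
    return float(data_term + lam * prior(x))
```

If y was produced from x without noise, the data term vanishes and only $\lambda\Phi(x)$ remains, which shows how the prior decides among images that explain y equally well.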
Put simply, this formulation says two things:
* The estimated HR image must both agree with the SISR degradation process and satisfy the prior properties of sharp images.
* In non-blind super-resolution, the solution x depends on the LR image y, the blur kernel k, the noise level $\sigma$, and the trade-off parameter $\lambda$.
In short, the MAP estimate for non-blind SISR can be written as
$$\hat{x} = \mathcal{F}(y, k, \sigma, \lambda; \Theta)$$
where $\Theta$ denotes the parameters of the MAP inference. If a CNN is regarded as another way of computing this MAP solution, the following conclusions can be drawn:
Since the data fidelity term corresponds to the SISR degradation process, accurately modeling that process is crucial to the quality of SISR results. However, existing CNN-based methods aim to solve
$$\hat{x} = \mathcal{F}(y; \Theta)$$
which ignores factors such as the blur kernel and the noise level, so its usefulness is very limited.
To design a more effective CNN-based SISR model, more types of image degradation should be taken into account. A simple idea is to feed the blur kernel k and the noise level $\sigma$ into the network as inputs. Since the trade-off parameter $\lambda$ can be absorbed into the noise level $\sigma$, the CNN mapping simplifies to
$$\hat{x} = \mathcal{F}(y, k, \sigma; \Theta)$$
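Why $\lambda$ can be absorbed into $\sigma$ follows directly from the MAP objective: multiplying it by the positive constant $2\sigma^2$ does not change the minimizer, so $\sigma$ and $\lambda$ only enter through the product $\lambda\sigma^2$.

```latex
\begin{aligned}
\hat{x} &= \arg\min_{x} \frac{1}{2\sigma^2}\left\|(x \otimes k)\downarrow_s - y\right\|^2 + \lambda \Phi(x) \\
        &= \arg\min_{x} \left\|(x \otimes k)\downarrow_s - y\right\|^2 + 2\lambda\sigma^2 \Phi(x)
\end{aligned}
```

For any fixed $\lambda > 0$, varying $\sigma$ alone reaches every value of $\lambda\sigma^2$, which is why $\mathcal{F}(y, k, \sigma, \lambda; \Theta)$ can be reduced to $\mathcal{F}(y, k, \sigma; \Theta)$.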
Since most of the parameters in the MAP estimation are associated with the image prior, and the image prior is independent of the degradation process, a single CNN model has the capacity to handle different degradation types.
From this analysis, a non-blind SISR network should take the blur kernel and the noise level of the degradation model as inputs in addition to the LR image. However, the LR image, the blur kernel and the noise level have different dimensions, so they cannot be fed directly into a CNN.
To address this, the paper proposes a dimension stretching strategy. Assume the LR image has size W × H. The blur kernel is first vectorized and projected onto a t-dimensional space by PCA; the result is then concatenated with the noise level $\sigma$ to obtain a (t+1)-dimensional vector v. Finally, v is stretched into a W × H × (t+1) tensor, called the degradation maps, in which every element of the i-th map equals $v_i$.
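The dimension stretching step can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the paper's code. The PCA basis is learned here from a hypothetical bank of 15×15 isotropic Gaussian kernels, t = 15 is an illustrative choice, and arrays are stored rows first, so the W × H × (t+1) tensor in the text becomes an H × W × (t+1) array.

```python
import numpy as np

def gaussian_kernel(size: int, width: float) -> np.ndarray:
    """Isotropic Gaussian kernel normalized to sum 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * width ** 2))
    return k / k.sum()

def fit_pca(kernels: np.ndarray, t: int):
    """Return (mean, basis): basis is t x p^2, from an SVD of the centered kernel vectors."""
    X = kernels.reshape(len(kernels), -1)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:t]

def degradation_maps(kernel, noise_level, mean, basis, H, W) -> np.ndarray:
    """Stretch v = [PCA code of kernel, noise level] into H x W x (t+1) maps."""
    code = basis @ (kernel.ravel() - mean)        # t-dimensional projection
    v = np.concatenate([code, [noise_level]])     # (t+1)-dimensional vector v
    # Map i is a constant plane whose every element equals v[i].
    return np.broadcast_to(v, (H, W, v.size)).copy()
```

Usage: build a kernel bank, call `fit_pca(bank, 15)`, then `degradation_maps(k, sigma, mean, basis, H, W)` for each training pair. The learned projection must be kept fixed between training and test so the codes mean the same thing.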
The degradation maps and the LR image can then be concatenated along the channel dimension and used together as the CNN input. To demonstrate the effectiveness of this strategy, the fast and effective ESPCN architecture is adopted as the network framework. Notably, Batch Normalization layers are added to the network, both to speed up training convergence and because the LR input contains Gaussian noise.
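The two ends of such a network can be sketched without a deep learning framework: channel-wise concatenation at the input, and the ESPCN-style sub-pixel rearrangement at the output. The convolution, Batch Normalization and ReLU layers in between are omitted. Names are hypothetical, arrays are channels-last, and the channel ordering `(dy * s + dx) * C + c` is an assumption of this sketch; frameworks differ (PyTorch's `nn.PixelShuffle`, for instance, works on channels-first tensors).

```python
import numpy as np

def make_input(lr: np.ndarray, maps: np.ndarray) -> np.ndarray:
    """Concatenate an H x W x C LR image with H x W x (t+1) degradation maps."""
    if lr.shape[:2] != maps.shape[:2]:
        raise ValueError("LR image and degradation maps must share spatial size")
    return np.concatenate([lr, maps], axis=-1)

def pixel_shuffle(feat: np.ndarray, s: int) -> np.ndarray:
    """Sub-pixel rearrangement: H x W x (s*s*C) -> (H*s) x (W*s) x C."""
    H, W, depth = feat.shape
    if depth % (s * s) != 0:
        raise ValueError("channel count must be divisible by s*s")
    C = depth // (s * s)
    x = feat.reshape(H, W, s, s, C)   # axes: (h, w, dy, dx, c)
    x = x.transpose(0, 2, 1, 3, 4)    # axes: (h, dy, w, dx, c)
    return x.reshape(H * s, W * s, C)
```

For an RGB LR image and t = 15, the network input has 3 + 16 = 19 channels, and a final layer with 3·s² channels is rearranged into the s-times larger RGB output.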
Figure 2 shows the architecture of the proposed super-resolution network for multiple degradations (SRMD).
3. Experiments
In the training phase, SRMD uses isotropic and anisotropic Gaussian blur kernels, additive white Gaussian noise with noise levels in [0, 75], and the bicubic downsampling operator. SRMD can also be extended to other downsampling operators and even to other degradation models.
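Sampling such training degradations might look like the following sketch. Anisotropic Gaussian kernels are parameterized here by two axis widths and a rotation angle (covariance $R\,\mathrm{diag}(\sigma_x^2, \sigma_y^2)R^\top$); the 50/50 split and the width ranges are hypothetical values, not the paper's settings. Only the [0, 75] noise range comes from the text.

```python
import numpy as np

def gaussian_kernel(size: int, sigma_x: float, sigma_y: float, theta: float) -> np.ndarray:
    """Gaussian kernel with covariance R diag(sigma_x^2, sigma_y^2) R^T, normalized to sum 1.

    sigma_x == sigma_y gives an isotropic kernel; otherwise theta rotates it.
    """
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)                 # xx: column offset, yy: row offset
    coords = np.stack([xx, yy], axis=-1)         # size x size x 2
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    cov = rot @ np.diag([sigma_x ** 2, sigma_y ** 2]) @ rot.T
    inv_cov = np.linalg.inv(cov)
    mahal = np.einsum("...i,ij,...j->...", coords, inv_cov, coords)
    k = np.exp(-0.5 * mahal)
    return k / k.sum()

def sample_degradation(rng: np.random.Generator, size: int = 15, max_noise: float = 75.0):
    """Draw one (kernel, noise level) pair; width ranges are hypothetical."""
    if rng.random() < 0.5:
        w = rng.uniform(0.2, 4.0)
        kernel = gaussian_kernel(size, w, w, 0.0)                         # isotropic
    else:
        sx, sy = rng.uniform(0.5, 4.0, size=2)
        kernel = gaussian_kernel(size, sx, sy, rng.uniform(0.0, np.pi))  # anisotropic
    noise_level = rng.uniform(0.0, max_noise)
    return kernel, noise_level
```

Each sampled pair would then be used to synthesize an LR training image and its degradation maps.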
In the test phase, the PSNR and SSIM of different methods are first compared under the standard bicubic downsampling degradation (Table 1). Although SRMD is designed to handle many types of degradation, it still performs well under bicubic degradation. SRMD is also fast: it processes a 512×512 LR image in only 0.084 seconds on a Titan Xp GPU, about half the time VDSR needs for ×2 super-resolution.
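For reference, PSNR is $10\log_{10}(\mathrm{peak}^2/\mathrm{MSE})$. The sketch below assumes 8-bit intensities (peak 255) and an optional border crop, a common benchmarking practice that is not necessarily the exact protocol behind Table 1; SSIM is left out for brevity.

```python
import numpy as np

def psnr(ref: np.ndarray, est: np.ndarray, peak: float = 255.0, border: int = 0) -> float:
    """PSNR in dB between a reference and an estimate on the same intensity scale."""
    ref = ref.astype(np.float64)
    est = est.astype(np.float64)
    if border > 0:
        # Ignore a frame of `border` pixels on each side of the image.
        ref = ref[border:-border, border:-border]
        est = est[border:-border, border:-border]
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))
```

An error of one gray level at every pixel (MSE = 1) gives about 48.13 dB, so differences of a few tenths of a dB in the tables correspond to small changes in MSE.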
Table 2 compares PSNR and SSIM under different degradation types, where SRMD also performs well. Figure 4 shows that, by using non-uniform degradation maps, SRMD can process LR images whose degradation varies across space. Finally, Figure 5 compares the visual quality of different methods on real images: the HR image restored by SRMD is visually much better than those produced by the other methods.
Table 1: PSNR and SSIM of different methods under bicubic downsampling degradation (SRMDNF denotes the model trained without noise).
Table 2: Comparison of PSNR and SSIM results of different methods under different degradation types.
Figure 4: SRMD handles spatially variant degradation. (a) Spatial distribution of the noise level and the blur kernel width; (b) LR image (upscaled by nearest-neighbor interpolation); (c) HR image restored by SRMD (×2).
Figure 5: Visual comparison of different methods on the classic SISR test image "Chip" for ×4 super-resolution.
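Because each degradation map is just an image-sized array, nothing forces it to be constant. The sketch below uses hypothetical names and an arbitrary layout loosely inspired by Figure 4(a), not necessarily how that figure was produced. The blur kernel width changes from top to bottom and the noise level from left to right, and each row's kernel is projected with a PCA basis learned from a hypothetical kernel bank.

```python
import numpy as np

def gaussian_kernel(size: int, width: float) -> np.ndarray:
    """Isotropic Gaussian kernel normalized to sum 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * width ** 2))
    return k / k.sum()

def fit_basis(size: int = 15, t: int = 15):
    """PCA mean and t x size^2 basis from a hypothetical bank of isotropic kernels."""
    bank = np.stack([gaussian_kernel(size, w).ravel() for w in np.linspace(0.5, 3.0, 100)])
    mean = bank.mean(axis=0)
    _, _, vt = np.linalg.svd(bank - mean, full_matrices=False)
    return mean, vt[:t]

def variant_maps(row_widths, col_noise, mean, basis, size: int = 15) -> np.ndarray:
    """Degradation maps whose kernel width changes by row and noise level by column."""
    H, W, t = len(row_widths), len(col_noise), basis.shape[0]
    maps = np.empty((H, W, t + 1))
    for r, width in enumerate(row_widths):
        # Every pixel in row r shares the PCA code of a kernel with this width.
        maps[r, :, :t] = basis @ (gaussian_kernel(size, width).ravel() - mean)
    maps[:, :, t] = np.asarray(col_noise)[None, :]  # noise level varies left to right
    return maps
```

Feeding such maps instead of constant ones is all it takes to tell the network that different regions were degraded differently; the network itself is unchanged.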
4. Conclusion
To sum up, the main contributions of this paper are threefold:
* A simple, effective and scalable super-resolution model is proposed that handles not only the bicubic downsampling degradation model but also multiple degradation types, even spatially variant degradation, offering a practical solution for SISR.
* A simple and effective dimension stretching strategy is proposed to enable convolutional neural networks to process inputs of different dimensions, and this strategy can be extended to other applications.
* Experiments show that the super-resolution network model trained with synthetic images can effectively deal with the complex degradation types of real images.