-
Abstract: Deep learning has achieved great success in super-resolution reconstruction, but most existing network structures still suffer from slow training and reconstruction, from each model being limited to a single scale, and from overly smooth reconstructed images. To address these problems, this paper designs a deep cascaded network (DCN) that reconstructs the image stage by stage. The network is optimized jointly with an L2 loss and a perceptual loss, and the cascades act together to produce the final high-quality reconstruction. In addition, one model can reconstruct multiple scales: the 4× model, for example, produces 1.5×, 2×, 2.5×, 3×, 3.5× and 4× results. Extensive experiments on several widely used benchmark datasets show that the proposed approach outperforms existing methods in both accuracy and visual quality.
-
Key words:
- deep learning /
- super-resolution /
- step by step /
- multi-scale /
- perceptual loss
-
Overview: Recovering a high-resolution (HR) image from its low-resolution (LR) counterpart is an important problem in digital image processing and many other vision tasks. Recently, Dong et al. showed that a convolutional neural network (CNN) can learn an end-to-end mapping from LR to HR. This network has since been extended in many directions, including sub-pixel convolutional networks, very deep convolutional networks, and recursive residual networks. Although these models achieve good results, several problems remain. First, most methods use an up-sampling operator, such as bicubic interpolation, to enlarge the input image to the target size as a pre-processing step. This adds considerable unnecessary computation and often produces visible reconstruction artifacts. Some algorithms avoid this step, such as ESPCN with sub-pixel convolution and FSRCNN with transposed convolution, but their network structures are too simple to learn complex, detailed mappings. Second, most existing methods optimize the network with the L2 loss alone, which yields overly smooth images that are less suitable for human vision. Third, these methods reconstruct only one scale per model, so a separate model must be trained for every other scale, which is especially costly for large scale factors.
To address these defects, we propose a deep cascaded network (DCN). DCN is a cascade structure: it takes an LR image as input and predicts a residual image at each scale. The predicted residual at each scale is used to reconstruct the HR image efficiently through up-sampling and addition. We train DCN with an L2 loss and a perceptual loss to obtain robust reconstructions.
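The per-cascade "upsample, predict residual, add" scheme described above can be sketched as follows. This is a minimal NumPy illustration, not the trained network: `predict_residual` is a hypothetical stand-in for the convolutional layers of one cascade, and nearest-neighbor upsampling by an integer factor stands in for the learned transposed convolution (the actual model also supports fractional scales such as 1.5×).

```python
import numpy as np

def upsample(img, scale):
    # Stand-in for the learned transposed convolution:
    # nearest-neighbor upsampling by an integer factor.
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)

def predict_residual(img):
    # Placeholder for the d convolutional layers of one cascade;
    # a real cascade would output learned high-frequency details.
    return np.zeros_like(img)

def dcn_forward(lr, num_cascades=2, scale_per_cascade=2):
    # Each cascade upsamples the current estimate and adds a
    # predicted residual; the intermediate results are themselves
    # valid SR predictions at the smaller scales.
    outputs = []
    x = lr
    for _ in range(num_cascades):
        x = upsample(x, scale_per_cascade)
        x = x + predict_residual(x)
        outputs.append(x)
    return outputs  # e.g. [2x prediction, 4x prediction]

lr = np.random.rand(25, 25)
preds = dcn_forward(lr)
print([p.shape for p in preds])  # [(50, 50), (100, 100)]
```

Because every cascade emits a prediction, one trained model yields reconstructions at all intermediate scales in a single forward pass.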
Our approach differs from existing CNN-based methods in the following aspects:
1) Multiple scales with cascaded layers. Our network has a cascade structure and generates multiple intermediate SR predictions in the feed-forward pass. This progressive reconstruction yields more accurate results; for example, our 4× model also produces 1.5×, 2×, 2.5×, 3× and 3.5× reconstructions.
2) Joint optimization with L2 and perceptual losses. The L2 term gives more accurate pixel-level reconstruction, while the perceptual term brings the result closer to human visual perception.
3) Feature extraction on the LR image. Our method does not require a traditional interpolation method to up-sample the image as pre-processing, which greatly reduces the computational cost.
Extensive experiments on several large benchmark datasets show that the proposed approach performs better than existing methods in terms of accuracy and visual improvement.
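As a concrete illustration of the joint objective in point 2), the sketch below combines a pixel-wise L2 term with a perceptual term computed as the L2 distance between feature maps, following the idea of perceptual losses [15]. The feature extractor here is a hypothetical fixed 3×3 convolution standing in for a pretrained network, and the weight `lam` is an illustrative choice, not a value from the paper.

```python
import numpy as np

def l2_loss(sr, hr):
    # Pixel-wise mean squared error.
    return np.mean((sr - hr) ** 2)

def features(img, kernel):
    # Stand-in for a pretrained network's feature maps: a single
    # valid-mode convolution with a fixed kernel.
    h, w = img.shape
    k = kernel.shape[0]
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
    return out

def combined_loss(sr, hr, kernel, lam=0.1):
    # Total loss = pixel-space L2 + lam * L2 in feature space
    # (the perceptual term).
    perceptual = np.mean((features(sr, kernel) - features(hr, kernel)) ** 2)
    return l2_loss(sr, hr) + lam * perceptual

rng = np.random.default_rng(0)
hr = rng.random((16, 16))
sr = hr + 0.01 * rng.standard_normal((16, 16))
kernel = rng.standard_normal((3, 3))
print(combined_loss(sr, hr, kernel))
```

The pixel term drives PSNR accuracy, while the feature-space term penalizes perceptually visible differences that per-pixel error misses.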
-
Figure 1. The proposed deep cascaded network architecture. The light blue part performs feature extraction and the light green part performs image reconstruction; each cascade contains d convolution layers (conv) and one transposed convolution layer (deconv) for up-sampling.
Table 1. Convolution parameters for each cascade. In the conv column, d < 6?5×5:3×3 means that the kernel size is 5×5 for layers with index d < 6 and 3×3 otherwise. In the padding column, d < 4?0:1 means that layers with d < 4 use no zero-padding and the others pad with one zero. Size is the output image resolution after each cascade.
| Model | Cascade | Size | Conv | Deconv | Conv stride | Deconv stride | Padding |
|---|---|---|---|---|---|---|---|
| 2× | 1.5× | 38 | 3×3 | 3×3 | 1 | 2 | d < 4?0:1 |
| 2× | 2× | 52 | 3×3 | 3×3 | 1 | 2 | d < 7?0:1 |
| 4× | 1.5× | 38 | 3×3 | 3×3 | 1 | 2 | d < 4?0:1 |
| 4× | 2× | 52 | 3×3 | 3×3 | 1 | 2 | d < 7?0:1 |
| 4× | 2.5× | 64 | 3×3 | 3×3 | 1 | 2 | d < 11?0:1 |
| 4× | 3× | 76 | d < 6?5×5:3×3 | 3×3 | 1 | 2 | d < 9?0:1 |
| 4× | 3.5× | 88 | d < 7?5×5:3×3 | 3×3 | 1 | 2 | d < 11?0:1 |
| 4× | 4× | 100 | d < 8?5×5:3×3 | 3×3 | 1 | 2 | 0 |
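The output sizes in Table 1 are produced by stride-2 transposed convolutions. For reference, the standard output-size relation for a transposed convolution is sketched below; the exact sizes in Table 1 additionally depend on the per-layer zero-padding schedule shown in the table, so this is the generic formula only, not a derivation of those specific values.

```python
def deconv_out_size(in_size, kernel, stride, pad):
    # Standard output-size relation for a transposed convolution:
    # out = (in - 1) * stride - 2 * pad + kernel
    return (in_size - 1) * stride - 2 * pad + kernel

# A 25-pixel input through a 3x3, stride-2, unpadded deconv:
print(deconv_out_size(25, 3, 2, 0))  # 51
```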
Table 2. Quantitative evaluation of state-of-the-art SR algorithms: average PSNR/SSIM for scale factors 2×, 3× and 4×
| Scale | Method | SET5 (PSNR/SSIM) | SET14 (PSNR/SSIM) | BSD100 (PSNR/SSIM) | URBAN100 (PSNR/SSIM) |
|---|---|---|---|---|---|
| 2× | Bicubic | 33.65/0.930 | 30.34/0.870 | 29.56/0.844 | 26.88/0.841 |
| 2× | A+[23] | 36.54/0.964 | 32.40/0.906 | 31.22/0.887 | 29.23/0.894 |
| 2× | SRCNN[8] | 36.65/0.954 | 32.29/0.903 | 31.36/0.888 | 29.52/0.895 |
| 2× | FSRCNN[14] | 36.99/0.955 | 32.73/0.909 | 31.51/0.891 | 29.87/0.901 |
| 2× | DRCN[13] | 37.63/0.959 | 32.98/0.913 | 31.85/0.894 | 30.76/0.913 |
| 2× | VDSR[11] | 37.53/0.958 | 32.97/0.913 | 31.90/0.896 | 30.77/0.914 |
| 2× | DRRN[12] | 37.74/0.959 | 33.23/0.913 | 32.05/0.897 | 31.23/0.918 |
| 2× | DCN(2×) | 37.79/0.961 | 33.31/0.914 | 32.10/0.899 | 31.53/0.916 |
| 2× | DCN-L2(2×) | 37.48/0.958 | 32.99/0.910 | 31.94/0.894 | 31.04/0.913 |
| 2× | DCN(4×) | 37.78/0.964 | 33.33/0.908 | 32.34/0.897 | 31.44/0.918 |
| 2× | DCN-L2(4×) | 37.51/0.956 | 32.89/0.913 | 32.16/0.896 | 31.28/0.914 |
| 3× | Bicubic | 30.39/0.868 | 27.55/0.774 | 27.21/0.739 | 24.46/0.735 |
| 3× | A+[23] | 32.58/0.909 | 29.13/0.819 | 28.29/0.784 | 26.03/0.797 |
| 3× | SRCNN[8] | 32.75/0.909 | 29.28/0.821 | 28.41/0.786 | 26.24/0.799 |
| 3× | FSRCNN[14] | 32.63/0.909 | 29.43/0.824 | 28.60/0.814 | 26.86/0.818 |
| 3× | DRCN[13] | 33.82/0.923 | 29.76/0.831 | 28.80/0.796 | 27.15/0.828 |
| 3× | VDSR[11] | 33.66/0.921 | 29.77/0.831 | 28.82/0.798 | 27.14/0.828 |
| 3× | DRRN[12] | 34.03/0.924 | 29.96/0.835 | 28.95/0.800 | 27.53/0.838 |
| 3× | DCN(4×) | 34.06/0.928 | 30.02/0.833 | 29.03/0.813 | 27.61/0.840 |
| 3× | DCN-L2(4×) | 34.03/0.923 | 29.99/0.831 | 28.95/0.796 | 27.59/0.831 |
| 4× | Bicubic | 28.42/0.810 | 26.10/0.704 | 25.96/0.669 | 23.15/0.659 |
| 4× | A+[23] | 30.30/0.859 | 27.43/0.752 | 26.82/0.710 | 24.34/0.720 |
| 4× | SRCNN[8] | 30.49/0.862 | 27.61/0.754 | 26.91/0.712 | 24.53/0.724 |
| 4× | FSRCNN[14] | 30.71/0.865 | 27.70/0.756 | 26.97/0.714 | 24.61/0.727 |
| 4× | DRCN[13] | 31.53/0.884 | 28.04/0.770 | 27.24/0.724 | 25.14/0.752 |
| 4× | VDSR[11] | 31.35/0.882 | 28.03/0.770 | 27.29/0.726 | 25.18/0.753 |
| 4× | DRRN[12] | 31.68/0.889 | 28.21/0.772 | 27.38/0.728 | 25.44/0.764 |
| 4× | DCN(4×) | 31.71/0.891 | 28.31/0.774 | 27.43/0.732 | 25.61/0.758 |
| 4× | DCN-L2(4×) | 31.69/0.884 | 28.26/0.770 | 27.38/0.726 | 25.44/0.758 |
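The PSNR values in Table 2 follow the standard definition PSNR = 10·log10(MAX²/MSE). A minimal pure-Python sketch, assuming 8-bit images given as flat lists of pixel values:

```python
import math

def psnr(sr, hr, max_val=255.0):
    # Peak signal-to-noise ratio in dB between two equal-size images.
    mse = sum((s - h) ** 2 for s, h in zip(sr, hr)) / len(hr)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_val ** 2 / mse)

# A uniform error of 1 gray level gives MSE = 1,
# so PSNR = 20 * log10(255) ~ 48.13 dB.
hr = [100, 120, 140, 160]
sr = [101, 121, 141, 161]
print(round(psnr(sr, hr), 2))  # 48.13
```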
[1] Schulter S, Leistner C, Bischof H. Fast and accurate image upscaling with super-resolution forests[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition, 2015: 3791-3799.
[2] Bevilacqua M, Roumy A, Guillemot C, et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding[C]//British Machine Vision Conference, 2012.
[3] Chang H, Yeung D Y, Xiong Y M. Super-resolution through neighbor embedding[C]//Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004: I.
[4] Timofte R, De V, Van Gool L. Anchored neighborhood regression for fast example-based super-resolution[C]//IEEE International Conference on Computer Vision, 2013: 1920-1927.
[5] Wu C Z, Hu C S, Zhang M J, et al. Single image super-resolution reconstruction via supervised multi-dictionary learning[J]. Opto-Electronic Engineering, 2016, 43(11): 69-75. doi: 10.3969/j.issn.1003-501X.2016.11.011
[6] Zhan S, Fang Q. Image super-resolution based on edge-enhancement and multi-dictionary learning[J]. Opto-Electronic Engineering, 2016, 43(4): 40-47. http://www.cnki.com.cn/Article/CJFDTotal-GDGC201604008.htm
[7] Wang R G, Wang Q H, Yang J, et al. Image super-resolution reconstruction by fusing feature classification and independent dictionary training[J]. Opto-Electronic Engineering, 2018, 45(1): 170542. doi: 10.12086/oee.2018.170542 http://www.oejournal.org/J/OEE/Article/Details/A180213000010/CN
[8] Dong C, Loy C C, He K M, et al. Learning a deep convolutional network for image super-resolution[C]//Computer Vision ECCV 2014. Springer International Publishing, 2014: 184-199.
[9] Dong C, Loy C C, He K M, et al. Image super-resolution using deep convolutional networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(2): 295-307. doi: 10.1109/TPAMI.2015.2439281
[10] Shi W, Caballero J, Huszar F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]// Computer Vision and Pattern Recognition. IEEE, 2016: 1874-1883.
[11] Kim J, Lee J K, Lee K M. Accurate image super-resolution using very deep convolutional networks[C]//Computer Vision and Pattern Recognition. IEEE, 2016: 1646-1654.
[12] Tai Y, Yang J, Liu X M. Image super-resolution via deep recursive residual network[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2790-2798.
[13] Kim J, Lee J K, Lee K M. Deeply-recursive convolutional network for image super-resolution[C]//Computer Vision and Pattern Recognition. IEEE, 2016: 1637-1645.
[14] Dong C, Loy C C, Tang X O. Accelerating the super-resolution convolutional neural network[C]//Computer Vision ECCV, 2016: 391-407.
[15] Johnson J, Alahi A, Li F F. Perceptual losses for real-time style transfer and super-resolution[C]//Computer Vision-ECCV 2016, 2016, 9906: 694-711.
[16] Wang L, Guo S, Huang W, et al. Places205-VGGNet models for scene recognition[EB/OL]. https://arxiv.org/abs/1508.01667.
[17] Yang J, Wright J, Huang T S, et al. Image super-resolution via sparse representation[J]. IEEE Transactions on Image Processing, 2010, 19(11): 2861-2873. doi: 10.1109/TIP.2010.2050625
[18] Martin D, Fowlkes C, Tal D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics[C]//Proceedings 8th IEEE International Conference on Computer Vision, 2001, 2: 416-423.
[19] Zeyde R, Elad M, Protter M. On single image scale-up using sparse-representations[C]//International Conference on Curves and Surfaces, 2010, 6920: 711-730.
[20] Huang J B, Singh A, Ahuja N. Single image super-resolution from transformed self-exemplars[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition, 2015: 5197-5206.
[21] Martin D, Fowlkes C, Tal D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics[C]//Proceedings 8th IEEE International Conference on Computer Vision, 2001, 2: 416-423.
[22] Jia Y Q, Shelhamer E. Caffe: Convolutional architecture for fast feature embedding[EB/OL]. https://arxiv.org/abs/1408.5093.
[23] Timofte R, De Smet V, Van Gool L. A+: Adjusted anchored neighborhood regression for fast super-resolution[C]// Cremers D, Reid I, Saito H, et al. Computer Vision--ACCV 2014, 2014, 9006: 111-126.
[24] Wang Z, Bovik A C, Sheikh H R, et al. Image quality assessment: From error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612. doi: 10.1109/TIP.2003.819861