一种深度级联网络结构的单帧超分辨重建算法

王飞, 王伟, 邱智亮. 一种深度级联网络结构的单帧超分辨重建算法[J]. 光电工程, 2018, 45(7): 170729. doi: 10.12086/oee.2018.170729
Wang Fei, Wang Wei, Qiu Zhiliang. A single super-resolution method via deep cascade network[J]. Opto-Electronic Engineering, 2018, 45(7): 170729. doi: 10.12086/oee.2018.170729


Article information
  • Corresponding author: Wang Fei, E-mail: 290727048@qq.com
  • CLC number: TP391.41; TP18

A single super-resolution method via deep cascade network

  • Abstract: Deep learning has achieved great success in super-resolution reconstruction, but most current network architectures still suffer from slow training and reconstruction, from single-scale models that can reconstruct only one scale each, and from overly smooth output images. To address these problems, this paper designs a cascaded network structure (DCN) that reconstructs the image stage by stage. The network is optimized jointly with an L2 loss and a perceptual loss, and the stages together produce the final high-quality reconstruction. Moreover, one model can reconstruct multiple scales; for example, the 4× model can reconstruct 1.5×, 2×, 2.5×, 3×, 3.5× and 4×. Experiments on several widely used datasets show that the proposed method outperforms existing methods in both accuracy and visual quality.

  • Overview: Recovering a high-resolution (HR) image from its low-resolution (LR) counterpart is an important problem in digital image processing and other vision tasks. Recently, Dong et al. showed that a convolutional neural network (CNN) can learn an end-to-end mapping from LR to HR. This network has since been extended into many forms, using sub-pixel convolution, very deep convolutional networks, and recursive residual networks. Although these models achieve good results, several problems remain. First, most methods use an up-sampling operator, such as bicubic interpolation, to enlarge the input image before reconstruction. This pre-processing adds considerable unnecessary computation and often produces visible reconstruction artifacts. Algorithms such as ESPCN (sub-pixel convolution) and FSRCNN (transposed convolution) avoid this step, but their network structures are too simple to learn complex, detailed mappings. Second, most existing methods optimize only an L2 loss, which yields excessively smooth images that are less suited to human vision. Third, these methods reconstruct only one scale per model, so every additional scale requires extra training work, especially for large scale factors.

    To address these defects, we propose a deep cascaded network (DCN). DCN has a cascade structure: it takes an LR image as input and predicts a residual image at each scale. The predicted residual at each scale is used to reconstruct the HR image efficiently through up-sampling and addition. We train the DCN with an L2 loss and a perceptual loss to obtain robust reconstructions.
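The per-cascade reconstruction described above (up-sample the current estimate, then add a predicted residual) can be sketched as follows. This is a minimal illustration rather than the paper's implementation: nearest-neighbour up-sampling stands in for the learned transposed convolution, and `predict_residual` stands in for the convolutional layers of each cascade.

```python
import numpy as np

def upsample_nearest(img, out_h, out_w):
    """Nearest-neighbour up-sampling to a target size (a stand-in for the
    learned transposed-convolution layer of each cascade)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[np.ix_(rows, cols)]

def cascade_reconstruct(lr, sizes, predict_residual):
    """Progressive reconstruction: at every cascade the current estimate is
    up-sampled to the next intermediate size and a residual is added."""
    sr = lr
    outputs = []
    for out_h, out_w in sizes:
        sr = upsample_nearest(sr, out_h, out_w)
        sr = sr + predict_residual(sr)  # residual predicted by the conv layers
        outputs.append(sr)
    return outputs  # one intermediate SR prediction per scale
```

Each intermediate output is itself a usable SR prediction at its own scale, which is how a single 4× model can also serve the smaller scale factors.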

    Our approach differs from existing CNN-based methods in the following aspects:

    1) Multiple scales with cascade layers. Our network has a cascade structure and generates multiple intermediate SR predictions in the feed-forward pass. This progressive reconstruction yields more accurate results; our 4× model can produce 1.5×, 2×, 2.5×, 3×, 3.5× and 4× reconstructions.

    2) Optimization with both L2 and a perceptual loss. The L2 term gives more accurate pixel-level reconstruction, while the perceptual term produces results closer to human visual perception.

    3) Feature extraction on the LR image. Our method does not require traditional interpolation to up-sample the input as pre-processing, which greatly reduces computational complexity.

    Extensive experiments on several large benchmark datasets show that the proposed approach outperforms existing methods in both accuracy and visual quality.
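The joint objective of aspect 2) above can be sketched as a weighted sum of a pixel-space L2 term and a feature-space term. In the paper the perceptual term compares deep feature maps (e.g. from a VGG network); here a simple image-gradient operator is a hypothetical stand-in for that feature extractor, and the weight `alpha` is an assumed value, not taken from the paper.

```python
import numpy as np

def l2_loss(sr, hr):
    """Pixel-wise mean squared error (the L2 term)."""
    return np.mean((sr - hr) ** 2)

def features(img):
    """Hypothetical stand-in for VGG feature maps: horizontal and
    vertical image gradients."""
    gx = img[:, 1:] - img[:, :-1]
    gy = img[1:, :] - img[:-1, :]
    return gx, gy

def perceptual_loss(sr, hr):
    """Mean squared error computed in feature space rather than pixel space."""
    fs, fh = features(sr), features(hr)
    return sum(np.mean((a - b) ** 2) for a, b in zip(fs, fh))

def total_loss(sr, hr, alpha=0.1):
    # alpha weights the perceptual term; its value here is an assumption
    return l2_loss(sr, hr) + alpha * perceptual_loss(sr, hr)
```

The L2 term anchors pixel accuracy, while the feature-space term penalizes structural differences (edges in this toy version, textures with real VGG features); the latter is what counteracts the over-smoothing of a pure L2 objective.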

  • Figure 1.  Deep cascaded network architecture. The light blue part is feature extraction and the light green part is image reconstruction; each cascade contains d convolution layers (conv) and one transposed-convolution layer (deconv) for up-sampling

    Figure 2.  Effect of different network parameters on reconstruction quality and running time (averaged over the Set5 dataset)

    Figure 3.  "Baby" from Set5 with an up-scaling factor of 4. DCN reconstructs the eyelashes slightly better than the other methods, and DCN-L2 is lower than DCN (optimized with L2 and perceptual loss) on both PSNR and SSIM

    Figure 4.  "img-092" from URBAN100, 3× reconstruction with the 4× model. Only DCN-L2 and DCN correctly recover the sharp lines; their textures above the tenth floor are closer to the ground truth, and DCN (optimized with L2 and perceptual loss) outperforms the other methods both visually and in PSNR/SSIM

    Figure 5.  4× reconstruction of a single frame from a surveillance video. Visually, the rear window of the vehicle reconstructed by our DCN looks more realistic

    Figure 6.  Comparison between our 4× model and VDSR at scales 1.5×, 2×, 2.5×, 3×, 3.5× and 4×. The first row shows each scale reconstructed by the 4× VDSR model; the second row shows the same scales reconstructed by our 4× model

    Table 1.  Convolution parameters for each cascade. In the conv column, d < 6?5×5:3×3 means the kernel size is 5×5 for layers with index d < 6 and 3×3 otherwise; in the padding column, d < 4?0:1 means no zero-padding for layers with d < 4 and one-pixel zero-padding otherwise. Size is the resolution of the output image after each cascade

    Cascade  Size  Conv           Deconv  Conv stride  Deconv stride  Padding
    1.5×     38    3×3            3×3     1            2              d < 4?0:1
    2×       52    3×3            3×3     1            2              d < 7?0:1
    2.5×     64    3×3            3×3     1            2              d < 11?0:1
    3×       76    d < 6?5×5:3×3  3×3     1            2              d < 9?0:1
    3.5×     88    d < 7?5×5:3×3  3×3     1            2              d < 11?0:1
    4×       100   d < 8?5×5:3×3  3×3     1            2              0
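The conditional notation in Table 1 reads as an ordinary ternary expression over the layer index d. A small sketch, using the d < 6 kernel threshold and the d < 4 padding threshold that appear in the table (each cascade has its own thresholds):

```python
def kernel_size(d, threshold=6):
    """'d < 6?5×5:3×3': layers with index below the threshold use a
    5×5 kernel, deeper layers use 3×3."""
    return (5, 5) if d < threshold else (3, 3)

def zero_padding(d, threshold=4):
    """'d < 4?0:1': no zero-padding below the threshold, one pixel after."""
    return 0 if d < threshold else 1
```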

    Table 2.  Quantitative comparison with state-of-the-art SR algorithms on several datasets: average PSNR/SSIM for scale factors 2×, 3× and 4×

    Scale  Method       SET5         SET14        BSD100       URBAN100
                        PSNR/SSIM    PSNR/SSIM    PSNR/SSIM    PSNR/SSIM
    2×     Bicubic      33.65/0.930  30.34/0.870  29.56/0.844  26.88/0.841
           A+[16]       36.54/0.964  32.40/0.906  31.22/0.887  29.23/0.894
           SRCNN[8]     36.65/0.954  32.29/0.903  31.36/0.888  29.52/0.895
           FSRCNN[17]   36.99/0.955  32.73/0.909  31.51/0.891  29.87/0.901
           DRCN[12]     37.63/0.959  32.98/0.913  31.85/0.894  30.76/0.913
           VDSR[10]     37.53/0.958  32.97/0.913  31.90/0.896  30.77/0.914
           DRRN[11]     37.74/0.959  33.23/0.913  32.05/0.897  31.23/0.918
           DCN(2×)      37.79/0.961  33.31/0.914  32.10/0.899  31.53/0.916
           DCN-L2(2×)   37.48/0.958  32.99/0.910  31.94/0.894  31.04/0.913
           DCN(4×)      37.78/0.964  33.33/0.908  32.34/0.897  31.44/0.918
           DCN-L2(4×)   37.51/0.956  32.89/0.913  32.16/0.896  31.28/0.914
    3×     Bicubic      30.39/0.868  27.55/0.774  27.21/0.739  24.46/0.735
           A+[16]       32.58/0.909  29.13/0.819  28.29/0.784  26.03/0.797
           SRCNN[8]     32.75/0.909  29.28/0.821  28.41/0.786  26.24/0.799
           FSRCNN[17]   32.63/0.909  29.43/0.824  28.60/0.814  26.86/0.818
           DRCN[12]     33.82/0.923  29.76/0.831  28.80/0.796  27.15/0.828
           VDSR[10]     33.66/0.921  29.77/0.831  28.82/0.798  27.14/0.828
           DRRN[11]     34.03/0.924  29.96/0.835  28.95/0.800  27.53/0.838
           DCN(4×)      34.06/0.928  30.02/0.833  29.03/0.813  27.61/0.840
           DCN-L2(4×)   34.03/0.923  29.99/0.831  28.95/0.796  27.59/0.831
    4×     Bicubic      28.42/0.810  26.10/0.704  25.96/0.669  23.15/0.659
           A+[16]       30.30/0.859  27.43/0.752  26.82/0.710  24.34/0.720
           SRCNN[8]     30.49/0.862  27.61/0.754  26.91/0.712  24.53/0.724
           FSRCNN[17]   30.71/0.865  27.70/0.756  26.97/0.714  24.61/0.727
           DRCN[12]     31.53/0.884  28.04/0.770  27.24/0.724  25.14/0.752
           VDSR[10]     31.35/0.882  28.03/0.770  27.29/0.726  25.18/0.753
           DRRN[11]     31.68/0.889  28.21/0.772  27.38/0.728  25.44/0.764
           DCN(4×)      31.71/0.891  28.31/0.774  27.43/0.732  25.61/0.758
           DCN-L2(4×)   31.69/0.884  28.26/0.770  27.38/0.726  25.44/0.758
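The PSNR figures in Table 2 follow the standard definition, 10·log10(peak²/MSE); a minimal sketch is given below (SSIM, defined in [24], is omitted):

```python
import numpy as np

def psnr(sr, hr, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reconstruction and the
    ground truth, assuming 8-bit images (peak value 255)."""
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```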
  • [1] Schulter S, Leistner C, Bischof H. Fast and accurate image upscaling with super-resolution forests[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition, 2015: 3791-3799.
    [2] Bevilacqua M, Roumy A, Guillemot C, et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding[C]//British Machine Vision Conference, 2012.
    [3] Chang H, Yeung D Y, Xiong Y M. Super-resolution through neighbor embedding[C]//Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004: I.
    [4] Timofte R, De V, Van Gool L. Anchored neighborhood regression for fast example-based super-resolution[C]//IEEE International Conference on Computer Vision, 2013: 1920-1927.
    [5] Wu C Z, Hu C S, Zhang M J, et al. Single image super-resolution reconstruction via supervised multi-dictionary learning[J]. Opto-Electronic Engineering, 2016, 43(11): 69-75. doi: 10.3969/j.issn.1003-501X.2016.11.011
    [6] Zhan S, Fang Q. Image super-resolution based on edge-enhancement and multi-dictionary learning[J]. Opto-Electronic Engineering, 2016, 43(4): 40-47. http://www.cnki.com.cn/Article/CJFDTotal-GDGC201604008.htm
    [7] Wang R G, Wang Q H, Yang J, et al. Image super-resolution reconstruction by fusing feature classification and independent dictionary training[J]. Opto-Electronic Engineering, 2018, 45(1): 170542. doi: 10.12086/oee.2018.170542
    [8] Dong C, Loy C C, He K M, et al. Learning a deep convolutional network for image super-resolution[C]//Computer Vision - ECCV 2014, 2014: 184-199.
    [9] Dong C, Loy C C, He K M, et al. Image super-resolution using deep convolutional networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(2): 295-307. doi: 10.1109/TPAMI.2015.2439281
    [10] Shi W, Caballero J, Huszar F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2016: 1874-1883.
    [11] Kim J, Lee J K, Lee K M. Accurate image super-resolution using very deep convolutional networks[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2016: 1646-1654.
    [12] Tai Y, Yang J, Liu X M. Image super-resolution via deep recursive residual network[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2790-2798.
    [13] Kim J, Lee J K, Lee K M. Deeply-recursive convolutional network for image super-resolution[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2016: 1637-1645.
    [14] Dong C, Loy C C, Tang X O. Accelerating the super-resolution convolutional neural network[C]//Computer Vision - ECCV 2016, 2016: 391-407.
    [15] Johnson J, Alahi A, Li F F. Perceptual losses for real-time style transfer and super-resolution[C]//Computer Vision - ECCV 2016, 2016, 9906: 694-711.
    [16] Wang L, Guo S, Huang W, et al. Places205-VGGNet models for scene recognition[EB/OL]. https://arxiv.org/abs/1508.01667.
    [17] Yang J, Wright J, Huang T S, et al. Image super-resolution via sparse representation[J]. IEEE Transactions on Image Processing, 2010, 19(11): 2861-2873. doi: 10.1109/TIP.2010.2050625
    [18] Martin D, Fowlkes C, Tal D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics[C]//Proceedings 8th IEEE International Conference on Computer Vision, 2001, 2: 416-423.
    [19] Zeyde R, Elad M, Protter M. On single image scale-up using sparse-representations[C]//International Conference on Curves and Surfaces, 2010, 6920: 711-730.
    [20] Huang J B, Singh A, Ahuja N. Single image super-resolution from transformed self-exemplars[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition, 2015: 5197-5206.
    [21] Martin D, Fowlkes C, Tal D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics[C]//Proceedings 8th IEEE International Conference on Computer Vision, 2001, 2: 416-423.
    [22] Jia Y Q, Shelhamer E. Caffe: convolutional architecture for fast feature embedding[EB/OL]. https://arxiv.org/abs/1408.5093.
    [23] Timofte R, De Smet V, Van Gool L. A+: adjusted anchored neighborhood regression for fast super-resolution[C]//Computer Vision - ACCV 2014, 2014, 9006: 111-126.
    [24] Wang Z, Bovik A C, Sheikh H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612. doi: 10.1109/TIP.2003.819861

Publication history
  • Received: 2017-10-30
  • Revised: 2018-04-11
  • Published: 2018-07-01
