Hou Zhiqiang, Liu Xiaoyi, Yu Wangsheng, et al. Improved algorithm of Faster R-CNN based on double threshold-non-maximum suppression[J]. Opto-Electronic Engineering, 2019, 46(12): 190159. doi: 10.12086/oee.2019.190159

Improved algorithm of Faster R-CNN based on double threshold-non-maximum suppression

    Fund Project: Supported by National Natural Science Foundation of China (61703423, 61473309) and Xi'an University of Posts and Telecommunications Graduate Innovation Fund (CXJJ2017019)
  • To address the problems of missed detection and repeated detection in object detection, this paper proposes an improved Faster R-CNN algorithm based on double threshold-non-maximum suppression. The algorithm first uses a deep convolutional network architecture to extract multi-layer convolutional features of the targets, then applies the proposed double threshold-non-maximum suppression (DT-NMS) algorithm in the RPN (region proposal network) stage to extract deep information from the target candidate regions, and finally replaces the nearest-neighbor interpolation in the original RoI pooling layer with bilinear interpolation, so that the algorithm can locate targets more accurately on the detection datasets. The experimental results show that the DT-NMS algorithm effectively balances the trade-off between single-threshold suppression and missed detection, and reduces the probability of repeated detection. Compared with the soft-NMS algorithm, DT-NMS reduces the repeated detection rate on PASCAL VOC2007 by 2.4% and the rate of repeated detections being assigned to the wrong target by 2%. Compared with the Faster R-CNN algorithm, the proposed algorithm reaches a detection accuracy of 74.7% on PASCAL VOC2007, an improvement of 1.5%, and improves performance on the MS COCO dataset by 1.4%. At the same time, the algorithm maintains a fast detection speed, reaching 16 FPS.
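The abstract does not spell out the exact suppression rule, but the double-threshold idea can be illustrated with a minimal sketch: overlaps above a high threshold are discarded outright (as in classical NMS), overlaps between the two thresholds only receive a score decay (as in soft-NMS), and lower overlaps are left untouched. The threshold values `t_low`, `t_high` and the Gaussian decay `sigma` below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter + 1e-9)

def dt_nms(boxes, scores, t_low=0.3, t_high=0.7, sigma=0.5, min_score=0.001):
    """Double-threshold NMS sketch (illustrative, not the paper's exact rule):
    - IoU >= t_high with a kept box: hard suppression (box discarded);
    - t_low <= IoU < t_high: soft suppression (Gaussian score decay);
    - IoU < t_low: box left untouched."""
    boxes = boxes.astype(float)
    scores = scores.astype(float).copy()
    keep = []
    idx = np.argsort(scores)[::-1]          # process boxes by descending score
    while idx.size:
        i = idx[0]
        keep.append(int(i))
        rest = idx[1:]
        ov = iou(boxes[i], boxes[rest])
        alive = rest[ov < t_high]           # hard suppression above t_high
        ov_alive = ov[ov < t_high]
        decay = np.where(ov_alive >= t_low,
                         np.exp(-ov_alive ** 2 / sigma),  # soft decay in between
                         1.0)
        scores[alive] *= decay
        alive = alive[scores[alive] > min_score]
        idx = alive[np.argsort(scores[alive])[::-1]]
    return keep
```

Compared with single-threshold NMS, a near-duplicate of a kept box is still removed, while a moderately overlapping box (e.g. a genuinely distinct, partially occluded target) survives with a reduced score instead of being deleted.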

  • Overview: The Faster R-CNN algorithm uses non-maximum suppression (NMS) to filter proposals. NMS adopts a hard "one-or-zero" strategy, keeping only the candidate box with the highest classification score, which greatly increases the risk of missed detection when targets overlap heavily. The soft-NMS algorithm addresses this problem with a "weight penalty" strategy, which reduces missed detection to a certain extent. However, experiments show that soft-NMS greatly increases the number of proposals, introducing a new problem: the same target is detected repeatedly, and some of these repeated detections are assigned to the wrong target, especially when an image contains multiple highly overlapping targets. To address the problems of missed detection and repeated detection in object detection, this paper proposes an improved Faster R-CNN algorithm based on double threshold-non-maximum suppression. The algorithm first uses the VGG-Net-16 deep convolutional network architecture to extract multi-layer convolutional features of the targets, then applies the proposed double threshold-non-maximum suppression (DT-NMS) algorithm in the RPN (region proposal network) stage to extract deep information from the target candidate regions, and finally replaces the nearest-neighbor interpolation in the original RoI pooling layer with bilinear interpolation, so that the algorithm can locate targets more accurately on the detection datasets. To highlight the performance of DT-NMS on the repeated detection problem, this paper first proposes two measurement indices: the repeated detection rate and the rate of repeated detections being assigned to the wrong target.
By simply setting the two thresholds in the DT-NMS algorithm, the trade-off between single-threshold suppression and missed detection is effectively balanced, and the probability that the same target is detected multiple times is reduced. The improved Faster R-CNN algorithm re-tunes the network training and parameters on the VGG-Net-16 architecture, and extensive experimental validation has been carried out on the PASCAL VOC datasets. The experimental results show that, compared with the soft-NMS algorithm, the proposed algorithm reduces the repeated detection rate on PASCAL VOC2007 by 2.4% and the rate of repeated detections being assigned to the wrong target by 2%, indicating that the improved algorithm alleviates the missed detection and repeated detection problems of the traditional algorithms. Compared with the Faster R-CNN algorithm, the proposed algorithm reaches a detection accuracy of 74.7% on PASCAL VOC2007, an improvement of 1.5%. At the same time, the algorithm maintains a fast detection speed, reaching 16 FPS.


Figures(7)

Tables(7)
