Citation: |
|
[1] | Hou Y L, Song Y Y, Hao X L, et al. Multispectral pedestrian detection based on deep convolutional neural networks[C]//2017 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), 2018. |
[2] | 朱大炜.基于深度学习的红外图像飞机目标检测方法[D].西安: 西安电子科技大学, 2018. Zhu D W. Infrared image plane target detection method based on deep learning[D]. Xi'an: Xidian University, 2018. |
[3] | Herrmann C, Ruf M, Beyerer J. CNN-based thermal infrared person detection by domain adaptation[J]. Proceedings of SPIE, 2018, 10643: 1064308. |
[4] | 侯志强, 刘晓义, 余旺盛, 等.基于双阈值-非极大值抑制的Faster R-CNN改进算法[J].光电工程, 2019, 46(12): 190159. doi: 10.12086/oee.2019.190159 Hou Z Q, Liu X Y, Yu W S, et al. Improved algorithm of faster R-CNN based on double threshold-non-maximum suppression[J]. Opto-Electronic Engineering, 2019, 46(12): 190159. doi: 10.12086/oee.2019.190159 |
[5] | Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the 14th European Conference on Computer Vision, 2016, 9905: 21–37. |
[6] | Fu C Y, Liu W, Ranga A, et al. DSSD: deconvolutional single shot detector[Z]. arXiv: 1701.06659[cs.CV], 2017. |
[7] | Redmon J, Farhadi A. YOLOv3: an incremental improvement[Z]. arXiv: 1804.02767[cs.CV], 2018. |
[8] | Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: optimal speed and accuracy of object detection[Z]. arXiv: 2004.10934[cs.CV], 2020. |
[9] | 金瑶, 张锐, 尹东.城市道路视频中小像素目标检测[J].光电工程, 2019, 46(9): 190053. doi: 10.12086/oee.2019.190053 Jin Y, Zhang R, Yin D. Object detection for small pixel in urban roads videos[J]. Opto-Electronic Engineering, 2019, 46(9): 190053. doi: 10.12086/oee.2019.190053 |
[10] | Li Z M, Peng C, Yu G, et al. DetNet: a backbone network for object detection[Z]. arXiv: 1804.06215[cs.CV], 2018. |
[11] | Liu S T, Huang D, Wang Y H. Receptive field block net for accurate and fast object detection[Z]. arXiv: 1711.07767[cs.CV], 2017. |
[12] | 赵春梅, 陈忠碧, 张建林.基于深度学习的飞机目标跟踪应用研究[J].光电工程, 2019, 46(9): 180261. doi: 10.12086/oee.2019.180261 Zhao C M, Chen Z B, Zhang J L. Application of aircraft target tracking based on deep learning[J]. Opto-Electronic Engineering, 2019, 46(9): 180261. doi: 10.12086/oee.2019.180261 |
[13] | 石超, 陈恩庆, 齐林.红外视频中的舰船检测[J].光电工程, 2018, 45(6): 170748. doi: 10.12086/oee.2018.170748 Shi C, Chen E Q, Qi L. Ship detection from infrared video[J]. Opto-Electronic Engineering, 2018, 45(6): 170748. doi: 10.12086/oee.2018.170748 |
[14] | Yu J H, Jiang Y N, Wang Z Y, et al. UnitBox: An advanced object detection network[C]//Proceedings of the 24th ACM International Conference on Multimedia, 2016. |
[15] | Rezatofighi H, Tsoi N, Gwak J Y, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. |
Overview: In recent years, with the continuous development of computer vision, the ability of target detection based on deep learning has been significantly improved. However, most of the images used by mainstream target detection networks are RGB images, and there are few studies on the direction of infrared target detection. Moreover, the mainstream target detection network has a prominent target detection capability in high quality RGB images, but the target detection performance in infrared images with poor resolution is significantly reduced. Compared with infrared images, visible images have higher imaging resolution and rich target detail information. However, under certain weather conditions, the visible images cannot be obtained. Infrared imaging technology has the characteristics of long range, strong anti-interference ability, high measurement accuracy, not affected by weather, able to work day and night, and strong ability to penetrate smoke. Therefore, infrared imaging technology has been widely used once it was proposed. The demand for infrared target detection is also urgent.
In order to improve the performance of infrared target detection in complex scenes, the following measures are adopted in this paper: First, referring to the field adaptive method, appropriate infrared image preprocessing means are adopted to make the infrared image closer to the RGB image, so as to further improve the detection accuracy by applying the mainstream target detection network. Secondly, mean square error (MSE), a loss function, regards the coordinate value of each point of BBox as an independent variable, which does not consider the integrity of the target frame, and ln-norms is sensitive to the scale of the object, so the algorithm is based on the single-stage target detection network YOLOv3 and replaces the original MSE loss function with GIOU loss function. It is verified by experiments that the detection accuracy on FLIR, an open infrared data set, is significantly improved, and the problem of inaccurate location in the original network is effectively improved. Thirdly, in view of the problem of large span of target size in the FLIR data set, the SPP module is added to enrich the expression ability of feature map and expand the receptive field of feature map by referring to the idea of space pyramid. The experimental results show that the network detection error rate decreases after the addition of SPP module, and after overcoming the original deficiency of the YOLOv3, the target accuracy of detection can be further improved compared with the modification of GIOU loss function only.
Different infrared images (per line) from the FLIR dataset. (a) Original image; (b) Inversion; (c) Histogram equalization; (d) Denoising + image sharpening
Modified YOLOv3 network structure
SPP module
Algorithm flow chart
(a) The results of detection speed and accuracy of all categories of different networks; (b) The results of vehicle detection speed and accuracy of different networks; (c) The results of human detection speed and accuracy of different networks; (d) The results of bicycle detection speed and accuracy on different networks
(a) The network detection result of YOLOv3; (b) Ground truth; (c) The YOLOv3 network detection results using GIOU loss function; (d) Detection results for YOLOv3 network that use GIOU loss function and SPP modules