Object detection for small pixel in urban roads videos

Jin Yao; Zhang Rui; Yin Dong

doi:10.12086/oee.2019.190053

- 1.
  College of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
- 2.
  Key Laboratory of Electromagnetic Space Information, Chinese Academy of Sciences, Hefei, Anhui 230027, China
Fund Project: Supported by 2018 Anhui Key Research and Development Plan Project (1804a09020049)

About the author: Jin Yao (b. 1995), female, master's student, works mainly on computer vision. E-mail: joye@mail.ustc.edu.cn

Corresponding author: Yin Dong (b. 1965), male, associate professor, works mainly on image processing. E-mail: yindong@ustc.edu.cn

CLC number: TB872; TP391.4

Received Date: 30 January 2019

Revised Date: 08 April 2019

Published Date: 30 September 2019

Abstract

Abstract:
Small-pixel targets in video images are difficult to detect. For small-pixel targets in urban road videos, this paper proposes Road_Net, a detection method built on an improved YOLOv3 convolutional neural network. First, a new convolutional neural network, Road_Net, is designed on the basis of the improved YOLOv3. Second, because small-pixel target detection depends more heavily on shallow-level features, detection is performed at 4 scales. Finally, an improved M-Softer-NMS algorithm is applied to further raise the detection accuracy of targets in the image. To verify the effectiveness of the proposed algorithm, this paper collects and labels the Road-garbage Dataset for small-pixel object detection on urban roads. The experimental results show that the algorithm can effectively detect objects such as paper scraps and stones, which occupy few pixels relative to the road surface in the video.
- video image /
- small pixel object /
- convolutional neural network

Overview

Overview: Small-pixel target detection is a difficult problem. Existing object detection benchmarks and methods mainly focus on standard detection tasks and do not perform well on objects with a low pixel ratio, that is, objects that occupy only a few pixels in a high-resolution image. Early target detection frameworks such as R-CNN and the YOLO series are likewise not well suited to small-pixel target detection. To address this problem, this paper proposes an improved YOLOv3 network together with an M-Softer-NMS algorithm to improve the detection of small targets. First, the Road_Net convolutional neural network is proposed. YOLOv3's Darknet53 network is too complicated and redundant: its large number of parameters makes training harder, raises the requirements on the dataset, and lowers detection speed, which hurts real-time performance. Both accuracy and real-time performance are challenging in small object detection on urban roads, so Road_Net, a convolutional neural network with relatively low computational complexity, is proposed as the feature extraction network. Second, detection at 4 scales is used to exploit shallow-level features more fully. Because the targets in this setting are mostly small-pixel targets, the original three detection scales are extended to four, and the larger feature maps, with more accurate anchor boxes, are assigned to the smaller-pixel targets. Finally, the M-Softer-NMS algorithm is used to further improve the detection accuracy of targets in the image. Softer-NMS is a further improvement on Soft-NMS: it introduces a new loss function for bounding box regression (KL Loss) that learns the bounding box transformation and the localization confidence at the same time. Based on Softer-NMS and the characteristics of the small-pixel targets considered here, the M-Softer-NMS algorithm is proposed.
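
To make the step from three to four detection scales concrete, the sketch below computes the detection grid size at each scale. The 416x416 input and the strides 32, 16, and 8 follow common YOLOv3 practice; the extra stride of 4 for the fourth scale is an assumption made only for illustration, since this page does not state Road_Net's actual strides or input size.

```python
# Hypothetical illustration: grid sizes for three vs. four detection scales.
# The input size (416) and strides (32, 16, 8, plus an assumed extra stride
# of 4) are assumptions based on common YOLOv3 practice, not on the paper.

def grid_sizes(input_size, strides):
    """Return the side length of the detection grid at each stride."""
    return [input_size // s for s in strides]


def pixels_per_cell(stride):
    """Each grid cell covers a stride x stride patch of the input image."""
    return stride * stride


three_scale = grid_sizes(416, (32, 16, 8))
four_scale = grid_sizes(416, (32, 16, 8, 4))

if __name__ == "__main__":
    print("three scales:", three_scale)   # [13, 26, 52]
    print("four scales: ", four_scale)    # [13, 26, 52, 104]
    print("input pixels per cell at the finest grid:", pixels_per_cell(4))
```

Under these assumptions the added scale produces a 104x104 grid whose cells each cover a 4x4 pixel patch, which is why a larger feature map is the natural place to assign targets that span only a few pixels.
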
To verify the effectiveness of the algorithm, we collected and labeled the Road-garbage Dataset for detecting small-pixel objects on the road. The dataset is based on several main roads in a certain city and selects 1200 different main roads in different regions. The experimental results show that precision, recall, and AP reach 95.29%, 91.12%, and 82.41%, respectively, at a detection speed of 57.9 f/s. In future work, we will continue to improve the network and optimize the algorithm for higher accuracy and lower time cost, and will keep capturing more realistic scene images to expand the dataset for better application.

Figure 1. Predicting target box position
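
The figure itself is not reproduced on this page. Since Road_Net is described as an improved YOLOv3, the sketch below shows the standard YOLOv3 box parameterization as a plausible reading of what Figure 1 depicts. The symbol names (t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h) follow the YOLOv3 convention, and the function name `decode_box` and the `stride` argument are hypothetical; none of them are taken from the figure.

```python
import math


def sigmoid(x):
    """Logistic function, which keeps the predicted center inside its grid cell."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h, stride):
    """Decode raw network offsets into (center_x, center_y, width, height) in pixels.

    (c_x, c_y) is the grid cell index, (p_w, p_h) is the anchor size in pixels,
    and stride is the number of input pixels per grid cell (all assumptions).
    """
    b_x = (sigmoid(t_x) + c_x) * stride   # center x: cell offset plus cell index
    b_y = (sigmoid(t_y) + c_y) * stride   # center y
    b_w = p_w * math.exp(t_w)             # width: anchor width scaled exponentially
    b_h = p_h * math.exp(t_h)             # height
    return b_x, b_y, b_w, b_h


if __name__ == "__main__":
    # Zero offsets place the box at the center of cell (3, 5) with the anchor's size.
    print(decode_box(0.0, 0.0, 0.0, 0.0, 3, 5, 10.0, 14.0, 8))
```
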

Figure 2. Road_Net network architecture diagram

Figure 3. Multi-scale detection

Figure 4. Testing images and detection results

Figure 5. Examples for anomaly detection

Algorithm 1: M-Softer-NMS

Input: B = {b₁, ..., b_N}, S = {s₁, ..., s_N}, C = {σ₁², ..., σ_N²}, N_t
Output: D, S
Begin:
	D ← {}	// initialize D
	while B ≠ empty do
		m ← argmax S	// index of the highest score
		M ← b_m	// detection box with the highest score
		D ← D ∪ M	// update D
		B ← B − M	// update B
		for b_i in B do
			idx ← IOU(M, B) ≥ N_t	// indices whose IOU with M is at least the threshold N_t
			M ← sum(B[idx]/C[idx]) / sum(1/C[idx])	// new detection box: sum weighted by the inverse variance
		end for
	end while
	return D, S	// return the detection boxes and their scores
end
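
The inverse-variance weighting in Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions that the listing does not settle: boxes are (x1, y1, x2, y2) tuples, each box has a single scalar variance, the highest-scoring box takes part in its own vote, the refined box (rather than the original one) is stored in the output, and every box that took part in a vote is then discarded. The function names `iou` and `variance_voting_nms` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def variance_voting_nms(boxes, scores, variances, n_t):
    """Greedy NMS where each kept box is the inverse-variance weighted mean
    of the top-scoring box and the boxes overlapping it with IoU >= n_t."""
    remaining = list(range(len(boxes)))
    kept_boxes, kept_scores = [], []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])  # highest-scoring box
        # Assumption: box m always votes for itself.
        idx = [i for i in remaining
               if i == m or iou(boxes[m], boxes[i]) >= n_t]
        weight_sum = sum(1.0 / variances[i] for i in idx)
        merged = tuple(
            sum(boxes[i][k] / variances[i] for i in idx) / weight_sum
            for k in range(4)
        )
        kept_boxes.append(merged)
        kept_scores.append(scores[m])
        # Assumption: boxes that took part in the vote are discarded.
        remaining = [i for i in remaining if i not in idx]
    return kept_boxes, kept_scores


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    variances = [1.0, 1.0, 2.0]
    print(variance_voting_nms(boxes, scores, variances, 0.5))
```

In the usage example the first two boxes overlap with an IoU of about 0.68 and have equal variances, so they merge into their plain average, while the distant third box is kept unchanged. A box with a smaller variance would pull the merged coordinates toward itself.
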

Table 1. Performance comparison of five algorithms

Method	P/%	R/%	AP/%	Speed/(f/s)
Faster R-CNN	89.63	70.5	70.65	21.6
YOLOv2	86.45	63.18	71.53	44.2
YOLOv3	92.56	78.5	75.64	33.2
Road_Net	94.18	85.71	79.97	58.7
Road_Net +M-Softer-NMS	95.29	91.12	82.41	57.9
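
As a reading aid, the short sketch below works out the gains of the full method over the YOLOv3 baseline directly from the Table 1 values. The function name `gains` and the choice of YOLOv3 as the baseline are illustrative assumptions; the numbers are copied from the table.

```python
# Values copied from Table 1: (P/%, R/%, AP/%, speed in f/s).
results = {
    "YOLOv3": (92.56, 78.5, 75.64, 33.2),
    "Road_Net + M-Softer-NMS": (95.29, 91.12, 82.41, 57.9),
}


def gains(baseline, proposed):
    """Percentage-point gains in P, R, and AP, plus the speed ratio,
    each rounded to two decimals."""
    p0, r0, ap0, v0 = baseline
    p1, r1, ap1, v1 = proposed
    return (
        round(p1 - p0, 2),
        round(r1 - r0, 2),
        round(ap1 - ap0, 2),
        round(v1 / v0, 2),
    )


if __name__ == "__main__":
    # Prints (2.73, 12.62, 6.77, 1.74)
    print(gains(results["YOLOv3"], results["Road_Net + M-Softer-NMS"]))
```

That is, relative to YOLOv3 the full method gains 2.73 points in precision, 12.62 points in recall, and 6.77 points in AP, while running about 1.74 times as fast.
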
