Opto-Electronic Engineering (光电工程)  2020, Vol. 47, Issue (1): 190161      DOI: 10.12086/oee.2020.190161

An open-pit mine roadway obstacle warning method integrating object detection and a distance threshold model
Lu Caiwu, Qi Fan, Ruan Shunling
School of Management, Xi'an University of Architecture and Technology, Xi'an, Shaanxi 710055, China
Abstract: To address the problem that current driving-warning methods cannot adapt to the unstructured roads of open-pit mines, this paper proposes an early-warning method that integrates object detection with an obstacle distance threshold. First, the original Mask R-CNN detection framework was improved according to the characteristics of open-pit mine obstacles: dilated convolution was introduced into the backbone network to expand the receptive field without shrinking the feature map, preserving detection accuracy for larger targets. Then, a linear distance factor was constructed from the detection results to represent the depth information of obstacles in the input image, and an SVM warning model was built on it. Finally, to ensure the generalization ability of the warning model, transfer learning was adopted: the network was pre-trained on the COCO dataset, and both the C5 stage and the detection layers were fine-tuned on data collected in the field. Experimental results show that the accuracy and recall of the proposed method reach 98.47% and 97.56%, respectively, on the field data, and that the manually designed linear distance factor adapts well to the SVM warning model.
Keywords: obstacle warning; target detection; distance threshold model; dilated convolution; transfer learning

1 Introduction

2 Object detection framework

2.1 Mask R-CNN

Fig. 1 Improved Mask R-CNN framework

$\left\{ {\begin{array}{*{20}{l}} {{{P'}_i} = {\rm{sum}}({\rm{upsample}}({{P'}_{i + 1}}), {\rm{conv}}({C_i}))} \\ {{P_6} = {\rm{maxpooling}}({P_5})} \\ {{P_i} = {\rm{conv}}({{P'}_i})} \end{array}} \right.,$ (1)
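The top-down merge of Eq. (1) can be sketched in NumPy. This is a minimal illustration under my own assumptions (nearest-neighbor upsampling, 1×1 lateral projections, and a 1×1 projection standing in for the final 3×3 smoothing conv); the function and variable names are not from the paper.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv1x1(x, w):
    """1x1 convolution: a per-pixel linear projection of channels."""
    return x @ w  # (H, W, Cin) @ (Cin, Cout) -> (H, W, Cout)

def fpn_merge(c_maps, lateral_ws, out_w):
    """Top-down FPN merge per Eq. (1): P'_i = sum(upsample(P'_{i+1}), conv(C_i)).

    c_maps: backbone outputs [C3, C4, C5], finest to coarsest;
    lateral_ws: one 1x1 lateral weight per level; out_w: output projection.
    Returns [P3, P4, P5], finest to coarsest.
    """
    p = conv1x1(c_maps[-1], lateral_ws[-1])      # coarsest level starts the pathway
    p_primes = [p]
    for c, w in zip(reversed(c_maps[:-1]), reversed(lateral_ws[:-1])):
        p = upsample2x(p) + conv1x1(c, w)        # element-wise sum of the two paths
        p_primes.append(p)
    p_primes.reverse()
    return [conv1x1(pp, out_w) for pp in p_primes]
```

With identity lateral weights and all-ones inputs, each finer level accumulates one more upsampled contribution, which makes the merge easy to sanity-check.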

$\left\{ {\begin{array}{*{20}{l}} {x' = (1 + \Delta x) \cdot x} \\ {y' = (1 + \Delta y) \cdot y} \\ {w' = \exp (\Delta w) \cdot w} \\ {h' = \exp (\Delta h) \cdot h} \end{array}} \right..$ (2)
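Applying the box refinement of Eq. (2) is a few lines of code. The sketch below implements the equation exactly as written (note that this multiplicative center update differs from the Faster R-CNN convention x' = x + Δx·w); the function name is my own.

```python
import math

def apply_deltas(box, deltas):
    """Refine a proposal (x, y, w, h) with predicted offsets per Eq. (2).

    The center is scaled multiplicatively by (1 + delta); the size is
    scaled by exp(delta) so the predicted width/height stay positive.
    """
    x, y, w, h = box
    dx, dy, dw, dh = deltas
    return ((1 + dx) * x, (1 + dy) * y, math.exp(dw) * w, math.exp(dh) * h)
```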

2.2 Improvement of the block unit

Fig. 2 Two convolution operations. (a) A conventional convolution; (b) A dilated convolution with a dilation rate of 2

Fig. 3 Dilated convolution introduced in stage C5
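The effect of the dilation can be made concrete with a small NumPy implementation. This is an illustrative single-channel sketch (valid padding, stride 1), not the paper's framework code; it shows how a k×k kernel with rate r covers an effective extent of k + (k-1)(r-1) pixels without any pooling.

```python
import numpy as np

def dilated_conv2d(img, kernel, rate):
    """2D dilated (atrous) convolution, 'valid' padding, stride 1.

    A k x k kernel with dilation rate r samples the input on a grid with
    gaps of r-1 pixels, so the receptive field grows to k + (k-1)(r-1)
    while the feature map is not reduced by downsampling.
    """
    k = kernel.shape[0]
    eff = k + (k - 1) * (rate - 1)                       # effective kernel extent
    h, w = img.shape
    out = np.zeros((h - eff + 1, w - eff + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = img[i:i + eff:rate, j:j + eff:rate]  # dilated sampling grid
            out[i, j] = np.sum(patch * kernel)
    return out
```

With rate = 1 this reduces to a conventional convolution, matching the comparison in Fig. 2.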
3 Warning model

 $\left\{ {\begin{array}{*{20}{l}} {{w_{{\rm{anchor}}}} = {{w'}_{{\rm{anchor}}}}/w} \\ {{h_{{\rm{anchor}}}} = {{h'}_{{\rm{anchor}}}}/h} \\ {{s_{{\rm{anchor}}}} = {{s'}_{{\rm{anchor}}}}/s} \\ {{s_{{\rm{mask}}}} = {{s'}_{{\rm{mask}}}}/s} \end{array}} \right.,$ (3)
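A minimal sketch of building the Eq. (3) feature vector, assuming (as the primed/unprimed notation suggests) that w, h, and s denote the input image's width, height, and area, so each raw detection quantity is normalized to be resolution-independent. Names are my own, not from the paper.

```python
def distance_factor(box_w, box_h, mask_area, img_w, img_h, label):
    """Build the normalized feature vector of Eq. (3).

    box_w, box_h: detection box size in pixels; mask_area: segmented
    pixel count; img_w, img_h: input image size. Scale terms are divided
    by the image dimensions/area; the class label l is appended as in
    the paper's training sample x_i.
    """
    s_img = img_w * img_h
    return (box_w / img_w,            # w_anchor
            box_h / img_h,            # h_anchor
            (box_w * box_h) / s_img,  # s_anchor
            mask_area / s_img,        # s_mask
            label)                    # l
```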

$\left\{ \begin{array}{l} \mathop {\min }\limits_\alpha \;\frac{1}{2}\sum\limits_{i = 1}^N {\sum\limits_{j = 1}^N {{\alpha _i}{\alpha _j}{y_i}{y_j}K({x_i}, {x_j})} } - \sum\limits_{i = 1}^N {{\alpha _i}} \\ {\rm{s}}{\rm{.t}}{\rm{.}}\;\sum\limits_{i = 1}^N {{\alpha _i}{y_i}} = 0, \;\;{\rm{0}} \leqslant {\alpha _i} \leqslant C, \;\;i = 1, 2, \cdots , N \\ \end{array} \right.,$ (4)

$\begin{array}{l} T = \{ ({x_1}, {y_1}), ({x_{\rm{2}}}, {y_{\rm{2}}}), \ldots , ({x_N}, {y_N})\} , \\ {x_i} = ({w_{{\rm{anchor}}}}, {h_{{\rm{anchor}}}}, {s_{{\rm{anchor}}}}, {s_{{\rm{mask}}}}, l), \;\;{y_i} \in \left\{ {0, 1} \right\}, \end{array}$
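The dual problem of Eq. (4) can be checked numerically. The sketch below (my own helper names, RBF kernel assumed; note the dual conventionally uses labels in {-1, +1}) evaluates the objective and the constraints for a candidate multiplier vector.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def dual_objective(alpha, y, K):
    """Eq. (4) objective: 1/2 * sum_ij a_i a_j y_i y_j K_ij - sum_i a_i."""
    ay = alpha * y
    return 0.5 * ay @ K @ ay - np.sum(alpha)

def feasible(alpha, y, C, tol=1e-9):
    """Eq. (4) constraints: sum_i a_i y_i = 0 and 0 <= a_i <= C."""
    return (abs(np.dot(alpha, y)) < tol
            and np.all(alpha >= -tol) and np.all(alpha <= C + tol))
```

A training step would minimize `dual_objective` over the feasible set (e.g. via SMO); only the evaluation is shown here.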

4 Experiments

4.1 Data collection and model training

Fig. 4 Dataset composition

4.2 Experimental results and analysis

 $P{\rm{ = }}\frac{{TP}}{{FP + TP}},$ (5)

$R{\rm{ = }}\frac{{TP}}{{TP + FN}},$ (6)

F1 score:

 ${F_{\rm{1}}}{\rm{ = }}\frac{{{\rm{2}}P \cdot R}}{{P + R}},$ (7)
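The three metrics of Eqs. (5)~(7) follow directly from the counts of true positives, false positives, and false negatives; a small helper (name is my own) makes the computation explicit.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 per Eqs. (5)-(7)."""
    p = tp / (tp + fp)          # Eq. (5): fraction of detections that are correct
    r = tp / (tp + fn)          # Eq. (6): fraction of ground truths detected
    f1 = 2 * p * r / (p + r)    # Eq. (7): harmonic mean of the two
    return p, r, f1
```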

Type  Feature combination                          Accuracy/%  Recall/%  F1/%
1     w_anchor + h_anchor + s_anchor + l           88.12       95.13     91.49
2     w_anchor + h_anchor + s_mask + l             90.16       92.84     91.48
3     h_anchor + s_anchor + s_mask + l             90.22       81.36     85.56
4     w_anchor + s_anchor + s_mask + l             94.34       82.64     88.10
5     w_anchor + h_anchor + s_anchor + s_mask      55.61       64.92     59.51
6     w_anchor + h_anchor + s_anchor + s_mask + l  98.47       97.56     98.01

Model           Accuracy/%  Recall/%  F1/%   Time/ms
yolov3+SVM      95.08       95.31     95.19  87
Mask R-CNN+SVM  96.64       95.89     96.26  134
Our model       98.47       97.56     98.01  136

Fig. 5 Comparison of the three models' warning results in various scenarios. From left to right: detection results of Mask R-CNN, the proposed framework, and yolov3, each classified by the warning model; red marks targets flagged for warning and green marks safe targets. (a) Meeting scene 1; (b) Meeting scene 2; (c) Meeting scene 3; (d) Car-following scene 1; (e) Car-following scene 2; (f) Pedestrian scene; (g) Complex scene with cars and pedestrians; (h) Medium-distance multi-vehicle meeting; (i) Close-distance multi-vehicle meeting; (j) Long-distance multi-vehicle meeting
5 Conclusion
