Research on target tracking based on convolutional networks

Citation: Zhao C M, Chen Z B, Zhang J L. Research on target tracking based on convolutional networks[J]. Opto-Electron Eng, 2020, 47(1): 180668. doi: 10.12086/oee.2020.180668


  • Fund Project: Major Special Fund (G158207)
  • Corresponding author: Chen Zhongbi (1975–), female, Ph.D., associate research fellow, mainly engaged in research on moving object detection and tracking. E-mail: ioeyoyo@126.com
  • CLC number: TP391.41

  • Abstract: For target tracking applications, this paper proposes Siamese-MF, an improved convolutional network based on the Siamese-FC tracking network, aiming to further improve tracking speed and accuracy and to meet the needs of engineering applications. For a tracking network, considering the trade-off between speed and precision, reducing computation and enlarging the receptive field of convolutional features are the directions for improving both. The structural innovation of the convolutional network focuses on two points: 1) introducing feature fusion to enrich features; 2) introducing dilated convolution to reduce computation while enlarging the receptive field. Siamese-MF achieves real-time, accurate tracking of targets in complex scenes: on the public OTB dataset the average speed reaches 76 f/s, the mean tracking success rate (overlap) reaches 0.44, and the mean tracking precision reaches 0.61. Real-time performance, accuracy, and stability are all improved, satisfying real-time target tracking applications.

  • Overview: Deep learning has achieved good results in image classification, semantic segmentation, object detection, and object recognition, but in object tracking it is still restricted by small training sets. Object tracking is one of the most important research topics in computer vision and has a wide range of applications. Its challenges lie in complex conditions such as target rotation, multiple targets, blurred targets, complex backgrounds, scale changes, target occlusion, and fast motion. For target tracking, this paper proposes Siamese-MF (multi-feature Siamese network), an improved convolutional network based on Siamese-FC (fully-convolutional Siamese network). For a tracking network, given the trade-off between speed and accuracy, reducing computational complexity and enlarging the receptive field of convolutional features are the directions for improving both. The changes to the classical convolutional network structure focus on two points: 1) introducing feature fusion to enrich features; 2) introducing dilated convolution to reduce computational complexity while enlarging the receptive field. The improved convolutional layers act as feature extraction layers; the correlation between the target template and the search area is computed through a fully convolutional layer, and the location of the tracked target is obtained from the correlation map. The Siamese-MF algorithm achieves real-time, accurate tracking of targets in complex scenes: the average speed on OTB2015 reaches 76 f/s, the mean overlap reaches 0.44, and the mean precision reaches 0.61, which meets the requirements of real-time tracking applications. The Siamese-MF network is trained using the five convolutional layers Conv1~Conv5 of AlexNet plus two skip layers Skip1~Skip2 to extract target features. During tracking, the trained network is used as a feed-forward network, the location of the maximum output score is taken as the target position, the template is updated over the time series, and the tracking result adapts to scale transformation.
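    The core of the feed-forward tracking phase described above is a cross-correlation between template features and search-region features, with the peak of the resulting score map giving the target location. The following minimal PyTorch sketch illustrates that scoring step (an illustration under assumed shapes, not the authors' released code); the feature-map sizes follow the Conv5 outputs in Table 1 below:

    ```python
    import torch
    import torch.nn.functional as F

    def correlation_score(template_feat: torch.Tensor,
                          search_feat: torch.Tensor) -> torch.Tensor:
        """Cross-correlate template features against search-region features.

        template_feat: (C, h, w) feature map of the target template.
        search_feat:   (C, H, W) feature map of the search region, H >= h.
        Returns an (H-h+1, W-w+1) score map; its argmax is the target location.
        """
        # F.conv2d performs sliding-window cross-correlation (no kernel flip),
        # which is exactly the similarity search of a fully-convolutional tracker.
        score = F.conv2d(search_feat.unsqueeze(0),    # (1, C, H, W) input
                         template_feat.unsqueeze(0))  # (1, C, h, w) kernel
        return score[0, 0]

    # Conv5 output sizes from Table 1: 32@6x6 template vs 32@22x22 search region
    template_feat = torch.randn(32, 6, 6)
    search_feat = torch.randn(32, 22, 22)
    score_map = correlation_score(template_feat, search_feat)      # (17, 17)
    row, col = divmod(int(score_map.argmax()), score_map.size(1))  # peak location
    ```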


    Figure 1.  Feedforward network of Siamese-MF


    Figure 2.  Tracking process of Siamese-MF


    Figure 3.  Tracking results of Siamese-MF on OTB2015


    Figure 4.  Qualitative evaluation index analysis. (a) Overlap; (b) Accuracy; (c) Velocity


    Table 1.  Operations and results of the Siamese-MF network

    | Operation  | Input (template / search)  | Filter size                        | Stride | Output (template / search) |
    |------------|----------------------------|------------------------------------|--------|----------------------------|
    | Conv1      | 3@127×127 / 3@255×255      | 96@11×11                           | 2      | 96@59×59 / 96@123×123      |
    | Maxpooling | 96@59×59 / 96@123×123      | 3×3                                | 2      | 96@29×29 / 96@61×61        |
    | Skip1      | 96@29×29 / 96@61×61        | 32@3×3, dilation 3 (effective 7×7) | 2      | 32@12×12 / 32@28×28        |
    |            | 32@12×12 / 32@28×28        | 16@3×3, dilation 3 (effective 7×7) | 1      | 16@6×6 / 16@22×22          |
    | Conv2      | 96@29×29 / 96@61×61        | 256@5×5                            | 1      | 256@25×25 / 256@57×57      |
    | Maxpooling | 256@25×25 / 256@57×57      | 3×3                                | 2      | 256@12×12 / 256@28×28      |
    | Conv3      | 256@12×12 / 256@28×28      | 384@3×3                            | 1      | 384@10×10 / 384@26×26      |
    | Skip2      | 384@10×10 / 384@26×26      | 16@1×1                             | 1      | 16@10×10 / 16@26×26        |
    |            | 16@10×10 / 16@26×26        | 16@3×3, dilation 2 (effective 5×5) | 1      | 16@6×6 / 16@22×22          |
    | Conv4      | 384@10×10 / 384@26×26      | 384@3×3                            | 1      | 384@8×8 / 384@24×24        |
    | Conv5      | 384@8×8 / 384@24×24        | 32@3×3                             | 1      | 32@6×6 / 32@22×22          |
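    Reading the table: each row lists the template-branch and search-branch shapes as channels@height×width, and a filter entry with a dilation denotes a 3×3 kernel whose effective receptive field is enlarged accordingly. As a reading aid, the sketch below reconstructs one branch of this stack in PyTorch. It is a hypothetical reconstruction from the table alone; in particular, fusing the Conv5, Skip1, and Skip2 outputs by channel concatenation is an assumption, since the table only shows that the three outputs share the same spatial size:

    ```python
    import torch
    import torch.nn as nn

    class SiameseMFFeatures(nn.Module):
        """One branch of the Table 1 stack (illustrative reconstruction)."""
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(3, 96, 11, stride=2)
            self.pool1 = nn.MaxPool2d(3, stride=2)
            # Skip1: two 3x3 convs with dilation 3 (7x7 effective field each)
            self.skip1 = nn.Sequential(
                nn.Conv2d(96, 32, 3, stride=2, dilation=3),
                nn.Conv2d(32, 16, 3, stride=1, dilation=3))
            self.conv2 = nn.Conv2d(96, 256, 5)
            self.pool2 = nn.MaxPool2d(3, stride=2)
            self.conv3 = nn.Conv2d(256, 384, 3)
            # Skip2: 1x1 reduction, then 3x3 with dilation 2 (5x5 effective field)
            self.skip2 = nn.Sequential(
                nn.Conv2d(384, 16, 1),
                nn.Conv2d(16, 16, 3, dilation=2))
            self.conv4 = nn.Conv2d(384, 384, 3)
            self.conv5 = nn.Conv2d(384, 32, 3)

        def forward(self, x):                     # sizes below: 127x127 template branch
            x = self.pool1(self.conv1(x))         # 96@29x29
            s1 = self.skip1(x)                    # 16@6x6
            x = self.pool2(self.conv2(x))         # 256@12x12
            x = self.conv3(x)                     # 384@10x10
            s2 = self.skip2(x)                    # 16@6x6
            x = self.conv5(self.conv4(x))         # 32@6x6
            return torch.cat([x, s1, s2], dim=1)  # assumed fusion: 64 channels

    feats = SiameseMFFeatures()
    print(feats(torch.randn(1, 3, 127, 127)).shape)  # (1, 64, 6, 6)
    print(feats(torch.randn(1, 3, 255, 255)).shape)  # (1, 64, 22, 22)
    ```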

    Algorithm 1.  Training procedure of the Siamese-MF network

    if loop_time < 50:
        input data_set = 'ILSVRC2015' with 16 samples per batch;
        compute loss and update {w1, …, w7} by SGD with learning_rate = 0.0001, momentum = 0.9, weight_decay = 0.0005
    else:
        output pretrained Conv1~Conv5 filters {w1, …, w5} and Skip1~Skip2 filters {w6, w7};
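    In executable form, Algorithm 1 corresponds to an SGD loop like the sketch below, assuming PyTorch and reusing the illustrative SiameseMFFeatures and correlation_score from the earlier sketches. The data loader and the loss function are placeholders not specified in this excerpt; the optimizer hyperparameters are the ones stated in the algorithm:

    ```python
    import torch
    from torch import nn

    extractor = SiameseMFFeatures()  # shared weights {w1, ..., w7} for both branches
    criterion = nn.SoftMarginLoss()  # placeholder logistic loss over score maps
    optimizer = torch.optim.SGD(extractor.parameters(),
                                lr=0.0001, momentum=0.9, weight_decay=0.0005)

    for loop_time in range(50):                    # "if loop_time < 50" in Algorithm 1
        # ilsvrc2015_loader is a placeholder DataLoader yielding 16-sample batches
        # of (template, search, label) triples from ILSVRC2015
        for templates, searches, labels in ilsvrc2015_loader:
            scores = torch.stack([
                correlation_score(extractor(t.unsqueeze(0))[0],
                                  extractor(s.unsqueeze(0))[0])
                for t, s in zip(templates, searches)])  # (16, 17, 17) score maps
            loss = criterion(scores, labels)            # labels in {-1, +1} per location
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # export Conv1~Conv5 filters {w1..w5} and Skip1~Skip2 filters {w6, w7}
    # for the feed-forward tracking phase
    torch.save(extractor.state_dict(), 'siamese_mf_pretrained.pt')
    ```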


    Table 2.  Test results of Siamese-MF and Siamese-FC on OTB2015

    | Model      | Overlap | Accuracy | v/(f/s) |
    |------------|---------|----------|---------|
    | Siamese-MF | 0.44    | 0.61     | 76      |
    | Siamese-FC | 0.27    | 0.52     | 58      |


    Table 3.  Quantitative analysis results of Siamese-MF and Siamese-FC

    | Videos  | Overlap/(%) Siam-MF | Overlap/(%) Siam-FC | Accuracy/(%) Siam-MF | Accuracy/(%) Siam-FC | v/(f/s) Siam-MF | v/(f/s) Siam-FC |
    |---------|---------------------|---------------------|----------------------|----------------------|-----------------|-----------------|
    | 0034004 | 43.7                | 36.9                | 92.4                 | 61.6                 | 47.8            | 42.8            |
    | 0034014 | 58                  | 56.8                | 90.7                 | 70.1                 | 44.4            | 41.7            |
    | 0034019 | 55.4                | 50.6                | 99.6                 | 99.3                 | 48.9            | 44.8            |
    | 0034023 | 59.1                | 58.7                | 100                  | 100                  | 48.6            | 44.1            |
    | 0064003 | 92.4                | 91.2                | 88                   | 69.6                 | 15.1            | 15              |
    | 0117004 | 56.8                | 51.2                | 100                  | 100                  | 51.4            | 45.9            |
    | 0117019 | 73.4                | 67.6                | 100                  | 100                  | 52.4            | 49.5            |
    | 0117024 | 59.4                | 56.7                | 71.8                 | 65.6                 | 36.8            | 32.4            |
    | 0117041 | 49.5                | 48.8                | 100                  | 100                  | 50.4            | 47.1            |
    | 0259004 | 75.8                | 37                  | 100                  | 43.9                 | 52.4            | 48              |
    | 0259014 | 90.6                | 88.3                | 100                  | 100                  | 31              | 29.5            |
    | 0259019 | 89.5                | 78.1                | 100                  | 86.7                 | 18.4            | 17.5            |
    | 0321003 | 60.2                | 58.9                | 70.6                 | 66.7                 | 48.2            | 43.5            |
    | 0473003 | 82.9                | 80.1                | 84.4                 | 81                   | 30.3            | 27.5            |
    | 0555003 | 77.3                | 74.9                | 100                  | 100                  | 48.7            | 45.6            |
    | 743004  | 52.9                | 38                  | 100                  | 97.3                 | 49.6            | 44.7            |
    | 0899003 | 79.5                | 76.9                | 64.5                 | 59.9                 | 22.5            | 20.8            |
    | 1000004 | 69.5                | 51.7                | 71.4                 | 50                   | 52.1            | 47.2            |
    | 1035001 | 84.3                | 83.8                | 32.6                 | 32.1                 | 11.6            | 11.2            |
    | Mean    | 69                  | 62.4                | 87.7                 | 78.1                 | 40              | 36.8            |
  • [1] Yilmaz A, Javed O, Shah M. Object tracking: a survey[J]. ACM Computing Surveys, 2006, 38(4): 13. doi: 10.1145/1177352.1177355
    [2] Sivanantham S, Paul N N, Iyer R S. Object tracking algorithm implementation for security applications[J]. Far East Journal of Electronics and Communications, 2016, 16(1): 1–13. doi: 10.17654/EC016010001
    [3] Kwak S, Cho M, Laptev I, et al. Unsupervised object discovery and tracking in video collections[C]//Proceedings of 2015 IEEE International Conference on Computer Vision, 2015: 3173–3181.
    [4] Luo H B, Xu L Y, Hui B, et al. Status and prospect of target tracking based on deep learning[J]. Infrared and Laser Engineering, 2017, 46(5): 502002. doi: 10.3788/IRLA201746.0502002
    [5] Comaniciu D, Ramesh V, Meer P. Kernel-based object tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(5): 564–575. doi: 10.1109/TPAMI.2003.1195991
    [6] Lucas B D, Kanade T. An iterative image registration technique with an application to stereo vision[C]//Proceedings of the 7th International Joint Conference on Artificial Intelligence, 1981: 674–679.
    [7] Jia X, Lu H C, Yang M H. Visual tracking via adaptive structural local sparse appearance model[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012: 1822–1829.
    [8] Henriques J F, Caseiro R, Martins P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583–596. doi: 10.1109/TPAMI.2014.2345390
    [9] Fan X S, Xu Z Y, Zhang J L. Dim small target tracking based on improved particle filter[J]. Opto-Electronic Engineering, 2018, 45(8): 170569. doi: 10.12086/oee.2018.170569
    [10] Xi Y D, Yu Y, Ding Y Y, et al. An optoelectronic system for fast search of low slow small target in the air[J]. Opto-Electronic Engineering, 2018, 45(4): 170654. doi: 10.12086/oee.2018.170654
    [11] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems, 2012: 1097–1105.
    [12] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[Z]. arXiv: 1409.1556, 2015. https://arxiv.org/abs/1409.1556
    [13] Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
    [14] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
    [15] Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
    [16] Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211–252. doi: 10.1007/s11263-015-0816-y
    [17] Chatfield K, Simonyan K, Vedaldi A, et al. Return of the devil in the details: delving deep into convolutional nets[Z]. arXiv: 1405.3531, 2014. https://arxiv.org/abs/1405.3531
    [18] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640–651. doi: 10.1109/TPAMI.2016.2572683
    [19] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580–587.
    [20] Li H. An overview of target tracking algorithms based on deep learning[J]. Heilongjiang Science and Technology Information, 2017(17): 49. doi: 10.3969/j.issn.1673-1328.2017.17.046
    [21] Wang N Y, Yeung D Y. Learning a deep compact image representation for visual tracking[C]//Advances in Neural Information Processing Systems, 2013: 809–817.
    [22] Nam H, Baek M, Han B. Modeling and propagating CNNs in a tree structure for visual tracking[Z]. arXiv: 1608.07242, 2016. https://arxiv.org/abs/1608.07242v1
    [23] Wang L J, Ouyang W L, Wang X G, et al. Visual tracking with fully convolutional networks[C]//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV), 2015: 3119–3127.
    [24] Ma C, Huang J B, Yang X K, et al. Hierarchical convolutional features for visual tracking[C]//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
    [25] Held D, Thrun S, Savarese S. Learning to track at 100 FPS with deep regression networks[C]//Proceedings of the 14th European Conference on Computer Vision, 2016: 749–765.
    [26] Bertinetto L, Valmadre J, Henriques J F, et al. Fully-convolutional Siamese networks for object tracking[C]//European Conference on Computer Vision, 2016: 850–865.
    [27] Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions[Z]. arXiv: 1511.07122, 2016. https://arxiv.org/abs/1511.07122
    [28] Wang H Y, Yang Y T, Zhang Z, et al. Deep-learning-aided multi-pedestrian tracking algorithm[J]. Journal of Image and Graphics, 2017, 22(3): 349–357. doi: 10.11834/jig.20170309
    [29] Wang X D. The influence of visual angle on the playability of games[J]. Henan Science and Technology, 2014(7): 12. http://d.old.wanfangdata.com.cn/Periodical/hnkj201407010
    [30] Horikoshi K, Misawa K, Lang R. 20-fps motion capture of phase-controlled wave-packets for adaptive quantum control[C]//Proceedings of the 15th International Conference on Ultrafast Phenomena XV, 2006: 175–177.
    [31] Zhao C M, Chen Z B, Zhang J L. Application of aircraft target tracking based on deep learning[J]. Opto-Electronic Engineering, 2019, 46(9): 180261. doi: 10.12086/oee.2019.180261

Publication history
  • Received: 2018-12-19
  • Revised: 2019-03-22
  • Published: 2020-01-01
