Research on target tracking based on convolutional networks

Zhao Chunmei; Chen Zhongbi; Zhang Jianlin

doi:10.12086/oee.2020.180668


Zhao C M, Chen Z B, Zhang J L. Research on target tracking based on convolutional networks[J]. Opto-Electron Eng, 2020, 47(1): 180668. doi: 10.12086/oee.2020.180668


1. Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu, Sichuan 610209, China
2. University of Chinese Academy of Sciences, Beijing 100049, China

Fund Project: Supported by Major Special Fund (G158207)


Corresponding author: Chen Zhongbi, E-mail: ioeyoyo@126.com

Received Date 19 December 2018

Revised Date 22 March 2019

Published Date 01 January 2020

Abstract


This paper proposes Siamese-MF (multi-feature Siamese networks), an improved convolutional network for target tracking based on Siamese-FC (fully-convolutional Siamese networks), with the aim of raising tracking speed and accuracy to the level required by engineering applications. Because a tracking network must trade speed against accuracy, the two directions pursued are reducing computational complexity and enlarging the receptive field of the convolutional features. The network structure is improved in two ways: 1) feature fusion is introduced to enrich the features; 2) dilated convolution is introduced to reduce the amount of computation and enlarge the receptive field. The Siamese-MF algorithm tracks targets in complex scenes accurately and in real time. On the public OTB benchmark, it reaches an average speed of 76 f/s, an average overlap of 0.44, and an average accuracy of 0.61. The resulting gains in speed, accuracy, and stability meet the requirements of real-time target tracking applications.
Keywords:
- Siamese-MF
- feature fusion
- full convolution
- dilated convolution
- real-time tracking



Overview


Deep learning has achieved good results in image classification, semantic segmentation, target detection, and target recognition, but in object tracking it is still limited by small training sets. Object tracking is one of the most important research topics in computer vision and has a wide range of applications. Its difficulty lies in complex conditions such as target rotation, multiple targets, blurred targets, cluttered backgrounds, scale changes, occlusion, and fast motion.

For target tracking, this paper proposes Siamese-MF (multi-feature Siamese networks), an improved convolutional network based on Siamese-FC (fully-convolutional Siamese networks). Because a tracking network must balance speed against accuracy, the two directions pursued are reducing computational complexity and enlarging the receptive field of the convolutional features. The classical convolutional network structure is improved in two ways: 1) feature fusion is introduced to enrich the features; 2) dilated convolution is introduced to reduce computational complexity and enlarge the receptive field. The improved convolutional layers serve as the feature extraction layers, and a fully convolutional layer computes the correlation between the target and the search area; the location of the tracked target is then read from the resulting correlation map.

The Siamese-MF algorithm tracks targets in complex scenes accurately and in real time. On OTB2015, the average speed reaches 76 f/s, the mean overlap reaches 0.44, and the mean precision reaches 0.61, which meets the requirements of real-time target tracking applications. The Siamese-MF networks are trained using the five convolutional layers Conv1~Conv5 of AlexNet and two connection layers, Skip1~Skip2, to extract the features of the target.
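The effect of dilated convolution on the receptive field can be illustrated with a small sketch. This is not the authors' implementation: the function names `effective_kernel_size` and `dilated_conv1d`, the 1-D setting, and the example kernel are all assumptions chosen for brevity. It only shows that a kernel with the same number of weights covers a wider span of the input when its taps are spaced apart.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in input samples) covered by a kernel with the given dilation."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D dilated correlation (no padding, stride 1).

    Illustrative helper only; real networks apply this per channel in 2-D.
    """
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Taps are spaced `dilation` samples apart.
            acc += w * signal[start + j * dilation]
        out.append(acc)
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 0.0, -1.0]  # three weights in both cases (hypothetical values)

dense = dilated_conv1d(signal, kernel, dilation=1)    # each output sees 3 samples
dilated = dilated_conv1d(signal, kernel, dilation=2)  # each output sees 5 samples
```

With dilation 2, each output depends on a window of 5 input samples while still using 3 multiplications, which is the sense in which a wider receptive field is obtained without a larger kernel.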
During tracking, the trained networks are used as feed-forward networks: the position of the maximum output score is taken as the target location, and the template is updated over time. The tracking result also adapts to changes in target scale.
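The localization step described above, correlating the target template with the search area and taking the maximum score as the target position, can be sketched as follows. This is a toy illustration under stated assumptions: it uses single-channel 2-D lists in place of learned multi-channel features, the names `cross_correlate` and `peak_location` are hypothetical, and template updating and scale adaptation are omitted.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (valid mode) and return the score map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    scores = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            s = 0.0
            for i in range(th):
                for j in range(tw):
                    s += search[y + i][x + j] * template[i][j]
            row.append(s)
        scores.append(row)
    return scores


def peak_location(scores):
    """Return (row, col) of the first maximum in the score map."""
    best_y, best_x = 0, 0
    for y, row in enumerate(scores):
        for x, v in enumerate(row):
            if v > scores[best_y][best_x]:
                best_y, best_x = y, x
    return best_y, best_x


# Hypothetical data: a 5x5 search area containing the 2x2 target at row 2, col 1.
template = [[1.0, 2.0],
            [3.0, 4.0]]
search = [[0.0] * 5 for _ in range(5)]
for i in range(2):
    for j in range(2):
        search[2 + i][1 + j] = template[i][j]

scores = cross_correlate(search, template)
location = peak_location(scores)
```

In this example the score map is 4x4 and its peak falls at the offset where the template was placed. In a full tracker the peak position in the score map would still have to be mapped back to image coordinates through the network stride, a step not shown here.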