Citation: Zhang C Y, Hou Z Q, Pu L, et al. Siamese network visual tracking algorithm based on online learning[J]. Opto-Electron Eng, 2021, 48(4): 200140. doi: 10.12086/oee.2021.200140

Siamese network visual tracking algorithm based on online learning

    Fund Project: National Natural Science Foundation of China (61473309, 61703423)
Abstract: Visual tracking algorithms based on Siamese networks have become an important line of work in recent years, offering good performance in both tracking speed and accuracy. However, most Siamese-network trackers rely on an off-line trained model and lack online updating of the tracker. To address this problem, we propose a Siamese network visual tracking algorithm based on online learning. The algorithm adopts a dual-template scheme: the target in the first frame is treated as a static template, and a high-confidence update strategy yields a dynamic template from subsequent frames. During online tracking, a fast transformation learning model learns the target's appearance changes from the dual templates, and a target likelihood probability map of the search region is computed from the color histogram features of the current frame for background suppression learning. Finally, the response maps obtained from the dual templates are fused by weighting to produce the final prediction. Experimental results on the OTB2015, TempleColor128, and VOT datasets show that the proposed algorithm improves on mainstream algorithms of recent years and performs better under target deformation, similar background interference, fast motion, and other challenging scenarios.
References
[1] Hou Z Q, Han C Z. A survey of visual tracking[J]. Acta Automat Sin, 2006, 32(4): 603-617.
[2] Tang X M, Chen Z G, Fu Y. Anti-occlusion and re-tracking of real-time moving target based on kernelized correlation filter[J]. Opto-Electron Eng, 2020, 47(1): 190279. doi: 10.12086/oee.2020.190279
[3] Lu H C, Li P X, Wang D. Visual object tracking: a survey[J]. Patt Recog Artif Intell, 2018, 31(1): 61-76.
[4] Zhao C M, Chen Z B, Zhang J L. Research on target tracking based on convolutional networks[J]. Opto-Electron Eng, 2020, 47(1): 180668. doi: 10.12086/oee.2020.180668
[5] Bertinetto L, Valmadre J, Henriques J F, et al. Fully-convolutional Siamese networks for object tracking[C]//European Conference on Computer Vision, Cham, 2016: 850-865.
[6] Dong X P, Shen J B. Triplet loss in Siamese network for object tracking[C]//Proceedings of the European Conference on Computer Vision (ECCV), Cham, 2018.
[7] Wang Q, Gao J, Xing J L, et al. DCFNet: discriminant correlation filters network for visual tracking[Z]. arXiv: 1704.04057v1, 2017.
[8] Li B, Yan J J, Wu W, et al. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018: 8971-8980.
[9] Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge[J]. Int J Comput Vis, 2015, 115(3): 211-252. doi: 10.1007/s11263-015-0816-y
[10] Real E, Shlens J, Mazzocchi S, et al. YouTube-BoundingBoxes: a large high-precision human-annotated data set for object detection in video[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017: 5296-5305.
[11] Guo Q, Feng W, Zhou C, et al. Learning dynamic Siamese network for visual object tracking[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 1763-1771.
[12] Kuai Y L, Wen G J, Li D D. Masked and dynamic Siamese network for robust visual tracking[J]. Inf Sci, 2019, 503: 169-182. doi: 10.1016/j.ins.2019.07.004
[13] Wu Y, Lim J, Yang M H. Object tracking benchmark[J]. IEEE Trans Patt Anal Mach Intellig, 2015, 37(9): 1834-1848.
[14] Liang P P, Blasch E, Ling H B. Encoding color information for visual tracking: algorithms and benchmark[J]. IEEE Trans Image Process, 2015, 24(12): 5630-5644. doi: 10.1109/TIP.2015.2482905
[15] Kristan M, Matas J, Leonardis A, et al. The visual object tracking VOT2015 challenge results[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision Workshops, Santiago, Chile, 2015: 1-23.
[16] Wang M M, Liu Y, Huang Z Y. Large margin object tracking with circulant feature maps[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017: 4021-4029.
[17] Hou Z Q, Chen L L, Yu W S, et al. Robust visual tracking algorithm based on Siamese network with dual templates[J]. J Electr Inf Technol, 2019, 41(9): 2247-2255.
[18] Possegger H, Mauthner T, Bischof H. In defense of color-based model-free tracking[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 2015: 2113-2120.
[19] Xie Y, Chen Y. Adaptive object tracking based on spatial attention mechanism[J]. Syst Eng Electr, 2019, 41(9): 1945-1954.
[20] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//NIPS'12: Proceedings of the 25th International Conference on Neural Information Processing Systems, New York, NY, USA, 2012.
[21] Song Y B, Ma C, Gong L J, et al. CREST: convolutional residual learning for visual tracking[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 2555-2564.
[22] Bertinetto L, Valmadre J, Golodetz S, et al. Staple: complementary learners for real-time tracking[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016: 1401-1409.
[23] Wang N, Song Y B, Ma C, et al. Unsupervised deep tracking[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019: 1308-1317.
[24] Danelljan M, Häger G, Khan F S, et al. Learning spatially regularized correlation filters for visual tracking[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 4310-4318.
[25] Danelljan M, Häger G, Khan F, et al. Accurate scale estimation for robust visual tracking[C]//British Machine Vision Conference, Nottingham, 2014.
[26] Zhang J M, Ma S G, Sclaroff S. MEEM: robust tracking via multiple experts using entropy minimization[C]//European Conference on Computer Vision, Cham, 2014: 188-203.
[27] Valmadre J, Bertinetto L, Henriques J, et al. End-to-end representation learning for correlation filter based tracking[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017: 2805-2813.
[28] Galoogahi H K, Fagg A, Lucey S. Learning background-aware correlation filters for visual tracking[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 1135-1143.
[29] Zhang Z P, Peng H W. Deeper and wider Siamese networks for real-time visual tracking[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019: 4591-4600.
[30] Li B, Wu W, Wang Q, et al. SiamRPN++: evolution of Siamese visual tracking with very deep networks[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019: 4282-4291.
[31] Li X, Ma C, Wu B Y, et al. Target-aware deep tracking[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019: 1369-1378.

Overview: Visual tracking is a fundamental and challenging task in computer vision: given the target's state in the initial frame, a tracker must predict the target position in all subsequent frames. It is widely used in intelligent surveillance, autonomous driving, military reconnaissance, and other fields. During tracking, the target typically undergoes scale change, motion blur, deformation, and occlusion. At present, most trackers based on discriminative models fall into two families: correlation filter trackers, built on hand-crafted or CNN features, and Siamese network trackers. Siamese-network-based tracking has become an important approach in recent years, with good performance in both speed and accuracy. However, most Siamese trackers rely on an off-line trained model and lack online updating of the tracker. Guo et al. proposed the DSiam algorithm [11], which constructs a dynamic Siamese network with a fast transformation learning model and can learn the target's appearance changes and perform background suppression online during tracking. It still has two shortcomings. First, in the tracking stage it does not exploit the rich information in historical frames. Second, for background suppression it applies only a Gaussian weight map to the search region, which cannot effectively highlight the target and suppress the background. To address these problems, we propose a Siamese network visual tracking algorithm based on online learning. The main work is as follows:

The algorithm adopts a dual-template scheme: the target in the first frame is treated as a static template, while a high-confidence update strategy produces a dynamic template from subsequent frames.
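This page does not spell out the confidence measure used to gate the update; a common choice in this line of work is the average peak-to-correlation energy (APCE) introduced by LMCF [16], combined with the peak response value. The following Python sketch, with illustrative threshold factors, shows how such a high-confidence update could be gated:

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy (APCE, cf. LMCF [16]): large
    values indicate a sharp single peak, i.e., a reliable result."""
    r_min = response.min()
    return (response.max() - r_min) ** 2 / (np.mean((response - r_min) ** 2) + 1e-12)

def maybe_update_dynamic_template(response, candidate_patch, state,
                                  beta_peak=0.6, beta_apce=0.6):
    """Accept the current frame as the new dynamic template only when both
    the response peak and its APCE exceed fixed fractions of their
    historical means (beta_peak and beta_apce are illustrative values)."""
    peak, score = float(response.max()), apce(response)
    state["peaks"].append(peak)
    state["apces"].append(score)
    if (peak >= beta_peak * np.mean(state["peaks"])
            and score >= beta_apce * np.mean(state["apces"])):
        state["dynamic_template"] = candidate_patch  # high-confidence frame
    return state
```

Here `state` would be initialized once per sequence, e.g. `{"peaks": [], "apces": [], "dynamic_template": first_frame_patch}`, so low-confidence frames (occlusion, drift) never contaminate the dynamic template.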

During online tracking, a fast transformation learning model learns the target's appearance changes from the dual templates; in addition, a target likelihood probability map of the search region is computed from the color histogram features of the current frame and used for background suppression learning.
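As a rough illustration of this step (not the paper's exact code), the sketch below shows the two online components: the closed-form, Fourier-domain transformation learning of DSiam [11], which maps the initial template's features onto the most recent target appearance, and a color-histogram target likelihood map in the spirit of [18]. The histograms `fg_hist` and `bg_hist` are assumed to be collected from the estimated target box and a surrounding background ring in the current frame:

```python
import numpy as np

def learn_transform(init_feat, prev_feat, lam=1e-2):
    """Regularized linear map V (DSiam-style [11]) taking the first-frame
    template features to the latest appearance; element-wise closed form
    in the Fourier domain: V = F(prev) * conj(F(init)) / (|F(init)|^2 + lam)."""
    X = np.fft.fft2(init_feat, axes=(0, 1))
    Y = np.fft.fft2(prev_feat, axes=(0, 1))
    return Y * np.conj(X) / (X * np.conj(X) + lam)

def apply_transform(feat, V):
    """Update template features with the learned transform."""
    return np.real(np.fft.ifft2(V * np.fft.fft2(feat, axes=(0, 1)), axes=(0, 1)))

def color_likelihood(patch_rgb, fg_hist, bg_hist, bins=32):
    """Per-pixel target likelihood P(obj | color) = H_fg / (H_fg + H_bg),
    used to re-weight the search region instead of a plain Gaussian.
    patch_rgb is a uint8 HxWx3 image; the histograms have shape (bins,)*3."""
    q = (patch_rgb // (256 // bins)).astype(np.int64)         # quantize colors
    flat = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]  # histogram index
    fg = fg_hist.ravel()[flat]
    bg = bg_hist.ravel()[flat]
    return fg / (fg + bg + 1e-8)
```

Multiplying the search-region features by this likelihood map suppresses background pixels that share no color statistics with the target, which is exactly the weakness of a fixed Gaussian weight noted above.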

Finally, the response maps obtained from the dual templates are fused by weighting, giving the final prediction.
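The fusion itself can be as simple as a convex combination of the two response maps; the weight below is an illustrative hyperparameter, not the paper's reported value:

```python
import numpy as np

def fuse_responses(resp_static, resp_dynamic, w=0.5):
    """Convex combination of the static- and dynamic-template responses;
    the peak of the fused map gives the predicted target position."""
    fused = w * resp_static + (1.0 - w) * resp_dynamic
    row, col = np.unravel_index(np.argmax(fused), fused.shape)
    return fused, (row, col)
```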

Experimental results on the OTB2015, TempleColor128, and VOT datasets show that the proposed algorithm improves on mainstream algorithms of recent years and achieves better tracking performance under target deformation, similar background interference, fast motion, and other challenging scenarios.

