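Current person re-identification research has focused only on extracting camera-invariant feature representations from visible-light images. It has neglected the imaging characteristics of infrared conditions, and few studies combine the two modalities. In addition, when comparing two images, existing person re-identification methods usually compute the similarity of feature maps from a single convolutional layer, which leads to weak feature learning. To address these problems, this paper proposes a feature-pyramid-based random fusion network that computes similarities at multiple feature levels simultaneously, so that image matching relies on discriminative factors from several semantic layers. The model takes the characteristics of infrared images into account, reduces the adverse intra-modality deviations within the visible and infrared modalities, balances the heterogeneity gap between the modalities, and combines the advantages of local and global feature learning, which makes it effective for cross-modality person re-identification. Experiments on the SYSU-MM01 dataset evaluate mean average precision (mAP) and convergence speed. The results show that the proposed model outperforms existing state-of-the-art methods: the feature pyramid random fusion network converges quickly and reaches an mAP of 32.12%.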
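The abstract's central idea, scoring an image pair by similarities computed at several feature-pyramid levels instead of at a single convolutional layer, can be illustrated with a minimal sketch. It assumes each image is already represented by one pooled feature vector per pyramid level and fuses per-level cosine similarities with a weighted average. The function names, the use of cosine similarity, and the `random_fusion_weights` helper are hypothetical illustrations; the abstract does not specify the paper's actual fusion rule.

```python
import math
import random
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def multi_level_similarity(
    query_levels: Sequence[Sequence[float]],
    gallery_levels: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted average of per-level similarities (hypothetical fusion rule).

    query_levels[i] and gallery_levels[i] hold the pooled features of pyramid
    level i for the query and gallery image; levels may differ in dimension.
    """
    if len(query_levels) != len(gallery_levels):
        raise ValueError("query and gallery must have the same number of levels")
    sims = [cosine_similarity(q, g) for q, g in zip(query_levels, gallery_levels)]
    if weights is None:
        weights = [1.0] * len(sims)  # equal weighting when no weights are given
    if len(weights) != len(sims):
        raise ValueError("need one weight per pyramid level")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, sims)) / total


def random_fusion_weights(num_levels: int, rng: random.Random) -> List[float]:
    """HYPOTHETICAL stand-in for 'random fusion': positive random weights summing to 1."""
    raw = [rng.random() + 1e-6 for _ in range(num_levels)]  # +1e-6 keeps every weight > 0
    total = sum(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0, 0.0]]    # level 0 agrees, level 1 disagrees
    gallery = [[1.0, 0.0], [0.0, 0.0, 1.0]]
    print(multi_level_similarity(query, gallery))  # 0.5
    print(multi_level_similarity(query, gallery, random_fusion_weights(2, random.Random(0))))
```

For this toy pair, a score taken from level 1 alone would be 0.0, while the fused score still reflects the agreement at level 0. This is the kind of single-layer weakness the abstract argues multi-level matching avoids.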
Person re-identification with a random fusion feature pyramid based on infrared and visible modalities

Published: December 22, 2020
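Cite this article:
Wang R G, Wang J, Yang J, et al. Person re-identification with a random fusion feature pyramid based on infrared and visible modalities[J]. Opto-Electronic Engineering, 2020, 47(12): 190669.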