Soft multilabel learning and deep feature fusion for unsupervised person re-identification

Zhang Baohua; Zhu Siyu; Lv Xiaoqi; Gu Yu; Wang Yueming; Liu Xin; Ren Yan; Li Jianjun; Zhang Ming

doi:10.12086/oee.2020.190636

Citation:

Zhang B H, Zhu S Y, Lv X Q, et al. Soft multilabel learning and deep feature fusion for unsupervised person re-identification[J]. Opto-Electron Eng, 2020, 47(12): 190636. doi: 10.12086/oee.2020.190636

1. School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China
2. School of Information Engineering, Mongolia Industrial University, Huhehaote, Inner Mongolia 010051, China
3. Inner Mongolia Key Laboratory of Pattern Recognition and Intelligent Image Processing, Baotou, Inner Mongolia 014010, China

Fund Project: Supported by the National Natural Science Foundation of China (61962046, 61663036, 61841204), the Inner Mongolia Jieqing Cultivation Project (2018JQ02), the Inner Mongolia Grassland Talents program, the Inner Mongolia Youth Science and Technology Innovation Talent Project (Level 1), the Inner Mongolia Autonomous Region Natural Science Fund (2015MS0604, 2018MS06018), and the Inner Mongolia Autonomous Region Higher Education Science and Technology Research Project (NJZY145)

Corresponding author: Zhang Baohua, E-mail: zbh_wj2004@imust.edu.cn

Received Date 24 October 2019

Revised Date 02 March 2020

Published Date 15 December 2020

Abstract

In cross-camera scenarios, person re-identification relies on learning label mapping relationships to improve recognition accuracy. Supervised person re-identification models achieve higher recognition accuracy but scale poorly. Their accuracy depends heavily on effective supervision, and when even a small amount of data is added during classification, all data must be reprocessed, which results in poor real-time performance. To address these problems, an unsupervised person re-identification algorithm based on soft multilabels is proposed. To improve the accuracy of label matching, soft multilabels are first learned so that they approximate the real labels, and reference agents are obtained by minimizing a loss function on the reference dataset, which serves as pre-training on that dataset. Then, the expected minimum distance between the generated and real data distributions is computed with the simplified 2-Wasserstein distance, using the mean and standard deviation vectors of the soft multilabels in each camera view; the resulting loss function addresses cross-view label consistency. To improve the validity of the soft multilabels on the unlabeled target dataset, a joint embedding loss is computed, which mines similar pairs between different categories and corrects cross-domain distribution misalignment. To address the long training time of the residual network and the low accuracy of unsupervised learning, the residual network is improved by incorporating SENet and fusing multi-level deep features, which improves both training speed and accuracy. Experimental results show that the rank-1 accuracy and mAP of the proposed method are better than those of advanced related algorithms.
Keywords: ResNet / person re-identification / soft multilabel / unsupervised / deep feature

Overview

Person re-identification is mainly used to retrieve pedestrians of interest from camera images, that is, to find targets similar to a given person image. This technology can save a great deal of time and manpower when searching a pedestrian database for images of a suspect, and it has good application prospects in intelligent security, criminal investigation, and image retrieval. Supervised person re-identification models achieve higher recognition accuracy but scale poorly. Their accuracy depends heavily on effective supervision, and when even a small amount of data is added during classification, all data must be reprocessed, which results in poor real-time performance. To address these problems, an unsupervised person re-identification algorithm based on soft multilabels is proposed. The features of each unlabeled target are learned and compared with a labeled reference dataset, so that each unlabeled target receives a soft multilabel. To obtain more accurate soft multilabels, we introduce reference agents, and we pre-train on the reference dataset to reduce the difference between the reference agents and the labeled reference dataset. Each unlabeled target is then compared with the reference agents instead of with the labeled reference dataset directly. We also use three loss functions, which respectively mine hard negative pair information, make the cross-camera labels of the same target consistent, and correct cross-domain distribution misalignment. Hard negative mining aims to identify negative pairs more accurately and push them farther apart. Cross-camera label consistency aims to reduce the gap between the soft multilabel distributions of the same target under different camera views.
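
The comparison between an unlabeled target and the reference agents can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that features and agents are L2-normalized and that the soft multilabel is a softmax over their dot-product similarities. The function name `soft_multilabel` and the toy vectors are hypothetical.

```python
import numpy as np

def soft_multilabel(feature, agents):
    """Return a soft multilabel for one target feature.

    feature: array of shape (d,), the feature of an unlabeled target.
    agents:  array of shape (n_agents, d), one reference agent per row.
    Assumption (not stated in the text): similarity is the dot product of
    L2-normalized vectors, turned into a distribution with a softmax.
    """
    f = feature / np.linalg.norm(feature)
    a = agents / np.linalg.norm(agents, axis=1, keepdims=True)
    sims = a @ f                      # one similarity per reference agent
    sims = sims - sims.max()          # shift for numerical stability
    weights = np.exp(sims)
    return weights / weights.sum()    # non-negative entries that sum to 1

# Toy example: three reference agents in a 2-D feature space.
agents = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
label = soft_multilabel(np.array([0.9, 0.1]), agents)
```

Each entry of `label` can be read as how strongly the target resembles the corresponding reference person; in this toy example the largest entry belongs to the first agent, which points in nearly the same direction as the target feature.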
For cross-camera consistency, the mean and standard deviation vectors of the soft multilabels in each camera view are computed and compared using the simplified 2-Wasserstein distance. To further improve the effectiveness of the reference agents and correct cross-domain distribution misalignment, we find the unlabeled persons closest to each reference agent and design a loss function over them. For feature extraction, we use multi-level deep feature fusion, complementing deep features with shallow features to improve feature robustness and thereby recognition accuracy. We also integrate squeeze-and-excitation networks (SENet) into the residual network, which acts much like an attention mechanism, to improve learning speed. Experimental results show that the rank-1 accuracy and mAP of the proposed method are superior to those of advanced related algorithms.
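
The cross-camera consistency term can be illustrated with a short sketch. It assumes the common simplification of the 2-Wasserstein distance for distributions summarized by a mean vector and a standard deviation vector, namely the sum of the squared differences of the means and of the standard deviations. Comparing each camera view against the statistics of all views pooled together is also an assumption, and the function names are hypothetical.

```python
import numpy as np

def simplified_w2(mu_a, sigma_a, mu_b, sigma_b):
    """Simplified 2-Wasserstein distance between two distributions that are
    summarized by mean and standard deviation vectors (assumed form:
    ||mu_a - mu_b||^2 + ||sigma_a - sigma_b||^2)."""
    return float(np.sum((mu_a - mu_b) ** 2) + np.sum((sigma_a - sigma_b) ** 2))

def cross_view_consistency_loss(labels_by_view):
    """labels_by_view: list of arrays, one per camera view, each of shape
    (n_images_in_view, n_agents) holding soft multilabels.
    Assumption: each view is compared against the pooled statistics."""
    pooled = np.concatenate(labels_by_view, axis=0)
    mu, sigma = pooled.mean(axis=0), pooled.std(axis=0)
    return sum(
        simplified_w2(v.mean(axis=0), v.std(axis=0), mu, sigma)
        for v in labels_by_view
    )

# Two views with identical label statistics give a loss of (about) zero;
# two views with opposite labels give a clearly positive loss.
same = np.array([[0.2, 0.8], [0.6, 0.4]])
loss_same = cross_view_consistency_loss([same, same.copy()])
view_a = np.array([[1.0, 0.0], [1.0, 0.0]])
view_b = np.array([[0.0, 1.0], [0.0, 1.0]])
loss_diff = cross_view_consistency_loss([view_a, view_b])
```

Minimizing such a term pulls the per-view soft multilabel statistics toward each other, which is the sense in which the labels of the same target become consistent across cameras.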