RGB-D object recognition algorithm based on improved double stream convolution recursive neural network

Li Xun; Li Linpeng; Alexander Lazovik; Wang Wenjie; Wang Xiaohua

doi:10.12086/oee.2021.200069

Li X, Li L P, Lazovik A, et al. RGB-D object recognition algorithm based on improved double stream convolution recursive neural network[J]. Opto-Electron Eng, 2021, 48(2): 200069. doi: 10.12086/oee.2021.200069

1. School of Electronics and Information, Xi'an Polytechnic University, Xi'an, Shaanxi 710048, China
2. Bernoulli Institute, University of Groningen, Groningen 9747 AG, Netherlands

Fund Project: National Natural Science Foundation of China (61971339), Basic Research Program of Natural Science in Shaanxi Province (2019JM567), Science and Technology Guiding Project of China Textile Industry Federation (2018094), and Innovation and Entrepreneurship Training Programme for University Students (201910709019)

*Corresponding author: Li Linpeng, E-mail: 771613990@qq.com

Received Date 02 April 2020

Revised Date 13 June 2020

Published Date 15 February 2021

Abstract

An RGB-D object recognition algorithm (Re-CRNN), based on an improved double-stream convolutional recursive neural network, is proposed to improve the accuracy of object recognition. Re-CRNN combines RGB images with depth information. The double-stream convolutional neural network (CNN) is improved using the idea of residual learning, and a top-level feature fusion unit is added to the network. The network learns a joint feature representation of the RGB and depth images, fuses the extracted high-level features of the two modalities across channels, and then generates a probability distribution with Softmax. Experiments on a standard RGB-D data set show that Re-CRNN achieves a recognition accuracy of 94.1%, a significant improvement over existing image-based object recognition methods.
- RGB-D image
- structured light
- object recognition
- deep learning
- depth image

Overview

Object recognition from RGB images is easily affected by the external environment, and its accuracy has reached a bottleneck that makes it difficult to meet the requirements of practical applications. In recent years, combining RGB images with depth images has become a new way to improve recognition accuracy. The RGB image contains the color and texture features of the object, while the depth image contains its geometric features and is invariant to illumination, so fusing RGB and depth features can effectively improve recognition accuracy. To make full use of the latent feature information in RGB-D images, and to address the tendency of existing work to focus on single-modality recognition results while ignoring the complementary advantages of RGB and depth images, an RGB-D object recognition algorithm (Re-CRNN) based on an improved double-stream convolutional recursive neural network is proposed. The single-channel depth image is encoded into three channels by computing surface normals. Transfer learning is then used so that the encoded depth image produces features at the same level as those of the RGB image. The backbone is a double-stream convolutional neural network improved with residual learning, which is introduced to optimize the network structure and reduce model complexity. The two stream networks have the same parameters. The RGB and depth images are trained separately to extract their high-level features. A feature fusion unit is added at the top layer of the network, where the extracted high-level features of the RGB and depth images are fused across channels and mapped to a common space.
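As an illustration of the depth-encoding step, the following is a minimal sketch of turning a single-channel depth map into a three-channel surface-normal image. It is not the authors' implementation: the function names `depth_to_normals` and `normals_to_image` are hypothetical, the normals are estimated with plain finite differences, and camera intrinsics are ignored (pixel spacing is assumed to be one unit in both directions).

```python
import numpy as np


def depth_to_normals(depth):
    """Encode a single-channel depth map (H, W) as unit surface normals (H, W, 3).

    Assumption: unit pixel spacing, no camera intrinsics.
    """
    depth = np.asarray(depth, dtype=np.float64)
    # Finite-difference gradients along rows (y) and columns (x).
    dz_dy, dz_dx = np.gradient(depth)
    # A surface z = f(x, y) has the (unnormalised) normal (-df/dx, -df/dy, 1).
    normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(depth)))
    # Scale every per-pixel normal to unit length (the norm is always >= 1).
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return normals


def normals_to_image(normals):
    """Map normal components from [-1, 1] to uint8 values in [0, 255]."""
    return np.round((normals + 1.0) * 127.5).astype(np.uint8)


if __name__ == "__main__":
    # Hypothetical 4 x 5 depth map that increases linearly from left to right.
    ramp = np.tile(np.arange(5.0), (4, 1))
    encoded = normals_to_image(depth_to_normals(ramp))
    print(encoded.shape, encoded.dtype)
```

The three-channel result has the same layout as an RGB image, which is what allows a network pre-trained on RGB data to be reused for the depth stream. A practical pipeline would likely also fill missing depth values and account for the sensor geometry before estimating normals.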
Next, a recursive neural network extracts further features from the fused features to generate a new feature sequence, which is classified by a Softmax classifier. Finally, experiments on a standard RGB-D data set compare the effects of different squashing functions and of fusing at different convolution layers. The results show that recognition accuracy on RGB-D images is higher than on RGB images alone, and that fusing RGB and depth features further improves accuracy. The proposed algorithm achieves the best recognition results among the methods compared, with an accuracy of 94.1% on the RGB-D data set, a clear improvement over existing methods.
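The stages that follow the two CNN streams (cross-channel fusion, recursive feature extraction, and Softmax classification) can be sketched roughly as below. This is an illustrative toy under stated assumptions, not the Re-CRNN implementation: the fusion is modelled as channel concatenation followed by a per-pixel linear map (equivalent to a 1x1 convolution), the recursive network merges non-overlapping 2 x 2 blocks with shared weights and a tanh squashing function, and all names, shapes, and the random weights are hypothetical.

```python
import numpy as np


def fuse_across_channels(rgb_feat, depth_feat, w_fuse):
    """Concatenate two (K, H, W) feature maps on the channel axis and mix the
    channels with a per-pixel linear map (a 1x1 convolution)."""
    stacked = np.concatenate((rgb_feat, depth_feat), axis=0)   # (2K, H, W)
    return np.einsum("oc,chw->ohw", w_fuse, stacked)           # (K_out, H, W)


def recursive_merge(feat, w_rec, block=2):
    """Merge block x block neighbouring vectors into one parent vector with
    shared weights and tanh, repeating until a single vector remains."""
    k, h, w = feat.shape
    while h > 1 or w > 1:
        if h % block or w % block:
            raise ValueError("spatial size must be a power of the block size")
        # Split the map into non-overlapping block x block tiles.
        tiles = feat.reshape(k, h // block, block, w // block, block)
        # Stack the children of every tile along the channel axis.
        children = tiles.transpose(2, 4, 0, 1, 3).reshape(
            block * block * k, h // block, w // block
        )
        # The same weight matrix is reused at every position and every level.
        feat = np.tanh(np.einsum("oc,chw->ohw", w_rec, children))
        h, w = h // block, w // block
    return feat.reshape(k)


def softmax(logits):
    """Numerically stable softmax over a 1-D vector of class scores."""
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, size, n_classes = 8, 4, 5                   # hypothetical sizes
    rgb_feat = rng.normal(size=(k, size, size))    # stand-in for RGB stream output
    depth_feat = rng.normal(size=(k, size, size))  # stand-in for depth stream output
    w_fuse = rng.normal(size=(k, 2 * k))
    w_rec = rng.normal(size=(k, 4 * k))
    w_cls = rng.normal(size=(n_classes, k))

    fused = fuse_across_channels(rgb_feat, depth_feat, w_fuse)
    probs = softmax(w_cls @ recursive_merge(fused, w_rec))
    print(probs.round(3), probs.sum())
```

Replacing `np.tanh` in `recursive_merge` with another squashing function is one way to reproduce, in miniature, the kind of comparison the experiments describe.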