Small object detection based on multi-scale feature fusion using remote sensing images

Ma Liang; Gou Yutao; Lei Tao; Jin Lei; Song Yixuan

doi:10.12086/oee.2022.210363

Ma L, Gou Y T, Lei T, et al. Small object detection based on multi-scale feature fusion using remote sensing images[J]. Opto-Electron Eng, 2022, 49(4): 210363. doi: 10.12086/oee.2022.210363

1. Photoelectric Detection Technology Laboratory, Chinese Academy of Sciences, Chengdu, Sichuan 610209, China
2. Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu, Sichuan 610209, China
3. University of Chinese Academy of Sciences, Beijing 100049, China

*Corresponding author: taoleiyan@ioe.ac.cn

Received Date 15 November 2021

Revised Date 06 January 2022

Published Date 25 April 2022

Abstract

This paper proposes a robust small object detection method for remote sensing images based on multi-scale feature fusion. When a model pre-trained on natural images is applied directly to remote sensing images, its large number of parameters and the excessive downsampling in common feature extractors create a feature gap that can make small objects disappear from the deep feature maps. Therefore, based on the size distribution of all objects in the dataset (i.e., prior knowledge), we first integrate a lightweight feature extraction module with a dynamic selection mechanism that allows each neuron to adaptively adjust its receptive field size for detection. Meanwhile, features at different scales carry different amounts of information with different emphases. To make the image feature representation more accurate, we apply an FPN (feature pyramid network) module based on adaptive weighted feature fusion, which uses group convolution to divide the feature channels into groups that are processed independently of one another. In addition, deep learning is data-driven and requires large amounts of training data. Because remote sensing small object datasets are scarce, we built a small-object airplane dataset from remote sensing images and processed the plane and small-vehicle instances in the DOTA dataset so that their size distribution meets the requirements of small object detection. Experimental results show that, compared with most mainstream detection methods, the proposed method achieves better results on both DOTA and the self-built dataset.
Keywords:
- multi-scale features
- small object detection
- feature fusion
- scene complexity
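The abstract does not give the internals of the dynamic selection module. Its wording closely follows Selective Kernel Networks (SKNet, Li et al., CVPR 2019), so the following is a minimal NumPy sketch of that split-fuse-select mechanism, assuming two depthwise 3×3 branches (dilation 1 and dilation 2), no batch normalization, and random weights. The function names `depthwise_conv3x3` and `selective_kernel` are hypothetical; this illustrates the general technique and is not the authors' implementation.

```python
import numpy as np

# Hypothetical SKNet-style sketch; branch layout and shapes are assumptions.


def relu(x):
    return np.maximum(x, 0.0)


def depthwise_conv3x3(x, k, dilation=1):
    """'Same'-padded depthwise 3x3 convolution (cross-correlation).

    x: (C, H, W) feature map; k: (C, 3, 3), one kernel per channel.
    With dilation d the receptive field is (2d + 1) x (2d + 1).
    """
    c, h, w = x.shape
    d = dilation
    xp = np.pad(x, ((0, 0), (d, d), (d, d)))  # zero padding
    out = np.zeros((c, h, w))
    for i in range(3):
        for j in range(3):
            out += k[:, i, j, None, None] * xp[:, i * d:i * d + h, j * d:j * d + w]
    return out


def selective_kernel(x, k_small, k_large, w_reduce, w_a, w_b):
    """Split -> fuse -> select over two receptive field sizes."""
    u1 = relu(depthwise_conv3x3(x, k_small, dilation=1))  # 3x3 receptive field
    u2 = relu(depthwise_conv3x3(x, k_large, dilation=2))  # effective 5x5 receptive field
    s = (u1 + u2).mean(axis=(1, 2))        # fuse branches, global average pool -> (C,)
    z = relu(w_reduce @ s)                 # compact descriptor -> (D,)
    logits = np.stack([w_a @ z, w_b @ z])  # one logit per branch and channel -> (2, C)
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    att = np.exp(logits)
    att /= att.sum(axis=0, keepdims=True)  # softmax across branches, per channel
    v = att[0, :, None, None] * u1 + att[1, :, None, None] * u2
    return v, att


rng = np.random.default_rng(0)
C, H, W, D = 8, 16, 16, 4
x = rng.standard_normal((C, H, W))
k_small = rng.standard_normal((C, 3, 3))
k_large = rng.standard_normal((C, 3, 3))
w_reduce = rng.standard_normal((D, C))
w_a = rng.standard_normal((C, D))
w_b = rng.standard_normal((C, D))
v, att = selective_kernel(x, k_small, k_large, w_reduce, w_a, w_b)
print(v.shape, att[:, 0])  # branch weights of channel 0 (they sum to 1)
```

Because the softmax is taken across branches separately for each channel, a channel whose descriptor favours the dilated branch effectively sees a 5×5 neighbourhood while another channel in the same layer stays close to 3×3; this is the sense in which each neuron adaptively allocates its receptive field.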

Overview

In recent years, with the continuous development of optical remote sensing technology, the availability of large numbers of high-resolution remote sensing images has advanced applications such as environmental monitoring, wildlife protection, and national defense. Among the many visual tasks on remote sensing images, aircraft detection is of great significance for both civil and defense applications, which makes research on remote sensing small object detection important. Deep learning-based object detection methods have achieved excellent results on large and medium objects, but their accuracy and practical applicability on small objects in remote sensing images remain poor. The main reasons are as follows: 1) the models are large and their real-time performance is poor; 2) remote sensing scenes are complex and object scales vary widely; 3) datasets for remote sensing small object detection are extremely scarce.

To solve the above problems, this paper proposes a robust small object detection method for remote sensing images based on multi-scale feature fusion. The main contributions are as follows. First, because an input image is repeatedly downsampled and convolved in common backbone networks (such as ResNet and VGG-16), the features of small objects are severely lost, which degrades the final detection accuracy. To address this, we use the size distribution of all objects in the dataset (i.e., prior knowledge) to design a lightweight feature extraction module based on a dynamic selection mechanism, which allows each neuron to adaptively adjust its receptive field size and controls the number of downsampling steps according to the object scales. Second, although FPN is widely used to reduce missed detections of small objects, features at different scales usually carry different amounts of information with different emphases. We therefore propose an FPN module based on adaptive weighted feature fusion, which uses group convolution to divide the feature channels into groups that do not interfere with one another, further improving the accuracy of the image feature representation. Third, to address the lack of remote sensing small object datasets, we built a remote sensing small-object dataset of airplanes and processed the plane and small-vehicle instances in the DOTA-1.5 dataset so that their size distribution meets the requirements of small object detection. Finally, experimental results on DOTA and the self-built dataset show that our method achieves the best results among the compared mainstream detection methods.
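The exact fusion formula is not given in this overview. The following is a minimal NumPy sketch of group-wise adaptive weighted fusion, assuming three pyramid levels already projected to the same channel count, nearest-neighbour upsampling, one learnable weight per (channel group, level) normalized with the "fast normalized fusion" rule of EfficientDet's BiFPN (a swapped-in choice, not necessarily the authors'), and a 1×1 group convolution whose groups do not interact. All names (`upsample_nearest`, `grouped_weighted_fusion`, `grouped_conv1x1`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch; the weighting rule and layer shapes are assumptions.


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def grouped_weighted_fusion(features, raw_weights, groups, eps=1e-4):
    """Fuse L same-shape (C, H, W) maps with one weight per (channel group, level).

    raw_weights has shape (groups, L). Weights are clipped at zero and
    normalised within each group, so every group decides on its own how much
    each pyramid level contributes.
    """
    stacked = np.stack(features)                   # (L, C, H, W)
    num_levels, channels = stacked.shape[:2]
    assert raw_weights.shape == (groups, num_levels)
    assert channels % groups == 0
    w = np.maximum(raw_weights, 0.0)
    w = w / (w.sum(axis=1, keepdims=True) + eps)   # (groups, L)
    per_channel = np.repeat(w.T, channels // groups, axis=1)  # (L, C), contiguous groups
    fused = (per_channel[:, :, None, None] * stacked).sum(axis=0)
    return fused, w


def grouped_conv1x1(x, kernels):
    """1x1 group convolution; kernels has shape (groups, out_per_group, in_per_group).

    Each output group only sees its own input group (a block-diagonal 1x1
    weight matrix), so channel groups do not affect each other.
    """
    parts = np.split(x, kernels.shape[0], axis=0)
    outs = [np.einsum("oi,ihw->ohw", k, p) for k, p in zip(kernels, parts)]
    return np.concatenate(outs, axis=0)


# Three hypothetical pyramid levels (strides 8, 16, 32 of a 128x128 input).
rng = np.random.default_rng(0)
C, G = 8, 4
p3 = rng.standard_normal((C, 16, 16))
p4 = rng.standard_normal((C, 8, 8))
p5 = rng.standard_normal((C, 4, 4))
levels = [p3, upsample_nearest(p4, 2), upsample_nearest(p5, 4)]
fused, w = grouped_weighted_fusion(levels, rng.standard_normal((G, 3)), G)
out = grouped_conv1x1(fused, rng.standard_normal((G, C // G, C // G)))
print(fused.shape, out.shape)
```

Giving each channel group its own level weights lets one group lean on the high-resolution stride-8 map while another draws more on the semantically stronger stride-32 map; the block-diagonal 1×1 convolution then refines each group without mixing it with the others.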