Wu G, Ge Y, Chu J, et al. Cascade pooling self-attention research for remote sensing image retrieval[J]. Opto-Electron Eng, 2022, 49(12): 220029. doi: 10.12086/oee.2022.220029

Cascade pooling self-attention research for remote sensing image retrieval

    Fund Project: National Natural Science Foundation of China (42261070, 41801288, 41261091, 62162045), and Natural Science Foundation of Jiangxi Province (20202BAB212011)
  • In high-resolution remote sensing image retrieval, the complex image content and rich detail make it difficult for features extracted by a convolutional neural network to effectively express the salient information of an image. To address this issue, a self-attention module based on cascade pooling is proposed to improve the feature representation of convolutional neural networks. First, a cascade pooling self-attention module is designed; by establishing semantic dependencies, it learns the key salient features of an image. Cascade pooling first applies max pooling over small regions and then applies average pooling to the max-pooled feature map. Using cascade pooling inside the self-attention module lets the module attend to the salient information of the image while retaining important details, which enhances feature discrimination. Next, the cascade pooling self-attention module is embedded into the convolutional neural network for feature optimization and extraction. Finally, to further improve retrieval efficiency, supervised hashing with kernels is applied to reduce the feature dimensionality, and the resulting low-dimensional hash codes are used for remote sensing image retrieval. Experimental results on the UC Merced, AID and NWPU-RESISC45 datasets show that the proposed method effectively improves retrieval performance.
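The two-stage cascade pooling described above can be illustrated on a single channel of a feature map. The following is a minimal sketch in plain Python, assuming non-overlapping square regions and a map whose height and width are divisible by the region size; the function names and the default region size of 2 are hypothetical, since the abstract does not state the region size or how borders are handled. Global max and global average pooling are included for comparison.

```python
from typing import List

Matrix = List[List[float]]


def cascade_pool(fmap: Matrix, region: int = 2) -> float:
    """Cascade pooling of one channel (sketch).

    Stage 1: max pooling over non-overlapping region x region windows.
    Stage 2: average pooling over the resulting max-pooled map.
    Assumes height and width are divisible by `region` (hypothetical simplification).
    """
    h, w = len(fmap), len(fmap[0])
    if h % region or w % region:
        raise ValueError("feature map size must be divisible by region")
    # Stage 1: keep the strongest response inside each small region.
    maxima = [
        max(fmap[i + di][j + dj] for di in range(region) for dj in range(region))
        for i in range(0, h, region)
        for j in range(0, w, region)
    ]
    # Stage 2: average the regional maxima into one channel descriptor.
    return sum(maxima) / len(maxima)


def global_max_pool(fmap: Matrix) -> float:
    return max(max(row) for row in fmap)


def global_avg_pool(fmap: Matrix) -> float:
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))


# Toy 4x4 channel: a weak detail (1) top-left and a strong response (8) bottom-right.
example = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 8],
    [0, 0, 0, 0],
]
print(global_max_pool(example))   # 8
print(global_avg_pool(example))   # 0.5625
print(cascade_pool(example, 2))   # 2.25
```

On this toy map, global max pooling keeps only the strongest response (8), global average pooling dilutes it with empty regions (0.5625), and cascade pooling gives 2.25, keeping the dominant response while still counting the weaker detail in the top-left region. With `region=1` the sketch reduces to global average pooling, and with a region covering the whole map it reduces to global max pooling, so the region size sets the balance between the two.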
  • With the development of remote sensing satellite technology and the growing market for remote sensing images (RSIs), content-based remote sensing image retrieval (RSIR) plays an irreplaceable role in many fields, such as economic and social development, resource and environmental monitoring, and urban management. However, high-resolution RSIs contain complex content and rich background information, so features extracted by convolutional neural networks struggle to express their salient information effectively. To address this problem in high-resolution RSIR, a self-attention mechanism based on cascade pooling is proposed to enhance the feature representation of convolutional neural networks. First, a cascade pooling self-attention module is designed. Cascade pooling first applies max pooling over small regions and then applies average pooling to the max-pooled feature map. Compared with traditional global pooling, cascade pooling combines the advantages of max pooling and average pooling: it attends to the salient information of the RSIs while also retaining crucial details. Cascade pooling is used in the self-attention module, which comprises spatial self-attention and channel self-attention. Spatial self-attention combines self-attention with location-correlation-based spatial attention; it uses spatial weights to enhance object regions of interest and suppress irrelevant background regions, strengthening the spatial description ability of the features. Channel self-attention combines self-attention with content-correlation-based channel attention and assigns weights to different channels by linking contextual information. Each channel can be regarded as the response to one class of features, and channels that contribute more receive larger weights, which enhances the discrimination of salient channel features.
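The abstract does not give the equations of the spatial and channel self-attention branches, so the following is only a hypothetical sketch of how a channel branch could combine cascade pooling with self-attention; it is not the authors' module. All of the following are assumptions: each channel is summarized by its cascade-pooled response; channel-to-channel affinities are products of these scalar descriptors normalized with softmax; the attended context is squashed with a sigmoid into a per-channel weight. A spatial branch could follow an analogous pattern, with affinities computed between positions instead of channels.

```python
import math
from typing import List

Matrix = List[List[float]]


def cascade_pool(fmap: Matrix, region: int = 2) -> float:
    """Max pooling over non-overlapping region x region windows, then the mean of the maxima."""
    h, w = len(fmap), len(fmap[0])
    if h % region or w % region:
        raise ValueError("feature map size must be divisible by region")
    maxima = [
        max(fmap[i + di][j + dj] for di in range(region) for dj in range(region))
        for i in range(0, h, region)
        for j in range(0, w, region)
    ]
    return sum(maxima) / len(maxima)


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_self_attention(features: List[Matrix], region: int = 2) -> List[Matrix]:
    """Hypothetical channel self-attention over C channel maps (not the authors' design)."""
    # 1. One cascade-pooled descriptor per channel.
    desc = [cascade_pool(ch, region) for ch in features]
    weights = []
    for d_i in desc:
        # 2. Affinity of channel i with every channel (product of descriptors), softmax-normalized.
        attn = softmax([d_i * d_j for d_j in desc])
        # 3. Attended context: affinity-weighted mean of all descriptors.
        context = sum(a * d_j for a, d_j in zip(attn, desc))
        # 4. Squash the context into a channel weight in (0, 1).
        weights.append(sigmoid(context))
    # 5. Rescale every value of each channel by that channel's weight.
    return [[[v * w for v in row] for row in ch] for ch, w in zip(features, weights)]


feats = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0: uniform, weak response
    [[4.0, 0.0], [0.0, 0.0]],  # channel 1: one strong local response
]
out = channel_self_attention(feats, region=2)
```

In the toy input, the second channel has the stronger cascade-pooled response (4 versus 1) and therefore receives the larger weight. For non-negative (e.g. post-ReLU) descriptors this ordering always holds in the sketch: a larger descriptor sharpens the softmax toward the larger channels, so it never lowers the attended context.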
By establishing semantic dependencies, the cascade pooling self-attention module learns crucial salient features of the RSIs. The module is then embedded into convolutional neural networks to extract and optimize features. Finally, to further increase retrieval efficiency, supervised hashing with kernels is applied to reduce the feature dimensionality, and the resulting low-dimensional hash codes are used for RSIR. In experiments on the UC Merced, AID and NWPU-RESISC45 datasets, the mean average precision (mAP) reaches 98.23%, 94.96% and 94.53%, respectively. The results show that, compared with existing retrieval methods, the proposed method effectively improves retrieval accuracy. In summary, cascade pooling self-attention and supervised hashing with kernels optimize features from two complementary aspects, network structure and feature compression respectively, which enhances the feature representation and improves retrieval performance.
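The retrieval stage described above (kernel-based supervised hashing, then search with compact binary codes scored by mean average precision) can be sketched as follows. This is a minimal illustration under stated assumptions: only the encoding form of the KSH hash functions is shown, and the anchor points, the per-anchor kernel means `mu` and the coefficient matrix `A` are assumed to have been learned already by KSH training, which is omitted. The Gaussian kernel, `sigma`, the toy parameters and all helper names are hypothetical choices, not the authors' settings, and the exact mAP protocol used for the reported figures (for example, any top-k cut-off) is not given in this abstract.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, sigma: float) -> float:
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def ksh_encode(x: Vector, anchors: List[Vector], mu: List[float],
               A: List[List[float]], sigma: float) -> List[int]:
    """Encode x with kernel hash functions of the KSH form
    bit_k = sgn(sum_j (k(anchor_j, x) - mu_j) * A[j][k]).

    `anchors`, `mu` and `A` (m x r) are assumed to come from KSH training (not shown).
    Bits are stored as 1/0 instead of +1/-1; a zero projection maps to 0.
    """
    centred = [rbf_kernel(a, x, sigma) - m for a, m in zip(anchors, mu)]
    n_bits = len(A[0])
    return [
        1 if sum(centred[j] * A[j][b] for j in range(len(centred))) > 0 else 0
        for b in range(n_bits)
    ]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query: Sequence[int], db: List[Sequence[int]]) -> List[int]:
    """Database indices sorted by Hamming distance to the query (stable for ties)."""
    return sorted(range(len(db)), key=lambda i: hamming(query, db[i]))


def average_precision(ranked_labels: Sequence[str], query_label: str) -> float:
    """AP over the full ranked list: mean of precision@k at each relevant rank."""
    hits, precisions = 0, []
    for k, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(aps: Sequence[float]) -> float:
    """mAP: average of the per-query AP values."""
    return sum(aps) / len(aps)


# Toy setup with hypothetical "learned" parameters: 2 anchors, 2-bit codes.
anchors = [[0.0, 0.0], [1.0, 1.0]]
mu = [0.5, 0.5]
A = [[1.0, -1.0], [-1.0, 1.0]]
print(ksh_encode([0.0, 0.0], anchors, mu, A, sigma=1.0))  # [1, 0]
print(ksh_encode([1.0, 1.0], anchors, mu, A, sigma=1.0))  # [0, 1]

# Hamming ranking over a toy database of 4-bit codes, then AP for one query.
db_codes = [[0, 1, 0, 0], [1, 0, 1, 1], [1, 0, 1, 0]]
db_labels = ["forest", "harbor", "harbor"]
order = rank_by_hamming([1, 0, 1, 1], db_codes)                  # [1, 2, 0]
ap = average_precision([db_labels[i] for i in order], "harbor")  # 1.0
```

Because each image is reduced to a short bit string, comparing a query with the database costs only bit comparisons, which is where the efficiency gain of hashing comes from; the learned kernel projections are meant to keep images of the same class at small Hamming distances.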

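Corresponding author: Chen Bin, bchen63@163.com
  • 1. School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142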


Figures(7)

Tables(6)
