Fusion attention and feature enhancement of small targets detection in remote sensing images

Shen Xueli; Guo Xu

doi:10.12086/oee.2026.250267

Abstract

Objective Remote sensing image target detection plays a crucial role in a wide range of practical applications, including resource exploration, urban planning and management, environmental monitoring, and disaster emergency response. With the rapid development of high-resolution sensors and large-scale data acquisition technologies, the demand for accurate and efficient object detection methods has increased significantly. However, targets in remote sensing images are typically characterized by small size, large scale variations, complex and cluttered backgrounds, blurred boundaries, and insufficient texture information, which significantly increases the difficulty of effective feature extraction and reliable detection. In addition, factors such as imaging altitude, atmospheric interference, and illumination variations further degrade image quality, resulting in low contrast between targets and the surrounding environment. Despite substantial progress in recent years, small target detection in remote sensing imagery still faces several critical challenges. From a localization perspective, commonly used metrics such as Intersection over Union are overly sensitive to slight variations in bounding boxes when applied to small targets, leading to unstable regression optimization and reduced localization accuracy. From a data perspective, the limited number of small target samples and the imbalance in category distribution often cause models to underrepresent small object features, resulting in missed detections. 
Moreover, in complex environments, small targets are often mixed with background regions that share similar textures or visual patterns, making it more difficult to accurately separate foreground from background and achieve precise localization. Therefore, improving the accuracy, robustness, and generalization capability of small target detection in remote sensing images, while enhancing training stability and localization precision under complex conditions, remains a critical and meaningful research problem.
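The sensitivity of Intersection over Union to slight box variations, noted above for small targets, can be illustrated numerically. The following is a minimal sketch and not part of the proposed method; the helper names `iou` and `shifted_iou`, the box sizes, and the 2-pixel offset are illustrative assumptions.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def shifted_iou(size, shift):
    """IoU between a size x size box and a copy shifted diagonally by `shift` pixels."""
    ground_truth = (0.0, 0.0, float(size), float(size))
    prediction = (float(shift), float(shift), float(size + shift), float(size + shift))
    return iou(ground_truth, prediction)


# The same 2-pixel localization error applied to a small and a large target.
small_target_iou = shifted_iou(size=8, shift=2)
large_target_iou = shifted_iou(size=64, shift=2)
print(f"8x8 target:   IoU = {small_target_iou:.3f}")
print(f"64x64 target: IoU = {large_target_iou:.3f}")
```

Under these assumptions, the 8x8 box falls below the common 0.5 matching threshold while the 64x64 box stays well above it, although both predictions are off by the same two pixels.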
Methods To address the above challenges, a remote sensing small target detection algorithm integrating attention mechanisms and feature enhancement is proposed. First, a multi-level feature collaboration strategy is constructed by combining channel attention and spatial attention with a gating mechanism. This design enables adaptive weighting of features across different levels, strengthens cross-scale information interaction, and effectively suppresses background interference. By dynamically emphasizing salient regions and reducing redundant responses, the model is able to capture subtle and discriminative features of small targets more reliably. Second, a multi-scale feature enhancement module is designed by integrating three-dimensional convolution with multi-scale feature encoding to form an enhanced neck structure. This module facilitates the fusion of semantic and spatial information across different scales, allowing the network to preserve global contextual information while retaining local details. The use of three-dimensional convolution further enhances the modeling of inter-scale dependencies, improving the representation of small targets under complex backgrounds and varying resolutions. Finally, SIoU (SCYLLA-IoU) and NWD (normalized Wasserstein distance) are jointly employed as regression loss functions to optimize bounding box prediction. This combination improves geometric alignment between predicted and ground-truth boxes, reduces sensitivity to scale differences, and stabilizes the optimization process. As a result, the model achieves faster convergence and more accurate localization, especially for small and densely distributed targets.
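The idea of pairing an IoU-type term with NWD in the regression loss can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain IoU is swapped in for SIoU to keep the example short, the equal weighting `alpha=0.5` and the normalization constant `c=12.8` are illustrative values that the abstract does not specify, and the function names are hypothetical. NWD models each box (cx, cy, w, h) as a 2D Gaussian and maps the Wasserstein distance between the two Gaussians into (0, 1] with an exponential.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two boxes given as (cx, cy, w, h).

    `c` is a dataset-dependent normalization constant; 12.8 is only an
    illustrative assumption here.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between Gaussians with mean (cx, cy)
    # and covariance diag(w^2 / 4, h^2 / 4).
    w2_squared = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
                  + ((wa - wb) / 2.0) ** 2 + ((ha - hb) / 2.0) ** 2)
    return math.exp(-math.sqrt(w2_squared) / c)


def iou_cxcywh(box_a, box_b):
    """Plain IoU for (cx, cy, w, h) boxes; a stand-in for SIoU in this sketch."""
    ax1, ay1 = box_a[0] - box_a[2] / 2.0, box_a[1] - box_a[3] / 2.0
    ax2, ay2 = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx1, by1 = box_b[0] - box_b[2] / 2.0, box_b[1] - box_b[3] / 2.0
    bx2, by2 = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union


def box_regression_loss(pred, target, alpha=0.5, c=12.8):
    """Weighted sum of an IoU-based term and an NWD-based term (assumed weighting)."""
    iou_term = 1.0 - iou_cxcywh(pred, target)
    nwd_term = 1.0 - nwd(pred, target, c)
    return alpha * iou_term + (1.0 - alpha) * nwd_term


# A 4x4 target and two non-overlapping predictions at different distances.
target = (16.0, 10.0, 4.0, 4.0)
near_pred = (10.0, 10.0, 4.0, 4.0)
far_pred = (36.0, 10.0, 4.0, 4.0)
for name, pred in (("near", near_pred), ("far", far_pred)):
    print(name, round(iou_cxcywh(pred, target), 3), round(nwd(pred, target), 3))
```

In this example both predictions have zero IoU with the target, so an IoU-only term cannot tell them apart, whereas NWD still ranks the nearer prediction higher. That property is one reason a distribution-based term is attractive for small boxes.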
Results and Discussions Extensive experiments are conducted on three widely used remote sensing datasets, namely NWPU VHR-10, DIOR, and RSOD, to evaluate the effectiveness and generalization capability of the proposed method. The results show that, compared with the baseline model, the improved algorithm achieves consistent performance gains across all datasets, with mAP@0.5 increasing by 9.9%, 3.1%, and 3.2% on NWPU VHR-10, DIOR, and RSOD, respectively. These improvements demonstrate that the proposed method effectively enhances feature representation and detection performance for small targets. Visualization results further indicate that the model can successfully identify targets that are easily missed by conventional approaches, particularly in scenarios with dense distributions and complex backgrounds. In addition, the proposed method shows improved robustness in distinguishing targets from visually similar background regions. Ablation studies confirm the effectiveness of each component, showing that the multi-level feature collaboration strategy, the multi-scale feature enhancement module, and the optimized loss function contribute synergistically to performance improvement. The results also suggest that the proposed framework maintains stable detection performance under different dataset distributions, indicating good generalization ability.
Conclusions In this study, an attention- and feature-enhancement-based small target detection algorithm for remote sensing images is proposed. By introducing a multi-level feature collaboration strategy and a multi-scale feature enhancement module, the method significantly improves cross-level feature interaction and strengthens the representation of small targets. Meanwhile, the joint optimization of SIoU and NWD loss functions further enhances localization accuracy and training efficiency. The proposed approach effectively addresses key challenges in remote sensing small target detection, including complex backgrounds, scale variations, and insufficient feature information. Experimental results on multiple benchmark datasets confirm the superiority, robustness, and generalization capability of the proposed method over the baseline model. These findings demonstrate its strong potential for practical applications in real-world remote sensing tasks, particularly in scenarios that require accurate detection of small and densely distributed targets. Furthermore, the proposed framework provides a reliable technical reference for future research and can serve as a foundation for further improvements in small target detection methods.