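• Summary: To address missed detections in remote sensing images caused by indistinct small-target features, complex backgrounds, and dense target distributions, a small target detection algorithm for remote sensing images that fuses attention and feature enhancement is proposed. First, a multi-level feature collaboration strategy is constructed that combines channel attention and spatial attention mechanisms and uses a gating mechanism to dynamically adjust information flow, effectively capturing latent correlations among features at different levels and strengthening the model's robustness in complex scenes. Second, a multi-scale feature enhancement module is designed that combines three-dimensional convolution with multi-scale feature encoding to form an enhanced neck network, improving information extraction and the feature representation of small targets. Finally, NWD combined with SIoU is adopted as the loss function, which improves training efficiency while further strengthening localization accuracy for small targets. Experimental results on the NWPU VHR-10, DIOR, and RSOD datasets show that, compared with the original model, the improved algorithm raises mAP@0.5 by 9.9%, 3.1%, and 3.2%, respectively. Visualization results show that the improved algorithm detects small targets missed by the original model and achieves higher detection accuracy than other mainstream detection algorithms.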

       

      Abstract:
      Objective Target detection in remote sensing images plays a crucial role in a wide range of practical applications, including resource exploration, urban planning and management, environmental monitoring, and disaster emergency response. With the rapid development of high-resolution sensors and large-scale data acquisition technologies, the demand for accurate and efficient object detection methods has increased significantly. However, targets in remote sensing images are typically small, vary widely in scale, and appear against complex, cluttered backgrounds with blurred boundaries and insufficient texture information, all of which make effective feature extraction and reliable detection considerably harder. In addition, factors such as imaging altitude, atmospheric interference, and illumination variations further degrade image quality, resulting in low contrast between targets and the surrounding environment. Despite substantial progress in recent years, small target detection in remote sensing imagery still faces several critical challenges. From a localization perspective, commonly used metrics such as Intersection over Union (IoU) are overly sensitive to slight variations in bounding boxes when applied to small targets, leading to unstable regression optimization and reduced localization accuracy. From a data perspective, the limited number of small target samples and the imbalance in category distribution often cause models to underrepresent small object features, resulting in missed detections. Moreover, in complex environments, small targets are often mixed with background regions that share similar textures or visual patterns, making it more difficult to separate foreground from background and achieve precise localization. Therefore, improving the accuracy, robustness, and generalization capability of small target detection in remote sensing images, while enhancing training stability and localization precision under complex conditions, remains a critical and meaningful research problem.
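The sensitivity of IoU for small targets noted above can be illustrated with a short calculation. The sketch below is only an illustration and is not part of the proposed method; the box sizes (6 and 36 pixels), the one-pixel diagonal shift, and the helper names `iou` and `shifted_iou` are assumptions chosen for the example.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift=1.0):
    """IoU between a square box and the same box shifted diagonally."""
    gt = (0.0, 0.0, size, size)
    pred = (shift, shift, size + shift, size + shift)
    return iou(gt, pred)


# Assumed example sizes: a 6x6 "small" target and a 36x36 "large" target.
small = shifted_iou(6.0)
large = shifted_iou(36.0)
print(f"6x6 box, 1 px shift:   IoU = {small:.3f}")
print(f"36x36 box, 1 px shift: IoU = {large:.3f}")
```

With the same one-pixel offset, the small box loses far more IoU than the large one, which is the instability in regression that the Objective describes.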
      Methods To address these challenges, a remote sensing small target detection algorithm integrating attention mechanisms and feature enhancement is proposed. First, a multi-level feature collaboration strategy is constructed by combining channel attention and spatial attention with a gating mechanism. This design enables adaptive weighting of features across different levels, strengthens cross-scale information interaction, and effectively suppresses background interference. By dynamically emphasizing salient regions and reducing redundant responses, the model captures subtle and discriminative features of small targets more reliably. Second, a multi-scale feature enhancement module is designed by integrating three-dimensional convolution with multi-scale feature encoding to form an enhanced neck structure. This module facilitates the fusion of semantic and spatial information across different scales, allowing the network to preserve global contextual information while retaining local details. The use of three-dimensional convolution further enhances the modeling of inter-scale dependencies, improving the representation of small targets under complex backgrounds and varying resolutions. Finally, SIoU and the Normalized Wasserstein Distance (NWD) are jointly employed as regression loss functions to optimize bounding box prediction. This combination improves geometric alignment between predicted and ground-truth boxes, reduces sensitivity to scale differences, and stabilizes the optimization process. As a result, the model achieves faster convergence and more accurate localization, especially for small and densely distributed targets.
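The role of NWD in the regression loss can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard NWD formulation in which each box is modeled as a 2D Gaussian, the constant `c=12.8` is a placeholder for a dataset-dependent value, a plain IoU loss stands in for SIoU, and the weight `alpha` and the name `combined_loss` are hypothetical, since the abstract does not state how the two terms are combined.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between boxes given as (cx, cy, w, h).

    Each box is treated as a Gaussian with mean (cx, cy) and standard
    deviations (w/2, h/2). `c` is an assumed dataset-dependent constant.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


def iou_cxcywh(box_a, box_b):
    """Plain IoU for boxes given as (cx, cy, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter = (max(0.0, min(ax2, bx2) - max(ax1, bx1))
             * max(0.0, min(ay2, by2) - max(ay1, by1)))
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def combined_loss(pred, gt, alpha=0.5, c=12.8):
    """Hypothetical weighted sum of an NWD loss and an IoU-style loss.

    Plain IoU is used here only as a stand-in for SIoU.
    """
    return alpha * (1.0 - nwd(pred, gt, c)) + (1.0 - alpha) * (1.0 - iou_cxcywh(pred, gt))


gt = (10.0, 10.0, 6.0, 6.0)
near = (17.0, 10.0, 6.0, 6.0)   # no overlap with gt, but close
far = (30.0, 10.0, 6.0, 6.0)    # no overlap with gt, far away
for name, pred in (("near", near), ("far", far)):
    print(name, "IoU =", iou_cxcywh(pred, gt), "NWD =", round(nwd(pred, gt), 3))
```

For both non-overlapping predictions the IoU is zero, so an IoU-only term cannot tell them apart, whereas NWD still decreases smoothly with distance. This is one way to see why an NWD term can help the localization of small targets.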
      Results and Discussions Extensive experiments are conducted on three widely used remote sensing datasets, NWPU VHR-10, DIOR, and RSOD, to evaluate the effectiveness and generalization capability of the proposed method. Compared with the baseline model, the improved algorithm achieves consistent performance gains across all three datasets, with mAP@0.5 increasing by 9.9%, 3.1%, and 3.2%, respectively. These improvements demonstrate that the proposed method effectively enhances feature representation and detection performance for small targets. Visualization results further indicate that the model identifies small targets that the baseline model misses, particularly in scenarios with dense distributions and complex backgrounds. In addition, the proposed method shows improved robustness in distinguishing targets from visually similar background regions. Ablation studies confirm the effectiveness of each component, showing that the multi-level feature collaboration strategy, the multi-scale feature enhancement module, and the optimized loss function contribute synergistically to performance improvement. The results also suggest that the proposed framework maintains stable detection performance under different dataset distributions, indicating good generalization ability.
      Conclusions In this study, a small target detection algorithm for remote sensing images that integrates attention mechanisms and feature enhancement is proposed. By introducing a multi-level feature collaboration strategy and a multi-scale feature enhancement module, the method significantly improves cross-level feature interaction and strengthens the representation of small targets. Meanwhile, the joint use of the SIoU and NWD loss functions further enhances localization accuracy and training efficiency. The proposed approach effectively addresses key challenges in remote sensing small target detection, including complex backgrounds, scale variations, and insufficient feature information. Experimental results on multiple benchmark datasets confirm the superiority, robustness, and generalization capability of the proposed method over the baseline model. These findings demonstrate its strong potential for practical applications in real-world remote sensing tasks, particularly in scenarios that require accurate detection of small and densely distributed targets. Furthermore, the proposed framework provides a technical reference for future research and can serve as a foundation for further improvements in small target detection methods.