• Abstract: Small object detection has long been a highly challenging task in computer vision. Small objects in UAV aerial images are numerous and densely packed, and traditional detection algorithms suffer from missed and false detections; to address this, we propose FFC-YOLO (FASFF-FreqFusion-CAA YOLO), an object detection algorithm based on frequency-aware feature fusion. First, a P2 detection layer is added and the detection head is redesigned as a Detect-FASFF (four adaptively spatial feature fusion) structure for small object detection. Then, the conventional feature-fusion sampling operations are replaced with the FreqFusion (frequency-aware feature fusion) method, combined with a BiFPN (bidirectional feature pyramid network) structure, improving the algorithm's ability to handle images of dense small objects. Finally, a CAA (context anchor attention) mechanism is added to the C2PSA module, strengthening the association of contextual feature information around targets. On the VisDrone2019 dataset, FFC-YOLO achieves an mAP@0.5 of 40%, which exceeds traditional algorithms such as Fast-RCNN and RetinaNet by 28.4% and 18.6%, and the YOLOv8n, YOLOv10n, and YOLOv11n baselines by 8.0%, 7.6%, and 7.8%, respectively. On the self-built small-object dataset tiny-data, covering the three target classes sperson, lperson, and wperson, FFC-YOLO improves mAP@0.5, precision (P), and recall (R) over YOLOv11n by 9.2%, 8.7%, and 5.9%, respectively. The experimental results show that the FFC-YOLO small object detection algorithm is well suited to UAV aerial image detection.


      Abstract:
      Objective Small object detection remains a significant and persistent challenge within the computer vision domain. This challenge is particularly acute in the context of drone aerial imagery, where objects captured from high altitudes appear extremely small, densely packed, and often exhibit blurred features due to low resolution and environmental factors. Traditional object detection algorithms frequently suffer from performance degradation in these scenarios, manifesting as high rates of missed detections and false positives. To address these critical limitations, this paper proposes a novel object detection algorithm named FFC-YOLO (FASFF-FreqFusion-CAA YOLO), which is built upon a frequency-aware feature fusion framework.
      Methods The FFC-YOLO algorithm rests on three systematic and synergistic innovations that enhance feature representation, feature fusion, and contextual understanding for small, densely packed objects. First, to strengthen multi-scale feature extraction, we redesign the detection head as a Four Adaptively Spatial Feature Fusion (Detect-FASFF) structure, augmented with an additional P2 detection layer that captures the finer-grained spatial information crucial for identifying minuscule targets. The FASFF mechanism uses four adaptive branches to dynamically aggregate features across scales, markedly improving the model's sensitivity and discriminative power for small objects of varying sizes.
      Second, we optimize the feature fusion pathway to counteract the information loss common to conventional sampling operations (e.g., up-sampling and down-sampling) by introducing a Frequency-aware Feature Fusion (FreqFusion) module. FreqFusion decomposes features into frequency components, so that high-frequency details (essential for the edge and texture information of small objects) and low-frequency semantics can be integrated more deliberately and effectively. This module is then coupled with a Bidirectional Feature Pyramid Network (BiFPN) structure; the resulting architecture enables efficient, multi-level bidirectional cross-scale information flow and mitigates feature loss, category misclassification, and bounding-box localization drift in scenes crowded with small objects. Minimal code sketches of these two ideas follow.
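      The following is a minimal PyTorch sketch of the four-branch adaptive spatial feature fusion idea. The class name `FourWayASFF` and all hyperparameters here are illustrative assumptions, not the paper's exact Detect-FASFF implementation: each level predicts a per-pixel weight logit, and the four resized feature maps are blended through a softmax over those logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FourWayASFF(nn.Module):
    """Illustrative four-level adaptive spatial feature fusion (not the paper's exact module)."""
    def __init__(self, channels: int):
        super().__init__()
        # One 1x1 conv per input level produces a single-channel weight logit.
        self.weight_convs = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(4)
        )
        self.out_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feats):  # feats: four tensors with equal channel counts
        target_size = feats[0].shape[-2:]  # fuse at the finest (P2-like) resolution
        resized = [F.interpolate(f, size=target_size, mode="nearest") for f in feats]
        # Per-pixel softmax across the four levels -> adaptive spatial weights.
        logits = torch.cat([conv(f) for conv, f in zip(self.weight_convs, resized)], dim=1)
        weights = logits.softmax(dim=1)  # shape (B, 4, H, W)
        fused = sum(weights[:, i:i + 1] * resized[i] for i in range(4))
        return self.out_conv(fused)

# Example: four pyramid levels at strides 4/8/16/32 of a 256x256 input.
feats = [torch.randn(1, 64, 256 // s, 256 // s) for s in (4, 8, 16, 32)]
print(FourWayASFF(64)(feats).shape)  # torch.Size([1, 64, 64, 64])
```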
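      Similarly, the sketch below illustrates frequency-aware fusion combined with BiFPN-style learnable weighting, under loudly stated assumptions: the actual FreqFusion module predicts adaptive per-pixel low-pass and high-pass filters, whereas this simplification uses a fixed average-pooling low-pass and its residual as the high-pass, and `FreqAwareFuse` is a hypothetical name.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FreqAwareFuse(nn.Module):
    """Simplified frequency-aware fusion of a high-res and a low-res pyramid level."""
    def __init__(self, channels: int, blur_kernel: int = 3):
        super().__init__()
        self.low_pass = nn.AvgPool2d(blur_kernel, stride=1, padding=blur_kernel // 2)
        self.w = nn.Parameter(torch.ones(2))  # BiFPN-style fast-normalized weights
        self.out_conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, hi_res, lo_res):
        # Low-pass the coarse semantics before upsampling to suppress aliasing.
        lo_up = F.interpolate(self.low_pass(lo_res), size=hi_res.shape[-2:],
                              mode="bilinear", align_corners=False)
        # High-pass residual preserves the edges/texture that small objects rely on.
        hi_detail = hi_res - self.low_pass(hi_res)
        w = F.relu(self.w)
        w = w / (w.sum() + 1e-4)  # normalized, always-positive fusion weights
        return self.out_conv(w[0] * (hi_res + hi_detail) + w[1] * lo_up)

p3 = torch.randn(1, 64, 80, 80)  # higher-resolution pyramid level
p4 = torch.randn(1, 64, 40, 40)  # lower-resolution pyramid level
print(FreqAwareFuse(64)(p3, p4).shape)  # torch.Size([1, 64, 80, 80])
```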
      Third, to enhance the model's ability to leverage surrounding semantic context (a vital clue when object appearance is ambiguous), we integrate a Context Anchor Attention (CAA) mechanism into the C2PSA module. This attention mechanism establishes dynamic associations between anchor points and their contextual surroundings, enabling the feature extraction process to selectively focus on and amplify informative regions around potential targets. Such contextual awareness strengthens feature representation and helps distinguish small objects from complex backgrounds; a sketch of the gating pattern follows.
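      As a rough illustration only (`CAABlock` and its kernel sizes are assumptions, not the paper's exact CAA configuration), the following sketch shows a common context-attention pattern: pooled features pass through horizontal and vertical depthwise strip convolutions, cheaply approximating a large-kernel context window, and the result gates the input through a sigmoid mask.

```python
import torch
import torch.nn as nn

class CAABlock(nn.Module):
    """Context-attention sketch: strip convs build a context map that gates x."""
    def __init__(self, channels: int, strip_kernel: int = 11):
        super().__init__()
        k, pad = strip_kernel, strip_kernel // 2
        self.pool = nn.AvgPool2d(7, stride=1, padding=3)
        self.proj_in = nn.Conv2d(channels, channels, 1)
        # 1xk then kx1 depthwise convs: a k x k receptive field at O(k) cost.
        self.strip_h = nn.Conv2d(channels, channels, (1, k), padding=(0, pad), groups=channels)
        self.strip_v = nn.Conv2d(channels, channels, (k, 1), padding=(pad, 0), groups=channels)
        self.proj_out = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        ctx = self.proj_out(self.strip_v(self.strip_h(self.proj_in(self.pool(x)))))
        return x * ctx.sigmoid()  # context-gated features

x = torch.randn(1, 64, 40, 40)
print(CAABlock(64)(x).shape)  # torch.Size([1, 64, 40, 40])
```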
      Results and Discussions The efficacy of the proposed FFC-YOLO algorithm is validated through extensive experiments on two datasets. On the public VisDrone2019 dataset, a standard benchmark for drone vision, FFC-YOLO achieves a mean Average Precision (mAP@0.5) of 40.0%. This represents gains of 28.4% and 18.6% over the traditional Fast-RCNN and RetinaNet detectors, and of 8.0%, 7.6%, and 7.8% over the widely recognized YOLOv8n, YOLOv10n, and YOLOv11n baselines, respectively. To further verify its generalization capability and its performance on extremely small targets, we conducted additional experiments on a custom-built, challenging dataset named `tiny-data`, which contains three categories of progressively smaller persons (`sperson`, `lperson`, `wperson`). On this dataset, FFC-YOLO outperforms the strong YOLOv11n baseline, with improvements of 9.2% in mAP@0.5, 8.7% in Precision (P), and 5.9% in Recall (R). These consistent gains across both the public and the self-built dataset confirm that FFC-YOLO generalizes well and excels at the demanding task of drone-based aerial image analysis, particularly long-range, dense small object detection.
      Conclusions In conclusion, this paper presents a comprehensive and effective solution to the small object detection problem in aerial imagery. By innovating at the levels of feature extraction (FASFF head), feature fusion (FreqFusion-BiFPN), and contextual modeling (CAA attention), the FFC-YOLO framework successfully addresses key shortcomings of prior methods. The experimental evidence underscores its potential for practical application in UAV-based surveillance, traffic monitoring, and search-and-rescue operations. For future work, we plan to explore model compression techniques such as pruning and knowledge distillation to streamline the model for efficient deployment on resource-constrained edge devices, thereby bridging the gap between high accuracy and real-time performance.