Citation: Xiao Z J, Lai S Y, Qu H C. Multi-level refined UAV image target detection[J]. Opto-Electron Eng, 2025, 52(5): 240287. doi: 10.12086/oee.2025.240287
In UAV (unmanned aerial vehicle) image target detection, complex backgrounds, variable lighting, target occlusion, and scale variation often lead to missed and false detections. To address these challenges and improve both detection accuracy and robustness, we propose a multi-level fine-grained object detection algorithm for UAV images. The algorithm integrates several components that improve feature representation across scales, suppress background noise, and accelerate model convergence.
First, we introduce a cross-stage partial selective multi-scale feature fusion (CSP-SMSFF) module, which combines multi-scale feature extraction with feature fusion enhancement. By incrementally enlarging the convolution kernels and fusing channels, it captures target features at multiple scales, which improves detection accuracy for both small and large targets.
Next, we incorporate the adaptive fine-grained channel attention (AFGCAttention) mechanism. It dynamically adjusts channel feature representations, improving the algorithm's ability to perceive and discriminate important multi-scale sample features. It is particularly effective under complex backgrounds and occlusion, where fine-grained mapping information must be retained while background noise is suppressed. This dynamic adjustment reduces missed detections, especially for small and edge targets.
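The channel reweighting idea behind attention mechanisms of this kind can be illustrated with a generic squeeze-and-excitation-style gate. This is only a minimal sketch and not the AFGCAttention design itself, whose internals are not specified in this abstract. The function name `channel_attention` and the `weights` and `bias` parameters (which would be learned in a real network) are hypothetical.

```python
import math


def channel_attention(feature_map, weights, bias):
    """Generic channel gate (illustrative only, not AFGCAttention).

    feature_map: list of C channels, each an H x W nested list of floats.
    weights:     C x C mixing matrix (hypothetical; learned in practice).
    bias:        list of C floats (hypothetical; learned in practice).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]

    # Excite: mix the descriptors linearly, then squash to (0, 1).
    gates = []
    for i in range(len(pooled)):
        z = sum(weights[i][j] * pooled[j] for j in range(len(pooled))) + bias[i]
        gates.append(1.0 / (1.0 + math.exp(-z)))

    # Rescale: multiply every value in a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates
```

Channels whose gate is close to 1 pass through almost unchanged, while channels with a gate near 0 are suppressed, which is how such a mechanism can damp background-dominated channels.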
Subsequently, we design the shared group convolution efficient detection head (SGCE-Head), which uses efficient multi-scale convolution (EMSPConv) to capture both globally important features and local fine details in the spatial-channel dimension. By improving the localization and recognition of multi-scale features, SGCE-Head reduces false detections and helps the algorithm distinguish targets from the background.
Finally, we propose the Inner-Powerful-IoUv2 loss function. It optimizes medium-quality anchor boxes and balances gradient strength between high-IoU and low-IoU (intersection over union) samples. It improves target localization accuracy, increases attention to the target region, and accelerates model convergence, so the model reaches higher detection accuracy in less training time.
Experimental results on the VisDrone2019 and VisDrone2021 datasets demonstrate the effectiveness of the proposed method. The algorithm achieves mAP@0.5 values of 47.5% and 45.3%, respectively, representing improvements of 5.7% and 4.7% over the baseline model. These results show that the algorithm outperforms existing state-of-the-art methods in detection accuracy while offering good versatility and real-time performance, making it suitable for a variety of application scenarios.
In conclusion, by combining the CSP-SMSFF module, the AFGCAttention mechanism, the SGCE-Head detection head, and the Inner-Powerful-IoUv2 loss function, the algorithm presented in this paper provides a comprehensive solution to the core challenges of UAV image target detection. It significantly enhances detection accuracy, robustness, and real-time performance.
This approach effectively addresses issues such as complex backgrounds, lighting variations, occlusion, and scale changes, while demonstrating strong versatility and achieving impressive results in practical applications.
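The loss named above builds on two published ideas: Inner-IoU, which measures overlap between auxiliary boxes scaled about each box centre, and Powerful-IoU v2, which adds an edge-distance penalty and a non-monotonic focusing factor. The following is a minimal sketch under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, `ratio=0.75` and `lam=1.3` are illustrative hyperparameters, and the two ideas are combined by substituting the inner IoU into the Powerful-IoU v2 form. The abstract does not give the authors' exact formulation, so this combination and all function names are hypothetical.

```python
import math


def iou(a, b):
    """Plain IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def scale_box(box, ratio):
    """Auxiliary box: same centre, width and height multiplied by ratio."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    hw, hh = (box[2] - box[0]) * ratio / 2, (box[3] - box[1]) * ratio / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)


def inner_iou(pred, gt, ratio=0.75):
    """IoU computed on the scaled auxiliary boxes (Inner-IoU idea)."""
    return iou(scale_box(pred, ratio), scale_box(gt, ratio))


def edge_penalty(pred, gt):
    """Mean edge distance normalised by ground-truth width and height.

    Assumes the ground-truth box has positive width and height.
    """
    w, h = gt[2] - gt[0], gt[3] - gt[1]
    return (abs(pred[0] - gt[0]) / w + abs(pred[2] - gt[2]) / w
            + abs(pred[1] - gt[1]) / h + abs(pred[3] - gt[3]) / h) / 4


def inner_piou_v2_loss(pred, gt, ratio=0.75, lam=1.3):
    """Hypothetical combination: inner IoU inside a PIoU v2-style loss."""
    p = edge_penalty(pred, gt)
    base = (1.0 - inner_iou(pred, gt, ratio)) + (1.0 - math.exp(-p * p))
    x = lam * math.exp(-p)                 # anchor-quality term scaled by lam
    focus = 3.0 * x * math.exp(-x * x)     # non-monotonic focusing factor
    return focus * base
```

The focusing factor is largest for intermediate values of its argument, so in this sketch medium-quality boxes receive the strongest weight, which matches the abstract's statement that the loss emphasises medium-quality anchor boxes.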
Schematic illustration of typical challenges in UAV image detection. (a) Complex background; (b) Sudden change in illumination; (c) Target occlusion; (d) Inconsistent scales
Overall architecture diagram
Structure of CSP-SMSFF
Structure of the SMSFF module
Structure of AFGCAttention
Structure of SGCE-Head
Visualization comparison of ablation experiments
Comparison of evaluation metrics between YOLO11n and the improved model
Comparison of visualization results on the VisDrone2019 dataset
Comparison of visualization results on the VisDrone2021 dataset