Hypergraph computed efficient transmission multi-scale feature small target detection algorithm

Sun Yemei; Sang Xueting; Zhang Yan; Liu Guorui; Chen Shuaiyu

doi:10.12086/oee.2025.250061

Sun Y M, Sang X T, Zhang Y, et al. Hypergraph computed efficient transmission multi-scale feature small target detection algorithm[J]. Opto-Electron Eng, 2025, 52(5): 250061. doi: 10.12086/oee.2025.250061

School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300380, China

Fund Project: National College Students Innovation and Entrepreneurship Training Program (202410792012), Tianjin Philosophy and Social Science Planning Project (TJGL19XSX-045)

*Corresponding authors: lgr@tcu.edu.cn; chenshuaiyu0906@163.com
CSTR: 32245.14.oee.2025.250061

Received Date 28 February 2025

Revised Date 08 April 2025

Accepted Date 08 April 2025

Published Date 30 May 2025

Abstract

UAV aerial images are characterized by complex backgrounds and small, densely distributed targets. To address the low precision and large parameter counts of existing UAV aerial image detectors, an efficient multi-scale feature transfer small target detection algorithm based on hypergraph computation is proposed. First, a multi-scale feature pyramid network is designed as the neck network. It fuses multi-layer features in the middle layer and transmits them directly to adjacent layers, which reduces the information loss caused by lengthy transmission paths. In addition, the feature fusion process uses hypergraphs to model higher-order features, improving the nonlinear representation ability of the model. Second, a lightweight dynamic task-guided detection head is designed. Through a sharing mechanism, it addresses, with few parameters, the inaccurate detections caused by the inconsistent classification and localization task spaces in the traditional decoupled head. Finally, layer-adaptive magnitude-based pruning is applied to further reduce the model size. Experimental results show that the algorithm outperforms other architectures on the VisDrone2019 dataset, reaching an mAP_0.5 of 42.4% with 4.8 M parameters. Compared with the baseline YOLOv8, the number of parameters is reduced by 54.7%. The model achieves a good balance between detection performance and resource consumption.
- small target detection
- hypergraph
- decoupled head
- lightweight

Overview

UAV aerial images are shot from high angles, so they typically show complex backgrounds and small, densely distributed targets, while existing detection models often suffer from insufficient accuracy and redundant parameters. To address these problems, this paper proposes an efficient multi-scale feature transfer small target detection algorithm based on hypergraph computation. By systematically improving the network architecture, the feature fusion mechanism, and the model compression strategy, the algorithm balances detection performance against computational efficiency. For the network architecture, a multi-scale feature pyramid network is constructed as the neck. Unlike a traditional feature pyramid, which passes features layer by layer, this network uses a cross-layer feature aggregation mechanism to transmit the middle-layer features directly to the adjacent layers, which shortens the feature transmission path. Specifically, integrating shallow high-resolution features with deep semantic features alleviates the spatial information loss caused by long-distance transmission, so that the location information and texture features of small targets are better preserved. In the feature fusion stage, a hypergraph is introduced to overcome the restriction of traditional graph neural networks to binary (pairwise) relations. Each hyperedge connects multiple feature nodes, establishing a high-order feature interaction model that describes the nonlinear correlation between objects and the complex background in UAV images. This hypergraph structure can capture not only the geometric correlation between objects but also the potential relationships between object features and interference factors such as illumination change and occlusion.
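The high-order interaction described above can be illustrated with a minimal sketch of two-stage hypergraph message passing. This is not the paper's implementation: the distance-threshold rule for building hyperedges, the plain mean aggregation, the residual update, and the function names `build_hyperedges` and `hypergraph_message_passing` are all assumptions made for illustration, and a trained model would also apply learnable weights at each stage.

```python
from math import dist


def build_hyperedges(features, eps):
    """Assumed construction: one hyperedge per vertex, containing every
    vertex whose feature lies within distance eps of it (itself included)."""
    return [
        [j for j, other in enumerate(features) if dist(center, other) <= eps]
        for center in features
    ]


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def hypergraph_message_passing(features, hyperedges):
    """One round of vertex -> hyperedge -> vertex aggregation."""
    # Stage 1: each hyperedge summarizes all of its member vertices at once,
    # which is what lets it express a relation among more than two nodes.
    edge_feats = [mean([features[v] for v in edge]) for edge in hyperedges]

    # Stage 2: each vertex averages the hyperedges it belongs to and adds
    # the result to its own feature (residual update, an assumption here).
    updated = []
    for v, x in enumerate(features):
        incident = [edge_feats[e] for e, edge in enumerate(hyperedges) if v in edge]
        message = mean(incident) if incident else [0.0] * len(x)
        updated.append([xi + mi for xi, mi in zip(x, message)])
    return updated


# Toy example: two nearby feature vectors and one distant outlier.
feats = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
edges = build_hyperedges(feats, eps=1.5)
print(edges)                                     # [[0, 1], [0, 1], [2]]
print(hypergraph_message_passing(feats, edges))  # [[0.5, 0.0], [1.5, 0.0], [20.0, 0.0]]
```

In the toy example the two nearby nodes exchange information through shared hyperedges, while the outlier only interacts with itself. An ordinary graph edge would have to express the same grouping as a set of separate pairwise links.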
Second, a lightweight dynamic task-guided detection head is designed. Through a sharing mechanism, it addresses, with few parameters, the inaccurate detections caused by the inconsistent classification and localization task spaces in the traditional decoupled head. Finally, a layer-adaptive magnitude-based pruning strategy replaces the single global pruning threshold of traditional methods: the weight distribution of each convolution layer is analyzed, and a pruning coefficient is computed from the sensitivity of each layer. Experimental results show that the proposed algorithm outperforms other architectures on the VisDrone2019 dataset, reaching an mAP_0.5 of 42.4% with 4.8 M parameters. Compared with the baseline YOLOv8, the number of parameters is reduced by 54.7%. The model achieves a good balance between detection performance and resource consumption.
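One common formulation of layer-adaptive magnitude pruning is the LAMP score of Lee et al. (2021), which divides each squared weight by the sum of the squared weights in the same layer that are at least as large. The sketch below assumes this formulation; the text here does not state the exact pruning coefficient the paper uses. The function names `lamp_scores` and `prune_by_lamp` and the flat-list weight layout are hypothetical choices for illustration.

```python
def lamp_scores(layer_weights):
    """LAMP-style score for every weight in one layer, in the input order.

    score(u) = w_u^2 / (sum of w_v^2 over weights v in the same layer
                        whose magnitude is at least that of u)
    """
    order = sorted(range(len(layer_weights)), key=lambda i: abs(layer_weights[i]))
    scores = [0.0] * len(layer_weights)
    tail = 0.0
    # Walk from the largest magnitude down, accumulating the suffix sum.
    for i in reversed(order):
        squared = layer_weights[i] ** 2
        tail += squared
        scores[i] = squared / tail if tail > 0 else 0.0
    return scores


def prune_by_lamp(layers, sparsity):
    """Zero out the given fraction of weights with the lowest scores,
    ranked across all layers. `layers` maps a layer name to a flat list."""
    scored = [
        (score, name, index)
        for name, weights in layers.items()
        for index, score in enumerate(lamp_scores(weights))
    ]
    scored.sort()
    n_prune = int(len(scored) * sparsity)
    pruned = {name: list(weights) for name, weights in layers.items()}
    for _, name, index in scored[:n_prune]:
        pruned[name][index] = 0.0
    return pruned


# Toy example: conv1 has much smaller weights than conv2.
layers = {"conv1": [3.0, -4.0], "conv2": [10.0, 20.0]}
print(prune_by_lamp(layers, sparsity=0.5))
# {'conv1': [0.0, -4.0], 'conv2': [0.0, 20.0]}
```

With a single global magnitude threshold at 50% sparsity, both `conv1` weights would be removed because they are the two smallest in absolute value. The layer-relative score instead keeps the largest weight of every layer (its score is always 1), so no layer is emptied.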