Citation: | Sun Y M, Sang X T, Zhang Y, et al. Hypergraph computed efficient transmission multi-scale feature small target detection algorithm[J]. Opto-Electron Eng, 2025, 52(5): 250061. doi: 10.12086/oee.2025.250061 |
[1] | Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580–587. https://doi.org/10.1109/CVPR.2014.81. |
[2] | He K M, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of 2017 IEEE International Conference on Computer Vision, 2017: 2980–2988. https://doi.org/10.1109/ICCV.2017.322. |
[3] | Cai Z W, Vasconcelos N. Cascade R-CNN: delving into high quality object detection[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 6154–6162. https://doi.org/10.1109/CVPR.2018.00644. |
[4] | Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779–788. https://doi.org/10.1109/CVPR.2016.91. |
[5] | Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the 14th European Conference on Computer Vision, 2016: 21–37. https://doi.org/10.1007/978-3-319-46448-0_2. |
[6] | Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of 2017 IEEE International Conference on Computer Vision, 2017: 2999–3007. https://doi.org/10.1109/ICCV.2017.324. |
[7] | Ghiasi G, Lin T Y, Le Q V. NAS-FPN: learning scalable feature pyramid architecture for object detection[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 7029–7038. https://doi.org/10.1109/CVPR.2019.00720. |
[8] | Jiang Y Q, Tan Z Y, Wang J Y, et al. GiraffeDet: a heavy-neck paradigm for object detection[C]//Proceedings of the 10th International Conference on Learning Representations, 2022. |
[9] | Ma M, Pang H. SP-YOLOv8s: an improved YOLOv8s model for remote sensing image tiny object detection[J]. applied sciences, 2023, 13(14): 8161 |
[10] | Wang K X, Liew J H, Zou Y T, et al. PANet: few-shot image semantic segmentation with prototype alignment[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, 2019: 9196–9205. https://doi.org/10.1109/ICCV.2019.00929. |
[11] | Lin T Y, DollárP, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 936–944. https://doi.org/10.1109/CVPR.2017.106. |
[12] | Tan M X, Pang R M, Le Q V. EfficientDet: scalable and efficient object detection[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 10778–10787. https://doi.org/10.1109/CVPR42600.2020.01079. |
[13] | Liu S, Qi L, Qin H F, et al. Path aggregation network for instance segmentation[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 8759–8768. https://doi.org/10.1109/CVPR.2018.00913. |
[14] | Yang G Y, Lei J, Zhu Z K, et al. AFPN: asymptotic feature pyramid network for object detection[C]//Proceedings of 2023 IEEE International Conference on Systems, Man, and Cybernetics, 2023: 2184–2189. https://doi.org/10.1109/SMC53992.2023.10394415. |
[15] | Xue Y J, Ju Z Y, Li Y M, et al. MAF-YOLO: multi-modal attention fusion based YOLO for pedestrian detection[J]. Infrared Phys Technol, 2021, 118: 103906. doi: 10.1016/j.infrared.2021.103906 |
[16] | Xu X Z, Jiang Y Q, Chen W H, et al. DAMO-YOLO: a report on real-time object detection design[Z]. arXiv: 2211.15444, 2023. https://doi.org/10.48550/arXiv.2211.15444. |
[17] | Wang C C, He W, Nie Y, et al. Gold-YOLO: efficient object detector via gather-and-distribute mechanism[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems, 2023: 2224. |
[18] | Gao Y, Zhang Z Z, Lin H J, et al. Hypergraph learning: methods and practices[J]. IEEE Trans Pattern Anal Mach Intell, 2020, 44(5): 2548−2566. doi: 10.1109/TPAMI.2020.3039374 |
[19] | Gao Y, Feng Y F, Ji S Y, et al. HGNN+: general hypergraph neural networks[J]. IEEE Trans Pattern Anal Mach Intell, 2023, 45(3): 3181−3199. doi: 10.1109/TPAMI.2022.3182052 |
[20] | Liu Y Y, Yu Z Y, Zong D L, et al. Attention to task-aligned object detection for end-edge-cloud video surveillance[J]. IEEE Internet Things J, 2024, 11(8): 13781−13792. doi: 10.1109/JIOT.2023.3340151 |
[21] | Shen Q, Zhang L, Zhang Y X, et al. Distracted driving behavior detection algorithm based on lightweight StarDL-YOLO[J]. Electronics, 2024, 13(16): 3216. doi: 10.3390/electronics13163216 |
[22] | Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5MB model size[Z]. arXiv: 1602.07360, 2016. https://doi.org/10.48550/arXiv.1602.07360. |
[23] | Zhang X Y, Zhou X Y, Lin M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 6848–6856. https://doi.org/10.1109/CVPR.2018.00716. |
[24] | Howard A G, Zhu M L, Chen B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[Z]. arXiv: 1704.04861, 2017. https://doi.org/10.48550/arXiv.1704.04861. |
[25] | Tan M X, Le Q. EfficientNet: rethinking model scaling for convolutional neural networks[C]//Proceedings of the 36th International Conference on Machine Learning, 2019: 6105–6114. |
[26] | Han K, Wang Y H, Tian Q, et al. GhostNet: more features from cheap operations[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1577–1586. https://doi.org/10.1109/CVPR42600.2020.00165. |
[27] | 王舒梦, 徐慧英, 朱信忠, 等. 基于改进YOLOv8n航拍轻量化小目标检测算法: PECS-YOLO[J]. 计算机工程, 2024. doi: 10.19678/j.issn.1000-3428.0069353 Wang S M, Xu H Y, Zhu X Z, et al. Lightweight small object detection algorithm based on improved YOLOv8n aerial photography: PECS-YOLO[J]. Comput Eng, 2024. doi: 10.19678/j.issn.1000-3428.0069353 |
[28] | 张佳承, 韦锦, 陈义时. 改进YOLOv8的实时轻量化鲁棒绿篱检测算法[J]. 计算机工程, 2024. doi: 10.19678/j.issn.1000-3428.0069524 Zhang J C, Wei J, Chen Y S. Improved YOLOv8 real-time lightweight robust hedge detection algorithm[J]. Comput Eng, 2024. doi: 10.19678/j.issn.1000-3428.0069524 |
[29] | Gale T, Elsen E, Hooker S. The state of sparsity in deep neural networks[Z]. arxiv: 1902.09574, 2019. https://doi.org/10.48550/arXiv.1902.09574. |
[30] | Evci U, Gale T, Menick J, et al. Rigging the lottery: making all tickets winners[C]//Proceedings of the 37th International Conference on Machine Learning, 2020: 276. |
[31] | Lee J, Park S, Mo S, et al. Layer-adaptive sparsity for the magnitude-based pruning[C]//Proceedings of the 9th International Conference on Learning Representations, 2021. |
[32] | Feng Y F, Huang J G, Du S Y, et al. Hyper-YOLO: when visual object detection meets hypergraph computation[J]. IEEE Trans Pattern Anal Mach Intell, 2025, 47(4): 2388−2401. doi: 10.1109/TPAMI.2024.3524377 |
[33] | Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Trans Pattern Anal Mach Intell, 2017, 39(6): 1137−1149. doi: 10.1109/TPAMI.2016.2577031 |
[34] | Ge Z, Liu S T, Wang F, et al. YOLOX: exceeding YOLO series in 2021[Z]. arXiv: 2107.08430, 2021. https://doi.org/10.48550/arXiv.2107.08430. |
[35] | Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 7464–7475. https://doi.org/10.1109/CVPR52729.2023.00721. |
[36] | Wang C Y, Yeh I H, Liao H Y M. YOLOv9: learning what you want to learn using programmable gradient information[C]//Proceedings of the 18th European Conference on Computer Vision, 2025: 1–21. https://doi.org/10.1007/978-3-031-72751-1_1. |
[37] | Wang A, Chen H, Liu L H, et al. YOLOv10: real-time end-to-end object detection[C]//Proceedings of the 38th Conference on Neural Information Processing Systems, 2024. |
[38] | Wang Z Y, Li C, Xu H Y, et al. Mamba YOLO: SSMs-based YOLO For object detection[Z]. arXiv: 2406.05835v1, 2024. https://doi.org/10.48550/arXiv.2406.05835. |
[39] | 孙佳宇, 徐民俊, 张俊鹏, 等. 优化改进YOLOv8无人机视角下目标检测算法[J]. 计算机工程与应用, 2025, 61(1): 109−120. doi: 10.3778/j.issn.1002-8331.2405-0030 Sun J Y, Xu M J, Zhang J P, et al. Optimized and improved YOLOv8 target detection algorithm from UAV perspective[J]. Comput Eng Appl, 2025, 61(1): 109−120. doi: 10.3778/j.issn.1002-8331.2405-0030 |
Aiming at the characteristics of UAV aerial images such as complex background, small target size and dense distribution due to high-angle shooting, as well as the common problems of insufficient accuracy and parameter redundancy in existing detection models, this paper proposes an efficient multi-scale feature transfer small target detection algorithm based on hypergraph computation. By systematically improving network architecture, feature fusion mechanism and model compression strategy, the algorithm achieves an effective balance between detection performance and computational efficiency. In terms of network architecture design, this study innovatively constructs a multi-scale feature pyramid network as a neck structure. Different from the traditional feature pyramid layer-by-layer transmission, this network transmits the features of the middle layer directly to the adjacent layers through the cross-layer feature aggregation mechanism, which significantly shortens the feature transmission path. Specifically, by integrating shallow high-resolution features and deep semantic features, the spatial information loss caused by long-distance transmission is effectively alleviated, so that the location information and texture features of small targets can be completely preserved. In the feature fusion stage, hypergraph is introduced to break through the limitation of binary relation of traditional graph neural networks. By connecting multiple feature nodes with hyperedge and establishing a high-order feature interaction model, the nonlinear correlation between the object and the complex background in UAV images can be accurately described. This hypergraph structure can not only capture the geometric correlation between objects but also model the potential relationship between the interference factors such as illumination change and occlusion and the object features. Secondly, a lightweight dynamic task-guided detection head is designed to effectively solve the problem of inaccurate detection targets caused by inconsistent classification and positioning task space in the traditional decoupling head with a small number of parameters by sharing mechanism. Finally, a layer adaptive pruning amplitude strategy is used to break through the limitation of the traditional global pruning threshold. By analyzing the weight distribution characteristics of each convolution layer, the calculation model of the pruning coefficient based on layer sensitivity is established. Experimental results show that the proposed algorithm performs better than other architectures on VisDrone2019 dataset, with an accuracy of 42.4% and many parameters of 4.8 M. Compared to the benchmark YOLOv8, the number of parameters has been reduced by 54.7%. This model achieves a good balance between detection performance and resource consumption.
Efficient transfer multi-scale feature small target detection algorithm based on hypergraph computing
HOAF module
Hyper computation module
Distance computation part of the execution process
Structure diagram of LDTG-Head and DTG module. (a) LDTG-Head; (b) DTG module
Comparison between ordinary convolution and group convolution. (a) Ordinary convolution; (b) Group convolution
Partial images of the VisDrone2019 dataset
Comparison of target detection effectiveness in complex scenes on VisDrone dataset. (a)(b) Night environment; (c)(d) High-altitude view; (e)(f) dense occlusion scenes