Abstract:
Objective Nighttime low-light imaging is often affected by underexposure, limited dynamic range, and amplified sensor noise. These factors weaken edge, texture, and contour cues while increasing background interference and feature ambiguity. Such degradation reduces detector reliability and causes missed detections, false positives, and localization bias, with the most severe impact on small objects, distant targets, and dim instances. To address these problems, LMD-YOLO, a low-light object detection method that integrates multi-scale feature fusion and local detail enhancement, was proposed on the basis of YOLOv11n. Without substantially increasing model complexity, it strengthens multi-scale feature fusion and local detail extraction for low-light images and reduces sensitivity to the strong noise of low-illumination scenes, thereby improving low-light detection accuracy.
Methods LMD-YOLO combined multi-scale feature fusion and local detail enhancement through four components. 1) A Multi-Pool Spatial Pyramid Pooling–Fast (MPSPPF) module was inserted into the backbone. It contained parallel pooling branches and a gradient-enhanced concatenation path. The parallel branches applied MaxPool2d and AvgPool2d: max pooling strengthened salient responses and benefited key-object extraction in low-contrast images, whereas average pooling produced smoother representations and reduced noise sensitivity. The pooled outputs were fused by weighted aggregation to preserve global structure while retaining local detail cues. MPSPPF also used repeated sequential pooling, in which the same pooling operation was applied stage by stage to the previous pooled output to form a deeper pyramid. This design enlarged the effective receptive field at limited cost and built a richer feature hierarchy, capturing the local and global context that supported recognition of blurred, partially occluded, or weakly illuminated objects. The gradient-enhanced concatenation improved information flow across pooling stages and reduced detail attenuation in early feature extraction.

2) A Cross Stage Partial–Enhanced Dual Hybrid Attention Network (CSP-EDHAN) module was constructed for fine-grained low-light features. Depthwise separable convolution reduced computation while maintaining spatial sensitivity, and residual connections stabilized feature propagation under severe noise. A multi-path fusion structure increased cross-stage information flow and strengthened weak-target cues. A dual hybrid attention mechanism combined channel emphasis with spatial selection: channel emphasis highlighted informative channels under illumination degradation, while spatial selection focused responses on object regions and suppressed random-noise activations. This module improved feature contrast for dim targets and reduced false activations caused by background clutter.
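The two pooling ideas in MPSPPF, parallel max/average pooling with weighted fusion and repeated sequential pooling, can be sketched in plain Python. This is a minimal illustration on a toy 3×3 map; the stride-1 clipped-border pooling and the 0.6/0.4 fusion weights are assumptions for the sketch, not the paper's implementation:

```python
# Sketch of MPSPPF's parallel pooling and repeated sequential pooling.
def pool2d(x, k, op):
    """Stride-1 k×k pooling over 2D lists, with windows clipped at the borders."""
    h, w, r = len(x), len(x[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            win = [x[a][b]
                   for a in range(max(0, i - r), min(h, i + r + 1))
                   for b in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = max(win) if op == "max" else sum(win) / len(win)
    return out

def fuse(mx, av, w_max=0.6, w_avg=0.4):
    """Weighted aggregation: salient max-pooled cues plus smoothed avg-pooled cues."""
    return [[w_max * mx[i][j] + w_avg * av[i][j] for j in range(len(mx[0]))]
            for i in range(len(mx))]

feat = [[0, 1, 0],
        [1, 9, 1],
        [0, 1, 0]]  # a dim scene with one bright peak

fused = fuse(pool2d(feat, 3, "max"), pool2d(feat, 3, "avg"))

# Repeated sequential pooling: two stride-1 3×3 max pools see the same 5×5
# neighborhood as a single larger pool, enlarging the receptive field cheaply.
twice = pool2d(pool2d(feat, 3, "max"), 3, "max")
once5 = pool2d(feat, 5, "max")
assert twice == once5
```

The final assertion shows why stacking cheap small pools is attractive: two sequential 3×3 max pools cover the same 5×5 neighborhood as one larger pool, so the effective receptive field grows while per-stage cost stays low.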
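The dual hybrid attention in CSP-EDHAN can be illustrated with a pure-Python sketch on toy C×H×W lists; the average-based sigmoid gates below are simplified stand-ins for the module's learned channel and spatial attention, not its actual layers:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def dual_hybrid_attention(x):
    """x: C×H×W nested lists -> channel-gated, then spatially gated features."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Channel emphasis: weight each channel by its global average response.
    ch_gate = [sigmoid(sum(sum(row) for row in x[c]) / (H * W)) for c in range(C)]
    y = [[[x[c][i][j] * ch_gate[c] for j in range(W)] for i in range(H)]
         for c in range(C)]
    # Spatial selection: weight each position by its cross-channel mean, which
    # damps isolated noise spikes that appear in only a few channels.
    sp_gate = [[sigmoid(sum(y[c][i][j] for c in range(C)) / C) for j in range(W)]
               for i in range(H)]
    return [[[y[c][i][j] * sp_gate[i][j] for j in range(W)] for i in range(H)]
            for c in range(C)]

x = [[[0.0, 2.0], [0.0, 0.0]],
     [[0.0, 2.0], [0.0, 0.0]]]  # a weak target at position (0, 1)
out = dual_hybrid_attention(x)
assert out[0][0][1] > out[0][0][0]  # target position stays stronger than background
```

The sequential channel-then-spatial gating mirrors the "dual hybrid" idea: a position must both live in an informative channel and agree across channels to keep a strong response.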
3) A Dynamic Detail–Semantic Fusion Pyramid Network (DDS-FPN) was designed for neck fusion. A channel–spatial attention unit from the SDFM module guided feature selection during cross-scale aggregation. Shallow detail cues and deep semantic cues were fused with dynamic weights, which raised responses in target regions and lowered responses in noisy background regions. An adaptive downsampling (ADown) module aligned deep semantics with shallow details during scale transitions; it reduced feature misalignment, improved feature consistency across pyramid levels, and compensated for the detail loss caused by low illumination. A bidirectional pyramid path strengthened bottom-up detail propagation and top-down semantic guidance, so multi-scale features were aligned and enhanced for small, medium, and large objects under low-light conditions.

4) A dual-group detection head was proposed to balance accuracy and efficiency. Input multi-scale feature maps first passed through a shared feature-extraction stem composed of two grouped convolution layers. Grouped convolution increased feature diversity at low cost and improved localization and recognition, and the classification and regression branches shared the stem parameters to reduce redundant computation. Two independent output branches were then applied, each using a 1×1 convolution layer to predict class probabilities and bounding-box distributions. The decoupled design reduced task interference and improved training stability on noisy low-light inputs.
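The dynamic detail–semantic weighting in DDS-FPN can be sketched as a per-position gate. This is pure Python under simplifying assumptions: the sigmoid gate over summed cues is illustrative, and the attention unit and ADown alignment are omitted:

```python
import math

def dynamic_fuse(detail, semantic):
    """Fuse shallow detail and deep semantic maps with a per-position gate."""
    H, W = len(detail), len(detail[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            # Gate rises where both cues agree, favoring the detail map there;
            # weak, inconsistent responses fall back toward the semantic map.
            g = 1 / (1 + math.exp(-(detail[i][j] + semantic[i][j])))
            out[i][j] = g * detail[i][j] + (1 - g) * semantic[i][j]
    return out

detail   = [[3.0, 0.0], [0.0, 0.0]]   # shallow map: strong edge at (0, 0)
semantic = [[2.0, 0.0], [0.0, 0.2]]   # deep map: object evidence plus faint noise
fused = dynamic_fuse(detail, semantic)
assert fused[0][0] > fused[1][1]  # target region raised, noisy background kept low
```

The point of the dynamic weights is visible in the toy example: where detail and semantics agree, the fused response approaches the stronger cue; where only a faint, isolated response exists, the output stays close to zero.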
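A back-of-envelope weight count shows why the grouped-convolution stem of the dual-group head stays lightweight. The channel widths, the 80-class output, and the 64-channel box-distribution output are illustrative assumptions, not the paper's configuration:

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k×k conv: each group maps c_in/groups -> c_out/groups."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * (c_out // groups) * k * k * groups

stem_standard = conv_params(256, 256, 3, groups=1)  # plain 3×3 stem layer
stem_grouped  = conv_params(256, 256, 3, groups=4)  # grouped 3×3 stem layer
head_cls = conv_params(256, 80, 1)  # 1×1 branch for class probabilities
head_reg = conv_params(256, 64, 1)  # 1×1 branch for box distributions

assert stem_grouped * 4 == stem_standard  # groups=g divides stem weights by g
```

Because the grouped stem is shared by both branches, its (already g-times smaller) cost is paid once, while the two cheap 1×1 branches keep classification and regression decoupled.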
Results and Discussions Comparative experiments were conducted on ExDark and PASCAL VOC 2012 with YOLOv11n as the baseline. On ExDark, LMD-YOLO improved precision, recall, mAP50, and mAP50-95 by 1.3%, 2.3%, 3.1%, and 2.2%, respectively. On PASCAL VOC 2012, the same metrics increased by 1.4%, 1.5%, 1.8%, and 1.3%. An additional evaluation on a complex low-light scene dataset showed gains of 2.2% in mAP50 and 1.3% in mAP50-95 over the baseline. The results indicated that MPSPPF reduced early noise sensitivity through multi-branch pooling and context aggregation: weighted fusion preserved structural cues while limiting noise amplification, and sequential pooling improved long-range context modeling, which benefited targets with weak boundaries. CSP-EDHAN strengthened weak-target representation through cross-stage fusion and attention-guided enhancement, with depthwise separable convolution preserving efficiency and improving feature robustness. DDS-FPN improved multi-scale alignment through dynamic detail–semantic fusion and attention-based selection, while ADown reduced information loss during downsampling and improved semantic–detail consistency across pyramid levels. The dual-group head kept computation lightweight and improved optimization through its decoupled branches. The gains on PASCAL VOC 2012 suggested improved general feature quality rather than a low-light domain bias, and the method showed stable behavior under mixed illumination and cluttered backgrounds, which are common in real nighttime scenes.
Conclusions LMD-YOLO improves low-light object detection through noise-robust feature extraction, weak-target enhancement, and dynamic multi-scale fusion while preserving a lightweight design. The method achieves consistent accuracy gains on low-light benchmarks and remains effective under normal illumination. Future work targets broader illumination distributions, stronger robustness to extreme noise, glare, and motion blur, and improved generalization in natural scenes.