SRL-DETR: lightweight pedestrian detection algorithm for multiple scenarios

张蕊; 李怡欣; 高张; 张上; 朱帅

doi:10.12086/oee.2026.250363

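Abstract: To address insufficient accuracy, missed detections, and false detections in pedestrian detection for small-scale targets and complex scenes, as well as the need for efficient inference and a lightweight model, this paper proposes SRL-DETR, a lightweight pedestrian detection algorithm that improves on RT-DETR. First, a spatial-frequency feature fusion module is proposed to strengthen the backbone network's feature representation in both the spatial and frequency domains and to improve its perception of small targets. Second, a feature scaling refinement enhancement module is designed to optimize multi-scale feature fusion, improving detection performance on dense targets and in complex scenes. Finally, adaptive pruning based on channel importance evaluation is introduced to reduce redundant computation while retaining key features. Experimental results show that the proposed algorithm reaches an mAP50 of 71.6% on the CityPersons dataset, 2.0% higher than the baseline model, while the parameter count and computation decrease by 50.5% and 42.1%, respectively. Generalization is further verified on the TinyPerson and WiderPerson datasets, where mAP50 improves by 3.7% and 1.0%, respectively, demonstrating strong cross-scene adaptability. In summary, the proposed algorithm significantly improves pedestrian detection accuracy while keeping the model lightweight, and it is suitable for a variety of complex scenes.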

Abstract:

Objective Pedestrian detection faces challenges from small-scale targets, dense object distributions, and complex backgrounds, which frequently result in insufficient detection reliability, missed detections, and false positives. Under such conditions, variations in pedestrian scale and scene complexity place higher demands on feature perception and multi-scale modeling. Existing detection models still exhibit limited robustness under small-scale and complex scene conditions, particularly when computational efficiency and model complexity are constrained, making it difficult to achieve a balance between accuracy and efficiency. These limitations remain critical obstacles to reliable pedestrian detection in practical scenarios. The objective of this study is to improve detection reliability and robustness under small-scale and complex scene conditions while reducing computational cost and model size, thereby enhancing inference efficiency and facilitating deployment in real-world pedestrian detection applications.

Methods The proposed SRL-DETR is a lightweight pedestrian detection model developed based on the RT-DETR architecture. Firstly, a spatial-frequency feature fusion module was introduced for the backbone, integrating the SFSConv operator with a CSP structure. This design enhances the network’s perception of small-scale pedestrians by capturing both spatial and frequency features and facilitating more effective feature flow, enabling the model to extract richer and more discriminative information from low-resolution targets. Secondly, a refined feature scaling enhancement module was introduced to optimize multi-scale feature fusion in the neck network. SNI was introduced during upsampling to align high-level semantic features across scales, reducing information misalignment and preserving contextual consistency. GSConvE was introduced during downsampling to capture multi-scale textures and receptive fields, improving detection of dense pedestrians in complex scenes. These enhancements allow more accurate reconstruction and fusion of multi-scale features, strengthening the model’s robustness in challenging detection scenarios. Finally, LAMP was introduced to evaluate channel importance and remove less critical channels. This strategy effectively reduces model size and computation while preserving essential feature information, improving inference efficiency and making the model more suitable for deployment in resource-constrained environments. The proposed SRL-DETR model was evaluated on the CityPersons, TinyPerson, and WiderPerson datasets to validate both detection accuracy and generalization under varying pedestrian densities and scale distributions. The combination of enhanced small-target perception, refined multi-scale feature fusion, and adaptive pruning contributes to improved detection precision, robustness, and efficiency across different datasets.
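The channel-importance pruning step in Methods can be illustrated with a small sketch. The abstract does not give the exact criterion, so the code below makes assumptions: each output channel's magnitude is taken as the squared L2 norm of its weights, a LAMP-style score divides that magnitude by the summed magnitude of all channels in the same layer that are at least as large, and channels are then pruned globally by lowest score. The names `lamp_channel_scores` and `select_channels` are hypothetical, and dependencies between layers, which a real pruning pipeline must handle, are ignored here.

```python
from itertools import accumulate


def lamp_channel_scores(channels):
    """Return a LAMP-style score for each output channel of one layer.

    channels: list of flat weight lists, one list per output channel.
    Assumption: channel magnitude is the squared L2 norm of its weights.
    """
    sq = [sum(w * w for w in ch) for ch in channels]
    order = sorted(range(len(sq)), key=lambda i: sq[i])  # ascending magnitude
    # suffix[rank] = total squared magnitude of the channel at this rank
    # and of every larger channel in the same layer.
    suffix = list(accumulate(sq[i] for i in reversed(order)))[::-1]
    scores = [0.0] * len(sq)
    for rank, i in enumerate(order):
        scores[i] = sq[i] / suffix[rank] if suffix[rank] > 0 else 0.0
    return scores


def select_channels(layers, prune_ratio):
    """Globally prune the lowest-scoring channels; return kept indices per layer."""
    scored = [
        (score, li, ci)
        for li, layer in enumerate(layers)
        for ci, score in enumerate(lamp_channel_scores(layer))
    ]
    scored.sort()
    n_prune = int(prune_ratio * len(scored))
    pruned = {(li, ci) for _, li, ci in scored[:n_prune]}
    return [
        [ci for ci in range(len(layer)) if (li, ci) not in pruned]
        for li, layer in enumerate(layers)
    ]


if __name__ == "__main__":
    # Toy weights: two layers with three and two output channels.
    toy_layers = [
        [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]],
        [[1.0, 1.0], [2.0, 2.0]],
    ]
    print(select_channels(toy_layers, prune_ratio=0.5))
```

With this normalization the largest channel in every layer with nonzero weights scores exactly 1, so a global threshold removes the weakest channels without emptying such a layer, as long as the prune ratio leaves at least one channel per layer.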

Results and Discussions The proposed SRL-DETR was evaluated on multiple pedestrian detection datasets, achieving high detection accuracy while simultaneously reducing model size and computational cost. On the CityPersons dataset, SRL-DETR achieved an mAP50 of 71.6%, representing a 2.0% improvement over the baseline model, while reducing parameters and computation by 50.5% and 42.1%, respectively. These results demonstrate that the proposed model maintains strong detection performance even under substantially reduced computational complexity. Comparison experiments further show that SRL-DETR outperformed YOLO and other compared methods across multiple evaluation metrics, indicating more balanced overall detection performance. In particular, SRL-DETR exhibited more stable detection behavior, suggesting improved capability in distinguishing pedestrians from complex backgrounds and reduced performance degradation caused by scale variation and scene clutter. On the TinyPerson dataset, SRL-DETR improved mAP50 by 3.7% compared with the baseline model. This improvement is mainly attributed to enhanced sensitivity to small-scale pedestrians and more effective multi-scale feature interaction, which alleviated missed detections and false positives in dense pedestrian scenarios. The results indicate that SRL-DETR is more effective in handling large scale variations and dense pedestrian distributions, which are commonly encountered challenges in pedestrian detection tasks. Similarly, on the WiderPerson dataset, SRL-DETR achieved a 1.0% improvement in mAP50 over the baseline, indicating consistent robustness across varying pedestrian densities and complex scene configurations.
The stable performance across datasets with different data distributions further confirms the generalization ability of the proposed model under diverse detection conditions. Furthermore, the introduction of adaptive channel pruning enabled a substantial reduction in parameters and computation while preserving detection accuracy, highlighting the effectiveness of the lightweight optimization strategy. Overall, the coordinated improvements in small-scale target perception, multi-scale feature interaction, and model compression jointly contributed to improved detection accuracy, stronger robustness in dense and complex scenes, and enhanced efficiency, providing a practical and effective solution for pedestrian detection in real-world environments.
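Since the results above are reported as mAP50, a minimal sketch of how such a score can be computed for a single pedestrian class may help. It rests on assumptions not stated in the abstract: boxes are `(x1, y1, x2, y2)` tuples, each detection is greedily matched to the best still-unmatched ground-truth box at an IoU threshold of 0.5, and AP is the area under the monotone precision-recall envelope. The function names `iou` and `ap50` are hypothetical, and standard evaluation toolkits differ in details such as interpolation points and the handling of ignore regions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ap50(detections, ground_truth, thr=0.5):
    """Average precision at IoU >= thr for one class.

    detections: list of (image_id, score, box)
    ground_truth: dict mapping image_id to a list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    hits = []
    # Process detections from highest to lowest confidence.
    for img, _, box in sorted(detections, key=lambda d: -d[1]):
        best, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            if used[img][j]:
                continue  # each ground-truth box can be matched only once
            overlap = iou(box, gt_box)
            if overlap > best:
                best, best_j = overlap, j
        if best_j >= 0 and best >= thr:
            used[img][best_j] = True
            hits.append(1)  # true positive
        else:
            hits.append(0)  # false positive
    # Cumulative precision and recall after each detection.
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # Make precision non-increasing from right to left (envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas under the envelope.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

With one class, mAP50 equals this AP value; with several classes it would be the mean of the per-class values.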

Conclusions The proposed SRL-DETR model effectively balances detection accuracy, robustness, and computational efficiency for pedestrian detection under small-scale and complex scene conditions. Experimental results on the CityPersons, TinyPerson, and WiderPerson datasets demonstrate that SRL-DETR achieves consistent improvements in both accuracy and efficiency compared with the baseline and other compared methods, maintaining stable performance across varying pedestrian densities and scale distributions. The coordinated improvements enhance the model’s sensitivity to small-scale pedestrians, improve the handling of dense pedestrian distributions in complex scenes, and reduce model size and computation without compromising detection reliability. As a result, SRL-DETR maintains robust detection performance while satisfying efficiency requirements under practical constraints. In summary, SRL-DETR provides an effective pedestrian detection model that achieves a well-balanced trade-off between performance and computational efficiency, making it suitable for pedestrian detection tasks in small-scale and complex scene conditions.
