Abstract:
Objective Pedestrian detection is challenged by small-scale targets, dense object distributions, and complex backgrounds, which frequently cause missed detections and false positives and reduce detection reliability. Variations in pedestrian scale and scene complexity place higher demands on feature perception and multi-scale modeling. Existing detection models remain insufficiently robust in small-scale and complex scenes, particularly when computational cost and model complexity are constrained, which makes it difficult to balance accuracy and efficiency. These limitations hinder reliable pedestrian detection in practical scenarios. This study aims to improve detection reliability and robustness under small-scale and complex scene conditions while reducing computational cost and model size, thereby improving inference efficiency and facilitating deployment in real-world pedestrian detection applications.
Methods The proposed SRL-DETR is a lightweight pedestrian detection model built on the RT-DETR architecture. First, a spatial-frequency feature fusion module that integrates the SFSConv operator with a CSP structure was added to the backbone. By capturing both spatial and frequency features and improving feature flow, this design strengthens the network’s perception of small-scale pedestrians and allows it to extract richer, more discriminative information from low-resolution targets. Second, a refined feature scaling enhancement module was designed to optimize multi-scale feature fusion in the neck network. During upsampling, SNI aligns high-level semantic features across scales, reducing information misalignment and preserving contextual consistency. During downsampling, GSConvE captures multi-scale textures and receptive fields, improving the detection of dense pedestrians in complex scenes. Together, these changes allow more accurate reconstruction and fusion of multi-scale features and strengthen the model’s robustness in challenging detection scenarios. Finally, LAMP was applied to evaluate channel importance and remove less critical channels. This pruning strategy reduces model size and computation while preserving essential feature information, improving inference efficiency and making the model better suited to resource-constrained environments. SRL-DETR was evaluated on the CityPersons, TinyPerson, and WiderPerson datasets to validate both detection accuracy and generalization under varying pedestrian densities and scale distributions. The combination of enhanced small-target perception, refined multi-scale feature fusion, and adaptive pruning contributes to improved detection precision, robustness, and efficiency across these datasets.
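The channel-scoring step can be illustrated with a minimal sketch, assuming that LAMP here refers to layer-adaptive magnitude-based pruning applied at channel granularity. Within each layer, a channel's score is its squared L2 norm divided by the sum of the squared norms of all channels in that layer that are at least as large; the lowest-scoring channels across all layers are then removed. The function names, the dictionary input format, and the global pruning ratio are hypothetical and are not taken from the SRL-DETR implementation.

```python
from typing import Dict, List, Sequence, Tuple


def lamp_scores(channel_norms_sq: Sequence[float]) -> List[float]:
    """LAMP-style score for each channel of a single layer.

    channel_norms_sq[c] is the squared L2 norm of channel c (assumed input).
    score(c) = norm_sq[c] / sum of norm_sq over channels ranked at or above c
    when the layer's channels are sorted in ascending order.
    """
    order = sorted(range(len(channel_norms_sq)), key=lambda i: channel_norms_sq[i])
    scores = [0.0] * len(channel_norms_sq)
    suffix = 0.0
    # Walk from the largest channel to the smallest so that `suffix` always
    # holds the sum over the current channel and every larger one.
    for i in reversed(order):
        suffix += channel_norms_sq[i]
        scores[i] = channel_norms_sq[i] / suffix if suffix > 0 else 0.0
    return scores


def select_channels_to_prune(
    layers: Dict[str, Sequence[float]], prune_ratio: float
) -> List[Tuple[str, int]]:
    """Return the (layer_name, channel_index) pairs with the lowest global scores."""
    scored = []
    for name, norms in layers.items():
        for idx, score in enumerate(lamp_scores(norms)):
            scored.append((score, name, idx))
    scored.sort()  # ascending: least important channels first
    n_prune = int(len(scored) * prune_ratio)
    return [(name, idx) for _, name, idx in scored[:n_prune]]


if __name__ == "__main__":
    # Toy squared norms for two hypothetical layers.
    toy_layers = {"a": [1.0, 4.0, 9.0], "b": [0.5, 0.5]}
    print(select_channels_to_prune(toy_layers, prune_ratio=0.5))
```

Because the largest channel in every layer always receives a score of 1.0, a single global threshold on these scores does not empty a layer unless the pruning ratio approaches one, which is the usual motivation for this normalization over raw magnitudes.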
Results and Discussions SRL-DETR achieved high detection accuracy on multiple pedestrian detection datasets while reducing model size and computational cost. On the CityPersons dataset, it reached an mAP50 of 71.6%, a 2.0% improvement over the baseline model, while reducing parameters and computation by 50.5% and 42.1%, respectively. These results show that the model maintains strong detection performance under substantially reduced computational complexity. In comparison experiments, SRL-DETR outperformed YOLO and the other compared methods across multiple evaluation metrics, indicating more balanced overall detection performance. It also exhibited more stable detection behavior, suggesting an improved ability to distinguish pedestrians from complex backgrounds and less performance degradation from scale variation and scene clutter. On the TinyPerson dataset, SRL-DETR improved mAP50 by 3.7% over the baseline model. This gain is mainly attributed to enhanced sensitivity to small-scale pedestrians and more effective multi-scale feature interaction, which reduced missed detections and false positives in dense pedestrian scenarios. The results indicate that SRL-DETR handles large scale variations and dense pedestrian distributions, both common challenges in pedestrian detection, more effectively. Similarly, on the WiderPerson dataset, SRL-DETR achieved a 1.0% improvement in mAP50 over the baseline, indicating consistent robustness across varying pedestrian densities and complex scene configurations.
The stable performance across datasets with different data distributions further supports the generalization ability of the proposed model under diverse detection conditions. Furthermore, adaptive channel pruning substantially reduced parameters and computation while preserving detection accuracy, highlighting the effectiveness of the lightweight optimization strategy. Overall, the coordinated improvements in small-scale target perception, multi-scale feature interaction, and model compression jointly improved detection accuracy, strengthened robustness in dense and complex scenes, and enhanced efficiency, providing a practical solution for pedestrian detection in real-world environments.
Conclusions The proposed SRL-DETR model balances detection accuracy, robustness, and computational efficiency for pedestrian detection under small-scale and complex scene conditions. Experiments on the CityPersons, TinyPerson, and WiderPerson datasets show that SRL-DETR consistently improves both accuracy and efficiency relative to the baseline and the other compared methods, and that it maintains stable performance across varying pedestrian densities and scale distributions. The coordinated improvements increase the model’s sensitivity to small-scale pedestrians, improve its handling of dense pedestrian distributions in complex scenes, and reduce model size and computation without compromising detection reliability. As a result, SRL-DETR maintains robust detection performance while meeting the efficiency requirements of practical deployment, making it suitable for pedestrian detection tasks in small-scale and complex scenes.