Abstract:
Objective Accurate railway track detection is a critical component of intelligent railway safety systems, including obstacle intrusion warning, autonomous train navigation, and infrastructure monitoring. Reliable detection improves operational safety and enables real-time decision-making in complex railway environments. However, railway track detection remains challenging due to various environmental and structural factors. Railway tracks are often affected by strong background interference caused by ballast, vegetation, shadows, and surrounding infrastructure. These background elements share similar visual characteristics with track structures, which reduces feature discriminability. In addition, railway tracks exhibit significant scale variation across image regions, ranging from large, clearly visible structures in near-view areas to thin and indistinct structures in distant regions. This variation requires the model to maintain strong multi-scale feature representation capability. Furthermore, railway scenes frequently contain complex geometric configurations, such as curved tracks, intersections, and switches, which require the model to capture long-range spatial dependencies and preserve structural continuity. Existing deep learning-based methods, particularly those derived from lane detection frameworks, improve detection performance by leveraging convolutional neural networks and structured prediction strategies. However, these methods often suffer from insufficient receptive fields, limited multi-scale context modeling capability, and weak suppression of background interference. These limitations reduce detection accuracy and robustness in challenging railway scenarios. Therefore, developing an efficient and robust railway track detection method capable of enhancing multi-scale feature representation and suppressing background interference is essential for improving intelligent railway safety systems.
Methods A railway track detection network, the dilated and masked attention network (DMA-Net), was developed to address these challenges. The proposed method was built upon a row-based anchor framework, which efficiently localizes the horizontal position of the track in each image row and preserves structural consistency. Several key modules were introduced to improve feature representation and discrimination capability. A dilated multi-scale fusion module (DMSFM) was designed to enhance multi-scale feature extraction. This module employed parallel dilated convolutions with different dilation rates to capture contextual information at multiple spatial scales. Dilated convolutions expanded the receptive field without reducing spatial resolution or increasing computational complexity. The parallel structure enabled effective aggregation of both local structural features and global contextual information, improving the representation of curved, distant, and intersecting tracks. An adaptive multi-head masked attention (AMMA) mechanism was introduced to suppress background interference and enhance track-related feature representation. This module applied a learnable mask to the attention map, which selectively suppressed irrelevant background regions and emphasized important track areas. The multi-head attention structure enabled the model to learn diverse feature relationships and improve discrimination capability in complex environments. In addition, the SimAM attention module and DropBlock regularization were incorporated to further improve feature quality and model robustness. SimAM enhanced feature discriminability by evaluating neuron importance without introducing additional parameters. DropBlock improved generalization capability by randomly suppressing contiguous feature regions during training. These modules improved robustness while maintaining a lightweight network structure.
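The idea behind parallel dilated convolutions can be illustrated with a minimal one-dimensional sketch in plain Python. It is not the DMSFM implementation: the function names, the dilation rates (1, 2, 4), the 3-tap kernel, and fusion by simple averaging are all assumptions chosen for illustration, and the actual module operates on two-dimensional feature maps with learned weights. The sketch only shows two properties stated above: a k-tap kernel with dilation d spans k + (k - 1)(d - 1) input positions, and the output keeps the input length.

```python
def effective_kernel(k, d):
    """Number of input positions spanned by a k-tap kernel with dilation d."""
    return k + (k - 1) * (d - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded 1-D dilated convolution; output length equals input length."""
    k = len(kernel)
    center = k // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position i.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def parallel_dilated_fusion(signal, kernel, dilations=(1, 2, 4)):
    """Run one branch per dilation rate and fuse by averaging (an assumed,
    illustrative fusion rule, not the one used in DMA-Net)."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


if __name__ == "__main__":
    impulse = [0.0] * 17
    impulse[8] = 1.0
    for d in (1, 2, 4):
        print("dilation", d, "spans", effective_kernel(3, d), "positions")
    fused = parallel_dilated_fusion(impulse, [1.0, 1.0, 1.0])
    print([i for i, v in enumerate(fused) if v != 0.0])
```

With the same three weights per branch, the branch with dilation 4 responds to inputs four samples away from the center, whereas the branch with dilation 1 only sees immediate neighbors; combining the branches mixes local and wider context at every position.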
Results and Discussions Experiments were conducted on the public Rail-DB benchmark dataset to evaluate the effectiveness of DMA-Net. The proposed method achieved a detection accuracy of 94.53%, exceeding the baseline rail detection method by 1.73 percentage points. Compared with other mainstream railway track detection methods, DMA-Net improved accuracy by 8.07 percentage points on average. These results demonstrate the effectiveness of the proposed multi-scale fusion and masked attention mechanisms. The DMSFM significantly improved the model’s ability to capture track features at different spatial scales. The enlarged receptive field enabled better representation of long-range contextual information, which improved detection performance in distant and curved track regions. The AMMA mechanism effectively suppressed background interference and enhanced attention to track-related features, which improved discrimination capability in complex environments containing ballast, vegetation, and structural interference. Visual analysis further demonstrated that DMA-Net produced smoother and more continuous detection results than existing methods. Performance improvements were particularly evident in challenging scenarios such as nighttime conditions, occlusion, and complex track intersections. These results confirm that the proposed modules effectively improve feature representation and detection robustness. DMA-Net maintained a lightweight design while achieving high detection accuracy. The proposed network achieved a processing speed of 268 frames per second, which satisfies real-time processing requirements. The efficient architecture and lightweight attention mechanisms enable practical deployment in real-world railway safety systems.
Conclusions DMA-Net provides an effective and robust solution for railway track detection in complex environments. The DMSFM enhances multi-scale feature representation and enlarges the receptive field, improving the model’s ability to capture track geometry across different spatial scales. The AMMA mechanism improves discrimination capability by suppressing background interference and emphasizing track-related features. The integration of SimAM attention and DropBlock regularization further improves robustness while maintaining computational efficiency. Experimental results demonstrate that DMA-Net achieves higher detection accuracy and robustness than existing methods while running in real time. The proposed network provides a reliable technical solution for intelligent railway monitoring, autonomous train navigation, and railway safety applications.