Abstract:
Objective Steel surface defects such as cracks, scratches, rolled-in scale, pitting, patches, and inclusions are common in industrial manufacturing and can severely degrade corrosion resistance, wear resistance, and mechanical reliability, which makes accurate and efficient surface defect inspection essential for quality control. Although deep-learning-based detectors have achieved promising results, reliable detection of tiny defects remains difficult: these defects are often small, low in contrast, blurred at the boundaries, and highly variable in scale, while industrial backgrounds contain complex textures and noise that easily interfere with feature extraction. Two-stage detectors usually require heavy computation and fail to meet real-time requirements, and many one-stage detectors still suffer from high miss rates on small defects and limited cross-scale adaptability. RT-DETR introduces an end-to-end real-time detection framework that eliminates non-maximum suppression and leverages Transformer-based global modeling, which improves dense target prediction, yet its parameter redundancy and insufficient cross-resolution feature alignment still restrict its efficiency and robustness in industrial scenarios.
Methods To address these limitations, this study proposes RMD-DETR, an improved RT-DETR-based detection algorithm that enhances detection accuracy, cross-scale stability, and inference efficiency through three coordinated modifications. First, a lightweight FasterNet backbone replaces the original feature extractor, and the partial convolution in FasterNetBlock is further optimized with a reparameterization strategy to strengthen feature representation under limited computation. During training, multi-branch convolutional paths enrich spatial diversity and improve learning capacity; during inference, these branches are equivalently fused into a single convolution operation, which simplifies the computational graph and accelerates deployment without sacrificing accuracy. Second, a dynamic multi-scale feature fusion network (DMSFPN) is designed as the neck to alleviate the semantic inconsistency and misalignment between high-level semantic features and low-level detail features, both of which are critical for localizing micro-defects. DMSFPN integrates a multi-scale convolution block (MSCB) that captures context at different receptive fields: heterogeneous kernel branches focus on tiny, medium, and large defect patterns, and a channel shuffle mechanism enhances cross-channel interaction and reduces feature redundancy. To improve detail recovery in cross-scale aggregation, an efficient upsampling convolution block (EUCB) is introduced, which combines lightweight convolution operations with deformable-convolution-based adaptive sampling, thereby mitigating the feature blurring caused by standard interpolation and preserving spatial continuity for small targets. A bidirectional feature fusion strategy enables both top-down semantic propagation and bottom-up detail enhancement, improving cross-resolution consistency while avoiding redundant computation.
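The equivalence between the training-time multi-branch structure and the single inference-time convolution can be illustrated with a minimal sketch. The abstract does not specify the branch layout, so the example assumes a RepVGG-style arrangement (a 3x3 branch, a 1x1 branch, and an identity branch) on a single channel without batch normalization; the function names `conv2d_same`, `multi_branch`, and `fuse_branches` are hypothetical and are not taken from the RMD-DETR implementation.

```python
import random

def conv2d_same(x, k):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, z = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= z < w:
                        s += x[y][z] * k[a][b]
            out[i][j] = s
    return out

def multi_branch(x, k3, k1):
    """Training-time form (assumed layout): 3x3 branch + 1x1 branch + identity."""
    y3 = conv2d_same(x, k3)
    y1 = conv2d_same(x, k1)
    return [[y3[i][j] + y1[i][j] + x[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]

def fuse_branches(k3, k1):
    """Inference-time form: fold the 1x1 and identity branches into the 3x3 kernel.

    A 1x1 kernel and the identity both act only on the centre pixel, so their
    weights are added to the centre tap of the 3x3 kernel.
    """
    fused = [row[:] for row in k3]
    fused[1][1] += k1[0][0] + 1.0
    return fused

random.seed(0)
x = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
k3 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
k1 = [[random.uniform(-1, 1)]]

train_out = multi_branch(x, k3, k1)
infer_out = conv2d_same(x, fuse_branches(k3, k1))
max_diff = max(abs(a - b)
               for ra, rb in zip(train_out, infer_out)
               for a, b in zip(ra, rb))
```

Because convolution is linear, the two outputs agree up to floating-point rounding, which is why fusion removes the extra branches without changing predictions. A practical implementation would also fold any batch-normalization statistics into the fused weights and bias.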
Third, to enhance feature interaction under cluttered backgrounds, a token statistics self-attention (TSSA) mechanism based on statistical feature compression and variational rate reduction is incorporated into the attention-based intra-scale feature interaction (AIFI) module, forming an AIFI-TSSA enhancement block. Unlike conventional self-attention, which computes pairwise token similarity with quadratic complexity and high memory cost, TSSA performs global context modeling with linear complexity through token-statistics-driven projection and denoising, which reduces computational overhead and improves robustness against redundant textures, occlusion, and noise. The proposed RMD-DETR is evaluated on three industrial datasets: NEU-DET, the Tianchi aluminum profile surface defect dataset, and GC10-DET. Comprehensive experiments verify the effectiveness of each proposed component, including ablation studies, comparisons with representative detectors, cross-scale evaluation with inputs resized to a smaller scale, and generalization tests across different industrial scenes.
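The complexity argument for the attention module can be made concrete with a small sketch. The code below is not the TSSA operator used in RMD-DETR; it is a simplified stand-in, assuming a single group and a per-channel gate of the form 1 / (1 + second moment), that shows how a quantity pooled over all tokens can modulate each token without forming a token-by-token similarity matrix. All function names are hypothetical.

```python
def pairwise_attention_cost(n_tokens, dim):
    """Multiply-adds needed for the n x n score matrix of standard self-attention."""
    return n_tokens * n_tokens * dim

def token_statistics_cost(n_tokens, dim):
    """Multiply-adds needed for one pass of per-channel second-moment statistics."""
    return n_tokens * dim

def statistics_gated_tokens(tokens):
    """Illustrative only: reweight channels using statistics pooled over tokens.

    tokens is a list of n vectors of length d. The first pass computes the
    mean of z_c**2 per channel; the second pass scales each channel by
    1 / (1 + mean), damping high-energy channels. Both passes are O(n * d).
    """
    n, d = len(tokens), len(tokens[0])
    second_moment = [sum(t[c] ** 2 for t in tokens) / n for c in range(d)]
    gate = [1.0 / (1.0 + m) for m in second_moment]
    return [[t[c] * gate[c] for c in range(d)] for t in tokens]

tokens = [[1.0, 0.0], [1.0, 2.0], [1.0, -2.0], [1.0, 0.0]]
gated = statistics_gated_tokens(tokens)

# Example sizes (assumed): 400 tokens from a 20 x 20 feature map, 256 channels.
cost_ratio = pairwise_attention_cost(400, 256) // token_statistics_cost(400, 256)
```

In this toy setting the cost ratio equals the number of tokens, so the advantage of a statistics-based operator grows with feature-map resolution.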
Results and Discussions Experimental results on NEU-DET demonstrate that RMD-DETR achieves 76.7% mAP@0.5, outperforming the baseline RT-DETR by 2.6 percentage points, while precision and recall reach 77.8% and 72.6%, respectively, indicating improved localization reliability and fewer missed detections. The model also reduces parameters and computational cost by 39.9% and 39.3%, respectively, and the inference speed increases to 78.9 frames per second, satisfying the real-time inspection requirement of industrial production lines. In cross-scale detection experiments, the missed detection rate decreases from 55.4% to 50.9%, showing stronger adaptability under significant scale variation. Generalization experiments further confirm the transferability of the proposed design: mAP@0.5 improves by 2% on the Tianchi dataset and by 3% on GC10-DET compared with the baseline model. Visualization analysis under Gaussian noise shows that the proposed method maintains higher confidence and more complete defect coverage, especially for micro-scratches and fine cracks that are easily overlooked in complex textures. These results indicate that the three improvements are complementary: the reparameterized lightweight backbone improves efficiency while retaining feature extraction capability, DMSFPN enhances multi-scale semantic alignment and detail preservation, and AIFI-TSSA provides stable global modeling with reduced complexity, enabling the detector to focus on discriminative defect regions rather than background patterns.
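The reported NEU-DET figures mix absolute changes (percentage points) and rates, so a short calculation can help in reading them. The sketch below uses only numbers stated above; the helper names are hypothetical, and the baseline mAP is implied by the reported gain rather than quoted directly.

```python
def pp_change(before, after):
    """Absolute change in percentage points."""
    return round(after - before, 1)

def relative_change(before, after):
    """Change relative to the starting value, in percent."""
    return round((after - before) / before * 100, 1)

# mAP@0.5 on NEU-DET: 76.7% after a gain of 2.6 percentage points.
implied_baseline_map = round(76.7 - 2.6, 1)

# Missed detection rate in the cross-scale experiment: 55.4% -> 50.9%.
miss_drop_pp = pp_change(55.4, 50.9)
miss_drop_rel = relative_change(55.4, 50.9)
```

Under these figures the baseline mAP@0.5 is 74.1%, and the missed detection rate falls by 4.5 percentage points, which corresponds to a relative reduction of about 8.1%.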
Conclusions RMD-DETR provides an effective balance between accuracy and speed for steel surface defect detection, achieving enhanced robustness and cross-scale performance across multiple industrial datasets, and it is suitable for practical quality inspection systems that require high throughput and reliable detection. Future work will explore further compression for extremely resource-constrained edge devices and deployment optimization through quantization and acceleration tools to improve engineering applicability.