RMD-DETR: an improved steel surface defect detection algorithm

蒋宏捷; 肖小玲; 颜昕

doi:10.12086/oee.2026.250181

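Abstract: To address the low detection accuracy and poor cross-scale adaptability found in steel surface defect detection, this paper proposes RMD-DETR, a detection model based on an improved RT-DETR. First, the lightweight FasterNet is introduced as the backbone network, and the PConv in FasterNetBlock is improved through reparameterization, which strengthens feature representation and reduces computational complexity while keeping the model lightweight. Second, a dynamic multi-scale feature fusion network (DMSFPN) is designed, combining the multi-scale convolution block (MSCB) with the efficient upsampling convolution block (EUCB) to enhance the semantic consistency of cross-resolution features. Finally, a TSSA mechanism based on variational rate reduction is introduced and an AIFI-TSSA module is further proposed, improving detection stability against complex backgrounds. Experimental results show that on the NEU-DET dataset the improved model reaches an mAP@0.5 of 76.7%, 2.6 percentage points higher than the baseline RT-DETR. In cross-scale detection, the missed detection rate falls from the baseline's 55.4% to 50.9%, a reduction of 4.5 percentage points; the parameter count and computational cost decrease by 39.9% and 39.3%, respectively, and the frame rate (FPS) rises by 30.4% to 78.9 f/s, meeting the real-time requirement of more than 30 f/s on industrial production lines. In generalization experiments on the Tianchi aluminum profile dataset and the GC10-DET dataset, mAP@0.5 improves by 2% and 3%, respectively, verifying the efficiency and robustness of the model in industrial scenarios.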

Abstract:

Objective: Steel surface defects such as cracks, scratches, rolled-in scale, pitting, patches, and inclusions are common in industrial manufacturing and can severely degrade corrosion resistance, wear resistance, and mechanical reliability, which makes accurate and efficient surface inspection essential for quality control. Although deep-learning-based detectors have achieved promising results, reliable detection of tiny defects remains difficult: such defects are often small, low in contrast, blurred at the boundaries, and highly variable in scale, while industrial backgrounds contain complex textures and noise that interfere with feature extraction. Two-stage detectors usually require heavy computation and fail to meet real-time requirements, and many one-stage detectors still suffer from high miss rates on small defects and limited cross-scale adaptability. RT-DETR introduces an end-to-end real-time detection framework that eliminates non-maximum suppression and uses Transformer-based global modeling, which improves dense target prediction; however, its parameter redundancy and insufficient cross-resolution feature alignment still restrict its efficiency and robustness in industrial scenarios.

Methods: To address these limitations, this study proposes RMD-DETR, an improved RT-DETR-based detection algorithm that enhances detection accuracy, cross-scale stability, and inference efficiency through three coordinated modifications. First, a lightweight FasterNet backbone replaces the original feature extractor, and the partial convolution in FasterNetBlock is further optimized with a reparameterization strategy to strengthen feature representation under limited computation. During training, multi-branch convolutional paths enrich spatial diversity and improve learning capacity; during inference, these branches are equivalently fused into a single convolution, which simplifies the computational graph and accelerates deployment without sacrificing accuracy. Second, a dynamic multi-scale feature fusion network (DMSFPN) is designed as the neck to alleviate the semantic inconsistency and misalignment between high-level semantic features and low-level detail features, both of which are critical for localizing micro-defects. DMSFPN integrates a multi-scale convolution block (MSCB) that captures context at different receptive fields, with heterogeneous kernel branches focusing on tiny, medium, and large defect patterns and a channel shuffle mechanism enhancing cross-channel interaction and reducing feature redundancy. To improve detail recovery in cross-scale aggregation, an efficient upsampling convolution block (EUCB) is introduced, which combines lightweight convolutions with adaptive sampling based on deformable convolution, thereby mitigating the feature blurring caused by standard interpolation and preserving spatial continuity for small targets. A bidirectional feature fusion strategy enables both top-down semantic propagation and bottom-up detail enhancement, improving cross-resolution consistency while avoiding redundant computation.
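The branch fusion behind reparameterization can be illustrated with a small sketch. It is not the authors' implementation: the branch set (a 3x3 convolution, a 1x1 convolution, and an identity shortcut) is an assumption in the style of common reparameterized blocks, and the example uses a single channel with no bias or normalization so that the equivalence is easy to check by hand. All function names are hypothetical.

```python
# Illustrative only: single-channel, bias-free branch fusion.
# The branch set (3x3 + 1x1 + identity) is an assumption, not taken from the paper.

def conv2d_same(image, kernel):
    """Zero-padded 'same' 2-D cross-correlation with an odd-sized square kernel."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def fuse_branches(k3, k1):
    """Fold a 1x1 kernel and an identity shortcut into the centre of a 3x3 kernel."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1[0][0] + 1.0  # 1x1 weight plus the identity (weight 1)
    return fused


def multi_branch(image, k3, k1):
    """Training-time form: sum of the 3x3 branch, the 1x1 branch, and the input."""
    a = conv2d_same(image, k3)
    b = conv2d_same(image, k1)
    return [[a[i][j] + b[i][j] + image[i][j] for j in range(len(image[0]))]
            for i in range(len(image))]


image = [[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [4.0, 1.0, 2.0]]
k3 = [[0.5, -1.0, 0.25], [0.0, 2.0, -0.5], [1.0, 0.0, 0.75]]
k1 = [[0.5]]

train_out = multi_branch(image, k3, k1)                 # three branches
infer_out = conv2d_same(image, fuse_branches(k3, k1))   # one fused convolution
max_diff = max(abs(train_out[i][j] - infer_out[i][j])
               for i in range(3) for j in range(3))
print("largest difference between the two forms:", max_diff)
```

Because convolution is linear, the sum of the branch outputs equals the output of a single kernel whose weights are the sum of the (padded) branch kernels, which is why the fused form costs one convolution at inference time.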
Third, to enhance feature interaction under cluttered backgrounds, a token statistics self-attention (TSSA) mechanism based on statistical feature compression and variational rate reduction is incorporated into the adaptive feature interaction module, forming an AIFI-TSSA enhancement block. Unlike conventional self-attention, which computes pairwise token similarity with quadratic complexity and high memory cost, TSSA performs global context modeling with linear complexity through projection and denoising driven by token statistics, which reduces computational overhead and improves robustness against redundant textures, occlusion, and noise. The proposed RMD-DETR is evaluated on three industrial datasets: NEU-DET, the Tianchi aluminum profile surface defect dataset, and GC10-DET. Comprehensive experiments verify the effectiveness of each proposed component through ablation studies, comparisons with representative detectors, cross-scale evaluation in which inputs are resized to a smaller scale, and generalization tests across different industrial scenes.
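The gap between quadratic and linear complexity can be made concrete with a rough operation count. The sketch below does not implement TSSA; it only compares the order of magnitude of multiply-adds for pairwise attention against a generic scheme that accumulates per-channel statistics over all tokens and then rescales each token. The embedding width and feature-map sizes are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Rough multiply-add counts; projections and constant factors are ignored.

def pairwise_attention_cost(num_tokens, dim):
    """Score matrix (n * n * d) plus the weighted sum of values (n * n * d)."""
    return 2 * num_tokens * num_tokens * dim


def token_statistics_cost(num_tokens, dim):
    """One pass to accumulate per-channel statistics (n * d) plus one pass
    to rescale every token with them (n * d)."""
    return 2 * num_tokens * dim


dim = 256                      # assumed embedding width
for side in (20, 40, 80):      # assumed feature-map side lengths
    n = side * side            # one token per spatial location
    ratio = pairwise_attention_cost(n, dim) // token_statistics_cost(n, dim)
    print(f"{side}x{side} map: {n} tokens, pairwise/statistics cost ratio = {ratio}")
```

Under these simplifications the ratio equals the number of tokens, so the saving grows with feature-map resolution; the pairwise form also has to hold an n-by-n score matrix in memory, whereas the statistics-based form keeps only d values per head.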

Results and Discussions: Experimental results on NEU-DET show that RMD-DETR achieves 76.7% mAP@0.5, outperforming the baseline RT-DETR by 2.6 percentage points, while precision and recall reach 77.8% and 72.6%, indicating more reliable localization and fewer missed detections. The model also reduces parameters and computational cost by 39.9% and 39.3%, respectively, and inference speed rises to 78.9 frames per second, satisfying the real-time inspection requirement of industrial production lines. In cross-scale detection experiments, the missed detection rate decreases from 55.4% to 50.9%, showing stronger adaptability under significant scale variation. Generalization experiments further confirm the transferability of the proposed design: compared with the baseline model, mAP@0.5 improves by 2% on the Tianchi dataset and by 3% on GC10-DET. Visualization analysis under Gaussian noise shows that the proposed method maintains higher confidence and more complete defect coverage, especially for micro-scratches and fine cracks that are easily overlooked in complex textures. These results indicate that the three improvements are complementary: the reparameterized lightweight backbone improves efficiency while retaining feature extraction capability, DMSFPN enhances multi-scale semantic alignment and detail preservation, and AIFI-TSSA provides stable global modeling with reduced complexity, enabling the detector to focus on discriminative defect regions rather than background patterns.
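Since the reported gains mix percentage points and relative percentages, a short calculation helps keep them apart. The inputs below are the figures quoted in the abstract (the 30.4% frame-rate gain appears in the summary abstract); the baseline values are back-calculated here and are not numbers reported by the authors.

```python
# Inputs are figures quoted in the abstract; derived baselines are
# back-calculated for illustration and are not reported values.

map_improved, map_gain_points = 76.7, 2.6        # mAP@0.5 (%), gain in points
miss_baseline, miss_improved = 55.4, 50.9        # missed detection rate (%)
fps_improved, fps_gain_relative = 78.9, 0.304    # f/s, relative speed-up

# Absolute differences are expressed in percentage points.
map_baseline = round(map_improved - map_gain_points, 1)
miss_drop_points = round(miss_baseline - miss_improved, 1)

# The same drop expressed relative to the baseline miss rate.
miss_drop_relative = round(100 * miss_drop_points / miss_baseline, 1)

# A relative gain of 30.4% implies baseline = improved / 1.304.
fps_baseline = round(fps_improved / (1 + fps_gain_relative), 1)

print("implied baseline mAP@0.5 (%):", map_baseline)
print("miss-rate drop (points):", miss_drop_points)
print("miss-rate drop (relative %):", miss_drop_relative)
print("implied baseline speed (f/s):", fps_baseline)
```

Read this way, the 4.5-point reduction in missed detections corresponds to a relative reduction of roughly 8%, and the implied baseline speed is about 60 f/s.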

Conclusions: RMD-DETR provides an effective balance between accuracy and speed for steel surface defect detection, achieving improved robustness and cross-scale performance across multiple industrial datasets, and it is suitable for practical quality inspection systems that require high throughput and reliable detection. Future work will explore further compression for extremely resource-constrained edge devices, as well as deployment optimization through quantization and acceleration tools, to improve engineering applicability.
