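• Abstract: Traditional detection algorithms for industrial steel surfaces suffer from low detection accuracy, high model complexity, and difficult deployment because the defects are of many types and span a wide range of sizes. To address these problems, RGCE-YOLO, a lightweight steel surface defect detection algorithm based on an improved YOLOv11n, is proposed. First, the backbone network is reconstructed: the lightweight network HGNet (high performance GPU network) is combined with an improved reparameterized RepLightConv, which converts the multi-branch convolution process into a single-path structure and compresses the model size while maintaining detection accuracy. Second, the lightweight GSConv (group shuffle convolution) is embedded in the neck in place of the original downsampling to reduce redundant information, and the C3K2 structure is redesigned by incorporating the spatial and channel synergistic attention (SCSA) mechanism to improve multi-scale object detection accuracy. Third, a CMBC_Detect module, designed by combining an inverted bottleneck structure with the coordinate attention (CA) mechanism, is introduced into the regression branch, compressing the structure while strengthening the detection of tiny objects. Finally, channel-wise knowledge distillation (CWD) makes the model focus on salient information and reduces feature loss, improving accuracy without increasing model size or computation. Experimental results show that, compared with the YOLOv11n baseline, RGCE-YOLO improves precision by 3.9% to 83.4%, improves mAP50 and mAP50-95 by 3.9% and 4.6%, respectively, and reduces parameters by 34.5% and FLOPs by 33.3%, achieving dual optimization of accuracy and lightweight design. The algorithm also shows excellent generalization performance on the NEU-DET and GC10-DET defect datasets, providing a reliable technical solution for real-time industrial detection.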

       

      Abstract:
      Objective Industrial steel surface defect detection is a core step of industrial quality inspection. However, traditional detection algorithms suffer from low accuracy, high model complexity, and difficult deployment due to the various types and large scale variations of steel surface defects. Existing deep learning-based detection algorithms have improved detection accuracy to a certain extent, but most of them feature large model size and high computational complexity, which makes them hard to deploy on resource-constrained industrial edge devices and thus restricts their practical application in real-time industrial detection scenarios. This study aims to propose a lightweight steel surface defect detection algorithm that balances high detection accuracy and low computational complexity, realizes efficient feature extraction for multi-scale steel surface defects, and meets the requirements of real-time detection and edge device deployment in industrial scenarios.
      Methods A lightweight steel surface defect detection algorithm named RGCE-YOLO was proposed based on an improved YOLOv11n. The backbone network was reconstructed with a novel HGRCBlock structure combining lightweight HGNet and improved reparameterized RepLightConv, and multi-branch convolution and normalization steps were fused during inference to reduce computational load while maintaining feature extraction capability. For the neck network, the lightweight GSConv replaced the original downsampling structure to reduce redundant information via mixed convolution and address the accuracy bottleneck of depthwise separable convolution; the C3K2 structure was redesigned by integrating the Spatial and Channel Synergistic Attention (SCSA) mechanism to alleviate sub-feature semantic differences and enhance multi-scale object feature fusion. The CMBC_Detect module, constructed based on EfficientNet's inverted bottleneck structure and combined with the Coordinate Attention (CA) mechanism, was introduced into the detection head’s regression branch; the SE attention was replaced with the CA mechanism to retain spatial detail information, enhance small object detection ability, and compress the structure simultaneously. Channel-wise Knowledge Distillation (CWD) was applied to the improved model: each channel’s feature map was normalized to a soft probability map, and the asymmetric KL divergence between the teacher and student models was minimized to make the student model focus on salient regions of defect features, reducing feature loss without increasing computational cost. A self-built steel surface defect dataset with four defect types (rust, scratch, pit, crack) was established and expanded via data augmentation strategies including random cropping, dynamic brightness adjustment, salt-and-pepper noise, and Gaussian noise, and then divided into training, validation, and test sets at an 8:1:1 ratio. 
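The channel-wise distillation step described above (each channel normalized to a soft probability map, then an asymmetric KL divergence from teacher to student) can be illustrated with a minimal pure-Python sketch. The function names, the temperature parameter `tau`, and the `tau**2 / C` scaling are assumptions taken from the commonly used formulation of channel-wise distillation; they are not confirmed as the settings used in this study.

```python
import math


def log_softmax(channel, tau=1.0):
    """Log of the soft probability map for one flattened channel.

    `channel` is a flat list of H*W activations; `tau` is an assumed
    temperature (a higher value gives a softer distribution).
    """
    scaled = [v / tau for v in channel]
    peak = max(scaled)
    # Log-sum-exp with the maximum subtracted for numerical stability.
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def cwd_loss(teacher, student, tau=1.0):
    """Channel-wise distillation loss between two feature maps.

    `teacher` and `student` are lists of channels, each channel a flat
    list of H*W activations. The KL divergence is taken from teacher to
    student, so it is asymmetric: regions where the teacher assigns high
    probability dominate the loss.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student need the same channel count")
    total = 0.0
    for t_channel, s_channel in zip(teacher, student):
        log_p_t = log_softmax(t_channel, tau)
        log_p_s = log_softmax(s_channel, tau)
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # Assumed scaling: tau^2 / C, as in the common CWD formulation.
    return tau ** 2 * total / len(teacher)
```

Because the teacher distribution weights each term, a student that misses a strongly activated (salient) location is penalized more than one that misses a background location, which matches the stated goal of focusing the student on salient defect regions.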
Ablation experiments verified the effectiveness of each improved module, and comparative experiments on distillation methods and mainstream detection algorithms were conducted under the same conditions. Generalization experiments were carried out on the public NEU-DET and GC10-DET datasets, and the model was converted to ONNX format for deployment verification on an ARM industrial computer. Precision, mAP50, mAP50-95, parameters, FPS, and FLOPs were used as key evaluation indicators to comprehensively assess the algorithm's performance.
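The inference-time fusion of multi-branch convolution and normalization steps mentioned in the Methods relies on the fact that a convolution followed by batch normalization is still a linear map, and that parallel linear branches can be summed into one. The sketch below shows this for a 1x1 convolution applied to a single pixel. It is a generic illustration of reparameterization, not the actual RepLightConv or HGRCBlock implementation, and all function names are hypothetical.

```python
import math


def conv1x1(x, weight, bias):
    """1x1 convolution at one pixel: x has C_in values, weight is C_out x C_in."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization using stored running statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch normalization into the preceding convolution."""
    fused_w, fused_b = [], []
    for row, b, g, bt, m, s in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(s + eps)
        fused_w.append([w * scale for w in row])
        fused_b.append((b - m) * scale + bt)
    return fused_w, fused_b


def merge_branches(branches):
    """Sum the fused (weight, bias) pairs of parallel branches into one conv."""
    weights = [w for w, _ in branches]
    biases = [b for _, b in branches]
    merged_w = [[sum(vals) for vals in zip(*rows)] for rows in zip(*weights)]
    merged_b = [sum(vals) for vals in zip(*biases)]
    return merged_w, merged_b
```

After fusion, a block that needed two convolutions, two normalizations, and an addition during training is evaluated as a single convolution at inference, which is where the reduction in computational load comes from.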
      Results and Discussions Ablation experiments confirmed the effective optimization role of each improved module: HGRCBlock reduced model parameters and FLOPs while enhancing multi-scale feature extraction; GSConv reduced the neck network’s redundant information and computational complexity; SCSA improved the model’s perception of multi-scale objects and detection robustness; CMBC_Detect strengthened the capture ability for small, easily missed defects and further compressed the model size; CWD improved model accuracy without changing model size and computational complexity. On the self-built dataset, RGCE-YOLO achieved a precision of 83.4% (3.9% higher than YOLOv11n), with mAP50 and mAP50-95 increased by 3.9% and 4.6% to 80.5% and 50.3%, respectively; parameters decreased by 34.5% to 1.69 M and FLOPs by 33.3% to 4.4 G, realizing dual optimization of accuracy and lightweight design. Distillation comparison experiments showed that CWD had a better optimization effect than other distillation methods, and YOLOv11m was selected as the optimal teacher model by balancing performance and complexity. Mainstream algorithm comparison experiments indicated that RGCE-YOLO outperformed traditional algorithms (Faster R-CNN, RetinaNet, SSD) and other lightweight algorithms (MobileNetV4, StarNet, NanoDet-Plus-m) in detection accuracy, and had obvious lightweight advantages over other YOLO series algorithms, with an FPS of 185.8 meeting industrial real-time detection needs. Generalization experiments showed that RGCE-YOLO achieved a precision of 81.2% (8.4% higher than YOLOv11n) on the NEU-DET dataset and 73.1% (4.2% higher than YOLOv11n) on the GC10-DET dataset, with stable mAP improvements, demonstrating excellent generalization ability. ONNX deployment verification on the ARM industrial computer showed that the model maintained a precision of 81.7% and an FPS of 35.1 with no significant performance decline, proving its adaptability to edge devices. 
Visual results showed that RGCE-YOLO had a lower missed detection rate, stronger small object feature extraction ability, and more accurate bounding box positioning than YOLOv11n. The excellent performance of RGCE-YOLO stemmed from the synergy of all improved modules: RepLightConv’s reparameterization balanced feature representation and computational efficiency; GSConv and SCSA solved feature redundancy and semantic differences in multi-scale defect detection; the CA mechanism in CMBC_Detect retained spatial details and solved small object detection information loss; CWD enhanced dense object feature capture and generalization without additional computational cost.
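As a quick consistency check on the reported reductions, the baseline figures implied by the results can be back-calculated. The sketch assumes that the 34.5% and 33.3% decreases are relative to the YOLOv11n baseline; the helper name is hypothetical, and the derived values are implied estimates rather than numbers stated in the text.

```python
def baseline_from_reduction(reduced_value, percent_drop):
    """Recover the original value given the reduced value and a relative drop in percent."""
    return reduced_value / (1.0 - percent_drop / 100.0)


# Reported for RGCE-YOLO: 1.69 M parameters after a 34.5% drop,
# and 4.4 G FLOPs after a 33.3% drop.
implied_params_m = baseline_from_reduction(1.69, 34.5)  # about 2.58 M
implied_flops_g = baseline_from_reduction(4.4, 33.3)    # about 6.6 G
```

Under this assumption, the baseline works out to roughly 2.58 M parameters and 6.6 G FLOPs.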
      Conclusions RGCE-YOLO addresses the low accuracy and heavy model footprint of existing steel surface defect detection algorithms: the improved modules work together to raise detection accuracy while shrinking the model. On the self-built dataset, the algorithm achieves higher precision, mAP50, and mAP50-95 than the YOLOv11n baseline while substantially reducing parameters and computational complexity, and its inference speed meets industrial real-time detection requirements. It generalizes well to public steel surface defect datasets and can be deployed stably on edge devices with no obvious decline in detection performance, providing a reliable technical solution for real-time industrial steel surface defect detection. The design approach of combining a lightweight network structure, an attention mechanism, and knowledge distillation also offers a reference for the lightweight optimization of other industrial object detection algorithms. Future work will construct a dataset covering more steel surface defect types and make targeted improvements for tiny defects, with the aim of further enhancing the model's overall detection ability and extending its use to industrial quality inspection scenarios with more complex defect types and stricter detection requirements.