Abstract:
Objective Steel surface defect detection is a core step of industrial quality inspection. However, because steel surface defects vary widely in type and scale, traditional detection algorithms suffer from low accuracy, high model complexity, and difficult deployment. Existing deep learning-based detection algorithms have improved accuracy to some extent, but most have large model sizes and high computational complexity, which makes them hard to deploy on resource-constrained industrial edge devices and restricts their use in real-time industrial detection. This study proposes a lightweight steel surface defect detection algorithm that balances high detection accuracy with low computational complexity, extracts features efficiently from multi-scale defects, and meets the requirements of real-time detection and edge-device deployment in industrial scenarios.
Methods A lightweight steel surface defect detection algorithm, RGCE-YOLO, was proposed based on an improved YOLOv11n. The backbone network was rebuilt with a novel HGRCBlock structure that combines the lightweight HGNet with an improved reparameterized RepLightConv; at inference, the multi-branch convolutions and normalization layers were fused to reduce computational load while maintaining feature extraction capability. In the neck network, the lightweight GSConv replaced the original downsampling structure, using mixed convolution to reduce redundant information and to address the accuracy bottleneck of depthwise separable convolution; the C3K2 structure was redesigned by integrating the Spatial and Channel Synergistic Attention (SCSA) mechanism to alleviate semantic differences between sub-features and enhance multi-scale feature fusion. The CMBC_Detect module, built on EfficientNet's inverted bottleneck structure, was introduced into the regression branch of the detection head; within it, the SE attention of the original bottleneck was replaced with the Coordinate Attention (CA) mechanism to retain spatial detail, enhance small object detection, and compress the structure at the same time. Channel-wise Knowledge Distillation (CWD) was applied to the improved model: each channel's feature map was normalized into a soft probability map, and the asymmetric KL divergence between the teacher and student models was minimized so that the student model focuses on the salient regions of defect features, reducing feature loss without increasing the student's computational cost. A self-built steel surface defect dataset covering four defect types (rust, scratch, pit, crack) was established, expanded via data augmentation (random cropping, dynamic brightness adjustment, salt-and-pepper noise, and Gaussian noise), and divided into training, validation, and test sets at an 8:1:1 ratio.
Ablation experiments verified the effectiveness of each improved module, and comparative experiments on distillation methods and mainstream detection algorithms were conducted under the same conditions. Generalization experiments were carried out on the public NEU-DET and GC10-DET datasets, and the model was converted to ONNX format for deployment verification on an ARM industrial computer. Precision, mAP50, mAP50-95, parameters, FPS, and FLOPs were used as key evaluation indicators to comprehensively assess the algorithm's performance.
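The channel-wise distillation step described in the Methods can be illustrated with a minimal pure-Python sketch. It assumes the commonly used CWD formulation: a softmax over the spatial positions of each channel, a temperature `tau`, and a loss scaled by `tau ** 2` and averaged over channels. The abstract does not state these details, so the temperature, the scaling, and the names `log_softmax` and `cwd_loss` are illustrative assumptions rather than the study's implementation.

```python
import math


def log_softmax(values, tau=1.0):
    """Log of the spatial softmax of one flattened channel (H*W activations)."""
    scaled = [v / tau for v in values]
    peak = max(scaled)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_z for s in scaled]


def cwd_loss(teacher, student, tau=1.0):
    """KL(teacher || student) per channel, averaged over channels.

    teacher, student: lists of channels, each a flat list of H*W activations.
    The tau**2 scaling and channel averaging are assumptions (see lead-in).
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same channel count")
    total = 0.0
    for t_channel, s_channel in zip(teacher, student):
        if len(t_channel) != len(s_channel):
            raise ValueError("channels must have the same spatial size")
        log_p_t = log_softmax(t_channel, tau)
        log_p_s = log_softmax(s_channel, tau)
        # KL divergence: sum_i p_t(i) * (log p_t(i) - log p_s(i))
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    return (tau ** 2) * total / len(teacher)


# Toy example: one channel with a single salient location in the teacher map
# and a flat (uninformative) student map.
teacher_maps = [[0.0, 0.0, 5.0, 0.0]]
student_maps = [[1.0, 1.0, 1.0, 1.0]]
loss_forward = cwd_loss(teacher_maps, student_maps)
loss_reverse = cwd_loss(student_maps, teacher_maps)
loss_identical = cwd_loss(teacher_maps, teacher_maps)
```

Because the KL divergence is weighted by the teacher's probability map, positions where the teacher responds strongly dominate the loss, which is why the student is pushed toward the salient defect regions. Swapping the arguments gives a different value, which is the asymmetry mentioned in the text.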
Results and Discussions Ablation experiments confirmed the contribution of each improved module: HGRCBlock reduced model parameters and FLOPs while enhancing multi-scale feature extraction; GSConv reduced redundant information and computational complexity in the neck network; SCSA improved the model's perception of multi-scale objects and its detection robustness; CMBC_Detect strengthened the capture of small, easily missed defects and further compressed the model; and CWD improved accuracy without changing model size or computational complexity. On the self-built dataset, RGCE-YOLO achieved a precision of 83.4% (3.9% higher than YOLOv11n), and mAP50 and mAP50-95 increased by 3.9% and 4.6% to 80.5% and 50.3%, respectively; parameters decreased by 34.5% to 1.69 M and FLOPs by 33.3% to 4.4 G, improving accuracy and reducing model size at the same time. Distillation comparison experiments showed that CWD was more effective than the other distillation methods tested, and YOLOv11m was selected as the teacher model as the best trade-off between performance and complexity. Comparison with mainstream algorithms indicated that RGCE-YOLO outperformed classical detectors (Faster R-CNN, RetinaNet, SSD) and other lightweight algorithms (MobileNetV4, StarNet, NanoDet-Plus-m) in detection accuracy and was clearly lighter than other YOLO-series algorithms, while its FPS of 185.8 meets industrial real-time detection needs. In generalization experiments, RGCE-YOLO achieved a precision of 81.2% (8.4% higher than YOLOv11n) on the NEU-DET dataset and 73.1% (4.2% higher than YOLOv11n) on the GC10-DET dataset, with consistent mAP improvements, demonstrating strong generalization ability. ONNX deployment verification on the ARM industrial computer showed that the model maintained a precision of 81.7% and an FPS of 35.1 with no significant performance decline, demonstrating its suitability for edge devices.
Visual results showed that RGCE-YOLO had a lower missed detection rate, stronger feature extraction for small objects, and more accurate bounding box localization than YOLOv11n. This performance stems from the synergy of the improved modules: the reparameterization in RepLightConv balances feature representation and computational efficiency; GSConv and SCSA address feature redundancy and semantic differences in multi-scale defect detection; the CA mechanism in CMBC_Detect retains spatial details and mitigates the information loss that hampers small object detection; and CWD enhances dense object feature capture and generalization without additional computational cost.
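As a quick consistency check on the reported model-size figures, the YOLOv11n baseline implied by the stated reductions (34.5% to 1.69 M parameters, 33.3% to 4.4 G FLOPs) can be back-computed. The helper name `baseline_from_reduction` is illustrative, and the resulting baseline values are derived from the percentages above, not quoted from the study.

```python
def baseline_from_reduction(reduced_value, reduction_percent):
    """Recover the original value from a reduced value and a relative reduction."""
    return reduced_value / (1.0 - reduction_percent / 100.0)


# Figures reported for RGCE-YOLO on the self-built dataset.
implied_params_m = baseline_from_reduction(1.69, 34.5)  # millions of parameters
implied_flops_g = baseline_from_reduction(4.4, 33.3)    # GFLOPs

print(f"Implied baseline parameters: {implied_params_m:.2f} M")
print(f"Implied baseline FLOPs: {implied_flops_g:.2f} G")
```

The implied baseline is roughly 2.58 M parameters and 6.6 G FLOPs, so the two reductions are read here as relative changes with respect to the baseline model.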
Conclusions RGCE-YOLO addresses the low accuracy and heavy model size of existing steel surface defect detection algorithms, and the synergy of its improved modules improves detection accuracy while making the model lighter. On the self-built dataset, the algorithm achieves significantly higher precision, mAP50, and mAP50-95 than the YOLOv11n baseline while greatly reducing parameters and computational complexity, and its inference speed meets industrial real-time detection requirements. It generalizes well to public steel surface defect datasets and can be deployed stably on edge devices with no obvious decline in detection performance, providing a reliable technical solution for real-time industrial steel surface defect detection. The design approach of combining a lightweight network structure, attention mechanisms, and knowledge distillation also provides a reference for the lightweight optimization of other industrial object detection algorithms. In future work, constructing a dataset with more steel surface defect types and making targeted improvements for tiny defects are expected to further enhance the model's overall detection ability and promote its wider application in industrial quality inspection scenarios with more complex defect types and higher detection requirements.