Adaptive multi-scale fusion transformer for forging surface defect detection

梁丹; 张上; 管慧攀

doi:10.12086/oee.2026.250242


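Abstract: To address the diverse morphology and variable scale of forging surface defects against complex backgrounds, and the difficulty of balancing detection accuracy with efficiency, this paper proposes MCF-RTDETR, an adaptive multi-scale fusion context-aware detection model built on an improved RT-DETR architecture. First, magnetic particle inspection images are collected from a heavy-duty truck steering knuckle production line to build a forging surface crack defect dataset. Second, a multi-directional gradient fusion convolution (MGFConv) is introduced into the backbone network; its multi-directional convolutions adaptively strengthen direction perception and edge extraction for slender cracks and tiny defects. In parallel, a context-aware recursive feature pyramid network (CARFPN) is embedded in the neck network; it assigns dynamic weights to features at different scales according to local contextual semantic strength, so that fusion adapts to scale variation and background complexity. Finally, the Focaler-MPDIoU bounding-box regression loss adaptively adjusts for sample difficulty and geometric proportion, improving localization accuracy and boundary fitting for elongated defects. Experimental results show that MCF-RTDETR improves mAP by 2.3% over the baseline model, reduces computational complexity from 58.3 G to 33.5 G, and reaches an inference speed of 137.2 f/s; on the GC10-DET dataset, mAP improves by 2.8%, which validates its detection accuracy and generalization performance.

Extended abstract: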

Objective Forged parts are widely used as key load-bearing components in automotive, aerospace, shipbuilding, and heavy equipment industries. Their surface quality directly affects structural reliability and service safety. During forging and subsequent processing, surface defects such as cracks, inclusions, pores, oxide layers, and spot-like imperfections are frequently generated due to material heterogeneity, mold wear, and process instability. Magnetic particle inspection is commonly adopted for ferromagnetic forged parts because of its high sensitivity to surface and near-surface cracks. However, magnetic particle inspection images usually exhibit complex backgrounds, uneven illumination, magnetic trace interference, and weak texture contrast. These characteristics cause surface defects to present blurred boundaries, irregular shapes, and large scale variation. Slender cracks and tiny defects are especially difficult to detect accurately under real-time constraints. Existing convolutional neural network–based and Transformer-based detection models often show insufficient directional perception, limited contextual representation, and inadequate geometric sensitivity in bounding-box regression. As a result, detection accuracy and computational efficiency are difficult to balance in practical industrial inspection scenarios. The objective of this study was to develop a real-time forged-part surface defect detection model that enhances directional feature perception and contextual fusion capability, improves multi-scale feature alignment, and strengthens localization robustness for elongated defects, while maintaining high inference efficiency suitable for industrial deployment.

Methods A real-time surface defect detection framework, named MCF-RTDETR, was constructed based on an improved RT-DETR architecture. Magnetic particle inspection images of heavy-duty truck steering knuckles were collected from an industrial production environment and manually annotated to establish a forged-part surface crack dataset. The dataset contained complex background structures and fine-scale defect patterns, providing realistic samples for model evaluation.

In the backbone network, a multi-directional gradient fusion convolution (MGFConv) module was introduced to enhance directional edge representation. This module integrated four directional difference convolutions with one standard convolution to capture gradient responses along multiple orientations. This design improved feature sensitivity to slender cracks and weakly textured defects. To avoid increased inference cost, convolutional re-parameterization was adopted at inference time, allowing the multi-branch structure to be equivalently transformed into a single convolution layer while preserving feature enhancement capability.
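The re-parameterization step relies on convolution being linear in its kernel, so parallel branches can be summed into one kernel. The following is a minimal sketch of that idea, not the authors' MGFConv: it uses a single central-difference branch as a stand-in for the directional difference convolutions, works on one channel with 3x3 kernels, and all function names are hypothetical.

```python
import random

def conv2d_valid(x, k):
    """Plain 3x3 'valid' cross-correlation on a 2-D list of floats."""
    h, w = len(x), len(x[0])
    return [[sum(k[a][b] * x[i + a][j + b] for a in range(3) for b in range(3))
             for j in range(w - 2)]
            for i in range(h - 2)]

def central_difference_conv(x, k):
    """Assumed difference branch: weights act on (neighbour - centre) values."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            centre = x[i + 1][j + 1]
            row.append(sum(k[a][b] * (x[i + a][j + b] - centre)
                           for a in range(3) for b in range(3)))
        out.append(row)
    return out

def difference_to_plain_kernel(k):
    """Equivalent ordinary kernel: subtract the weight sum from the centre tap."""
    total = sum(sum(row) for row in k)
    eq = [row[:] for row in k]
    eq[1][1] -= total
    return eq

def merge_kernels(*kernels):
    """Sum parallel 3x3 kernels into a single kernel (linearity of convolution)."""
    return [[sum(k[a][b] for k in kernels) for b in range(3)] for a in range(3)]

rng = random.Random(0)
x = [[rng.random() for _ in range(6)] for _ in range(6)]
k_plain = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
k_diff = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]

# Training-time form: two branches evaluated separately, outputs added.
plain_out = conv2d_valid(x, k_plain)
diff_out = central_difference_conv(x, k_diff)
multi_branch = [[p + d for p, d in zip(rp, rd)] for rp, rd in zip(plain_out, diff_out)]

# Inference-time form: one fused kernel, one convolution.
fused_kernel = merge_kernels(k_plain, difference_to_plain_kernel(k_diff))
single_branch = conv2d_valid(x, fused_kernel)

max_err = max(abs(m - s)
              for rm, rs in zip(multi_branch, single_branch)
              for m, s in zip(rm, rs))
print("max abs difference between the two forms:", max_err)
```

The two forms agree up to floating-point rounding, which is why the extra branches add no cost once the kernels are merged.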

In the neck network, a context-aware recursive feature pyramid network was designed to address insufficient spatial–semantic alignment in conventional multi-scale fusion schemes. A rectangular self-calibration module was first employed to model directional contextual information by expanding receptive fields along horizontal and vertical directions, which strengthened anisotropic defect representation. Subsequently, a FuseBlockMulti module was used to realize semantic-guided alignment between shallow detail features and deep semantic features, enabling adaptive information interaction across different feature levels. Furthermore, a dynamic interpolation fusion mechanism was introduced to assign adaptive fusion weights to multi-scale features according to local contextual semantic strength. Through recursive fusion, feature consistency across scales was progressively refined, improving robustness under complex backgrounds and large scale variation.
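One way to picture adaptive fusion weights is a per-location softmax over the features being merged. The sketch below is an illustration under strong assumptions, not the CARFPN design: it uses absolute activation as a stand-in for "local contextual semantic strength", nearest-neighbour upsampling for alignment, single-channel maps, and hypothetical function names.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def upsample_nearest(feature, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    rows, cols = len(feature), len(feature[0])
    return [[feature[i // factor][j // factor] for j in range(cols * factor)]
            for i in range(rows * factor)]

def fuse_adaptive(shallow, deep_up):
    """Blend two aligned maps with per-location weights.

    Assumption: the absolute activation serves as the strength score,
    so the stronger response at a location receives the larger weight.
    """
    fused = []
    for row_s, row_d in zip(shallow, deep_up):
        row = []
        for s, d in zip(row_s, row_d):
            w_s, w_d = softmax([abs(s), abs(d)])
            row.append(w_s * s + w_d * d)
        fused.append(row)
    return fused

shallow = [[0.1, 2.0, 0.0, 0.3],
           [0.2, 1.5, 0.1, 0.0],
           [0.0, 0.1, 0.4, 0.2],
           [0.3, 0.0, 0.2, 0.1]]
deep = [[1.0, 0.2],
        [0.1, 0.4]]

deep_up = upsample_nearest(deep, 2)
fused = fuse_adaptive(shallow, deep_up)
for row in fused:
    print([round(v, 3) for v in row])
```

Because the weights at each location sum to one, every fused value lies between the two inputs; a learned module would replace the hand-picked strength score with predicted weights.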

For bounding-box regression, the original loss function was replaced with a Focaler-MPDIoU loss. This loss combined difficulty-aware sample weighting with multi-point distance geometric constraints, enhancing localization accuracy and boundary fitting ability for elongated and irregular defects. The entire detection framework was optimized in an end-to-end manner and evaluated under unified training and inference configurations.
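The text does not give the loss formula, so the sketch below follows the commonly published forms of the two ingredients: MPDIoU subtracts the squared distances between matching top-left and bottom-right corners, normalized by the squared image diagonal, from IoU, and Focaler-IoU linearly remaps IoU over an interval [d, u]. The thresholds d = 0.0 and u = 0.95, the 640x640 image size, and the function names are assumptions for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mpdiou(a, b, img_w, img_h):
    """IoU minus normalized squared corner distances (top-left, bottom-right)."""
    norm = img_w ** 2 + img_h ** 2
    d_tl = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    d_br = (a[2] - b[2]) ** 2 + (a[3] - b[3]) ** 2
    return iou(a, b) - d_tl / norm - d_br / norm

def focaler(iou_value, d=0.0, u=0.95):
    """Linear remap of IoU onto [0, 1] over the interval [d, u], clipped outside."""
    return min(1.0, max(0.0, (iou_value - d) / (u - d)))

def focaler_mpdiou_loss(pred, target, img_w, img_h, d=0.0, u=0.95):
    """Assumed combination: L_MPDIoU + IoU - Focaler(IoU)."""
    i = iou(pred, target)
    return (1.0 - mpdiou(pred, target, img_w, img_h)) + i - focaler(i, d, u)

# An elongated, crack-like target box and three predictions shifted vertically.
target = (10.0, 10.0, 110.0, 20.0)
preds = {
    "exact": (10.0, 10.0, 110.0, 20.0),
    "shift_2px": (10.0, 12.0, 110.0, 22.0),
    "shift_5px": (10.0, 15.0, 110.0, 25.0),
}
losses = {name: focaler_mpdiou_loss(box, target, 640, 640) for name, box in preds.items()}
for name, value in losses.items():
    print(name, round(value, 4))
```

For a thin box, a shift of a few pixels across its short side already removes a large share of the overlap, which is why corner-distance terms and difficulty-aware remapping are attractive for elongated defects.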

Results and Discussions Experimental results demonstrated that the proposed MCF-RTDETR achieved superior performance in forged-part surface defect detection. On the magnetic particle inspection dataset of heavy-duty truck steering knuckles, the proposed model obtained a mean average precision of 88.5%, which exceeded the RTDETR-r18 baseline by 2.3%. Meanwhile, computational complexity was reduced from 58.3 G to 33.5 G, and inference speed reached 137.2 f/s, indicating an effective balance between detection accuracy and computational efficiency.
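The reported figures can be put in relative terms with a few lines of arithmetic. This sketch assumes the 2.3% gain is in absolute percentage points, and it treats the throughput as a per-frame latency only under the assumption of single-image inference; neither detail is stated explicitly in the text.

```python
# Figures quoted in the text.
map_proposed = 88.5          # mean average precision, %
map_gain = 2.3               # assumed to be absolute percentage points
cost_baseline = 58.3         # computational complexity, "G" as reported
cost_proposed = 33.5
frames_per_second = 137.2

# Derived quantities (not stated in the text).
map_baseline = round(map_proposed - map_gain, 1)
cost_reduction = (cost_baseline - cost_proposed) / cost_baseline
latency_ms = 1000.0 / frames_per_second

print(f"implied baseline mAP: {map_baseline}%")
print(f"relative complexity reduction: {cost_reduction:.1%}")
print(f"approximate time per frame: {latency_ms:.2f} ms")
```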

Performance improvement was particularly evident for slender cracks and small-scale defects. Enhanced directional gradient perception reduced missed detections caused by weak edge responses, while context-aware recursive fusion suppressed background interference and improved defect-background discrimination. Ablation experiments confirmed the effectiveness of each proposed component. The multi-directional gradient fusion convolution strengthened edge sensitivity without increasing inference overhead. The context-aware recursive feature pyramid network improved multi-scale feature consistency and spatial-semantic alignment. The Focaler-MPDIoU loss further enhanced bounding-box regression accuracy for defects with large aspect ratios.

Generalization capability was validated on the public GC10-DET dataset. The proposed model achieved a 2.8% increase in mean average precision compared with the baseline detector, demonstrating that the proposed feature enhancement and fusion strategies are transferable to different industrial surface defect detection scenarios. These results indicate that the proposed framework effectively integrates accuracy improvement with computational reduction, which is essential for real-time industrial inspection systems.

Conclusions A context-aware multi-scale fusion real-time detection model for forged-part surface defects is presented. Directional gradient fusion, adaptive contextual feature integration, and geometry-sensitive regression jointly enhance detection robustness under complex backgrounds and severe scale variation. Experimental results confirm that the proposed approach improves detection accuracy while reducing computational complexity and maintaining high inference speed. The model shows strong generalization capability and practical applicability, providing an effective solution for automated forged-part surface defect inspection in industrial environments.
