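• Abstract: The unstructured road surfaces of open-pit mining areas are complex and changeable. At medium-to-long viewing distances in particular, small regions such as ruts and puddles appear in the image as low-resolution detail targets. Their texture and boundary information is heavily lost during feature extraction, which readily causes missed detection of small targets and blurred segmentation edges. This poses a serious challenge to the path-planning safety of unmanned vehicles. To address this problem, an instance segmentation model for the drivable area of open-pit mining trucks based on space-to-depth conversion (SCG-YOLO) is proposed. First, a space-to-depth and deformable convolution (SDC) module is constructed. Its space-to-depth decoupling and reorganization preserves fine-grained information such as rut textures and puddle edges. The dynamic sampling of deformable convolution then adapts to the non-rigid deformation of targets on unstructured road surfaces, strengthening the model's perception of road texture details and terrain edges. Second, a coordinate attention (CA) mechanism is introduced. It suppresses irrelevant information and strengthens the feature response of key regions such as ruts and puddles, further improving the edge sharpness and regional completeness of the segmentation masks. Finally, Ghost modules are used to rebuild the Neck layer in lightweight form, as an engineering constraint that keeps the model real-time while maintaining high accuracy. Experiments on an open-pit mine road dataset collected on site show that the model reaches an mAP50 of 82.6%, 11.7% higher than the baseline model, with sharper segmentation edges, while maintaining a real-time speed of 425.31 f/s. This provides a feasible environmental perception solution for unmanned driving in complex open-pit mine environments.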

       

      Abstract:
      Objective The unstructured road environment in open-pit mining areas is characterized by complex topography, variable lighting conditions, and irregular boundaries. In autonomous driving scenarios for mining trucks, particularly at medium-to-long viewing distances, critical obstacles such as ruts and puddles often appear as low-resolution, small-scale targets. Conventional convolutional neural networks typically downsample with strided convolutions or pooling operations, which inevitably discards fine-grained texture and boundary information. Consequently, existing segmentation methods frequently miss small targets and produce blurred segmentation edges, posing severe safety risks to the path planning of unmanned vehicles. To address these challenges, this study aimed to develop a high-precision, real-time instance segmentation model, named SCG-YOLO, specifically optimized for the drivable areas of open-pit mine roads. The primary goal was to enhance the perception of low-resolution details and irregular boundaries while strictly adhering to the computational constraints of onboard embedded devices.
      Methods The proposed SCG-YOLO model was constructed based on the YOLOv8-seg architecture, integrating three targeted improvement modules to handle the unique constraints of mining environments.
      First, a space-to-depth and deformable convolution (SDC) module was designed and embedded into the feature extraction network. To mitigate the information loss associated with traditional downsampling, the space-to-depth layer rearranged each non-overlapping spatial block of the input feature map into the channel (depth) dimension, reducing spatial resolution without discarding any pixel values. This operation preserved fine-grained information, such as the shallow textures of ruts and the irregular edges of puddles, which standard pooling layers typically discard. Deformable convolution layers were also incorporated within the SDC module. Unlike standard convolutions, whose sampling grid has a fixed geometric structure, deformable convolutions learn offset fields that dynamically shift each sampling location. This allowed the network to adaptively model the non-rigid deformations of targets on unstructured road surfaces and to align features accurately for distorted shapes.
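The two operations combined in the SDC module can be illustrated with a small pure-Python sketch. The function names, the default scale factor of 2, the zero-padding rule and the single-channel 3 x 3 deformable tap are illustrative assumptions, not the paper's implementation; the real module operates on batched framework tensors and learns both its kernel weights and its offsets.

```python
import math

# Illustrative sketch only: tensors are nested lists indexed [channel][row][col].


def space_to_depth(x, scale=2):
    """Move each scale x scale spatial block into the channel axis.

    (C, H, W) -> (C * scale**2, H // scale, W // scale). Every input value is
    kept, unlike a stride-2 convolution or a pooling layer.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    assert h % scale == 0 and w % scale == 0, "H and W must be divisible by scale"
    out = []
    for dy in range(scale):          # row offset inside each block
        for dx in range(scale):      # column offset inside each block
            for ch in range(c):
                out.append([[x[ch][i * scale + dy][j * scale + dx]
                             for j in range(w // scale)]
                            for i in range(h // scale)])
    return out


def bilinear(feat, y, x):
    """Sample a 2-D map at fractional (y, x); out-of-range neighbours count as 0."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += weight * feat[yy][xx]
    return val


def deformable_conv3x3_at(feat, weights, offsets, p0):
    """One output value of a single-channel 3x3 deformable convolution.

    weights: 3x3 kernel; offsets: nine (dy, dx) pairs, one per kernel tap
    (learned in a real network); p0: output location (row, col).
    With all offsets zero this is an ordinary zero-padded 3x3 convolution.
    """
    out = 0.0
    k = 0
    for i, ky in enumerate((-1, 0, 1)):
        for j, kx in enumerate((-1, 0, 1)):
            dy, dx = offsets[k]
            out += weights[i][j] * bilinear(feat, p0[0] + ky + dy, p0[1] + kx + dx)
            k += 1
    return out
```

On a 4 x 4 map holding 0 to 15, `space_to_depth` returns four 2 x 2 channels that together still contain all sixteen values, and shifting every offset by (0, 1) makes the deformable tap read the window one column to the right; fractional offsets are resolved by bilinear interpolation, which is what makes the offsets learnable by gradient descent.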
      Second, a coordinate attention (CA) mechanism was introduced into the network neck. Unlike global average pooling, which compresses each channel into a single scalar and discards spatial layout, the CA mechanism pooled features separately along the horizontal and vertical directions, yielding a pair of direction-aware descriptors. This captured long-range dependencies along one direction while preserving precise positional information along the other. By re-weighting the feature maps with the resulting attention weights, the mechanism strengthened the network's focus on key drivable regions and suppressed interference from background factors common in mining areas, such as ore piles, dust, and equipment shadows.
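A simplified coordinate-attention forward pass can be sketched in pure Python to show how the two directional descriptors become row and column weights. This is an assumption-laden illustration, not the configuration used in SCG-YOLO: batch normalization is omitted, ReLU stands in for the original non-linearity, and the weight matrices `w1`, `w_h` and `w_w` are hypothetical inputs rather than trained parameters. Applying the shared 1 x 1 transform to each direction separately is equivalent to the concatenate-then-split formulation once batch normalization is dropped, because a 1 x 1 convolution acts on each position independently.

```python
import math


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def coordinate_attention(x, w1, w_h, w_w):
    """Simplified coordinate attention on a (C, H, W) nested-list tensor.

    w1:  (M, C) shared 1x1 transform applied to the pooled descriptors
    w_h: (C, M) 1x1 transform producing height-direction (row) weights
    w_w: (C, M) 1x1 transform producing width-direction (column) weights
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    m = len(w1)
    # Directional pooling: average over width -> (C, H); over height -> (C, W).
    z_h = [[sum(x[ch][i]) / w for i in range(h)] for ch in range(c)]
    z_w = [[sum(x[ch][i][j] for i in range(h)) / h for j in range(w)]
           for ch in range(c)]

    def shared(z, length):
        # Shared 1x1 conv + ReLU, applied independently at every position.
        return [[max(0.0, sum(w1[k][ch] * z[ch][p] for ch in range(c)))
                 for p in range(length)] for k in range(m)]

    f_h, f_w = shared(z_h, h), shared(z_w, w)
    a_h = [[_sigmoid(sum(w_h[ch][k] * f_h[k][i] for k in range(m)))
            for i in range(h)] for ch in range(c)]
    a_w = [[_sigmoid(sum(w_w[ch][k] * f_w[k][j] for k in range(m)))
            for j in range(w)] for ch in range(c)]
    # Each value is scaled by the weight of its row and of its column.
    return [[[x[ch][i][j] * a_h[ch][i] * a_w[ch][j] for j in range(w)]
             for i in range(h)] for ch in range(c)]
```

With all weights set to zero every attention value is sigmoid(0) = 0.5, so the output is simply the input scaled by 0.25; with a positive row transform, rows with stronger average activation receive larger weights, which is the position-aware emphasis the paragraph above describes.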
      Third, to offset the computational overhead introduced by the SDC and CA modules, a Ghost module was adopted for the lightweight reconstruction of the neck layer: the standard convolution (Conv) and C2f modules in the neck were replaced with GhostConv and C3Ghost modules. A Ghost module generates a subset of intrinsic feature maps with a standard convolution and then derives the remaining "ghost" feature maps from them through computationally inexpensive linear operations. This strategy substantially reduced the number of parameters and floating-point operations (FLOPs) while largely preserving the representational capacity of the model, serving as an engineering constraint that ensures real-time performance.
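The parameter saving of a Ghost module can be checked with simple arithmetic. The sketch below counts weights (bias ignored) for a standard k x k convolution and for a Ghost module with ratio `s`, whose cheap operation is assumed to be a d x d depthwise convolution as in the original GhostNet design; the function names and the defaults `s=2`, `d=3` are assumptions for illustration, and the GhostConv and C3Ghost blocks used in the study may be configured differently.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, s=2, d=3):
    """Weights of a Ghost module that outputs c_out feature maps.

    A primary k x k convolution yields c_out // s intrinsic maps; each one is
    expanded into s - 1 extra 'ghost' maps by a cheap d x d depthwise
    convolution (one d x d filter per generated map).
    """
    assert c_out % s == 0, "c_out must be divisible by s"
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k
    cheap = (s - 1) * intrinsic * d * d
    return primary + cheap


if __name__ == "__main__":
    std = conv_params(64, 128, 3)
    ghost = ghost_params(64, 128, 3)
    print(std, ghost, round(std / ghost, 2))
```

For a 64-to-128-channel 3 x 3 layer with s = 2, the standard convolution needs 73,728 weights and the Ghost module 37,440, a reduction of roughly a factor of s. Because the depthwise term is tiny compared with the primary convolution, the ratio approaches s as the input channel count grows.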
      Results and Discussions Experimental validation was conducted using a self-collected dataset of open-pit mine roads, comprising 565 images covering various seasons and lighting conditions. The dataset included three annotated categories: drivable areas, ruts, and puddles. The proposed SCG-YOLO model demonstrated superior performance across multiple evaluation metrics.
      Quantitative analysis revealed that the model achieved a mean average precision (mAP50) of 82.6%, representing a substantial increase of 11.7% compared to the baseline YOLOv8-seg model. In terms of inference speed, the model maintained a frame rate of 425.31 f/s, satisfying the stringent real-time requirements of autonomous mining operations.
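To make the mAP50 metric concrete, the following sketch computes mask IoU, greedily matches score-ranked predictions to ground truth at an IoU threshold of 0.5, and averages per-class AP over categories (here, assumed to be the three annotated classes). It uses all-point interpolation of the precision-recall curve as an illustrative choice; the evaluation toolkit used in the study may interpolate differently, so values from this sketch are not directly comparable to the reported 82.6%. All function names are hypothetical.

```python
def mask_iou(a, b):
    """IoU of two equally sized binary masks given as lists of 0/1 rows."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 0.0


def match(preds, gts, thr=0.5):
    """Greedy matching; preds is a list of (score, mask), gts a list of masks.

    Returns true-positive flags in descending score order. Each ground-truth
    mask can be matched at most once.
    """
    used = set()
    flags = []
    for _, pred_mask in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = thr, None
        for j, gt_mask in enumerate(gts):
            if j in used:
                continue
            iou = mask_iou(pred_mask, gt_mask)
            if iou >= best_iou:
                best_iou, best_j = iou, j
        if best_j is None:
            flags.append(False)
        else:
            used.add(best_j)
            flags.append(True)
    return flags


def average_precision(is_tp, num_gt):
    """Area under the precision-recall curve with all-point interpolation."""
    if num_gt == 0:
        return 0.0
    tp = fp = 0
    precisions, recalls = [], []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def map50(per_class):
    """per_class: list of (preds, gts) pairs, one entry per category."""
    aps = [average_precision(match(p, g), len(g)) for p, g in per_class]
    return sum(aps) / len(aps)
```

For example, a class with two ground-truth masks and ranked predictions marked TP, FP, TP yields AP = 0.5 x 1 + 0.5 x 2/3, about 0.833.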
      Ablation studies quantified the contribution of each component. Integrating the SDC module alone improved mAP50 by 9.4%, supporting its effectiveness in preserving fine-grained details that conventional downsampling discards. Adding the CA mechanism contributed a further 2.2% increase, consistent with its role in suppressing background interference and focusing the network on relevant features. The Ghost module reduced the parameter count to 11.2 M, offsetting the computational cost of the accuracy-enhancing modules.
      Comparative experiments against various instance segmentation algorithms, including Mask R-CNN, YOLACT, FastInst, Mask2Former, and several YOLO versions (v5, v6, v11), highlighted the advantages of SCG-YOLO. Although heavy models such as Mask2Former offered competitive accuracy, their computational demands (286.5 GFLOPs) made them unsuitable for edge deployment. Conversely, lightweight models such as YOLACT failed to segment small targets with sufficient precision. Among the compared methods, SCG-YOLO achieved the best balance between accuracy and efficiency.
      Qualitative visual analysis showed that the model produced noticeably sharper segmentation masks. In scenes with dusty backgrounds or water-filled potholes, SCG-YOLO accurately delineated boundaries that appeared blurred in the baseline predictions. Generalization tests on the public RUGD dataset further demonstrated the model's robustness: it outperformed the baseline by 5.2% in mAP50, indicating that it adapts to diverse unstructured terrains beyond the specific training site.
      Conclusions This study presents a robust environmental perception solution for open-pit mining based on space-to-depth conversion. The results indicate that the proposed SCG-YOLO model effectively mitigates the core problems of fine-grained feature loss and edge blurring that arise when small targets are imaged at low resolution in unstructured road environments. The combination of the SDC module and the CA mechanism significantly enhances the model's ability to perceive rut textures and puddle boundaries under complex conditions, while the lightweight Ghost-based design maintains the computational efficiency required by autonomous driving systems. The model achieves a segmentation precision of 82.6% mAP50 while running at more than 400 f/s, providing a viable technical foundation for deploying unmanned mining trucks. Future work will expand the dataset to cover extreme weather conditions and multi-vehicle interaction scenarios, to further improve the model's generalization in dynamic mining environments.