Drivable area segmentation model for open-pit mine roads based on spatial depth conversion

Zhao Ruixiang; Gu Qinghua; Li Shaobo; Li Binyang

doi:10.12086/oee.2026.250216

Abstract

Objective The unstructured road environment in open-pit mining areas is characterized by complex topography, variable lighting conditions, and irregular boundaries. In autonomous driving scenarios for mining trucks, particularly under medium-to-long-distance viewing perspectives, critical obstacles such as ruts and puddles often appear as low-resolution, small-scale targets. Conventional convolutional neural networks typically employ strided convolutions or pooling operations for downsampling, which inevitably leads to the loss of fine-grained texture and boundary information. Consequently, existing segmentation methods frequently suffer from missed detections of small targets and blurred segmentation edges, posing severe safety risks to the path planning of unmanned vehicles. To address these challenges, this study aimed to develop a high-precision, real-time instance segmentation model, named SCG-YOLO, specifically optimized for the drivable areas of open-pit mine roads. The primary goal was to enhance the perception of low-resolution details and irregular boundaries while strictly adhering to the computational constraints of onboard embedded devices.
Methods The proposed SCG-YOLO model was constructed based on the YOLOv8-seg architecture, integrating three targeted improvement modules to handle the unique constraints of mining environments.
First, a space-to-depth and deformable convolution (SDC) module was designed and embedded into the feature extraction network. To mitigate the information loss associated with traditional downsampling, the space-to-depth layer rearranged each non-overlapping spatial block of the input feature map into the channel (depth) dimension, reducing spatial resolution without discarding any activations. This operation preserved fine-grained information, such as the shallow textures of ruts and the irregular edges of puddles, which strided convolutions and pooling layers typically discard. Deformable convolution layers were also incorporated within the SDC module. Unlike standard convolutions, whose sampling grid is fixed, deformable convolutions learned offset fields that shifted each sampling location. This allowed the network to adaptively model the non-rigid deformations of targets on unstructured road surfaces and to align features accurately for distorted shapes.
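The two operations inside the SDC module can be illustrated with a small pure-Python sketch. It is a hypothetical illustration, not the authors' code, and it assumes a 2x2 block size for space-to-depth (the block size is not stated here), nested-list feature maps of shape [C][H][W], and a single-channel 3x3 deformable convolution with one (dy, dx) offset per kernel tap sampled by bilinear interpolation. In a real network the offsets would be predicted by an additional convolution rather than supplied by hand.

```python
import math


def space_to_depth(x, block=2):
    """Move each block x block spatial patch into the channel dimension.

    Input shape [C][H][W] -> output [C*block*block][H//block][W//block].
    Every activation is kept; only its position changes. Channel order is
    c * block * block + dy * block + dx (the pixel-unshuffle convention).
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if h % block or w % block:
        raise ValueError("H and W must be divisible by the block size")
    out = []
    for ch in range(c):
        for dy in range(block):
            for dx in range(block):
                out.append([[x[ch][i * block + dy][j * block + dx]
                             for j in range(w // block)]
                            for i in range(h // block)])
    return out


def bilinear(img, y, x):
    """Sample a 2-D map at fractional (y, x); positions outside read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                value += (1.0 - abs(y - yy)) * (1.0 - abs(x - xx)) * img[yy][xx]
    return value


def deform_conv3x3_at(img, i, j, weights, offsets):
    """One output value of a 3x3 deformable convolution centred at (i, j).

    offsets holds nine (dy, dx) pairs, one per kernel tap, that shift the
    regular sampling grid; with all-zero offsets this is an ordinary conv.
    """
    total = 0.0
    taps = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
    for (ky, kx), (dy, dx) in zip(taps, offsets):
        total += weights[ky + 1][kx + 1] * bilinear(img, i + ky + dy, j + kx + dx)
    return total


# Usage: a 1 x 4 x 4 feature map whose value at (r, c) is 4r + c.
x = [[[4 * r + c for c in range(4)] for r in range(4)]]
y = space_to_depth(x)
print(len(y), len(y[0]), len(y[0][0]))  # 4 2 2
print(y[0], y[3])  # [[0, 2], [8, 10]] [[5, 7], [13, 15]]

ones = [[1.0] * 3 for _ in range(3)]
print(deform_conv3x3_at(x[0], 1, 1, ones, [(0.0, 0.0)] * 9))  # 45.0
print(deform_conv3x3_at(x[0], 1, 1, ones, [(0.0, 0.5)] * 9))  # 49.5
```

Space-to-depth halves height and width while quadrupling channels, so all 16 input values survive in the 4 x 2 x 2 output, whereas a stride-2 pooling layer would keep only 4 summary values. Shifting every tap by half a pixel changes the deformable output from 45.0 to 49.5, showing how learned offsets move where the kernel reads from.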
Second, a coordinate attention (CA) mechanism was introduced into the network neck. Unlike global average pooling, which collapses the entire spatial extent of each channel into a single value, the CA mechanism encoded features separately along the horizontal and vertical directions. This process captured long-range dependencies along one direction while preserving precise positional information along the other. By re-weighting the feature maps with the resulting direction-aware attention weights, the mechanism strengthened the network's focus on key drivable regions and suppressed interference from background factors common in mining areas, such as ore piles, dust, and equipment shadows.
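The directional pooling and re-weighting can be sketched in pure Python. The sketch follows the general coordinate attention layout (pool along each axis, apply a shared 1x1 convolution, then separate 1x1 convolutions and sigmoid gates per direction), but it is a simplified, hypothetical illustration rather than the authors' implementation: batch normalization is omitted, ReLU stands in for the hard-swish activation used in the original coordinate attention design, and the weight matrices are supplied by hand instead of being learned.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(weights, vec):
    """A 1x1 convolution at one position is a matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Simplified coordinate attention on a [C][H][W] nested list.

    w_reduce: [M][C] shared 1x1 conv; w_h, w_w: [C][M] direction-specific
    1x1 convs. Batch norm is omitted and ReLU replaces hard-swish.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # Directional pooling: one C-vector per row (average over width) and one
    # per column (average over height), so position along the other axis survives.
    pooled_rows = [[sum(x[ch][i]) / w for ch in range(c)] for i in range(h)]
    pooled_cols = [[sum(x[ch][i][j] for i in range(h)) / h for ch in range(c)]
                   for j in range(w)]
    # Shared 1x1 conv + ReLU over the concatenated H + W positions.
    shared = [[max(0.0, v) for v in matvec(w_reduce, vec)]
              for vec in pooled_rows + pooled_cols]
    f_h, f_w = shared[:h], shared[h:]
    # Separate 1x1 convs + sigmoid give per-row and per-column channel weights.
    a_h = [[sigmoid(v) for v in matvec(w_h, f)] for f in f_h]  # [H][C]
    a_w = [[sigmoid(v) for v in matvec(w_w, f)] for f in f_w]  # [W][C]
    # Re-weight every activation by its row weight and its column weight.
    return [[[x[ch][i][j] * a_h[i][ch] * a_w[j][ch] for j in range(w)]
             for i in range(h)] for ch in range(c)]


# Usage: with all-zero weights every gate is sigmoid(0) = 0.5, so y = 0.25 * x.
x = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]  # C=2, H=2, W=2
zeros_reduce = [[0.0, 0.0]]    # M = 1
zeros_expand = [[0.0], [0.0]]  # C x M
print(coordinate_attention(x, zeros_reduce, zeros_expand, zeros_expand)[0])
# [[0.25, 0.5], [0.75, 1.0]]
```

Because each activation is scaled by a row gate and a column gate, a learned weighting can emphasize a horizontal band (for example, the road surface) and a vertical band independently, which global average pooling cannot express.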
Third, to offset the computational overhead introduced by the SDC and CA modules, a Ghost module was adopted for a lightweight reconstruction of the neck. Standard convolution and C2f modules were replaced with GhostConv and C3Ghost modules. The Ghost module generated only a subset of its output feature maps (the intrinsic features) with standard convolution and derived the remaining "ghost" feature maps from them through computationally inexpensive linear operations. This strategy significantly reduced the number of parameters and floating-point operations (FLOPs) without compromising the representational power of the model, keeping it within the real-time budget of onboard embedded devices.
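The saving can be checked by counting weights. The sketch below uses hypothetical helpers, not the authors' code, and follows the GhostNet formulation in which depthwise convolutions serve as the cheap linear operations. The ratio s = 2, the 3x3 primary kernel, the 5x5 cheap kernel, the 256-channel layer, and the 40x40 feature map are assumed values chosen for illustration; bias and batch-normalization parameters are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a dense k x k convolution (bias and BN ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, s=2, d=3):
    """Weight count of a Ghost module producing c_out maps.

    c_out // s intrinsic maps come from a dense k x k conv; each intrinsic map
    then yields (s - 1) ghost maps through its own d x d depthwise filter.
    """
    if c_out % s:
        raise ValueError("c_out must be divisible by s in this sketch")
    intrinsic = c_out // s
    primary = conv_params(c_in, intrinsic, k)
    cheap = (s - 1) * intrinsic * d * d
    return primary + cheap


def macs(params, h_out, w_out):
    """Multiply-accumulates: every weight is used once per output pixel."""
    return params * h_out * w_out


standard = conv_params(256, 256, 3)
ghost = ghost_params(256, 256, 3, s=2, d=5)
print(standard, ghost)                                # 589824 298112
print(round(standard / ghost, 2))                     # 1.98
print(macs(standard, 40, 40) - macs(ghost, 40, 40))  # 466739200
```

The reduction approaches the factor s because the cheap depthwise term grows only with the number of intrinsic maps, not with the product of input and output channels.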
Results and Discussions Experimental validation was conducted using a self-collected dataset of open-pit mine roads, comprising 565 images covering various seasons and lighting conditions. The dataset included three annotated categories: drivable areas, ruts, and puddles. The proposed SCG-YOLO model demonstrated superior performance across multiple evaluation metrics.
Quantitative analysis revealed that the model achieved a mean average precision (mAP50) of 82.6%, representing a substantial increase of 11.7% compared to the baseline YOLOv8-seg model. In terms of inference speed, the model maintained a frame rate of 425.31 f/s, satisfying the stringent real-time requirements of autonomous mining operations.
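mAP50 averages over the annotated classes (here drivable areas, ruts, and puddles) the average precision obtained when a predicted mask counts as correct only if its intersection over union (IoU) with a not-yet-matched ground-truth mask is at least 0.5. The sketch below is a simplified, single-image version with greedy matching and all-point interpolation. The paper does not specify its evaluation toolkit, which may interpolate differently or pool predictions over the whole test set. Masks are represented as sets of pixel coordinates purely for illustration. The final line converts the reported 425.31 f/s into a per-frame latency.

```python
def mask_iou(a, b):
    """IoU of two binary masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def ap50(predictions, ground_truths, iou_thr=0.5):
    """All-point interpolated AP for one class in one image.

    predictions: list of (score, mask); ground_truths: list of masks.
    """
    if not ground_truths:
        return 0.0
    matched, hits = set(), []
    # Greedy matching: highest-scoring predictions claim ground truths first.
    for _, mask in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx not in matched:
                iou = mask_iou(mask, gt)
                if iou > best_iou:
                    best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_thr:
            matched.add(best_idx)
            hits.append(1)
        else:
            hits.append(0)
    # Cumulative precision and recall after each ranked prediction.
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truths))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the stepwise precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def map50(per_class_ap):
    """Mean of per-class AP50 values."""
    return sum(per_class_ap) / len(per_class_ap)


# Usage: two ground-truth puddles A and B, three predictions (one false positive).
A, B, C = {(0, 0), (0, 1)}, {(5, 5), (5, 6)}, {(9, 9)}
preds = [(0.9, A), (0.8, C), (0.7, B)]
print(round(ap50(preds, [A, B]), 4))  # 0.8333
print(round(1000.0 / 425.31, 2))      # 2.35 (ms per frame at 425.31 f/s)
```

The false positive ranked between the two correct masks lowers precision at full recall to 2/3, so AP50 drops from 1.0 to 5/6. At 425.31 f/s, each frame takes roughly 2.35 ms.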
Ablation studies quantified the contribution of each component. Integrating the SDC module alone yielded a 9.4% improvement in mAP50, validating its efficacy in recovering fine-grained details lost during downsampling. Adding the CA mechanism contributed a further 2.2% increase, confirming its ability to filter background noise and sharpen the network's focus on relevant features. Adopting the Ghost module reduced the parameter count to 11.2 M, offsetting the computational cost of the accuracy-enhancing modules.
Comparative experiments against a range of instance segmentation algorithms, including Mask R-CNN, YOLACT, FastInst, Mask2Former, and several YOLO versions (v5, v6, and v11), highlighted the advantages of SCG-YOLO. While heavy models such as Mask2Former offered competitive accuracy, their computational demands (286.5 GFLOPs) made them unsuitable for edge deployment. Conversely, lighter models such as YOLACT did not achieve sufficient segmentation precision for small targets. SCG-YOLO achieved the best balance between accuracy and efficiency among the compared methods.
Qualitative visual analysis showed that the model produced markedly sharper segmentation masks. In scenarios with dusty backgrounds or water-filled potholes, SCG-YOLO accurately delineated boundaries that appeared blurred in the baseline predictions. Generalization tests on the public RUGD dataset further demonstrated the model's robustness: it outperformed the baseline by 5.2% in mAP50, indicating its adaptability to diverse unstructured terrains beyond the specific training site.
Conclusions The study presents a robust solution for environmental perception in open-pit mining based on spatial depth conversion. The investigation confirms that the proposed SCG-YOLO model effectively resolves the core issues of fine-grained feature loss and edge blurring associated with low-resolution imaging of small targets in unstructured road environments. The synergistic combination of the SDC module and CA mechanism significantly enhances the model's ability to perceive rut textures and puddle boundaries under complex conditions. Simultaneously, the lightweight design utilizing Ghost modules ensures that the high computational efficiency required for autonomous driving systems is maintained. The model achieves a high segmentation precision of 82.6% mAP50 while running at speeds exceeding 400 f/s, providing a viable and safe technical foundation for the deployment of unmanned mining trucks. Future research directions include expanding the dataset to encompass extreme weather conditions and multi-vehicle interaction scenarios to further improve the model's generalization capabilities in dynamic mining environments.