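• Summary: Skin lesion segmentation is hampered by complex interfering features such as blurred boundaries, variable morphology, and hair occlusion, which make lesion regions hard to localize, and existing methods carry large parameter counts and high computational complexity. To address these problems, this paper proposes a lightweight skin lesion segmentation network based on dilated gating and adaptive bridging. First, a dilated gated perception enhancement module is designed, combining a differentiated dilation strategy, a gated attention mechanism, and a progressive perception structure to strengthen the boundary representation and context awareness of lesion regions. Second, an adaptive feature bridging module is constructed, which uses a channel-spatial dual attention mechanism and a cross-granularity collaboration module to perform cross-level channel calibration, improving robustness to lesion scale variation and hair interference. Third, a lightweight initial structure guidance module is designed, fusing standard convolution with large-kernel depthwise separable convolution to build long-range dependencies and provide high-quality feature primitives. Finally, efficient channel attention is introduced, using an adaptive convolution kernel to mitigate information loss during upsampling. Validated on the ISIC2016, ISIC2017, and ISIC2018 datasets, the method achieves mean intersection over union (mIoU) of 85.78%, 79.81%, and 80.65% and Dice similarity coefficient (DSC) of 91.90%, 88.73%, and 89.37%, respectively, with only 0.30 M parameters and a computational complexity of 0.22 GFLOPs. Compared with current mainstream methods, the proposed method delivers better segmentation accuracy with a lightweight design, giving it good practical deployment value and clinical application prospects.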

       

      Abstract:
      Objective Skin lesions often exhibit complex features such as blurred boundaries, variable shapes, and hair occlusion, which make it difficult to locate the lesion area accurately and characterize it finely in the segmentation task. At the same time, existing methods often have a large number of parameters and high computational complexity, making them difficult to deploy on resource-constrained terminal devices. To this end, this paper proposes a novel lightweight skin lesion segmentation network based on dilated gating and adaptive bridging mechanisms.
      Methods The proposed method is designed to balance efficiency and segmentation performance, addressing the need for high-precision medical image analysis in resource-constrained environments. Specifically, a Dilated Gated Perceptual Enhancement (DGPE) module is introduced to improve both local detail modeling and global contextual understanding. This module employs a channel decoupling strategy combined with parallel depthwise separable convolutions of varying dilation rates to extract features at multiple scales. By integrating a gating attention mechanism and a progressive perception structure, the module suppresses redundant background information and better preserves edge structure. To further bridge the semantic gap between encoder and decoder features, an Adaptive Feature Bridging (AFB) module is constructed. This module incorporates a dual attention mechanism, comprising a Channel Attention Bridge (CAB) and a Spatial Attention Bridge (SAB), to perform cross-level feature calibration and establish global receptive field associations. In addition, a Cross-Granularity Feature Integration (CGFI) block is introduced to merge multi-resolution semantic features, enhancing the model's capability to handle complex lesion structures with varying shapes and sizes. Furthermore, a Lightweight Initial Structure-Guided (LISG) module is developed to enhance shallow feature extraction. It integrates standard convolutions with large-kernel depthwise separable convolutions, forming long-range spatial dependencies in the early stages of the network. This is further supported by residual connections and cascaded small-kernel convolutions, ensuring effective gradient flow and feature stability across network layers.
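To make the efficiency argument behind the dilated depthwise separable branches concrete, the following minimal sketch computes the spatial extent of a dilated kernel and compares the weight count of a standard convolution with that of channel-split depthwise separable branches. The channel count (64), kernel size (3), number of branches (4), and dilation rates (1, 2, 4, 8) are hypothetical values chosen for illustration; they are not taken from the paper, and the functions are illustrative helpers rather than the authors' implementation.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Spatial extent along one axis covered by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def standard_conv_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weights in a standard 2D convolution (bias omitted)."""
    return c_in * c_out * kernel_size * kernel_size


def depthwise_separable_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """One k x k filter per input channel plus a 1 x 1 pointwise mixing layer."""
    return c_in * kernel_size * kernel_size + c_in * c_out


# Hypothetical settings (assumptions, not values reported in the paper).
channels, kernel_size, dilations = 64, 3, (1, 2, 4, 8)
branch_channels = channels // len(dilations)  # channels are split evenly across branches

# Dilation widens the receptive field without adding weights.
extents = [effective_kernel(kernel_size, d) for d in dilations]

baseline = standard_conv_params(channels, channels, kernel_size)
split_total = len(dilations) * depthwise_separable_params(
    branch_channels, branch_channels, kernel_size
)
```

Under these assumed settings the four branches cover extents of 3, 5, 9, and 17 pixels while using 1,600 weights in total, against 36,864 for a single standard 3 x 3 convolution over all 64 channels.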
Lastly, an Efficient Channel Attention (ECA) mechanism is incorporated to model channel-wise dependencies using lightweight, adaptive one-dimensional convolution, effectively mitigating information loss during upsampling and preserving feature consistency throughout the decoding process.
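The adaptive one-dimensional convolution in efficient channel attention can be sketched as follows. The kernel-size rule shown here is the one published with ECA-Net (nearest odd value of (log2(C) + b) / gamma, with gamma = 2 and b = 1); whether this network uses the same constants is an assumption. The convolution weights in a real model are learned, so the uniform weights mentioned in the usage note are placeholders.

```python
import math


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size: nearest odd value of (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca_weights(channel_means, kernel_weights):
    """Per-channel gates: sigmoid of a zero-padded 1D convolution across channels.

    channel_means  -- global average of each channel's feature map
    kernel_weights -- 1D kernel of odd length (learned in practice)
    """
    pad = len(kernel_weights) // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    gates = []
    for i in range(len(channel_means)):
        s = sum(w * padded[i + j] for j, w in enumerate(kernel_weights))
        gates.append(1.0 / (1.0 + math.exp(-s)))
    return gates
```

For example, `eca_kernel_size(64)` gives a kernel of size 3, and `eca_weights([0.2, 0.8, 0.5, 0.1], [1/3, 1/3, 1/3])` returns one gate in (0, 1) per channel; each feature map is then multiplied by its gate. The only parameters are the few kernel weights, which is why the mechanism adds almost nothing to the model size.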
      Results and Discussions On the ISIC2017 dataset, for example, the proposed method achieves an mIoU of 79.81% and a DSC of 88.73%, improvements of 2.83% and 1.74%, respectively, over the classic U-Net model. At the same time, the parameter count is reduced by a factor of 25 and the computational complexity by a factor of 62, showing that the method remains computationally efficient while maintaining accuracy. TransFuse and MHorUNet achieve mIoU of 79.21% and 79.37% and DSC of 88.40% and 87.99%, respectively, outperforming the other compared methods, but at the cost of a heavy computational burden. Compared with TransFuse, the proposed method reduces the parameter count and computational complexity by factors of 87 and 51, respectively, substantially easing the hardware burden. Compared with MSS-UNet, another lightweight model, the proposed method has 0.03M fewer parameters, a negligible difference, but reduces computational complexity by a factor of 72. Although MSS-UNet compresses its parameters through dual-space shift MLP modules and external attention, its computational complexity remains far higher than that of the proposed method, indicating room for improvement in computational efficiency. Furthermore, its high computational load may lead to overfitting, impacting model performance. UNeXt-S, in contrast, has a lower computational complexity of only 0.10 GFLOPs, but its mIoU and DSC are only 78.26% and 87.80%, respectively, a clear gap in segmentation accuracy, and it also trails the proposed method on the other metrics.
Relative to UNeXt-S, the proposed method improves mIoU by 1.55 percentage points and DSC by 0.93 percentage points, demonstrating a better balance between performance and efficiency. Overall, the comparative experiments show that the proposed method balances parameter count, computational complexity, and segmentation accuracy well. Compared with traditional CNN-based medical segmentation models, it significantly reduces model redundancy and achieves more efficient feature extraction and fusion through its structural design. Compared with Transformer-based hybrid models, it improves accuracy while significantly reducing computational resource consumption, which eases deployment. Compared with similar lightweight models, it ranks among the best in efficiency while maintaining higher segmentation accuracy.
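For readers less familiar with the two reported metrics, the sketch below computes intersection over union and the Dice similarity coefficient for a single pair of binary masks. It assumes flat lists of 0/1 labels and treats two empty masks as a perfect match; how the paper averages per-image or per-class scores into mIoU is not stated in this abstract, so that step is left out. The example masks are made up for illustration.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return tp, fp, fn


def iou(pred, target):
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0  # assumed convention for two empty masks


def dice(pred, target):
    """Dice similarity coefficient: 2 * TP / (2 * TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # assumed convention for two empty masks


# Made-up flattened masks: 1 marks a lesion pixel.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
target = [1, 1, 0, 0, 0, 1, 1, 0]
example_iou = iou(pred, target)    # 3 / 5
example_dice = dice(pred, target)  # 6 / 8
```

For a single mask pair the two scores are linked by Dice = 2 * IoU / (1 + IoU), so Dice is never lower than IoU. The identity does not generally hold for scores averaged over a dataset.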
      Conclusions To address the insufficient segmentation accuracy and redundant model parameters common in current skin lesion image segmentation methods, this paper proposes a lightweight skin lesion segmentation network based on dilated gating and adaptive bridging. The Dilated Gated Perceptual Enhancement (DGPE) module combines split dilation units, gated attention units, and progressive perception enhancement units to mine multi-scale features and jointly model global and local information, mitigating the loss of feature detail. The Adaptive Feature Bridging (AFB) module uses channel attention, spatial attention, and cross-granularity feature collaboration to strengthen interaction and fusion between cross-layer features, improving the accuracy of semantic representation. The Lightweight Initial Structure-Guided (LISG) module, placed at the initial stage of the network, improves the spatial representation and low-level semantic perception of shallow features, providing more robust guidance for deep representations. Finally, Efficient Channel Attention (ECA) is embedded in the upsampling stage to capture correlations between key channels with negligible added model complexity, strengthening the localization of edge regions. The model is systematically evaluated on the publicly available ISIC2016, ISIC2017, and ISIC2018 datasets, where it achieves mIoU scores of 85.78%, 79.81%, and 80.65% and DSC scores of 91.90%, 88.73%, and 89.37%, respectively.
With only 0.30M parameters and 0.22 GFLOPs, the model substantially reduces parameter count and computational overhead while maintaining high segmentation accuracy, and it offers a better balance of accuracy and efficiency than the mainstream models it is compared with. This gives it strong potential for real-world deployment in mobile medical applications and intelligent diagnostic systems, especially where computing and storage resources are limited. Ablation experiments further validate the effectiveness and synergy of each module. This study provides a more reliable solution for the analysis of skin lesion images in clinical settings. Future work will explore additional feature enhancement strategies and structural optimizations to further improve the model's generalization ability and extend it to other diseases.