Grouped channel Transformer and modulation fusion retinal vessel segmentation algorithm

Liang Liming (梁礼明); Wang Lingliang (王灵亮); Cai Jinhui (蔡锦辉); Wang Chengbin (王成斌)

doi:10.12086/oee.2026.250264

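Abstract: Retinal vessel segmentation suffers from limited representation of fine vessels, interference from complex backgrounds, and broken vessel structures. To address these problems, a retinal vessel segmentation algorithm based on a grouped channel Transformer and modulation fusion is proposed. First, a dual-dimensional cooperative attention module is designed in the encoder to enhance the feature representation of fine vessels and improve vessel segmentation. Second, a grouped channel Transformer module is built in the encoder to capture long-range dependencies between vessels and accurately capture vessel texture information. Finally, a modulation fusion module is introduced in the decoder to fuse encoder and decoder features and suppress noise interference from lesion regions. Experiments on the DRIVE, CHASE_DB1, and STARE datasets show that the proposed algorithm achieves accuracies of 0.9711, 0.9762, and 0.9758, sensitivities of 0.8026, 0.8219, and 0.8042, and specificities of 0.9873, 0.9866, and 0.9899, respectively. The method outperforms the compared algorithms in overall segmentation performance.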

Abstract:

Objective Changes in retinal vascular morphology are important indicators for diagnosing diseases such as diabetic retinopathy, hypertension, and cataracts. Retinal diseases are traditionally diagnosed by clinicians through manual assessment, typically by analyzing vessel curvature, width, and branching patterns in images obtained from fundus photography, optical coherence tomography, and fluorescein angiography. However, these images are susceptible to illumination variation and noise, which make it harder for clinicians to distinguish retinal vessels from the background. In addition, normal anatomical structures such as the optic cup, optic disc, and macula, as well as pathological changes such as hard and soft exudates, can impede clinicians' judgment. Diagnoses also vary subjectively between physicians, and inexperience or fatigue can lead to misdiagnosis. Automated retinal vessel segmentation can substantially reduce the time and effort of manual annotation, improve diagnostic efficiency, and limit the influence of human factors, providing more objective results to support clinical decisions. Developing an accurate, automated retinal vessel segmentation technique therefore has substantial practical significance.

Methods Existing retinal vessel segmentation methods fall broadly into unsupervised and supervised approaches. Unsupervised methods do not rely on manually annotated data; they segment vessels mainly by analyzing vessel morphology. Common techniques include matched filtering, edge detection, and model-based methods. Matched filtering methods build two-dimensional linear filter banks from Gaussian kernels or their variants, whose profile resembles the cross-section of a blood vessel. Their multi-scale, multi-directional filter responses enhance tubular structures in the retinal image, and thresholding then yields a preliminary detection of vessel pixels. Edge detection methods capture the boundary between vessels and background by computing intensity gradients, for example with the Sobel operator or the Canny detector, and are often combined with local or global thresholding to extract vessel contours with continuous topology. Model-based methods describe vessel structure and distribution with geometric or statistical models, such as active contours, level sets, or tubular vessel models, and iteratively optimize an energy functional or a probabilistic fit so that the segmentation boundary gradually evolves to conform to the vessel morphology. However, these methods depend on hand-designed feature extraction pipelines, which struggle to capture the complex topology of the vascular tree and therefore limit segmentation performance. Supervised methods, in contrast, are trained on expert-annotated data and achieve higher accuracy in vessel segmentation. They can be further divided into traditional machine learning methods and deep learning methods.
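
To make the matched-filtering step concrete, here is a minimal pure-Python sketch in the spirit of the classic Gaussian matched filter. It is not part of the proposed method, and all function names and parameters (`sigma`, `length`, `n_angles`, `ratio`) are illustrative assumptions. Each kernel is an inverted Gaussian across the vessel and flat along it, shifted to zero mean so that uniform background gives no response. The image is filtered at `n_angles` orientations, the maximum response is kept per pixel, and a global threshold at a fraction of the peak response gives a preliminary vessel mask. Vessels are assumed darker than the background, as in the green channel of a fundus photograph.

```python
import math

# Hypothetical sketch of a multi-directional Gaussian matched filter (not the paper's method).


def matched_filter_kernel(sigma, length, theta):
    """Zero-mean oriented kernel: inverted Gaussian across the vessel, flat along it."""
    half = int(math.ceil(math.hypot(3 * sigma, length / 2)))
    cells = []
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            # u runs across the vessel, v along it (coordinates rotated by theta)
            u = dx * math.cos(theta) + dy * math.sin(theta)
            v = -dx * math.sin(theta) + dy * math.cos(theta)
            if abs(u) <= 3 * sigma and abs(v) <= length / 2:
                cells.append((dy, dx, -math.exp(-u * u / (2 * sigma * sigma))))
    mean = sum(w for _, _, w in cells) / len(cells)
    # Subtracting the mean makes a uniform background respond with (about) zero
    return [(dy, dx, w - mean) for dy, dx, w in cells]


def filter_response(image, kernel):
    """Correlate a 2D list image with a sparse kernel, replicating border pixels."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dy, dx, k in kernel:
                rr = min(max(r + dy, 0), h - 1)
                cc = min(max(c + dx, 0), w - 1)
                acc += k * image[rr][cc]
            out[r][c] = acc
    return out


def matched_filter_segment(image, sigma=1.0, length=5.0, n_angles=12, ratio=0.5):
    """Keep the maximum response over n_angles orientations, then threshold globally."""
    kernels = [matched_filter_kernel(sigma, length, math.pi * i / n_angles)
               for i in range(n_angles)]
    responses = [filter_response(image, k) for k in kernels]
    h, w = len(image), len(image[0])
    best = [[max(resp[r][c] for resp in responses) for c in range(w)]
            for r in range(h)]
    threshold = ratio * max(max(row) for row in best)
    mask = [[value > threshold for value in row] for row in best]
    return best, mask


# Synthetic 11x11 image: bright background with a dark vertical "vessel" in column 5
image = [[0.2 if c == 5 else 1.0 for c in range(11)] for _ in range(11)]
response, mask = matched_filter_segment(image)
print(mask[5][5], mask[0][0])  # True False
```

Taking the maximum over orientations is what makes the filter bank direction-independent; running several `sigma` values and taking the maximum again would add the multi-scale part.
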
Traditional machine learning methods typically involve two steps: manual feature extraction and classification. Common handcrafted features include edge responses, wavelet responses, principal component information, and color intensity. By choice of classifier, these methods can be roughly grouped into k-nearest neighbors, random forests, support vector machines, and Gaussian mixture models. Compared with manual screening, they segment images much faster, but their features still have to be designed by hand and cannot adequately represent the multi-scale structure of vessels. As deep learning has advanced, it has been widely adopted across computer vision tasks, including medical image segmentation, and successive deep learning models and modules have further improved the accuracy and efficiency of fundus vessel segmentation. Unlike traditional machine learning, deep learning methods learn features directly from the data without manual feature design, which makes segmentation more efficient. Nevertheless, limited representation of fine vessels, interference from complex backgrounds, and broken vessel structures remain open problems.

To address these issues, this paper proposes a retinal vessel segmentation algorithm based on a Grouped Channel Transformer and Modulated Fusion (GTMF-Net). First, a Dual-Dimensional Cooperative Attention module is designed in the encoder to make vessel features more distinguishable and improve the accuracy of microvessel segmentation. Second, a Grouped Channel Transformer module is used to build a dual-encoder architecture that establishes global dependencies and captures fine vessel details. Finally, a Modulated Fusion module is introduced in the decoder to strengthen the fusion of encoder and decoder features and improve robustness to noise.
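
The abstract does not specify how the Grouped Channel Transformer computes attention. As one plausible reading of "grouped channel" attention, the minimal sketch below splits the C channels of a feature map into groups and runs self-attention across the channels inside each group, so each attention map is C_g × C_g rather than HW × HW. Everything here is an assumption for illustration: the function names are hypothetical, queries, keys, and values are the raw channel vectors (no learned projections, normalization, or feed-forward layer), and features are plain Python lists of flattened H×W maps.

```python
import math

# Hypothetical grouped channel self-attention; NOT the authors' published module.


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def channel_self_attention(group):
    """Attention among the channels of one group; each channel is a flat list of HW values."""
    n = len(group[0])
    scale = math.sqrt(n)
    result = []
    for query in group:
        # One row of the C_g x C_g attention map: similarity of this channel to every channel
        weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in group])
        # Output channel = attention-weighted sum of the value channels
        result.append([sum(wt * value[t] for wt, value in zip(weights, group))
                       for t in range(n)])
    return result


def grouped_channel_attention(features, num_groups):
    """Apply channel self-attention independently per group, plus a residual connection."""
    channels = len(features)
    if channels % num_groups:
        raise ValueError("channel count must be divisible by num_groups")
    size = channels // num_groups
    attended = []
    for g in range(num_groups):
        attended.extend(channel_self_attention(features[g * size:(g + 1) * size]))
    # Residual connection, as in a standard Transformer block
    return [[x + y for x, y in zip(f, a)] for f, a in zip(features, attended)]


# Four channels of a 2x2 feature map (flattened), split into two groups of two
features = [[1.0, 0.0, 2.0, 1.0],
            [0.5, 1.5, 0.0, 1.0],
            [2.0, 2.0, 1.0, 0.0],
            [0.0, 1.0, 1.0, 3.0]]
out = grouped_channel_attention(features, num_groups=2)
```

Because each group attends only over its own channels, the cost grows linearly with the number of pixels HW, which is the usual motivation for channel-wise attention on high-resolution fundus images. A trainable version would add learned query/key/value projections and operate on tensors in a deep learning framework.
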

Results and Discussions Experiments on the DRIVE, CHASE_DB1, and STARE datasets show that the proposed algorithm achieves accuracies of 0.9711, 0.9762, and 0.9758, sensitivities of 0.8026, 0.8219, and 0.8042, and specificities of 0.9873, 0.9866, and 0.9899, respectively. The proposed method outperforms the compared algorithms in overall segmentation performance.
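
The three reported metrics are the standard pixel-wise ratios from the confusion matrix, with vessel pixels as the positive class: Accuracy = (TP + TN) / (TP + TN + FP + FN), Sensitivity = TP / (TP + FN), and Specificity = TN / (TN + FP). The minimal sketch below computes them for flat binary masks. The optional field-of-view (FOV) mask reflects a common convention on these benchmarks, but the abstract does not say whether the authors evaluate inside the FOV, so treat it as an assumption; the function names are hypothetical.

```python
# Hypothetical helper for pixel-wise vessel segmentation metrics.


def _ratio(num, den):
    """Safe division: NaN when the denominator is zero (e.g. no vessel pixels)."""
    return num / den if den else float("nan")


def segmentation_metrics(pred, truth, fov=None):
    """Accuracy, sensitivity and specificity for binary vessel masks.

    pred, truth: flat sequences of 0/1 (1 = vessel). fov: optional 0/1 mask;
    pixels outside it are ignored (an assumed evaluation protocol).
    """
    if fov is None:
        fov = [1] * len(truth)
    tp = fp = tn = fn = 0
    for p, t, inside in zip(pred, truth, fov):
        if not inside:
            continue
        if p and t:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),  # share of vessel pixels found
        "specificity": _ratio(tn, tn + fp),  # share of background correctly rejected
    }


pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 1, 1, 0, 0, 0]
# TP=2, FP=1, FN=1, TN=4 -> accuracy 0.75, sensitivity 2/3, specificity 0.8
print(segmentation_metrics(pred, truth))
```

Because vessel pixels are a small minority of each fundus image, accuracy is dominated by the background term. This is consistent with the reported results, where accuracy and specificity are around 0.97 to 0.99 while sensitivity is around 0.80.
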

Conclusions GTMF-Net demonstrates robust vessel segmentation capabilities on the DRIVE, CHASE_DB1, and STARE datasets. It achieves a favorable balance between model complexity and segmentation performance, offering both relatively low computational cost and high segmentation accuracy. However, the algorithm currently exhibits limitations in feature extraction within low-contrast regions, which can lead to mis-segmentation. Future work will focus on optimizing the algorithm's design to improve the accuracy of vessel identification in pathological areas.
