Fusion dual-attention retinal disease grading algorithm with PVTv2 and DenseNet121

Liang Liming; Zhong Yi; Chen Kangquan; Wang Chengbin

doi:10.12086/oee.2025.240273

Liang L M, Zhong Y, Chen K Q, et al. Fusion dual-attention retinal disease grading algorithm with PVTv2 and DenseNet121[J]. Opto-Electron Eng, 2025, 52(4): 240273. doi: 10.12086/oee.2025.240273

School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China

Fund Project: National Natural Science Foundation of China (51365017, 61463018), Jiangxi Provincial Natural Science Foundation (20192BAB205084), and Jiangxi Provincial Department of Education Science and Technology Research Youth Project (GJJ2200848)

*Corresponding author: zy037210@163.com
CSTR: 32245.14.oee.2025.240273

Received Date 22 November 2024

Revised Date 23 January 2025

Accepted Date 23 January 2025

Published Date 25 April 2025

Abstract

To address the uneven inter-class distribution of retinal fundus image datasets and the difficulty of recognizing lesion areas, this paper proposes a retinal disease grading algorithm that fuses PVTv2 and DenseNet121 with dual attention. First, a dual-branch network of PVTv2 and DenseNet121 processes the retinal images to extract global and local information. Next, spatial-channel synergistic attention modules and multi-frequency multi-scale attention modules are applied to PVTv2 and DenseNet121, respectively. These modules refine local feature details, highlight subtle lesion features, and enhance the model's sensitivity to complex micro-lesions and its spatial perception of lesion areas. Subsequently, a neuron cross-fusion module is designed to establish long-range dependencies between the macroscopic layout and the microscopic texture information of lesion areas, thereby improving the accuracy of retinal disease grading. Finally, a hybrid loss function is employed to mitigate the imbalance in model attention across grades caused by uneven sample distribution. Experimental validation on the IDRID and APTOS 2019 datasets yields quadratic weighted kappa scores of 90.68% and 90.35%, respectively. The accuracy on the IDRID dataset reaches 80.58%, and the area under the ROC curve on the APTOS 2019 dataset reaches 93.22%. The experimental results demonstrate that the proposed algorithm holds significant potential for application in retinal disease grading.

Keywords:
- retinal disease grading
- spatial-channel synergistic attention module
- multi-frequency multi-scale attention module
- neuron cross-fusion module
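
The abstract mentions a hybrid loss function for countering uneven sample distribution but does not state its components. The following is a minimal sketch, assuming (this is not confirmed by the text above) that the hybrid loss mixes a class-weighted cross-entropy term with a focal term. The function names, the inverse-frequency weighting rule, the focusing parameter `gamma`, and the mixing coefficient `lam` are all hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw class scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def inverse_frequency_weights(class_counts):
    """Hypothetical weighting rule: total / (num_classes * count).

    Grades with fewer samples receive larger weights.
    """
    total = sum(class_counts)
    k = len(class_counts)
    return [total / (k * c) for c in class_counts]


def hybrid_loss(logits, target, class_weights, gamma=2.0, lam=0.5):
    """Assumed hybrid loss for one sample (not the authors' stated formula).

    lam * weighted cross-entropy + (1 - lam) * focal loss, where the focal
    term down-weights samples the model already classifies confidently.
    """
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    weighted_ce = -class_weights[target] * log_p_t
    focal = -((1.0 - p_t) ** gamma) * log_p_t
    return lam * weighted_ce + (1.0 - lam) * focal


# Illustrative usage with made-up class counts for two grades.
weights = inverse_frequency_weights([100, 10])
loss_common = hybrid_loss([0.0, 0.0], 0, weights)
loss_rare = hybrid_loss([0.0, 0.0], 1, weights)
```

With equal logits, the sample from the rarer grade incurs the larger loss, which is the rebalancing effect a weighted term is meant to provide; the focal term additionally shrinks the contribution of well-classified samples as `gamma` grows.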

Overview

Diabetic retinopathy (DR) is a retinal disease caused by microvascular leakage and obstruction resulting from chronic diabetes, and delayed treatment can lead to irreversible vision impairment. The number of diabetic patients is increasing year by year, and retinal fundus lesions are complex and diverse, which makes accurate diagnosis difficult. Although retinal imaging can reveal structural changes in the retina, screening for ocular lesions remains time-consuming and labor-intensive even for experienced clinicians. Developing an automated DR grading algorithm is therefore of great significance for clinical diagnosis. In recent years, deep learning has made significant progress in DR grading, especially through the widespread application of convolutional neural networks (CNNs) in image processing. CNNs automatically extract multi-level features from images, which improves the accuracy of retinal disease detection and gives ophthalmologists more efficient diagnostic tools, promoting the use of intelligent diagnostic systems in clinical settings. However, the retinal disease grading task still has shortcomings: the class distribution in datasets is imbalanced, lesions in retinal images are often small and complex in shape and therefore difficult to identify, and it is challenging to capture macroscopic and microscopic features at the same time.

To address these issues, this paper proposes a retinal disease grading algorithm that integrates PVTv2 and DenseNet121 with dual attention mechanisms. The algorithm first uses a dual-branch network consisting of PVTv2 and DenseNet121 to extract global and local information from retinal images. Then, spatial-channel synergistic attention modules and multi-frequency multi-scale attention modules are applied at the outputs of PVTv2 and DenseNet121, respectively, to refine local feature details, highlight micro-lesion features, and improve the model's sensitivity to complex micro-lesions and its ability to locate lesions. Furthermore, a neuron cross-fusion module is designed to establish long-range dependencies between the macroscopic lesion layout and the microscopic texture information, thus improving the grading accuracy. Finally, a hybrid loss function is used to mitigate the imbalance in model attention across grades caused by uneven sample distribution.

The algorithm is experimentally validated on the IDRID and APTOS 2019 datasets. On the IDRID dataset, the quadratic weighted kappa, accuracy, sensitivity, and specificity are 90.68%, 80.58%, 95.65%, and 97.06%, respectively. On the APTOS 2019 dataset, the quadratic weighted kappa, accuracy, sensitivity, and area under the ROC curve are 90.35%, 84.83%, 87.94%, and 93.22%, respectively. The experimental results show that the proposed algorithm has significant application value in retinal disease grading and provides a new approach for intelligent grading and clinical diagnosis assistance for retinal diseases.
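
For readers unfamiliar with the headline metric, the sketch below shows how a quadratic weighted kappa can be computed for ordinal grades. It uses the standard definition (Cohen's kappa with disagreement weights proportional to the squared grade distance) and is not taken from the authors' evaluation code. The function name and the five-grade toy labels are illustrative assumptions.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Cohen's kappa with quadratic disagreement weights.

    y_true and y_pred are equal-length sequences of integer grades
    in the range 0 .. num_classes - 1.
    """
    n = len(y_true)
    # observed[i][j]: number of samples with true grade i predicted as grade j.
    observed = [[0.0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0
    true_hist = [sum(row) for row in observed]
    pred_hist = [
        sum(observed[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Penalty grows with the squared distance between grades.
            w = (i - j) ** 2 / (num_classes - 1) ** 2
            # Expected count if true and predicted grades were independent.
            expected = true_hist[i] * pred_hist[j] / n
            weighted_observed += w * observed[i][j]
            weighted_expected += w * expected
    if weighted_expected == 0.0:
        # Degenerate case: every label and prediction is the same single grade.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Toy example with five grades: one sample is off by a single grade.
kappa = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 3], 5)
```

Because the weights are quadratic, confusing adjacent grades costs far less than confusing distant grades, which suits ordinal severity scales better than plain accuracy does.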