Graphical Abstract
Abstract
To address the severe sample imbalance across quality levels and the low grading efficiency in retinal image quality grading tasks, this paper proposes a multi-frequency Transformer-guided graph-based feature aggregation method for retinal image quality grading. First, contrast-limited adaptive histogram equalization (CLAHE) is applied to enhance key details in the images. Then, a ResNet50 network is employed for multi-level feature extraction. Next, a frequency-channel transformer module is designed, which incorporates frequency-domain information to assist global feature modeling and thereby better balance global and local features. Subsequently, a graph cross-feature aggregation module is introduced, which leverages a cross-scale cross-attention mechanism to guide graph-based feature aggregation, align multi-source features, and enhance the model’s sensitivity to multi-level features. Finally, a weighted loss function is employed to increase the model’s attention to minority-class samples. Experiments conducted on the Eye-Quality and RIQA-RFMiD datasets achieved accuracy rates of 88.71% and 84.95%, with precision rates of 87.78% and 74.22%, respectively. The experimental results demonstrate that the proposed algorithm holds significant application value in retinal image quality assessment.
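As a rough illustration of the pipeline summarized above (CLAHE enhancement, ResNet50 feature extraction, and a class-weighted loss for minority quality grades), the following minimal Python sketch uses OpenCV and PyTorch. It is not the authors' implementation: the class `QualityGrader`, its grading head, and the class counts used to derive the loss weights are hypothetical placeholders, and the frequency-channel transformer and graph cross-feature aggregation modules are not reproduced here.

```python
# Illustrative sketch only; module names below are hypothetical stand-ins for
# the components described in the abstract.
import cv2
import numpy as np
import torch
import torch.nn as nn
from torchvision.models import resnet50


def clahe_enhance(bgr_image: np.ndarray) -> np.ndarray:
    """Apply CLAHE to the luminance channel to enhance key retinal details."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(enhanced, cv2.COLOR_LAB2BGR)


class QualityGrader(nn.Module):
    """ResNet50 backbone with a simple grading head; the head is a placeholder
    for the frequency-channel transformer and graph cross-feature aggregation."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        backbone = resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.head = nn.Linear(2048, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x).flatten(1)  # (B, 2048) pooled backbone features
        return self.head(f)


# Weighted cross-entropy: inverse-frequency weights raise the loss contribution
# of minority quality grades (the class counts here are invented for illustration).
class_counts = torch.tensor([1200.0, 400.0, 150.0])
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)
```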