Graphical Abstract
Abstract
To address the severe sample imbalance across quality levels and the low grading efficiency in retinal image quality grading tasks, this paper proposes a multi-frequency Transformer-guided graph-based feature aggregation method for retinal image quality grading. First, contrast-limited adaptive histogram equalization (CLAHE) is applied to enhance key details in the images. Then, a ResNet50 network is employed for multi-level feature extraction. Next, a frequency-channel transformer module is designed, which incorporates frequency-domain information to assist global feature modeling and thereby better balance global and local features. Subsequently, a graph cross-feature aggregation module is introduced, which leverages a cross-scale cross-attention mechanism to guide graph-based feature aggregation, align multi-source features, and enhance the model’s sensitivity to multi-level features. Finally, a weighted loss function is employed to increase the model’s attention to minority-class samples. Experiments conducted on the Eye-Quality and RIQA-RFMiD datasets achieved accuracy rates of 88.71% and 84.95%, with precision rates of 87.78% and 74.22%, respectively. The experimental results demonstrate that the proposed algorithm holds significant application value in retinal image quality assessment.
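As a rough illustration of the pipeline summarized above (CLAHE enhancement, ResNet50 feature extraction, and a class-weighted loss for minority quality grades), the following minimal Python sketch uses OpenCV and PyTorch. It is not the authors' implementation: the class `QualityGrader`, its grading head, and the class counts used to derive the loss weights are hypothetical placeholders, and the frequency-channel transformer and graph cross-feature aggregation modules are not reproduced here.

```python
# Illustrative sketch only; module names below are hypothetical stand-ins for
# the components described in the abstract.
import cv2
import numpy as np
import torch
import torch.nn as nn
from torchvision.models import resnet50


def clahe_enhance(bgr_image: np.ndarray) -> np.ndarray:
    """Apply CLAHE to the luminance channel to enhance key retinal details."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(enhanced, cv2.COLOR_LAB2BGR)


class QualityGrader(nn.Module):
    """ResNet50 backbone with a simple grading head; the head is a placeholder
    for the frequency-channel transformer and graph cross-feature aggregation."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        backbone = resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.head = nn.Linear(2048, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x).flatten(1)  # (B, 2048) pooled backbone features
        return self.head(f)


# Weighted cross-entropy: inverse-frequency weights raise the loss contribution
# of minority quality grades (the class counts here are invented for illustration).
class_counts = torch.tensor([1200.0, 400.0, 150.0])
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)
```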