Abstract:
Objective In clinical practice, pathological examination is considered the "gold standard" for diagnosing many types of cancer. In intelligent diagnosis of pathological images, the detection performance of deep learning models typically relies on large-scale, high-quality manually annotated datasets. Because annotating medical images is costly and requires specialist expertise, large-scale annotation is often infeasible, leading to limited training data and significant domain variation in clinical scenarios. Achieving accurate, efficient, and generalizable medical image recognition under limited annotation has therefore become an important challenge in intelligent medical image processing. Mainstream methods adopt multiple instance learning (MIL), which enables whole slide image (WSI) classification using only slide-level labels. However, MIL-based approaches still suffer from overly concentrated attention allocation and model overfitting when handling complex pathological patterns.
Methods To address these challenges, this study proposes a MIL framework based on a mixed distillation training strategy. First, starting from publicly available datasets, a pipeline for segmenting, partitioning, and extracting features from digital pathology slides is implemented. The OTSU segmentation method removes background regions that contain no pathological information, and each WSI is divided into non-overlapping image patches to facilitate model training and feature extraction. These preprocessing steps improve the efficiency and accuracy of the model, allowing it to focus on learning key pathological features. A Vision Transformer-S/16 (ViT-S/16) is then pretrained on unlabeled pathological image data with a contrastive self-supervised objective, learning key feature representations from images. This pretraining enables the model to learn transferable features from unlabeled data and provides better initialization for the downstream classification task. Next, a mixed distillation mechanism multiple instance learning (MDM-MIL) model integrating dual-path adaptive attention and random instance concealing is designed. The adaptive attention module dynamically adjusts the degree of focus on different texture features and captures the semantic relationships among them to further improve classification accuracy. The first path uses an adaptive gated attention submodule and the second path an adaptive multi-head attention submodule; each path computes attention scores for all instances, adjusts them by random instance masking, and aggregates the information of the remaining instances.
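The preprocessing stage described above (OTSU background removal followed by non-overlapping tiling) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the patch size, the 10% minimum-tissue fraction, and the assumption that tissue is darker than the glass background are illustrative choices.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method on an 8-bit grayscale image: choose the threshold that
    maximizes the between-class variance of the two intensity classes."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    bins = np.arange(256, dtype=np.float64)
    w0 = np.cumsum(hist)              # pixel count of class 0 (intensity <= t)
    w1 = hist.sum() - w0              # pixel count of class 1 (intensity > t)
    cum = np.cumsum(hist * bins)
    mu0 = cum / np.maximum(w0, 1.0)   # mean intensity of class 0
    mu1 = (cum[-1] - cum) / np.maximum(w1, 1.0)  # mean intensity of class 1
    return int(np.argmax(w0 * w1 * (mu0 - mu1) ** 2))

def tissue_patch_coords(gray, patch=256, min_tissue=0.1):
    """Tile the slide into non-overlapping patches and keep only those whose
    tissue fraction exceeds min_tissue (tissue assumed darker than background)."""
    t = otsu_threshold(gray)
    tissue = gray <= t
    h, w = gray.shape
    return [(y, x)
            for y in range(0, h - patch + 1, patch)
            for x in range(0, w - patch + 1, patch)
            if tissue[y:y + patch, x:x + patch].mean() >= min_tissue]
```

The retained coordinates would then be used to crop patches and feed them to the pretrained ViT-S/16 feature extractor; background-only tiles never enter training.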
This adaptive design not only enhances the model's ability to recognize local features but also significantly improves its understanding of global context. Meanwhile, the random instance concealing (RIC) mechanism randomly masks instances with high attention scores during training, forcing the model to explore additional informative regions and mitigating overfitting to a few critical patches. The mixed distillation mechanism (MDM) then fuses the feature information of the instances remaining in the two paths after RIC, enhancing the learning efficiency and classification accuracy of the model and capturing the semantic relationships between different texture features of an image. Finally, the classifier achieves accurate classification on a dataset of breast cancer whole slide pathological images. In summary, this paper proposes a multiple instance learning model based on a mixed distillation strategy to address the excessive attention concentration and overfitting of existing MIL methods. The model combines a collaboratively trained dual-path adaptive attention network with random instance masking, and the mixed distillation strategy concatenates the aggregation results of the two paths. The model can thus fully learn key instance features while accounting for the correlations among instances, improving performance and alleviating overfitting.
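The dual-path aggregation with RIC and the concatenation-based fusion can be sketched as below. This is a simplified numpy illustration under stated assumptions: the layer sizes, the 25% masking ratio, and the random weights are placeholders, and the multi-head path is reduced to a single dot-product head for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_attention(H, V, U, w):
    """Path 1: gated attention scores, softmax(w^T (tanh(H V) * sigmoid(H U)))."""
    gate = 1.0 / (1.0 + np.exp(-(H @ U)))
    return softmax((np.tanh(H @ V) * gate) @ w)

def dot_attention(H, q):
    """Path 2 (single head for brevity): scaled dot-product attention scores."""
    return softmax(H @ q / np.sqrt(H.shape[1]))

def ric_pool(H, scores, mask_frac=0.25):
    """Random instance concealing: randomly drop some of the highest-scoring
    instances, renormalize attention over the survivors, and pool them."""
    n = len(scores)
    k = max(2, int(mask_frac * n))
    top = np.argsort(scores)[-k:]                       # high-attention candidates
    drop = rng.choice(top, size=k // 2, replace=False)  # conceal a random subset
    keep = np.setdiff1d(np.arange(n), drop)
    a = scores[keep] / scores[keep].sum()               # renormalized attention
    return a @ H[keep]                                  # weighted pooling

# Toy bag: 12 instances, each with a 16-dim patch feature (e.g. from ViT-S/16).
N, D, A = 12, 16, 8
H = rng.normal(size=(N, D))
V, U, w, q = (rng.normal(size=s) for s in [(D, A), (D, A), (A,), (D,)])

z1 = ric_pool(H, gated_attention(H, V, U, w))  # path 1: gated attention + RIC
z2 = ric_pool(H, dot_attention(H, q))          # path 2: multi-head attention + RIC
bag = np.concatenate([z1, z2])                 # fusion: concatenate both paths
```

Because each path conceals a different random subset of its most-attended instances, the concatenated bag embedding reflects complementary evidence from both attention views rather than a few dominant patches.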
Results and Discussions The datasets in this study exhibit class imbalance and weak slide-level labels; therefore, AUC and F1-score are adopted as the main evaluation metrics to ensure a rigorous and comprehensive performance assessment. Experimental results demonstrate the superiority of the proposed method. On the Camelyon16 dataset, the model achieves an accuracy of 98.445%, an AUC of 0.984, and an F1-score of 0.983. On the BRACS dataset, the corresponding values reach 77.852%, 0.905, and 0.764, respectively, a significant improvement over existing MIL approaches. The results on Camelyon16 and BRACS show that, compared with other advanced multiple instance learning models, the proposed model achieves significantly better classification metrics, indicating its advantage in pathological image classification.
Conclusions This paper proposes novel algorithm designs and conducts comparative experimental evaluations on the corresponding public datasets, achieving superior results compared with current state-of-the-art methods. It contributes to the advancement of deep learning-based automatic analysis of pathological images, which is significant for assisting cancer diagnosis and improving the efficiency of clinical analysis.