Abstract:
Objective In clinical practice, pathological examination is considered the "gold standard" for diagnosing many types of cancer. In intelligent diagnosis of pathological images, the detection performance of deep learning models typically relies on large-scale, high-quality manually annotated datasets. Because annotating medical images is costly and requires specialist expertise, large-scale annotation is often infeasible, leading to limited training data and significant domain variation in clinical scenarios. Achieving accurate, efficient, and generalizable medical image recognition under limited annotation has therefore become an important challenge in intelligent medical image processing. Mainstream methods adopt multiple instance learning (MIL), which enables whole slide image (WSI) classification using only slide-level labels. However, MIL-based approaches still suffer from overly concentrated attention allocation and model overfitting when handling complex pathological patterns.
Methods To address these challenges, this study proposes a MIL framework based on a mixed distillation training strategy. First, starting from publicly available datasets, a pipeline for segmenting, partitioning, and extracting features from digital pathology slides is implemented. The OTSU segmentation method removes background regions that contain no pathological information, and each WSI is divided into non-overlapping image patches to facilitate model training and feature extraction. These preprocessing steps improve the efficiency and accuracy of the model, allowing it to focus on learning key pathological features. A Vision Transformer-S/16 (ViT-S/16) is then pretrained on unlabeled pathological image data with a contrastive self-supervised objective, learning key feature representations from images. This pretraining enables the model to learn transferable features from unlabeled data and provides better initialization for the downstream classification task. Next, a mixed distillation mechanism multiple instance learning (MDM-MIL) model integrating dual-path adaptive attention and random instance concealing is designed. The adaptive attention module dynamically adjusts the degree of focus on different texture features and captures the semantic relationships among them to further improve classification accuracy. The first path uses an adaptive gated attention submodule and the second path an adaptive multi-head attention submodule; each path computes attention scores for all instances, adjusts them by random instance masking, and aggregates the information of the remaining instances.
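The preprocessing stage described above (OTSU background removal followed by non-overlapping tiling) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the patch size, the 10% minimum-tissue fraction, and the assumption that tissue is darker than the glass background are illustrative choices.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method on an 8-bit grayscale image: choose the threshold that
    maximizes the between-class variance of the two intensity classes."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    bins = np.arange(256, dtype=np.float64)
    w0 = np.cumsum(hist)              # pixel count of class 0 (intensity <= t)
    w1 = hist.sum() - w0              # pixel count of class 1 (intensity > t)
    cum = np.cumsum(hist * bins)
    mu0 = cum / np.maximum(w0, 1.0)   # mean intensity of class 0
    mu1 = (cum[-1] - cum) / np.maximum(w1, 1.0)  # mean intensity of class 1
    return int(np.argmax(w0 * w1 * (mu0 - mu1) ** 2))

def tissue_patch_coords(gray, patch=256, min_tissue=0.1):
    """Tile the slide into non-overlapping patches and keep only those whose
    tissue fraction exceeds min_tissue (tissue assumed darker than background)."""
    t = otsu_threshold(gray)
    tissue = gray <= t
    h, w = gray.shape
    return [(y, x)
            for y in range(0, h - patch + 1, patch)
            for x in range(0, w - patch + 1, patch)
            if tissue[y:y + patch, x:x + patch].mean() >= min_tissue]
```

The retained coordinates would then be used to crop patches and feed them to the pretrained ViT-S/16 feature extractor; background-only tiles never enter training.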
This adaptive design not only enhances the model's ability to recognize local features but also significantly improves its understanding of global context. Meanwhile, the random instance concealing (RIC) mechanism randomly masks instances with high attention scores during training, forcing the model to explore additional informative regions and mitigating overfitting to a few critical patches. The mixed distillation mechanism (MDM) then fuses the feature information of the instances remaining in the two paths after RIC, enhancing the learning efficiency and classification accuracy of the model and capturing the semantic relationships between different texture features of an image. Finally, the classifier achieves accurate classification on a dataset of breast cancer whole slide pathological images. In summary, this paper proposes a multiple instance learning model based on a mixed distillation strategy to address the excessive attention concentration and overfitting of existing MIL methods. The model combines a collaboratively trained dual-path adaptive attention network with random instance masking, and the mixed distillation strategy concatenates the aggregation results of the two paths. The model can thus fully learn key instance features while accounting for the correlations among instances, improving performance and alleviating overfitting.
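The dual-path aggregation with RIC and the concatenation-based fusion can be sketched as below. This is a simplified numpy illustration under stated assumptions: the layer sizes, the 25% masking ratio, and the random weights are placeholders, and the multi-head path is reduced to a single dot-product head for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_attention(H, V, U, w):
    """Path 1: gated attention scores, softmax(w^T (tanh(H V) * sigmoid(H U)))."""
    gate = 1.0 / (1.0 + np.exp(-(H @ U)))
    return softmax((np.tanh(H @ V) * gate) @ w)

def dot_attention(H, q):
    """Path 2 (single head for brevity): scaled dot-product attention scores."""
    return softmax(H @ q / np.sqrt(H.shape[1]))

def ric_pool(H, scores, mask_frac=0.25):
    """Random instance concealing: randomly drop some of the highest-scoring
    instances, renormalize attention over the survivors, and pool them."""
    n = len(scores)
    k = max(2, int(mask_frac * n))
    top = np.argsort(scores)[-k:]                       # high-attention candidates
    drop = rng.choice(top, size=k // 2, replace=False)  # conceal a random subset
    keep = np.setdiff1d(np.arange(n), drop)
    a = scores[keep] / scores[keep].sum()               # renormalized attention
    return a @ H[keep]                                  # weighted pooling

# Toy bag: 12 instances, each with a 16-dim patch feature (e.g. from ViT-S/16).
N, D, A = 12, 16, 8
H = rng.normal(size=(N, D))
V, U, w, q = (rng.normal(size=s) for s in [(D, A), (D, A), (A,), (D,)])

z1 = ric_pool(H, gated_attention(H, V, U, w))  # path 1: gated attention + RIC
z2 = ric_pool(H, dot_attention(H, q))          # path 2: multi-head attention + RIC
bag = np.concatenate([z1, z2])                 # fusion: concatenate both paths
```

Because each path conceals a different random subset of its most-attended instances, the concatenated bag embedding reflects complementary evidence from both attention views rather than a few dominant patches.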
Results and Discussions The datasets in this study exhibit class imbalance and weak slide-level labels; therefore, AUC and F1-score are adopted as the main evaluation metrics to ensure a rigorous and comprehensive performance assessment. Experimental results demonstrate the superiority of the proposed method. On the Camelyon16 dataset, the model achieves an accuracy of 98.445%, an AUC of 0.984, and an F1-score of 0.983. On the BRACS dataset, the corresponding values reach 77.852%, 0.905, and 0.764, respectively, a significant improvement over existing MIL approaches. The results on Camelyon16 and BRACS show that, compared with other advanced multiple instance learning models, the proposed model achieves significantly better classification metrics, indicating its advantage in pathological image classification.
Conclusions This paper proposes novel algorithm designs and conducts comparative experimental evaluations on the corresponding public datasets, achieving superior results compared with current state-of-the-art methods. It contributes to the advancement of deep learning-based automatic analysis of pathological images, which is significant for assisting cancer diagnosis and improving the efficiency of clinical analysis.