He Y H, Chen Y W, Fan J Y, et al. An instrument detection method for complex retinal microsurgery[J]. Opto-Electron Eng, 2025, 52(2): 240269. doi: 10.12086/oee.2025.240269

An instrument detection method for complex retinal microsurgery

    Fund Project: National Key R&D Program of China (2021YFF0700503, 2022YFC2404201), CAS Project for Young Scientists in Basic Research (YSBR-067), Jiangsu Science and Technology Plan Program (BK20220263), Gusu Innovation and Entrepreneurship Leading Talents in Suzhou City (ZXL2021425), and Suzhou Basic Research Pilot Project (SSD2023018)
To address the challenges posed by complex interference in retinal microsurgery, this study presents a deep learning-based algorithm for surgical instrument detection. The RET1 dataset was first constructed and meticulously annotated to provide a reliable basis for training and evaluation. Building upon the YOLO framework, this study introduces the SGConv and RGSCSP feature extraction modules, specifically designed to enhance the model's ability to capture fine-grained image details, especially in scenarios involving degraded image quality. Furthermore, to address the slow convergence of IoU loss and inaccuracies in bounding box regression, the DeltaIoU bounding box loss function is proposed to improve both detection precision and training efficiency. Additionally, the integration of dynamic and decoupled heads optimizes feature fusion, further enhancing detection performance. Experimental results demonstrate that the proposed method achieves 72.4% mAP50-95 on the RET1 dataset, a 3.8% improvement over existing algorithms. The method also performs robustly across a variety of complex surgical scenarios, underscoring its potential to support automatic tracking in surgical microscopes and intelligent surgical navigation systems.

The integration of computer vision into ophthalmic surgical procedures, particularly in digital navigation microscopes, has opened new avenues for real-time instrument tracking. Accurate localization of surgical instruments during retinal surgery presents unique challenges, such as reflections, motion artifacts, and occlusions, which impede precise detection. To address these challenges, this study introduces RM-YOLO, a deep learning-based detection algorithm tailored for retinal microsurgery. The model is designed to localize instruments accurately in real time, offering substantial advances over existing approaches.

    Given the scarcity of annotated data specific to retinal microsurgery, the RET1 dataset was constructed, derived from high-resolution surgical videos and manually annotated for three primary instruments: vitrectomy cutter, light pipe, and peeling forceps. This dataset encompasses various surgical conditions, including occlusions, low-light environments, and reflections, ensuring robust model training and evaluation.
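
    For concreteness, the sketch below shows how such annotations are commonly stored in the YOLO ecosystem the model builds on: one text line per instrument, containing a class index and a normalized center-format box. The class-index mapping is illustrative only; the summary does not specify how RET1 encodes its labels.

```python
# Hypothetical RET1 label line in standard YOLO format:
# <class_id> <x_center> <y_center> <width> <height>, all normalized to [0, 1].
# The class indices below are assumptions, not the paper's actual mapping.
CLASS_NAMES = {0: "vitrectomy_cutter", 1: "light_pipe", 2: "peeling_forceps"}

def parse_yolo_label(line: str) -> tuple[str, float, float, float, float]:
    """Parse one YOLO-format annotation line into (class_name, cx, cy, w, h)."""
    cls, cx, cy, w, h = line.split()
    return CLASS_NAMES[int(cls)], float(cx), float(cy), float(w), float(h)

print(parse_yolo_label("1 0.62 0.43 0.05 0.31"))
# -> ('light_pipe', 0.62, 0.43, 0.05, 0.31)
```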

    The proposed algorithm leverages a customized YOLO framework and incorporates novel modules to enhance performance. The SGConv and RGSCSP modules were specifically designed to improve feature extraction capabilities, addressing the limitations of conventional convolutional layers by employing channel shuffling and re-parameterization techniques to maximize feature diversity and minimize parameter count. Additionally, a dynamic head architecture was implemented to integrate multi-scale, spatial, and task-specific attention mechanisms, enhancing the model's ability to capture complex features across varying scales. For bounding box regression, DeltaIoU loss was introduced as a refined metric that improves convergence speed and accuracy, particularly in ambiguous annotation scenarios.
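
    The exact layer configuration of SGConv is not given in this summary, so the following is a minimal sketch assuming a GSConv-style design: a dense convolution produces half the output channels, a cheap depthwise convolution derives the other half, and a channel shuffle mixes the two groups to increase feature diversity at low parameter cost. The RepVGG-style re-parameterization implied by RGSCSP is omitted here for brevity.

```python
import torch
import torch.nn as nn

class SGConv(nn.Module):
    """Minimal SGConv-style block (a sketch, assuming a GSConv-like split);
    the paper's actual kernel sizes and layer ordering may differ."""

    def __init__(self, c_in: int, c_out: int, k: int = 3, s: int = 1):
        super().__init__()
        c_half = c_out // 2
        self.dense = nn.Sequential(          # standard conv -> half the channels
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )
        self.cheap = nn.Sequential(          # depthwise conv -> remaining half
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.dense(x)
        b = self.cheap(a)
        y = torch.cat([a, b], dim=1)         # (N, c_out, H, W)
        # channel shuffle: interleave the dense and cheap channel groups
        n, c, h, w = y.shape
        return y.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)

x = torch.randn(1, 64, 80, 80)
print(SGConv(64, 128, s=2)(x).shape)  # torch.Size([1, 128, 40, 40])
```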

    Extensive experiments on the RET1 dataset demonstrate that RM-YOLO achieves an mAP50-95 of 72.4%, outperforming existing models in precision and recall with only 7.4 million parameters and 20.7 GFLOPs. Comparative analysis with traditional and modern detection models, including Faster R-CNN, YOLO series, and RT-DETR, reveals that RM-YOLO not only achieves superior accuracy but also addresses the high rate of missed detections common in retinal microsurgery applications.

    The ablation studies underscore the contribution of each module, with the dynamic head and RGSCSP modules providing significant boosts in performance by enhancing the robustness of feature representation. DeltaIoU loss complements these improvements by ensuring precise bounding box regression under challenging visual conditions.
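
    The DeltaIoU formula itself is not reproduced in this summary. As a stand-in, the sketch below shows the general shape of a corner-distance-penalized IoU regression loss, following the published MPDIoU formulation; the function and parameter names are assumptions, and DeltaIoU's actual penalty term may differ.

```python
import torch

def penalized_iou_loss(pred: torch.Tensor, target: torch.Tensor,
                       img_w: float, img_h: float) -> torch.Tensor:
    """Corner-distance-penalized IoU loss for boxes in (x1, y1, x2, y2) form,
    shape (N, 4). Illustrative stand-in for DeltaIoU; the penalty here
    follows the published MPDIoU formulation."""
    # intersection area
    lt = torch.max(pred[:, :2], target[:, :2])   # top-left of the overlap
    rb = torch.min(pred[:, 2:], target[:, 2:])   # bottom-right of the overlap
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    # union area
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)
    # squared distances between matching corners, normalized by the image diagonal
    d1 = ((pred[:, :2] - target[:, :2]) ** 2).sum(dim=1)
    d2 = ((pred[:, 2:] - target[:, 2:]) ** 2).sum(dim=1)
    norm = img_w ** 2 + img_h ** 2
    return (1.0 - (iou - d1 / norm - d2 / norm)).mean()

pred = torch.tensor([[100., 100., 220., 260.]])
target = torch.tensor([[110., 90., 215., 265.]])
print(penalized_iou_loss(pred, target, img_w=640.0, img_h=640.0))
```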
