• Abstract: Methods based on spatiotemporal memory models are currently the mainstream approach to video object segmentation, but because they perform dense memory matching only between different video frames, the model tends to focus on the fine details of the target and lose its global information. To address this problem, this paper proposes a video object segmentation method based on self-support grouped matching. First, a semantic enhancement module captures the global information of the target; then, a self-support mechanism performs self-support matching within the query frame to improve matching accuracy. Finally, since conventional spatiotemporal memory models are computationally expensive, this paper designs a grouped matching mechanism for memory matching that reduces computation while preventing interfering features from affecting the matching results. The proposed algorithm is implemented on three mainstream spatiotemporal memory models, STM, STCN, and XMem, and validated on multiple public video object segmentation datasets. Experimental results show that, compared with STCN on the DAVIS 2017 dataset, the proposed algorithm improves J&F accuracy by 1.5%, reaching 86.9%, and raises the FPS from 25 to 30.


Abstract: Memory networks are currently the mainstream approach to video object segmentation. However, because they perform dense memory matching only between different video frames, the model tends to focus on the fine details of the target and lose its global information. To address this issue, we propose a video object segmentation method based on self-support grouped matching. First, we design a semantic enhancement module to capture the global information of the target; we then design a self-support module that performs matching within the query frame to improve matching accuracy. Moreover, since memory networks incur a high computational cost, we propose a grouped matching mechanism for memory matching that reduces computation while preventing interfering features from corrupting the matching results. The algorithm is implemented on three mainstream spatiotemporal memory models, STM, STCN, and XMem, and extensively validated on multiple publicly available video object segmentation datasets. Experimental results show that, compared with STCN on the DAVIS 2017 dataset, our algorithm improves J&F accuracy by 1.5%, reaching 86.9%, while increasing FPS from 25 to 30.
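The grouped matching idea described above can be illustrated with a minimal NumPy sketch. This is a hypothetical reconstruction, not the paper's implementation: it assumes the memory locations are partitioned into groups, each group is summarized by a mean-key prototype, and dense matching is restricted to the best-scoring groups, which both cuts the matching cost and discards low-affinity (interfering) memory regions. The function name and all parameters are illustrative.

```python
import numpy as np

def grouped_matching(mem_keys, query_keys, num_groups=4, topk=2):
    """Hypothetical sketch of grouped memory matching.

    mem_keys:   (N, C) memory key features over N memory locations
    query_keys: (M, C) query-frame key features over M locations

    Memory locations are split into `num_groups` groups; each group is
    summarized by its mean key (a prototype). The query first matches
    against the prototypes, keeps the `topk` best groups, and only then
    performs dense matching inside those groups, so the expensive dense
    affinity is computed over a fraction of the memory.
    """
    n_mem, _ = mem_keys.shape
    groups = np.array_split(np.arange(n_mem), num_groups)

    # Coarse stage: one prototype (mean key) per group.
    protos = np.stack([mem_keys[g].mean(axis=0) for g in groups])   # (G, C)
    coarse = query_keys @ protos.T                                  # (M, G)

    # Keep the top-k groups by average query affinity (shared selection).
    group_scores = coarse.mean(axis=0)                              # (G,)
    keep = np.argsort(group_scores)[-topk:]
    kept_idx = np.concatenate([groups[g] for g in keep])

    # Fine stage: dense matching restricted to the kept memory locations.
    affinity = query_keys @ mem_keys[kept_idx].T                    # (M, Nk)

    # Numerically stable softmax over kept locations -> attention weights.
    w = np.exp(affinity - affinity.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w, kept_idx
```

With `num_groups=4` and `topk=2`, the fine-grained affinity is computed over only half of the memory, which is the source of the computational saving; dropping the low-scoring groups is what shields the match from interfering features.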