Abstract: Human-object interaction detection aims to localize and identify the interactions between humans and the surrounding objects in an image. The challenge of this task is that the machine cannot know in advance which objects a person is actually interacting with; most existing methods address this by exhaustively pairing every detected human with every detected object. In contrast, this paper proposes an interactive instance proposal network based on relational reasoning for the human-object interaction detection task. The main idea is to recommend human-object pairs by exploiting the potential interactions embedded in the visual relationships between humans and objects. In addition, a cross-modal information fusion module is designed to fuse different kinds of contextual information according to their influence on the detection result, thereby improving detection accuracy. Extensive experiments on two large-scale datasets, HICO-DET and V-COCO, validate the proposed method: it achieves 19.90% and 50.3% mAP on HICO-DET and V-COCO, which is 4.5% and 2.8% higher than the baseline, respectively.
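As a rough illustration of the pair-recommendation idea, the sketch below scores every candidate human-object pair for interactiveness and keeps only the top-k pairs, rather than forwarding all pairs to an interaction classifier. This is a minimal sketch assuming PyTorch; the class name PairProposal, the MLP scorer, and the value of k are hypothetical and are not taken from the paper.

```python
import torch
import torch.nn as nn

class PairProposal(nn.Module):
    """Toy interactiveness-based pair proposal (hypothetical, for illustration).

    Scores every candidate human-object pair and keeps only the top-k,
    instead of passing all H*O pairs to the interaction head.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Small MLP mapping a concatenated (human, object) feature
        # pair to a scalar interactiveness score.
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 1),
        )

    def forward(self, human_feats: torch.Tensor, object_feats: torch.Tensor,
                k: int = 16) -> torch.Tensor:
        # human_feats: (H, dim), object_feats: (O, dim)
        H, O = human_feats.size(0), object_feats.size(0)
        pairs = torch.cat(
            [human_feats.unsqueeze(1).expand(H, O, -1),
             object_feats.unsqueeze(0).expand(H, O, -1)],
            dim=-1)                                   # (H, O, 2*dim)
        scores = self.scorer(pairs).squeeze(-1)       # (H, O)
        k = min(k, H * O)
        top = scores.flatten().topk(k).indices        # flat indices of kept pairs
        human_idx = torch.div(top, O, rounding_mode="floor")
        return torch.stack((human_idx, top % O), dim=1)  # (k, 2) human/object ids
```

In such a setup the inputs would typically be ROI-pooled detector features, and only the returned index pairs would be passed on for interaction classification.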
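Similarly, "fusing contextual information according to its influence on the detection result" can be pictured as attention-style weighted fusion. The sketch below is one plausible realization under that assumption, not the paper's actual module; CrossModalFusion and its single-layer scorer are hypothetical names.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Toy attention-weighted fusion of context streams (hypothetical).

    Each context stream (e.g. human appearance, object appearance,
    spatial layout) gets a scalar influence score; the fused feature
    is the softmax-weighted sum of the streams.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # scalar influence per stream

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (batch, num_streams, dim)
        weights = torch.softmax(self.score(streams), dim=1)  # (batch, num_streams, 1)
        return (weights * streams).sum(dim=1)                # (batch, dim)

# Example: fuse three 256-d context features per candidate pair.
fusion = CrossModalFusion(dim=256)
fused = fusion(torch.randn(8, 3, 256))  # -> (8, 256)
```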