
      Abstract: To address the challenges of crowd counting in dense scenes, such as complex backgrounds and large scale variations, we propose GLCrowd, a weakly supervised crowd counting model for dense scenes that integrates global and local attention mechanisms. First, we design a local attention module built on depthwise convolution, which enhances local features through context weights and captures high-frequency local information via feature-weight sharing. Second, the self-attention mechanism of the Vision Transformer (ViT) is used to capture low-frequency global information. Finally, the global and local attention are effectively fused, and the count is produced by a regression token. The model was evaluated on the ShanghaiTech Part A, ShanghaiTech Part B, UCF-QNRF, and UCF_CC_50 datasets, achieving MAE values of 64.884, 8.958, 95.523, and 209.660, and MSE values of 104.411, 16.202, 173.453, and 282.217, respectively. These results demonstrate that the proposed GLCrowd model performs well on crowd counting in dense scenes.
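The two-branch design described above can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the kernel sizes, the additive fusion, and the pooled linear head standing in for the regression token are all illustrative assumptions; it only shows the flow of a depthwise-convolution local branch and a self-attention global branch being fused into a scalar count.

```python
import numpy as np

rng = np.random.default_rng(0)

def depthwise_conv2d(x, k):
    # x: (C, H, W), k: (C, kh, kw); one kernel per channel ("depthwise"),
    # zero-padded so the spatial size is preserved.
    C, H, W = x.shape
    _, kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + kh, j:j + kw] * k[c])
    return out

def self_attention(tokens):
    # tokens: (N, D); single-head scaled dot-product attention (ViT-style),
    # with queries/keys/values taken as the tokens themselves for brevity.
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ tokens

# Toy feature map: 4 channels, 8x8 spatial grid.
x = rng.standard_normal((4, 8, 8))

# Local branch: depthwise convolution enhances high-frequency local features.
k = rng.standard_normal((4, 3, 3))
local = depthwise_conv2d(x, k)

# Global branch: flatten spatial positions into tokens, apply self-attention
# to capture low-frequency global context.
tokens = x.reshape(4, -1).T                      # (64 positions, 4 channels)
global_ = self_attention(tokens).T.reshape(4, 8, 8)

# Fuse the two branches (simple additive fusion in this sketch).
fused = local + global_

# Stand-in for the regression token: pool the fused features and map them
# to a single non-negative crowd count with a hypothetical linear head.
w = rng.standard_normal(4)
count = float(np.maximum(fused.mean(axis=(1, 2)) @ w, 0.0))
print(fused.shape, count)
```

In the actual model the regression token is a learned embedding that attends over the fused feature tokens; the pooled linear head here merely mimics its role of reducing the fused representation to one scalar.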