• Abstract: To address the problems of high complexity, low accuracy, and frequent missed and false detections in flame and smoke detection, a flame and smoke detection algorithm combining a lightweight RT-DETR with multi-scale feature fusion is proposed. The method first takes RT-DETR-r18 as the base architecture and introduces the lightweight network EfficientNetV1 into the backbone, reducing the number of model parameters. Next, the VoVGSCSP module is introduced, which effectively reduces feature redundancy and strengthens the model's feature extraction capability. Finally, combining the advantages of the bidirectional feature pyramid network (BiFPN) and the ELU activation function, the ELU activation function is introduced into BiFPN's weight normalization step to design the BiFPN-E structure, which improves the model's cross-scale interaction capability and effectively raises detection accuracy. Experimental results show that, compared with the original model, the improved RT-DETR raises mAP50 by 1.1% and 1.8% and mAP50-95 by 1.7% and 2.2% on the public NEWFire dataset and the self-built E-fire dataset, respectively, while reducing the parameter count by 32% and the floating-point operations by 65%. The improved model therefore increases detection accuracy while remaining lightweight, meeting the practical accuracy and efficiency requirements of flame and smoke detection.

       

      Abstract:
      Objective With the advance of modernization and urbanization, fire safety problems in public venues, commercial buildings, and residential homes have become increasingly prominent. Fire incidents not only directly threaten lives and property, but the smoke they generate also often causes irreversible pollution and damage to the ecological environment. Traditional fire detection methods rely primarily on physical devices such as sensors, yet these approaches have clear limitations, including insufficient sensitivity, frequent false alarms and missed detections, susceptibility to environmental interference, and difficulty in providing early warning. Consequently, more efficient fire detection algorithms are urgently needed. In recent years, rapid advances in artificial intelligence have driven breakthroughs of deep learning in computer vision, with especially strong performance in object detection, opening new avenues and research perspectives for fire detection.
      Methods To address the current limitations of flame and smoke detection technologies, such as insufficient accuracy in complex scenes, susceptibility to missed and false detections, and high computational cost, this research applies the recent RT-DETR model from computer vision to the fire detection task. The aim is to thoroughly explore the model's ability to capture and recognize flame and smoke features in complex, dynamic environments, and to develop a fire detection algorithm with high detection accuracy, a low false alarm rate, and the ability to meet real-time early-warning requirements. RT-DETR is an efficient real-time end-to-end detector based on the Transformer architecture. Through a hybrid encoder, it combines the local feature extraction strengths of convolutional neural networks (CNNs) with the global modeling capability of Transformers, and its IoU-aware query selection mechanism precisely filters the initial object queries, achieving a good balance between detection accuracy and inference speed while significantly reducing computational latency. However, when handling complex, multi-scale, and morphologically variable targets such as flames and smoke, the model still captures local information insufficiently and extracts fine-grained features poorly, which often leads to missed and false detections of small targets and limits its use in scenarios demanding high accuracy. To address this, this paper proposes a lightweight RT-DETR flame and smoke detection algorithm with multi-scale feature fusion to enhance the model's ability to perceive and discriminate such targets. First, RT-DETR-r18 is adopted as the baseline model, and the lightweight EfficientNetV1 architecture is introduced into the backbone network. Comparative experiments with mainstream lightweight networks such as GhostNet, RepViT, and EfficientViT show that EfficientNetV1 achieves the best balance between accuracy and efficiency through its compound scaling strategy, which jointly scales network depth, width, and input resolution with a single compound coefficient. It substantially reduces the parameter count and computational burden while retaining strong feature extraction capability, providing a solid feature foundation for the subsequent detection task. Second, the VoVGSCSP module is introduced at the model's neck. This structure integrates the efficient feature reuse of VoVNet with the gradient-splitting idea of CSPNet, effectively enhancing the model's ability to express key features such as flame and smoke texture and color while suppressing feature redundancy. Finally, to tackle the core challenge of extreme scale variation in flame and smoke targets, this paper introduces a more efficient multi-scale feature fusion mechanism, BiFPN-E, built on the bidirectional feature pyramid network (BiFPN). During BiFPN's weighted fusion process, the ELU activation function is employed for weight normalization. Because ELU saturates smoothly to a non-zero value for negative inputs, it produces smoother and more discriminative fusion weights, thereby strengthening the interaction and complementarity between feature maps at different scales. This significantly improves the model's perception and discrimination accuracy in complex scenarios such as large-scale flames, faint smoke, and occluded targets.
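      To make the neck modification concrete, the following is a minimal, hypothetical PyTorch sketch of a GSConv unit and a VoVGSCSP-style block, not the authors' implementation: the kernel sizes, channel split ratios, activation choices, and the number of stacked GSConv units are illustrative assumptions; only the overall idea (a cheap depthwise branch that reuses the dense branch's output, a channel shuffle, and a CSP-style split-and-refuse around stacked GSConv units) reflects the description above.

```python
import torch
import torch.nn as nn


def conv_bn_act(c_in, c_out, k=1, s=1, g=1):
    """Convolution + BatchNorm + SiLU helper (padding keeps spatial size at stride 1)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, groups=g, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )


class GSConv(nn.Module):
    """Half standard convolution + half depthwise convolution, then channel shuffle."""

    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.dense = conv_bn_act(c_in, c_half, k, s)               # dense branch
        self.cheap = conv_bn_act(c_half, c_half, 5, 1, g=c_half)   # depthwise branch (assumed 5x5)

    def forward(self, x):
        y1 = self.dense(x)
        y2 = self.cheap(y1)                     # reuses the dense branch's output
        y = torch.cat((y1, y2), dim=1)
        # Channel shuffle so dense and depthwise features interleave.
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)


class VoVGSCSP(nn.Module):
    """CSP-style block: split, refine one branch with stacked GSConvs, then re-fuse."""

    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_hid = c_out // 2
        self.split1 = conv_bn_act(c_in, c_hid, 1, 1)
        self.split2 = conv_bn_act(c_in, c_hid, 1, 1)
        self.gs = nn.Sequential(*(GSConv(c_hid, c_hid, 3, 1) for _ in range(n)))
        self.fuse = conv_bn_act(2 * c_hid, c_out, 1, 1)

    def forward(self, x):
        return self.fuse(torch.cat((self.gs(self.split1(x)), self.split2(x)), dim=1))


if __name__ == "__main__":
    block = VoVGSCSP(256, 256, n=1)
    print(block(torch.randn(1, 256, 40, 40)).shape)  # torch.Size([1, 256, 40, 40])
```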
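      Likewise, the BiFPN-E weighted fusion can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' code: it assumes a two-input fusion node whose learnable scalar weights pass through ELU, are shifted by +1 so they stay positive (ELU(x) > -1), and are then normalized to sum to one, replacing the ReLU used in BiFPN's fast normalized fusion; the +1 shift, the post-fusion convolution, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiFPNEFusion(nn.Module):
    """Hypothetical two-input BiFPN-E fusion node with ELU-based weight normalization."""

    def __init__(self, channels: int, num_inputs: int = 2, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))  # one scalar per input map
        self.eps = eps
        # Post-fusion convolution, as in standard BiFPN nodes.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(inplace=True),
        )

    def forward(self, feats):
        # ELU keeps small negative raw weights alive (smooth, non-zero saturation)
        # instead of clipping them to zero as ReLU would.
        w = F.elu(self.weights) + 1.0          # strictly positive weights
        w = w / (w.sum() + self.eps)           # normalize so the weights sum to ~1
        fused = sum(wi * fi for wi, fi in zip(w, feats))
        return self.conv(fused)


if __name__ == "__main__":
    # Fuse two same-resolution feature maps (e.g. after resizing in the neck).
    node = BiFPNEFusion(channels=256)
    p4, p5_up = torch.randn(1, 256, 40, 40), torch.randn(1, 256, 40, 40)
    print(node([p4, p5_up]).shape)  # torch.Size([1, 256, 40, 40])
```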
      Results and Discussions The experimental results fully validate the effectiveness of the proposed method. On the NEWFire dataset, the improved model achieved mAP50 and mAP50-95 scores of 84.5% and 57.0%, improvements of 1.1% and 1.7% over the baseline RT-DETR model. On the self-built E-fire dataset, mAP50 and mAP50-95 improved by 1.8% and 2.2%, respectively. More importantly, alongside these accuracy gains the model became markedly lighter: the parameter count was reduced by 32% and the floating-point operations by 65%. Ablation experiments further confirm that the BiFPN-E structure is the largest contributor to the accuracy gains, while the EfficientNetV1 backbone is the core of the model compression. Heatmap visualizations also show that the improved model's feature responses concentrate more tightly on the actual flame and smoke regions, suppress fire-like distractors and complex backgrounds more strongly, and yield higher recognition accuracy, effectively reducing both missed and false detections.
      Conclusions Through systematic enhancements to the RT-DETR model, this research achieves a synergistic breakthrough in accuracy, efficiency, and feature discriminative power for fire detection. The EfficientNetV1 backbone enables efficient model compression, while the introduced VoVGSCSP module enhances the model's ability to capture irregular fire shapes and dynamic changes. The BiFPN-E architecture significantly improves detection accuracy through optimized multi-scale feature fusion. The synergistic effect of these three components enables the model to maintain high accuracy while reducing parameters by 32% and computational load by 65%. This breakthrough overcomes computational resource constraints while enhancing performance, providing a practical solution for achieving high-precision real-time fire alerts on resource-constrained edge devices.