Lightweight Swin Transformer combined with multi-scale feature fusion for face expression recognition

李艳秋; 李胜赵; 孙光灵; 颜普

doi:10.12086/oee.2025.240234

Abstract: When applied to facial expression recognition, the Swin Transformer has a large parameter count, poor real-time performance, and limited ability to capture the complex and subtle feature changes in expressions. To address these problems, a facial expression recognition method is proposed that combines a lightweight Swin Transformer with a multi-scale feature fusion (EMA) module. First, the proposed SPST module replaces the Swin Transformer block in the fourth stage of the original Swin Transformer, which reduces the number of parameters and makes the model lightweight. Then, the multi-scale feature fusion (EMA) module is embedded after the second stage of the lightweight model; through multi-scale feature extraction and cross-spatial information aggregation, it strengthens the model's ability to capture facial expression details and thereby improves the accuracy and robustness of recognition. Experimental results show that the proposed method achieves recognition accuracies of 97.56%, 86.46%, 87.29%, and 70.11% on the four public datasets JAFFE, FERPLUS, RAF-DB, and FANE, respectively. Compared with the original Swin Transformer, the improved model has 15.8% fewer parameters and a 9.6% higher FPS, improving real-time performance while keeping the parameter count low.
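The parameter reduction and FPS gain quoted in the abstract are relative changes against the original Swin Transformer. The sketch below illustrates how such figures could be computed; it is not the authors' evaluation code. The helper names `relative_change` and `measure_fps` are hypothetical, and the numeric inputs are placeholders chosen only to reproduce the reported percentages, not the paper's actual parameter counts or frame rates.

```python
import time


def relative_change(baseline, improved):
    """Percentage change of `improved` relative to `baseline`."""
    return (improved - baseline) / baseline * 100.0


def measure_fps(infer, n_runs=100, warmup=10):
    """Average frames per second when `infer()` processes one frame per call."""
    # Warm-up calls are excluded from timing (caches, lazy initialization).
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed


# Placeholder values (assumptions), normalized so the baseline equals 100.
baseline_params, improved_params = 100.0, 84.2
baseline_fps, improved_fps = 100.0, 109.6

print(f"parameters: {relative_change(baseline_params, improved_params):+.1f}%")
print(f"FPS:        {relative_change(baseline_fps, improved_fps):+.1f}%")
```

In practice, `infer` would wrap a single forward pass of the model on one input image, and both models would be timed on the same hardware with the same input size so that the FPS comparison is meaningful.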
