Pooling residual image classification network with dynamic channel feature calibration

Jiang Wentao (姜文涛); Zhao Wenwen (赵文雯)

doi:10.12086/oee.2026.250309

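Abstract: To address insufficient interaction among channel features, which makes it difficult to select key features, this paper proposes a pooling residual image classification network with dynamic channel feature calibration (DCPRNet). DCPRNet is built on the ResNet-34 residual network. First, the first layer of the residual network is modified: the 7×7 convolution is replaced with a 3×3 convolution and the max-pooling layer is removed, which improves the expressive power of deep-layer features. Second, a chunked weighted attention (CWA) mechanism is proposed; through operations such as feature chunking and score generation, it calibrates the importance weights of channel feature chunks, suppressing interfering information and increasing the contribution of key regions. Third, a dynamic channel feature calibration (DCFC) module is proposed; it consists of a random subgroup shuffle module and the chunked weighted attention operation, so that channels interact dynamically, cross-group information exchange is enhanced, feature weights are calibrated, and key features are selected. Finally, an average pooling operation is added to the residual branch of the DCFC module to achieve smooth downsampling and retain more effective features. The proposed method achieves classification accuracies of 96.41%, 80.36%, 96.97%, 91.59%, and 80.61% on the CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof datasets, respectively. Compared with existing mainstream models, the network improves information interaction between channels, strengthens feature representation and key feature extraction, and improves classification performance.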

Abstract:

Objective Image classification, a foundational task in computer vision, supports applications including clinical medical diagnosis, intelligent security surveillance, remote sensing image interpretation, and autonomous driving scene recognition. Deep learning models such as ResNet, DenseNet, and ViT have notably advanced classification performance in recent years, but traditional networks still have inherent limitations that restrict their applicability in complex real-world scenarios. ResNet, widely adopted for image classification because its residual connections preserve gradients, lacks sufficient channel feature interaction, which impedes the selection of key discriminative features. Its initial 7×7 convolution and max-pooling layers downsample aggressively at the early feature extraction stage, losing fine-grained spatial details that are vital for classifying complex images with overlapping objects or subtle differences. Fixed channel grouping in traditional convolutions further isolates cross-channel feature information, limiting the model’s ability to integrate multi-dimensional feature cues and capture inter-channel correlations. Existing attention mechanisms such as SE and CBAM focus on individual dimensions (channel or spatial) and do not achieve dynamic, comprehensive feature calibration across varying image content. This study aims to address these limitations by enhancing cross-channel information interaction, strengthening feature representation and key feature extraction, and ultimately improving the classification performance and generalization ability of the model.
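The early downsampling can be made concrete with output-size arithmetic. The following is a minimal sketch, not taken from the paper: it assumes the standard ResNet stem settings (a bias-free 7×7 convolution with stride 2, padding 3, and 64 output channels, followed by 3×3 max pooling with stride 2 and padding 1) and a 32×32 RGB input such as a CIFAR image. The stride-1, padding-1 setting for a 3×3 replacement is likewise an assumption, and the helper names are hypothetical.

```python
def conv_out_size(n, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_weight_count(kernel, in_channels, out_channels):
    """Number of weights in a bias-free 2D convolution."""
    return kernel * kernel * in_channels * out_channels


size = 32  # assumed input resolution (CIFAR-sized image)

# Assumed standard ResNet stem: 7x7 conv (stride 2, pad 3), then 3x3 max pool (stride 2, pad 1).
after_conv7 = conv_out_size(size, kernel=7, stride=2, padding=3)
after_pool = conv_out_size(after_conv7, kernel=3, stride=2, padding=1)

# Assumed replacement stem: a single 3x3 conv with stride 1 and padding 1, no pooling.
after_conv3 = conv_out_size(size, kernel=3, stride=1, padding=1)

weights_7x7 = conv_weight_count(7, in_channels=3, out_channels=64)
weights_3x3 = conv_weight_count(3, in_channels=3, out_channels=64)

print(f"original stem: {size} -> {after_conv7} -> {after_pool}")
print(f"modified stem: {size} -> {after_conv3}")
print(f"stem weights:  {weights_7x7} (7x7) vs {weights_3x3} (3x3)")
```

Under these assumptions the original stem reduces a 32×32 input to 8×8 before the first residual stage, while the 3×3 stem keeps the full 32×32 resolution, which is the fine-grained spatial detail the text refers to.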

Methods To achieve this objective, a pooling residual image classification network with dynamic channel feature calibration (DCPRNet) was proposed, built on the ResNet-34 architecture. Three targeted improvements were integrated into the network to overcome the drawbacks of traditional models while maintaining computational feasibility. First, the initial layer was optimized: the 7×7 convolution was replaced with a 3×3 convolution to reduce the layer's parameter count (9 rather than 49 weights per input-output channel pair) while retaining robust feature extraction capability, and the max-pooling layer was removed to preserve fine-grained spatial information for subsequent feature processing, fusion, and calibration. Second, a dynamic channel feature calibration (DCFC) module was designed, which combines a random subgroup shuffle module and a chunked weighted attention (CWA) mechanism. The random subgroup shuffle module breaks the isolation between channel groups via dual-layer grouping and random shuffling, promoting cross-group information interaction and dynamic channel communication without introducing excessive computation. The CWA mechanism calibrates the weights of feature chunks through feature chunking, local importance score generation, mean pooling-based global information aggregation, and sigmoid normalization, thereby suppressing redundant background information and enhancing the contribution of critical feature regions related to target classification. Third, an average pooling-based residual module was added to the DCFC branch, using 1×1 convolution and 2×2 average pooling instead of traditional strided convolution to realize low-cost smooth downsampling, reduce feature distortion, and better preserve effective feature information.
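The chunk weighting, group shuffling, and average-pooling steps described above can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. All function names are hypothetical, the importance score is assumed to be the plain mean activation of each chunk (the paper's scoring is presumably learned), and the shuffle is a single-level stand-in for the dual-layer grouping mentioned in the text.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_means(feature):
    """Global average pooling: one mean per channel of a C x H x W nested list."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]


def shuffle_groups(feature, num_groups, rng):
    """Randomly permute equal-size channel groups (assumed single-level shuffle)."""
    if len(feature) % num_groups:
        raise ValueError("channel count must be divisible by num_groups")
    size = len(feature) // num_groups
    groups = [feature[g * size:(g + 1) * size] for g in range(num_groups)]
    rng.shuffle(groups)
    return [ch for group in groups for ch in group]


def chunked_weighted_attention(feature, num_chunks):
    """Scale each channel chunk by sigmoid(mean activation of that chunk).

    The mean-based score is an assumption made for illustration only.
    """
    if len(feature) % num_chunks:
        raise ValueError("channel count must be divisible by num_chunks")
    size = len(feature) // num_chunks
    means = channel_means(feature)
    weights = [
        sigmoid(sum(means[k * size:(k + 1) * size]) / size)
        for k in range(num_chunks)
    ]
    calibrated = [
        [[v * weights[i // size] for v in row] for row in ch]
        for i, ch in enumerate(feature)
    ]
    return calibrated, weights


def avg_pool_2x2(channel):
    """2x2 average pooling with stride 2 on one H x W channel (odd edges dropped)."""
    h, w = len(channel), len(channel[0])
    return [
        [
            (channel[i][j] + channel[i][j + 1]
             + channel[i + 1][j] + channel[i + 1][j + 1]) / 4.0
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


# Toy input: 4 channels of size 2x2; the last two channels are more active.
feature = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)] + \
          [[[2.0, 2.0], [2.0, 2.0]] for _ in range(2)]

shuffled = shuffle_groups(feature, num_groups=2, rng=random.Random(0))
calibrated, weights = chunked_weighted_attention(feature, num_chunks=2)
pooled = [avg_pool_2x2(ch) for ch in calibrated]
```

In this toy input the more active chunk receives the larger weight, so its channels are attenuated less than those of the inactive chunk. Averaging each 2×2 window, rather than keeping one value per window as a stride-2 convolution does, is what the text calls smooth downsampling.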

Results and Discussions Experiments were conducted on five datasets (CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof) that cover simple-object, complex-category, low-resolution, and real-world natural image scenarios, so that the model is validated under diverse conditions. DCPRNet achieved classification accuracies of 96.41%, 80.36%, 96.97%, 91.59%, and 80.61% on these datasets, respectively, showing stable performance across different data types, complexity levels, and feature distributions. Ablation studies confirmed that input layer optimization, the DCFC module, and the average pooling residual module each improve model performance, and that combining the three components yields the best overall result. Comparative experiments against 12 mainstream models, including ResNet-34, DenseNet-121, ViT-B/16, and MobileNet-V2, showed consistent gains for DCPRNet: it achieved a 2.15% accuracy gain over the baseline ResNet-34 on the CIFAR-100 dataset and a 4.39% improvement on an ImageNet subset, while maintaining comparable computational cost. Heatmap-based visual analysis further indicated that DCPRNet focuses on critical fine-grained features of target objects and avoids interference from irrelevant background regions. These results indicate that the proposed improvements enhance cross-channel information interaction, mitigate the limitations of traditional networks, and strengthen the model’s ability to extract discriminative features.

Conclusions DCPRNet was developed by optimizing the ResNet-34 architecture and introducing the DCFC module and the average pooling residual module. The model enhances cross-channel information interaction, strengthens feature representation and key feature extraction, and balances classification accuracy, computational cost, and model robustness. Experimental results and visual analysis support its advantage over mainstream models on multiple datasets and indicate good robustness and adaptability to different image scenarios and data distributions. By addressing core flaws of traditional image classification networks, namely insufficient channel interaction, excessive feature loss during downsampling, and limited feature calibration capability, DCPRNet provides a technical solution for image classification tasks in fields including medical imaging, intelligent surveillance, and remote sensing detection. It also offers a theoretical and practical reference for the further optimization of deep learning-based image classification models and for the development of more efficient, accurate, and lightweight computer vision algorithms suitable for edge computing devices.
