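• Abstract: To strengthen the dynamic perception of key features and the interaction between the spatial and channel dimensions, a spatial dynamic perception image classification network (SDPNet) is proposed. The network builds on the ResNet-34 residual network. First, a negative exponent suppression (NES) module is introduced to strengthen the perception of spatial image features and improve the network's ability to learn discriminative features. Next, a spatial dynamic perception (SDP) module fuses the NES module with parallel feature rotation (FR) and dynamic expansion (DE) branches, capturing the spatial symmetry of images and dynamically selecting discriminative information from the spatial features to enhance them. Finally, a dual-dimension fusion residual (DFR) module is introduced, which embeds the SDP module into the residual block and embeds conditional convolution into the dimension-reduced residual branch, dynamically adjusting the interaction between the spatial and channel dimensions, optimizing the feature representation, reducing overfitting, and improving generalization. On the CIFAR-100, CIFAR-10, SVHN, Flowers-102, and NWPU-RESISC45 datasets, accuracy reaches 80.63%, 96.95%, 97.66%, 90.34%, and 92.46%, respectively, an average improvement of 3.60%, 2.52%, 2.09%, 7.29%, and 2.36% over current state-of-the-art methods. Compared with existing mainstream network models, SDPNet enhances the important spatial and channel positions in the original feature map, raises the contribution of key features, and effectively improves the image classification ability of neural networks.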

       

      Abstract:
      Objective Image classification is a fundamental task in computer vision that aims to automatically assign semantic labels to input images through algorithmic models. Although deep learning, particularly convolutional neural networks (CNNs), has achieved remarkable progress in this domain, existing approaches still face significant challenges. Traditional attention mechanisms often overemphasize global relationships at the expense of local fine-grained features and textures. Furthermore, these methods struggle to adapt to variations in object scale and spatial relationships, and fail to fully integrate global spatial contextual information. The residual connections in deep networks can amplify noise and adversarial perturbations, leading to feature dilution in identity mappings. Additionally, most current architectures lack dynamic weight assignment, which limits both their interpretability and their ability to adapt flexibly to the input. To address these limitations, this paper proposes a Spatial Dynamic Perception Image Classification Network (SDPNet) that enhances the dynamic perception of critical features, strengthens spatial-channel interactions, improves robustness to geometric transformations, and improves overall generalization.
      Methods The proposed SDPNet is constructed upon the ResNet-34 architecture with three key innovations. First, the Negative Exponent Suppression (NES) module is introduced as a novel nonlinear feature transformation technique. This module applies an absolute value transformation to remove polarity differences while preserving feature continuity, followed by an exponential decay suppression operation that replaces negative values with their absolute values multiplied by a decay factor of exp(-6), compressing negative responses to approximately 0.25% of their original magnitude. This approach enhances feature contrast, highlights edge information, and strengthens the network's ability to learn discriminative features while maintaining a balance between information preservation and discriminative enhancement. Second, the Spatial Dynamic Perception (SDP) module is developed, integrating the NES module with three parallel processing branches: Feature Rotation (FR), Dynamic Expansion (DE), and Channel Shuffle (CS). The FR branch performs 180-degree center-symmetric rotation of feature maps followed by 1×1 depthwise separable convolutions and Tanh activation, capturing spatial symmetry and improving robustness to object orientation changes. The DE branch employs convolutional and deconvolutional operations with 3×3 and 2×2 kernels respectively, dynamically expanding receptive fields while preserving spatial resolution through nearest-neighbor interpolation upsampling, thereby capturing both fine details and holistic contours. The CS branch utilizes two 1×1 convolutions with channel shuffling between them to break fixed channel connection patterns, enhancing feature diversity and enabling cross-channel information exchange. These three branches are fused through spatial and channel average pooling followed by Tanh activation, enabling adaptive feature enhancement and dynamic selection of discriminative spatial information. 
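      The three elementary operations named above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names are hypothetical, positive activations are assumed to pass through NES unchanged, and the channel shuffle is assumed to follow the usual group-interleaving scheme, which the abstract does not specify. Convolutions, Tanh activations, and the pooling-based fusion are omitted.

```python
import math

# Decay factor stated in the text: exp(-6), roughly 0.25% of the original magnitude.
DECAY = math.exp(-6)

def nes(x: float) -> float:
    """Negative exponent suppression for a single activation (sketch).

    Assumption: non-negative values pass through unchanged; a negative value
    is replaced by its absolute value scaled by exp(-6).
    """
    return x if x >= 0 else abs(x) * DECAY

def rotate180(fmap: list[list[float]]) -> list[list[float]]:
    """Rotate one 2-D feature map by 180 degrees about its center."""
    # Reversing the row order and then each row is a center-symmetric rotation.
    return [row[::-1] for row in fmap[::-1]]

def channel_shuffle(channels: list, groups: int) -> list:
    """Interleave channels across groups (assumed group-interleaving scheme)."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # View the channels as a (groups, per_group) grid, transpose, flatten.
    return [channels[g * per_group + i]
            for i in range(per_group) for g in range(groups)]

if __name__ == "__main__":
    print(nes(-2.0))                                 # small positive value
    print(rotate180([[1, 2], [3, 4]]))               # [[4, 3], [2, 1]]
    print(channel_shuffle([0, 1, 2, 3, 4, 5], 2))    # [0, 3, 1, 4, 2, 5]
```

      In a real network these operations would act on whole tensors; the scalar and list forms here only make the arithmetic visible.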
Third, the Dual-Dimension Fusion Residual (DFR) module is proposed, which embeds the SDP attention mechanism into residual blocks and incorporates conditional convolutions into the dimension-reduced residual branches. The conditional convolution dynamically adjusts kernel weights based on input features through a routing function, transforming static convolution into dynamic, condition-dependent operations. This design enhances spatial and channel interactions, optimizes feature representation, reduces overfitting, and improves generalization capability. Additionally, the initial feature extraction layer is modified by replacing the 7×7 convolutional kernel with a 3×3 kernel and removing the max pooling layer to better preserve local details and original spatial information.
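      The idea of a conditional convolution can be sketched as follows, under stated assumptions: the routing function is taken to be global average pooling followed by a linear map and a sigmoid, and the kernel is a 1×1 convolution mixed from several expert kernels. The abstract only says that a routing function adjusts the kernel weights, so these details, and all names below, are illustrative rather than the authors' design.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def global_avg_pool(x):
    """x: list of channels, each a 2-D list. Returns one mean per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]

def routing_weights(x, routing):
    """One sigmoid gate per expert; routing has shape [experts][channels]."""
    pooled = global_avg_pool(x)
    return [sigmoid(sum(r * p for r, p in zip(row, pooled))) for row in routing]

def mix_kernels(experts, alphas):
    """Input-dependent 1x1 kernel sum_i alpha_i * W_i; experts: [E][out][in]."""
    out_c, in_c = len(experts[0]), len(experts[0][0])
    return [[sum(a * w[o][i] for a, w in zip(alphas, experts))
             for i in range(in_c)] for o in range(out_c)]

def cond_conv1x1(x, experts, routing):
    """Apply the mixed 1x1 kernel at every spatial position of x."""
    kernel = mix_kernels(experts, routing_weights(x, routing))
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(kernel[o][c] * x[c][r][q] for c in range(len(x)))
              for q in range(w)] for r in range(h)]
            for o in range(len(kernel))]

if __name__ == "__main__":
    x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]  # 2 channels, 2x2
    identity = [[1.0, 0.0], [0.0, 1.0]]
    experts = [identity, identity]           # two hypothetical expert kernels
    routing = [[0.0, 0.0], [0.0, 0.0]]       # zero routing gives gates of 0.5
    print(cond_conv1x1(x, experts, routing)) # equals x for this toy setup
```

      Because the gates depend on the pooled input, two different images are convolved with two different effective kernels, which is what distinguishes this from a static convolution.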
      Results and Discussions Comprehensive experiments were conducted on five benchmark datasets: CIFAR-100, CIFAR-10, SVHN, Flowers-102, and NWPU-RESISC45. SDPNet achieved classification accuracies of 80.63%, 96.95%, 97.66%, 90.34%, and 92.46% respectively, outperforming eleven state-of-the-art models including ResNet-34, EfficientNets, GhostNet, QKFormer, TLENet, ATONet, Couplformer, SSLLNet, Multi-ResNet, DCDENet, and MobileNet-LAM. Compared with these competing methods, SDPNet demonstrated average accuracy improvements of 3.60%, 2.52%, 2.09%, 7.29%, and 2.36% across the respective datasets. The model's parameter count (21.93 M) remains comparable to ResNet-34 (21.35 M) while being substantially lower than EfficientNets (53.68 M) and Multi-ResNet (52.03 M). The computational complexity (7.5 GFLOPs) and inference speed (2.55 f/s) demonstrate practical deployability in resource-constrained scenarios. Ablation studies confirm the complementary contributions of each module: SDP alone improves ResNet-34 by up to 4.14% on Flowers-102, DFR alone improves by up to 2.65%, and their combination yields synergistic gains of up to 7.59%, validating the effectiveness of the integrated design. Attention mechanism comparisons against SE, SCConv, CA, CBAM, and ADWM across CIFAR-100, Flowers-102, and STL-10 datasets show that SDP consistently achieves superior or comparable performance, with average improvements of 0.6%, 1.27%, and 1.56% respectively. The 180° rotation angle in the FR branch was empirically determined as optimal through systematic parameter analysis. Experiments on SDP module combinations reveal that the full NES+FR+DE+CS configuration achieves the highest accuracy, with NES identified as the core effective component. Position sensitivity analysis indicates that placing SDP at deeper network layers (position 1) maximizes its effectiveness by leveraging larger receptive fields for hierarchical spatial-channel interaction capture. 
Optimal DFR module placement involves embedding three D-Blocks within the residual connections. Visualization analysis through heatmaps demonstrates that SDPNet effectively suppresses noise and redundant information while focusing on critical regions, exhibiting more refined and comprehensive feature extraction capabilities compared to MobileNet_V2, EfficientNet, and ResNet-34.
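      The size and ablation figures quoted above can be checked with simple arithmetic. The sketch below uses only numbers from the text; the helper name is hypothetical, and the synergy check assumes that the three "up to" gains refer to the same dataset, which the abstract does not state explicitly.

```python
# Parameter counts in millions, as reported in the text.
params_m = {
    "SDPNet": 21.93,
    "ResNet-34": 21.35,
    "EfficientNets": 53.68,
    "Multi-ResNet": 52.03,
}

def relative_change(new: float, base: float) -> float:
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0

vs_resnet = relative_change(params_m["SDPNet"], params_m["ResNet-34"])
vs_efficientnets = relative_change(params_m["SDPNet"], params_m["EfficientNets"])
vs_multiresnet = relative_change(params_m["SDPNet"], params_m["Multi-ResNet"])

# Ablation gains over ResNet-34 in percentage points (assumed same dataset).
sdp_gain, dfr_gain, combined_gain = 4.14, 2.65, 7.59
synergy = combined_gain - (sdp_gain + dfr_gain)

if __name__ == "__main__":
    print(f"vs ResNet-34:     {vs_resnet:+.2f}%")         # about +2.72%
    print(f"vs EfficientNets: {vs_efficientnets:+.2f}%")  # about -59.15%
    print(f"vs Multi-ResNet:  {vs_multiresnet:+.2f}%")    # about -57.85%
    print(f"gain beyond the sum of parts: {synergy:.2f} points")  # about 0.80
```

      Under that assumption, the combined gain exceeds the sum of the individual gains by about 0.8 percentage points, which is consistent with the claim that the two modules are complementary.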
      Conclusions This paper presents SDPNet, a novel image classification network that addresses fundamental limitations in existing approaches through three synergistic contributions. The NES module enhances spatial feature perception through nonlinear feature transformation, the SDP module dynamically captures discriminative spatial information through integrated rotation, expansion, and shuffling operations, and the DFR module optimizes spatial-channel interactions through conditional convolutions. The comprehensive experimental validation across multiple datasets demonstrates SDPNet's superior classification performance, robust generalization capability, and practical efficiency. Future work will extend this spatial-domain approach by incorporating frequency-domain perception to further enhance classification performance.