Abstract:
Objective Image classification is a fundamental task in computer vision that aims to automatically assign semantic labels to input images using algorithmic models. Although deep learning, particularly convolutional neural networks (CNNs), has achieved remarkable progress in this domain, existing approaches still face significant challenges. Traditional attention mechanisms often overemphasize global relationships at the expense of local fine-grained features and textures. These methods also struggle to adapt to variations in object scale and spatial relationships, and they fail to fully integrate global spatial context. In deep networks, residual connections can amplify noise and adversarial perturbations, which dilutes features along the identity mappings. Moreover, most current architectures lack dynamic weight assignment, which limits their interpretability and adaptability. To address these limitations, this paper proposes a Spatial Dynamic Perception Image Classification Network (SDPNet) that enhances the dynamic perception of critical features, strengthens spatial-channel interactions, improves robustness to geometric transformations, and increases generalization capability.
Methods The proposed SDPNet is built on ResNet-34 and introduces three key innovations. First, the Negative Exponent Suppression (NES) module is introduced as a nonlinear feature transformation. It removes polarity differences while preserving feature continuity: each negative response is replaced by its absolute value multiplied by a decay factor of exp(-6) ≈ 0.0025, which compresses negative responses to approximately 0.25% of their original magnitude. This transformation enhances feature contrast, highlights edge information, and strengthens the network's ability to learn discriminative features while balancing information preservation against discriminative enhancement. Second, the Spatial Dynamic Perception (SDP) module is developed, which combines the NES module with three parallel branches: Feature Rotation (FR), Dynamic Expansion (DE), and Channel Shuffle (CS). The FR branch rotates feature maps by 180° about their center and then applies 1×1 depthwise separable convolutions and Tanh activation, capturing spatial symmetry and improving robustness to changes in object orientation. The DE branch applies a 3×3 convolution and a 2×2 deconvolution (transposed convolution), dynamically expanding the receptive field while nearest-neighbor interpolation upsampling preserves spatial resolution, so that both fine details and holistic contours are captured. The CS branch places a channel shuffle between two 1×1 convolutions to break fixed channel connection patterns, increasing feature diversity and enabling cross-channel information exchange. The three branches are fused through spatial and channel average pooling followed by Tanh activation, enabling adaptive feature enhancement and dynamic selection of discriminative spatial information.
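The element-wise and reindexing operations described for NES and the SDP branches can be illustrated with a minimal pure-Python sketch on nested lists, where a C × H × W feature map is `x[c][h][w]`. This is not the authors' implementation. All function names are hypothetical, and several details are assumptions: non-negative inputs pass through NES unchanged, the channel shuffle follows the ShuffleNet formulation with a free `groups` parameter, upsampling is nearest-neighbor by an integer factor, and the FR branch's 1×1 depthwise separable convolution has no bias or normalization. Learned weights, the DE convolutions, and the branch fusion are omitted.

```python
import math

# Decay factor stated in the abstract: exp(-6) is about 0.00248 (about 0.25%).
NES_DECAY = math.exp(-6)


def nes(x: float) -> float:
    """Negative Exponent Suppression on one activation (sketch).

    Assumption: non-negative values pass through unchanged; a negative value
    is replaced by its absolute value scaled by exp(-6).
    """
    return x if x >= 0 else abs(x) * NES_DECAY


def rotate_180(fmap):
    """Center-symmetric 180-degree rotation of one H x W map:
    reverse the row order and reverse every row."""
    return [row[::-1] for row in fmap[::-1]]


def upsample_nearest(fmap, factor: int):
    """Nearest-neighbor upsampling of one H x W map by an integer factor:
    every value is repeated `factor` times along both axes."""
    return [[v for v in row for _ in range(factor)] for row in fmap for _ in range(factor)]


def channel_shuffle(channels: list, groups: int) -> list:
    """ShuffleNet-style channel shuffle over a list of C channels.

    Channels are viewed as a (groups, C // groups) grid, transposed, and
    flattened, so neighbouring output channels come from different groups.
    """
    c = len(channels)
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = c // groups
    return [channels[g * per_group + j] for j in range(per_group) for g in range(groups)]


def fr_branch(x, dw, pw):
    """FR branch sketch on a C x H x W map: 180-degree rotation per channel,
    a bias-free 1x1 depthwise separable convolution (depthwise: per-channel
    scale dw[c]; pointwise: channel-mixing matrix pw[o][c]), then tanh."""
    rotated = [rotate_180(ch) for ch in x]
    scaled = [[[dw[c] * v for v in row] for row in ch] for c, ch in enumerate(rotated)]
    h, w = len(x[0]), len(x[0][0])
    return [
        [
            [math.tanh(sum(pw[o][c] * scaled[c][i][j] for c in range(len(x)))) for j in range(w)]
            for i in range(h)
        ]
        for o in range(len(pw))
    ]
```

For example, `channel_shuffle(list(range(6)), groups=2)` returns `[0, 3, 1, 4, 2, 5]`, so adjacent output channels interleave the two input groups. This is the property the CS branch relies on to break fixed channel connection patterns.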
Third, the Dual-Dimension Fusion Residual (DFR) module is proposed, which embeds the SDP attention mechanism into residual blocks and adds conditional convolutions to the dimension-reduced residual branches. A conditional convolution uses a routing function to adjust its kernel weights according to the input features, which turns a static convolution into a dynamic, input-dependent operation. This design enhances spatial and channel interactions, optimizes feature representation, reduces overfitting, and improves generalization. In addition, the initial feature extraction layer is modified: the 7×7 convolutional kernel is replaced with a 3×3 kernel and the max pooling layer is removed, which better preserves local details and the original spatial information.
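Both mechanisms in the DFR paragraph can be made concrete with a small pure-Python sketch. The conditional convolution is written in the CondConv style. This is an assumption, because the abstract does not name the variant. The sketch computes per-sample gates α_i = sigmoid(r_i · GAP(x)) and mixes E expert kernels into one kernel W = Σ α_i W_i before convolving. It is restricted to 1×1 kernels, and `experts` and `routing` are hypothetical weights. The stem comparison uses the standard convolution/pooling output-size formula. The original stem values (7×7 convolution, stride 2, padding 3; 3×3 max pool, stride 2, padding 1) are those of the common ResNet stem. Stride 1 and padding 1 for the new 3×3 stem are assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def cond_conv_1x1(x, experts, routing):
    """CondConv-style conditional 1x1 convolution (sketch).

    x:       input map, C_in x H x W nested lists
    experts: E expert kernels, each a C_out x C_in matrix (hypothetical weights)
    routing: E x C_in routing matrix (hypothetical weights)
    Returns the output map (C_out x H x W) and the routing gates.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    # Global average pooling: one summary value per input channel.
    pooled = [sum(sum(row) for row in ch) / (h * w) for ch in x]
    # Routing function: one sigmoid gate per expert, computed from the input.
    alphas = [sigmoid(sum(r[c] * pooled[c] for c in range(c_in))) for r in routing]
    # Input-dependent kernel: gate-weighted sum of the expert kernels.
    c_out = len(experts[0])
    kernel = [
        [sum(a * e[o][c] for a, e in zip(alphas, experts)) for c in range(c_in)]
        for o in range(c_out)
    ]
    # Ordinary 1x1 convolution with the mixed kernel.
    out = [
        [[sum(kernel[o][c] * x[c][i][j] for c in range(c_in)) for j in range(w)] for i in range(h)]
        for o in range(c_out)
    ]
    return out, alphas


def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or max pool (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# Standard stem (7x7/2 conv + 3x3/2 max pool) vs. an assumed 3x3/1 stem.
for size in (32, 224):
    original = out_size(out_size(size, 7, stride=2, padding=3), 3, stride=2, padding=1)
    modified = out_size(size, 3, stride=1, padding=1)
    print(size, original, modified)  # prints "32 8 32", then "224 56 224"
```

For a 32 × 32 CIFAR image, the standard stem shrinks the map to 8 × 8 before the first residual stage. Under the stated assumptions, the 3×3 stem keeps all 32 × 32 positions, which is consistent with the stated aim of preserving local detail and original spatial information.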
Results and Discussions Comprehensive experiments were conducted on five benchmark datasets: CIFAR-100, CIFAR-10, SVHN, Flowers-102, and NWPU-RESISC45. SDPNet achieved classification accuracies of 80.63%, 96.95%, 97.66%, 90.34%, and 92.46%, respectively, outperforming eleven state-of-the-art models: ResNet-34, EfficientNets, GhostNet, QKFormer, TLENet, ATONet, Couplformer, SSLLNet, Multi-ResNet, DCDENet, and MobileNet-LAM. Averaged over these competing methods, SDPNet's accuracy gains are 3.60%, 2.52%, 2.09%, 7.29%, and 2.36% on the respective datasets. Its parameter count (21.93 M) is close to that of ResNet-34 (21.35 M), only 0.58 M or about 2.7% more, and substantially lower than those of EfficientNets (53.68 M) and Multi-ResNet (52.03 M). Its computational complexity (7.5 GFLOPs) and inference speed (2.55 f/s) indicate practical deployability in resource-constrained scenarios. Ablation studies confirm that the modules contribute complementarily: SDP alone improves ResNet-34 by up to 4.14% on Flowers-102, DFR alone improves it by up to 2.65%, and their combination yields synergistic gains of up to 7.59%, validating the integrated design. Comparisons with the SE, SCConv, CA, CBAM, and ADWM attention mechanisms on the CIFAR-100, Flowers-102, and STL-10 datasets show that SDP consistently achieves superior or comparable performance, with average improvements of 0.6%, 1.27%, and 1.56%, respectively. Systematic parameter analysis identified 180° as the optimal rotation angle for the FR branch. Experiments on SDP module combinations show that the full NES+FR+DE+CS configuration achieves the highest accuracy, with NES identified as the core component. Position sensitivity analysis indicates that placing SDP at deeper network layers (position 1) is most effective, because the larger receptive fields there help capture hierarchical spatial-channel interactions.
The optimal DFR configuration embeds three D-Blocks in the residual connections. Heatmap visualizations show that SDPNet effectively suppresses noise and redundant information while focusing on critical regions, extracting more refined and comprehensive features than MobileNet_V2, EfficientNet, and ResNet-34.
Conclusions This paper presents SDPNet, a novel image classification network that addresses fundamental limitations of existing approaches through three synergistic contributions. The NES module enhances spatial feature perception through a nonlinear feature transformation. The SDP module dynamically captures discriminative spatial information by integrating rotation, expansion, and shuffling operations. The DFR module optimizes spatial-channel interactions through conditional convolutions. Experiments across multiple datasets demonstrate SDPNet's superior classification performance, robust generalization, and practical efficiency. Future work will extend this spatial-domain approach with frequency-domain perception to further improve classification performance.