Abstract:
Objective: Image classification serves as a fundamental component in computer vision and supports a wide range of applications including object recognition, scene understanding, and intelligent perception systems. Deep convolutional neural networks have made substantial progress through hierarchical feature learning and residual architectures. Despite these advances, conventional convolution operations rely on fixed sampling grids and rigid kernel geometry, which limits their ability to adapt to geometric transformations such as rotation, scale variation, and orientation changes. When object orientation varies, features extracted by fixed convolution kernels become spatially misaligned, leading to inconsistent representations within the same category and reduced classification robustness. Deformable convolution partially alleviates this limitation by learning adaptive sampling offsets, yet fully data-driven offsets lack explicit geometric constraints and often result in offset divergence, unstable sampling patterns, and weak sensitivity to directional structures in deeper layers. In addition, existing convolution-based networks tend to assign similar importance to foreground and background features, which amplifies the influence of noise and reduces discriminative capability. The objective of this work was to construct an image classification network that simultaneously enhances rotation robustness and improves feature selection stability under complex geometric variations.
Methods: A rotational deformable convolution-based image classification network, named RoDeNet, was developed on the basis of the ResNet-34 backbone. The core component of RoDeNet was the rotational deformable convolution module (RDCM), which integrated explicit geometric priors with adaptive offset learning. Instead of relying solely on unconstrained learned offsets, RDCM introduced predefined rotational offsets corresponding to four base orientations, namely 0°, 45°, 90°, and 135°. These offsets provided stable directional references for convolutional sampling. A compact offset prediction network was employed to generate dynamic offsets from feature responses. The predefined offsets and learned offsets were combined through element-wise addition, which constrained the deformation process within a geometrically meaningful range while preserving data-driven adaptability. RDCM adopted a dual-path structure. One path employed standard convolution to preserve stable base features and ensure consistent low-level representations. The other path applied rotational deformable convolution to capture orientation-adaptive features through direction-aware sampling. Feature outputs from both paths were concatenated along the channel dimension to form enriched representations. To prevent information loss during feature enhancement, a feature preserve module (FPM) was integrated after feature fusion. FPM consisted of an attention-guided enhancement branch and a feature retention branch. The attention branch amplified discriminative spatial responses, while the retention branch maintained the original structural information, ensuring balanced feature optimization. To further improve feature discrimination and noise suppression, a hierarchical gated attention (HGA) mechanism was incorporated into residual blocks. HGA divided feature channels into multiple groups and processed them through two cooperative branches.
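The offset composition in RDCM can be illustrated with a small NumPy sketch. This is not the paper's implementation: it assumes the rotational prior is obtained by rotating the standard 3×3 sampling grid about the kernel centre, and the function names (`rotational_offsets`) and the random stand-in for the learned offsets are hypothetical.

```python
import numpy as np

def rotational_offsets(angle_deg, k=3):
    """Offsets that move the standard k x k sampling grid onto the same
    grid rotated by angle_deg about the kernel centre (assumed prior)."""
    r = np.arange(k) - (k - 1) / 2            # e.g. [-1, 0, 1] for k = 3
    yy, xx = np.meshgrid(r, r, indexing="ij")
    base = np.stack([yy, xx], axis=-1)        # (k, k, 2) base positions
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])
    rotated = base @ rot.T                    # rotated sampling positions
    return rotated - base                     # offset = rotated - base

# Predefined priors for the four base orientations used by RDCM
priors = {a: rotational_offsets(a) for a in (0, 45, 90, 135)}

# Stand-in for the compact offset prediction network's output
learned = 0.1 * np.random.randn(3, 3, 2)

# Element-wise addition combines the geometric prior with the
# data-driven correction, keeping deformation near the rotated grid
effective = priors[45] + learned
```

Because the prior anchors each sampling point near a rotated grid position, the learned component only needs to model a small residual correction, which is the constraint the abstract describes.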
The spatial pooling branch extracted horizontal and vertical contextual information to model long-range spatial dependencies. The depthwise separable convolution branch focused on local spatial patterns with reduced computational cost. The outputs of the two branches were fused to generate adaptive attention weights, which were used to reweight feature responses across spatial and channel dimensions. In the overall architecture, RDCM replaced the initial convolution layer and the first residual block of each stage, while HGA was inserted after the second convolution within residual blocks to avoid interference with early geometric correction.
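The two-branch gating described above can be sketched as follows. This is an illustrative NumPy approximation, not the paper's HGA: the horizontal/vertical context is taken as row and column means, a 3×3 box filter (with wrap-around borders) stands in for the depthwise separable convolution, and a sigmoid fuses the branches into gating weights. The function name `hga_sketch` is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hga_sketch(x, groups=2):
    """Sketch of hierarchical gated attention on a (C, H, W) feature map:
    channels are split into groups, each gated by two fused branches."""
    c, h, w = x.shape
    out = np.empty_like(x)
    for g in np.split(np.arange(c), groups):
        xg = x[g]                                  # one channel group
        # Spatial pooling branch: horizontal and vertical context,
        # broadcast back over the map to model long-range dependencies
        row_ctx = xg.mean(axis=2, keepdims=True)   # (Cg, H, 1)
        col_ctx = xg.mean(axis=1, keepdims=True)   # (Cg, 1, W)
        spatial = row_ctx + col_ctx
        # Local branch: 3x3 box filter as a cheap stand-in for the
        # depthwise separable convolution (wrap-around borders)
        local = np.zeros_like(xg)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                local += np.roll(np.roll(xg, dy, axis=1), dx, axis=2)
        local /= 9.0
        # Fuse the branches into gating weights and reweight features
        out[g] = xg * sigmoid(spatial + local)
    return out
```

Grouping keeps the gating weights cheap to compute while still letting different channel groups receive different spatial emphasis, which matches the noise-suppression role the abstract assigns to HGA.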
Results and Discussions: Extensive experiments were conducted on the CIFAR-10, CIFAR-100, Imagenette, and Imagewoof datasets to evaluate classification accuracy, robustness, and efficiency. RoDeNet achieved classification accuracies of 96.87% on CIFAR-10, 81.81% on CIFAR-100, 94.72% on Imagenette, and 89.69% on Imagewoof. Compared with the baseline ResNet-34, accuracy improvements of 0.58, 3.01, 4.74, and 5.09 percentage points were achieved, respectively. The performance gains were more pronounced on Imagenette and Imagewoof, which contain significant orientation variation and background complexity, indicating improved geometric robustness. Parameter sensitivity experiments demonstrated that a moderate predefined rotation offset strength produced the most favorable performance balance: excessively strong geometric priors constrained adaptive learning, while purely data-driven offsets failed to provide stable rotational modeling. Ablation studies confirmed that RDCM and HGA each contributed independently to performance improvement. RDCM enhanced direction-sensitive feature extraction, while HGA improved discriminative feature selection and suppressed background noise. Their combination produced greater performance gains than either module alone, indicating complementary functionality. Computational analysis showed that RoDeNet introduced a moderate increase in parameter count and computational complexity compared with ResNet-34, while remaining significantly more efficient than many Transformer-based classification models. Visualization analysis further revealed that RoDeNet focused more accurately on object regions and structural contours, whereas baseline models exhibited dispersed or background-dominated activation patterns. These observations confirmed the effectiveness of the proposed geometric adaptation and hierarchical attention mechanisms.
Conclusions: A rotational deformable convolution-based image classification network was presented to address the limitations of fixed convolutional structures under geometric transformations. By integrating predefined rotational priors with dynamically learned offsets, RDCM enabled stable and direction-aware adaptive sampling. The feature preserve module balanced feature enhancement and information retention. The hierarchical gated attention mechanism improved multi-level feature discrimination through grouped attention modeling. Experimental results demonstrate that the proposed network achieves superior classification accuracy and robustness under rotation and orientation variations while maintaining acceptable computational efficiency. The method provides an effective solution for rotation-sensitive image classification tasks. Future work will focus on reducing computational overhead and extending the framework to continuous rotation modeling and to scenarios involving non-rigid geometric deformations.