• Abstract: To address the limited feature-extraction capability of conventional convolution under geometric transformations such as rotation and scaling, and its difficulty in distinguishing key features from noise, this paper proposes an image classification network with rotational deformable convolution (RoDeNet). The network is built on an improved ResNet-34. First, a rotational deformable convolution module (RDCM) is introduced, which combines predefined rotational offsets with dynamically learned offsets to strengthen the model's ability to capture direction-sensitive features. RDCM adopts a dual-path structure that extracts base features and rotational-deformation features separately, while a feature preserve module (FPM) balances feature selection against information retention. In addition, a hierarchical gated attention (HGA) mechanism is proposed; through multi-granularity feature selection and dynamic weight optimization, it improves the model's ability to discriminate features from noise while preserving the richness of the feature representation. RoDeNet reaches accuracies of 96.87%, 81.81%, 94.72%, and 89.69% on the CIFAR-10, CIFAR-100, Imagenette, and Imagewoof datasets, improvements of 0.58, 3.01, 4.74, and 5.09 percentage points over ResNet-34, respectively. Through the synergy of rotational deformable convolution and hierarchical gated attention, the network effectively addresses the insufficient feature extraction and noise sensitivity of classification networks under geometric transformations, improving both geometric robustness and feature-selection capability.


      Abstract:
      Objective Image classification serves as a fundamental component in computer vision and supports a wide range of applications including object recognition, scene understanding, and intelligent perception systems. Deep convolutional neural networks have made substantial progress through hierarchical feature learning and residual architectures. Despite these advances, conventional convolution operations rely on fixed sampling grids and rigid kernel geometry, which limits their ability to adapt to geometric transformations such as rotation, scale variation, and orientation changes. When object orientation varies, features extracted by fixed convolution kernels become spatially misaligned, leading to inconsistent representations within the same category and reduced classification robustness. Deformable convolution partially alleviates this limitation by learning adaptive sampling offsets, yet fully data-driven offsets lack explicit geometric constraints and often result in offset divergence, unstable sampling patterns, and weak sensitivity to directional structures in deeper layers. In addition, existing convolution-based networks tend to assign similar importance to foreground and background features, which amplifies the influence of noise and reduces discriminative capability. The objective of this work was to construct an image classification network that simultaneously enhances rotation robustness and improves feature selection stability under complex geometric variations.
      Methods A rotational deformable convolution-based image classification network, named RoDeNet, was developed on the basis of the ResNet-34 backbone. The core component of RoDeNet was the rotational deformable convolution module (RDCM), which integrated explicit geometric priors with adaptive offset learning. Instead of relying solely on unconstrained learned offsets, RDCM introduced predefined rotational offsets corresponding to four base orientations, namely 0°, 45°, 90°, and 135°. These offsets provided stable directional references for convolutional sampling. A compact offset prediction network was employed to generate dynamic offsets from feature responses. The predefined offsets and learned offsets were combined through element-wise addition, which constrained the deformation process within a geometrically meaningful range while preserving data-driven adaptability. RDCM adopted a dual-path structure. One path employed standard convolution to preserve stable base features and ensure consistent low-level representations. The other path applied rotational deformable convolution to capture orientation-adaptive features through direction-aware sampling. Feature outputs from both paths were concatenated along the channel dimension to form enriched representations. To prevent information loss during feature enhancement, a feature preserve module (FPM) was integrated after feature fusion. FPM consisted of an attention-guided enhancement branch and a feature retention branch. The attention branch amplified discriminative spatial responses, while the retention branch maintained the original structural information, ensuring balanced feature optimization. To further improve feature discrimination and noise suppression, a hierarchical gated attention (HGA) mechanism was incorporated into residual blocks. HGA divided feature channels into multiple groups and processed them through two cooperative branches. 
The spatial pooling branch extracted horizontal and vertical contextual information to model long-range spatial dependencies. The depthwise separable convolution branch focused on local spatial patterns with reduced computational cost. The outputs of the two branches were fused to generate adaptive attention weights, which were used to reweight feature responses across spatial and channel dimensions. In the overall architecture, RDCM replaced the initial convolution layer and the first residual block of each stage, while HGA was inserted after the second convolution within residual blocks to avoid interference with early geometric correction.
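The two core mechanisms described above lend themselves to a compact sketch: the predefined rotational offsets are simply the displacements of a regular k×k sampling grid after rotation about its centre (added element-wise to learned offsets), and HGA is a grouped sigmoid gate fusing directional pooling with a local depthwise-style filter. The following NumPy sketch is illustrative only, under stated assumptions: the zero array standing in for the offset-prediction network, the 3×3 box filter standing in for the depthwise separable branch, and the additive fusion are assumptions, not the paper's exact implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rotational_offsets(theta_deg, k=3):
    """Predefined RDCM offsets: displacement of each k x k sampling
    point after rotating the regular grid by theta about its centre."""
    t = np.deg2rad(theta_deg)
    r = (k - 1) / 2.0
    ys, xs = np.meshgrid(np.arange(k) - r, np.arange(k) - r, indexing="ij")
    base = np.stack([ys.ravel(), xs.ravel()], axis=1)          # (k*k, 2) of (dy, dx)
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])
    return base @ rot.T - base                                 # (k*k, 2)

# Four base orientations; the total sampling offset is the element-wise
# sum of a predefined offset and a learned one (zeros stand in here for
# the offset-prediction network's output).
predefined = {a: rotational_offsets(a) for a in (0, 45, 90, 135)}
learned = np.zeros((9, 2))
total = predefined[45] + learned

def hga(x, groups=4):
    """HGA gate sketch: x has shape (C, H, W); channels are split into
    `groups`, and each group is reweighted by a sigmoid gate fusing
    (a) width/height average pooling for long-range context and
    (b) a 3x3 box filter standing in for the depthwise branch."""
    C, H, W = x.shape
    out = np.empty_like(x)
    for g in np.array_split(np.arange(C), groups):
        xg = x[g]
        # Branch 1: horizontal and vertical context, broadcast back to (Cg, H, W)
        spatial = xg.mean(axis=2, keepdims=True) + xg.mean(axis=1, keepdims=True)
        # Branch 2: local pattern via a 3x3 box (depthwise-style) filter
        pad = np.pad(xg, ((0, 0), (1, 1), (1, 1)), mode="edge")
        local = sum(pad[:, dy:dy + H, dx:dx + W]
                    for dy in range(3) for dx in range(3)) / 9.0
        out[g] = xg * sigmoid(spatial + local)   # gate in (0, 1)
    return out
```

At 0° the predefined offsets vanish and the module reduces to ordinary learned-offset deformable convolution; at 45° the rotated sampling points fall off the integer grid, which deformable convolution resolves by bilinear interpolation. Because the gate lies in (0, 1), every reweighted response stays bounded by the original one, consistent with the stated goal of suppressing noise without discarding features.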
Results and Discussions Extensive experiments were conducted on CIFAR-10, CIFAR-100, Imagenette, and Imagewoof datasets to evaluate classification accuracy, robustness, and efficiency. RoDeNet achieved classification accuracies of 96.87% on CIFAR-10, 81.81% on CIFAR-100, 94.72% on Imagenette, and 89.69% on Imagewoof. Compared with the baseline ResNet-34, accuracy improvements of 0.58, 3.01, 4.74, and 5.09 percentage points were achieved, respectively. The performance gains were more pronounced on Imagenette and Imagewoof, which contain significant orientation variation and background complexity, indicating improved geometric robustness. Parameter sensitivity experiments demonstrated that a moderate predefined rotation offset strength produced the most favorable performance balance. Excessively strong geometric priors constrained adaptive learning, while purely data-driven offsets failed to provide stable rotational modeling. Ablation studies confirmed that RDCM and HGA each contributed independently to performance improvement. RDCM enhanced direction-sensitive feature extraction, while HGA improved discriminative feature selection and suppressed background noise. Their combination produced greater performance gains than either module alone, indicating complementary functionality. Computational analysis showed that RoDeNet introduced a moderate increase in parameter count and computational complexity compared with ResNet-34, while remaining significantly more efficient than many Transformer-based classification models. Visualization analysis further revealed that RoDeNet focused more accurately on object regions and structural contours, whereas baseline models exhibited dispersed or background-dominated activation patterns. These observations confirmed the effectiveness of the proposed geometric adaptation and hierarchical attention mechanisms.
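As a quick consistency check, the ResNet-34 baseline accuracies implied by the reported gains can be recovered by subtraction (up to rounding; the paper's own baseline table may differ in the last digit):

```python
# Implied ResNet-34 baselines, derived by subtracting the reported gains
# from the reported RoDeNet accuracies (values in percent).
rodenet = {"CIFAR-10": 96.87, "CIFAR-100": 81.81,
           "Imagenette": 94.72, "Imagewoof": 89.69}
gain = {"CIFAR-10": 0.58, "CIFAR-100": 3.01,
        "Imagenette": 4.74, "Imagewoof": 5.09}
baseline = {k: round(rodenet[k] - gain[k], 2) for k in rodenet}
# -> {'CIFAR-10': 96.29, 'CIFAR-100': 78.8, 'Imagenette': 89.98, 'Imagewoof': 84.6}
```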
      Conclusions A rotational deformable convolution-based image classification network was presented to address the limitations of fixed convolutional structures under geometric transformations. By integrating predefined rotational priors with dynamically learned offsets, RDCM enabled stable and direction-aware adaptive sampling. The feature preserve module balanced feature enhancement and information retention. The hierarchical gated attention mechanism improved multi-level feature discrimination through grouped attention modeling. Experimental results demonstrate that the proposed network achieves superior classification accuracy and robustness under rotation and orientation variations while maintaining acceptable computational efficiency. The method provides an effective solution for rotation-sensitive image classification tasks. Future work will focus on reducing computational overhead and extending the framework to continuous rotation modeling and to scenarios involving non-rigid geometric deformations.