Abstract:
Objective Corn, one of the world’s most vital staple crops, plays an indispensable role in agricultural production, industrial applications, and economic development. In China, corn currently accounts for over 40% of the national grain planting area and output, a proportion that continues to grow steadily, underscoring its strategic significance for national food security and economic stability. However, throughout the growth cycle, corn is persistently threatened by pests and diseases, including corn borer, leaf spot, rust, stalk rot, and armyworm, which severely restrict both yield and quality. Traditional detection relies heavily on manual field inspection, a labor-intensive and time-consuming approach that struggles to meet the demands of large-scale cultivation. Moreover, manual detection is inherently subjective and discontinuous, often resulting in missed diagnoses and misjudgments. Consequently, developing an automated, efficient, and accurate detection method for corn pests and diseases has become an urgent necessity.
In recent years, deep learning-based object detection has achieved remarkable success across various domains, with the YOLO (You Only Look Once) series of algorithms gaining particular prominence for its balance of detection speed and accuracy. YOLOv11, the latest iteration in this series, incorporates further optimizations in training strategy, network architecture, and detection precision, providing a solid technical foundation for agricultural pest identification. Nevertheless, when applied to corn pest detection in real-world field scenarios, mainstream models including YOLOv11 continue to exhibit two critical limitations: 1) Complex environmental interference: the intricate backgrounds, variable lighting conditions, diverse pest morphologies, and frequent occlusions characteristic of field environments substantially impair the model’s ability to localize and recognize targets accurately; 2) Small-target detection difficulty: early-stage or physically small pests and disease lesions occupy very few pixels in an image and display indistinct features, leading to elevated rates of missed detections.
Methods To address these challenges, this paper proposes an improved YOLOv11 model designated as MSAF-YOLO (Multi-Scale Adaptive Feedback YOLO). The core innovation lies in reconstructing the backbone network using a novel MSAF-Net (Multi-Scale Adaptive Feedback Network) architecture, which synergistically integrates two specially designed modules: the Hierarchical Feature Feedback Module (HFFM) and the Adaptive Dynamic Receptive Field Module (ADRF).
The Hierarchical Feature Feedback Module addresses the fundamental problem of semantic information deficiency in shallow feature layers and the progressive loss of small-target features during repeated downsampling operations. Conventional feedforward convolutional networks propagate information unidirectionally from shallow to deep layers; while deep layers acquire rich semantic representations, shallow layers remain dominated by fine-grained spatial details but lack high-level semantic context. HFFM introduces residual-style feedback pathways coupled with channel attention mechanisms to establish cross-layer feature calibration and enhancement. Specifically, deep feature maps containing abstracted semantic information are selectively fed back and fused with corresponding shallow feature maps through learnable gating mechanisms. The channel attention component adaptively recalibrates feature responses, emphasizing informative channels while suppressing less useful ones. This feedback-driven feature refinement significantly enhances the model’s capacity to extract and preserve discriminative characteristics of small pests and incipient disease symptoms, effectively mitigating the information attenuation typically observed across deep convolutional architectures.
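The feedback path described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the layer shapes, the form of the gate, or the upsampling method, so nearest-neighbour upsampling, a 1x1 projection, squeeze-and-excitation style channel attention, a sigmoid gate, and every name below (`hffm_feedback`, `proj`, `gate_w`, and so on) are assumptions made only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style channel weights for a (C, H, W) map (assumed form)."""
    squeezed = feat.mean(axis=(1, 2))          # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # bottleneck + ReLU -> (C // r,)
    return sigmoid(w2 @ hidden)                # per-channel weights in (0, 1)

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def hffm_feedback(shallow, deep, proj, w1, w2, gate_w, gate_b):
    """Feed deep semantics back into a shallow map (hypothetical sketch).

    shallow: (C_s, H, W); deep: (C_d, H // f, W // f);
    proj: (C_s, C_d) 1x1 projection; gate_w: (C_s, 2 * C_s) 1x1 gate weights.
    """
    factor = shallow.shape[1] // deep.shape[1]
    deep_up = upsample_nearest(deep, factor)                  # (C_d, H, W)
    deep_proj = np.einsum("sd,dhw->shw", proj, deep_up)       # (C_s, H, W)
    attn = channel_attention(deep_proj, w1, w2)[:, None, None]
    # Learnable gate: a 1x1 convolution over the concatenated shallow and fed-back maps.
    stacked = np.concatenate([shallow, deep_proj], axis=0)    # (2 * C_s, H, W)
    gate = sigmoid(np.einsum("sc,chw->shw", gate_w, stacked) + gate_b[:, None, None])
    # Residual-style fusion: the shallow map is kept and the gated feedback is added.
    return shallow + gate * attn * deep_proj

rng = np.random.default_rng(0)
C_s, C_d, H, W, r = 8, 16, 16, 16, 4           # illustrative sizes only
shallow = rng.standard_normal((C_s, H, W))
deep = rng.standard_normal((C_d, H // 2, W // 2))
params = dict(
    proj=rng.standard_normal((C_s, C_d)) * 0.1,
    w1=rng.standard_normal((C_s // r, C_s)) * 0.1,
    w2=rng.standard_normal((C_s, C_s // r)) * 0.1,
    gate_w=rng.standard_normal((C_s, 2 * C_s)) * 0.1,
    gate_b=np.zeros(C_s),
)
out = hffm_feedback(shallow, deep, **params)
print(out.shape)
```

Because the fusion is residual, the shallow map passes through unchanged when the fed-back signal is zero, so under these assumptions the feedback can only add semantic context and never replaces the fine-grained spatial detail.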
The Adaptive Dynamic Receptive Field Module addresses the inflexibility of standard convolutional neural networks, whose geometric receptive fields are fixed regardless of input content. Such rigidity is particularly disadvantageous for agricultural images that contain objects at vastly different scales, from millimeter-scale rust pustules to centimeter-scale expanded leaf spot lesions, under heterogeneous background conditions. ADRF achieves pixel-level receptive field adaptation by combining spatially adaptive weights with multi-branch dilated convolutions. For each spatial location in the feature map, the module dynamically computes context-dependent weights that govern the relative contributions of parallel convolutional branches with different dilation rates. This mechanism enables the network to expand its receptive field when processing large contextual regions while maintaining fine resolution for small-target localization, all modulated by the local structural characteristics of the input features. The resulting adaptive receptive field substantially improves model robustness and generalization across diverse pest and disease categories, growth stages, and environmental conditions.
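The per-pixel branch weighting can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the abstract does not give the number of branches, the dilation rates, or how the weights are normalised, so three depthwise 3x3 branches with dilations 1, 2, and 3, a 1x1 projection for the branch logits, softmax normalisation, and all names (`adrf`, `weight_proj`, `dilated_conv3x3`) are hypothetical.

```python
import numpy as np

def dilated_conv3x3(feat, kernel, dilation):
    """Depthwise 3x3 dilated convolution on a (C, H, W) map with zero 'same' padding."""
    _, H, W = feat.shape
    d = dilation
    padded = np.pad(feat, ((0, 0), (d, d), (d, d)))
    out = np.zeros_like(feat)
    for i in range(3):
        for j in range(3):
            # Each kernel tap reads a window shifted by a multiple of the dilation.
            tap = padded[:, i * d:i * d + H, j * d:j * d + W]
            out += kernel[:, i, j][:, None, None] * tap
    return out

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adrf(feat, kernels, dilations, weight_proj):
    """Blend dilated branches with per-pixel weights (hypothetical sketch).

    kernels: one (C, 3, 3) depthwise kernel per branch;
    weight_proj: (B, C) 1x1 projection that yields B branch logits per pixel.
    """
    branches = np.stack(
        [dilated_conv3x3(feat, k, d) for k, d in zip(kernels, dilations)]
    )                                                         # (B, C, H, W)
    logits = np.einsum("bc,chw->bhw", weight_proj, feat)      # (B, H, W)
    weights = softmax(logits, axis=0)                         # sums to 1 at every pixel
    fused = (weights[:, None] * branches).sum(axis=0)         # (C, H, W)
    return fused, weights

rng = np.random.default_rng(0)
C, H, W = 4, 12, 12                    # illustrative sizes only
dilations = (1, 2, 3)                  # assumed dilation rates
feat = rng.standard_normal((C, H, W))
kernels = [rng.standard_normal((C, 3, 3)) * 0.1 for _ in dilations]
weight_proj = rng.standard_normal((len(dilations), C))
fused, weights = adrf(feat, kernels, dilations, weight_proj)
print(fused.shape, weights.shape)
```

Since the weights are computed from the input features at each location, two pixels in the same image can favour different dilation rates, which is the sense in which the receptive field is adaptive at pixel level.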
Results and Discussions The proposed MSAF-YOLO model was evaluated on two datasets: a self-constructed corn pest and disease dataset (CORN) covering representative categories including corn borer, leaf spot, and rust under various field conditions, and the publicly available general plant disease dataset (diseases), used to assess cross-domain generalization. Experimental results show consistent improvements across multiple evaluation metrics. On the CORN dataset, MSAF-YOLO achieved a mean Average Precision at 50% IoU (mAP50) of 85.7% and an mAP50:95 of 68.9%, gains of 3.2 and 2.7 percentage points respectively over the baseline YOLOv11 model. On the diseases dataset, mAP50 and mAP50:95 reached 80.9% and 62.8%, improvements of 3.7 and 3.1 percentage points respectively. Notably, these accuracy gains were accomplished alongside parameter optimization, indicating that MSAF-YOLO achieves a favorable trade-off between detection precision and computational efficiency.
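For readers unfamiliar with the metrics, the IoU criterion behind mAP50 and mAP50:95 can be sketched in a few lines of Python. The boxes are made-up values in (x1, y1, x2, y2) form, and the threshold grid from 0.50 to 0.95 in steps of 0.05 follows the common COCO-style convention, which is assumed here rather than stated in the text.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical ground-truth and predicted boxes, for illustration only.
truth = (0, 0, 10, 10)
pred = (2, 0, 12, 10)
score = iou(truth, pred)

# mAP50 counts a prediction as correct when IoU >= 0.50; mAP50:95 averages AP
# over the stricter thresholds 0.50, 0.55, ..., 0.95 (assumed COCO-style grid).
thresholds = [round(0.5 + 0.05 * k, 2) for k in range(10)]
passed = sum(score >= t for t in thresholds)
print(f"IoU = {score:.3f}, passes {passed} of {len(thresholds)} thresholds")
```

A box that is only roughly aligned counts as a hit for mAP50 yet fails most of the stricter thresholds, which is why mAP50:95 values are lower than mAP50 values for the same model.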
Conclusions In conclusion, this research makes three principal contributions: 1) it systematically identifies and characterizes the specific challenges confronting corn pest and disease detection in complex field environments, particularly the coupled difficulties of small-target detection and background interference; 2) it proposes two novel modules—HFFM and ADRF—that respectively address semantic feature deficiency and fixed receptive field limitations through innovative feedback mechanisms and dynamic convolution strategies; 3) it delivers a practically deployable model that achieves state-of-the-art detection performance on corn pest and disease tasks while maintaining architectural efficiency suitable for real-time agricultural applications. The proposed MSAF-YOLO framework not only advances the methodological frontier of object detection in agricultural contexts but also offers a viable technical pathway toward intelligent, automated crop health monitoring systems.