Abstract:
Objective With the rapid development of the global air transportation system and of unmanned aerial vehicle technology, aircraft of all kinds are increasingly used in civil security, airspace monitoring, and public safety, which calls for more accurate and reliable aerial target recognition. Driven by advances in deep learning and computer vision, detection methods based on convolutional neural networks are now widely used for aircraft recognition. Among them, semantic segmentation localizes targets at the pixel level and therefore provides the detailed information needed to extract an aircraft's contour and analyze its attitude. Compared with medium and large aircraft, however, a small aircraft occupies fewer pixels in the image and has weaker discriminative features, so it is easily affected by complex backgrounds and is difficult for current models to segment accurately and stably. In practical applications, moreover, differences in imaging devices, environments, and data distributions readily cause domain shift, which greatly reduces a model's generalization ability. In addition, existing high-accuracy segmentation methods generally rely on complex network structures and consume substantial computational resources, which makes it hard to meet the real-time requirements of practical systems. Combining domain adaptation with lightweight model design to achieve semantic segmentation of small aircraft against complex backgrounds that is both accurate and real-time is therefore of considerable research and engineering value.
Methods This paper addresses the poor segmentation accuracy of small aircraft in cross-domain scenarios and explores model designs within a deep learning-based semantic segmentation framework. A feature pyramid network (FPN), a multi-scale feature fusion structure, is added to the classical U-Net to improve the model's ability to represent small aircraft targets. In addition, a Squeeze-and-Excitation (SE) channel attention module is added at the network bottleneck so that the network learns to adaptively reweight each feature channel, which strengthens the model's response to important discriminative features. To address the differences in feature distribution caused by different scenes and imaging conditions, the deep feature distributions of the source and target domains are aligned using the maximum mean discrepancy (MMD), which improves the model's generalization to the target domain. On this basis, knowledge distillation is introduced through a teacher-student framework in which the high-accuracy domain-adapted model serves as the teacher. The teacher's soft labels supervise the training of a lightweight student model, which has far fewer parameters and a much lower computational cost while retaining good segmentation performance. Experiments are conducted on a multi-scene small-aircraft dataset with an explicit source-target domain split, and comparison and ablation experiments validate the effectiveness and real-time performance of the proposed method.
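The two training signals described above, MMD-based feature alignment and soft-label distillation, can be illustrated with a minimal sketch. It is not the paper's implementation: the Gaussian kernel, the bandwidth `sigma`, the temperature value, and all function names are assumptions chosen for illustration, and a real model would compute these terms on batched feature tensors and per-pixel logits rather than on Python lists.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel between two feature vectors (sigma is an assumed bandwidth)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs drawn from the two batches."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two batches of features."""
    return (mean_kernel(source, source, sigma)
            + mean_kernel(target, target, sigma)
            - 2.0 * mean_kernel(source, target, sigma))

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over one pixel's class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher soft label
    q = softmax(student_logits, temperature)  # student prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Toy features: the target batch is shifted away from the source batch.
source_feats = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
target_feats = [[2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
mmd_same = mmd_squared(source_feats, source_feats)
mmd_shifted = mmd_squared(source_feats, target_feats)

# Toy per-pixel logits for two classes (background, aircraft).
kd_identical = distillation_loss([1.0, 3.0], [1.0, 3.0])
kd_different = distillation_loss([3.0, 1.0], [1.0, 3.0])
```

In this sketch the MMD term is zero when both batches are identical and grows as the two feature distributions move apart, so minimizing it alongside the segmentation loss pulls source and target features together. The distillation term is zero when the student reproduces the teacher's softened output and positive otherwise. In a segmentation setting it would typically be averaged over all pixels.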
Results and Discussions Experimental results on cross-domain semantic segmentation of small aircraft show that the proposed FPN-SE-DA-UNet performs well in both accuracy and generalization. The feature pyramid network integrates information from different scales and mitigates the loss of fine-grained features caused by repeated down-sampling. SE channel attention helps the model attend to important discriminative features, which yields more complete targets and more stable segmentation results. Cross-domain experiments show that aligning source-domain and target-domain features with maximum mean discrepancy based domain adaptation substantially improves performance on the target-domain test set, where the model reaches an IoU of 0.5229 and a Dice score of 0.6866, both higher than those of the semantic segmentation models used for comparison. Building on the same model, knowledge distillation yields a compact student model that achieves an IoU of 0.4557 and a Dice score of 0.6260. The student model reaches an average inference speed of 59.5 f/s on an NVIDIA GeForce RTX 3090, about 49.1% faster than the teacher model, at the cost of a slight drop in segmentation accuracy, which is a reasonable trade-off between accuracy and real-time performance. These results indicate that the proposed method achieves stable small-aircraft segmentation in cross-domain scenarios while meeting real-time application requirements.
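For readers less familiar with the two reported metrics, the following sketch shows how IoU and Dice are commonly computed for a binary mask and how they relate. The function names and the empty-mask convention are assumptions for illustration. The abstract does not state whether the scores are computed over the whole test set or averaged per image, so the identity Dice = 2·IoU / (1 + IoU), which holds exactly only when both metrics come from the same pixel counts, is used here purely as a consistency check.

```python
def iou_and_dice(pred, target):
    """IoU and Dice for two equally sized binary masks (lists of rows of 0/1)."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1   # aircraft pixel correctly predicted
            elif p:
                fp += 1   # background predicted as aircraft
            elif t:
                fn += 1   # aircraft pixel missed
    if tp + fp + fn == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    iou = tp / (tp + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    return iou, dice

def dice_from_iou(iou):
    """Dice implied by an IoU computed from the same pixel counts."""
    return 2 * iou / (1 + iou)

# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
toy_iou, toy_dice = iou_and_dice(pred_mask, true_mask)

# Dice values implied by the IoU scores reported in the abstract.
teacher_implied_dice = dice_from_iou(0.5229)
student_implied_dice = dice_from_iou(0.4557)
```

Under this identity, the reported IoU values of 0.5229 and 0.4557 imply Dice scores that agree with the reported 0.6866 and 0.6260 to within about 0.0001. This is consistent with both metrics being derived from the same pixel counts, although it does not establish how they were aggregated.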
Conclusions To address the low segmentation accuracy, poor generalization, and weak real-time performance of cross-domain semantic segmentation of small aircraft, this study proposes FPN-SE-DA-UNet, a semantic segmentation model based on domain adaptation. The model introduces an FPN module to obtain richer multi-scale feature information, which enhances its ability to capture small targets; uses an SE attention module to highlight the more informative feature channels; and adopts a domain adaptation strategy that aligns the features of the source and target domains, which improves generalization. On this basis, a lightweight student model is constructed through knowledge distillation; it maintains good segmentation results while offering high inference efficiency and greatly reduced computational complexity. Experiments verify the effectiveness and practicality of the proposed method for cross-domain semantic segmentation of small aircraft.