Deep graph reinforcement learning routing algorithm for large-scale LEO satellite networks

黄思博; 杨明川; 张祯昊

doi:10.12086/oee.2026.250386

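Abstract: Routing in large-scale low Earth orbit (LEO) satellite networks faces key challenges, including poor routing transmission performance caused by dynamic changes in network topology and insufficient scalability of algorithm models. To address these challenges, this paper uses an improved message passing neural network (MPNN) and a deep Q-network (DQN) to propose an MPNN- and DQN-based routing algorithm for large-scale LEO satellite networks (MDSR). The improved MPNN module extracts the state features of the satellites and inter-satellite links in the network, and the DQN module makes routing decisions based on these features. MDSR is trained and tested on a polar-orbit large-scale LEO satellite network. The results show that, during training, the MDSR model outperforms the unimproved MPNN model in both convergence speed and initial reward. During testing, the average packet loss rate of MDSR is about 48% lower than that of the other reinforcement learning algorithms and about 35% lower than that of the Dijkstra shortest path (DSP) algorithm; its average end-to-end delay is about 20.5%, 7.9%, 2.7%, and 5.4% lower than that of the DSP, Q-learning, DQN, and deep deterministic policy gradient (DDPG) algorithms, respectively; and its average throughput is about 21.7%, 7.6%, 5%, and 1.2% higher than that of DSP, Q-learning, DQN, and DDPG, respectively. In addition, MDSR adapts to different constellation configurations without additional training, showing strong topological generalization capability.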

Abstract:

Objective Large-scale low Earth orbit (LEO) satellite networks offer wide coverage, high data transmission rates, and low latency, which makes them an important part of future 6G communication systems. Routing is a critical component of these networks, enabling high-quality, low-latency communication services through packet forwarding. However, routing in large-scale LEO satellite networks faces two key challenges: poor routing transmission performance caused by dynamic changes in network topology, and insufficient scalability of algorithm models. Recent advances in artificial intelligence (AI) offer new ways to address these challenges. This study combines graph neural networks (GNNs) and deep reinforcement learning (DRL) to propose a routing algorithm that improves transmission performance across diverse network topologies and shows robust topological generalization.

Methods This study combined a message passing neural network (MPNN), a type of GNN, with a deep Q-network (DQN), a type of DRL model, to propose an MPNN- and DQN-based routing algorithm for large-scale LEO satellite networks (MDSR). First, MDSR used a virtual topology approach to divide the evolution of the large-scale LEO satellite network into fixed time intervals and to construct graph-structured data as input to the model. Second, we designed the MDSR framework, which consisted of an improved MPNN module and a DQN module. The improved MPNN module comprised a message processing stage and a readout stage. The message processing stage consisted of a message processing function (MPF), an information aggregation function (IAF), and a hidden state update function (HSUF); the readout stage consisted of a readout function (RF). Specifically, we improved the HSUF by replacing the original recurrent neural network (RNN) with a long short-term memory (LSTM) network, with the aim of strengthening the model's ability to learn long-range dependencies and mitigating vanishing and exploding gradients. The improved MPNN module extracted the state features of the satellites and inter-satellite links, together with their interconnections, from the network, and the DQN module used these features to make routing decisions. Finally, to compare routing transmission performance across algorithms and satellite network configurations, MDSR was compared with five baselines: the MPNN + DQN algorithm with an RNN as the HSUF (HSUF-RNN), the Dijkstra shortest path (DSP) algorithm, and algorithms based on DQN, Q-learning, and deep deterministic policy gradient (DDPG). MDSR and the other reinforcement learning algorithms were trained in a routing scenario on a polar-orbit large-scale LEO satellite network with a routing update interval of 30 s. In addition, MDSR and the comparison algorithms were tested for routing transmission performance on polar-orbit and inclined-orbit large-scale LEO satellite networks under routing update intervals of 10 s, 30 s, 60 s, and 300 s. The experimental environment comprised the Windows 11 operating system, an NVIDIA GeForce RTX 4050 GPU, the PyTorch 1.13.1 + CUDA 11.6 deep learning framework, and the Python 3.9 programming language.
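
The MPNN stages and the DQN decision step described above can be sketched as follows. This is a minimal, untrained illustration in plain Python rather than the authors' PyTorch implementation: the hidden-state size, the random weights, the node features, the way a candidate path is marked in the features, and the names `q_value` and `choose_path` are all assumptions made for the example.

```python
import math
import random

random.seed(0)
DIM = 4  # hidden-state size per satellite (hypothetical)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Untrained stand-in weights; a real model would learn these (biases omitted).
W_MSG = rand_matrix(DIM, 2 * DIM)                          # MPF
W_GATES = {g: rand_matrix(DIM, 2 * DIM) for g in "ifog"}   # HSUF (LSTM cell)
W_READ = [random.uniform(-0.5, 0.5) for _ in range(DIM)]   # RF


def message(h_src, h_dst):
    """MPF: map a (sender, receiver) state pair to a message vector."""
    return [math.tanh(x) for x in matvec(W_MSG, h_src + h_dst)]


def aggregate(messages):
    """IAF: element-wise sum of the incoming messages."""
    return [sum(col) for col in zip(*messages)] if messages else [0.0] * DIM


def lstm_update(m, h, c):
    """HSUF: one LSTM-cell step with the aggregated message as input."""
    x = m + h  # concatenate input and previous hidden state
    i = [sigmoid(v) for v in matvec(W_GATES["i"], x)]    # input gate
    f = [sigmoid(v) for v in matvec(W_GATES["f"], x)]    # forget gate
    o = [sigmoid(v) for v in matvec(W_GATES["o"], x)]    # output gate
    g = [math.tanh(v) for v in matvec(W_GATES["g"], x)]  # candidate cell state
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def q_value(features, edges, rounds=3):
    """Run a few message-passing rounds, then read out one scalar Q-value."""
    h = {n: list(f) for n, f in features.items()}
    c = {n: [0.0] * DIM for n in features}
    neighbors = {n: [] for n in features}
    for u, v in edges:  # inter-satellite links are treated as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)
    for _ in range(rounds):
        agg = {n: aggregate([message(h[u], h[n]) for u in neighbors[n]]) for n in h}
        for n in h:
            h[n], c[n] = lstm_update(agg[n], h[n], c[n])
    pooled = [sum(col) for col in zip(*h.values())]  # RF: sum over satellites
    return sum(w * x for w, x in zip(W_READ, pooled))


def choose_path(features, edges, candidate_paths, epsilon=0.1):
    """DQN-style epsilon-greedy choice among candidate paths (actions)."""
    if random.random() < epsilon:
        return random.choice(candidate_paths)

    def score(path):
        # Assumed action encoding: the last feature flags satellites on the path.
        marked = {n: f[:-1] + [1.0 if n in path else 0.0] for n, f in features.items()}
        return q_value(marked, edges)

    return max(candidate_paths, key=score)


# Toy snapshot: four satellites connected in a ring, two candidate paths 0 -> 3.
edges = [(0, 1), (1, 3), (3, 2), (2, 0)]
features = {n: [0.1 * n, 0.0, 0.0, 0.0] for n in range(4)}
paths = [[0, 1, 3], [0, 2, 3]]
best = choose_path(features, edges, paths, epsilon=0.0)
print("selected path:", best)
```

A sum is used here for both the IAF and the RF because it depends neither on neighbor order nor on the number of satellites, which is one common reason such models can be applied to topologies they were not trained on. The abstract does not state which aggregation MDSR actually uses.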

Results and Discussions During training, MDSR converged faster and achieved a higher initial reward than HSUF-RNN. During testing on polar-orbit large-scale LEO satellite networks, the average packet loss rate of MDSR was approximately 48% lower than that of the other reinforcement learning (RL) algorithms and approximately 35% lower than that of DSP. The average end-to-end delay of MDSR was approximately 20.5%, 7.9%, 2.7%, and 5.4% lower than that of DSP, Q-learning, DQN, and DDPG, respectively. The average throughput of MDSR was approximately 21.7%, 7.6%, 5%, and 1.2% higher than that of DSP, Q-learning, DQN, and DDPG, respectively. On inclined-orbit large-scale LEO satellite networks, MDSR adapted to different constellation configurations without additional training and maintained stable routing performance, demonstrating strong topological generalization. MDSR significantly outperformed static routing algorithms such as DSP in routing transmission performance. Compared with routing algorithms that do not combine a GNN with DRL, MDSR can accurately perceive the global network state when making routing decisions, which improves key routing metrics such as end-to-end delay, throughput, and packet loss rate.
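
For readers who want to reproduce comparisons of this kind, the sketch below shows one conventional way to compute a packet loss rate and a relative improvement over a baseline. The abstract does not give the exact formulas behind the reported percentages, so the definitions here are assumptions, and the numbers in the example are invented for illustration and are not results from the paper.

```python
def packet_loss_rate(sent, delivered):
    """Fraction of sent packets that were not delivered."""
    if sent <= 0 or not 0 <= delivered <= sent:
        raise ValueError("need sent > 0 and 0 <= delivered <= sent")
    return (sent - delivered) / sent


def percent_lower(baseline, value):
    """Percent by which `value` falls below `baseline` (for delay, loss)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - value) / baseline * 100.0


def percent_higher(baseline, value):
    """Percent by which `value` exceeds `baseline` (for throughput)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (value - baseline) / baseline * 100.0


# Invented numbers for illustration only; NOT measurements from the paper.
baseline_delay_ms, candidate_delay_ms = 200.0, 160.0
baseline_loss = packet_loss_rate(sent=1000, delivered=900)
candidate_loss = packet_loss_rate(sent=1000, delivered=950)

print(f"delay reduction: {percent_lower(baseline_delay_ms, candidate_delay_ms):.1f}%")
print(f"loss reduction:  {percent_lower(baseline_loss, candidate_loss):.1f}%")
```

Under these definitions a reduction is always relative to the baseline algorithm, so the same absolute gap yields a different percentage against each baseline.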

Conclusions This study proposes the MDSR algorithm to address degraded transmission performance and insufficient topological generalization in routing algorithms for large-scale LEO satellite networks. MDSR uses the improved MPNN module to extract network state features, and the DQN module learns routing strategies from these features. MDSR is evaluated on both polar-orbit and inclined-orbit large-scale LEO satellite networks and compared with the HSUF-RNN, DSP, DQN, Q-learning, and DDPG algorithms. The results show that, during training, MDSR converges faster and achieves a higher initial reward than the HSUF-RNN model. During testing, MDSR outperforms the comparison algorithms in both routing transmission performance and topological generalization. Future research will explore optimization schemes such as random link connections and traffic partitioning to further improve the robustness of MDSR and achieve a more balanced network load distribution.
