China Science and Technology Core Journal

Chinese Core Journal

CSCD Source Journal

Aerospace Control and Application (空间控制技术与应用) ›› 2023, Vol. 49 ›› Issue (2): 1-9. doi: 10.3969/j.issn.1674-1579.2023.02.001

• Papers and Reports •

基于深度强化学习的航天器多约束规避动作快速规划

  

  1. Beijing Institute of Control Engineering
  • 出版日期:2023-04-26 发布日期:2023-05-15
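  • Funding:
    National Natural Science Foundation of China (62203046, U21B6001), Aerospace Field Fund (2022-JCJQ-JJ-0660), Fund of the Science and Technology on Space Intelligent Control Laboratory (2022-JCJQ-LB-010-01), Qian Xuesen Youth Innovation Fund of China Aerospace Science and Technology Corporation, independent research and development program of China Aerospace Science and Technology Corporation, and China Postdoctoral Science Foundation (2022M713006)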

Spacecraft Multi-Constraint Rapid Avoidance Motion Planning Based on Deep Reinforcement Learning

  • Online: 2023-04-26 Published: 2023-05-15

摘要: 航天器规避机动过程中面临多种复杂约束条件, 传统基于数值优化的动作规划方法在处理相应模型和约束条件时存在初值敏感、计算时间较长等问题, 难以对近距离轨道威胁做出及时反应. 针对该问题, 本文提出一种基于深度强化学习的航天器多约束规避动作规划方法. 建立航天器六自由度非线性动力学模型以及相应姿轨机动约束条件; 建立基于双延迟深度确定性策略梯度(TD3)的动作规划方法, 通过TD3训练得到的神经网络在线生成满足多种约束条件的规避机动动作; 构造与规划方法相适配的深度强化学习规范化训练环境, 确保学习训练过程中智能体和环境的有效交互. 仿真结果表明, 所提方法能在预期交会时间仅数十秒的情况下快速实时生成规避动作, 规划周期小于9 ms, 远低于作为对比项的高斯伪谱法.

关键词: 规避机动, 轨道威胁, 动作规划, 深度强化学习

Abstract: Spacecraft face multiple complex constraints during avoidance maneuvers. Traditional motion planning methods based on numerical optimization are sensitive to initial values and require long computation times when handling the corresponding models and constraints, which makes it difficult to respond to close-range orbital threats in time. To address this problem, this paper proposes a multi-constraint avoidance motion planning method for spacecraft based on deep reinforcement learning (DRL). First, a six-degree-of-freedom nonlinear dynamics model of the spacecraft and the associated constraints on attitude and orbit maneuvers are established. Then, a motion planning method based on the twin delayed deep deterministic policy gradient (TD3) algorithm is developed, in which the neural networks trained by TD3 generate, online, avoidance maneuver actions that satisfy the multiple constraints. Finally, a standardized DRL training environment matched to the planning method is constructed to ensure effective interaction between the agent and the environment during training. Simulation results show that the proposed method can generate avoidance actions rapidly and in real time even when the expected rendezvous time is only tens of seconds; the planning period is less than 9 ms, far shorter than that of the Gauss pseudospectral method used for comparison.

Key words: avoidance maneuver, orbital threat, motion planning, deep reinforcement learning
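
The abstract names TD3 as the training algorithm but does not give the networks, reward, or hyperparameters. As an illustration only, the following minimal sketch shows the generic TD3 critic target (target policy smoothing plus clipped double-Q learning) in NumPy. The function name `td3_target`, the toy actor and critics, the 6-dimensional state, the 3-dimensional action, and all default values are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def td3_target(reward, next_state, done, actor_target, critic1_target,
               critic2_target, rng, gamma=0.99, noise_std=0.2,
               noise_clip=0.5, act_limit=1.0):
    """Generic TD3 critic target; all defaults are illustrative assumptions."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    next_action = actor_target(next_state)
    noise = rng.normal(0.0, noise_std, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -act_limit, act_limit)

    # Clipped double-Q: use the smaller of the two target-critic estimates.
    q_next = np.minimum(critic1_target(next_state, next_action),
                        critic2_target(next_state, next_action))

    # Bootstrapped target; terminal transitions (done = 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_next


# Hypothetical stand-ins for trained target networks (not the paper's models).
def toy_actor(state):
    return np.tanh(state[:, :3])              # 3-dimensional bounded action


def toy_critic_a(state, action):
    return state.sum(axis=1) + action.sum(axis=1)


def toy_critic_b(state, action):
    return state.sum(axis=1) - action.sum(axis=1)


rng = np.random.default_rng(0)
next_state = np.zeros((2, 6))                 # batch of two 6-dimensional states
reward = np.array([1.0, -1.0])
done = np.array([0.0, 1.0])                   # second transition is terminal

y = td3_target(reward, next_state, done, toy_actor, toy_critic_a,
               toy_critic_b, rng)
y_no_noise = td3_target(reward, next_state, done, toy_actor, toy_critic_a,
                        toy_critic_b, rng, noise_std=0.0)
```

In this toy setup the two critics disagree in sign on the action term, so taking the minimum always yields the more pessimistic estimate, which is the mechanism TD3 uses to limit value overestimation. A constraint-aware planner such as the one described in the abstract would presumably encode the attitude and orbit constraints in the reward or in the environment, a detail the abstract does not specify.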

CLC number:

  • V448.2