Spacecraft Multi-Constraint Rapid Avoidance Motion Planning Based on Deep Reinforcement Learning

Aerospace Control and Application ›› 2023, Vol. 49 ›› Issue (2): 1-9. doi: 10.3969/j.issn.1674-1579.2023.02.001

• Papers and Reports •


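Beijing Institute of Control Engineering

Online: 2023-04-26 Published: 2023-05-15
Funding:
National Natural Science Foundation of China (62203046, U21B6001); Aerospace Field Foundation (2022-JCJQ-JJ-0660); Foundation of the Science and Technology on Space Intelligent Control Laboratory (2022-JCJQ-LB-010-01); Qian Xuesen Youth Innovation Fund of China Aerospace Science and Technology Corporation; Independent Research and Development Program of China Aerospace Science and Technology Corporation; China Postdoctoral Science Foundation (2022M713006)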


Abstract

Abstract: Spacecraft face multiple complex constraints during avoidance maneuvers. Traditional motion planning methods based on numerical optimization are sensitive to initial values and slow to compute when handling the corresponding models and constraints, which makes it difficult to respond in time to close-range orbital threats. To address this problem, this paper proposes a multi-constraint avoidance motion planning method for spacecraft based on deep reinforcement learning (DRL). First, a six-degree-of-freedom nonlinear dynamics model of the spacecraft and the associated constraints on attitude and orbit maneuvers are established. Then, a motion planning method based on the twin delayed deep deterministic policy gradient (TD3) algorithm is developed, in which the neural networks trained by TD3 generate, online, avoidance maneuver actions that satisfy the multiple constraints. Finally, a standardized DRL training environment matched to the planning method is constructed to ensure effective interaction between the agent and the environment during training. Simulation results show that the proposed method can generate avoidance actions in real time even when the expected rendezvous time is only tens of seconds, with a planning period of less than 9 ms, far shorter than that of the Gauss pseudospectral method used for comparison.

Key words: avoidance maneuver, orbital threat, motion planning, deep reinforcement learning
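
For readers unfamiliar with TD3, the following is a minimal sketch of how the critic target is formed in the standard algorithm, not the authors' implementation. It shows two of TD3's three ingredients, target policy smoothing and clipped double-Q learning (the third, delayed policy updates, concerns the training schedule and is omitted). The function name `td3_target`, the toy actor and critics, and all hyperparameter values are assumptions chosen for illustration; the abstract does not state the networks or settings used in the paper.

```python
import math
import random


def clip(x, lo, hi):
    """Limit x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done, actor_target, critic1_target,
               critic2_target, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               max_action=1.0, rng=random):
    """Critic target for one transition (hyperparameters are assumed values)."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    smoothed = []
    for a in actor_target(next_state):
        noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
        smoothed.append(clip(a + noise, -max_action, max_action))
    # Clipped double-Q: use the smaller of the two target critics' estimates.
    q_min = min(critic1_target(next_state, smoothed),
                critic2_target(next_state, smoothed))
    # Terminal transitions do not bootstrap from the next state.
    return reward + gamma * (0.0 if done else 1.0) * q_min


# Hypothetical stand-ins for trained networks, for illustration only.
def toy_actor(state):
    return [math.tanh(state[0]), math.tanh(state[1])]


def toy_q1(state, action):
    return sum(state) + sum(action)


def toy_q2(state, action):
    return sum(state) - sum(action)


state = [1.0, 1.0, 1.0, 1.0]
y_terminal = td3_target(1.0, state, True, toy_actor, toy_q1, toy_q2)
y_noiseless = td3_target(1.0, state, False, toy_actor, toy_q1, toy_q2,
                         noise_std=0.0)
```

In a full training loop, both critics would be regressed toward this target, and the actor would be updated less frequently than the critics. Once training is finished, only a forward pass of the actor is needed to produce an action, which is consistent with the short planning period reported in the abstract.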

CLC number: V448.2

Cite this article: WU Jianfa, WEI Chunling, ZHANG Haibo, LI Kehang, HAO Renjian. Spacecraft Multi-Constraint Rapid Avoidance Motion Planning Based on Deep Reinforcement Learning[J]. Aerospace Control and Application, 2023, 49(2): 1-9.


Link to this article: https://journal01.magtech.org.cn/Jwk3_kjkzjs/CN/10.3969/j.issn.1674 1579.2023.02.001

https://journal01.magtech.org.cn/Jwk3_kjkzjs/CN/Y2023/V49/I2/1


