Path Planning Using SAC Algorithm Based on Improved Prioritized Experience Replay

doi: 10.3969/j.issn.1674-1579.2023.05.007

Aerospace Control and Application ›› 2023, Vol. 49 ›› Issue (5): 55-64. doi: 10.3969/j.issn.1674-1579.2023.05.007

School of Electrical Engineering and Automation, Henan Polytechnic University

Online: 2023-10-26 Published: 2023-11-20
Funding: Henan Province Science and Technology Research Project (232102210040)

Abstract

To address path planning for an agent in complex environments, this paper proposes an online, off-policy deep reinforcement learning model based on an improved prioritized experience replay method. The model uses the soft actor-critic (SAC) algorithm and achieves collision-free path planning by designing the agent's state space, action space, and reward function. The sampling probability of each sample is computed from the dispersion (diversity) of a mixed priority that is constructed from the sample's state priority and its TD error. This yields an SAC algorithm based on improved prioritized experience replay and improves the model's learning efficiency. Simulation results verify the effectiveness of the improved SAC algorithm under various parameter combinations, and show that the improved prioritized experience replay method gives better model learning efficiency in continuous control tasks.

Key words: state priority, TD error, diversity, prioritized experience replay, learning efficiency
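
The abstract does not give the formulas behind the mixed priority or its dispersion, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes (hypothetically) that the mixed priority is a convex combination of the normalized absolute TD error and a state priority in [0, 1], that "dispersion" means the absolute deviation of a sample's mixed priority from the buffer mean, and that probabilities follow the usual prioritized replay form of a score raised to a power `alpha` and then normalized. The names `mixed_priorities`, `sampling_probabilities`, `lam`, `alpha`, and `eps` are illustrative.

```python
import random


def mixed_priorities(td_errors, state_priorities, lam=0.5):
    """Blend normalized |TD error| with a per-sample state priority.

    Assumption: a convex combination with weight `lam`; the paper's
    actual mixing rule is not stated in the abstract.
    """
    abs_td = [abs(d) for d in td_errors]
    max_td = max(abs_td) or 1.0  # fall back to 1.0 when every TD error is 0
    return [lam * (d / max_td) + (1.0 - lam) * s
            for d, s in zip(abs_td, state_priorities)]


def sampling_probabilities(priorities, alpha=0.6, eps=1e-6):
    """Turn mixed priorities into sampling probabilities.

    Assumption: "dispersion" is taken as each priority's absolute
    deviation from the mean; eps keeps every sample reachable.
    """
    mean = sum(priorities) / len(priorities)
    scores = [(abs(p - mean) + eps) ** alpha for p in priorities]
    total = sum(scores)
    return [s / total for s in scores]


def sample_indices(probs, batch_size, rng=random):
    """Draw a minibatch of buffer indices, with replacement."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Hypothetical buffer contents, for illustration only.
td_errors = [0.1, -2.0, 0.5, 0.05]
state_priorities = [0.2, 0.9, 0.4, 0.1]
probs = sampling_probabilities(mixed_priorities(td_errors, state_priorities))
batch = sample_indices(probs, batch_size=2, rng=random.Random(0))
```

Under this particular reading of dispersion, samples whose mixed priority lies far from the mean in either direction are drawn more often. Whether the paper intends that behaviour cannot be determined from the abstract.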

CLC number: TP183

CUI Lizhi, ZHONG Hang, DONG Wenjuan. Path Planning Using SAC Algorithm Based on Improved Prioritized Experience Replay[J]. Aerospace Control and Application, 2023, 49(5): 55-64.

Link to this article: http://journal01.magtech.org.cn/Jwk3_kjkzjs/CN/10.3969/j.issn.1674-1579.2023.05.007

http://journal01.magtech.org.cn/Jwk3_kjkzjs/CN/Y2023/V49/I5/55
