China Science and Technology Core Journal

Chinese Core Journal

CSCD Source Journal

空间控制技术与应用 (Aerospace Control and Application) ›› 2023, Vol. 49 ›› Issue (5): 55-64. doi: 10.3969/j.issn.1674-1579.2023.05.007

• Papers and Reports •

Path Planning Using an SAC Algorithm Based on Improved Prioritized Experience Replay


  1. School of Electrical Engineering and Automation, Henan Polytechnic University
  • Online: 2023-10-26  Published: 2023-11-20
  • Supported by:
    Science and Technology Research Project of Henan Province (232102210040)

Path Planning Using SAC Algorithm Based on Improved Prioritized Experience Replay

  • Online:2023-10-26 Published:2023-11-20

Abstract: To solve the path planning problem of agents in complex environments, an online off-policy deep reinforcement learning model based on an improved prioritized experience replay method is proposed. The model adopts the soft actor-critic (SAC) algorithm and achieves collision-free path planning by designing the agent's state space, action space, and reward function. Sampling probabilities are then computed from the dispersion of a mixed sample priority constructed from the sample's state priority and TD error, yielding a SAC algorithm based on improved prioritized experience replay that raises the model's learning efficiency. Simulation results verify the effectiveness of the proposed improved SAC algorithm under various parameter settings and the superior learning efficiency of the improved prioritized experience replay method in continuous control tasks.

Keywords: state priority, TD error, dispersion, prioritized experience replay, learning efficiency

Abstract: To address the path planning problem of intelligent agents in complex environments, this paper proposes an online off-policy deep reinforcement learning model based on an improved prioritized experience replay method. First, the model uses the soft actor-critic (SAC) algorithm to achieve collision-free path planning for the agent by designing its state space, action space, and reward function. Second, a mixed sample priority is constructed from the sample's state priority and its TD error, and sampling probabilities are computed from the dispersion of this mixed priority; on this basis, a SAC algorithm built on the improved prioritized experience replay method is proposed to enhance the model's learning efficiency. Simulation results validate the effectiveness of the proposed improved SAC algorithm under various parameter combinations and the superior learning efficiency of the improved prioritized experience replay method in continuous control tasks.
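The mixed-priority sampling step described above can be sketched as follows. Note this is a minimal illustration of prioritized experience replay with a blended priority score: the blending weight `lam`, the priority exponent `alpha`, and the linear combination rule are assumptions for illustration, since the abstract does not give the paper's exact formula.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixed_priority(td_errors, state_priorities, lam=0.5, eps=1e-6):
    """Blend absolute TD error and state priority into one score per sample.

    lam and eps are illustrative hyperparameters, not taken from the paper;
    eps keeps every transition sampleable even at zero TD error.
    """
    td = np.abs(np.asarray(td_errors)) + eps
    sp = np.asarray(state_priorities) + eps
    return lam * td + (1.0 - lam) * sp

def sampling_probabilities(priorities, alpha=0.6):
    """Standard PER step: exponentiate priorities and normalize to a distribution."""
    scaled = priorities ** alpha
    return scaled / scaled.sum()

# Toy replay buffer of 5 transitions.
td = np.array([0.1, 0.5, 0.05, 0.9, 0.3])   # per-sample TD errors
sp = np.array([0.2, 0.1, 0.4, 0.3, 0.2])    # per-sample state priorities
p = sampling_probabilities(mixed_priority(td, sp))
batch_idx = rng.choice(len(p), size=3, p=p)  # minibatch indices for the SAC update
```

Transitions with a large TD error or a high state priority are drawn more often, which is the mechanism the abstract credits for the improved learning efficiency.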

Key words: state priority, TD error, dispersion, prioritized experience replay, learning efficiency

CLC number:

  • TP183