Radiation Protection (辐射防护), 2025, Vol. 45, Issue 5: 517-529.

• Emergency Preparedness and Countermeasures for Nuclear and Radiation Accidents •

Research and development of a dynamic optimization decision model for nuclear emergency evacuation based on deep reinforcement learning

LI Mingye1,2, YAO Rentai1,2, GUO Huan1,2, ZHANG Junfang1,2, LV Minghua1,2, XU Xiangjun1,2, NIU Yanjing1,2, JIA Bohui3

  1. China Institute for Radiation Protection, Taiyuan 030006;
  2. CNNC Key Laboratory of Nuclear Environment Simulation & Evaluation Technology, Taiyuan 030006;
  3. Forlinx Embedded Technology Co., Ltd., Baoding, Hebei 071052
  • Received: 2025-01-04 Online: 2025-09-20 Published: 2026-01-14
  • About the author: LI Mingye (born 1991), male, received a bachelor's degree in software engineering from Taiyuan University of Technology in 2014 and a master's degree in electronic information from North University of China in 2023; associate research fellow. E-mail: 15003416848@163.com.

Abstract: Timely and effective evacuation of people in a nuclear accident is critical to reducing radiation exposure and ensuring public safety. Traditional path planning algorithms can quickly compute static shortest paths, but they adapt poorly to a radiation dose field that changes over time. This paper proposes a dynamic optimization decision model for nuclear emergency evacuation based on deep reinforcement learning (the MD-DQN algorithm model). The problem is formulated as a Markov decision process (MDP) whose state space comprises dynamic radiation dose field information, road network information, and real-time location. A multi-factor reward function that jointly accounts for path length, radiation exposure, and directional guidance drives the agent to learn the optimal dynamic evacuation policy autonomously. In addition, optimizing the network structure design and the immediate reward mechanism improves the convergence and generalization performance of the algorithm. Simulation experiments show that, compared with the traditional Dijkstra and A* algorithms, the MD-DQN algorithm avoids high-radiation-risk areas in time, significantly reduces the radiation exposure of personnel during evacuation, and offers better real-time path adjustment and environmental adaptability. The results can provide an efficient, intelligent decision support tool for nuclear emergency evacuation, and offer new research ideas for multi-source radiation scenarios, multi-agent cooperation, and real-time data-driven intelligent decision-making.

Key words: deep reinforcement learning, nuclear emergency evacuation, dynamic evacuation decision, Markov decision process, MD-DQN
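
The abstract names three ingredients of the reward (path length, radiation exposure, and directional guidance) but does not give its functional form. The following is only a minimal illustrative sketch of how such a per-step reward could be combined linearly. The `RewardWeights` class, the `step_reward` function, the weight values, the units, and the terminal bonus are all assumptions made for illustration and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; the paper's actual values are not given in the abstract."""
    step: float = 1.0       # penalty per km travelled
    dose: float = 5.0       # penalty per mSv received during the step
    direction: float = 0.5  # bonus per km of progress toward the destination
    goal: float = 100.0     # assumed terminal bonus on arrival


def step_reward(pos, next_pos, goal, dose_rate, dt, w=RewardWeights()):
    """Illustrative multi-factor reward for one move.

    pos, next_pos, goal: (x, y) coordinates in km.
    dose_rate: dose rate at next_pos at the current time step, in mSv/h.
    dt: time spent on the move, in hours.
    """
    length = math.dist(pos, next_pos)                            # path-length term
    dose = dose_rate * dt                                        # exposure term
    progress = math.dist(pos, goal) - math.dist(next_pos, goal)  # directional term
    reward = -w.step * length - w.dose * dose + w.direction * progress
    if next_pos == goal:
        reward += w.goal
    return reward


# Two moves of equal length and progress; only the local dose rate differs.
clean = step_reward((0.0, 0.0), (1.0, 0.0), (3.0, 0.0), dose_rate=0.0, dt=0.1)
exposed = step_reward((0.0, 0.0), (1.0, 0.0), (3.0, 0.0), dose_rate=2.0, dt=0.1)
print(clean > exposed)  # the lower-dose move earns the higher reward
```

Because `dose_rate` is sampled at the current time step, the same road segment can earn a different reward as the dose field evolves. This is the property that a static shortest-path search with fixed edge costs does not capture.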

CLC number (Chinese Library Classification):

  • TL73