
Historical Best Q-Networks for Deep Reinforcement Learning


Academic Annual Conference 2019 of the Institute of Software, Chinese Academy of Sciences, and Open Week of the State Key Laboratory of Computer Science (Academic Paper)

Historical Best Q-Networks for Deep Reinforcement Learning
Wenwu Yu, Rui Wang, Ruiying Li, Jing Gao, Xiaohui Hu
2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2018: 6-11.
Contact: Wenwu Yu, 15652326966 (wenwu2016@iscas.ac.cn)

Background
·In Reinforcement Learning (RL), an agent seeks optimal policies for sequential decision problems: at each time step t, the agent observes a state s_t, takes an action a_t, and receives a scalar reward r_t from the environment. (Figure: the standard agent-environment interaction loop.)

Difficulties
Some features of DRL (Deep Reinforcement Learning) methods:
·they exhibit overestimation phenomena
·they cannot use the samples sufficiently
·they maintain a single target network, updated by the latest learned Q-value estimate
Can we use the historical Q-networks to generate a new target? (Figure: a sequence of historical networks Q_i, Q_{i+1}, Q_{i+2}, Q_{i+3}, Q_{i+4} combined to form the target.)

Method
This is the overview of our auxiliary-networks approach for deep reinforcement learning. Our method, named DQN with auxiliary networks, has these networks:
·multiple target networks
·the T latest previous target networks
·K auxiliary networks
However, the criterion used to determine which network is better as an auxiliary network is itself a problem. To overcome this issue, we naturally adopt the following measures:
·selecting networks according to the score
·using the max operator
The whole algorithm, which we call DQN with auxiliary networks, is presented in the Algorithm box.

Experiment
·DQN with auxiliary networks compared with DQN
·DQN with auxiliary networks compared with Maximized-DQN

Conclusion
·choose several historical best networks as our auxiliary networks
·use the score of each episode as the selection criterion
·demonstrate that it is the auxiliary networks that play the important role, not the operation of maximizing
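The target construction described above (combining the latest target network with K historical best auxiliary networks via a max operator) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the linear "Q-networks", the function names, and all dimensions are assumptions made for the example.

```python
import numpy as np

# Illustrative stand-ins for Q-networks: each "network" is a random linear
# map from state to per-action Q-values. In the actual method these would
# be snapshots of the learned deep Q-network.
rng = np.random.default_rng(0)
n_actions, state_dim = 4, 8

def make_q_net():
    W = rng.normal(size=(n_actions, state_dim))
    return lambda s: W @ s  # Q-values for all actions in state s

latest_target = make_q_net()                       # the usual DQN target network
auxiliary_nets = [make_q_net() for _ in range(3)]  # K historical best networks

def auxiliary_target(reward, next_state, gamma=0.99):
    """Bootstrap target: element-wise max over the latest target network and
    the K auxiliary networks, then max over actions (hypothetical form)."""
    nets = [latest_target] + auxiliary_nets
    q_values = np.stack([net(next_state) for net in nets])  # (K+1, n_actions)
    best_q = q_values.max(axis=0).max()  # max over networks, then over actions
    return reward + gamma * best_q

s_next = rng.normal(size=state_dim)
y = auxiliary_target(reward=1.0, next_state=s_next)
print(y)
```

In a full training loop, the auxiliary networks would be refreshed whenever an episode's score exceeds the scores of the currently stored snapshots, matching the episode-score selection criterion stated in the conclusion.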