Bob Zigon¹ and Fengguang Song², ¹Beckman Coulter, USA, ²Indiana University-Purdue University, USA
Mnih’s seminal deep reinforcement learning paper, which applied a Deep Q-Network (DQN) to Atari video games, demonstrated the importance of a replay buffer and a target network. Although both were required for convergence, the replay buffer came at a significant computational cost: with each new sample generated by the system, the targets in the mini-batch buffer were recomputed. We propose an alternative, TAO-DQN (Target Accelerated Optimization-DQN), that eliminates this target recomputation. Our approach centers on a new replay buffer algorithm that lowers the computational burden. We evaluated the approach in three experiments using environments from the OpenAI Gym. It converged to better policies in fewer episodes and in less time. Furthermore, we offer a mathematical justification for the improved convergence rate.
DQN, Experience Replay, Replay Buffer, Target Network.
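For orientation, the target recomputation that the abstract refers to can be sketched as follows. This is a minimal illustration of the baseline DQN target, y = r + γ max Q_target(s', a'), evaluated for one sampled mini-batch; it is not the TAO-DQN algorithm, whose details are not given in the abstract. The function name `dqn_targets`, its parameters, and the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Baseline DQN targets for one mini-batch (illustrative, hypothetical helper).

    rewards:        shape (B,)   reward of each sampled transition
    next_q_values:  shape (B, A) target-network Q-values for each next state
    dones:          shape (B,)   True where the transition ended the episode
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)
    dones = np.asarray(dones, dtype=bool)

    # Greedy value of the next state under the target network.
    max_next_q = next_q_values.max(axis=1)

    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * max_next_q * (~dones)

# In baseline DQN this computation is repeated for every sampled mini-batch,
# which is the cost the abstract describes.
example_targets = dqn_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    dones=[False, True],
    gamma=0.9,
)
```

With these sample values, the first transition bootstraps from the larger of its two next-state Q-values, while the second is terminal and keeps only its reward.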