In the introduction, we saw value-based RL algorithms (specifically Q-learning) in the tabular setting, where we keep a separate Q-value for each $(s, a)$ pair. To scale to large or even infinite state spaces, we need to generalize across states using a function approximator such as a neural network. This week we will see how Q-learning can be modified to support function approximation and read the influential paper from DeepMind introducing the deep Q-network (DQN) algorithm.
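As a concrete reference point before the readings, here is a minimal sketch (in PyTorch, with all names illustrative and episode termination ignored for brevity) contrasting the tabular update with its semi-gradient, function-approximation counterpart: instead of adjusting one table entry, we nudge the network's parameters toward a bootstrapped target.

```python
import torch

# Tabular Q-learning: one stored value per (s, a) pair.
# Q[s, a] += alpha * (r + gamma * max_a' Q[s', a'] - Q[s, a])
def tabular_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

# Semi-gradient Q-learning: Q is a parametric function q_net(s) returning
# one value per action; we regress q_net(s)[a] toward the bootstrapped
# target, treating the target itself as a constant.
def semi_gradient_update(q_net, optimizer, s, a, r, s_next, gamma=0.99):
    with torch.no_grad():                  # no gradient flows through the target
        td_target = r + gamma * q_net(s_next).max()
    loss = (q_net(s)[a] - td_target) ** 2  # squared TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```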
Q-learning with function approximation
Sections 4.1-4.4 of the Introduction to Deep Reinforcement Learning monograph by François-Lavet et al. (pages 24-29)
Playing Atari with Deep Reinforcement Learning (original DQN paper)
Section 4.3 from Csaba Szepesvári's RL monograph (Algorithms for Reinforcement Learning)
Sutton and Barto chapters 9, 10, and 11
What are the potential problems with Q-learning when we introduce function approximation?
Why might experience replay improve the performance of DQN? (See the replay-buffer sketch after these questions.)
Is the DQN algorithm more similar to Q-learning or value iteration? Why? (The training-step sketch at the end of this section may help.)
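To ground the experience-replay question above, here is a minimal replay-buffer sketch (the class name and default capacity are illustrative, not the paper's exact implementation). Storing transitions and sampling training batches uniformly at random breaks the correlation between consecutive samples and lets each transition be reused across many updates.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next, done) transitions."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall off the end

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        # Uniform sampling decorrelates the batch from the current trajectory.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```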
Download and run the PyTorch DQN tutorial linked in the optional reading list to get an intuition for how the algorithm works.
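And for the Q-learning versus value-iteration question, here is a sketch of a single DQN-style training step, assuming the ReplayBuffer above and a PyTorch q_net (hyperparameters are illustrative; note the frozen target network comes from the follow-up Nature version of DQN, not the 2013 paper). Each step regresses the network toward targets computed by a fixed Q-function over a whole batch of states, which is closer in spirit to fitted value iteration than to the one-state-at-a-time tabular update.

```python
import torch
import torch.nn.functional as F

def dqn_step(q_net, target_net, buffer, optimizer, batch_size=32, gamma=0.99):
    # Sample a decorrelated batch of transitions from the replay buffer.
    batch = buffer.sample(batch_size)
    s, a, r, s_next, done = (torch.as_tensor(x) for x in zip(*batch))

    # Q(s, a) for the actions that were actually taken.
    q_sa = q_net(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)

    # Targets come from a *frozen* copy of the network: like fitted value
    # iteration, we regress toward a fixed bootstrapped target rather than
    # updating entries of the function we are currently changing.
    with torch.no_grad():
        next_max = target_net(s_next.float()).max(dim=1).values
        target = r.float() + gamma * (1.0 - done.float()) * next_max

    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Periodically refresh the frozen copy, e.g. every few thousand steps:
#   target_net.load_state_dict(q_net.state_dict())
```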