In the introduction, we saw value-based RL algorithms (and specifically Q-learning) in the tabular setting, where we keep a separate Q-value for each $(s, a)$ pair. To scale to large or infinite state spaces, we need to generalize across states using a function approximator such as a neural network. This week we will see how Q-learning can be modified to support function approximation and read the influential paper from DeepMind introducing the deep Q-network (DQN) algorithm.
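As a preview of the first topic below, one standard way to write the modified update is the semi-gradient Q-learning rule (a sketch in common notation rather than a formula taken from the readings; $\theta$ denotes the approximator's parameters, $\alpha$ a step size, and $\gamma$ the discount factor):

$$\theta \leftarrow \theta + \alpha \left( r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a) \right) \nabla_\theta Q_\theta(s, a)$$

When $Q_\theta$ is a lookup table with one parameter per $(s, a)$ pair, this recovers the tabular update from the introduction.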
Q-learning with function approximation
Experience replay
$\varepsilon$-greedy exploration
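The second and third topics are simple enough to sketch directly. Below is a minimal, illustrative Python sketch of a uniform experience replay buffer and $\varepsilon$-greedy action selection; the names (`ReplayBuffer`, `epsilon_greedy`) are invented for this sketch and are not taken from the readings or the DQN paper.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # Oldest transitions are evicted automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch; sampling from a large buffer breaks the
        # temporal correlation between consecutive transitions.
        indices = random.sample(range(len(self.buffer)), batch_size)
        return [self.buffer[i] for i in indices]

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon):
    """Return a random action with probability epsilon, else the greedy action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In DQN, the agent acts $\varepsilon$-greedily with respect to the current network, stores each transition in the replay buffer, and trains on minibatches sampled uniformly from it.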
Sections 4.1-4.4 of the Intro to Deep RL monograph by François-Lavet et al. (pages 24-29)
Playing Atari with Deep Reinforcement Learning (original DQN paper)
Lectures from David Silver or Emma Brunskill with lecture notes
Section 4.3 from Csaba Szepesvári’s RL monograph
Sutton and Barto chapters 9, 10, and 11
What are the potential problems with Q-learning when we introduce function approximation?
Why might experience replay improve the performance of DQN?
Is the DQN algorithm more similar to Q-learning or value iteration? Why?
Download and run the PyTorch DQN tutorial linked in the optional reading list to get an intuition for how the algorithm works.
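Before (or after) working through the tutorial, it may help to see the core pieces in isolation. The following is a stripped-down PyTorch sketch, not the tutorial's code: the names (`QNetwork`, `dqn_loss`), the layer sizes, and the use of a plain mean-squared TD error are placeholder choices, and the separate target network is a refinement introduced after the original paper.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small MLP mapping a state vector to one Q-value per action."""

    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, state):
        return self.net(state)


def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """One-step TD loss: Q(s, a) regressed toward r + gamma * max_a' Q_target(s', a')."""
    # Expected shapes/dtypes: states/next_states float [B, state_dim],
    # actions int64 [B], rewards float [B], dones float (0.0 or 1.0) [B].
    states, actions, rewards, next_states, dones = batch
    # Q-values of the actions that were actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target from a separate, periodically updated target network.
        max_next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * max_next_q
    return nn.functional.mse_loss(q_sa, targets)
```

A training step would then sample a minibatch from the replay buffer, compute this loss, and take one optimizer step on the Q-network's parameters.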