Mathematics of Deep Learning
Spring 2020

Deep RL with principled exploration


We have seen provably efficient exploration in small MDPs, but these algorithms keep independent estimates of a model or Q function for every state and action. To scale them to large state spaces, we need a way to avoid this tabular representation. This week we will look at one of the first papers to effectively scale a UCB-style exploration bonus to deep RL in large MDPs.
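To make the tabular bookkeeping concrete, here is a minimal sketch of a count-based exploration bonus. It assumes a bonus of the form beta / sqrt(N(s, a) + 1), one common UCB-style choice picked here for illustration. The class name `TabularCountBonus` and the parameter `beta` are hypothetical and do not come from the reading.

```python
import math
from collections import defaultdict


class TabularCountBonus:
    """Hypothetical UCB-style bonus backed by one visit counter per (state, action) pair."""

    def __init__(self, beta: float = 1.0):
        self.beta = beta  # bonus scale (illustrative assumption)
        self.counts = defaultdict(int)  # independent count N(s, a) for each visited pair

    def update(self, state, action):
        # Record one more visit to (state, action).
        self.counts[(state, action)] += 1

    def bonus(self, state, action):
        # Use .get so that querying an unvisited pair does not create a table entry.
        n = self.counts.get((state, action), 0)
        # Unvisited pairs receive the maximal bonus beta; it decays like 1/sqrt(n).
        return self.beta / math.sqrt(n + 1)


if __name__ == "__main__":
    b = TabularCountBonus(beta=1.0)
    for _ in range(3):
        b.update("s0", 0)
    print(b.bonus("s0", 0), b.bonus("s0", 1), len(b.counts))
```

Every distinct (state, action) pair needs its own counter, and visits to one pair say nothing about any other. When states are, for example, raw images, nearly every state is new, so almost every bonus stays at its maximum. This is the limitation that motivates replacing the table with a learned, generalizing estimate.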


Required reading

Optional reading