# Deep RL with principled exploration

#### Motivation

We have seen provably efficient exploration in small MDPs, but those algorithms keep an independent estimate of the model or Q-function for every state-action pair. To scale them up to large state spaces, we need to avoid this tabular representation. This week we will look at one of the first papers to effectively scale a UCB-style exploration bonus to the deep RL setting of large MDPs.

#### Topics

• Scaling up exploration algorithms

• Pseudocounts from density models
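
As a rough illustration of the second topic, here is a minimal sketch of how a pseudocount can be read off a density model. It assumes the standard pseudo-count definition, N̂(x) = ρ(x)(1 − ρ′(x)) / (ρ′(x) − ρ(x)), where ρ(x) is the model's probability of x before observing it once more and ρ′(x) is its probability after. The class `EmpiricalDensity`, the function names, and the bonus constants `beta` and `0.01` are hypothetical choices made for this example, not taken from the reading. The toy model is a plain frequency table, chosen because the pseudocount then recovers the true visit count, which makes the formula easy to check; in the deep RL setting the table would be replaced by a learned density model over observations.

```python
from collections import Counter


class EmpiricalDensity:
    """Hypothetical toy density model: rho(x) = visits to x / total visits."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def prob(self, x):
        # Probability assigned to x before observing it again.
        return self.counts[x] / self.total if self.total else 0.0

    def prob_after_update(self, x):
        # Probability the model would assign to x after one more
        # observation of x (the "recoding" probability rho').
        return (self.counts[x] + 1) / (self.total + 1)

    def update(self, x):
        self.counts[x] += 1
        self.total += 1


def pseudo_count(rho, rho_prime):
    """N_hat = rho * (1 - rho') / (rho' - rho)."""
    if rho_prime <= rho:
        # Degenerate case: the model did not raise its probability of x.
        return float("inf")
    return rho * (1.0 - rho_prime) / (rho_prime - rho)


def exploration_bonus(n_hat, beta=0.05):
    # UCB-style bonus that shrinks as the pseudocount grows.
    # beta and the 0.01 offset are illustrative assumptions.
    return beta / (n_hat + 0.01) ** 0.5


model = EmpiricalDensity()
for state in ["a", "a", "a", "b"]:
    model.update(state)

n_hat_a = pseudo_count(model.prob("a"), model.prob_after_update("a"))
n_hat_b = pseudo_count(model.prob("b"), model.prob_after_update("b"))
n_hat_c = pseudo_count(model.prob("c"), model.prob_after_update("c"))
```

With this frequency-table model, `n_hat_a`, `n_hat_b`, and `n_hat_c` match the true visit counts of 3, 1, and 0 (up to floating-point error), so the never-visited state `"c"` receives the largest bonus.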