Last week we saw one approach to scaling up exploration. This week we will conclude our section on exploration with a brief tour of several other scalable exploration approaches introduced in the last few years. Rather than a deep dive into any one direction, we will aim for a high-level picture of the pros and cons of each approach and point to the relevant references if you want to learn more.
Optimistic exploration (UCB, pseudocounts, random network distillation, curiosity, information gain); a count-bonus sketch follows this list
Randomized exploration (Thompson sampling, Bootstrapped DQN, noisy networks); a Thompson-sampling sketch follows this list
Meta-exploration (learning to explore)
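The optimistic methods in the first item all share the same recipe: add a bonus to the environment reward that is large for rarely visited states and shrinks as visitation counts (or pseudocounts) grow. Below is a minimal sketch of that idea; the bonus scale beta, the 1/sqrt(N) decay, and the rounding-based state discretization are illustrative assumptions, not taken from any particular paper.

```python
# A minimal sketch of a count-based exploration bonus, in the spirit of
# UCB / pseudocount methods: give extra reward for rarely visited states.
# The bonus scale, the 1/sqrt(N) decay, and the rounding-based state
# hashing are illustrative assumptions, not from any specific paper.
from collections import defaultdict

import numpy as np

class CountBonus:
    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)  # N(s), keyed by a discretized state

    def __call__(self, state):
        # Crude discretization so that nearby continuous states share a count.
        key = tuple(np.round(np.atleast_1d(state), 1))
        self.counts[key] += 1
        # Optimism bonus that shrinks as the state becomes familiar.
        return self.beta / np.sqrt(self.counts[key])

# Usage: keep one CountBonus around and shape the reward before the RL update.
# bonus = CountBonus(beta=0.1)
# shaped_reward = reward + bonus(next_state)
```

Pseudocount and random network distillation methods replace the explicit table of counts with a learned density model or prediction-error signal, but the shaping of the reward works the same way.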
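The randomized methods in the second item instead explore by acting greedily with respect to a random sample of plausible value functions (Thompson sampling), which Bootstrapped DQN and noisy networks approximate at scale. The sketch below shows the exact version on a Bernoulli bandit; the arm probabilities, the Beta(1, 1) prior, and the horizon are arbitrary values chosen only for illustration.

```python
# A minimal sketch of Thompson sampling on a Bernoulli bandit: keep a Beta
# posterior per arm, sample one plausible mean per arm, and act greedily
# with respect to the sample. The arm probabilities, the Beta(1, 1) prior,
# and the horizon are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.7])  # hypothetical arm reward probabilities
successes = np.ones_like(true_means)    # Beta(1, 1) uniform prior
failures = np.ones_like(true_means)

for t in range(1000):
    samples = rng.beta(successes, failures)  # one posterior sample per arm
    arm = int(np.argmax(samples))            # greedy w.r.t. the sampled means
    reward = float(rng.random() < true_means[arm])  # pull the chosen arm
    successes[arm] += reward
    failures[arm] += 1.0 - reward

print("posterior means:", successes / (successes + failures))
```

Bootstrapped DQN approximates this at scale by training an ensemble of Q-heads and sampling one head to act with for each episode.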
Choose one of the papers from the optional reading section that interests you.
Exploration tutorial slides (since these are just slides, they leave out many details, but they point to references where the details can be found and give a nice unifying story)
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
Meta-Reinforcement Learning of Structured Exploration Strategies
Explain the core idea of the paper that you read.
From the slides, which methods make the most sense to you? Which do you think should work best, and why?