Mathematics of Deep Learning
Spring 2020

Bandits and the Upper Confidence Bound algorithm


The most basic setting in which we need to consider the exploration/exploitation tradeoff is the multi-armed bandit. This week we will introduce the bandit problem and see how concentration inequalities are used to derive the upper confidence bound (UCB) algorithm, which has near-optimal worst-case regret.
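As a concrete preview, here is a minimal sketch of the UCB1 variant of the algorithm on Bernoulli-reward arms. The confidence radius sqrt(2 log t / n_i) added to each empirical mean comes from Hoeffding's inequality; the arm means, horizon, and helper names below are illustrative choices, not part of the lecture material.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given (hypothetical) true means.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # times each arm has been pulled
    totals = [0.0] * k  # sum of observed rewards per arm

    def pull(i):
        # Bernoulli reward with mean arm_means[i]
        return 1.0 if rng.random() < arm_means[i] else 0.0

    # Initialize by pulling each arm once.
    for i in range(k):
        totals[i] += pull(i)
        counts[i] += 1

    for t in range(k, horizon):
        # Pick the arm maximizing: empirical mean + confidence radius.
        # The radius sqrt(2 log t / n_i) is the Hoeffding-based bonus.
        scores = [totals[i] / counts[i]
                  + math.sqrt(2 * math.log(t + 1) / counts[i])
                  for i in range(k)]
        i = max(range(k), key=lambda j: scores[j])
        totals[i] += pull(i)
        counts[i] += 1

    return counts

counts = ucb1([0.3, 0.5, 0.8], horizon=2000)
```

Because the confidence bonus shrinks like 1/sqrt(n_i), suboptimal arms are pulled only logarithmically often, so after enough rounds the best arm (mean 0.8 here) accounts for most of the pulls.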


Required reading

Optional reading