Mathematics of Deep Learning
Spring 2020

Bandits and the Upper Confidence Bound algorithm

Motivation

The most basic setting in which we must confront the exploration/exploitation tradeoff is the multi-armed bandit problem. This week we will introduce the bandit problem and see how concentration inequalities are used to derive the upper confidence bound (UCB) algorithm, which achieves near-optimal worst-case regret.
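As a preview of the algorithm the lecture derives, the following is a minimal sketch of UCB1 on simulated Bernoulli arms. The arm means, horizon, and seed are illustrative choices, not from the lecture; the index used is the standard one, empirical mean plus sqrt(2 log t / pulls).

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms with the given means.

    After pulling each arm once, at round t pull the arm maximizing
        empirical mean + sqrt(2 * log(t) / pulls),
    where the second term is a confidence radius derived from
    Hoeffding's inequality. Returns total reward and pull counts.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of pulls of each arm
    sums = [0.0] * k    # cumulative reward of each arm

    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

# Hypothetical instance: three arms, the last one is best.
total, counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

On such an instance the pull counts concentrate on the best arm, while the suboptimal arms are still sampled often enough (logarithmically in the horizon) to keep their confidence intervals tight.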

Topics

Required reading

Optional reading

Questions