A brief introduction to the reinforcement learning problem, including Markov decision processes (MDPs), policy search, and value estimation. Covered in the intro lecture.
Lecture notes
Sequential decision making under uncertainty
Markov decision processes (MDPs)
Challenges: credit assignment, exploration
Policy search:
Ways to represent policies
Ways to search for policies
Policy gradients
Value estimation:
Value function and Q function
Bellman equation and Bellman optimality equation
Dynamic programming and value iteration
TD learning and Q-learning
Policy iteration
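To make the value estimation topics above concrete, here is a minimal sketch of value iteration and tabular Q-learning. It is not taken from the lecture notes. The 4-state chain environment, the reward of 1 for reaching the last state, the discount factor of 0.9, and every function name are assumptions chosen for illustration. Value iteration repeatedly applies the Bellman optimality backup V(s) = max_a [ r(s, a) + gamma * V(s') ], which is its form for deterministic transitions. Q-learning estimates the same quantities from sampled transitions, here collected under a uniformly random behaviour policy.

```python
import random

# Hypothetical toy MDP: a chain of states 0..3 where state 3 is terminal.
# Actions: 0 = left, 1 = right. Transitions are deterministic.
N_STATES = 4
TERMINAL = 3
ACTIONS = (0, 1)
GAMMA = 0.9  # assumed discount factor


def step(state, action):
    """Return (next_state, reward); entering the terminal state pays 1."""
    if action == 0:
        next_state = max(state - 1, 0)
    else:
        next_state = min(state + 1, N_STATES - 1)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def value_iteration(tol=1e-8):
    """Sweep the Bellman optimality backup until the values stop changing."""
    values = [0.0] * N_STATES  # the terminal value stays at 0
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):  # skip the terminal state
            backups = []
            for a in ACTIONS:
                s2, r = step(s, a)
                backups.append(r + GAMMA * values[s2])
            best = max(backups)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def q_learning(episodes=2000, alpha=0.5, seed=0):
    """Tabular Q-learning from transitions sampled by a random policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            a = rng.choice(ACTIONS)  # off-policy: behave uniformly at random
            s2, r = step(s, a)
            # TD target bootstraps from the greedy value of the next state.
            target = r + GAMMA * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [row.index(max(row)) for row in q[:TERMINAL]]


if __name__ == "__main__":
    print("V* from value iteration:", value_iteration())
    q_table = q_learning()
    print("Q from Q-learning:", q_table)
    print("Greedy policy:", greedy_policy(q_table))
```

Because Q-learning is off-policy, the random behaviour policy is enough here: the max over next-state actions in the target is what steers the estimates toward the optimal values. In larger problems an epsilon-greedy policy is the more common choice, since it trades off exploration against exploitation.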
The Sutton and Barto textbook
GitHub repo from Denny Britz with notebooks covering examples from the book
Courses from Emma Brunskill (for more theory), Sergey Levine (for more deep RL/control), and David Silver (for somewhere in between)
Lilian Weng’s blog post “A (Long) Peek into Reinforcement Learning”
Gridworld visualization from Andrej Karpathy