Guided policy search (GPS) formulates its objective as a constrained optimization problem: minimize the cost of the expert trajectories subject to the constraint that, at convergence, the expert trajectories and the neural network policy match. After this week you should understand the problem of constrained optimization and ADMM, the specific technique GPS uses to solve it.
Alternating Direction Method of Multipliers (ADMM)
An intuitive introduction to Lagrange multipliers: Khan Academy’s Interpretation of Lagrange multipliers
The Bregman ADMM paper, which is what GPS actually uses
Some theory about Lagrange multipliers: Marcus Brubaker’s notes
In-depth chapter about constrained optimization and geometry: Geoff Gordon’s Linear Programming, Lagrange Multipliers, and Duality
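To make the alternating updates concrete, here is a minimal ADMM sketch on a toy consensus problem. This is purely illustrative and not GPS's actual objective: the quadratic costs, the penalty parameter `rho`, and the function name `admm_consensus` are all assumptions chosen so each subproblem has a closed-form solution.

```python
# Toy consensus problem solved by ADMM (illustrative sketch, not GPS itself):
#   minimize  0.5*(x - a)**2 + 0.5*(z - b)**2   subject to  x = z,
# whose optimum is x = z = (a + b) / 2.
def admm_consensus(a, b, rho=1.0, iters=200):
    x = z = u = 0.0  # u is the scaled dual variable (Lagrange multiplier / rho)
    for _ in range(iters):
        # x-update: minimize 0.5*(x - a)^2 + (rho/2)*(x - z + u)^2 in closed form
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: minimize 0.5*(z - b)^2 + (rho/2)*(x - z + u)^2 in closed form
        z = (b + rho * (x + u)) / (1.0 + rho)
        # dual ascent: accumulate the constraint residual x - z
        u = u + (x - z)
    return x, z

x, z = admm_consensus(1.0, 2.0)
print(x, z)  # both approach (a + b) / 2 = 1.5
```

Note the structure ADMM exploits: the objective splits into a term in `x` and a term in `z`, so each primal update touches only one variable while the augmented-Lagrangian penalty and the dual update pull the two toward agreement, mirroring how GPS alternates between trajectory optimization and supervised policy training.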
What is the advantage of using constrained optimization methods over a hand-tuned penalty?
Give an intuition for how Lagrange multipliers solve constrained optimization problems.
What benefit does the augmented Lagrangian method have over vanilla Lagrange multipliers?
How does ADMM differ from the above methods?