Mathematics of Deep Learning
Spring 2020

Linear trajectory optimization

Motivation

We’re now going to switch from exploration to control and trajectory optimization. Whereas the first half of the course built up to “Unifying Count-Based Exploration and Intrinsic Motivation”, the second half will lead to “End-to-End Training of Deep Visuomotor Policies” and the method it proposes, Guided Policy Search.

Guided policy search (GPS) is a family of methods that combine optimal control with rich model-free policies. By leveraging models of the environment and privileged information during training, GPS has been used to learn policies that map directly from pixels to torques on real robots, marking one of the first successes of deep RL on a physical system.

Trajectory optimization uses a model of a system’s dynamics to choose the controls that minimize some cost. The linear quadratic regulator, or LQR, is the fundamental tool of trajectory optimization: for linear dynamics and quadratic costs, it computes the optimal controls exactly. Guided policy search uses iLQR (iterative LQR), which extends LQR to nonlinear systems by repeatedly linearizing around the current trajectory, to find optimal guiding trajectories. After this week you should understand the problem of trajectory optimization and how LQR solves it for linear systems with quadratic costs.
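As a concrete reference, here is a minimal sketch of finite-horizon LQR in Python with NumPy. It assumes linear dynamics x_{t+1} = A x_t + B u_t and cost sum_t (x_t' Q x_t + u_t' R u_t) plus a terminal cost x_T' Q_f x_T, and computes the time-varying feedback gains by the backward Riccati recursion. The function name lqr_backward and the double-integrator example are illustrative choices, not from the course materials.

import numpy as np

def lqr_backward(A, B, Q, R, Q_f, T):
    # Finite-horizon discrete-time LQR via the backward Riccati recursion.
    # Dynamics: x_{t+1} = A x_t + B u_t
    # Cost:     sum_t (x_t' Q x_t + u_t' R u_t) + x_T' Q_f x_T
    # Returns time-varying gains K_0..K_{T-1} so that u_t = -K_t x_t is optimal.
    P = Q_f                          # cost-to-go matrix at the final step
    gains = []
    for _ in range(T):
        # Gain minimizing the one-step quadratic in u, given cost-to-go P:
        # K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: cost-to-go one step earlier.
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]               # reorder so gains[t] applies at time t

# Illustrative use: steer a double integrator (position, velocity) to the origin.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.0],
              [dt]])
Q = np.eye(2)
R = 0.1 * np.eye(1)
gains = lqr_backward(A, B, Q, R, Q_f=10.0 * np.eye(2), T=50)

x = np.array([1.0, 0.0])             # start one unit from the origin, at rest
for K in gains:
    u = -K @ x                       # optimal linear state feedback
    x = A @ x + B @ u
print(x)                             # x should be driven toward [0, 0]

The key point is that for linear dynamics and quadratic costs the optimal controller is a linear function of the state, u_t = -K_t x_t, and the gains can be computed exactly in a single backward pass; iLQR reuses this machinery on local linear-quadratic approximations of a nonlinear problem.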

Topics

Required reading

Optional reading

Questions

References from class