Iterative linear quadratic regulation

27 Mar 2020

Motivation

Iterative linear quadratic regulation (iLQR) is an approximate method for nonlinear optimal control. It linearizes the dynamics around the current trajectory to obtain a time-varying linear model, solves the resulting LQR problem, and repeats from the improved trajectory until it converges. This enables optimal control via trajectory optimization in environments where the dynamics are known or can be approximated. Guided Policy Search uses iLQR to find optimal guiding trajectories. After this week you should understand how iLQR solves nonlinear trajectory optimization problems.
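To make the loop concrete, here is a minimal NumPy sketch of iLQR on a torque-driven pendulum. It is an illustration under stated assumptions, not the implementation from any of the readings: the pendulum model, the quadratic cost, the weights, and the helper names (step, linearize, rollout, total_cost, ilqr) are all made up for this example, and the step size is chosen by a simple backtracking search.

```python
import numpy as np

DT = 0.05  # Euler integration step (assumed value)

def step(x, u):
    """One Euler step of a unit pendulum; x = [angle, angular velocity], u = [torque]."""
    theta, omega = x
    return np.array([theta + DT * omega, omega + DT * (-np.sin(theta) + u[0])])

def linearize(x, u):
    """Jacobians A = df/dx and B = df/du of `step` at (x, u)."""
    A = np.array([[1.0, DT], [-DT * np.cos(x[0]), 1.0]])
    B = np.array([[0.0], [DT]])
    return A, B

def rollout(x0, us):
    """Simulate the nonlinear dynamics from x0 under the controls us."""
    xs = [x0]
    for u in us:
        xs.append(step(xs[-1], u))
    return np.array(xs)

def total_cost(xs, us, goal, Q, R, Qf):
    """Quadratic running cost plus quadratic terminal cost."""
    dx, dxN = xs[:-1] - goal, xs[-1] - goal
    running = 0.5 * np.einsum("ti,ij,tj->", dx, Q, dx)
    running += 0.5 * np.einsum("ti,ij,tj->", us, R, us)
    return running + 0.5 * dxN @ Qf @ dxN

def ilqr(x0, us, goal, Q, R, Qf, iters=50, tol=1e-6):
    xs = rollout(x0, us)
    costs = [total_cost(xs, us, goal, Q, R, Qf)]
    for _ in range(iters):
        # Backward pass: solve the time-varying LQR problem around (xs, us).
        Vx, Vxx = Qf @ (xs[-1] - goal), Qf
        ks, Ks = [], []
        for t in reversed(range(len(us))):
            A, B = linearize(xs[t], us[t])
            Qx = Q @ (xs[t] - goal) + A.T @ Vx
            Qu = R @ us[t] + B.T @ Vx
            Qxx = Q + A.T @ Vxx @ A
            Quu = R + B.T @ Vxx @ B
            Qux = B.T @ Vxx @ A
            k = -np.linalg.solve(Quu, Qu)   # feedforward correction
            K = -np.linalg.solve(Quu, Qux)  # feedback gain
            Vx = Qx + Qux.T @ k
            Vxx = Qxx + Qux.T @ K
            Vxx = 0.5 * (Vxx + Vxx.T)       # keep symmetric numerically
            ks.append(k)
            Ks.append(K)
        ks.reverse()
        Ks.reverse()
        # Forward pass: apply the new controls to the true nonlinear dynamics,
        # shrinking the feedforward step until the cost goes down.
        for alpha in 0.5 ** np.arange(10):
            new_xs, new_us = [x0], []
            for t in range(len(us)):
                new_us.append(us[t] + alpha * ks[t] + Ks[t] @ (new_xs[t] - xs[t]))
                new_xs.append(step(new_xs[t], new_us[t]))
            new_xs, new_us = np.array(new_xs), np.array(new_us)
            new_cost = total_cost(new_xs, new_us, goal, Q, R, Qf)
            if new_cost < costs[-1]:
                break
        else:
            break  # no step size improved the cost; stop
        improvement = costs[-1] - new_cost
        xs, us = new_xs, new_us
        costs.append(new_cost)
        if improvement < tol:
            break
    return xs, us, costs

# Example: drive the pendulum from hanging down toward a 90 degree angle.
x0, goal = np.array([0.0, 0.0]), np.array([np.pi / 2, 0.0])
Q, R, Qf = np.diag([1.0, 0.1]), np.array([[0.01]]), 100.0 * np.eye(2)
xs, us, costs = ilqr(x0, np.zeros((50, 1)), goal, Q, R, Qf)
print(f"cost: {costs[0]:.2f} -> {costs[-1]:.2f} in {len(costs) - 1} iterations")
```

If the dynamics were linear and the cost quadratic, a single backward and forward pass with a full step would return the exact LQR solution. The outer loop and the step-size search are what handle the nonlinearity. This sketch omits the regularization of Quu that practical implementations usually add.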

Topics

Nonlinear trajectory optimization
Iterative Linear Quadratic Regulation (iLQR)

Required reading

Blog post about iLQR by Travis DeWolf
Russ Tedrake’s textbook chapter on trajectory optimization: read from the beginning through 10.3.3, plus 10.7.

Optional reading

Russ’s lecture on trajectory optimization
Work pushing trajectory optimization to the limits to get dramatic results: Discovery of Complex Behaviors through Contact-Invariant Optimization (video)
The paper that produced the first really amazing iLQR results (and introduced MuJoCo): Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization (video)

Questions

What problem does iLQR try to solve?
How does it differ from regular LQR?
What other techniques does Russ suggest for solving this problem?
How computationally expensive are these trajectory optimization methods?

References from the session and further resources

Iterative Linearized Control: Stable Algorithms and Complexity Guarantees
Sergey Levine’s class has a few lectures on model-based RL
A representative paper doing trajectory optimization in a learned model: Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
The MuZero paper: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model