Iterative linear quadratic regulation (iLQR) solves a nonlinear trajectory optimization problem by repeatedly linearizing the dynamics around the current trajectory, solving the resulting time-varying LQR subproblem, and using that solution to improve the trajectory. This enables optimal control via trajectory optimization in any environment where the dynamics are known or can be approximated. Guided Policy Search uses iLQR to find the optimal guiding trajectories that supervise policy learning. After this week you should understand how iLQR solves nonlinear trajectory optimization problems.
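To make the linearize-solve-improve loop concrete, here is a minimal iLQR sketch on a toy pendulum. Everything here (the dynamics, cost weights, fixed step size, and finite-difference Jacobians) is an illustrative assumption, not the canonical algorithm: a practical implementation would add regularization of Quu, a line search, and second-order dynamics terms (DDP).

```python
import numpy as np

def f(x, u, dt=0.05):
    # Toy nonlinear dynamics: an undamped pendulum, Euler-discretized.
    th, om = x
    return np.array([th + dt * om, om + dt * (-np.sin(th) + u[0])])

def rollout(x0, us):
    xs = [x0]
    for u in us:
        xs.append(f(xs[-1], u))
    return np.array(xs)

def jacobians(x, u, eps=1e-5):
    # Numerical A = df/dx, B = df/du around (x, u).
    n, m = len(x), len(u)
    A, B = np.zeros((n, n)), np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (f(x + dx, u) - f(x - dx, u)) / (2 * eps)
    for i in range(m):
        du = np.zeros(m); du[i] = eps
        B[:, i] = (f(x, u + du) - f(x, u - du)) / (2 * eps)
    return A, B

def ilqr(x0, us, Q, R, Qf, x_goal, iters=30):
    for _ in range(iters):
        xs, T = rollout(x0, us), len(us)
        # Backward pass: quadratic value function V(x) ~ 0.5 dx'Vxx dx + vx'dx.
        Vxx, vx = Qf, Qf @ (xs[-1] - x_goal)
        ks, Ks = [], []
        for t in reversed(range(T)):
            A, B = jacobians(xs[t], us[t])
            qx = Q @ (xs[t] - x_goal) + A.T @ vx
            qu = R @ us[t] + B.T @ vx
            Qxx = Q + A.T @ Vxx @ A
            Quu = R + B.T @ Vxx @ B
            Qux = B.T @ Vxx @ A
            k = -np.linalg.solve(Quu, qu)       # feedforward correction
            K = -np.linalg.solve(Quu, Qux)      # feedback gain
            ks.append(k); Ks.append(K)
            vx = qx + K.T @ Quu @ k + K.T @ qu + Qux.T @ k
            Vxx = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
        ks.reverse(); Ks.reverse()
        # Forward pass: roll out the improved controls through the true dynamics.
        new_xs, new_us = [x0], []
        for t in range(T):
            du = ks[t] + Ks[t] @ (new_xs[t] - xs[t])
            new_us.append(us[t] + du)
            new_xs.append(f(new_xs[-1], new_us[-1]))
        us = np.array(new_us)
    return rollout(x0, us), us

# Usage: drive the pendulum from 0.5 rad back to rest at the origin.
xs, us = ilqr(np.array([0.5, 0.0]), np.zeros((50, 1)),
              np.eye(2), 0.1 * np.eye(1), 100.0 * np.eye(2), np.zeros(2))
```

Note that each iteration is one linearization (backward pass) plus one rollout (forward pass), which is what makes iLQR cheap compared to generic nonlinear programming on the same problem.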
Nonlinear trajectory optimization
Iterative Linear Quadratic Regulation (iLQR)
Russ’s textbook chapter on trajectory optimization — read beginning through 10.3.3, plus 10.7.
Work pushing trajectory optimization to its limits, with dramatic results: Discovery of Complex Behaviors through Contact-Invariant Optimization (video)
The paper that produced the first really amazing iLQR results (and introduced MuJoCo): Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization (video)
What problem does iLQR try to solve?
How does it differ from regular LQR?
What other techniques does Russ suggest for solving this problem?
How computationally expensive are these trajectory optimization methods?
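Since the questions above contrast iLQR with regular LQR, it may help to see the piece they share: the finite-horizon LQR backward (Riccati) recursion. The sketch below is illustrative (the system and weights are made up); the key difference is that regular LQR assumes globally linear dynamics and runs this recursion once, while iLQR re-linearizes A and B around the current trajectory and reruns it every iteration.

```python
import numpy as np

def lqr_gains(A, B, Q, R, Qf, T):
    # Backward Riccati recursion for x_{t+1} = A x_t + B u_t with
    # quadratic stage cost x'Qx + u'Ru and terminal cost x'Qf x.
    P, Ks = Qf, []
    for _ in range(T):
        # K_t = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        Ks.append(K)
    return Ks[::-1]  # gains ordered t = 0 .. T-1

# Usage: stabilize a double integrator from x = [1, 0] with u = -K_t x.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Ks = lqr_gains(A, B, np.eye(2), 0.01 * np.eye(1), np.eye(2), T=100)
x = np.array([1.0, 0.0])
for K in Ks:
    x = (A - B @ K) @ x
```

Because A and B are fixed here, the gains are exact and one solve suffices; in iLQR this whole recursion becomes the inner loop of an outer iteration over trajectories.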
Sergey Levine’s class has a few lectures on model-based RL
A representative paper doing trajectory optimization in a learned model: Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
The MuZero paper: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model