This is what it’s all about! Guided policy search is one of the most sample-efficient techniques for robotic control from vision in partially known environments. This week we’ll put it all together, showing how GPS combines trajectory optimization, imitation learning, and constrained optimization to find high-quality neural network policies with very little real-world experience.
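As a rough statement of the underlying constrained problem (notation varies across the GPS papers, so treat this as a sketch rather than the exact objective from any one of them): GPS searches over a trajectory distribution $p$ and policy parameters $\theta$ for

$$
\min_{p,\,\theta}\ \mathbb{E}_{p(\tau)}\Big[\sum_{t=1}^{T} c(\mathbf{x}_t, \mathbf{u}_t)\Big]
\quad\text{s.t.}\quad p(\mathbf{u}_t \mid \mathbf{x}_t) = \pi_\theta(\mathbf{u}_t \mid \mathbf{o}_t)\ \ \text{for all } t,
$$

where $\mathbf{x}_t$ is the full state available during training, $\mathbf{o}_t$ is the observation (e.g. an image) the policy will actually see, and $c$ is the cost. The constraint is relaxed with Lagrange multipliers (or a KL penalty, depending on the variant), and the relaxed objective is optimized by alternating three steps: trajectory optimization over $p$ with the policy-agreement term folded into the cost, supervised (imitation) training of $\pi_\theta$ on samples from $p$, and an update of the multipliers. A toy end-to-end sketch of this loop follows the topic list below.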
Guided Policy Search (GPS)
Trajectory optimization with unknown dynamics
Asymmetric imitation learning
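To make the alternation concrete, here is a minimal, self-contained sketch on a scalar linear system. Everything in it is invented for illustration (the system, the noise levels, names like `theta` and `rho`), and it simplifies real GPS heavily: the trajectory optimizer is a plain finite-horizon LQR on least-squares-fitted dynamics, the “constraint” is just a quadratic penalty whose weight is doubled each iteration rather than a proper Lagrangian or KL update, and the “vision” policy is a linear map from a noisy observation of the state. It is meant only to show how model fitting, trajectory optimization, asymmetric imitation, and the agreement penalty fit together in one loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Unknown" true dynamics: x_{t+1} = a*x + b*u + noise; the learner only sees samples from step().
A_TRUE, B_TRUE, DYN_NOISE = 1.2, 0.5, 0.05
T, Q, R = 20, 1.0, 0.1                       # horizon and quadratic cost weights q*x^2 + r*u^2

def step(x, u):
    return A_TRUE * x + B_TRUE * u + DYN_NOISE * rng.standard_normal()

def rollout(gains, x0=1.0, explore=0.0):
    """Run the time-varying linear controller u_t = k_t * x_t (plus optional exploration noise)."""
    xs, us = [x0], []
    for k in gains:
        u = k * xs[-1] + explore * rng.standard_normal()
        us.append(u)
        xs.append(step(xs[-1], u))
    return np.array(xs), np.array(us)

def fit_dynamics(xs_list, us_list):
    """Least-squares fit of x_{t+1} ~ a_hat*x_t + b_hat*u_t from the collected rollouts."""
    X = np.concatenate([np.stack([xs[:-1], us], axis=1) for xs, us in zip(xs_list, us_list)])
    y = np.concatenate([xs[1:] for xs in xs_list])
    a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    return a_hat, b_hat

def traj_opt(a, b, theta, rho):
    """Finite-horizon LQR on the fitted dynamics, with an extra penalty rho*(u - theta*x)^2
    that keeps the optimized trajectory close to what the current policy would do."""
    q_aug, r_aug, n = Q + rho * theta**2, R + rho, -rho * theta   # cost: q x^2 + r u^2 + 2 n x u
    P, gains = Q, []
    for _ in range(T):                       # backward Riccati recursion (with cross term n)
        denom = r_aug + b**2 * P
        K = (a * b * P + n) / denom          # optimal control is u = -K * x
        P = q_aug + a**2 * P - (a * b * P + n)**2 / denom
        gains.append(-K)
    return gains[::-1]

theta, rho, gains = 0.0, 0.1, [0.0] * T      # linear policy u = theta*o, penalty weight, controller
for _ in range(10):                          # GPS-style outer loop
    # 1) Collect a handful of real rollouts and fit a dynamics model (unknown dynamics).
    data = [rollout(gains, explore=0.3) for _ in range(5)]
    a_hat, b_hat = fit_dynamics(*zip(*data))
    # 2) Trajectory optimization (the state-based "teacher"), pulled toward the current policy.
    gains = traj_opt(a_hat, b_hat, theta, rho)
    # 3) Asymmetric imitation: the teacher used the true state, but the policy is fit on a
    #    noisy observation of it (standing in for images).
    xs_list, us_list = zip(*[rollout(gains) for _ in range(5)])
    obs = np.concatenate([xs[:-1] for xs in xs_list]) + 0.1 * rng.standard_normal(5 * T)
    acts = np.concatenate(us_list)
    theta = float(obs @ acts / (obs @ obs))  # least-squares fit of u ~ theta * o
    # 4) Tighten the agreement penalty (a crude stand-in for the dual update in real GPS).
    rho *= 2.0

print("learned observation-feedback gain:", theta)
```

In the real algorithm the teacher is a set of local time-varying linear-Gaussian controllers (one per initial condition) and the policy is a convolutional network trained on raw images, but the control flow is the same as in this toy.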
Sergey Levine’s lecture on GPS, which discusses some simpler alternatives and the problems with them (video)
Asymmetric Actor Critic for Image-Based Robot Learning, a paper that takes the “asymmetric supervision” idea from GPS and applies it to model-free RL in simulation
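The “asymmetric” trick itself is easy to state in code. Below is a tiny, hypothetical sketch (PyTorch, with invented shapes and layer sizes, not the architecture from the paper): during training in simulation the critic is given the full low-dimensional state, while the actor only ever receives the image observation, so the deployed policy needs nothing the real robot cannot sense.

```python
# Hypothetical sketch of asymmetric supervision (invented shapes/names, not the paper's code):
# the critic consumes the privileged simulator state, the actor consumes only pixels.
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM = 10, 4            # privileged low-dimensional state (simulation only)
IMG_SHAPE = (3, 64, 64)               # what the deployed policy actually observes

actor = nn.Sequential(                # policy: image -> action (usable on the real robot)
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 16, 5, stride=2), nn.ReLU(),
    nn.Flatten(), nn.LazyLinear(ACT_DIM), nn.Tanh(),
)
critic = nn.Sequential(               # value function: privileged state -> value (training only)
    nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1),
)

img = torch.zeros(1, *IMG_SHAPE)      # dummy batch: one image observation...
state = torch.zeros(1, STATE_DIM)     # ...and the matching full state from the simulator
action, value = actor(img), critic(state)
print(action.shape, value.shape)      # torch.Size([1, 4]) torch.Size([1, 1])
```

The same split is what lets GPS train a vision-based policy from a state-based trajectory optimizer.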
How does GPS use each of the components we’ve discussed (LQR, imitation, constrained optimization)?
What advantages does the trained neural net policy have over the trajectory optimizer?
How does this paper propose to deal with unknown dynamics? When will this strategy work well?
How does GPS learn with so few samples?
What are the limitations of this method?