
Publication Details

Reference Type: Conference Proceedings

Author(s): Theodorou, E. A., Buchli, J., Schaal, S.

Title: Learning Policy Improvements with Path Integrals

Journal/Conference/Book Title: International Conference on Artificial Intelligence and Statistics (AISTATS 2010)

Keywords: reinforcement learning, optimal control, pi2
Abstract: With the goal of generating more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parametrized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model-free, depending on how the learning problem is structured. Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition as to why the slightly heuristically motivated probability-matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. We believe that Policy Improvement with Path Integrals (PI2) currently offers one of the most efficient, numerically robust, and easy-to-implement algorithms for RL based on trajectory roll-outs.
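The cost-weighted averaging idea described in the abstract can be sketched in a few lines: each noisy roll-out is weighted by the exponentiated negative of its cost, and the parameter update is the weighted average of the exploration noise. This is a minimal illustrative sketch, not the paper's full derivation; the function name `pi2_update`, the min-max cost normalization, and the temperature parameter `lam` are assumptions made for the example.

```python
import numpy as np

def pi2_update(theta, rollout_costs, rollout_noise, lam=1.0):
    """One PI2-style parameter update (illustrative sketch).

    theta          -- current policy parameters, shape (d,)
    rollout_costs  -- total cost S_i of each of K noisy roll-outs, shape (K,)
    rollout_noise  -- exploration noise eps_i added to theta per roll-out, shape (K, d)
    lam            -- temperature; the only open parameter besides the noise itself.
    """
    S = np.asarray(rollout_costs, dtype=float)
    # Normalize costs to [0, 1] so the temperature is scale-independent
    # (guard against division by zero when all costs are equal).
    S = (S - S.min()) / max(S.max() - S.min(), 1e-12)
    # Softmax-like weights: low-cost roll-outs get exponentially higher weight.
    w = np.exp(-S / lam)
    w /= w.sum()
    # The update is the probability-weighted average of the exploration noise.
    return theta + w @ np.asarray(rollout_noise, dtype=float)
```

With a low temperature, the update is dominated by the noise of the cheapest roll-out, which is the "probability matching" behavior the abstract alludes to.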

Designed by: Nerses Ohanyan & Jan Peters
Page last modified on June 20, 2013, at 07:00 PM